Ceph MDS MetaBlob Analysis
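CephFS records many kinds of events as different `LogEvent` types. The most common, and the most important, is the `EUpdate` event written while the MDS handles client requests on the read/write path. `EUpdate` in turn stores the metadata state in a `MetaBlob` structure. This article summarizes the `MetaBlob` structure and what it stores.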
The `MetaBlob` class definition looks complicated, but only three members really matter:

```cpp
class EMetaBlob {
public:
  std::vector<dirfrag_t> lump_order;
  std::map<dirfrag_t, dirlump> lump_map;
  std::list<fullbit> roots;
  // ... (remaining members omitted)
};
```
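`lump_order` stores the directory fragments (`dirfrag_t`) on the path from the root directory down to the parent directory. `lump_map` maps each `dirfrag_t` to its `dirlump`, and the `dirlump` holds the actual metadata. Finally, `roots` stores the `fullbit` of the root node. A `fullbit` represents an existing dentry together with its inode (`dn + inode`); in contrast, `MetaBlob` uses a `nullbit` structure to record a dentry that does not exist.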
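To make the relationship between `lump_order` and `lump_map` concrete, here is a minimal sketch. All types in it (`MiniMetaBlob`, `DirLump`, `FullBit`, `RemoteBit`, `NullBit`, and the integer `DirFragId`) are simplified stand-ins invented for illustration, not Ceph's real definitions. The only behavior modeled is that a directory is appended to `lump_order` the first time it is touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real Ceph types.
using DirFragId = std::uint64_t;  // stands in for dirfrag_t

struct FullBit   { std::string dn; std::uint64_t ino; };  // existing dentry + its inode
struct RemoteBit { std::string dn; std::uint64_t ino; };  // dentry referring to an inode by number
struct NullBit   { std::string dn; };                     // dentry that does not exist

struct DirLump {
  std::vector<FullBit> fulls;
  std::vector<RemoteBit> remotes;
  std::vector<NullBit> nulls;
};

struct MiniMetaBlob {
  std::vector<DirFragId> lump_order;      // directories in first-touch order
  std::map<DirFragId, DirLump> lump_map;  // directory -> recorded dentries

  // Return the lump for `dir`, creating it and recording its position on first use.
  DirLump& add_dir(DirFragId dir) {
    auto it = lump_map.find(dir);
    if (it == lump_map.end()) {
      lump_order.push_back(dir);
      it = lump_map.emplace(dir, DirLump{}).first;
    }
    return it->second;
  }

  // Look up the i-th lump in the order the directories were first added.
  const DirLump& lump_at(std::size_t i) const {
    assert(i < lump_order.size());
    return lump_map.at(lump_order[i]);
  }
};
```

In this sketch, calling `add_dir(0x601)` twice leaves a single entry in `lump_order`, and walking `lump_order` while looking each id up in `lump_map` yields the lumps in the order they were first added.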
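An example makes this concrete. Suppose we use `mv` to move the file `/AppleDir/BananaDir/BananaFile` into `/AppleDir/CherryDir/LemonDir/`, renaming it so that it overwrites the existing `MangoFile`. The steps below show how the `MetaBlob` is filled in during the rename (in `_rename_prepare`). Note that `BananaFile` is also a hard link to `/AppleDir/AppleFile`.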
The inodes of the directories and files involved are:

```
0x601:         ~mds0/stray1/
0x1:           /
0x100000102ea: /AppleDir/
0x100000102ec: /AppleDir/CherryDir/
0x100000102ed: /AppleDir/CherryDir/LemonDir/
0x100:         ~mds0/
0x100000102eb: /AppleDir/BananaDir/
0x100000102f4: /AppleDir/CherryDir/LemonDir/MangoFile
0x100000102ef: /AppleDir/BananaDir/BananaFile
```
First, the `stray` directory is added to the metablob:

```
dirlump 0x601 v 631439 state 4 num 0/0/0
```
Next, `predirty_journal_parents` (called with `oldin` and `destdn->get_dir()`) adds the parents of the pre-move `MangoFile` to the metablob:

```
dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 43 state 0 num 1/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
```
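The effect visible in this dump, where every ancestor directory is recorded as a dentry in its own parent's lump starting from the root, can be sketched as follows. This is a toy model only: `Dir`, `MiniBlob`, and `journal_ancestors` are hypothetical names, and the real `predirty_journal_parents` does considerably more than this. The sketch also ignores dentry versions (`dnv`) and simply skips a name that is already recorded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model; NOT the real MDCache / EMetaBlob code.
struct Dir {
  std::uint64_t ino;   // inode number of this directory
  std::string name;    // its dentry name inside the parent
  const Dir* parent;   // nullptr for the root directory
};

struct MiniBlob {
  std::vector<std::uint64_t> lump_order;                       // first-touch order
  std::map<std::uint64_t, std::vector<std::string>> lump_map;  // dir ino -> dentry names

  // Return the name list for `ino`, creating the lump on first use.
  std::vector<std::string>& add_dir(std::uint64_t ino) {
    auto it = lump_map.find(ino);
    if (it == lump_map.end()) {
      lump_order.push_back(ino);
      it = lump_map.emplace(ino, std::vector<std::string>{}).first;
    }
    return it->second;
  }
};

// Record `dir` and each of its ancestors as a dentry in the parent's lump,
// starting at the root and working down towards `dir`.
void journal_ancestors(MiniBlob& blob, const Dir& dir) {
  std::vector<const Dir*> chain;
  for (const Dir* d = &dir; d->parent != nullptr; d = d->parent) {
    chain.push_back(d);  // collected leaf-first; the root itself is excluded
  }
  for (auto it = chain.rbegin(); it != chain.rend(); ++it) {
    const Dir* d = *it;
    assert(d->parent != nullptr);  // invariant: the chain never holds the root
    std::vector<std::string>& names = blob.add_dir(d->parent->ino);
    if (std::find(names.begin(), names.end(), d->name) == names.end()) {
      names.push_back(d->name);  // skip names that are already recorded
    }
  }
}
```

With the `stray` lump `0x601` already present, calling `journal_ancestors` on a `Dir` chain for `/AppleDir/CherryDir/LemonDir/` adds lumps for `0x1`, `0x100000102ea`, and `0x100000102ec` in that order, the same order as the `dirlump` lines above.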
Then `predirty_journal_parents` (called with `oldin` and `straydn->get_dir()`) adds the parents of the deleted file `0x100000102f4` in its post-move location (that is, the `stray` directory) to the metablob:

```
dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 43 state 0 num 1/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
```
`predirty_journal_parents` (called with `srci` and `srcdn->get_dir()`) then adds the parents of the pre-move `BananaFile` to the metablob:

```
dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
  fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
```
Another `predirty_journal_parents` call (with `srci` and `destdn->get_dir()`) adds the parents of the new, post-move `MangoFile` (the hard link to `AppleFile`) to the metablob. Note that `LemonDir` now appears a second time with a different version: its `dnv` changes from 22 to 24.

```
dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
  fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
  fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
```
`add_primary_dentry` then records `oldin` in the `dirlump` of the `stray` directory, which marks this inode as deleted:

```
dirlump 0x601 v 631440 state 4 num 1/0/0
  fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
  fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
  fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
```
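In this dump the dentry placed in the `stray` directory is named `100000102f4`, which is the inode number `0x100000102f4` printed in lowercase hexadecimal without the `0x` prefix. A minimal sketch of that naming, using a hypothetical helper `stray_dentry_name` that is not taken from the Ceph source:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: format an inode number the way the stray dentry
// name appears in the dump (lowercase hex, no "0x" prefix).
std::string stray_dentry_name(std::uint64_t ino) {
  assert(ino != 0);  // sketch assumption: 0 is not a valid inode number
  std::ostringstream out;
  out << std::hex << ino;
  return out.str();
}
```

Naming the stray dentry after the inode number keeps names unique inside the stray directory, since no two inodes share a number.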
Likewise, `add_primary_dentry` records `srci` (`0x100000102ef`), which represents `BananaFile` after its move into `LemonDir`. Because the file is also a hard link to `AppleFile`, the dump prints the primary dentry as `AppleFile`; the inode is the same. `LemonDir` itself receives a `remotebit` for `MangoFile` that refers to this inode:

```
dirlump 0x601 v 631440 state 4 num 1/0/0
  fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 3/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
  fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
  fullbit dn AppleFile [2,head] dnv 41 inode 0x100000102ef state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
  fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
dirlump 0x100000102ed v 12 state 0 num 0/1/0
  remotebit dn MangoFile [2,head] dnv 10 ino 0x100000102ef dirty=1
```
Finally, `add_null_dentry` adds a `nullbit` to `BananaDir`, which records the removal of the original `BananaFile` dentry:

```
dirlump 0x601 v 631440 state 4 num 1/0/0
  fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
  fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 3/0/0
  fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
  fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
  fullbit dn AppleFile [2,head] dnv 41 inode 0x100000102ef state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
  fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
  fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
  fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
dirlump 0x100000102ed v 12 state 0 num 0/1/0
  remotebit dn MangoFile [2,head] dnv 10 ino 0x100000102ef dirty=1
dirlump 0x100000102eb v 11 state 0 num 0/0/1
  nullbit dn BananaFile [2,head] dnv 10 dirty=1
```
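Judging from these dumps, the three numbers after `num` count the `fullbit`, `remotebit`, and `nullbit` entries in a lump, in that order: the `LemonDir` lump with a single `remotebit` shows `0/1/0`, and the `BananaDir` lump with a single `nullbit` shows `0/0/1`. The sketch below models only those counters and the header line. `LumpCounters` and its members are hypothetical names, not the real `dirlump`.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical, simplified dirlump that only tracks the three counters.
struct LumpCounters {
  std::uint64_t dir_ino = 0;  // inode number of the directory
  std::uint64_t version = 0;  // the "v" field
  int state = 0;              // the "state" field, kept opaque here
  int nfull = 0;              // dentries recorded with a full inode (fullbit)
  int nremote = 0;            // dentries that refer to an inode by number (remotebit)
  int nnull = 0;              // dentries recorded as absent (nullbit)

  void add_full()   { ++nfull; }
  void add_remote() { ++nremote; }
  void add_null()   { ++nnull; }

  // Format a header line in the same shape as the dumps above.
  std::string header() const {
    assert(nfull >= 0 && nremote >= 0 && nnull >= 0);  // counters only grow in this sketch
    std::ostringstream out;
    out << "dirlump 0x" << std::hex << dir_ino << std::dec
        << " v " << version << " state " << state
        << " num " << nfull << "/" << nremote << "/" << nnull;
    return out.str();
  }
};
```

Setting `dir_ino` to `0x100000102eb` and `version` to 11, then calling `add_null()` once, yields a header of the same form as the last `dirlump` line in the dump.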
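Laid out step by step, the flow is fairly easy to follow. Metadata gets into the `MetaBlob` through `predirty_journal_parents`, `add_primary_dentry`, and `add_null_dentry`, and what the `MetaBlob` records is simply the set of file and directory changes involved in the operation.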
That concludes this summary of how the `MetaBlob` structure stores metadata. I hope it is useful.