Ceph MDS MetaBlob Analysis

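CephFS records many kinds of events in different LogEvent types. The most common, and the most important, is the EUpdate event, which is journaled while the MDS handles client requests on the read/write path. An EUpdate in turn stores the state of the affected metadata in a MetaBlob structure. This post summarizes the MetaBlob structure and what it stores.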

The MetaBlob class definition (EMetaBlob) looks complicated, but only three members really matter:

class EMetaBlob {
public:
  // ... (other members omitted)

  // my lumps. preserve the order we added them in a list.
  std::vector<dirfrag_t> lump_order;
  std::map<dirfrag_t, dirlump> lump_map;
  std::list<fullbit> roots;

  // ... (other members omitted)
};

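Here lump_order holds, in the order they were added, the dirfrag_t of each directory fragment on the path from the root down to the parent directory. lump_map holds the mapping from dirfrag_t to dirlump, and the dirlump is where the actual metadata is stored. Finally, roots holds the fullbit information for the root node. A fullbit represents a dentry that exists together with its inode (dn + inode); by contrast, MetaBlob uses the nullbit structure to record a dentry that does not exist.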
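
To make the relationship between lump_order and lump_map concrete, here is a minimal sketch. It is not Ceph code: `DirFragId`, `DirLumpSketch`, `MetaBlobSketch`, `add_dir`, and `build_example` are hypothetical stand-ins invented for illustration, and the real dirfrag_t and dirlump carry far more state. The point it shows is that a std::map iterates in key order, so a separate vector is needed to remember the order in which fragments were first added.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for illustration only (assumptions, not Ceph's types).
using DirFragId = std::uint64_t;            // stands in for dirfrag_t

struct DirLumpSketch {                      // stands in for dirlump
  std::vector<std::string> entries;         // dentry names recorded in this fragment
};

struct MetaBlobSketch {
  std::vector<DirFragId> lump_order;            // fragments in first-insertion order
  std::map<DirFragId, DirLumpSketch> lump_map;  // fragment -> recorded metadata

  // Return the lump for `dir`, creating it and recording its position on first use.
  DirLumpSketch& add_dir(DirFragId dir) {
    auto it = lump_map.find(dir);
    if (it == lump_map.end()) {
      lump_order.push_back(dir);
      it = lump_map.emplace(dir, DirLumpSketch{}).first;
    }
    return it->second;
  }
};

// Add three fragments; the inode numbers are illustrative values.
inline MetaBlobSketch build_example() {
  MetaBlobSketch blob;
  blob.add_dir(0x601ULL);
  blob.add_dir(0x1ULL).entries.push_back("AppleDir");
  blob.add_dir(0x100000102eaULL).entries.push_back("CherryDir");
  blob.add_dir(0x100000102eaULL).entries.push_back("BananaDir");  // existing lump is reused
  // Every fragment appears exactly once in both containers.
  assert(blob.lump_order.size() == blob.lump_map.size());
  return blob;
}
```

Walking lump_order and looking each id up in lump_map replays the lumps in insertion order (0x601 first), whereas iterating lump_map directly would start at the smallest key (0x1).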

An example makes this easier to understand:

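Suppose we use mv to move the file /AppleDir/BananaDir/BananaFile into /AppleDir/CherryDir/LemonDir/ and, as part of the same rename, overwrite the existing MangoFile there. The MetaBlob information that has to be filled in during the rename (in _rename_prepare) is shown below. Note that BananaFile is also a hard link to /AppleDir/AppleFile: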

The inodes of the directories and files involved are:
0x601: ~mds0/stray1/

0x1: /
0x100000102ea: /AppleDir/
0x100000102ec: /AppleDir/CherryDir/
0x100000102ed: /AppleDir/CherryDir/LemonDir/

0x100: ~mds0/

0x100000102eb: /AppleDir/BananaDir/

0x100000102f4: /AppleDir/CherryDir/LemonDir/MangoFile
0x100000102ef: /AppleDir/BananaDir/BananaFile

First, the stray directory is added to the metablob:

dirlump 0x601 v 631439 state 4 num 0/0/0

Then predirty_journal_parents, called with oldin and destdn->get_dir(), adds the parents of the pre-move MangoFile to the metablob:

dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 43 state 0 num 1/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
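
The dump above lists the ancestors root-first: the dentry AppleDir in dirlump 0x1, then CherryDir in 0x100000102ea, then LemonDir in 0x100000102ec. The sketch below models only that visible ordering. `DirNode`, `ParentEntry`, `ancestor_entries`, and `lemon_dir_example` are hypothetical names made up for this illustration; they are not Ceph's CDir/CInode or the real predirty_journal_parents, which also performs version projection and accounting that this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified directory node (an assumption, not a Ceph type).
struct DirNode {
  std::uint64_t ino;      // inode number of this directory
  std::string name;       // dentry name under the parent ("" for the root)
  const DirNode* parent;  // nullptr for the root
};

// "In directory dir_ino, dentry `name` points at inode child_ino."
struct ParentEntry {
  std::uint64_t dir_ino;
  std::string name;
  std::uint64_t child_ino;
};

// Walk from `dir` up to the root, then reverse so the result is root-first.
inline std::vector<ParentEntry> ancestor_entries(const DirNode& dir) {
  std::vector<ParentEntry> out;
  for (const DirNode* d = &dir; d->parent != nullptr; d = d->parent) {
    out.push_back({d->parent->ino, d->name, d->ino});
  }
  std::reverse(out.begin(), out.end());
  return out;
}

// Rebuild the /AppleDir/CherryDir/LemonDir chain using the inode numbers from this post.
inline std::vector<ParentEntry> lemon_dir_example() {
  DirNode root{0x1ULL, "", nullptr};
  DirNode apple{0x100000102eaULL, "AppleDir", &root};
  DirNode cherry{0x100000102ecULL, "CherryDir", &apple};
  DirNode lemon{0x100000102edULL, "LemonDir", &cherry};
  std::vector<ParentEntry> entries = ancestor_entries(lemon);
  assert(entries.size() == 3);  // one entry per ancestor dentry
  return entries;
}
```

Calling lemon_dir_example() yields three entries whose dir_ino values are 0x1, 0x100000102ea, and 0x100000102ec, in that order, matching the three dirlumps that this step adds.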

Next, predirty_journal_parents, called with oldin and straydn->get_dir(), adds the parents of the file being deleted (0x100000102f4) at its post-move location, that is, the stray directory, to the metablob:

dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 43 state 0 num 1/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=

predirty_journal_parents, called with srci and srcdn->get_dir(), then adds the parents of the pre-move BananaFile to the metablob:

dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 23 state 0 num 1/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=

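predirty_journal_parents is then called again, with srci and destdn->get_dir(), to add the parents of the new post-move MangoFile (the hard link to AppleFile) to the metablob. Note that the dnv version of LemonDir has changed compared with the earlier entry: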

dirlump 0x601 v 631439 state 4 num 0/0/0
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=

add_primary_dentry then records oldin in the stray directory's dirlump, which indicates that this inode has been deleted:

dirlump 0x601 v 631440 state 4 num 1/0/0
fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 2/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=

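Similarly, add_primary_dentry records srci (0x100000102ef) for the LemonDir dirlump. This represents BananaFile after the move, but because it is also a hard link to AppleFile, the dump prints the name AppleFile; the inode is the same. In the dump that follows, the fullbit appears as dn AppleFile in the AppleDir dirlump, and the LemonDir dirlump (0x100000102ed) carries a remotebit for MangoFile that refers to that same inode: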

dirlump 0x601 v 631440 state 4 num 1/0/0
fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 3/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
fullbit dn AppleFile [2,head] dnv 41 inode 0x100000102ef state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
dirlump 0x100000102ed v 12 state 0 num 0/1/0
remotebit dn MangoFile [2,head] dnv 10 ino 0x100000102ef dirty=1

Finally, add_null_dentry adds a nullbit to BananaDir, which indicates that the original BananaFile dentry has been removed:

dirlump 0x601 v 631440 state 4 num 1/0/0
fullbit dn 100000102f4 [2,head] dnv 631439 inode 0x100000102f4 state=
dirlump 0x1 v 73684 state 4 num 1/0/0
fullbit dn AppleDir [2,head] dnv 73683 inode 0x100000102ea state=
dirlump 0x100000102ea v 45 state 4 num 3/0/0
fullbit dn CherryDir [2,head] dnv 42 inode 0x100000102ec state=
fullbit dn BananaDir [2,head] dnv 44 inode 0x100000102eb state=
fullbit dn AppleFile [2,head] dnv 41 inode 0x100000102ef state=
dirlump 0x100000102ec v 25 state 4 num 2/0/0
fullbit dn LemonDir [2,head] dnv 22 inode 0x100000102ed state=
fullbit dn LemonDir [2,head] dnv 24 inode 0x100000102ed state=
dirlump 0x100 v 132673 state 4 num 1/0/0
fullbit dn stray1 [2,head] dnv 132672 inode 0x601 state=
dirlump 0x100000102ed v 12 state 0 num 0/1/0
remotebit dn MangoFile [2,head] dnv 10 ino 0x100000102ef dirty=1
dirlump 0x100000102eb v 11 state 0 num 0/0/1
nullbit dn BananaFile [2,head] dnv 10 dirty=1

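Once the steps are laid out like this, the flow is fairly easy to follow. Metadata is saved into the MetaBlob through predirty_journal_parents, add_primary_dentry, and add_null_dentry, and what the MetaBlob records is simply the set of file and directory changes involved in this operation.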
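
As a compact recap of the three kinds of entry seen in the dumps (fullbit, remotebit, nullbit), the sketch below models a single dirlump. It assumes, based only on how the dumps in this post line up, that the three numbers after num count the fullbit, remotebit, and nullbit entries in that lump (for example 0/1/0 for the lump holding just a remotebit). `BitKind`, `Bit`, `LumpSketch`, and its methods are hypothetical names for illustration and do not reproduce Ceph's definitions or its exact dump format.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Illustrative model only (assumptions, not Ceph's dirlump).
enum class BitKind { Full, Remote, Null };

struct Bit {
  BitKind kind;
  std::string dn;     // dentry name
  std::uint64_t ino;  // target inode; 0 for a null dentry
};

struct LumpSketch {
  std::uint64_t dir_ino = 0;
  std::vector<Bit> bits;

  void add_primary(const std::string& dn, std::uint64_t ino) { push(BitKind::Full, dn, ino); }
  void add_remote(const std::string& dn, std::uint64_t ino) { push(BitKind::Remote, dn, ino); }
  void add_null(const std::string& dn) { push(BitKind::Null, dn, 0); }

  // Summarize the lump as "dirlump 0x<ino> num full/remote/null".
  std::string summary() const {
    int nfull = 0, nremote = 0, nnull = 0;
    for (const Bit& b : bits) {
      if (b.kind == BitKind::Full) ++nfull;
      else if (b.kind == BitKind::Remote) ++nremote;
      else ++nnull;
    }
    std::ostringstream os;
    os << "dirlump 0x" << std::hex << dir_ino << std::dec
       << " num " << nfull << "/" << nremote << "/" << nnull;
    return os.str();
  }

 private:
  void push(BitKind kind, const std::string& dn, std::uint64_t ino) {
    assert(!dn.empty());  // every recorded dentry needs a name
    bits.push_back(Bit{kind, dn, ino});
  }
};
```

For instance, a LumpSketch with dir_ino 0x100000102eb and a single add_null("BananaFile") summarizes with counts 0/0/1, which mirrors the last dirlump in the final dump.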

That concludes this summary of how the MetaBlob structure stores metadata. I hope it is helpful.