close-to-open cache consistency
NFS3 metadata
NFS3 metadata has a fairly simple representation: it describes one complete set of attributes.
struct fattr3 {
    ftype3    type;
    mode3     mode;
    uint32    nlink;
    uid3      uid;
    gid3      gid;
    size3     size;
    size3     used;
    specdata3 rdev;
    uint64    fsid;
    fileid3   fileid;
    nfstime3  atime;
    nfstime3  mtime;
    nfstime3  ctime;
};
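NFS4 metadata
NFS4 metadata starts with a bitmap that describes which attributes are present, followed by the attribute values themselves. The size of fattr4 therefore depends on what the bitmap contains.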
struct fattr4 {
    bitmap4   attrmask;
    attrlist4 attr_vals;
};
/* Mandatory Attributes */
#define FATTR4_WORD0_SUPPORTED_ATTRS (1UL << 0)
#define FATTR4_WORD0_TYPE (1UL << 1)
#define FATTR4_WORD0_FH_EXPIRE_TYPE (1UL << 2)
#define FATTR4_WORD0_CHANGE (1UL << 3)
#define FATTR4_WORD0_SIZE (1UL << 4)
...
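To make the "bitmap first, then values" layout concrete, here is a minimal user-space sketch, not kernel code. It assumes the values are packed in ascending bit order as big-endian XDR integers, with type taking 4 bytes and change and size taking 8 bytes each, and it handles only those three bits. struct demo_attrs, get_be32, get_be64 and demo_decode_word0 are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions copied from the definitions above. */
#define FATTR4_WORD0_TYPE   (1UL << 1)
#define FATTR4_WORD0_CHANGE (1UL << 3)
#define FATTR4_WORD0_SIZE   (1UL << 4)

/* Hypothetical holder for the decoded values; not a kernel type. */
struct demo_attrs {
    uint32_t present;  /* which of the three bits were decoded */
    uint32_t type;
    uint64_t change;
    uint64_t size;
};

static uint32_t get_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint64_t get_be64(const uint8_t *p)
{
    return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Walk word 0 of the bitmap in ascending bit order and consume one value
 * per set bit. Returns the number of bytes consumed, or -1 if the buffer
 * is too short or a bit this sketch does not understand is set.
 */
static long demo_decode_word0(uint32_t word0, const uint8_t *buf, size_t len,
                              struct demo_attrs *out)
{
    const uint32_t known = (uint32_t)(FATTR4_WORD0_TYPE |
                                      FATTR4_WORD0_CHANGE |
                                      FATTR4_WORD0_SIZE);
    size_t off = 0;

    assert(buf != NULL && out != NULL);
    if (word0 & ~known)
        return -1;
    out->present = 0;

    if (word0 & FATTR4_WORD0_TYPE) {
        if (len - off < 4)
            return -1;
        out->type = get_be32(buf + off);
        off += 4;
        out->present |= (uint32_t)FATTR4_WORD0_TYPE;
    }
    if (word0 & FATTR4_WORD0_CHANGE) {
        if (len - off < 8)
            return -1;
        out->change = get_be64(buf + off);
        off += 8;
        out->present |= (uint32_t)FATTR4_WORD0_CHANGE;
    }
    if (word0 & FATTR4_WORD0_SIZE) {
        if (len - off < 8)
            return -1;
        out->size = get_be64(buf + off);
        off += 8;
        out->present |= (uint32_t)FATTR4_WORD0_SIZE;
    }
    return (long)off;
}
```

The number of bytes consumed depends entirely on which bits are set, which is why fattr4 has no fixed size.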
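Attribute decoding in the Linux kernel
In NFS3 the attributes arrive in full, as one unit. In NFS4 the attributes are a subset. So for NFS3 the NFS_ATTR_FATTR_CHANGE bit is always set, whereas for NFS4 it has to be detected from the bitmap.
NFS3 decoding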
static int decode_fattr3(struct xdr_stream *xdr, struct nfs_fattr *fattr)
{
    ...
    fattr->valid |= NFS_ATTR_FATTR_V3; /* sets NFS_ATTR_FATTR_CHANGE and the other attribute bits */
    ...
}
NFS4 decoding
decode_getfattr => decode_attr_change
static int decode_attr_change(struct xdr_stream *xdr, uint32_t *bitmap, uint64_t *change)
{
    ...
    if (likely(bitmap[0] & FATTR4_WORD0_CHANGE)) {
        p = xdr_inline_decode(xdr, 8);
        if (unlikely(!p))
            goto out_overflow;
        xdr_decode_hyper(p, change);
        bitmap[0] &= ~FATTR4_WORD0_CHANGE;
        ret = NFS_ATTR_FATTR_CHANGE;
    }
    ...
}
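NFS3 WCC (Weak Cache Consistency)
Before reading a file, the NFS client sends a GETATTR request and checks whether ctime has changed. A new ctime means the cached data is stale; otherwise the client keeps trusting its data cache. Every write, however, changes ctime, so a client relying on this check alone would discard its cached data after each of its own writes, which is wasteful. NFS3 addresses this by having the write RPC return two sets of attributes: before (ctime, size) and after (ctime, size). If the ctime and size in before match the ctime and size the client has cached, the client treats its cache as still in sync with the server and keeps trusting it; otherwise it invalidates the cache. Judging cache validity from ctime alone is not a very strict check.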
struct wcc_data {
    pre_op_attr  before;
    post_op_attr after;
};
struct wcc_attr {
    size3    size;
    nfstime3 mtime;
    nfstime3 ctime;
};
Both before and after are optional. before carries a `wcc_attr` structure and after carries a `fattr3` structure.
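The comparison described above can be sketched in a few lines of user-space C. This is only an illustration of the idea, not the kernel's logic: struct demo_time, struct demo_wcc_attr, struct demo_cache and demo_apply_wcc are hypothetical names, and treating a missing before as "cannot prove anything, so invalidate" is a conservative assumption made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-ins for nfstime3 and wcc_attr. */
struct demo_time {
    int64_t  sec;
    uint32_t nsec;
};

struct demo_wcc_attr {
    uint64_t         size;
    struct demo_time mtime;
    struct demo_time ctime;
};

/* What the client remembers about the file. */
struct demo_cache {
    uint64_t         size;
    struct demo_time ctime;
    bool             data_valid;
};

static bool demo_time_eq(struct demo_time a, struct demo_time b)
{
    return a.sec == b.sec && a.nsec == b.nsec;
}

/*
 * Apply the result of a write. 'before' may be NULL because the
 * pre-operation attributes are optional. The cached data stays valid
 * only if 'before' matches what the client had cached, i.e. nobody else
 * modified the file since the client last saw its attributes.
 */
static void demo_apply_wcc(struct demo_cache *c,
                           const struct demo_wcc_attr *before,
                           uint64_t after_size, struct demo_time after_ctime)
{
    assert(c != NULL);
    if (before == NULL ||
        before->size != c->size ||
        !demo_time_eq(before->ctime, c->ctime))
        c->data_valid = false;  /* an intervening change is possible */

    /* Either way, remember the post-operation attributes. */
    c->size = after_size;
    c->ctime = after_ctime;
}
```

When before matches, the client moves its cached ctime and size forward to the after values, so its own write does not look like a foreign change on the next check.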
The NFS3 protocol specification describes WCC in detail as follows:
When a client performs an operation that modifies the state of a
file or directory on the server, it cannot immediately determine
from the post-operation attributes whether the operation just
performed was the only operation on the object since the last
time the client received the attributes for the object. This is
important, since if an intervening operation has changed the
object, the client will need to invalidate any cached data for
the object (except for the data that it just wrote).
To deal with this, the notion of weak cache consistency data or
wcc_data is introduced. A wcc_data structure consists of certain
key fields from the object attributes before the operation,
together with the object attributes after the operation. This
information allows the client to manage its cache more
accurately than in NFS version 2 protocol implementations. The
term, weak cache consistency, emphasizes the fact that this
mechanism does not provide the strict server-client consistency
that a cache consistency protocol would provide.
In order to support the weak cache consistency model, the server
will need to be able to get the pre-operation attributes of the
object, perform the intended modify operation, and then get the
post-operation attributes atomically. If there is a window for
the object to get modified between the operation and either of
the get attributes operations, then the client will not be able
to determine whether it was the only entity to modify the
object. Some information will have been lost, thus weakening the
weak cache consistency guarantees.
pre_change_attr VS pre_ctime
The NFS3 specification defines a pre-operation ctime (pre_ctime) but no pre_change_attr. pre_change_attr is simply pre_ctime converted to a 64-bit number.
static int decode_wcc_attr(struct xdr_stream *xdr, struct nfs_fattr *fattr)
{
    ...
    fattr->pre_change_attr = nfs_timespec_to_change_attr(&fattr->pre_ctime);
    ...
}
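The body of nfs_timespec_to_change_attr is not shown here. One way such a conversion can work is to put the seconds in the high bits and the nanoseconds in the low 30 bits, which is my recollection of the kernel helper but should be treated as an assumption. The user-space sketch below uses hypothetical names (struct demo_timespec, demo_timespec_to_change_attr).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's timespec. */
struct demo_timespec {
    int64_t tv_sec;
    long    tv_nsec;
};

/*
 * Pack a timestamp into one 64-bit number. Nanoseconds are below 10^9,
 * which is less than 2^30, so they fit in the low 30 bits and a later
 * timestamp always yields a larger value (for non-negative seconds).
 * The shift by 30 is an assumption, not taken from the text above.
 */
static uint64_t demo_timespec_to_change_attr(const struct demo_timespec *ts)
{
    assert(ts != NULL);
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    return ((uint64_t)ts->tv_sec << 30) + (uint64_t)ts->tv_nsec;
}
```

With a mapping like this, comparing two change attributes for equality is the same as comparing the two ctime values, which is all the cache check needs.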
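Recovering the before field of wcc_data
Packet captures show that the wcc_data returned when a write completes often contains only the after part and no before part. As the code below shows, when the before part is missing, the client substitutes the value it already holds.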
int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr)
{
    ...
    if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 &&
            (fattr->valid & NFS_ATTR_FATTR_PRECHANGE) == 0) {
        fattr->pre_change_attr = inode->i_version;
        fattr->valid |= NFS_ATTR_FATTR_PRECHANGE;
    }
    ...
}
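The effect of this substitution is easier to see in isolation. The sketch below is user-space C with hypothetical names (DEMO_ATTR_*, struct demo_inode, struct demo_fattr, demo_force_prechange, demo_wcc_update). demo_force_prechange mirrors the fallback above; demo_wcc_update is my simplified reading of a WCC check, assumed for illustration and not copied from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel flags and structures. */
#define DEMO_ATTR_CHANGE    (1u << 0)
#define DEMO_ATTR_PRECHANGE (1u << 1)

struct demo_inode {
    uint64_t i_version;
};

struct demo_fattr {
    unsigned int valid;
    uint64_t     pre_change_attr;
    uint64_t     change_attr;
};

/* Fallback: if the server sent no "before", borrow the cached version. */
static void demo_force_prechange(const struct demo_inode *inode,
                                 struct demo_fattr *fattr)
{
    assert(inode != NULL && fattr != NULL);
    if ((fattr->valid & DEMO_ATTR_CHANGE) != 0 &&
        (fattr->valid & DEMO_ATTR_PRECHANGE) == 0) {
        fattr->pre_change_attr = inode->i_version;
        fattr->valid |= DEMO_ATTR_PRECHANGE;
    }
}

/*
 * Assumed WCC step: adopt the post-operation version only when the
 * pre-operation version matches the cached one. Returns true if the
 * cache was brought forward and can still be trusted.
 */
static bool demo_wcc_update(struct demo_inode *inode,
                            const struct demo_fattr *fattr)
{
    const unsigned int both = DEMO_ATTR_CHANGE | DEMO_ATTR_PRECHANGE;

    assert(inode != NULL && fattr != NULL);
    if ((fattr->valid & both) != both)
        return false;
    if (fattr->pre_change_attr != inode->i_version)
        return false;
    inode->i_version = fattr->change_attr;
    return true;
}
```

Because the substituted before value is the cached value itself, the match succeeds by construction. In other words, the client is assuming that it was the only writer, which the server never confirmed.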
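NFS3 data cache invalidation
As noted earlier, change_attr is the ctime converted to a 64-bit integer. Once this attribute changes, the cached attributes, access results, ACL, and file data are all considered invalid.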
static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
{
    ...
    /* More cache consistency checks */
    if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
        if (inode->i_version != fattr->change_attr) {
            dprintk("NFS: change_attr change on server for file %s/%ld\n",
                    inode->i_sb->s_id, inode->i_ino);
            invalid |= NFS_INO_INVALID_ATTR
                | NFS_INO_INVALID_DATA
                | NFS_INO_INVALID_ACCESS
                | NFS_INO_INVALID_ACL
                | NFS_INO_REVAL_PAGECACHE;
            if (S_ISDIR(inode->i_mode))
                nfs_force_lookup_revalidate(inode);
            inode->i_version = fattr->change_attr;
        }
    }
    ...
}
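Stripped of kernel details, the check above is a version comparison that raises a set of invalidation flags. A minimal user-space sketch follows, with hypothetical names (DEMO_INVALID_*, struct demo_inode, demo_update_version) and with the directory and page-cache revalidation handling left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags standing in for the NFS_INO_INVALID_* bits. */
#define DEMO_INVALID_ATTR   (1u << 0)
#define DEMO_INVALID_DATA   (1u << 1)
#define DEMO_INVALID_ACCESS (1u << 2)
#define DEMO_INVALID_ACL    (1u << 3)

struct demo_inode {
    uint64_t     i_version;       /* last change attribute seen */
    unsigned int cache_validity;  /* accumulated invalidation flags */
};

/*
 * Compare the cached version with the one just received. If they differ,
 * mark every cache that depends on the file's state as invalid and
 * remember the new version. Returns the flags raised by this call.
 */
static unsigned int demo_update_version(struct demo_inode *inode,
                                        bool has_change, uint64_t change_attr)
{
    unsigned int invalid = 0;

    assert(inode != NULL);
    if (has_change && inode->i_version != change_attr) {
        invalid = DEMO_INVALID_ATTR | DEMO_INVALID_DATA |
                  DEMO_INVALID_ACCESS | DEMO_INVALID_ACL;
        inode->i_version = change_attr;
    }
    inode->cache_validity |= invalid;
    return invalid;
}
```

If i_version has already been advanced to the post-write value by the time this check runs, the comparison finds no difference and nothing is invalidated, so the client's own write does not flush its cache.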