NFS Client Metadata Management

close-to-open cache consistency

NFS3 Metadata

NFS3 metadata has a simple representation: a fixed structure that describes the complete set of attributes.

struct fattr3 {
    ftype3     type;
    mode3      mode;
    uint32     nlink;
    uid3       uid;
    gid3       gid;
    size3      size;
    size3      used;
    specdata3  rdev;
    uint64     fsid;
    fileid3    fileid;
    nfstime3   atime;
    nfstime3   mtime;
    nfstime3   ctime;
};

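NFS4 Metadata

NFS4 metadata starts with a bitmap that describes which attributes are present, followed by the attribute values themselves. The size of fattr4 therefore depends on what the bitmap contains.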

struct fattr4 {
        bitmap4       attrmask;
        attrlist4     attr_vals;
};
/* Mandatory Attributes */
#define FATTR4_WORD0_SUPPORTED_ATTRS    (1UL << 0)
#define FATTR4_WORD0_TYPE               (1UL << 1)
#define FATTR4_WORD0_FH_EXPIRE_TYPE     (1UL << 2)
#define FATTR4_WORD0_CHANGE             (1UL << 3)
#define FATTR4_WORD0_SIZE               (1UL << 4)
...
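
To make the bitmap-then-values layout concrete, here is a minimal userspace sketch that tests bits in word 0 of the bitmap and adds up how many payload bytes the selected attributes occupy. The helper names `fattr4_has_attr` and `fattr4_known_payload_size` are hypothetical and are not kernel functions. The sizes assume the XDR encoding of `type` (a 32-bit enum), `change` (a 64-bit integer) and `size` (a 64-bit integer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions in word 0 of the bitmap, copied from the defines above. */
#define FATTR4_WORD0_TYPE    (1UL << 1)
#define FATTR4_WORD0_CHANGE  (1UL << 3)
#define FATTR4_WORD0_SIZE    (1UL << 4)

/* Hypothetical helper: is attribute bit `mask` set in bitmap word `word`? */
static int fattr4_has_attr(const uint32_t *bitmap, size_t nwords,
                           size_t word, uint32_t mask)
{
    assert(bitmap != NULL);
    if (word >= nwords)   /* a short bitmap means the bit is absent */
        return 0;
    return (bitmap[word] & mask) != 0;
}

/* Hypothetical helper: payload bytes used by the three attributes this
 * sketch knows about. Attributes it does not know about are ignored. */
static size_t fattr4_known_payload_size(const uint32_t *bitmap, size_t nwords)
{
    size_t n = 0;

    if (fattr4_has_attr(bitmap, nwords, 0, FATTR4_WORD0_TYPE))
        n += 4;   /* type: 32-bit enum */
    if (fattr4_has_attr(bitmap, nwords, 0, FATTR4_WORD0_CHANGE))
        n += 8;   /* change: 64-bit integer */
    if (fattr4_has_attr(bitmap, nwords, 0, FATTR4_WORD0_SIZE))
        n += 8;   /* size: 64-bit integer */
    return n;
}
```

A bitmap that requests only `change` and `size` yields 16 payload bytes in this sketch, while one that also requests `type` yields 20, which is why the length of fattr4 cannot be known without reading the bitmap first.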

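Attribute Decoding in the Linux Kernel

For NFS3, the attributes are complete and arrive as a single unit. For NFS4, the attributes are a subset. As a result, the NFS_ATTR_FATTR_CHANGE bit is always set for NFS3, whereas for NFS4 it has to be detected from the bitmap.

NFS3 decoding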

static int decode_fattr3(struct xdr_stream *xdr, struct nfs_fattr *fattr)
{
...
  fattr->valid |= NFS_ATTR_FATTR_V3; // sets NFS_ATTR_FATTR_CHANGE and the other attribute bits
...
}

NFS4 decoding

decode_getfattr => decode_attr_change

static int decode_attr_change(struct xdr_stream *xdr, uint32_t *bitmap, uint64_t *change)
{
...
    if (likely(bitmap[0] & FATTR4_WORD0_CHANGE)) {
        p = xdr_inline_decode(xdr, 8);
        if (unlikely(!p))
            goto out_overflow;
        xdr_decode_hyper(p, change);
        bitmap[0] &= ~FATTR4_WORD0_CHANGE;
        ret = NFS_ATTR_FATTR_CHANGE;
    }
...
}

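NFS3 WCC (Weak Cache Consistency)

Before reading a file, the NFS client sends a GETATTR request and checks whether ctime has changed. A new ctime means the data cache is stale; otherwise the client keeps trusting its data cache. The catch is that every write request changes ctime, so the client's own writes would force it to discard its cached data, which is wasteful.

NFS3 addresses this by having the reply to a write RPC carry two sets of values: before (ctime, size) and after (ctime, size). If the before ctime and size equal the locally cached ctime and size, the client concludes that its cache is in sync with the server and keeps trusting it. Otherwise it invalidates the cache. Judging cache validity by ctime alone is not a very strict check.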

struct wcc_data {
    pre_op_attr    before;
    post_op_attr   after;
};
struct wcc_attr {
    size3       size;
    nfstime3    mtime;
    nfstime3    ctime;
};
Both before and after are optional. before corresponds to the `wcc_attr` structure, and after corresponds to the `fattr3` structure.
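
The comparison described above can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the kernel's implementation: `struct cached_file` and `wcc_apply` are hypothetical names, the post-operation attributes are reduced to size and ctime, and a missing before field is handled conservatively by invalidating the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct nfstime3 { uint32_t seconds; uint32_t nseconds; };

/* Pre-operation attributes, mirroring wcc_attr above. */
struct wcc_attr {
    uint64_t        size;
    struct nfstime3 mtime;
    struct nfstime3 ctime;
};

/* Hypothetical client-side cache state for one file. */
struct cached_file {
    uint64_t        size;
    struct nfstime3 ctime;
    bool            data_valid;
};

static bool nfstime3_equal(const struct nfstime3 *a, const struct nfstime3 *b)
{
    return a->seconds == b->seconds && a->nseconds == b->nseconds;
}

/* Apply the wcc_data from a write reply. `before` may be NULL because the
 * field is optional. Assumption of this sketch: with no before attributes
 * we cannot prove the cache is in sync, so we invalidate it. */
static void wcc_apply(struct cached_file *c, const struct wcc_attr *before,
                      uint64_t after_size, const struct nfstime3 *after_ctime)
{
    assert(c != NULL && after_ctime != NULL);

    if (before == NULL ||
        before->size != c->size ||
        !nfstime3_equal(&before->ctime, &c->ctime))
        c->data_valid = false;   /* someone else changed the file */

    /* Either way, remember the server's latest view of the file. */
    c->size  = after_size;
    c->ctime = *after_ctime;
}
```

When before matches the cached size and ctime, the cache stays valid and simply advances to the after values. Any mismatch marks the data invalid.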

The NFS3 protocol specification describes WCC in detail as follows:

   When a client performs an operation that modifies the state of a
   file or directory on the server, it cannot immediately determine
   from the post-operation attributes whether the operation just
   performed was the only operation on the object since the last
   time the client received the attributes for the object. This is
   important, since if an intervening operation has changed the
   object, the client will need to invalidate any cached data for
   the object (except for the data that it just wrote).

   To deal with this, the notion of weak cache consistency data or
   wcc_data is introduced. A wcc_data structure consists of certain
   key fields from the object attributes before the operation,
   together with the object attributes after the operation. This
   information allows the client to manage its cache more
   accurately than in NFS version 2 protocol implementations. The
   term, weak cache consistency, emphasizes the fact that this
   mechanism does not provide the strict server-client consistency
   that a cache consistency protocol would provide.

   In order to support the weak cache consistency model, the server
   will need to be able to get the pre-operation attributes of the
   object, perform the intended modify operation, and then get the
   post-operation attributes atomically. If there is a window for
   the object to get modified between the operation and either of
   the get attributes operations, then the client will not be able
   to determine whether it was the only entity to modify the
   object. Some information will have been lost, thus weakening the
   weak cache consistency guarantees.

pre_change_attr vs. pre_ctime

The NFS3 documentation defines pre_ctime but not pre_change_attr. In fact, pre_change_attr is simply pre_ctime converted into a 64-bit number.

static int decode_wcc_attr(struct xdr_stream *xdr, struct nfs_fattr *fattr)
{
...
  fattr->pre_change_attr = nfs_timespec_to_change_attr(&fattr->pre_ctime);
...
}
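
For illustration, one way to fold a ctime into a single 64-bit value is to put the seconds in the high bits and the nanoseconds in the low 30 bits. The sketch below is modeled on my recollection of the kernel helper, so treat the exact bit layout as an assumption. `struct ts64` and `ctime_to_change_attr` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's timespec type. */
struct ts64 {
    int64_t tv_sec;
    long    tv_nsec;
};

/* Pack seconds and nanoseconds into one 64-bit number. Nanoseconds are
 * below 10^9, which is less than 2^30, so they fit in the low 30 bits and
 * the result orders the same way as the original timestamps. */
static uint64_t ctime_to_change_attr(const struct ts64 *ts)
{
    assert(ts->tv_nsec >= 0 && ts->tv_nsec < 1000000000L);
    return ((uint64_t)ts->tv_sec << 30) + (uint64_t)ts->tv_nsec;
}
```

Because the mapping preserves ordering and equality, comparing two such values is equivalent to comparing the two ctimes, which lets the NFS3 code path reuse the same change-attribute comparison as NFS4.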

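Recovering the before field of wcc_data

Packet captures show that the wcc_data returned when a write request completes often contains only the after part and no before part. As the code below shows, when the before part is missing, the client substitutes the value it already holds.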

int nfs_post_op_update_inode_force_wcc(struct inode *inode, struct nfs_fattr *fattr)
{
...
    if ((fattr->valid & NFS_ATTR_FATTR_CHANGE) != 0 &&
            (fattr->valid & NFS_ATTR_FATTR_PRECHANGE) == 0) {
        fattr->pre_change_attr = inode->i_version;
        fattr->valid |= NFS_ATTR_FATTR_PRECHANGE;
    }
...
}
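
The effect of this substitution can be seen in a small userspace model. All names here (`ATTR_CHANGE`, `ATTR_PRECHANGE`, `struct reply_attrs`, `struct cached_inode`, and both functions) are hypothetical stand-ins for the kernel's flags and structures. The sketch assumes that a cache is kept exactly when the pre-operation change value equals the cached version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags standing in for NFS_ATTR_FATTR_CHANGE / _PRECHANGE. */
#define ATTR_CHANGE     (1U << 0)
#define ATTR_PRECHANGE  (1U << 1)

struct reply_attrs {
    unsigned valid;            /* which fields below are meaningful */
    uint64_t pre_change_attr;  /* "before" value, may be missing */
    uint64_t change_attr;      /* "after" value */
};

struct cached_inode {
    uint64_t version;          /* last change value the client saw */
};

/* If the reply has an "after" value but no "before" value, pretend the
 * server reported our cached version as the "before" value. */
static void fill_missing_pre_change(const struct cached_inode *inode,
                                    struct reply_attrs *fattr)
{
    assert(inode != NULL && fattr != NULL);
    if ((fattr->valid & ATTR_CHANGE) != 0 &&
        (fattr->valid & ATTR_PRECHANGE) == 0) {
        fattr->pre_change_attr = inode->version;
        fattr->valid |= ATTR_PRECHANGE;
    }
}

/* Returns 1 when the "before" value matches the cache, 0 otherwise. */
static int wcc_cache_still_valid(const struct cached_inode *inode,
                                 const struct reply_attrs *fattr)
{
    return (fattr->valid & ATTR_PRECHANGE) != 0 &&
           fattr->pre_change_attr == inode->version;
}
```

In this model, a reply without a before value always passes the check after the substitution, so the client treats its own write as the only change. A reply that does carry a before value is left untouched and is compared as usual.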

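NFS3 data cache invalidation

As noted earlier, change_attr is the 64-bit integer derived from ctime. Once this attribute changes, the client treats the cached metadata, access information, ACL, and file data as invalid.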

static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
{
...
    /* More cache consistency checks */
    if (fattr->valid & NFS_ATTR_FATTR_CHANGE) {
        if (inode->i_version != fattr->change_attr) {
            dprintk("NFS: change_attr change on server for file %s/%ld\n",
                    inode->i_sb->s_id, inode->i_ino);
            invalid |= NFS_INO_INVALID_ATTR
                | NFS_INO_INVALID_DATA
                | NFS_INO_INVALID_ACCESS
                | NFS_INO_INVALID_ACL
                | NFS_INO_REVAL_PAGECACHE;
            if (S_ISDIR(inode->i_mode))
                nfs_force_lookup_revalidate(inode);
            inode->i_version = fattr->change_attr;
        }
    }
...
}

NFS4 change_info4
