ONT r10芯片甲基化分析

攻略还发在https://github.com/gotouerina/GenomeMethPipeline 这个更全,如果好用可以点个star

需要用GPU,V100或者A100都行, 4090什么的估计不行

下机是FASTA5格式,转成pod5格式储存。

pod5软件应该可以用pip装

pod5 convert fast5 fast5/*.fast5 --output pod5/ --one-to-one ./fast5

然后用dorado提取修饰,这一步要GPU操作
/groups/lzu_public/home/u220220932211/software/dorado-0.7.1-linux-x64/bin/dorado/dorado basecaller /groups/lzu_public/home/u220220932211/software/dna_r10.4.1_e8.2_400bps_hac@v5.0.0/ /groups/lzu_public/home/u220220932211/work/ --modified-bases-models /groups/lzu_public/home/u220220932211/software/dna_r10.4.1_e8.2_400bps_sup@v5.0.0_5mC_5hmC@v1/ > calls.bam

最后用modkit提取甲基化信息

modkit pileup calls.bam out.bed --log-filepath pileup.log

提取结果长这样:


image.png

官方解释:
Definitions:

Nmod - Number of calls passing filters that were classified as a residue with a specified base modification.

Ncanonical - Number of calls passing filters were classified as the canonical base rather than modified. The exact base must be inferred by the modification code. For example, if the modification code is m (5mC) then the canonical base is cytosine. If the modification code is a, the canonical base is adenosine.

Nother mod - Number of calls passing filters that were classified as modified, but where the modification is different from the listed base (and the corresponding canonical base is equal). For example, for a given cytosine there may be 3 reads with h calls, 1 with a canonical call, and 2 with m calls. In the bedMethyl row for h Nother_mod would be 2. In the m row Nother_mod would be 3.

Nvalid_cov - the valid coverage. Nvalid_cov = Nmod + Nother_mod + Ncanonical, also used as the score in the bedMethyl

Ndiff - Number of reads with a base other than the canonical base for this modification. For example, in a row for h the canonical base is cytosine, if there are 2 reads with C->A substitutions, Ndiff will be 2.

Ndelete - Number of reads with a deletion at this reference position

Nfail - Number of calls where the probability of the call was below the threshold. The threshold can be set on the command line or computed from the data (usually failing the lowest 10th percentile of calls).

Nnocall - Number of reads aligned to this reference position, with the correct canonical base, but without a base modification call. This can happen, for example, if the model requires a CpG dinucleotide and the read has a CG->CH substitution such that no modification call was produced by the basecaller.

附图
image.png
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

推荐阅读更多精彩内容