USEARCH fastq_mergepairs command: usage information (reposted)

All of the following information comes from www.drive5.com. I use this page only as a notebook for my own learning, and I have no commercial interest in it. Anyone who reads this document should refer to www.drive5.com.

I ran into some problems when I was trying to merge my read data, so I collected the information shown below.


The fastq_mergepairs command merges (assembles) paired-end reads to create consensus sequences and, optionally, consensus quality scores. This command has many features and options so I recommend spending some time browsing the documentation to get familiar with the capabilities of fastq_mergepairs and issues that arise in read merging.


Basic usage

The simplest way to use fastq_mergepairs is to specify the forward and reverse FASTQ filenames and an output FASTQ filename.

usearch -fastq_mergepairs SampleA_R1.fastq -reverse SampleA_R2.fastq -fastqout merged.fq


Automatic R2 filename

If the -reverse option is omitted, the reverse FASTQ filename is constructed by replacing R1 with R2. The following command line is equivalent to the example above.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq
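To make the R1-to-R2 substitution concrete, here is a minimal Python sketch of how a reverse filename could be derived from a forward one. The notes above only say that R1 is replaced with R2; which occurrence is replaced when a name contains R1 more than once is not stated, so replacing the last occurrence is my assumption, and the function name reverse_filename is hypothetical.

```python
def reverse_filename(forward_name: str) -> str:
    """Guess the reverse-read filename by swapping 'R1' for 'R2'.

    Assumption: only the LAST 'R1' in the name is replaced. The source
    notes do not say which occurrence usearch actually uses.
    """
    head, sep, tail = forward_name.rpartition("R1")
    if not sep:
        raise ValueError(f"no 'R1' found in {forward_name!r}")
    return head + "R2" + tail


print(reverse_filename("SampleA_R1.fastq"))  # SampleA_R2.fastq
```

If your files do not follow an R1/R2 naming scheme, giving -reverse explicitly avoids relying on this substitution.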


Merging multiple FASTQ file pairs in a single command

You can specify two or more FASTQ filenames following -fastq_mergepairs. In the following example, SampleA and SampleB are both merged. The R2 filenames are constructed automatically as explained above, or can be given explicitly using the -reverse option.

usearch -fastq_mergepairs SampleA_R1.fastq SampleB_R1.fastq -fastqout merged.fq

usearch -fastq_mergepairs *_R1*.fastq -fastqout merged.fq

(This is the form I was using when I had 45 reads.)


Adding sample identifiers to read labels

If multiple samples are combined into a single file as shown in some of the above examples, then you lose track of which read came from which sample. This is addressed by adding a sample identifier to each read label. The simplest method is to use the -sample option, e.g.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq -sample SampleA

The string sample=SampleA; will be added at the end of the read label.
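As a small illustration of what the annotation looks like, the Python sketch below appends sample=xxx; to a label. The read label read1 is made up, and the rule that a semicolon separates the original label from the annotation is my assumption; the notes above only say that the string is added at the end.

```python
def add_sample_annotation(label: str, sample: str) -> str:
    """Append a 'sample=<id>;' annotation to a read label.

    Assumption: a ';' separates the original label from the annotation
    if the label does not already end with one.
    """
    if not label.endswith(";"):
        label += ";"
    return f"{label}sample={sample};"


print(add_sample_annotation("read1", "SampleA"))  # read1;sample=SampleA;
```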


Getting the sample identifier from the FASTQ filename

FASTQ filenames are often based on the sample identifier, e.g. SampleA_R1.fastq. If you specify -relabel @ then fastq_mergepairs gets the sample identifier from the FASTQ filename by truncating at the first underscore (_) or period (.). A period and the read number are added after the sample identifier to make the new read label, which replaces the original label. This differs from the -sample option, which adds the sample= annotation at the end of the label. The usearch_global command understands both of these methods for putting sample identifiers into read labels.

usearch -fastq_mergepairs SampleA_R1.fastq -fastqout merged.fq  -relabel @
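The truncation rule described above can be sketched in a few lines of Python. This is only my reading of the rule, not usearch's code: the function names are hypothetical, and stripping any directory part before truncating is an assumption.

```python
import os
import re


def sample_from_filename(path: str) -> str:
    """Truncate the file name at the first underscore or period."""
    name = os.path.basename(path)  # assumption: the directory part is ignored
    return re.split(r"[_.]", name, maxsplit=1)[0]


def relabel(path: str, read_number: int) -> str:
    """Build the new label: sample identifier, a period, the read number."""
    return f"{sample_from_filename(path)}.{read_number}"


print(relabel("SampleA_R1.fastq", 1))  # SampleA.1
```

One practical consequence: a filename such as Soil.3_R1.fastq would give the sample identifier Soil, not Soil.3, so it is worth checking that your filenames truncate to the identifiers you expect.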


Merging multiple files with sample identifiers

By using wildcards and the -relabel @ option you can merge multiple files and add sample identifiers to the read labels, for example:

usearch -fastq_mergepairs *R1*.fastq -fastqout merged.fq  -relabel @
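If you drive usearch from a script rather than typing the wildcard in a shell, the same command line can be assembled as an argument list. The Python sketch below only builds and prints the command and does not run anything. It uses only the options shown above; the helper name build_merge_command and sorting the filenames are my own choices.

```python
import shlex
from typing import List, Sequence


def build_merge_command(forward_files: Sequence[str], out_fastq: str) -> List[str]:
    """Assemble a fastq_mergepairs command for several forward-read files."""
    if not forward_files:
        raise ValueError("at least one forward FASTQ file is required")
    return [
        "usearch",
        "-fastq_mergepairs", *sorted(forward_files),
        "-fastqout", out_fastq,
        "-relabel", "@",
    ]


cmd = build_merge_command(["SampleB_R1.fastq", "SampleA_R1.fastq"], "merged.fq")
print(shlex.join(cmd))
```

In a real script the file list would typically come from glob.glob("*R1*.fastq"), and the list could be handed to subprocess.run(cmd, check=True) on a machine where usearch is installed.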


fastq_mergepairs options

Input files

-fastq_mergepairs  Forward FASTQ filename(s).

-reverse  Reverse FASTQ filename(s). If not given, constructed by replacing R1 with R2.

-interleaved  Forward and reverse reads are interleaved in the same file (sometimes produced by SRA fastq-dump).

Output files

-fastqout  FASTQ filename for merged reads.

-fastaout  FASTA filename for merged reads.

-fastqout_notmerged_fwd  FASTQ filename for forward reads which were not merged.

-fastaout_notmerged_fwd  FASTA filename for forward reads which were not merged.

-fastqout_notmerged_rev  FASTQ filename for reverse reads which were not merged.

-fastaout_notmerged_rev  FASTA filename for reverse reads which were not merged.

Reports

-report  Filename for summary report. See "Reviewing a fastq_mergepairs report" to check for problems.

-tabbedout  Tabbed text file containing detailed information about merging process for each pair including reason for discarding.

-alnout  Human-readable alignments. Useful for trouble-shooting.

Merged read labels

-relabel  Prefix string for output labels. The read number 1, 2, 3... is appended after the prefix.

-relabel @  Relabel using a prefix string constructed from the FASTQ filename; the prefix is interpreted as the sample identifier.

-sample xxx  Append sample identifier to read label using sample=xxx; format. This is an alternative method for adding sample ids.

-fastq_eeout  Add ee=xxx; annotation with the number of expected errors in the merged read.

-label_suffix  Suffix to append to merged read label. Can be used e.g. to add sample=xxx; type of sample identifier annotations.
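The number of expected errors reported by -fastq_eeout is the sum of the per-base error probabilities implied by the quality scores, where a Phred score Q corresponds to an error probability of 10^(-Q/10). A minimal Python sketch of that calculation follows; it assumes the common Phred+33 ASCII encoding of the quality string, and the function name is hypothetical.

```python
def expected_errors(quality: str, ascii_base: int = 33) -> float:
    """Sum of per-base error probabilities, P = 10 ** (-Q / 10).

    Assumption: quality characters use Phred+33 encoding.
    """
    return sum(10 ** (-(ord(ch) - ascii_base) / 10) for ch in quality)


# '5' is ASCII 53, i.e. Q20, so each base contributes 0.01.
print(round(expected_errors("5555"), 6))  # 0.04
```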

Filtering

-fastq_maxdiffs  Maximum number of mismatches in the alignment. Default 5. Consider increasing if you have long overlaps.

-fastq_pctid  Minimum %id of alignment. Default 90. Consider decreasing if you have long overlaps.

-fastq_nostagger  Discard staggered pairs. Default is to trim overhangs (non-biological sequence).

-fastq_minmergelen  Minimum length for the merged sequence. See "Filtering artifacts by setting a merge length range".

-fastq_maxmergelen  Maximum length for the merged sequence.

-fastq_minqual  Discard merged read if any merged Q score is less than the given value. (No minimum by default).

-fastq_minovlen  Discard pair if alignment is shorter than given value. Default 16.
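To see how these thresholds interact, here is a rough Python sketch that applies the defaults listed above to one aligned pair. It is not usearch's implementation: the order of the checks, the reason strings, and computing %id as matches divided by overlap length are all my assumptions.

```python
from typing import Optional


def rejection_reason(diffs: int, overlap_len: int, merged_len: int,
                     max_diffs: int = 5, min_pct_id: float = 90.0,
                     min_ovlen: int = 16,
                     min_mergelen: Optional[int] = None,
                     max_mergelen: Optional[int] = None) -> Optional[str]:
    """Return a (hypothetical) reason for discarding a pair, or None if it passes."""
    if overlap_len <= 0 or overlap_len < min_ovlen:
        return "overlap too short"
    if diffs > max_diffs:
        return "too many diffs"
    pct_id = 100.0 * (overlap_len - diffs) / overlap_len  # assumed definition
    if pct_id < min_pct_id:
        return "pct id too low"
    if min_mergelen is not None and merged_len < min_mergelen:
        return "merged too short"
    if max_mergelen is not None and merged_len > max_mergelen:
        return "merged too long"
    return None


print(rejection_reason(diffs=8, overlap_len=200, merged_len=300))  # too many diffs
print(rejection_reason(diffs=3, overlap_len=20, merged_len=450))   # pct id too low
```

The first call shows why a larger -fastq_maxdiffs is suggested for long overlaps: 8 mismatches in a 200-base overlap is still 96% identity, yet the pair fails the default limit of 5. The second call shows the opposite case, where a short overlap passes the mismatch limit but fails the identity threshold.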

Pre-processing of reads before alignment

-fastq_trunctail  Truncate reads at the first Q score <= this value. Default 2.

-fastq_minlen  Discard pair if either read is shorter than this, after truncating by -fastq_trunctail if applicable. Default 64.
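The two pre-processing steps can be sketched as follows in Python. This is a minimal illustration of the rule as described above, assuming Phred+33 quality encoding; the function names are hypothetical and this is not usearch's code.

```python
from typing import Tuple


def truncate_tail(seq: str, qual: str, trunc_q: int = 2,
                  ascii_base: int = 33) -> Tuple[str, str]:
    """Cut the read at the first position whose Q score is <= trunc_q."""
    for i, ch in enumerate(qual):
        if ord(ch) - ascii_base <= trunc_q:
            return seq[:i], qual[:i]
    return seq, qual  # no low-quality position found, keep the whole read


def pair_long_enough(fwd_seq: str, rev_seq: str, min_len: int = 64) -> bool:
    """Keep the pair only if both (already truncated) reads reach min_len."""
    return len(fwd_seq) >= min_len and len(rev_seq) >= min_len


# '#' is ASCII 35, i.e. Q2 in Phred+33, so the read is cut before that base.
print(truncate_tail("ACGTACGT", "IIII#III"))  # ('ACGT', 'IIII')
```

Because truncation happens before the length check, a read with an early Q2 base can drop below -fastq_minlen and cause the whole pair to be discarded.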

Multi-threading

-threads  Specifies the number of threads. Default 10, or the number of CPU cores, whichever is less.
