Linux011 SRA Toolkit: Installation and Usage

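The SRA (Sequence Read Archive) database stores raw next-generation sequencing data from platforms including 454, Illumina, SOLiD, Ion Torrent, Helicos and Complete Genomics. Besides raw sequence data, SRA now also stores alignments of the raw reads against reference genomes.
Based on how the data are generated, SRA organizes its records into four categories: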

Studies -- research projects
Experiments -- experimental design
Runs -- sets of sequencing results
Samples -- sample information
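Each category has its own accession prefix: SRP for studies, SRX for experiments, SRR for runs and SRS for samples, with ENA and DDBJ using the same scheme with E or D in place of S. The function below is a small sketch of that mapping; `sra_level` is a hypothetical helper name, and the prefix table is the only logic it encodes.

```shell
# Hypothetical helper: report which SRA record level an accession belongs to.
# The first letter names the archive (S = NCBI SRA, E = ENA, D = DDBJ);
# the next two letters give the level.
sra_level() {
    case "$1" in
        [SED]RP[0-9]*) echo "study" ;;
        [SED]RX[0-9]*) echo "experiment" ;;
        [SED]RR[0-9]*) echo "run" ;;
        [SED]RS[0-9]*) echo "sample" ;;
        *) echo "unknown" ;;
    esac
}

sra_level SRR11097713   # prints: run
```

prefetch and fastq-dump work on runs, so the accessions passed to them in this post are SRR numbers.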
SRA Toolkit is a set of tools for downloading .sra files from the NCBI database and converting them into .fastq.gz files.

Go to the NCBI website and select the SRA database.


Find the SRA Toolkit download page.


Copy the download link.


Download it on Linux with wget:

wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.10.8/sratoolkit.2.10.8-ubuntu64.tar.gz
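To fetch a different release, the URL can be built from a version variable. This is a sketch: `sratoolkit_url` is a hypothetical helper, the pattern is copied from the 2.10.8 link above, and it is only an assumption that other releases use the same layout, so check the download page before relying on it.

```shell
# Hypothetical helper: rebuild the download URL from a version and a platform
# tag. Pattern taken from the 2.10.8 link; other releases are assumed to match.
sratoolkit_url() {
    ver="$1"
    platform="${2:-ubuntu64}"
    echo "https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/${ver}/sratoolkit.${ver}-${platform}.tar.gz"
}

SRA_VERSION=2.10.8
url="$(sratoolkit_url "$SRA_VERSION")"
echo "$url"
# wget "$url"   # uncomment to actually download
```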

Move the file to a directory of your choice, for example /home/sratoolkit:

mkdir -p /home/sratoolkit
mv sratoolkit.2.10.8-ubuntu64.tar.gz /home/sratoolkit

Extract the archive:

cd /home/sratoolkit
tar xzvf sratoolkit.2.10.8-ubuntu64.tar.gz

Add the toolkit's bin directory to PATH in .bashrc, reload it, and check that fastq-dump runs:

echo "export PATH=\$PATH:/home/sratoolkit/sratoolkit.2.10.8-ubuntu64/bin" >> ~/.bashrc
source ~/.bashrc
fastq-dump -h
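Running the echo line above more than once appends the same export line each time. A guarded version, sketched below, appends it only when the file does not already contain that exact line. `add_to_path_once` and the `BASHRC` override variable are hypothetical names introduced here.

```shell
# Hypothetical helper: add a directory to PATH in ~/.bashrc only once.
# BASHRC can point at another file (handy for testing); default is ~/.bashrc.
add_to_path_once() {
    rc="${BASHRC:-$HOME/.bashrc}"
    # Same text the echo command above writes; \$ keeps $PATH unexpanded.
    line="export PATH=\$PATH:$1"
    touch "$rc"
    # -x: match whole lines, -F: fixed string, -q: no output
    if ! grep -qxF "$line" "$rc"; then
        printf '%s\n' "$line" >> "$rc"
    fi
}
```

Usage: `add_to_path_once /home/sratoolkit/sratoolkit.2.10.8-ubuntu64/bin`, then `source ~/.bashrc`.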

If fastq-dump -h reports an error at this step, do what the error message says and run vdb-config --interactive to configure the toolkit.
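vdb-config saves its settings in a file under your home directory, so a quick way to tell whether the configuration step has been done is to look for that file. The check below is a sketch that assumes the settings file is ~/.ncbi/user-settings.mkfg (verify the location on your own installation); `sra_is_configured` is a hypothetical helper name.

```shell
# Sketch: has vdb-config been run for this user?
# Assumption: the settings file lives at ~/.ncbi/user-settings.mkfg.
sra_is_configured() {
    # -s: file exists and is not empty
    [ -s "$HOME/.ncbi/user-settings.mkfg" ]
}

if sra_is_configured; then
    echo "SRA Toolkit settings found"
else
    echo "No settings found: run vdb-config --interactive first"
fi
```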


Using SRA Toolkit

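Searching SRA: taking BRCA as an example, a search of the NCBI SRA database returns a large amount of sequencing data. Papers also usually give the SRA accession numbers of their sequencing data, and you can search for those numbers directly.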
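When a paper lists its data in running text, the run accessions can be pulled out with a regular expression. `extract_runs` is a hypothetical helper, and the pattern assumes run accessions are SRR, ERR or DRR followed by at least six digits.

```shell
# Hypothetical helper: print each run accession found on stdin once.
# Assumed pattern: SRR/ERR/DRR followed by six or more digits.
extract_runs() {
    grep -oE '[SED]RR[0-9]{6,}' | sort -u
}

echo "Reads were deposited under SRR11097713 (see also SRR11097713)." | extract_runs
# prints: SRR11097713
```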


Download files with prefetch, for example: `prefetch SRR11097713`
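To download many runs, the same `prefetch <accession>` call can be repeated for each line of a list file. `prefetch_list` and the list layout (one accession per line, blank lines and lines starting with # ignored) are hypothetical; the function only calls prefetch with a single accession, exactly as in the example above.

```shell
# Hypothetical helper: run `prefetch <accession>` for every line of a list file.
# Blank lines and lines starting with # are skipped.
prefetch_list() {
    # "|| [ -n ... ]" also handles a last line without a trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        prefetch "$acc" || echo "prefetch failed for $acc" >&2
    done < "$1"
}
```

Usage: `prefetch_list runs.txt`, where runs.txt is your own list of accessions.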

prefetch Usage:
prefetch [options] <path/SRA file | path/kart file> [<path/file> ...]
prefetch [options] <SRA accession>
prefetch [options] --list <kart_file>
Frequently Used Options:
General:
-h | --help Displays ALL options, general usage, and version information.
-V | --version Display the version of the program.
Data transfer:
-f | --force <value> Force object download. One of: no, yes, all. no [default]: Skip download if the object is found and complete; yes: Download it even if it is found and is complete; all: Ignore lock files (stale locks or if it is currently being downloaded: use at your own risk!).
--transport <value> Value one of: ascp (only), http (only), both (first try ascp, fallback to http). Default: both.
-l | --list List the contents of a kart file.
-s | --list-sizes List the content of kart file with target file sizes.
-N | --min-size <size> Minimum file size to download in KB (inclusive).
-X | --max-size <size> Maximum file size to download in KB (exclusive). Default: 20G.
-o | --order <value> Kart prefetch order. One of: kart (in kart order), size (by file size: smallest first). default: size.
-a | --ascp-path <ascp-binary|private-key-file> Path to ascp program and private key file (asperaweb_id_dsa.openssh).
-p | --progress <value> Time period in minutes to display download progress (0: no progress). Default: 1.
--option-file <file> Read more options and parameters from the file.
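The usage text gives -N/--min-size and -X/--max-size in KB, with a default upper limit of 20G, so runs larger than the limit are not downloaded unless it is raised. The helper below converts gigabytes to KB for those options; `gb_to_kb` is a hypothetical name, and the factor 1 GB = 1024 * 1024 KB is an assumption about how prefetch counts.

```shell
# Hypothetical helper: convert gigabytes to the KB values that
# prefetch's -N/--min-size and -X/--max-size expect (per the usage text).
# Assumption: 1 GB = 1024 * 1024 KB.
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

gb_to_kb 50   # prints: 52428800
# prefetch --max-size "$(gb_to_kb 50)" SRR11097713
```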

fastq-dump

Convert SRA to FASTQ: fastq-dump SRR11097713
Convert SRA to FASTA, wrapping lines at 50 bases: fastq-dump --fasta 50 SRR11097713
Write the reads of paired-end data to separate files: fastq-dump --split-files SRR11097713
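After `fastq-dump --split-files --gzip SRR11097713`, a paired-end run gives SRR11097713_1.fastq.gz and SRR11097713_2.fastq.gz (the read-number suffix described under --split-files). Each FASTQ record takes four lines, so the read count is the line count divided by four, and both mates should have the same count. `count_reads` and `check_pairs` are hypothetical helpers sketching that sanity check.

```shell
# Hypothetical helpers: count reads in a gzipped FASTQ file (4 lines per read)
# and check that the _1 and _2 files of a paired-end run hold the same number.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

check_pairs() {
    acc="$1"
    dir="${2:-.}"
    r1=$(count_reads "$dir/${acc}_1.fastq.gz")
    r2=$(count_reads "$dir/${acc}_2.fastq.gz")
    echo "$acc: $r1 / $r2 reads"
    [ "$r1" -eq "$r2" ]   # exit status 0 only when the counts match
}

# fastq-dump --split-files --gzip SRR11097713
# check_pairs SRR11097713 && echo "pair counts match"
```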

fastq-dump Usage:
fastq-dump [options] <path/file> [<path/file> ...]
fastq-dump [options] <accession>
Frequently Used Options:
General:
-h | --help Displays ALL options, general usage, and version information.
-V | --version Display the version of the program.
Data formatting:
--split-files Dump each read into separate file. Files will receive suffix corresponding to read number.
--split-spot Split spots into individual reads.
--fasta <[line width]> FASTA only, no qualities. Optional line wrap width (set to zero for no wrapping).
-I | --readids Append read id after spot id as 'accession.spot.readid' on defline.
-F | --origfmt Defline contains only original sequence name.
-C | --dumpcs <[cskey]> Formats sequence using color space (default for SOLiD). "cskey" may be specified for translation.
-B | --dumpbase Formats sequence using base space (default for other than SOLiD).
-Q | --offset <integer> Offset to use for ASCII quality scores. Default is 33 ("!").
Filtering:
-N | --minSpotId <rowid> Minimum spot id to be dumped. Use with "X" to dump a range.
-X | --maxSpotId <rowid> Maximum spot id to be dumped. Use with "N" to dump a range.
-M | --minReadLen <len> Filter by sequence length >= <len>
--skip-technical Dump only biological reads.
--aligned Dump only aligned sequences. Aligned datasets only; see sra-stat.
--unaligned Dump only unaligned sequences. Will dump all for unaligned datasets.
Workflow and piping:
-O | --outdir <path> Output directory, default is current working directory ('.').
-Z | --stdout Output to stdout, all split data become joined into single stream.
--gzip Compress output using gzip.
--bzip2 Compress output using bzip2.
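-Q/--offset sets the ASCII offset of the quality line (default 33, the "!" character), so each quality character decodes as its ASCII code minus the offset. A minimal sketch of that decoding follows; `phred_scores` is a hypothetical helper, and its optional second argument lets you try another offset such as 64.

```shell
# Hypothetical helper: decode a FASTQ quality string into Phred scores,
# score = ASCII code - offset (default offset 33, as in fastq-dump -Q).
phred_scores() {
    qual="$1"
    offset="${2:-33}"
    out=""
    while [ -n "$qual" ]; do
        rest="${qual#?}"                 # string without its first character
        ch="${qual%"$rest"}"             # the first character itself
        out="$out $(( $(printf '%d' "'$ch") - offset ))"
        qual="$rest"
    done
    echo "${out# }"
}

phred_scores 'II5!'   # prints: 40 40 20 0
```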
