1. Version correspondence
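NCBI/GRC assemblies include NCBI36, GRCh37, and GRCh38; UCSC assemblies include hg18, hg19, and hg38; ENSEMBL publishes numbered releases. They correspond as follows:
NCBI36 (hg18; sometimes written GRCh36): ENSEMBL release_52.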
GRCh37 (hg19): ENSEMBL release_59/61/64/68/69/75.
GRCh38 (hg38): ENSEMBL release_76/77/78/80/81/82.
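The mapping above can be kept in a small helper so that scripts do not hard-code both naming schemes. This is only a sketch: the function name `grc_to_ucsc` is made up for illustration, and it covers only the three assemblies listed here.

```shell
# Hypothetical helper: translate an NCBI/GRC assembly name into the UCSC name.
# Only the three assemblies listed in this post are covered.
grc_to_ucsc() {
    case "$1" in
        NCBI36|GRCh36) echo "hg18" ;;   # both spellings are accepted
        GRCh37)        echo "hg19" ;;
        GRCh38)        echo "hg38" ;;
        *) echo "unknown assembly: $1" >&2; return 1 ;;
    esac
}
```

For example, `grc_to_ucsc GRCh37` prints `hg19`, and an unrecognized name returns a non-zero status.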
2. Downloading the reference genome
Genome FASTA files for every version can be downloaded from the Illumina iGenomes page: https://support.illumina.com.cn/sequencing/sequencing_software/igenome.html?langsel=/cn/
UCSC download address:
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/
To download the genome one chromosome at a time, a script like this can be used:
# Download each hg19 chromosome (1-22, X, Y, M) from UCSC
for i in $(seq 1 22) X Y M; do
    echo "$i"
    wget "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes/chr${i}.fa.gz"
done
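After the per-chromosome files are downloaded, most tools want a single FASTA file. A minimal sketch of one way to merge them, assuming the `chr*.fa.gz` files sit in one directory; the function name `merge_chromosomes` and the output file name are illustrative, not part of any UCSC tooling.

```shell
# Hypothetical helper: concatenate chr1..chr22, chrX, chrY, chrM into one FASTA,
# in that fixed order. Stops with an error if any chromosome file is missing.
merge_chromosomes() {
    local src_dir="$1"
    local out_fa="$2"
    local i f
    : > "$out_fa"                          # create or empty the output file
    for i in $(seq 1 22) X Y M; do
        f="${src_dir}/chr${i}.fa.gz"
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
        gzip -dc "$f" >> "$out_fa"         # decompress and append
    done
}

# Example call (assumes the downloads are in the current directory):
# merge_chromosomes . hg19.fa
```

Looping over an explicit list rather than a `chr*` glob keeps the chromosomes in numeric order (a glob would sort chr10 before chr2).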
NCBI download address: ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/BUILD.37.3/
3. Downloading GTF annotation files
NCBI:
ftp://ftp.ncbi.nlm.nih.gov/genomes/Homo_sapiens/ARCHIVE/
ENSEMBL: ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz
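The ENSEMBL address follows a regular pattern, so the URL for another release can be built from the release number and assembly name. The sketch below assumes that the directory is named `release-<N>` (with a hyphen) and that the file name pattern `Homo_sapiens.<assembly>.<N>.gtf.gz` holds for the release you want; the function name `ensembl_gtf_url` is made up for illustration. Check the FTP listing before relying on it.

```shell
# Hypothetical helper: build the ENSEMBL human GTF URL for a release/assembly pair.
# Assumes the release-<N>/gtf/homo_sapiens/ layout seen for release 75.
ensembl_gtf_url() {
    local release="$1"
    local assembly="$2"
    echo "ftp://ftp.ensembl.org/pub/release-${release}/gtf/homo_sapiens/Homo_sapiens.${assembly}.${release}.gtf.gz"
}

# Example call (needs network access, so it is left commented out):
# wget "$(ensembl_gtf_url 75 GRCh37)"
```

The release and assembly must match the correspondence in section 1, for example release 75 with GRCh37.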
For UCSC, you have to select a series of parameters yourself in the Table Browser:
http://genome.ucsc.edu/cgi-bin/hgTables
clade: Mammal
genome: Human
assembly: Feb. 2009 (GRCh37/hg19)
group: Genes and Gene Predictions
track: UCSC Genes
table: knownGene
region: Select "genome" for the entire genome.
output format: GTF - gene transfer format
output file: hg19_ucsc.gtf
Click 'get output'.
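Once the file is saved, a quick sanity check helps confirm that the export really is GTF. The sketch below counts records per feature type (column 3), assuming standard 9-column, tab-separated GTF lines; the function name `gtf_feature_counts` is illustrative.

```shell
# Hypothetical helper: count GTF records per feature type (third column).
# Comment lines and lines without exactly 9 tab-separated fields are skipped.
gtf_feature_counts() {
    awk -F'\t' '!/^#/ && NF == 9 { n[$3]++ }
                END { for (f in n) print f "\t" n[f] }' "$1" | LC_ALL=C sort
}

# Example call on the file exported above:
# gtf_feature_counts hg19_ucsc.gtf
```

If the output is empty, the download is probably not tab-separated GTF (for example, an HTML error page saved under the `.gtf` name).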