UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTables
refGene annotation downloads for mm9, mm10, hg19 and hg38:
wget -c -O mm9.refGene.txt.gz http://hgdownload.soe.ucsc.edu/goldenPath/mm9/database/refGene.txt.gz
wget -c -O mm10.refGene.txt.gz http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz
wget -c -O hg19.refGene.txt.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/refGene.txt.gz
wget -c -O hg38.refGene.txt.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz
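The four downloads above differ only in the assembly name, so they can be wrapped in a small loop. This is a minimal sketch, not part of the original commands: the function names refgene_url and fetch_refgene and the DRY_RUN variable are made up for illustration, and the URL pattern is simply copied from the wget lines above.

```shell
# Build the refGene download URL for one assembly (hypothetical helper).
refgene_url() {
  printf 'http://hgdownload.soe.ucsc.edu/goldenPath/%s/database/refGene.txt.gz\n' "$1"
}

# Download refGene for every assembly given as an argument.
# With DRY_RUN=1 the wget commands are only printed, not executed.
fetch_refgene() {
  for build in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "wget -c -O ${build}.refGene.txt.gz $(refgene_url "$build")"
    else
      wget -c -O "${build}.refGene.txt.gz" "$(refgene_url "$build")"
    fi
  done
}
```

Usage: running `fetch_refgene mm9 mm10 hg19 hg38` downloads all four files into the current directory. Prefixing the call with `DRY_RUN=1` prints the commands first, so they can be checked before anything is fetched.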
Taking hg38 as an example: open a terminal, change to the directory where the file should be saved, and run
$ wget -c -O hg38.refGene.txt.gz http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/refGene.txt.gz
This produces hg38.refGene.txt.gz.
Decompress it:
$ gunzip hg38.refGene.txt.gz
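Note that plain gunzip replaces the .gz file with the decompressed one. If the archive should be kept, for example to avoid downloading it again, the sketch below is one option. The function name decompress_keep is hypothetical. It checks the archive with gzip -t and then writes the decompressed data to a new file with gunzip -c.

```shell
# Decompress FILE.gz to FILE while leaving FILE.gz in place.
# gzip -t verifies the archive first; gunzip -c writes to stdout.
decompress_keep() {
  gzip -t "$1" && gunzip -c "$1" > "${1%.gz}"
}
```

Usage: `decompress_keep hg38.refGene.txt.gz` leaves both hg38.refGene.txt.gz and hg38.refGene.txt in the directory.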
View the contents (-S keeps each long row on a single line instead of wrapping it):
$ less -S hg38.refGene.txt
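The refGene.txt file has no header row, so it is not obvious in less which column is which. The sketch below prints the first record with one field per line, labelled with the column names from the UCSC refGene table schema. It assumes the standard 16-column layout; the function name describe_first_row is made up for illustration.

```shell
# Print the first row of a refGene-style file as "index  column-name  value".
# Column names follow the UCSC refGene schema (assumed 16 tab-separated fields).
describe_first_row() {
  awk -F'\t' 'NR == 1 {
    n = split("bin name chrom strand txStart txEnd cdsStart cdsEnd exonCount exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames", names, " ")
    for (i = 1; i <= NF; i++)
      printf "%2d  %-12s %s\n", i, (i <= n ? names[i] : "?"), $i
    exit
  }' "$1"
}
```

Usage: `describe_first_row hg38.refGene.txt` lists the fields of the first transcript. The RefSeq accession appears as name (column 2) and the gene symbol as name2 (column 13).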
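Extract the column that holds the gene ID and remove duplicates:
# Example only: refGene.list.txt stands for a file with the gene ID in column 7. In the UCSC refGene.txt table itself, column 7 is cdsStart and the gene symbol (name2) is column 13.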
$ awk '{print $7}' refGene.list.txt | sort -u > uniqu.refGene.list.txt
Count the lines (one unique gene ID per line):
$ wc -l uniqu.refGene.list.txt
28006 uniqu.refGene.list.txt
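The awk command above runs on an example file. To apply the same idea directly to the downloaded table, the sketch below may help. In the UCSC refGene schema the gene symbol is the name2 field, column 13 of the tab-separated file, so that is used as the default. The function name unique_column and the output file name in the usage note are hypothetical.

```shell
# Print the unique values of one tab-separated column of a file.
# Default column is 13, which is name2 (gene symbol) in refGene.txt.
unique_column() {
  file=$1
  col=${2:-13}
  cut -f "$col" "$file" | LC_ALL=C sort -u
}
```

Usage: `unique_column hg38.refGene.txt > hg38.refGene.symbols.txt` followed by `wc -l hg38.refGene.symbols.txt` gives the number of distinct gene symbols. The count depends on the version of the table that was downloaded.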