新开了jimmy老师开的服务器,下载数据还是跑流程,感觉飞起来了一下,感激万分。
言归正传:下面是新的流程:
文章:两篇:
①、Cancers | Free Full-Text | Targeting Palbociclib-Resistant Estrogen Receptor-Positive Breast Cancer Cells via Oncolytic Virotherapy | HTML
https://www.mdpi.com/2072-6694/11/5/684/htm
②、Genes | Free Full-Text | Transcriptomic Profiling Identifies Differentially Expressed Genes in Palbociclib-Resistant ER+ MCF7 Breast Cancer Cells
https://www.mdpi.com/2073-4425/11/4/467
主要是研究:差异表达的基因和参与对palbociclib耐药性发展的途径(乳腺癌)
流程:
1、下载数据:
conda create -n rnaseq python=2 bwa
source activate rnaseq
#安装软件aspera
wget [http://d3gcli72yxqn2z.cloudfront.net/connect/bin/aspera-connect-3.5.1.92523-linux-64.tar.gz](http://d3gcli72yxqn2z.cloudfront.net/connect/bin/aspera-connect-3.5.1.92523-linux-64.tar.gz)
tar zxf aspera-connect-3.5.1.92523-linux-64.tar.gz
bash aspera-connect-3.5.1.92523-linux-64.sh
echo 'PATH=$PATH:~/.aspera/connect/bin/' >> ~/.bashrc
source ~/.bashrc
ascp --help
#注意下一下,这里用bash。而不是sh。之前用sh会弹出很多错误
############
#用aspera下载数据。比prefetch,应该快不少
cat 'SRR_Acc_List (1).txt'|while read id
do
x=$(echo $id | cut -b1-6)
y=$(echo $id | cut -b10-10)
echo $id
ascp -QT -l 300m -P33001 -i \
~/.aspera/connect/etc/asperaweb_id_dsa.openssh \
[era-fasp@fasp.sra.ebi.ac.uk](mailto:era-fasp@fasp.sra.ebi.ac.uk):/vol1/fastq/$x/00$y/$id/ ./
done
gzip -d SRR*gz
2、质控
#安装软件:
conda install -y sra-tools
conda install -c bioconda multiqc
#质检
ls *fastq|xargs fastqc -t 10
multiqc .
浏览器打开multiqc_report.html。
3、比对:
#软件用hisat2吧
conda install -y hisat2
conda install -samtools
########
#比对
for ((i=23;i<=34;i++));
do
hisat2 -p 6 -x /home/data/server/reference/index/hisat/hg38/genome
-U /home/data/gmb29/data/chip_seq/SRR89845${i}.fastq
-S SRR89845${i}.sam ;
done
4、sam转bam,bam排序,计数gtf
#sam转bam
for ((i=23;i<=34;i++));
do
samtools view -@ 6 -bS -h SRR89845${i}.sam > SRR89845${i}.bam ;
done
###################
#bam排序
for ((i=23;i<=34;i++));
do
samtools sort -@ 6 SRR89845${i}.bam -o SRR89845${i}.sort ;
done
#计数gft
featureCounts -T 6 -t \
-t exon -g gene_id \
-a /home/data/server/reference/gtf/ensembl/Homo_sapiens.GRCh38.98.chr.gtf.gz -o all.id.txt *.sort
得到了all -id的计数。
原文的流程是:tophat2 ——cufflinks