Protocol for RNA-seq data analysis and target gene identification

1. Rice is used as the example.
2. All scripts and raw data can be found in '/data/dta/shared/rnaseqworkflow' (for lab members).

Before starting:

  1. Create a root directory to store all future data
  2. Create a subdirectory, then download the reference genome and its annotations
  3. Build a genome index with the alignment software of your choice (HISAT2 is used here)
  4. Create other subdirectories to store the different kinds of data, such as raw data, matrices and scripts
code:
# Root directory and subdirectories (later steps assume ~/Drought_stress)
$ mkdir -p ~/Drought_stress/Rice && cd ~/Drought_stress/Rice
$ mkdir data matrix homology olddata reference src_rice
# Reference genome and annotations (Ensembl Plants release 47)
$ mkdir reference/IRGSP && cd reference/IRGSP
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/fasta/oryza_sativa/dna/Oryza_sativa.IRGSP-1.0.dna.toplevel.fa.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gtf/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gtf.gz
$ wget ftp://ftp.ensemblgenomes.org/pub/release-47/plants/gff3/oryza_sativa/Oryza_sativa.IRGSP-1.0.47.gff3.gz
$ gunzip *.gz
# Build the HISAT2 index with 8 threads
$ module load Anaconda3 hisat2
$ mkdir hsindex
$ hisat2-build -p 8 Oryza_sativa.IRGSP-1.0.dna.toplevel.fa hsindex/IRGSP
$ module unload Anaconda3 hisat2
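
Before moving on, it may be worth confirming that the index build actually finished. The following is only a minimal sketch: the function name check_hisat2_index is ours (not part of HISAT2), and it assumes a "small" index made of eight files ending in .1.ht2 to .8.ht2. Very large genomes produce .ht2l files instead, which this check does not cover.

```shell
#!/bin/sh
# Sketch: confirm that a HISAT2 index prefix has all eight .ht2 files.
# Assumption: a "small" index (.ht2 suffix); large indexes use .ht2l.
check_hisat2_index() {
    prefix=$1
    missing=0
    for i in 1 2 3 4 5 6 7 8; do
        # -s is true only if the file exists and is not empty
        if [ ! -s "${prefix}.${i}.ht2" ]; then
            echo "missing or empty: ${prefix}.${i}.ht2" >&2
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "index OK: ${prefix}"
        return 0
    fi
    return 1
}

# Example call, run from reference/IRGSP:
#   check_hisat2_index hsindex/IRGSP
```

If any file is reported as missing or empty, rerunning hisat2-build is usually the simplest fix.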



Workflow:

Steps 1-3: run on the server. Steps 4-7: run on a personal computer. Steps 8-9: run on the server.
  1. Find BioProjects that match the conditions of interest (drought, roots and so on)
  2. Create samplelist.txt under the data subdirectory and list the SRA accession numbers to download
  3. Run the workflow script in the background: nohup sh RNAseq_workflow.sh &
code:
$ cd ~/Drought_stress/Rice/data
$ vim samplelist.txt  # enter the SRA accession numbers to download
$ cd ../src_rice
$ nohup sh RNAseq_workflow.sh &  # this script can be found in the attachment
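
Because the workflow runs unattended under nohup, a typo in samplelist.txt may only surface hours later. The sketch below checks the list first. It is not part of RNAseq_workflow.sh: the function name validate_samplelist is hypothetical, and it assumes the list holds one run accession per line (SRR, ERR or DRR followed by digits). If your list uses other accession types, the pattern needs adjusting.

```shell
#!/bin/sh
# Sketch: report every non-blank line that is not a run accession.
# Assumption: one accession per line, of the form SRR/ERR/DRR + digits.
# Returns 0 if the list is clean, 1 if at least one line was rejected.
validate_samplelist() {
    awk '
        NF == 0 { next }                    # skip blank lines
        $0 !~ /^[SED]RR[0-9]+$/ {           # anything else must be an accession
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Example call, run from src_rice:
#   validate_samplelist ../data/samplelist.txt && nohup sh RNAseq_workflow.sh &
```

Lines with stray spaces or Windows line endings are rejected as well, which is intentional here since they tend to break download loops.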

  4. Transfer the count files to your local machine for downstream analysis (the R version on the server is too new to support the R package "biomaRt")
    (Files can be moved between the local machine and the server with the scp command or FileZilla)
  5. Create an R project in RStudio locally and use DESeq2 and biomaRt for differential expression analysis and annotation
  6. Run the following R scripts in order: downstream.R > Deseq2analysis.R > merge_desingn.R (the whole project can be found in the attachment named Rice4.zip)
  7. Send the differential gene table and the gene count table back to the server and put them in the '~/Drought_stress/Rice/homology' directory
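
The two transfers in this part of the workflow (count files down, result tables up) can be sketched with scp as follows. Everything specific here is an assumption: the login user@server.example.org, the helper names, and the guess that the count files sit in the 'matrix' subdirectory on the server. Only the 'homology' destination comes from the protocol itself. By default the commands are printed rather than executed, so they can be checked first.

```shell
#!/bin/sh
# Hypothetical login and remote project path: replace with your own.
REMOTE="user@server.example.org"
REMOTE_DIR="Drought_stress/Rice"   # relative to the remote home directory

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy the count files from the server into a local directory.
# Assumption: they were written to the 'matrix' subdirectory.
pull_counts() {
    run scp -r "${REMOTE}:${REMOTE_DIR}/matrix" "$1"
}

# Copy result tables back into the 'homology' directory on the server.
push_results() {
    run scp "$@" "${REMOTE}:${REMOTE_DIR}/homology/"
}

# Example calls (file names are placeholders):
#   pull_counts ./counts
#   DRY_RUN=0 push_results diff_genes.txt gene_counts.txt
```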

  8. Go to the src_rice subdirectory
  9. Run the annotation and merge scripts
code:
$ cd ~/Drought_stress/Rice/src_rice
$ nohup sh anno.sh &
$ nohup sh merge.sh &
# The scripts can be found in the attachment
# Rice.anno.txt can be found in the attachment
# head.txt holds the column names of the final output table; it was edited and combined from the column names of the raw files we used
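
To illustrate the role of a header file such as head.txt, here is one way a header line could be attached to a merged table. This is a sketch under assumptions, not the content of merge.sh: the helper names are hypothetical, the files are assumed to be tab-separated, and the header is assumed to be a single line.

```shell
#!/bin/sh
# Print the number of tab-separated columns in the first line of a file.
count_cols() {
    awk -F '\t' 'NR == 1 { print NF; exit }' "$1"
}

# Prepend a one-line header file to a table, after checking that the header
# and the first data row have the same number of columns.
# Usage: add_header HEADER_FILE BODY_FILE OUTPUT_FILE
add_header() {
    header=$1
    body=$2
    out=$3
    n_head=$(count_cols "$header"); n_head=${n_head:-0}
    n_body=$(count_cols "$body");   n_body=${n_body:-0}
    if [ "$n_head" -ne "$n_body" ]; then
        echo "column mismatch: header=$n_head body=$n_body" >&2
        return 1
    fi
    cat "$header" "$body" > "$out"
}

# Example call (merged.txt and final.txt are placeholder names):
#   add_header head.txt merged.txt final.txt
```

Checking the column count before concatenating catches the common case where a column was added to the merged table but not to the header.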

Attention:

If you have any suggestions or comments, please contact the author at xuyp8121@mail.ustc.edu.cn.
We look forward to hearing from anyone who shares our interest in systems biology and comparative biology!