Software: https://github.com/indrops/indrops
Step 1: download the software with git clone https://github.com/indrops/indrops.git
Following the README, install the requirements first: Python, RSEM, Bowtie, samtools, and Java.
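Before moving on, it can help to confirm that the required programs are actually reachable. The sketch below is only an illustration and not part of indrops: the helper name find_tools and the executable names in REQUIRED are my assumptions (for example, RSEM is looked up through its rsem-calculate-expression program). It relies on shutil.which, which exists only in Python 3, so run it with a Python 3 interpreter rather than the anaconda2 environment configured for the pipeline.

```python
import shutil

# Executable names are assumptions for illustration; adjust them to your installation.
REQUIRED = [
    "python",
    "bowtie",
    "bowtie-build",
    "samtools",
    "rsem-calculate-expression",
    "java",
]


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if it is not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED).items():
        print("{:<28}{}".format(name, path or "NOT FOUND"))
```

A tool reported as NOT FOUND is not necessarily a problem, because project.yaml accepts explicit install directories under paths.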
Then build the index as the README describes:
mkdir -pv DOWNLOAD_DIR
cd DOWNLOAD_DIR
# Download the soft-masked, primary assembly Genome Fasta file
wget ftp://ftp.ensembl.org/pub/release-85/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
# Download the corresponding GTF file.
wget ftp://ftp.ensembl.org/pub/release-85/gtf/homo_sapiens/Homo_sapiens.GRCh38.85.gtf.gz
# This command will go through all the steps for creating the index
python indrops.py project.yaml build_index \
    --genome-fasta-gz DOWNLOAD_DIR/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz \
    --ensembl-gtf-gz DOWNLOAD_DIR/Homo_sapiens.GRCh38.85.gtf.gz
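The commands above cd into DOWNLOAD_DIR and then pass relative DOWNLOAD_DIR/... paths, which resolve only when build_index is launched from the directory that contains DOWNLOAD_DIR. A minimal sketch of one way to sidestep that: assemble the same command with absolute paths and check that both downloads exist first. The helper names build_index_command and missing_inputs are hypothetical, and the sketch only constructs the argument list without running anything.

```python
import os

FASTA_NAME = "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz"
GTF_NAME = "Homo_sapiens.GRCh38.85.gtf.gz"


def build_index_command(project_yaml, download_dir):
    """Return the build_index command as an argument list (it is not executed here)."""
    download_dir = os.path.abspath(download_dir)
    return [
        "python", "indrops.py", project_yaml, "build_index",
        "--genome-fasta-gz", os.path.join(download_dir, FASTA_NAME),
        "--ensembl-gtf-gz", os.path.join(download_dir, GTF_NAME),
    ]


def missing_inputs(command):
    """List the .gz input paths in the command that do not exist on disk."""
    return [arg for arg in command if arg.endswith(".gz") and not os.path.isfile(arg)]


if __name__ == "__main__":
    cmd = build_index_command("project.yaml", "DOWNLOAD_DIR")
    missing = missing_inputs(cmd)
    if missing:
        print("Missing input files:")
        for path in missing:
            print("  " + path)
    else:
        print(" ".join(cmd))
```

When nothing is missing, the list can be handed to subprocess.check_call from the indrops checkout directory, where indrops.py lives.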
This step requires project.yaml.
Here is the file I configured:
project_name : "test"
project_dir : "/work/03.indrop_data"
paths :
  bowtie_index : "/work/03.indrop_data/DOWNLOAD_DIR" # Where the bowtie index gets built. In my run this had to point at DOWNLOAD_DIR, otherwise the pipeline failed because it could not find the reference.
  bowtie_dir : "/software/biosoftware/bowtie-1.2.2-linux-x86_64" # bowtie install path; downloading and unpacking the archive is enough
  python_dir : "/root/anaconda2/bin" # python install path
  samtools_dir : "/software/biosoftware/samtools-1.3.1/bin/samtools" # samtools install path
  rsem_dir : "/software/biosoftware/RSEM-1.3.1/" # RSEM install path
  java_dir : "/usr/bin/" # java install path
sequencing_runs :
  - name : "Test_du" # any name you like
    version : 'v1'
    dir : "/work/03.indrop_data/" # directory that holds the data
    fastq_path : "{library_prefix}_{split_affix}_{read}_001.fastq.gz" # {read} takes two values, R1 and R2
    split_affixes : ["L007"]
    libraries :
      - {library_name: "L007", library_prefix: "WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1"}
    # So the FASTQ files should be named like WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1_L007_R1_001.fastq.gz
parameters : # OPTIONAL PARAMETERS # These are all the default values.
  umi_quantification_arguments:
    m : 10 #Ignore reads with more than M alignments, after filtering on distance from transcript end.
    u : 1 #Ignore counts from UMI that should be split among more than U genes.
    d : 600 #Maximal distance from transcript end, NOT INCLUDING THE POLYA TAIL
    split-ambigs: False #If umi is assigned to m genes, add 1/m to each gene's count (instead of 1)
    min_non_polyA: 15 #Require reads to align to this much non-polyA sequence. (Set to 0 to disable filtering on this parameter.)
  output_arguments:
    output_unaligned_reads_to_other_fastq: False
    filter_alignments_to_softmasked_regions: False
    # low_complexity_mask: False
  bowtie_arguments:
    m : 200
    n : 1
    l : 15
    e : 80
  trimmomatic_arguments:
    LEADING: "28"
    SLIDINGWINDOW: "4:20"
    MINLEN: "16"
    argument_order: ['LEADING','SLIDINGWINDOW','MINLEN']
  low_complexity_filter_arguments:
    max_low_complexity_fraction: 0.50
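A mismatch between fastq_path and the real file names is easy to miss, so it is worth expanding the template by hand before running the pipeline. The sketch below fills in the placeholders from the config with plain str.format. It illustrates the naming rule described in the config comments and is not the code indrops itself runs; the helper name expected_fastq_names is hypothetical.

```python
# Template, prefix, and split affix copied from the project.yaml above.
FASTQ_PATH = "{library_prefix}_{split_affix}_{read}_001.fastq.gz"
LIBRARY_PREFIX = "WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1"
SPLIT_AFFIXES = ["L007"]


def expected_fastq_names(library_prefix, split_affixes, reads=("R1", "R2")):
    """Expand the fastq_path template for every split affix and read."""
    return [
        FASTQ_PATH.format(library_prefix=library_prefix, split_affix=affix, read=read)
        for affix in split_affixes
        for read in reads
    ]


if __name__ == "__main__":
    for name in expected_fastq_names(LIBRARY_PREFIX, SPLIT_AFFIXES):
        print(name)
    # Prints:
    # WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1_L007_R1_001.fastq.gz
    # WBJPE18020236_HMWMYCCXY_L7_WBJPE18020236_20180818_P_S1_L007_R2_001.fastq.gz
```

Comparing this list against the contents of the dir entry shows right away whether the prefix or the split affix is off.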