Single cells, reverse transcription (RT) reagents, Gel Beads containing barcoded oligonucleotides, and oil are combined on a microfluidic chip to form reaction vesicles called Gel Beads in Emulsion, or GEMs. GEMs are formed in parallel within the microfluidic channels of the chip, allowing the user to process hundreds to tens of thousands of single cells in a single 7-minute Chromium™ Instrument run. Cells are loaded at a limiting dilution to maximize the number of GEMs that contain exactly one cell, which keeps the doublet rate low while maintaining a cell recovery rate of up to ~65%.
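To see why limiting dilution keeps doublets rare, here is a rough sketch. It assumes the number of cells per GEM follows a Poisson distribution with mean `lam`. That model and the function name `multiplet_fraction` are illustrative assumptions, not something the text above states. The function prints the fraction of cell-containing GEMs that hold two or more cells.

```shell
# Fraction of cell-containing GEMs that hold >= 2 cells,
# ASSUMING cells per GEM ~ Poisson(lam). lam is a hypothetical loading mean.
multiplet_fraction() {
  awk -v lam="$1" 'BEGIN {
    p0 = exp(-lam)        # probability that a GEM has no cell
    p1 = lam * p0         # probability that a GEM has exactly one cell
    printf "%.4f\n", (1 - p0 - p1) / (1 - p0)
  }'
}

multiplet_fraction 0.1
```

Under this assumption, a mean of 0.1 cells per GEM gives a multiplet fraction of about 0.049, and lowering the mean lowers it further at the cost of more empty GEMs.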
Each functional GEM contains a single cell, a single Gel Bead, and RT reagents. Within each GEM reaction vesicle, a single cell is lysed, the Gel Bead is dissolved to free the identically barcoded RT oligonucleotides into solution, and reverse transcription of polyadenylated mRNA occurs. As a result, all cDNAs from a single cell carry the same barcode, so the sequencing reads can be mapped back to their cell of origin. The preparation of NGS libraries from these barcoded cDNAs is then carried out in a highly efficient bulk reaction.
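The process above is the basic single-cell sequencing workflow: target fragments are enriched through their polyA tails, and the fragments from each single cell are labeled with a barcode.
The single-cell BAM file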
Field | Tag | Description | Example |
---|---|---|---|
1 | NA | Query name | A00228:279:HFWFVDMXX:1:1113:30147:36699 |
2 | NA | Flag | 0 |
3 | NA | Reference name | chr1 |
4 | NA | Position | 14170 |
5 | NA | Mapping quality | 255 |
6 | NA | Cigar string | 1S90M |
7 | NA | Reference name of mate | * |
8 | NA | Position of mate | 0 |
9 | NA | Template length | 0 |
10 | NA | Sequence | CCACCACCCCG...CACCCTTGCAC |
11 | NA | Sequence quality | F:F:,F::F,F:FF::FFF:F,:.... |
12 | NH | Number of reported alignments for query | NH:i:3 |
13 | HI | Query hit index | HI:i:1 |
14 | AS | Alignment score | AS:i:65 |
15 | nM | Number of mismatches per pair | nM:i:10 |
16 | RG | Read group | RG:Z:run_count_1kpbmcs:0:1:HFWFVDMXX:1 |
17 | RE | Region type (E = exonic, N = intronic, I = intergenic) | RE:A:I |
18 | xf | Extra alignment flags (Cell Ranger bit flags) | xf:i:0 |
19 | CR | Cell barcode | CR:Z:GGAGGATGTAAGTCAA |
20 | CY | Cell barcode read quality | CY:Z::FFFFFF:FFF:FFFF |
21 | CB | Cell barcode that is error-corrected and confirmed against a list of known-good barcode sequences | CB:Z:AGGTGTTAGCAATTAG-1 |
22 | UR | Unique Molecular Identifier (UMI) | UR:Z:AGCACTACCCAG |
23 | UY | UMI read quality | UY:Z:FFFFFFFFFFFF |
24 | UB | UMI that is error-corrected among other molecular barcodes with the same cellular barcode and gene alignment | UB:Z:TGCATTTATTTG |
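
The optional fields (12 onward in the table above) all follow the `TAG:TYPE:VALUE` layout defined by the SAM format. As a minimal sketch, the hypothetical helper `list_tags` below reads SAM text on stdin and prints the position, tag name, type, and value of every optional field. It takes the value as everything after the fifth character because values such as the `RG` string or a quality string can themselves contain `:`.

```shell
# List the optional tags (field 12 onward) of SAM records read from stdin.
# Output columns (tab-separated): field number, tag, type, value.
list_tags() {
  awk -F'\t' '
    /^@/ { next }    # skip SAM header lines, if present
    {
      for (i = 12; i <= NF; i++) {
        # TAG is characters 1-2, TYPE is character 4, VALUE starts at character 6
        printf "%d\t%s\t%s\t%s\n", i, substr($i, 1, 2), substr($i, 4, 1), substr($i, 6)
      }
    }'
}

# Usage (assumes samtools is installed and the BAM file exists):
# samtools view possorted_genome_bam.bam | head -n 1 | list_tags
```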
Cells are identified by the CB tag in the original BAM file; its value is the cell's identity.
samtools view possorted_genome_bam.bam|cut -f 21|less -SN
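Note: field 21 is not always the CB tag, because the set of optional tags varies from record to record, so the barcode cannot be selected by column position alone!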
samtools view possorted_genome_bam.bam |cut -f 21 |cut -d ":" -f 3 >barcode_Information.txt
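Tallying the extracted values (count, then value) shows that many of them are apparently not cell barcodes: empty strings, fragments of quality strings, and 12-base sequences that look like UMIs.
173955
9044 ,
1199 ,,
201 ,,,
36 ,,,,
9 ,,,,,
1 ,,,,,,
66 #
4 #,
98 0
2184 AAAAAAAAAAAA
32 AAAAAAAAAAAC
86 AAAAAAAAAAAG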
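Because the CB tag does not sit at a fixed column, a more robust approach is to pick the tag by name. The sketch below is one possible way to do that, not the only one. The helper name `extract_cb` is made up for illustration. It scans the optional fields of each record and prints the value of the first `CB:Z:` field, skipping records that have no CB tag.

```shell
# Print the error-corrected cell barcode (CB tag) of each SAM record on stdin.
# Records without a CB tag produce no output.
extract_cb() {
  awk -F'\t' '
    /^@/ { next }    # skip SAM header lines, if present
    {
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "CB:Z:") {
          print substr($i, 6)    # value after the CB:Z: prefix
          break
        }
      }
    }'
}

# Usage (assumes samtools is installed and the BAM file exists):
# samtools view possorted_genome_bam.bam | extract_cb > barcode_Information.txt
```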
cut -d "-" -f 2 barcode_Information.txt | sort | uniq -c | less -SN  # tally the values after the "-"
The cell barcode in the CB tag carries a GEM group suffix, "-1", which marks GEMs that come from a single channel.
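To check which GEM group suffixes actually occur, the suffix can be tallied directly from the CB tag instead of from a fixed column. This is a sketch with a hypothetical helper name, `count_gem_groups`. It reports the suffix after the last `-` in each CB value and labels barcodes without a suffix as `none`.

```shell
# Count SAM records per GEM group suffix of the CB tag (for example the "1" in "...-1").
# Output columns (tab-separated): suffix, number of records.
count_gem_groups() {
  awk -F'\t' '
    /^@/ { next }    # skip SAM header lines, if present
    {
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, 5) == "CB:Z:") {
          n = split(substr($i, 6), parts, "-")
          suffix = (n >= 2) ? parts[n] : "none"
          count[suffix]++
          break
        }
      }
    }
    END { for (s in count) printf "%s\t%d\n", s, count[s] }' | sort
}

# Usage (assumes samtools is installed and the BAM file exists):
# samtools view possorted_genome_bam.bam | count_gem_groups
```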
This BAM file can also be converted back into FASTQ files.
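As a rough illustration of what such a conversion involves, the sketch below turns SAM records into FASTQ records for the cDNA read only. The helper name `sam_to_fastq` is hypothetical. It skips secondary and supplementary alignments so each read is written once, and it reverse-complements reads aligned to the reverse strand (flag bit 0x10) to restore their original orientation. It does not rebuild the barcode/UMI read, so for real work a dedicated tool such as 10x Genomics' bamtofastq is the safer choice.

```shell
# Convert SAM records on stdin to FASTQ (cDNA read only; a simplified sketch).
sam_to_fastq() {
  awk -F'\t' '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    /^@/ { next }                              # skip SAM header lines, if present
    int($2 / 256) % 2 == 1 { next }            # skip secondary alignments (0x100)
    int($2 / 2048) % 2 == 1 { next }           # skip supplementary alignments (0x800)
    {
      seq = $10; qual = $11
      if (int($2 / 16) % 2 == 1) {             # read stored as reverse complement (0x10)
        rseq = ""; rqual = ""
        for (i = length(seq); i >= 1; i--) {
          b = substr(seq, i, 1)
          rseq = rseq ((b in comp) ? comp[b] : "N")
          rqual = rqual substr(qual, i, 1)
        }
        seq = rseq; qual = rqual
      }
      printf "@%s\n%s\n+\n%s\n", $1, seq, qual
    }'
}

# Usage (assumes samtools is installed and the BAM file exists):
# samtools view possorted_genome_bam.bam | sam_to_fastq > reads.fastq
```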