脚本 | R | 转录组 counts 2 tpm

看了一些文章,方法很多。比如先定义函数,再用sapply......。我是个只会用tidyverse的R低端用户,而且定义函数过程中用到了log等一些数学计算,我不是很懂,只能用最朴实的计算了。

  • featureCounts进行转录组数据定量后,得到文件*counts.txt,整理成一个counts矩阵,并将exon length信息添加进去。
awk '{if($0~/#/ || $0~/Geneid/)next;print FILENAME"\t"$1"\t"$6"\t"$7}' *.counts.txt | \
awk '{if(a[$2]=="")a[$2]=$2"\t"$3"\t"$4;else a[$2]=a[$2]"\t"$4}END{for(i in a)print a[i]}' > All.counts.Length.tsv

这里添加FILENAME其实没太大必要,只是想确认顺序没问题。

  • 利用ls *counts.txt | sed 's/.counts.txt//g' | xargs | sed 's/ /\\t/g'获得表头,然后添加表头。
$ sed -i '1i Geneids\tLength\t.............' All.counts.Length.tsv
$ head -5 All.counts.Length.tsv | column -t
Geneids     Length  S01     S02     S03      S04     S05     S06     S07     S08     S09     S10     S11     S12     S13     S14     S15     S16     S17     S18     S19     S20     S21     S22     S23     S24     S25     S26     S27     S28     S29     S30     S31     S32     S33
gene-1      1932    53      225     41       63      27      11      69      42      118     94      65      23      117     45      111     113     36      28      30      26      42      19      49      12      37      52      51      67      35      46      43      38      189
gene-2      2074    761     1750    1246     1270    415     555     1218    588     736     882     1048    731     1238    802     746     1143    963     827     907     620     822     831     527     652     550     682     1474    1125    693     771     1041    1074    1286
gene-3      999     0       0       0        0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0       0
gene-4      1689    48      143     64       62      52      49      131     74      99      46      91      64      98      80      65      151     126     17      32      63      92      77      16      6       34      37      91      76      65      35      124     99      168

这里比较笨办法,因为每个人文件命名的区别。

  • R进行counts转tpm。
 # 导入包
library(tidyverse)

# 导入数据
countsMatrix <- read_tsv("All.counts.Length.tsv")

# 宽变长,按样本分组,根据counts值和length计算tpm,长变宽
tpmMatrix <- countsMatrix %>%
  pivot_longer(c(-Geneids, -Length),
               names_to = "Group",
               values_to = "SampleCounts") %>%
  group_by(Group) %>%
  mutate(SampleTPM = (((SampleCounts/Length)*1e6)/sum(SampleCounts/Length))) %>%
  pivot_wider(id_cols = "Geneids",
              names_from = "Group",
              values_from = "SampleTPM")

# 导出tpm矩阵
write_tsv(tpmMatrix, "All.tpm.tsv")
$ head -5 All.tpm.tsv | column -t
Geneids     S01                   S02                   S03                   S04                   S05                   S06                   S07                   S08                   S09                   S10                   S11                    S12                   S13                   S14                   S15                   S16                   S17                   S18                   S19                   S20                   S21                   S22                   S23                   S24                   S25                   S26                   S27                   S28                   S29                   S30                   S31                   S32                   S33
gene-1      1.5238507710210807    6.310542071951338     1.2404535842920152    2.0663001763869144    0.9743108982268808    0.33974128224220745   2.019076017077262     1.4929440706884478    3.5567586201299908    3.2808012279442056    2.0141367829381984     0.7273476663573547    3.745576833513508     1.4293891079195502    3.86430047708595      3.3438601600173263    1.1944017312413042    0.8918109644551627    1.0411976601193402    0.9440828265392236    1.2502476891288166    0.627534624837922     1.6827401326544236    0.41501120242179135   1.198653312088212     1.7465312446943184    1.5573020278860934    2.2766184024976206    1.231820162316741     1.4683053863083524    1.3969678488346147    1.2888290768160424    5.910887556087811
gene-2      20.382131364015265    45.72151022236682     35.11664958446983     38.8020753148581      13.950194506595745    15.967870048962455    33.20085269354047     19.47017898938338     20.66562620912296     28.676029625384178    30.250688030110204     21.53426041707185     36.91916441916996     23.730707861496388    24.192742740279144    31.50752314249931     29.7627173926142      24.53684140053276     29.323620195611745    20.971370316403522    22.79381177499461     25.567218695129807    16.858928073760243    21.005089653143823    16.59789164983364     21.33810070431632     41.92745742194797     35.60953739855625     22.720132961041315    22.925103673788364    31.504098374003306    33.93238455898632     37.46538779255658
gene-3      0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                      0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0                     0
gene-4      1.5786479115856311    4.587727969134735     2.214899923327977     2.326065957393124     2.146419535545147     1.7311280320418376    4.3848258322317335    3.0088701104934734    3.413383246304719     1.836484932100434     3.225480858786998      2.315110156505115     3.588693736495643     2.906734826919883     2.588443793237929     5.111214219034013     4.7818499151294835    0.6193571706962143    1.2703967661183757    2.6167050442922104    3.1326514034299304    2.9090573970704754    0.6285190244271279    0.23735987065686825   1.2599353329885021    1.421517505758171     3.178495038562206     2.9539728822630242    2.61679736080075      1.277921206556115     4.6080492600494445    3.8408239049024777    6.010043948877915
©著作权归作者所有,转载或内容合作请联系作者
【社区内容提示】社区部分内容疑似由AI辅助生成,浏览时请结合常识与多方信息审慎甄别。
平台声明:文章内容(如有图片或视频亦包括在内)由作者上传并发布,文章内容仅代表作者本人观点,简书系信息发布平台,仅提供信息存储服务。

相关阅读更多精彩内容

友情链接更多精彩内容