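Goal: determine whether the gene Znhit1 is differentially expressed and differentially methylated between acute myeloid leukemia (AML) tumor samples and normal samples.
The downloaded files are indexed by Ensembl gene ID, so the row for Znhit1 has to be extracted. NCBI lists its Ensembl ID as ENSG00000106400.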
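A minimal base-R sketch of that extraction step. It assumes a count matrix whose row names are Ensembl IDs carrying a version suffix (GDC HTSeq files use IDs of the form ENSG00000106400.<version>). The helper extract_gene and the toy matrix are hypothetical, and the version numbers and counts in the demo are made up.

```r
# Hypothetical helper: return one gene's counts from a matrix whose row names
# are versioned Ensembl IDs (e.g. "ENSG00000106400.<version>").
extract_gene <- function(counts, ensembl_id) {
  # Strip the trailing ".<version>" so the match does not depend on the
  # annotation version used by the pipeline
  base_ids <- sub("\\.[0-9]+$", "", rownames(counts))
  hit <- which(base_ids == ensembl_id)
  if (length(hit) == 0) stop("Ensembl ID not found: ", ensembl_id)
  counts[hit[1], , drop = TRUE]
}

# Toy matrix standing in for the real data (versions and counts are made up)
toy <- matrix(c(5, 7, 120, 98), nrow = 2, byrow = TRUE,
              dimnames = list(c("ENSG00000000003.13", "ENSG00000106400.10"),
                              c("sample_A", "sample_B")))
znhit1 <- extract_gene(toy, "ENSG00000106400")
print(znhit1)
```

Matching on the unversioned ID keeps the lookup working even if the count files were produced with a different annotation release.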
References:
https://www.cnblogs.com/zdwu/p/9072533.html
https://cloud.tencent.com/developer/article/1512900
https://cloud.tencent.com/developer/article/1422044
https://cloud.tencent.com/developer/article/1531071
TCGA website: https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga
Introductions to the TCGA database:
https://blog.csdn.net/weixin_43700050/article/details/100527245
https://gdc.cancer.gov/resources-tcga-users/tcga-code-tables/sample-type-codes
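The sample type code table linked above is what separates tumor from normal samples. As a sketch, assuming standard TCGA barcodes in which characters 14-15 hold the sample type code (01-09 tumor, 10-19 normal, 20-29 control), the code can be read and labelled in base R. classify_samples and the barcodes below are hypothetical, and only a few entries of the GDC table are copied in.

```r
# A few entries from the GDC sample type code table
sample_type_labels <- c(
  "01" = "Primary Solid Tumor",
  "03" = "Primary Blood Derived Cancer - Peripheral Blood",
  "09" = "Primary Blood Derived Cancer - Bone Marrow",
  "10" = "Blood Derived Normal",
  "11" = "Solid Tissue Normal",
  "14" = "Bone Marrow Normal"
)

# Hypothetical helper: characters 14-15 of a TCGA barcode are the sample type
classify_samples <- function(barcodes) {
  code <- substr(barcodes, 14, 15)
  num <- as.integer(code)
  type <- unname(sample_type_labels[code])
  type[is.na(type)] <- "other"          # codes not copied into the table above
  data.frame(barcode = barcodes, code = code, type = type,
             is_tumor = num >= 1 & num <= 9,
             is_normal = num >= 10 & num <= 19,
             stringsAsFactors = FALSE)
}

# Made-up barcodes for illustration
res <- classify_samples(c("TCGA-XX-0001-03A-01R-0000-00",
                          "TCGA-XX-0002-11A-01R-0000-00",
                          "TCGA-XX-0003-20A"))
res
```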
1. Download the data
setwd("/Users/asang/Desktop/TCGA_Analysis")
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")
BiocManager::install("Rsamtools")
library(TCGAbiolinks)
# Gene expression: raw HTSeq read counts for TCGA-LAML
# (later GDC data releases replaced "HTSeq - Counts" with "STAR - Counts")
query <- GDCquery(project = "TCGA-LAML",
                  data.category = "Transcriptome Profiling",
                  data.type = "Gene Expression Quantification",
                  workflow.type = "HTSeq - Counts")
GDCdownload(query)
# DNA methylation: Illumina 450K arrays from the legacy (GRCh37) archive
query.met <- GDCquery(project = "TCGA-LAML",
                      legacy = TRUE,
                      data.category = "DNA methylation",
                      platform = "Illumina Human Methylation 450")
GDCdownload(query.met)
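Once the methylation data have been assembled into a beta-value matrix (probes x samples), one possible tumor vs normal comparison is sketched below. It assumes a beta matrix and a group vector, both hypothetical; the demo values are random. Beta values are converted to M-values, log2(beta / (1 - beta)), for a per-probe Welch t-test, since M-values are commonly preferred for statistical testing, while the difference in mean beta (delta-beta) is reported because it is easier to interpret.

```r
# Convert beta values to M-values, clamping away from 0 and 1 to avoid log(0)
beta_to_m <- function(beta, eps = 1e-6) {
  b <- pmin(pmax(beta, eps), 1 - eps)
  log2(b / (1 - b))
}

# Hypothetical helper: per-probe tumor vs normal comparison
diff_methylation <- function(beta, group) {
  is_t <- group == "tumor"
  is_n <- group == "normal"
  m <- beta_to_m(beta)
  # Welch t-test on M-values for each probe (row)
  p <- apply(m, 1, function(row) t.test(row[is_t], row[is_n])$p.value)
  data.frame(probe = rownames(beta),
             delta_beta = rowMeans(beta[, is_t, drop = FALSE]) -
                          rowMeans(beta[, is_n, drop = FALSE]),
             p_value = unname(p),
             stringsAsFactors = FALSE)
}

# Random demo data: probe_1 is hypermethylated in "tumor", probe_2 is not
set.seed(1)
beta_demo <- rbind(probe_1 = c(runif(4, 0.7, 0.9), runif(4, 0.1, 0.3)),
                   probe_2 = runif(8, 0.4, 0.6))
colnames(beta_demo) <- paste0("s", 1:8)
group <- rep(c("tumor", "normal"), each = 4)
res_met <- diff_methylation(beta_demo, group)
res_met
```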
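The downloaded files are placed in a separate folder for each sample, so they need to be moved into a single folder before further processing: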
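#!/bin/bash
# Move each sample's files out of its own subfolder into the parent folder
cd /Users/asang/Desktop/TCGA_Analysis/GDCdata/TCGA-LAML/harmonized/Transcriptome_Profiling/Gene_Expression_Quantification || exit 1
for dir in */
do
  # Only loop over directories and move only regular files, so plain files
  # already in this folder and any nested folders are left untouched
  find "$dir" -maxdepth 1 -type f -exec mv {} ./ \;
done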
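An alternative that avoids moving files at all is to search the folder recursively from R. The sketch below assumes the HTSeq files end in .htseq.counts.gz and contain two tab-separated columns (gene ID, count) plus HTSeq summary rows whose IDs start with "__". read_htseq_dir and the demo files are hypothetical.

```r
# Hypothetical helper: read every HTSeq count file under `dir` (recursively)
# into one genes x samples matrix
read_htseq_dir <- function(dir, pattern = "\\.htseq\\.counts(\\.gz)?$") {
  files <- list.files(dir, pattern = pattern, recursive = TRUE, full.names = TRUE)
  if (length(files) == 0) stop("no count files found under ", dir)
  tables <- lapply(files, function(f) {
    # read.delim() reads .gz files directly because file() detects gzip
    tab <- read.delim(f, header = FALSE, col.names = c("gene", "count"),
                      stringsAsFactors = FALSE)
    tab[!startsWith(tab$gene, "__"), ]  # drop HTSeq summary rows
  })
  genes <- tables[[1]]$gene
  # Align every sample on the gene order of the first file
  counts <- vapply(tables,
                   function(tab) as.numeric(tab$count[match(genes, tab$gene)]),
                   numeric(length(genes)))
  matrix(counts, nrow = length(genes),
         dimnames = list(genes, sub(pattern, "", basename(files))))
}

# Demo: two made-up samples, each in its own subfolder
write_gz <- function(lines, path) {
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
}
demo <- file.path(tempdir(), "htseq_demo")
dir.create(file.path(demo, "sampleA"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo, "sampleB"), recursive = TRUE, showWarnings = FALSE)
write_gz(c("ENSG00000106400.10\t120", "ENSG00000000003.13\t5", "__no_feature\t999"),
         file.path(demo, "sampleA", "a.htseq.counts.gz"))
write_gz(c("ENSG00000000003.13\t7", "ENSG00000106400.10\t98", "__ambiguous\t10"),
         file.path(demo, "sampleB", "b.htseq.counts.gz"))
m <- read_htseq_dir(demo)
m
```

Matching genes by ID instead of assuming identical row order makes the merge robust if files ever differ in gene order.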
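The earlier download did not take the pairing between tumor and adjacent normal samples into account, so the data are downloaded again. References: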
https://mp.weixin.qq.com/s?__biz=MzA4NDAzODkzMA==&mid=2651263330&idx=1&sn=ff440567bbacae48dd41fb3d1daa8751&chksm=841ef51fb3697c098f2c84276930d46884bcb51663f11f2c1c01d822427fed9c4a5c7fe0cf2c&scene=21#wechat_redirect
https://cloud.tencent.com/developer/article/1481904
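To pair tumor and normal samples from the same patient, one common approach is to match on the participant part of the barcode. A sketch, assuming standard TCGA barcodes in which characters 1-12 identify the participant and characters 14-15 the sample type; pair_tumor_normal and the barcodes are made up.

```r
# Hypothetical helper: pair tumor (01-09) and normal (10-19) samples
# that share a participant ID (barcode characters 1-12)
pair_tumor_normal <- function(barcodes) {
  participant <- substr(barcodes, 1, 12)
  code <- as.integer(substr(barcodes, 14, 15))
  is_tumor <- code >= 1 & code <= 9
  is_normal <- code >= 10 & code <= 19
  shared <- intersect(participant[is_tumor], participant[is_normal])
  data.frame(
    participant = shared,
    # take the first tumor / normal barcode found for each shared participant
    tumor  = barcodes[is_tumor][match(shared, participant[is_tumor])],
    normal = barcodes[is_normal][match(shared, participant[is_normal])],
    stringsAsFactors = FALSE
  )
}

barcodes <- c("TCGA-XX-0001-03A-01R", "TCGA-XX-0001-11A-01R",
              "TCGA-XX-0002-03A-01R")
pairs <- pair_tumor_normal(barcodes)
pairs
```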
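For RNA-seq, the control group is usually Solid Tissue Normal rather than blood, because the RNA profiles of blood and tumor differ too much. However, Solid Tissue Normal samples are a minority, and for some tumor types there are none at all. In that case a differential expression analysis (DEA) against matched normals is not possible.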
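According to the literature, for non-solid tumors like this one there are two options: 1) compare against normal tissue, or 2) compare against bone marrow aspirate from healthy individuals.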
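Whichever control group is used, the single-gene comparison could look roughly like the sketch below. It assumes raw counts for one gene, each sample's total count as its library size, and a tumor/normal label per sample (all hypothetical, numbers made up). It uses log2(CPM + 1) and a Wilcoxon rank-sum test as a simple stand-in; a full differential expression analysis would normally use a dedicated package such as DESeq2 or edgeR.

```r
# Hypothetical helper: compare one gene between tumor and normal samples
compare_gene <- function(gene_counts, lib_sizes, group) {
  # Scale to counts per million, then log2 with a pseudocount of 1
  log_cpm <- log2(gene_counts / lib_sizes * 1e6 + 1)
  tumor  <- log_cpm[group == "tumor"]
  normal <- log_cpm[group == "normal"]
  test <- wilcox.test(tumor, normal, exact = FALSE)
  list(log2_fc = mean(tumor) - mean(normal), p_value = test$p.value)
}

# Made-up numbers for illustration only
gene_counts <- c(150, 180, 160, 170, 60, 50, 70, 55)
lib_sizes   <- c(2e7, 2.2e7, 1.9e7, 2.1e7, 2e7, 1.8e7, 2.1e7, 1.9e7)
group <- rep(c("tumor", "normal"), each = 4)
res_expr <- compare_gene(gene_counts, lib_sizes, group)
res_expr
```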
Download from NCBI
Unfinished, to be organized.