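I ran into all sorts of pitfalls while learning to download GEO microarray data. Notes below:
Following the instructor's walkthrough, I first tried downloading with GEOquery: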
library('GEOquery')
library(dplyr)
library(tidyverse)
gset <- getGEO(GEO='GSE87211', destdir=".", getGPL = F)
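### destdir is the directory where downloaded files are stored; getGPL = F skips downloading the platform annotation file
This failed. The download crawled and then stopped with the error "Timeout of 60 seconds was reached".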
Found 3 file(s)
GSE12417-GPL570_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE12nnn/GSE12417/matrix/GSE12417-GPL570_series_matrix.txt.gz'
Content type 'application/x-gzip' length 23572020 bytes (22.5 MB)
========================
> options(timeout=60)
> gset <- getGEO(GEO='GSE87211', destdir=".",getGPL = F)
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
downloaded 688 KB
Error in download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
download from 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz' failed
In addition: Warning messages:
1: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
downloaded length 704512 != reported length 35235899
2: In download.file(sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s", :
URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz': Timeout of 60 seconds was reached
Fixing "Timeout of 60 seconds was reached" (my RStudio Server session had the timeout set to only 60 s, which is also R's default)
# check the current timeout
> getOption('timeout')
[1] 60
# set a new timeout (in seconds)
> options(timeout=100000)
## confirm the change
> getOption('timeout')
[1] 1e+05
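Setting the option globally, as above, keeps the very long timeout for the rest of the session. A minimal sketch of a scoped alternative follows; `with_timeout` is a hypothetical helper name, not part of GEOquery or base R, and it relies only on `options()` returning the previous values.

```r
# Hypothetical helper: evaluate an expression with a temporarily raised timeout.
with_timeout <- function(seconds, expr) {
  old <- options(timeout = seconds)  # options() returns the previous setting
  on.exit(options(old), add = TRUE)  # restore it on exit, even after an error
  force(expr)                        # expr is evaluated lazily, after the change
}

# Intended use (not run here, needs network access):
# gset <- with_timeout(3600, getGEO(GEO = 'GSE87211', destdir = ".", getGPL = F))
```

After the call returns, `getOption('timeout')` should report the original value again.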
I ran GEOquery's getGEO again. This time the code ran without the timeout error, but for some reason the download was still crawling.
Someone suggested this fix:
options( 'download.file.method.GEOquery' = 'libcurl' )
## libcurl is a free URL transfer library
This helped only slightly; the download was still crawling.
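One route these notes did not try is fetching the series matrix file separately and then parsing the local copy. The sketch below rebuilds the URL seen in the log above; `geo_matrix_url` is a hypothetical helper, and it assumes a single-platform series (multi-platform series use file names such as GSE12417-GPL570_series_matrix.txt.gz instead). As far as I understand, getGEO accepts a local file through its `filename` argument, but treat that step as an untested assumption.

```r
# Hypothetical helper: build the series matrix URL for a single-platform GSE.
geo_matrix_url <- function(gse) {
  stub <- sub("[0-9]{1,3}$", "nnn", gse)  # e.g. GSE87211 -> GSE87nnn
  sprintf("https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/matrix/%s_series_matrix.txt.gz",
          stub, gse, gse)
}

url <- geo_matrix_url("GSE87211")

# Not run here (needs network access and GEOquery):
# download.file(url, destfile = basename(url), mode = "wb")
# gset <- GEOquery::getGEO(filename = basename(url), getGPL = FALSE)
```

Any downloader that can resume an interrupted transfer could fetch the same URL in place of `download.file`.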
I turned to Baidu and tried the geoChina function, which comes from the AnnoProbe package. Install AnnoProbe first.
> install.packages('AnnoProbe')
> library(AnnoProbe)
# install/update the GEOmirror package
> devtools::install_git("https://gitee.com/jmzeng/GEOmirror")
# download the GEO data through the China mirror
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
# the only mirror available here is the Tencent one
The download succeeded.
Found 1 file(s)
GSE87211_series_matrix.txt.gz
trying URL 'https://ftp.ncbi.nlm.nih.gov/geo/series/GSE87nnn/GSE87211/matrix/GSE87211_series_matrix.txt.gz'
Content type 'application/x-gzip' length 35235899 bytes (33.6 MB)
==
> gset <- AnnoProbe::geoChina(gse='GSE87211', mirror = 'tencent', destdir = '.')
trying URL 'http://49.235.27.111/GEOmirror/GSE87nnn/GSE87211_eSet.Rdata'
Content type 'application/octet-stream' length 31922908 bytes (30.4 MB)
==================================================
downloaded 30.4 MB
file downloaded in .
you can also use getGEO from GEOquery, by
getGEO('GSE87211', destdir=".", AnnotGPL = F, getGPL = F)
>
After comparing the two, the data is identical to what getGEO downloads.
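The notes do not say how the comparison was done. One way such a check could look is sketched below, assuming the expression matrices have already been extracted from both objects; `same_matrix` and the object names in the comments are hypothetical.

```r
# Hypothetical helper: TRUE when two expression matrices match in shape,
# row/column names, and values (all.equal allows tiny floating point differences).
same_matrix <- function(a, b) {
  identical(dim(a), dim(b)) &&
    identical(dimnames(a), dimnames(b)) &&
    isTRUE(all.equal(a, b))
}

# Intended use (not run here; assumes both calls return a list of ExpressionSets):
# same_matrix(Biobase::exprs(gset_geoquery[[1]]), Biobase::exprs(gset_china[[1]]))
```

Sample annotations (pData) would need a separate comparison; this sketch only covers the expression values.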