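Does a pure bioinformatics data-mining study have to include validation? Not necessarily: plenty of papers with no validation at all still get published. Does adding validation always earn extra credit? Not necessarily either; it depends on the journal's editor and the reviewers. In the two cases below, the authors added validation and were nearly brought to tears by the reviewers' responses, and the chance of rejection is high.
Case 1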
Reviewer 1:
The author used more than 500 cases of TCGA data to build the model, and then used 102 cases of GEO data to verify the model. There is too much difference between the two, and the verification part does not have much meaning. Therefore, this article is not suitable for publication.
Case 2
Section Editor's Comments to Author:
These key genes were screened out using hundreds of TCGA data, which was based on difference analysis. However, the author collects only 20 samples from hospitals, which will lead to a lack of reliability in the verification results, which may affect the results of the entire article. Therefore, authors should increase the sample size for verification to increase the credibility of the paper's results.
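In the first case, the authors built the model on TCGA data and then validated it on GEO data. Normally, adding validation counts in a paper's favor, yet this reviewer recommended rejection because the GEO cohort was considered too small relative to the TCGA cohort. Generally speaking, GEO datasets tend to be small; some have only a handful of samples, or a dozen or so. If circumstances allowed, who wouldn't want to use thousands of samples?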
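In the second case, the journal editor felt that the clinical samples the authors had collected themselves were too few to validate these genes, that the results therefore lacked credibility, and that the authors should go back and validate on a larger sample. That is easy for an editor to say, and it takes no account of the authors' situation. First, more samples means more time, and even with time, samples are limited; you cannot get them just because you want them. Second, sequencing more samples means more research funding, and not everyone has that kind of money. For example, if all you have is a 200,000 yuan youth fund grant and you have already spent it, it is gone; where would the money for more sequencing come from? Samples cannot be added at will; how many you get is largely determined by manpower, resources, and time.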
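To put a rough number on the reviewers' concern about sample size, here is a minimal sketch that compares how uncertain a validation result is at 20, 102, and 500 samples. It is only an illustration, not anything the authors or reviewers computed: it assumes the validation metric is a simple proportion (for example, the fraction of samples classified correctly), and the observed rate of 0.8 is a made-up value. The helper names `wilson_interval` and `interval_width` are hypothetical; the formula is the standard Wilson score interval for a binomial proportion.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


def interval_width(n, observed_rate=0.8):
    """Width of the interval at sample size n.

    observed_rate is a hypothetical value chosen only for illustration.
    """
    successes = round(observed_rate * n)
    low, high = wilson_interval(successes, n)
    return high - low


# Sample sizes taken from the two review comments: 20, 102, and about 500.
for n in (20, 102, 500):
    print(f"n={n:>3}  approx. 95% interval width = {interval_width(n):.3f}")
```

Under these assumptions the interval narrows steadily as the sample grows, which is the statistical core of both review comments. It says nothing about whether collecting more samples is feasible, which is the authors' side of the argument.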