The most upvoted answer shares a remark from an ICML reviewer, which puts it very well:
Academia is not an arms race. It does not really matter how fancy the model is. It does not really matter whether the model can achieve SOTA performance. The real innovation is to find something new, and this work has found a fresh new perspective.
Yoshua Bengio's answer on Quora is also excellent:
Benchmarks also play an important role to raise our attention to new methods that outperform the earlier state-of-the-art. But they should not be used to discard research that does not beat the benchmark. Otherwise we risk getting stuck in incremental research. This mindset has killed innovation in some fields I know. If something works well on a benchmark it probably means that we should pay some attention to it, but the converse is not true. You may have a great idea embodied in a method that is not currently performing great because one pesky detail is hampering it, which might be fixed next year.
There is too much importance given to comparative experimental results by machine learning reviewers these days. I believe it is a kind of laziness. It is indeed easier to check the table of results than to actually try to understand the ideas in the paper and project yourself in the potential it opens up.
Someone else put it this way: if the point of your paper is to propose some new architecture, and you need experiments on public datasets to validate the new model (say, you design a new neural network and test it on ImageNet), then you should at least reach SOTA among comparable models or methods (as other answers note, you do not need to beat every model). Otherwise the significance of the new architecture or contribution will be underestimated, because under these public benchmarks your model simply does not look competitive or influential enough. At a minimum, a paper should make a reasonable comparison against the most recent similar work, and to be safe it should beat at least some baseline. If we overemphasize novelty while ignoring performance, pseudo-innovation and opportunistic work will proliferate, and the cost of peer review at conferences and journals will rise. Conversely, overemphasizing performance lets reviewers get lazy and judge only the results table rather than the content.
My own view: a paper typically runs experiments on at least two or three datasets. If the method fails to reach SOTA on all of them, its claimed advance will be questioned. If it falls slightly short of the SOTA method on one dataset but comes close, you can analyze what causes the gap and emphasize that your method is still very close to SOTA. Reviewers will generally accept this, although the final review outcome still comes down to some luck.