Machine-based translation is amazing, but hundreds of millions of people on our Pale Blue Dot can’t enjoy its benefits–because their language is nowhere to be found in the translator’s pull-down menu. Now, two new artificial intelligence systems–one from the Universidad del País Vasco (UPV) in Spain and another from Carnegie Mellon University (CMU)–promise to change all that, opening the door to true universal translators like the ones in Star Trek.
pull-down menu. 下拉菜单
opening the door to true universal translators like the ones in Star Trek.为星际迷航中那样真正宇宙通用译者的到来打开大门
To understand the potential of these new systems,To understand the potential of these new systems, it helps to know how current machine translation works. The current de facto standard is Google Translate, a system that covers 103 languages from Afrikaans to Zulu, including the top 10 languages in the world–in order, Mandarin, Spanish, English, Hindi, Bengali, Portuguese, Russian, Japanese, German, and Javanese. Google’s system uses human-supervised neural networks that compare parallel texts–books and articles that have been previously translated by humans. By comparing extremely large amounts of these parallel texts, Google Translate learns the equivalences between any two given languages, thus acquiring the ability to quickly translate between them. Sometimes the translations are funny or don’t really capture the original meaning but, in general, they are functional and, overtime, they’re getting better and better.
it helps to know how current machine translation works.首先要弄清这些机器翻译是如何工作的
current de facto standard 事实标杆
Afrikaans阿非利卡语
Mandarin汉语
Hindi印地语
Bengali孟加拉语
Portuguese 葡萄牙语
Javanese爪哇语
human-supervised neural networks 人类监督的平行网络
compare parallel texts比较平行文本
learns the equivalences between any two given languages学习任意两种指定语言之间的对等关系
don’t really capture the original meaning并不能真正反映原文的意思
they are functional 他们是有用的
overtime, they’re getting better and better.随着时间的推移,翻译质量会越来越好
Google’s approach is good, and it works. But unfortunately, it’s not universally functional. That’s because supervised training requires a very long time and a lot of supervisors–so many that Google actually uses crowdsourcing–but also because there just aren’t enough of these parallel texts translated between all the languages in the world. Consider this: According to the Ethnologue catalog of world languages, there are 6,909 living languages on Earth. 414 of those account for 94% of humanity. Since Google Translate covers 103, that leaves 6,806 languages without automated translation–311 with more than one million speakers. In total, at least eight hundred million people can’t enjoy the benefits of automated translation.
unfortunately, it’s not universally functional. 可惜不能全球通用
crowdsourcing 众包
Ethnologue catalog of world languages 世界民族语目录
there are 6,909 living languages on Earth
地球上现存有6906种语言
414 of those account for 94% of humanity 其中的414种语言的使用人数占人类总人数的94%
In total 总的来说
Most machine learning—in which neural networks and other computer algorithms learn from experience—is “supervised.”
computer algorithms 计算机算法
To start, each constructs bilingual dictionaries without the aid of a human teacher telling them when their guesses are right. That’s possible because languages have strong similarities in the ways words cluster around one another. The words for table and chair, for example, are frequently used together in all languages. So if a computer maps out these co-occurrences like a giant road atlas with words for cities, the maps for different languages will resemble each other, just with different names. A computer can then figure out the best way to overlay one atlas on another. Voilà! You have a bilingual dictionary.
如此一来,计算机就能找出将一个图谱集覆盖在另一个图谱集上的最佳方法。瞧!一本双语词典诞生了。
construct bilingual dictionaries 构建双语词典
telling them when their guesses are right判断它们的推测是否正确
in the ways words cluster around one another词汇结合方面
a computer maps out these co-occurrences like a giant road atlas with words for cities计算机将这些共现组合像一个城市巨大的路网那样描绘出来
map out 描绘
road altas 路网
maps for different languages will resemble each other不同语言的图谱就会彼此相似
overlay one atlas on another.将一个图谱集覆盖在另一个图谱集
The new systems, which use remarkably similar methods, can also translate at the sentence level. They both use two training strategies, called back translation and denoising. In back translation, a sentence in one language is roughly translated into the other, then translated back into the original language. If the back-translated sentence is not identical to the original, the neural networks are adjusted so that next time they’ll be closer. Denoising is similar to back translation, but instead of going from one language to another and back, it adds noise to a sentence (by rearranging or removing words) and tries to translate that back into the original. Together, these methods teach the networks the deeper structure of language.
called back translation 回译
denoising 去噪
roughly粗略地
translated back into the original language
转译回最初的语言
Denoising is similar to back translation去噪类似于回译
rearranging or removing words 编排或删除文字
Together, these methods teach the networks the deeper structure of language.
这两种方法相结合教会了网络更深层次的语言结构。
Both systems encode a sentence from one language into a more abstract representation before decoding it into the other language, but the Facebook system verifies that the intermediate “language” is truly abstract. Artetxe and Lample both say they could improve their results by applying techniques from the other’s system.
encode a sentence from one language into a more abstract representation将一种语言的一个句子编码成一种更加抽象的表征
verifies that the intermediate “language” is truly abstract 核实中间语言是完全抽象的
In addition to translating between languages without many parallel texts, both Artetxe and Lample say their systems could help with common pairings like English and French if the parallel texts are all the same kind, like newspaper reporting, but you want to translate into a new domain, like street slang or medical jargon. But, “This is in infancy,” Artetxe’s co-author Eneko Agirre cautions. “We just opened a new research avenue, so we don’t know where it’s heading.”
Artetxe的共同作者Eneko Agirre说,“我们刚刚开启新的研究之路,还不知道它会通向哪里。”
translating between languages 跨语言翻译
common pairings 常用匹配
a new domain 新的领域
street slang 街头俚语
medical jargon 医学术语
This is in infancy 这一切属于新生阶段
open a new research avenue 开启了新的研究之路
where it’s heading. 通向何方
One caveat? The systems are not as accurate as current parallel text deep learning systems–but the fact that a computer can guess all this without any human guidance is, like Microsoft AI expert Di He points out, nothing short incredible. We’re just scratching the surface of this new learning method. It seems very likely that sometime soon, a true universal translator that allows us to talk to anyone in their native tongue won’t just be the stuff of sci-fi.
computer can guess all this without any human guidance 电脑能在没有任何人类指导的情况下猜测所有这些事实
nothing short incredible 本身就很不可思议
scratching the surface of 浅层接触
talk to anyone in their native tongue用对方的母语交谈
stuff of sci-fi.科幻