"thought vector" understanding

In an LSTM-based sequence-to-sequence model, the hidden state of the encoder at the moment it receives the end-of-sequence symbol "<eos>" can be viewed as the thought vector, because it stores the information of the whole input sentence, or thought, "ABC".
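To make this concrete, here is a minimal sketch in PyTorch (the layer sizes and token ids are toy assumptions, not from any particular paper): the encoder's final hidden state, read off after the "<eos>" token, is the candidate thought vector.

```python
# A minimal sketch (PyTorch; sizes and token ids below are toy assumptions):
# the encoder's final hidden state, read off after "<eos>", is the thought vector.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 128, 256
embed = nn.Embedding(vocab_size, embed_dim)
encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# Toy token ids standing in for the sentence "A B C <eos>".
tokens = torch.tensor([[4, 5, 6, 1]])          # shape: (batch=1, seq_len=4)
outputs, (h_n, c_n) = encoder(embed(tokens))   # h_n: (num_layers, batch, hidden)

thought_vector = h_n[-1]                       # shape: (1, 256)
print(thought_vector.shape)
```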

An explanation drawing on Geoffrey Hinton's remarks comes from the article "Thought Vectors, Deep Learning & the Future of AI":

“Thought vector” is a term popularized by Geoffrey Hinton, the prominent deep-learning researcher now at Google, which is using vectors based on natural language to improve its search results.

A thought vector is like a word vector, which is typically a vector of 300-500 numbers that represents a word. A word vector represents a word’s meaning as it relates to other words (its context) with a single column of numbers.

That is, the word is embedded in a vector space using a shallow neural network like word2vec, which learns to generate the word’s context through repeated guesses.
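As a hedged illustration of that embedding step, here is a minimal word2vec sketch using gensim's 4.x API (the two-sentence corpus and the hyperparameters are toy assumptions):

```python
# A minimal word2vec sketch (gensim 4.x API; the toy corpus is an assumption).
from gensim.models import Word2Vec

sentences = [
    ["thought", "vectors", "represent", "meaning"],
    ["word", "vectors", "represent", "context"],
]

# A shallow network learns each word's vector by repeatedly guessing its context.
model = Word2Vec(sentences, vector_size=300, window=2, min_count=1, epochs=50)

vec = model.wv["vectors"]             # a 300-number representation of the word
print(model.wv.most_similar("word"))  # nearest neighbors in the embedding space
```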

A thought vector, therefore, is a vectorized thought, and the vector represents one thought’s relations to others. A thought vector is trained to generate a thought’s context. Just as words are linked by grammar (a sentence is just a path drawn across words), so thoughts are linked by a chain of reasoning, a logical path of sorts.

So training an algorithm to represent any thought in its relation to others might be called the artificial construction of common sense. Given one thought, a neural network might predict the thoughts that are likely to follow, much like recurrent neural networks do with characters and words. Conversation as search.

Hinton, in a 2015 speech to the Royal Society in London, said this:
"The implications of this for document processing are very important. If we convert a sentence into a vector that captures the meaning of the sentence, then Google can do much better searches; they can search based on what's being said in a document. Also, if you can convert each sentence in a document into a vector, then you can take that sequence of vectors and [try to model] natural reasoning. And that was something that old fashioned AI could never do. If we can read every English document on the web, and turn each sentence into a thought vector, you've got plenty of data for training a system that can reason like people do. Now, you might not want it to reason like people do, but at least we can see what they would think. What I think is going to happen over the next few years is this ability to turn sentences into thought vectors is going to rapidly change the level at which we can understand documents. To understand it at a human level, we're probably going to need human level resources and we have trillions of connections [in our brains], but the biggest networks we have built so far only have billions of connections. So we're a few orders of magnitude off, but I'm sure the hardware people will fix that."
Let’s pause for a moment and consider what Hinton is saying.

Traditional, rules-based AI, a pile of if-then statements locking brittle symbols into hard-coded relationships with others, is not flexible enough to represent the world without near infinite amounts of human intervention. Symbolic logic and knowledge graphs may establish strict relations between entities, but those relations are unlikely to adapt quickly to the new.

Hinton is saying that, rather than hard-code the logical leaps that lead an AI from one thought to another, we can simply feed neural nets enough text – enough trains of thought – that they will eventually be able to mimic the thoughts expressed there, and generate their own thought trains, the context of the thoughts they’ve been fed.

This affects how well algorithms will understand natural-language queries at search engines like Google, and it will also go beyond pure search.

With the ability to associate thoughts comes the ability to converse. Thought vectors could serve as the basis for chatbots, personal assistants, and other agents whose purpose is to augment and entertain human beings. That’s the good side. The bad side is that, on the Internet, you really won’t know who’s a dog, or in this case, a bot.

If we define thought vectors loosely, we could say they are already being used to represent similar sentences in different languages, which is useful in machine translation. (In fact, improving Google Translate was one of the goals that brought thought vectors about.) They are therefore independent of any particular language.

Thought vectors can also represent images, which makes them more general than, and independent of, language alone. Thus the term thought, a concept more general than the textual or visual media by which it is expressed. The problem with thought vectors, even if we limit ourselves to words, is that their number increases exponentially with the words used to express them. Thoughts are combinatorial. What’s more, one sentence may contain many states, or discrete elements of thought; e.g. x is-a y, or b has-a c. So every sentence might contain and mingle several thoughts.

This is important, because when we vectorize words, we index those words in a lookup table. In the massive matrix of all words, each word is a vector, and that vector is a row in the matrix. (Each column represents a feature of the word, which in a low-dimensional space would be 300-500 columns.)
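A minimal sketch of that lookup, with a hypothetical vocabulary and sizes: each word maps to a row index, and vectorizing a word is just reading off that row.

```python
# A minimal sketch of the word-lookup table (vocabulary and sizes are toy assumptions).
import numpy as np

vocab = {"cat": 0, "dog": 1, "thought": 2}   # word -> row index
embedding_dim = 300                          # 300-500 features in a low-dim space
embedding_matrix = np.random.randn(len(vocab), embedding_dim)

# Vectorizing a word is just a row lookup in the matrix:
word_vector = embedding_matrix[vocab["thought"]]   # shape: (300,)
print(word_vector.shape)
```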

Given that neural networks are already taxing current hardware to its limits, the exponentially larger costs of manipulating a dense matrix containing all thought vectors looks impractical. For now.

The autonomous conversational agents, or chatbots, so much in the news will probably require proper thought vectors to determine the most appropriate response in a given conversation. For the moment, those chatbots are unable to deliver useful, interesting and plausible responses in a complex conversation of any duration. Our best tools tend to serve up hard answers, like Watson winning at Jeopardy, or 3-5 word sentences, as you find with Google Smart Reply. As of mid-2016, we’re not capable of much more than that.

The future of this branch of AI will depend on advances in hardware, as well as advances in thought vectorization, or capturing thoughts with numbers in novel ways. (How do we discretize sentences? What are the fundamental units of thought?)

A word should be said about semantic structure. It’s possible to embed dependency- and constituency-based parses in vectors. In fact, interesting work is being done at Stanford, Cornell, and the University of Texas, among other schools.
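As a rough illustration (of the structure, not the embedding work itself), the following spaCy sketch prints the dependency arcs of a sentence; these arcs are the kind of syntactic relations that parse-aware embedding methods try to capture in vectors. The model name "en_core_web_sm" is assumed to be installed.

```python
# A hedged illustration: print the dependency structure spaCy assigns to a sentence.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Thought vectors represent sentences as numbers.")

for token in doc:
    # token.dep_ is the dependency relation; token.head is the governing word
    print(f"{token.text:10} --{token.dep_:>8}--> {token.head.text}")
```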

Advances in theory and hardware, in turn, will give us other tools to tackle natural language processing and machine conceptualization, the missing link between symbolic logic, which is abstract, and machine perception via deep learning, which processes concrete instances of, say, images or sounds.

Here are a few of the approaches that are being made to thought vectorization:

  • Doc2vec: Doc2Vec, paragraph vectors and sentence vectors are broadly synonymous. It doesn’t necessarily account for word order, and it is generally used to associate word groups with labels (in sentiment analysis, for example); a minimal sketch follows this list.
  • Seq2seq bilingual translation and skip-thought vectors.
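Here is the Doc2Vec sketch referenced above (gensim 4.x API; the two-document corpus and sizes are toy assumptions). It shows word groups being associated with labels, and a vector being inferred for unseen text.

```python
# A minimal Doc2Vec sketch (gensim 4.x API; corpus and sizes are toy assumptions).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    TaggedDocument(words=["the", "movie", "was", "great"], tags=["pos_0"]),
    TaggedDocument(words=["the", "movie", "was", "awful"], tags=["neg_0"]),
]

model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

# Infer a paragraph vector for an unseen word group:
vector = model.infer_vector(["a", "great", "movie"])
print(vector.shape)   # (50,)
```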