Java Lexical Structure

Lexical analysis

Lexical analysis is the process of translating a raw Unicode character stream into a sequence of tokens. The tokens are the terminal symbols of the syntactic grammar. A program that performs lexical analysis may be called a lexer, tokenizer, or scanner, though "scanner" is also a term for the first stage of a lexer. In Java, the process consists of three steps, applied in order:

  1. Translate every Unicode escape in the raw stream into the corresponding Unicode character; for example, \u0041 becomes A. (Escape sequences such as \n are not Unicode escapes; they are handled later, inside character and string literals.)
  2. Recognize line terminators (CR, LF, or CR LF), dividing the stream from step 1 into input characters and line terminators. This step defines the line numbers of the source code, so that error messages can point to the corresponding line.
  3. Split the result of step 2 into white space (which includes line terminators), comments, and tokens. White space and comments are discarded; only the tokens are kept.
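
The first two steps can be sketched in a few lines of Java. This is only an illustration, not how javac is implemented: the class name InputPreprocessing and its methods are made up for this example, and malformed escapes are passed through unchanged, whereas a real compiler reports an error.

```java
import java.util.Arrays;
import java.util.List;

class InputPreprocessing {
    // Step 1 (sketch): replace each backslash-u escape with the character it names.
    // A backslash starts an escape only if an even number of backslashes precede it.
    // Malformed escapes are copied through unchanged here; a real compiler rejects them.
    static String translateUnicodeEscapes(String raw) {
        StringBuilder out = new StringBuilder();
        int precedingBackslashes = 0;
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && precedingBackslashes % 2 == 0) {
                int j = i + 1;
                while (j < raw.length() && raw.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (j > i + 1 && hasFourHexDigits(raw, j)) {
                    out.append((char) Integer.parseInt(raw.substring(j, j + 4), 16));
                    i = j + 4;
                    precedingBackslashes = 0;
                    continue;
                }
            }
            precedingBackslashes = (c == '\\') ? precedingBackslashes + 1 : 0;
            out.append(c);
            i++;
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'f')
                    || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    // Step 2 (sketch): CR LF, CR and LF each end a line.
    static List<String> splitLines(String text) {
        return Arrays.asList(text.split("\r\n|\r|\n", -1));
    }

    public static void main(String[] args) {
        String raw = "int \\u0061 = 1;\r\nint b = 2;";
        List<String> lines = splitLines(translateUnicodeEscapes(raw));
        for (int n = 0; n < lines.size(); n++) {
            System.out.println((n + 1) + ": " + lines.get(n));
        }
    }
}
```

Running main prints the two source lines with their line numbers, and the escape in the first line has been replaced by the letter a.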

Tokens

Tokens are a central concept in compilers. Java has five kinds of tokens:

  • Identifier
  • Keyword
  • Literal
  • Separator
  • Operator
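
To make the five kinds concrete, here is a minimal sketch of a lexer that classifies the tokens of a small Java fragment. It is a toy under stated assumptions: the class TinyLexer, its keyword set, and its regular expression are hypothetical and cover only a tiny subset of the real lexical grammar (a handful of keywords, decimal integers, simple string literals, a few separators and operators).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TinyLexer {
    // A deliberately small subset of Java's keywords (assumption for this sketch).
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("int", "if", "else", "return", "while", "class"));

    // Alternatives are tried left to right, so comments win over the "/" operator.
    static final Pattern TOKEN = Pattern.compile(
            "(?<skip>\\s+|//[^\\r\\n]*|/\\*.*?\\*/)"
            + "|(?<word>[A-Za-z_$][A-Za-z0-9_$]*)"
            + "|(?<literal>\\d+|\"(?:[^\"\\\\\\r\\n]|\\\\.)*\")"
            + "|(?<separator>[(){}\\[\\];,.])"
            + "|(?<operator>==|!=|<=|>=|&&|\\|\\||[=+\\-*/<>!])",
            Pattern.DOTALL);

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at index " + pos);
            }
            if (m.group("word") != null) {
                tokens.add(classifyWord(m.group()) + "(" + m.group() + ")");
            } else if (m.group("literal") != null) {
                tokens.add("Literal(" + m.group() + ")");
            } else if (m.group("separator") != null) {
                tokens.add("Separator(" + m.group() + ")");
            } else if (m.group("operator") != null) {
                tokens.add("Operator(" + m.group() + ")");
            }
            // Otherwise the match was white space or a comment, which is discarded.
            pos = m.end();
        }
        return tokens;
    }

    // true, false and null look like identifiers but are literals.
    private static String classifyWord(String word) {
        if (word.equals("true") || word.equals("false") || word.equals("null")) {
            return "Literal";
        }
        return KEYWORDS.contains(word) ? "Keyword" : "Identifier";
    }

    public static void main(String[] args) {
        System.out.println(tokenize("int x = 42; // answer"));
        // prints [Keyword(int), Identifier(x), Operator(=), Literal(42), Separator(;)]
    }
}
```

The comment and the spaces produce no entries in the result, which matches step 3 of lexical analysis: only tokens survive.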

Tokens play two roles. In the lexical grammar they are non-terminal symbols, defined in terms of characters, which are that grammar's terminal symbols. For example:

BooleanLiteral:
       true
       false

In the syntactic grammar, the same tokens are the terminal symbols. A parser, which analyzes the syntax of the program, takes the token stream as input and typically produces an abstract syntax tree (AST) as output.
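
The step from tokens to a tree can be illustrated with a small recursive-descent parser. This is a sketch only: TinyParser and its three-rule expression grammar are invented for the example and are far simpler than Java's syntactic grammar. Tokens are represented as plain strings and are treated as indivisible terminal symbols.

```java
import java.util.Arrays;
import java.util.List;

class TinyParser {
    // AST node: a leaf holds an operand; an inner node holds an operator and two children.
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? value : "(" + left + " " + value + " " + right + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    private TinyParser(List<String> tokens) {
        this.tokens = tokens;
    }

    static Node parse(List<String> tokens) {
        TinyParser parser = new TinyParser(tokens);
        Node root = parser.parseExpression();
        if (parser.pos != tokens.size()) {
            throw new IllegalArgumentException("Unexpected token: " + tokens.get(parser.pos));
        }
        return root;
    }

    private boolean peekIs(String a, String b) {
        return pos < tokens.size() && (tokens.get(pos).equals(a) || tokens.get(pos).equals(b));
    }

    // Expression := Term { ("+" | "-") Term }
    private Node parseExpression() {
        Node node = parseTerm();
        while (peekIs("+", "-")) {
            String op = tokens.get(pos++);
            node = new Node(op, node, parseTerm());
        }
        return node;
    }

    // Term := Primary { ("*" | "/") Primary }
    private Node parseTerm() {
        Node node = parsePrimary();
        while (peekIs("*", "/")) {
            String op = tokens.get(pos++);
            node = new Node(op, node, parsePrimary());
        }
        return node;
    }

    // Primary := operand | "(" Expression ")"
    private Node parsePrimary() {
        if (pos >= tokens.size()) {
            throw new IllegalArgumentException("Unexpected end of input");
        }
        String tok = tokens.get(pos++);
        if (tok.equals("(")) {
            Node inner = parseExpression();
            if (pos >= tokens.size() || !tokens.get(pos++).equals(")")) {
                throw new IllegalArgumentException("Expected )");
            }
            return inner;
        }
        if (!tok.matches("[A-Za-z0-9_$]+")) {
            throw new IllegalArgumentException("Expected an operand but found " + tok);
        }
        return new Node(tok, null, null);
    }

    public static void main(String[] args) {
        System.out.println(parse(Arrays.asList("1", "+", "2", "*", "3")));
        // prints (1 + (2 * 3))
    }
}
```

The parser never looks inside a token; it only compares whole tokens against the symbols of its grammar, which is what it means for tokens to be terminal symbols at this level.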

References

  1. https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html
  2. https://en.wikipedia.org/wiki/Lexical_analysis