A Lua implementation of the JS function charCodeAt

charCodeAt by Lua

@(Lua JavaScript charCodeAt)

I wanted a charCodeAt function in Lua that works exactly like the one in JavaScript,
but Lua 5.1 has no built-in support for UTF-8 or Unicode.

1: How charCodeAt works in JavaScript

To open the console in Chrome, press F12 (Mac: Cmd+Option+J), then run:

[
'你'.charCodeAt(0),
'ñ'.charCodeAt(0),
'n'.charCodeAt(0)
]

This outputs [20320, 241, 110], the numeric Unicode values of the characters: '你' = 20320, 'ñ' = 241, 'n' = 110.

The charCodeAt() method returns the UTF-16 code unit at the given index. This equals the character's Unicode code point, except for code points above 0xFFFF (0x10000 and up), which JavaScript stores as two code units.

Using the utf8.charbytes function from alexander-yakushev's utf8.lua, we can find out how many bytes a UTF-8 character takes:
[https://github.com/alexander-yakushev/awesompd/blob/master/utf8.lua]

local utf8 = {}  -- table that holds the helper; Lua 5.1 has no built-in utf8 library

function utf8.charbytes (s, i)
   -- argument defaults
   i = i or 1
   local c = string.byte(s, i) 
   -- determine bytes needed for character, based on RFC 3629
   if c > 0 and c <= 127 then
      -- UTF8-1 byte
      return 1
   elseif c >= 194 and c <= 223 then
      -- UTF8-2 byte
      return 2
   elseif c >= 224 and c <= 239 then
      -- UTF8-3 byte
      return 3
   elseif c >= 240 and c <= 244 then
      -- UTF8-4 byte
      return 4
   end
end
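As a usage illustration, the byte-length rule above can drive a loop that splits a UTF-8 string into characters. This is a minimal sketch, assuming well-formed input; `char_bytes` is a local copy of the rule and `utf8_chars` is a hypothetical helper name, so neither is part of utf8.lua.

```lua
-- Local copy of the byte-length rule shown above (name is illustrative).
local function char_bytes(s, i)
  local c = string.byte(s, i or 1)
  if c > 0 and c <= 127 then return 1
  elseif c >= 194 and c <= 223 then return 2
  elseif c >= 224 and c <= 239 then return 3
  elseif c >= 240 and c <= 244 then return 4
  end
  return nil  -- continuation byte or invalid lead byte
end

-- Hypothetical helper: split a UTF-8 string into a list of characters,
-- each stored as its own byte substring.
local function utf8_chars(s)
  local chars, i = {}, 1
  while i <= #s do
    local n = char_bytes(s, i)
    if not n then
      error("invalid UTF-8 lead byte at position " .. i)
    end
    chars[#chars + 1] = string.sub(s, i, i + n - 1)
    i = i + n
  end
  return chars
end

local chars = utf8_chars("你ñn")
print(#chars)                           -- 3 characters
print(#chars[1], #chars[2], #chars[3])  -- byte lengths: 3, 2 and 1
```

The `#` operator counts bytes, not characters, which is why the string has to be walked lead byte by lead byte.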

Converting between Unicode and UTF-8

Unicode code range (hex)   UTF-8 encoding (binary)               Example
0000 0000-0000 007F        0xxxxxxx                              n (ASCII letters)
0000 0080-0000 07FF        110xxxxx 10xxxxxx                     ñ
0000 0800-0000 FFFF        1110xxxx 10xxxxxx 10xxxxxx            你 (most CJK)
0001 0000-0010 FFFF        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   other characters
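The bit layouts in the table can be turned into arithmetic: strip the marker bits from each byte and combine the remaining `x` bits. The sketch below uses only multiplication and subtraction, because Lua 5.1 has no bitwise operators. `utf8_codepoint` is a hypothetical name, and the function assumes well-formed UTF-8 (it does not validate continuation bytes).

```lua
-- Hypothetical helper: decode the UTF-8 character that starts at byte i.
-- Returns the code point and the number of bytes consumed.
local function utf8_codepoint(s, i)
  i = i or 1
  local b1 = string.byte(s, i)
  if b1 <= 127 then                      -- 0xxxxxxx
    return b1, 1
  elseif b1 >= 194 and b1 <= 223 then    -- 110xxxxx 10xxxxxx
    local b2 = string.byte(s, i + 1)
    return (b1 - 192) * 64 + (b2 - 128), 2
  elseif b1 >= 224 and b1 <= 239 then    -- 1110xxxx 10xxxxxx 10xxxxxx
    local b2, b3 = string.byte(s, i + 1, i + 2)
    return (b1 - 224) * 4096 + (b2 - 128) * 64 + (b3 - 128), 3
  elseif b1 >= 240 and b1 <= 244 then    -- 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    local b2, b3, b4 = string.byte(s, i + 1, i + 3)
    return (b1 - 240) * 262144 + (b2 - 128) * 4096
         + (b3 - 128) * 64 + (b4 - 128), 4
  end
  return nil  -- not a valid lead byte
end

print((utf8_codepoint("你")))  -- 20320
print((utf8_codepoint("ñ")))   -- 241
print((utf8_codepoint("n")))   -- 110
```

For characters in the first three rows of the table, the decoded value already matches what charCodeAt returns in JavaScript.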

However, 4-byte UTF-8 characters (such as emoji) need special attention, because they are not that simple.

Special method

JavaScript engines use UTF-16. For characters in the Basic Multilingual Plane, the UTF-16 code unit is the same as the Unicode code point. A character in a Supplementary Plane, however, is split into two code units (a surrogate pair) using the formula below. The Supplementary Plane characters we usually encounter are emoji such as 😝 (a 4-byte UTF-8 character).

-- formula 1: c is the code point (c >= 0x10000); H and L are the high and low surrogates
H = math.floor((c - 0x10000) / 0x400) + 0xD800
L = (c - 0x10000) % 0x400 + 0xDC00
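Putting the pieces together, here is one possible sketch of a JavaScript-style charCodeAt in Lua 5.1. It is not the code from the repository linked below. The names `to_surrogates`, `decode` and `char_code_at` are hypothetical, the input is assumed to be well-formed UTF-8, and the index is a 0-based UTF-16 code unit index as in JavaScript.

```lua
-- Formula 1: split a code point >= 0x10000 into a UTF-16 surrogate pair.
local function to_surrogates(c)
  local offset = c - 0x10000
  return math.floor(offset / 0x400) + 0xD800, offset % 0x400 + 0xDC00
end

-- Decode the UTF-8 character at byte i: returns code point and byte length.
local function decode(s, i)
  local b1, b2, b3, b4 = string.byte(s, i, i + 3)
  if b1 <= 127 then
    return b1, 1
  elseif b1 >= 194 and b1 <= 223 then
    return (b1 - 192) * 64 + (b2 - 128), 2
  elseif b1 >= 224 and b1 <= 239 then
    return (b1 - 224) * 4096 + (b2 - 128) * 64 + (b3 - 128), 3
  elseif b1 >= 240 and b1 <= 244 then
    return (b1 - 240) * 262144 + (b2 - 128) * 4096
         + (b3 - 128) * 64 + (b4 - 128), 4
  end
  error("invalid UTF-8 lead byte at position " .. i)
end

-- Return the UTF-16 code unit at the given 0-based index, or nil when the
-- index is out of range (JavaScript returns NaN in that case).
local function char_code_at(s, index)
  local unit, i = 0, 1  -- unit: UTF-16 index of the character at byte i
  while i <= #s do
    local c, n = decode(s, i)
    if c < 0x10000 then
      if unit == index then return c end
      unit = unit + 1
    else
      local high, low = to_surrogates(c)
      if unit == index then return high end
      if unit + 1 == index then return low end
      unit = unit + 2
    end
    i = i + n
  end
  return nil
end

print(char_code_at("你", 0))                          -- 20320
print(char_code_at("😝", 0), char_code_at("😝", 1))  -- 55357 and 56861
```

For 😝 (U+1F61D) the formula gives H = 0xD83D = 55357 and L = 0xDE1D = 56861, the same values that '😝'.charCodeAt(0) and '😝'.charCodeAt(1) return in JavaScript. The scan is linear in the string length on every call, which keeps the sketch short at the cost of speed.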

The code is here:

https://github.com/lilien1010/lua-bit

Feedback & Bug Report


Thank you for reading. If you have a better idea, please share it.
