TTF Font File Format

References
https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html
https://github.com/StbSharp/StbTrueTypeSharp/blob/master/generation/StbTrueTypeSharp.Generator/stb_truetype.h

File header

Type    Field          Description
uint32  scaler type    A tag to indicate the OFA scaler to be used to rasterize this font; see the note on the scaler type in the Apple reference for more information.
uint16  numTables      Number of tables
uint16  searchRange    (maximum power of 2 <= numTables) * 16
uint16  entrySelector  log2(maximum power of 2 <= numTables)
uint16  rangeShift     numTables * 16 - searchRange

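Usually only bytes 5 and 6 need to be parsed; they give the number of tables. All multi-byte values in the file are big-endian. The 12-byte header is followed immediately by the table directory, a contiguous array with one 16-byte entry per table.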
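As a minimal sketch of reading this header, the following Python snippet decodes the five fields with the standard struct module and recomputes the three search fields from numTables using the formulas in the table above. The function names are hypothetical, and the code assumes numTables is at least 1.

```python
import struct


def read_offset_subtable(data: bytes) -> dict:
    """Decode the 12-byte header at the start of a TTF file (big-endian)."""
    scaler_type, num_tables, search_range, entry_selector, range_shift = (
        struct.unpack_from(">IHHHH", data, 0)
    )
    return {
        "scalerType": scaler_type,
        "numTables": num_tables,
        "searchRange": search_range,
        "entrySelector": entry_selector,
        "rangeShift": range_shift,
    }


def expected_search_fields(num_tables: int) -> tuple:
    """Recompute searchRange, entrySelector and rangeShift from numTables.

    Assumes num_tables >= 1 (hypothetical helper, not part of any library).
    """
    power = 1 << (num_tables.bit_length() - 1)  # largest power of 2 <= num_tables
    search_range = power * 16
    entry_selector = power.bit_length() - 1     # log2(power)
    range_shift = num_tables * 16 - search_range
    return search_range, entry_selector, range_shift
```

In practice only `numTables` is needed to walk the table directory; the other three fields exist to help a binary search over the directory.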

Table directory

Type    Field     Description
uint32  tag       4-byte identifier
uint32  checkSum  Checksum for this table
uint32  offset    Offset from beginning of sfnt
uint32  length    Length of this table in bytes (actual length, not padded length)

Usually only bytes 1 to 4 of each entry (the table name) and bytes 9 to 12 (the offset of the table) need to be parsed.
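The directory walk can be sketched as below. This is an illustration under the layout described above (12-byte header, then 16-byte entries), not a complete font loader; the function name is hypothetical and checksums are read but not verified.

```python
import struct


def read_table_directory(data: bytes) -> dict:
    """Map each table tag to (offset, length), both measured from the file start."""
    num_tables = struct.unpack_from(">H", data, 4)[0]  # bytes 5 and 6 of the header
    tables = {}
    for i in range(num_tables):
        entry = 12 + 16 * i  # entries start right after the 12-byte header
        tag, _checksum, offset, length = struct.unpack_from(">4sIII", data, entry)
        tables[tag.decode("latin-1")] = (offset, length)
    return tables
```

With the result in hand, `tables["cmap"][0]` is the position in the file where the cmap table begins, and likewise for head, loca and glyf.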

cmap table

The cmap table maps each character code to the index of its glyph. It starts with the cmap index, which is 4 bytes long:

Type    Field            Description
UInt16  version          Version number (set to zero)
UInt16  numberSubtables  Number of encoding subtables

Only bytes 3 and 4 carry useful information here. They are followed by numberSubtables encoding subtable records of 8 bytes each:

Type    Field               Description
UInt16  platformID          Platform identifier
UInt16  platformSpecificID  Platform-specific encoding identifier
UInt32  offset              Offset of the mapping table, from the beginning of the cmap table

platformID takes the following values:

Platform ID  Platform   Platform-specific ID
0            Unicode    Indicates Unicode version.
1            Macintosh  Script Manager code.
2            (reserved; do not use)
3            Microsoft  Microsoft encoding.
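The cmap index and its subtable records can be read as in the sketch below. The selection rule in `pick_unicode_subtable` is an assumption modelled on what stb_truetype does (accept platform 0, or platform 3 with encoding 1 or 10); both function names are hypothetical.

```python
import struct


def read_cmap_subtables(cmap: bytes) -> list:
    """Return (platformID, platformSpecificID, offset) for every encoding subtable."""
    _version, num_subtables = struct.unpack_from(">HH", cmap, 0)
    records = []
    for i in range(num_subtables):
        # Records are 8 bytes each and start right after the 4-byte cmap index.
        records.append(struct.unpack_from(">HHI", cmap, 4 + 8 * i))
    return records


def pick_unicode_subtable(records: list):
    """Return the offset of the first Unicode-capable subtable, or None.

    Assumption: platform 0 (Unicode) or platform 3 (Microsoft) with
    encoding 1 (Unicode BMP) or 10 (Unicode full) is acceptable.
    """
    for platform_id, specific_id, offset in records:
        if platform_id == 0 or (platform_id == 3 and specific_id in (1, 10)):
            return offset
    return None
```

The returned offset is relative to the start of the cmap table, so the mapping subtable begins at cmap offset plus this value within the file.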

Next comes the mapping subtable itself. Several formats exist, but the subtables generated by fontforge are format 4:
https://github.com/fontforge/fontforge/blob/master/contrib/fonttools/pcl2ttf.c#L494

Type    Field                      Description
UInt16  format                     Format number is set to 4
UInt16  length                     Length of subtable in bytes
UInt16  language                   Language code
UInt16  segCountX2                 2 * segCount
UInt16  searchRange                2 * (2**FLOOR(log2(segCount)))
UInt16  entrySelector              log2(searchRange/2)
UInt16  rangeShift                 (2 * segCount) - searchRange
UInt16  endCode[segCount]          Ending character code for each segment, last = 0xFFFF.
UInt16  reservedPad                This value should be zero
UInt16  startCode[segCount]        Starting character code for each segment
UInt16  idDelta[segCount]          Delta for all character codes in segment
UInt16  idRangeOffset[segCount]    Offset in bytes to glyph indexArray, or 0
UInt16  glyphIndexArray[variable]  Glyph index array

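The first useful field is segCountX2. Note that it stores twice the segment count, so the table contains segCountX2 / 2 segments. Segments exist because format 4 covers the 16-bit code range (65536 possible codes) and a typical font does not provide a glyph for every one of them, so only the ranges that are actually present are recorded. Four arrays are then needed: endCode[segCount], startCode[segCount], idDelta[segCount] and idRangeOffset[segCount]. startCode and endCode define the range of character codes covered by each segment. To map a character code c, find the first segment i with endCode[i] >= c; if startCode[i] > c, the character is not mapped and the glyph index is 0. Otherwise, if idRangeOffset[i] is not 0, the glyph index is read from
glyphIndex = *( &idRangeOffset[i] + idRangeOffset[i] / 2 + (c - startCode[i]) )

This is pointer arithmetic in units of UInt16, starting from the address of idRangeOffset[i] itself. If the value read is not 0, idDelta[i] is added to it modulo 65536; a value of 0 means the glyph is missing. If idRangeOffset[i] == 0, then
glyphIndex = (idDelta[i] + c) % 65536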
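The two formulas can be combined into a lookup function along the following lines. This is a sketch that scans the segments linearly for clarity (the searchRange, entrySelector and rangeShift fields are intended to support a binary search instead); the function name is hypothetical and `sub` is assumed to start at the first byte of the format 4 subtable.

```python
import struct


def lookup_format4(sub: bytes, c: int) -> int:
    """Return the glyph index for character code c, or 0 if it is not mapped."""
    def u16(pos: int) -> int:
        return struct.unpack_from(">H", sub, pos)[0]

    if u16(0) != 4:
        raise ValueError("not a format 4 subtable")
    seg_count_x2 = u16(6)
    seg_count = seg_count_x2 // 2

    # Byte positions of the four parallel arrays inside the subtable.
    end_base = 14                               # after the seven UInt16 header fields
    start_base = end_base + seg_count_x2 + 2    # +2 skips reservedPad
    delta_base = start_base + seg_count_x2
    range_base = delta_base + seg_count_x2

    for i in range(seg_count):
        if u16(end_base + 2 * i) < c:
            continue                            # c lies beyond this segment
        start_code = u16(start_base + 2 * i)
        if start_code > c:
            return 0                            # c falls in a gap between segments
        id_delta = u16(delta_base + 2 * i)
        range_pos = range_base + 2 * i          # byte address of idRangeOffset[i]
        id_range_offset = u16(range_pos)
        if id_range_offset == 0:
            return (id_delta + c) & 0xFFFF
        # Same as *(&idRangeOffset[i] + idRangeOffset[i]/2 + (c - startCode[i])),
        # expressed in bytes instead of UInt16 units.
        glyph = u16(range_pos + id_range_offset + 2 * (c - start_code))
        return (glyph + id_delta) & 0xFFFF if glyph != 0 else 0
    return 0
```

Reading idDelta as unsigned is safe here because the result is reduced modulo 65536 either way.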

head table

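In the head table, xMin, yMin, xMax and yMax describe the bounding box that encloses all glyphs in the font. In addition, the 2-byte indexToLocFormat field sits at byte offset 50 (counting from 0, that is, bytes 51 and 52 of the table): 0 means 2-byte offsets and 1 means 4-byte offsets. indexToLocFormat therefore determines how the loca table is parsed. Fonts with a small glyf table generally use 0, since the 2-byte format can address at most 131070 bytes (65535 * 2) of glyph data.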
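A minimal sketch of pulling these fields out of the head table follows. The byte offsets 36 and 50 come from the head table layout in the Apple reference; the function name and the dictionary keys are hypothetical.

```python
import struct


def read_head_fields(head: bytes) -> dict:
    """Read the font bounding box and indexToLocFormat from a head table."""
    # xMin, yMin, xMax, yMax are four consecutive signed 16-bit values at offset 36.
    bbox = struct.unpack_from(">hhhh", head, 36)
    # indexToLocFormat is a signed 16-bit value at offset 50 (0 = short, 1 = long).
    index_to_loc_format = struct.unpack_from(">h", head, 50)[0]
    return {"bbox": bbox, "indexToLocFormat": index_to_loc_format}
```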

loca table

The loca table records the offset of each glyph within the glyf table. Its format is very simple: it is just an array of offsets.

Type    Field       Description
uint16  offsets[n]  The actual local offset divided by 2 is stored. The value of n is the number of glyphs in the font + 1. The number of glyphs in the font is found in the maximum profile table.

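Note that the 2-byte version stores each offset divided by 2, which means the glyf data of every glyph occupies a multiple of 2 bytes. The loca table also has a version with 4-byte offsets: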

Type    Field       Description
uint32  offsets[n]  The actual local offset is stored. The value of n is the number of glyphs in the font + 1. The number of glyphs in the font is found in the maximum profile table.

The 4-byte version stores the raw offsets.
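Both variants can be handled by one small function, sketched below. It assumes the glyph count has already been read from the maximum profile table and that indexToLocFormat comes from head; the function names are hypothetical. Because the array has one extra entry, the length of glyph i is offsets[i + 1] - offsets[i], and a length of 0 means the glyph has no outline data.

```python
import struct


def read_loca(loca: bytes, num_glyphs: int, index_to_loc_format: int) -> list:
    """Return num_glyphs + 1 byte offsets into the glyf table."""
    n = num_glyphs + 1
    if index_to_loc_format == 0:
        # Short format: stored values are the real offsets divided by 2.
        return [2 * value for value in struct.unpack_from(">%dH" % n, loca, 0)]
    # Long format: stored values are the real offsets.
    return list(struct.unpack_from(">%dI" % n, loca, 0))


def glyph_range(offsets: list, glyph_index: int) -> tuple:
    """Return (start, length) of a glyph's data inside the glyf table."""
    start = offsets[glyph_index]
    return start, offsets[glyph_index + 1] - start
```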

glyf table

The glyf table holds the actual outline (vector) data of each glyph. Each glyph's entry in glyf begins with a header:

Type   Field             Description
int16  numberOfContours  If the number of contours is positive or zero, it is a single glyph; if the number of contours is less than zero, the glyph is compound
FWord  xMin              Minimum x for coordinate data
FWord  yMin              Minimum y for coordinate data
FWord  xMax              Maximum x for coordinate data
FWord  yMax              Maximum y for coordinate data

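Bytes 1 and 2 of the header give the number of contours in the glyph. A glyph with numberOfContours >= 0 is called a single glyph; otherwise it is a compound glyph. For a single glyph, the contour data that follows the header is: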
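Reading the 10-byte glyph header and classifying the glyph might look like the following sketch. FWord is treated as a signed 16-bit integer, the function name is hypothetical, and `glyph_offset` is assumed to be the glyph's byte position inside the glyf table as obtained from loca.

```python
import struct


def read_glyph_header(glyf: bytes, glyph_offset: int) -> dict:
    """Decode numberOfContours and the bounding box of one glyph."""
    number_of_contours, x_min, y_min, x_max, y_max = struct.unpack_from(
        ">hhhhh", glyf, glyph_offset
    )
    return {
        "numberOfContours": number_of_contours,
        "bbox": (x_min, y_min, x_max, y_max),
        # Zero or more contours: single glyph. Negative: compound glyph.
        "kind": "single" if number_of_contours >= 0 else "compound",
    }
```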

Type            Field                            Description
uint16          endPtsOfContours[n]              Array of last points of each contour; n is the number of contours; array entries are point indices
uint16          instructionLength                Total number of bytes needed for instructions
uint8           instructions[instructionLength]  Array of instructions for this glyph
uint8           flags[variable]                  Array of flags
uint8 or int16  xCoordinates[]                   Array of x-coordinates; the first is relative to (0,0), others are relative to previous point
uint8 or int16  yCoordinates[]                   Array of y-coordinates; the first is relative to (0,0), others are relative to previous point

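To parse the contours, first work out how many points there are. endPtsOfContours gives the index of the last point of each contour, so the total number of points is the last entry of endPtsOfContours plus 1. The flags of the individual points then determine the final lengths of the three arrays flags[], xCoordinates[] and yCoordinates[].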

Flag                                      Bit (0 is lsb)  Description
On Curve                                  0               If set, the point is on the curve; otherwise, it is off the curve.
x-Short Vector                            1               If set, the corresponding x-coordinate is 1 byte long; otherwise, the corresponding x-coordinate is 2 bytes long.
y-Short Vector                            2               If set, the corresponding y-coordinate is 1 byte long; otherwise, the corresponding y-coordinate is 2 bytes long.
Repeat                                    3               If set, the next byte specifies the number of additional times this set of flags is to be repeated. In this way, the number of flags listed can be smaller than the number of points in a character.
This x is same (Positive x-Short vector)  4               This flag has one of two meanings, depending on how the x-Short Vector flag is set. If the x-Short Vector bit is set, this bit describes the sign of the value, with a value of 1 equalling positive and a zero value negative. If the x-Short Vector bit is not set, and this bit is set, then the current x-coordinate is the same as the previous x-coordinate. If the x-Short Vector bit is not set, and this bit is not set, the current x-coordinate is a signed 16-bit delta vector. In this case, the delta vector is the change in x.
This y is same (Positive y-Short vector)  5               This flag has one of two meanings, depending on how the y-Short Vector flag is set. If the y-Short Vector bit is set, this bit describes the sign of the value, with a value of 1 equalling positive and a zero value negative. If the y-Short Vector bit is not set, and this bit is set, then the current y-coordinate is the same as the previous y-coordinate. If the y-Short Vector bit is not set, and this bit is not set, the current y-coordinate is a signed 16-bit delta vector. In this case, the delta vector is the change in y.
Reserved                                  6 - 7           Set to zero

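Start by scanning the flags array. Pay special attention when bit 3 of a flag is 1: it means the following points share the same flag, so one more byte must be read from the flags array to get the repeat count. Once the flag of every point has been recorded, scan the xCoordinates array. If bit 1 of a point's flag is 1, its x delta is 1 byte long; otherwise it is 2 bytes long. In the 1-byte case, bit 4 acts as the sign bit, but with inverted polarity: 1 means positive and 0 means negative. In the 2-byte case, bit 4 tells whether the current point has the same x as the previous point; if bit 4 is 1 and bit 1 is 0, the xCoordinates array contains no data for that point. Scanning the yCoordinates array works the same way, using bits 2 and 5.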
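The whole procedure for a single glyph can be sketched as follows. This is an illustration of the steps above rather than a robust parser: it does no bounds or consistency checking, it skips the instructions, and the function name is hypothetical. `glyph` is assumed to start at the first byte of the glyph header.

```python
import struct


def parse_simple_glyph(glyph: bytes) -> list:
    """Return a list of (x, y, on_curve) points with absolute coordinates."""
    num_contours = struct.unpack_from(">h", glyph, 0)[0]
    if num_contours < 0:
        raise ValueError("compound glyphs are not handled in this sketch")
    if num_contours == 0:
        return []

    pos = 10  # skip numberOfContours and the bounding box
    end_pts = struct.unpack_from(">%dH" % num_contours, glyph, pos)
    pos += 2 * num_contours
    num_points = end_pts[-1] + 1  # last point index of the last contour, plus 1

    instruction_length = struct.unpack_from(">H", glyph, pos)[0]
    pos += 2 + instruction_length  # instructions are skipped

    # Pass 1: expand the flags so that there is exactly one per point.
    flags = []
    while len(flags) < num_points:
        flag = glyph[pos]
        pos += 1
        flags.append(flag)
        if flag & 0x08:  # bit 3: the next byte is a repeat count
            flags.extend([flag] * glyph[pos])
            pos += 1

    def read_axis(pos: int, short_bit: int, same_bit: int):
        """Decode one coordinate array; returns (absolute values, new position)."""
        values, value = [], 0
        for flag in flags:
            if flag & short_bit:           # 1-byte magnitude, sign from same_bit
                delta = glyph[pos]
                pos += 1
                if not flag & same_bit:    # 0 means negative
                    delta = -delta
            elif flag & same_bit:          # same as previous point, no data stored
                delta = 0
            else:                          # signed 16-bit delta
                delta = struct.unpack_from(">h", glyph, pos)[0]
                pos += 2
            value += delta
            values.append(value)
        return values, pos

    # Pass 2 and 3: x uses bits 1 and 4, y uses bits 2 and 5.
    xs, pos = read_axis(pos, 0x02, 0x10)
    ys, pos = read_axis(pos, 0x04, 0x20)
    return [(x, y, bool(flag & 0x01)) for x, y, flag in zip(xs, ys, flags)]
```

The coordinates are accumulated while reading because each stored value is a delta from the previous point, with the first point relative to (0,0).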
