0. 什么是BOM(byte order mark
, 字节序标记)?
bom可认为是unicode编码格式的一个标识。bom的字符为\uFEFF
,不同编码格式下会encoding为不同的字节序,如下图:
bom encoding table
1. BOM作用
- 确定字节序,大端序 or 小端序(用于16-bit,32bit编码)
- 确定文本流为Unicode编码格式
- 确定当前使用的哪种Unicode编码格式
2. 细说UTF-8下的字节序
-
String.valueOf('\ufeff').getBytes("utf-8"),得到bom在utf-8下的字节序:
0xef,0xbb,0xbf
- 若某字符串起始字符为
\ufeff
,则通过String#getBytes("utf-8")产生含bom的utf-8字节数组
3. Java写入Bom示例:
1)使用PrintStream#write(int i),该方法写入的是字节,即最低位字节
- 源码:
/**
* Writes the specified byte to this stream. If the byte is a newline and
* automatic flushing is enabled then the <code>flush</code> method will be
* invoked.
*
* <p> Note that the byte is written as given; to write a character that
* will be translated according to the platform's default character
* encoding, use the <code>print(char)</code> or <code>println(char)</code>
* methods.
*
* @param b The byte to be written
* @see #print(char)
* @see #println(char)
*/
public void write(int b) {
try {
synchronized (this) {
ensureOpen();
out.write(b);
if ((b == '\n') && autoFlush)
out.flush();
}
}
catch (InterruptedIOException x) {
Thread.currentThread().interrupt();
}
catch (IOException x) {
trouble = true;
}
}
- Demo:
PrintStream out = System.out;
out.write('\ufeef'); // emits 0xef
out.write('\ufebb'); // emits 0xbb
out.write('\ufebf'); // emits 0xbf
PrintStream out = System.out;
out.write(0xef); // emits 0xef
out.write(0xbb); // emits 0xbb
out.write(0xbf); // emits 0xbf
2)PrintStream#print(char c),该方法写入的char
。
- 源码
/**
* Prints a character. The character is translated into one or more bytes
* according to the platform's default character encoding, and these bytes
* are written in exactly the manner of the
* <code>{@link #write(int)}</code> method.
*
* @param c The <code>char</code> to be printed
*/
public void print(char c) {
write(String.valueOf(c));
}
- Demo
PrintStream out = System.out;
out.print('\ufeff');
3)StringWriter.write(int c), 写入的是char
,同PrintStream#print。
- 源码
/**
* Write a single character.
*/
public void write(int c) {
buf.append((char) c);
}