A Close Reading of the java.lang.String Source Code

Class definition

public final class String
implements java.io.Serializable, Comparable<String>, CharSequence {
/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;

private static final ObjectStreamField[] serialPersistentFields =
    new ObjectStreamField[0];

public String() {
    this.value = "".value;
}

public String(String original) {
    this.value = original.value;
    this.hash = original.hash;
}

public String(char value[]) {
    this.value = Arrays.copyOf(value, value.length);
}

public String(char value[], int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= value.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > value.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }
    this.value = Arrays.copyOfRange(value, offset, offset+count);
}

public String(int[] codePoints, int offset, int count) {
    if (offset < 0) {
        throw new StringIndexOutOfBoundsException(offset);
    }
    if (count <= 0) {
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        if (offset <= codePoints.length) {
            this.value = "".value;
            return;
        }
    }
    // Note: offset or count might be near -1>>>1.
    if (offset > codePoints.length - count) {
        throw new StringIndexOutOfBoundsException(offset + count);
    }

    final int end = offset + count;

    // Pass 1: Compute precise size of char[]
    int n = count;
    for (int i = offset; i < end; i++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            continue;
        else if (Character.isValidCodePoint(c))
            n++;
        else throw new IllegalArgumentException(Integer.toString(c));
    }

    // Pass 2: Allocate and fill in char[]
    final char[] v = new char[n];

    for (int i = offset, j = 0; i < end; i++, j++) {
        int c = codePoints[i];
        if (Character.isBmpCodePoint(c))
            v[j] = (char)c;
        else
            Character.toSurrogates(c, v, j++);
    }

    this.value = v;
}

private static void checkBounds(byte[] bytes, int offset, int length) {
    if (length < 0)
        throw new StringIndexOutOfBoundsException(length);
    if (offset < 0)
        throw new StringIndexOutOfBoundsException(offset);
    if (offset > bytes.length - length)
        throw new StringIndexOutOfBoundsException(offset + length);
}

public String(byte bytes[], int offset, int length, String charsetName)
        throws UnsupportedEncodingException {
    if (charsetName == null)
        throw new NullPointerException("charsetName");
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(charsetName, bytes, offset, length);
}

public String(byte bytes[], int offset, int length, Charset charset) {
    if (charset == null)
        throw new NullPointerException("charset");
    checkBounds(bytes, offset, length);
    this.value =  StringCoding.decode(charset, bytes, offset, length);
}

public String(byte bytes[], String charsetName)
        throws UnsupportedEncodingException {
    this(bytes, 0, bytes.length, charsetName);
}

public String(byte bytes[], Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

public String(byte bytes[], int offset, int length) {
    checkBounds(bytes, offset, length);
    this.value = StringCoding.decode(bytes, offset, length);
}

public String(byte bytes[]) {
    this(bytes, 0, bytes.length);
}

public String(StringBuffer buffer) {
    synchronized(buffer) {
        this.value = Arrays.copyOf(buffer.getValue(), buffer.length());
    }
}

public String(StringBuilder builder) {
    this.value = Arrays.copyOf(builder.getValue(), builder.length());
}

String(char[] value, boolean share) {
    // assert share : "unshared not supported";
    this.value = value;
}

public int length() {
    return value.length;
}

public boolean isEmpty() {
    return value.length == 0;
}

}
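String is declared final, so no class can extend it. Once a String object is created, the character sequence it holds cannot change, and none of the methods of the class modify the object for as long as it lives. This deserves special attention: methods that appear to change a string actually create and return a new one. The class implements Serializable, a marker interface for serialization. It also implements Comparable, which is used to order two strings (characters are compared one by one by their Unicode values, not just ASCII; the method itself is covered later). Finally it implements CharSequence, which marks it as an ordered sequence of characters.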

In Java, String is a reference type and is itself a class. The Java compiler gives String special treatment, though: a string can be written directly as "...":

String s1 = "Hello!";
Internally, in the JDK 8 source shown above, a string is stored in a char[] array, so the following is also valid:

String s2 = new String(new char[] {'H', 'e', 'l', 'l', 'o', '!'});
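Because String is used so often, Java provides the "..." string literal syntax.

An important property of Java strings is that they are immutable. Immutability comes from the private final char[] field together with the absence of any method that modifies that array.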
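
Here is a minimal sketch of what immutability looks like from the caller's side. The class and method names (ImmutabilityDemo, originalAfterCalls, upperCopy) are made up for illustration; only the String methods come from the JDK.

```java
class ImmutabilityDemo {
    // Calls several "modifying" methods and discards their results.
    // The original string is returned unchanged.
    static String originalAfterCalls() {
        String s = "Hello";
        s.toUpperCase();      // returns a new String, s is untouched
        s.concat(" World");   // returns a new String, s is untouched
        s.replace('l', 'L');  // returns a new String, s is untouched
        return s;
    }

    // The "changed" text only exists in the returned object.
    static String upperCopy() {
        String s = "Hello";
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(originalAfterCalls()); // Hello
        System.out.println(upperCopy());          // HELLO
    }
}
```

To keep a changed value, the result has to be assigned: s = s.toUpperCase();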

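The String class is final and its value field is private and final, so we treat strings as immutable objects. But are they really immutable?

Every string is made up of individual characters, and the source shows that they are held in the char[] value array.

Marking value as final only guarantees that the reference cannot be reassigned. The real data is the array on the heap that value points to, and the elements of an array can always be overwritten, so anyone who can reach the array can still change the data. Even though the field is private, reflection can get at it: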

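String str = "vae";
// Print the original string
System.out.println(str); // vae
// Get the value field of the String class (a char[] in the JDK 8 source shown above)
Field fieldStr = String.class.getDeclaredField("value");
// value is declared private, so override the access check
fieldStr.setAccessible(true);
// Read the value field of the str object
char[] value = (char[]) fieldStr.get(str);
// Change the first character to V (lower case to upper case)
value[0] = 'V';
// Print the string after the modification
System.out.println(str); // Vae

The two printouts show that the String really was changed. In practice, though, code almost never uses reflection to manipulate a String, so we still regard String as immutable.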

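So why was String designed to be immutable? Consider security and performance:

Security

Preventing security problems. For example, a database user name and password are passed as strings to obtain a connection, and in socket programming the host name and port are passed as strings. Because a string's value cannot change, an attacker cannot alter the contents behind a reference after it has been checked and open a security hole that way.

Thread safety. When several threads read and write a shared resource concurrently, race conditions can occur. A String cannot be modified, so it can be shared between threads without causing such problems.

Hash code. The hash code is computed from value on the first call to hashCode() and then cached in the hash field. If a String could change, its hash code would change with it, but containers such as Map and Set need their keys to stay unique and consistent. Immutability therefore makes String better suited than most objects for use as a key.

Performance

The string constant pool only makes sense when strings are immutable. The pool avoids creating duplicate strings for the same literal: different references point to the same pooled string, which saves a lot of heap memory at run time. If strings were mutable, the pool would be meaningless and String.intern(), which relies on it, would stop working; every new String would need its own space on the heap and use more memory.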
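
The pool behaviour described above can be observed with reference comparisons. This is a small sketch; the class and method names are hypothetical, while ==, new String(...) and intern() behave as the language and API specify.

```java
class StringPoolDemo {
    // Two identical literals refer to the same pooled instance.
    static boolean literalsShareInstance() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    // new String(...) always creates a distinct object with equal content.
    static boolean newStringIsDistinctButEqual() {
        String a = "abc";
        String b = new String("abc");
        return a != b && a.equals(b);
    }

    // intern() returns the pooled instance for that content.
    static boolean internReturnsPooledInstance() {
        String a = "abc";
        String b = new String("abc").intern();
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());        // true
        System.out.println(newStringIsDistinctButEqual());  // true
        System.out.println(internReturnsPooledInstance());  // true
    }
}
```

This is also why string content should be compared with equals rather than ==.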

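Java has no built-in primitive string type. Instead, the standard Java library provides a predefined class, naturally called String, and every string enclosed in double quotes is an instance of the String class.

Like most programming languages, Java allows two strings to be concatenated with the + sign. When a string is concatenated with a non-string value, that value is converted to a string (any Java object can be converted to a string).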
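
A short sketch of how + treats non-string operands. The class and method names are invented for the example. Note that + is evaluated left to right, so numeric operands that come before the first string are added as numbers.

```java
class PlusOperatorDemo {
    // Mixed operands: the int and the Object are converted to strings.
    static String describe(String name, int age, Object extra) {
        return name + " is " + age + ", extra: " + extra;
    }

    // 1 + 2 is evaluated first as int addition, then "3" is appended.
    static String numbersFirst() {
        return 1 + 2 + "3";
    }

    // Once the left operand is a String, everything after it is appended.
    static String stringFirst() {
        return "1" + 2 + 3;
    }

    public static void main(String[] args) {
        System.out.println(describe("Tom", 18, null)); // Tom is 18, extra: null
        System.out.println(numbersFirst());            // 33
        System.out.println(stringFirst());             // 123
    }
}
```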

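To put several strings together separated by a delimiter, use the static join method.

String all = String.join("/", "S", "M", "L", "XL");
// S/M/L/XL

String: String is a reference type whose instances live on the heap and are immutable (which is what makes the string pool possible). Every concatenation therefore creates a new object and allocates new memory, so building a large string through many concatenations performs poorly; it is only suitable for a small number of concatenations.

StringBuilder: a StringBuilder keeps a growable internal character buffer. It does not allocate a new object on every append but reserves a block of memory up front and enlarges it when needed, which makes it much faster for a large number of concatenations.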
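
The two approaches side by side, as a minimal sketch. The class and method names are hypothetical. Both methods produce the same text; the difference is that the first creates a new String on every iteration while the second appends into one buffer.

```java
class BuilderDemo {
    // Repeated += creates a new String object on each iteration.
    static String withPlus(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // StringBuilder appends into a single growable buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus(5));    // 01234
        System.out.println(withBuilder(5)); // 01234
    }
}
```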

A String is in fact backed by a char array.

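Constructors

String str1 = "abc"; // note how this literal declaration differs from the others; covered in detail at the end of the article
String str2 = new String("abc");
String str3 = new String(new char[]{'a','b','c'});

The equals(Object anObject) method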

 public boolean equals(Object anObject) {
     if (this == anObject) {
         return true;
     }
     if (anObject instanceof String) {
         String anotherString = (String)anObject;
         int n = value.length;
         if (n == anotherString.value.length) {
             char v1[] = value;
             char v2[] = anotherString.value;
             int i = 0;
             while (n-- != 0) {
                 if (v1[i] != v2[i])
                     return false;
                 i++;
             }
             return true;
         }
     }
     return false;
 }

public boolean equalsIgnoreCase(String anotherString) {
    return (this == anotherString) ? true
            : (anotherString != null)
            && (anotherString.value.length == value.length)
            && regionMatches(true, 0, anotherString, 0, value.length);
}

public boolean regionMatches(int toffset, String other, int ooffset,
        int len) {
    char ta[] = value;
    int to = toffset;
    char pa[] = other.value;
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0)
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

public boolean regionMatches(boolean ignoreCase, int toffset,
        String other, int ooffset, int len) {
    char ta[] = value;
    int to = toffset;
    char pa[] = other.value;
    int po = ooffset;
    // Note: toffset, ooffset, or len might be near -1>>>1.
    if ((ooffset < 0) || (toffset < 0)
            || (toffset > (long)value.length - len)
            || (ooffset > (long)other.value.length - len)) {
        return false;
    }
    while (len-- > 0) {
        char c1 = ta[to++];
        char c2 = pa[po++];
        if (c1 == c2) {
            continue;
        }
        if (ignoreCase) {
            // If characters don't match but case may be ignored,
            // try converting both characters to uppercase.
            // If the results match, then the comparison scan should
            // continue.
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue;
            }
            // Unfortunately, conversion to uppercase does not work properly
            // for the Georgian alphabet, which has strange rules about case
            // conversion.  So we need to make one last check before
            // exiting.
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue;
            }
        }
        return false;
    }
    return true;
}

public boolean contentEquals(StringBuffer sb) {
    return contentEquals((CharSequence)sb);
}

private boolean nonSyncContentEquals(AbstractStringBuilder sb) {
    char v1[] = value;
    char v2[] = sb.getValue();
    int n = v1.length;
    if (n != sb.length()) {
        return false;
    }
    for (int i = 0; i < n; i++) {
        if (v1[i] != v2[i]) {
            return false;
        }
    }
    return true;
}

public boolean contentEquals(CharSequence cs) {
    // Argument is a StringBuffer, StringBuilder
    if (cs instanceof AbstractStringBuilder) {
        if (cs instanceof StringBuffer) {
            synchronized(cs) {
               return nonSyncContentEquals((AbstractStringBuilder)cs);
            }
        } else {
            return nonSyncContentEquals((AbstractStringBuilder)cs);
        }
    }
    // Argument is a String
    if (cs instanceof String) {
        return equals(cs);
    }
    // Argument is a generic CharSequence
    char v1[] = value;
    int n = v1.length;
    if (n != cs.length()) {
        return false;
    }
    for (int i = 0; i < n; i++) {
        if (v1[i] != cs.charAt(i)) {
            return false;
        }
    }
    return true;
}

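String overrides equals to compare the characters that make up the two strings one by one. It returns true if they are all the same and false otherwise.

The biggest difference between the two is that equals can return true only when the other object is a String, whereas contentEquals only requires the other object to be a CharSequence, that is, any implementation of that interface.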

public class StringTest {
    public static void main(String[] args) {
        String s1 = "123";
        String s2 = new String("123");
        StringBuilder sb = new StringBuilder("123");
        System.out.println(s1.equals(s2)); // true
        System.out.println(s1.contentEquals(s2)); // true
        System.out.println(s1.equals(sb)); // false
        System.out.println(s1.contentEquals(sb)); // true
    }
}

The hashCode() method

 public int hashCode() {
     int h = hash;
     if (h == 0 && value.length > 0) {
         char val[] = value;

         for (int i = 0; i < value.length; i++) {
             h = 31 * h + val[i];
         }
         hash = h;
     }
     return h;
 }

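The hashCode algorithm of String is simple. Its core is the for loop in the middle, which computes:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Here s is the val array in the source, i.e. the character array that makes up the string, n is its length, and ^ denotes exponentiation. Note the number 31: why was it chosen as the multiplier, and why is it written as a literal rather than a named constant? There are two main reasons:

① 31 is a moderately sized prime and one of the preferred primes for a hashCode multiplier.

② Multiplication by 31 can be optimized by the JVM: 31 * i = (i << 5) - i. A shift and a subtraction can be cheaper than a multiplication.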
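
The formula and the shift identity can both be checked with a few lines. This is only a sketch: manualHash and times31ByShift are hypothetical helpers that re-implement the loop with public APIs.

```java
class HashDemo {
    // Same recurrence as the JDK loop: h = 31 * h + s[i].
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * i rewritten as a shift and a subtraction.
    static int times31ByShift(int i) {
        return (i << 5) - i;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: 97*31^2 + 98*31 + 99 = 96354
        System.out.println(manualHash("abc"));  // 96354
        System.out.println("abc".hashCode());   // 96354
        System.out.println(times31ByShift(7));  // 217
    }
}
```

The identity also holds when the multiplication overflows, because both sides wrap around in the same way in 32-bit arithmetic.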

The charAt(int index) method

public char charAt(int index) {
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }
    return value[index];
}

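As we know, a string is backed by a character array. This method takes an index (the array subscript) and returns the single character at that position.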

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }
    return len1 - len2;
}

public static final Comparator<String> CASE_INSENSITIVE_ORDER
                                     = new CaseInsensitiveComparator();
private static class CaseInsensitiveComparator
        implements Comparator<String>, java.io.Serializable {
    // use serialVersionUID from JDK 1.2.2 for interoperability
    private static final long serialVersionUID = 8575799808933029326L;

    public int compare(String s1, String s2) {
        int n1 = s1.length();
        int n2 = s2.length();
        int min = Math.min(n1, n2);
        for (int i = 0; i < min; i++) {
            char c1 = s1.charAt(i);
            char c2 = s2.charAt(i);
            if (c1 != c2) {
                c1 = Character.toUpperCase(c1);
                c2 = Character.toUpperCase(c2);
                if (c1 != c2) {
                    c1 = Character.toLowerCase(c1);
                    c2 = Character.toLowerCase(c2);
                    if (c1 != c2) {
                        // No overflow because of numeric promotion
                        return c1 - c2;
                    }
                }
            }
        }
        return n1 - n2;
    }

    /** Replaces the de-serialized object. */
    private Object readResolve() { return CASE_INSENSITIVE_ORDER; }
}

public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}

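The source is easy to follow. compareTo compares two strings lexicographically, based on the Unicode value of each character. At the first position where the two strings differ, it returns the difference between the Unicode values of the two characters. If no position differs (one string is a prefix of the other, or the two are equal), it returns the difference between the two lengths.

compareToIgnoreCase() works like compareTo but ignores case. For ASCII letters, an upper-case letter has a Unicode value 32 less than its lower-case counterpart. Internally, when two characters differ, the method first converts both to upper case and compares them, and if they still differ it converts both to lower case and compares again.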
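
A few concrete return values, as a sketch. The class and method names are hypothetical. For compareToIgnoreCase only the sign is relied on here.

```java
class CompareDemo {
    // First differing position is 0: 'a' - 'b' = -1.
    static int firstDifference() {
        return "apple".compareTo("banana");
    }

    // No differing position within the shorter string: 3 - 4 = -1.
    static int prefix() {
        return "abc".compareTo("abcd");
    }

    // Case-sensitive: 'a' (97) - 'B' (66) = 31, so "a" sorts after "B".
    static int caseSensitive() {
        return "a".compareTo("B");
    }

    // Ignoring case, "a" sorts before "B".
    static int ignoringCase() {
        return "a".compareToIgnoreCase("B");
    }

    public static void main(String[] args) {
        System.out.println(firstDifference());    // -1
        System.out.println(prefix());             // -1
        System.out.println(caseSensitive());      // 31
        System.out.println(ignoringCase() < 0);   // true
    }
}
```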

public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = toffset;
    char pa[] = prefix.value;
    int po = 0;
    int pc = prefix.value.length;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > value.length - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

public boolean startsWith(String prefix) {
    return startsWith(prefix, 0);
}

public boolean endsWith(String suffix) {
    return startsWith(suffix, value.length - suffix.value.length);
}

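startsWith returns true if the string starts with prefix (with the comparison optionally beginning at toffset), and endsWith returns true if it ends with suffix.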
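
Some edge cases follow directly from the source above: a negative toffset fails the bounds check, and a suffix longer than the string makes endsWith pass a negative offset. A small sketch with a hypothetical class name:

```java
class PrefixSuffixDemo {
    static final String TEXT = "HelloWorld";

    static boolean startsAtOffset() {
        return TEXT.startsWith("World", 5);   // comparison begins at index 5
    }

    static boolean negativeOffset() {
        return TEXT.startsWith("Hello", -1);  // toffset < 0, so false
    }

    static boolean suffixTooLong() {
        return "abc".endsWith("xxabc");       // offset 3 - 5 = -2, so false
    }

    public static void main(String[] args) {
        System.out.println(TEXT.startsWith("Hello")); // true
        System.out.println(TEXT.endsWith("World"));   // true
        System.out.println(startsAtOffset());         // true
        System.out.println(negativeOffset());         // false
        System.out.println(suffixTooLong());          // false
        System.out.println(TEXT.startsWith(""));      // true
    }
}
```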

public int indexOf(int ch) {
    return indexOf(ch, 0);
}

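public int indexOf(int ch, int fromIndex) {
    final int max = value.length; // max is the length of the string
    if (fromIndex < 0) { // a negative start index is treated as 0
        fromIndex = 0;
    } else if (fromIndex >= max) {
        // The start index is at or past the end of the array
        // (the largest valid subscript is max - 1), so return -1 directly.
        return -1;
    }

    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // A char occupies two bytes. ch is below 2^16 (65536);
        // the vast majority of characters fall in this range.
        final char[] value = this.value;
        for (int i = fromIndex; i < max; i++) { // compare each character with ch in turn
            if (value[i] == ch) {
                return i; // found: return the index of the first occurrence and stop
            }
        }
        return -1; // no matching character
    } else {
        // Rare case: ch is 65536 or greater. indexOfSupplementary first checks
        // that ch is a valid code point and then compares surrogate pairs.
        return indexOfSupplementary(ch, fromIndex);
    }
}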

private int indexOfSupplementary(int ch, int fromIndex) {
    if (Character.isValidCodePoint(ch)) {
        final char[] value = this.value;
        final char hi = Character.highSurrogate(ch);
        final char lo = Character.lowSurrogate(ch);
        final int max = value.length - 1;
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == hi && value[i + 1] == lo) {
                return i;
            }
        }
    }
    return -1;
}

public int lastIndexOf(int ch) {
    return lastIndexOf(ch, value.length - 1);
}

public int lastIndexOf(int ch, int fromIndex) {
    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        final char[] value = this.value;
        int i = Math.min(fromIndex, value.length - 1);
        for (; i >= 0; i--) {
            if (value[i] == ch) {
                return i;
            }
        }
        return -1;
    } else {
        return lastIndexOfSupplementary(ch, fromIndex);
    }
}

private int lastIndexOfSupplementary(int ch, int fromIndex) {
    if (Character.isValidCodePoint(ch)) {
        final char[] value = this.value;
        char hi = Character.highSurrogate(ch);
        char lo = Character.lowSurrogate(ch);
        int i = Math.min(fromIndex, value.length - 2);
        for (; i >= 0; i--) {
            if (value[i] == hi && value[i + 1] == lo) {
                return i;
            }
        }
    }
    return -1;
}

public int indexOf(String str) {
    return indexOf(str, 0);
}

public int indexOf(String str, int fromIndex) {
    return indexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

static int indexOf(char[] source, int sourceOffset, int sourceCount,
        String target, int fromIndex) {
    return indexOf(source, sourceOffset, sourceCount,
                   target.value, 0, target.value.length,
                   fromIndex);
}

static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

public int lastIndexOf(String str) {
    return lastIndexOf(str, value.length);
}

public int lastIndexOf(String str, int fromIndex) {
    return lastIndexOf(value, 0, value.length,
            str.value, 0, str.value.length, fromIndex);
}

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
        String target, int fromIndex) {
    return lastIndexOf(source, sourceOffset, sourceCount,
                   target.value, 0, target.value.length,
                   fromIndex);
}

static int lastIndexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    /*
     * Check arguments; return immediately where possible. For
     * consistency, don't check for null str.
     */
    int rightIndex = sourceCount - targetCount;
    if (fromIndex < 0) {
        return -1;
    }
    if (fromIndex > rightIndex) {
        fromIndex = rightIndex;
    }
    /* Empty string always matches. */
    if (targetCount == 0) {
        return fromIndex;
    }

    int strLastIndex = targetOffset + targetCount - 1;
    char strLastChar = target[strLastIndex];
    int min = sourceOffset + targetCount - 1;
    int i = min + fromIndex;

startSearchForLastChar:
    while (true) {
        while (i >= min && source[i] != strLastChar) {
            i--;
        }
        if (i < min) {
            return -1;
        }
        int j = i - 1;
        int start = j - (targetCount - 1);
        int k = strLastIndex - 1;

        while (j > start) {
            if (source[j--] != target[k--]) {
                i--;
                continue startSearchForLastChar;
            }
        }
        return start - sourceOffset + 1;
    }
}

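The indexOf methods return the starting position of the first substring that matches the string str or the code point ch. The search starts at index 0 or at fromIndex. If there is no match in the original string, -1 is returned. The lastIndexOf variants return the position of the last match instead, searching backward.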
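
A sketch that exercises both branches of indexOf(int, int): an ordinary char and a supplementary code point stored as a surrogate pair. The class name and the sample text are made up; U+1F600 is used only as an example of a code point above 65535.

```java
class IndexOfDemo {
    // One supplementary code point occupies two chars (a surrogate pair).
    static final String SMILEY = new String(Character.toChars(0x1F600));
    static final String TEXT = "a" + SMILEY + "b";

    static int supplementaryIndex() {
        return TEXT.indexOf(0x1F600);  // handled by the supplementary branch
    }

    static int indexAfterPair() {
        return TEXT.indexOf('b');      // the pair takes indexes 1 and 2
    }

    public static void main(String[] args) {
        System.out.println("hello".indexOf('l'));      // 2
        System.out.println("hello".indexOf('l', 3));   // 3
        System.out.println("hello".indexOf('l', -5));  // 2, negative start is treated as 0
        System.out.println("hello".lastIndexOf('l'));  // 3
        System.out.println("hello".indexOf("lo"));     // 3
        System.out.println("hello".indexOf("xyz"));    // -1
        System.out.println(TEXT.length());             // 4
        System.out.println(supplementaryIndex());      // 1
        System.out.println(indexAfterPair());          // 3
    }
}
```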

public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > value.length) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    int subLen = endIndex - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }
    return ((beginIndex == 0) && (endIndex == value.length)) ? this
            : new String(value, beginIndex, subLen);
}

public CharSequence subSequence(int beginIndex, int endIndex) {
    return this.substring(beginIndex, endIndex);
}

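① substring(int beginIndex): returns the substring that starts at index beginIndex and runs to the end of the string.

② substring(int beginIndex, int endIndex): returns the substring that starts at index beginIndex and ends at index endIndex - 1; endIndex itself is excluded.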
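
A brief sketch of both overloads and of the bounds checks seen in the source. The class and helper names (SubstringDemo, rejects) are hypothetical.

```java
class SubstringDemo {
    // Returns true if substring(begin, end) rejects the arguments.
    static boolean rejects(String s, int begin, int end) {
        try {
            s.substring(begin, end);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "HelloWorld";
        System.out.println(s.substring(5));              // World
        System.out.println(s.substring(0, 5));           // Hello, index 5 is excluded
        System.out.println(s.substring(3, 3).isEmpty()); // true
        System.out.println(rejects(s, 5, 2));            // true, endIndex < beginIndex
        System.out.println(rejects(s, 0, 11));           // true, endIndex > length
    }
}
```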

public String concat(String str) {
    int otherLen = str.length();
    if (otherLen == 0) {
        return this;
    }
    int len = value.length;
    char buf[] = Arrays.copyOf(value, len + otherLen);
    str.getChars(buf, len);
    return new String(buf, true);
}

void getChars(char dst[], int dstBegin) {
    System.arraycopy(value, 0, dst, dstBegin, value.length);
}

public void getChars(int srcBegin, int srcEnd, char dst[], int dstBegin) {
    if (srcBegin < 0) {
        throw new StringIndexOutOfBoundsException(srcBegin);
    }
    if (srcEnd > value.length) {
        throw new StringIndexOutOfBoundsException(srcEnd);
    }
    if (srcBegin > srcEnd) {
        throw new StringIndexOutOfBoundsException(srcEnd - srcBegin);
    }
    System.arraycopy(value, srcBegin, dst, dstBegin, srcEnd - srcBegin);
}

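concat appends the specified string to the end of this string.

It first checks whether the string to append has length 0, and if so returns the original string directly. Otherwise it uses the copyOf method of the Arrays utility class (covered in detail later) to create a new character array whose length is the sum of the two lengths; the front is filled with the original string and the rest is left empty. It then calls getChars to copy the string to append into that empty tail.

Note: the return value is new String(buf, true), i.e. a new string created with the new keyword; the original string is unchanged. This is what we meant earlier: once a String object is created, the character sequence it contains cannot be changed.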
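
The steps just described can be imitated with public APIs only. This is a sketch, not the JDK implementation: ConcatSketch.concat is a hypothetical helper, and unlike the real method it copies the array once more in new String(char[]) because the package-private sharing constructor is not accessible.

```java
import java.util.Arrays;

class ConcatSketch {
    // Mirrors the steps of String.concat using public methods.
    static String concat(String a, String b) {
        if (b.length() == 0) {
            return a;                              // nothing to append
        }
        int len = a.length();
        // New array of the combined length; the tail is still '\u0000'.
        char[] buf = Arrays.copyOf(a.toCharArray(), len + b.length());
        // Copy b into the empty tail, starting at index len.
        b.getChars(0, b.length(), buf, len);
        return new String(buf);
    }

    public static void main(String[] args) {
        String s = "Hello";
        System.out.println(concat(s, " World"));   // Hello World
        System.out.println(s);                     // Hello, the original is unchanged
        System.out.println(s.concat("") == s);     // true, same object returned
    }
}
```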

getChars delegates to System.arraycopy, which is declared as follows:

/**
 * @param      src      the source array.
 * @param      srcPos   starting position in the source array.
 * @param      dest     the destination array (the source array is copied into it).
 * @param      destPos  starting position in the destination data
 *                      (the index of the destination array at which copying begins).
 * @param      length   the number of array elements to be copied.
 * @exception  IndexOutOfBoundsException  if copying would cause
 *             access of data outside array bounds.
 * @exception  ArrayStoreException  if an element in the <code>src</code>
 *             array could not be stored into the <code>dest</code> array
 *             because of a type mismatch.
 * @exception  NullPointerException if either <code>src</code> or
 *             <code>dest</code> is <code>null</code>.
 */
public static native void arraycopy(Object src, int srcPos, Object dest, int destPos, int length);

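char[] src = {'h', 'e', 'l', 'l', 'o', 'w'};
char[] dest = {'1', '2', '3', '4', '5', '6', '7', '8'};
/*
 * Perform the array copy:
 * take 4 elements, ['h','e','l','l'], from the source array ['h','e','l','l','o','w'] starting at index 0
 * and copy them into the destination array ['1','2','3','4','5','6','7','8'] starting at index 3.
 */
System.arraycopy(src, 0, dest, 3, 4);

After the copy, the destination array dest holds: 123hell8

public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */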

        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }
        if (i < len) {
            char buf[] = new char[len];
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            while (i < len) {
                char c = val[i];
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            return new String(buf, true);
        }
    }
    return this;
}

public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}

public boolean contains(CharSequence s) {
    return indexOf(s.toString()) > -1;
}

public String replaceFirst(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceFirst(replacement);
}

public String replaceAll(String regex, String replacement) {
    return Pattern.compile(regex).matcher(this).replaceAll(replacement);
}

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

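① replace(char oldChar, char newChar): replaces every occurrence of oldChar in the original string with newChar and returns a new string.

② String replaceAll(String regex, String replacement): replaces every match of the regular expression regex with replacement and returns a new string.

There are two ways to replace substrings in a string. One is to replace by character or by literal string:

String s = "hello";
s.replace('l', 'w'); // "hewwo": every character 'l' is replaced with 'w'
s.replace("ll", "~~"); // "he~~o": every substring "ll" is replaced with "~~"

The other is to replace by regular expression:

String s = "A,,B;C ,D";
s.replaceAll("[\\,\\;\\s]+", ","); // "A,B,C,D"

The code above uses a regular expression to replace every matched substring with ",".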
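
The difference between the literal and the regex variants matters most when the target contains a regex metacharacter. A minimal sketch, with hypothetical class and method names:

```java
class ReplaceDemo {
    // replace(CharSequence, CharSequence) treats the target literally.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll treats "." as a regex that matches any character.
    static String regexUnescaped(String s) {
        return s.replaceAll(".", "-");
    }

    // Escaping the dot makes the regex match only a literal '.'.
    static String regexEscaped(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));        // a-b-c
        System.out.println(regexUnescaped("a.b.c")); // -----
        System.out.println(regexEscaped("a.b.c"));   // a-b-c
    }
}
```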

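Note that the parameter of contains() is a CharSequence rather than a String. CharSequence is an interface that String implements, so the method also accepts other implementations such as StringBuilder and StringBuffer.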
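
For instance, a StringBuilder can be passed without converting it first. A small sketch with a hypothetical class name:

```java
class ContainsDemo {
    // A StringBuilder is a CharSequence, so it is accepted directly.
    static boolean containsBuilder() {
        StringBuilder part = new StringBuilder("lo w");
        return "hello world".contains(part);
    }

    public static void main(String[] args) {
        System.out.println(containsBuilder());               // true
        System.out.println("hello world".contains("xyz"));   // false
        System.out.println("hello world".contains(""));      // true, indexOf("") is 0
    }
}
```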

public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}

public String[] split(String regex) {
    return split(regex, 0);
}

public static String join(CharSequence delimiter, CharSequence... elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    // Number of elements not likely worth Arrays.stream overhead.
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

public static String join(CharSequence delimiter,
        Iterable<? extends CharSequence> elements) {
    Objects.requireNonNull(delimiter);
    Objects.requireNonNull(elements);
    StringJoiner joiner = new StringJoiner(delimiter);
    for (CharSequence cs: elements) {
        joiner.add(cs);
    }
    return joiner.toString();
}

public final class StringJoiner {
private final String prefix;
private final String delimiter;
private final String suffix;

private StringBuilder value;

private String emptyValue;

public StringJoiner(CharSequence delimiter) {
    this(delimiter, "", "");
}

public StringJoiner(CharSequence delimiter,
                    CharSequence prefix,
                    CharSequence suffix) {
    Objects.requireNonNull(prefix, "The prefix must not be null");
    Objects.requireNonNull(delimiter, "The delimiter must not be null");
    Objects.requireNonNull(suffix, "The suffix must not be null");
    // make defensive copies of arguments
    this.prefix = prefix.toString();
    this.delimiter = delimiter.toString();
    this.suffix = suffix.toString();
    this.emptyValue = this.prefix + this.suffix;
}

public StringJoiner setEmptyValue(CharSequence emptyValue) {
    this.emptyValue = Objects.requireNonNull(emptyValue,
        "The empty value must not be null").toString();
    return this;
}

@Override
public String toString() {
    if (value == null) {
        return emptyValue;
    } else {
        if (suffix.equals("")) {
            return value.toString();
        } else {
            int initialLength = value.length();
            String result = value.append(suffix).toString();
            // reset value to pre-append initialLength
            value.setLength(initialLength);
            return result;
        }
    }
}

public StringJoiner add(CharSequence newElement) {
    prepareBuilder().append(newElement);
    return this;
}

public StringJoiner merge(StringJoiner other) {
    Objects.requireNonNull(other);
    if (other.value != null) {
        final int length = other.value.length();
        // lock the length so that we can seize the data to be appended
        // before initiate copying to avoid interference, especially when
        // merge 'this'
        StringBuilder builder = prepareBuilder();
        builder.append(other.value, other.prefix.length(), length);
    }
    return this;
}

private StringBuilder prepareBuilder() {
    if (value != null) {
        value.append(delimiter);
    } else {
        value = new StringBuilder().append(prefix);
    }
    return value;
}

public int length() {
    // Remember that we never actually append the suffix unless we return
    // the full (present) value or some sub-string or length of it, so that
    // we can add on more if we need to.
    return (value != null ? value.length() + suffix.length() :
            emptyValue.length());
}

}
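Joining an array with a delimiter is such a common need that the Java standard library also provides StringJoiner for the job:

import java.util.StringJoiner;

public class Main {
    public static void main(String[] args) {
        String[] names = {"Bob", "Alice", "Grace"};
        StringJoiner sj = new StringJoiner(", ");
        for (String name : names) {
            sj.add(name);
        }
        System.out.println(sj.toString()); // Bob, Alice, Grace
    }
}

Wait: suppose the output should also start with "Hello " and end with "!". The StringJoiner above adds neither. In that case, give the StringJoiner a prefix and a suffix:

import java.util.StringJoiner;

public class Main {
    public static void main(String[] args) {
        String[] names = {"Bob", "Alice", "Grace"};
        StringJoiner sj = new StringJoiner(", ", "Hello ", "!");
        for (String name : names) {
            sj.add(name);
        }
        System.out.println(sj.toString()); // Hello Bob, Alice, Grace!
    }
}

Internally StringJoiner simply uses a StringBuilder, so its concatenation performance is almost identical to that of StringBuilder.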

String also provides the static method join(), which uses a StringJoiner internally. When no prefix or suffix is needed, String.join() is more convenient:

String[] names = {"Bob", "Alice", "Grace"};
String s = String.join(", ", names); // "Bob, Alice, Grace"

split(String regex) splits this string around matches of the given regular expression. split(String regex, int limit) does the same, but limit has three cases:

① limit > 0: the pattern is applied at most limit - 1 times, so the resulting array has at most limit elements.

String str = "a,b,c";
String[] c1 = str.split(",", 2);
System.out.println(c1.length); // 2
System.out.println(Arrays.toString(c1)); // [a, b,c]

② limit = 0: the pattern is applied as many times as possible and trailing empty strings are discarded.

String str2 = "a,b,c,,";
String[] c2 = str2.split(",", 0);
System.out.println(c2.length); // 3
System.out.println(Arrays.toString(c2)); // [a, b, c]

③ limit < 0: the pattern is applied as many times as possible and trailing empty strings are kept.

String str3 = "a,b,c,,";
String[] c3 = str3.split(",", -1);
System.out.println(c3.length); // 5
System.out.println(Arrays.toString(c3)); // [a, b, c, , ]
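
The fast path in the split source explains a common surprise. Reading that code, an escaped dot (the two-character regex \.) qualifies for the fast path, while a bare "." is a regex metacharacter, goes to Pattern, and matches every character. The sketch below only shows the observable results; SplitDemo.show is a hypothetical helper.

```java
import java.util.Arrays;

class SplitDemo {
    // Formats the result of split so it can be printed and compared.
    static String show(String s, String regex, int limit) {
        return Arrays.toString(s.split(regex, limit));
    }

    public static void main(String[] args) {
        // Escaped dot: splits on literal '.' characters.
        System.out.println(show("a.b.c", "\\.", 0)); // [a, b, c]
        // Bare dot: every character is a delimiter, all pieces are empty
        // and, with limit 0, the trailing empty strings are dropped.
        System.out.println(show("a.b.c", ".", 0));   // []
        // A positive limit keeps the remainder in the last element.
        System.out.println(show("a,b,c,,", ",", 2)); // [a, b,c,,]
    }
}
```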
public String toLowerCase(Locale locale) {
    if (locale == null) {
        throw new NullPointerException();
    }

    int firstUpper;
    final int len = value.length;

    /* Now check if there are any characters that need to be changed. */
    scan: {
        for (firstUpper = 0 ; firstUpper < len; ) {
            char c = value[firstUpper];
            if ((c >= Character.MIN_HIGH_SURROGATE)
                    && (c <= Character.MAX_HIGH_SURROGATE)) {
                int supplChar = codePointAt(firstUpper);
                if (supplChar != Character.toLowerCase(supplChar)) {
                    break scan;
                }
                firstUpper += Character.charCount(supplChar);
            } else {
                if (c != Character.toLowerCase(c)) {
                    break scan;
                }
                firstUpper++;
            }
        }
        return this;
    }

    char[] result = new char[len];
    int resultOffset = 0;  /* result may grow, so i+resultOffset
                            * is the write location in result */

    /* Just copy the first few lowerCase characters. */
    System.arraycopy(value, 0, result, 0, firstUpper);

    String lang = locale.getLanguage();
    boolean localeDependent =
            (lang == "tr" || lang == "az" || lang == "lt");
    char[] lowerCharArray;
    int lowerChar;
    int srcChar;
    int srcCount;
    for (int i = firstUpper; i < len; i += srcCount) {
        srcChar = (int)value[i];
        if ((char)srcChar >= Character.MIN_HIGH_SURROGATE
                && (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
            srcChar = codePointAt(i);
            srcCount = Character.charCount(srcChar);
        } else {
            srcCount = 1;
        }
        if (localeDependent ||
            srcChar == '\u03A3' || // GREEK CAPITAL LETTER SIGMA
            srcChar == '\u0130') { // LATIN CAPITAL LETTER I WITH DOT ABOVE
            lowerChar = ConditionalSpecialCasing.toLowerCaseEx(this, i, locale);
        } else {
            lowerChar = Character.toLowerCase(srcChar);
        }
        if ((lowerChar == Character.ERROR)
                || (lowerChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
            if (lowerChar == Character.ERROR) {
                lowerCharArray =
                        ConditionalSpecialCasing.toLowerCaseCharArray(this, i, locale);
            } else if (srcCount == 2) {
                resultOffset += Character.toChars(lowerChar, result, i + resultOffset) - srcCount;
                continue;
            } else {
                lowerCharArray = Character.toChars(lowerChar);
            }

            /* Grow result if needed */
            int mapLen = lowerCharArray.length;
            if (mapLen > srcCount) {
                char[] result2 = new char[result.length + mapLen - srcCount];
                System.arraycopy(result, 0, result2, 0, i + resultOffset);
                result = result2;
            }
            for (int x = 0; x < mapLen; ++x) {
                result[i + resultOffset + x] = lowerCharArray[x];
            }
            resultOffset += (mapLen - srcCount);
        } else {
            result[i + resultOffset] = (char)lowerChar;
        }
    }
    return new String(result, 0, len + resultOffset);
}

public String toLowerCase() {
    return toLowerCase(Locale.getDefault());
}

public String toUpperCase(Locale locale) {
    if (locale == null) {
        throw new NullPointerException();
    }

    int firstLower;
    final int len = value.length;

    /* Now check if there are any characters that need to be changed. */
    scan: {
        for (firstLower = 0 ; firstLower < len; ) {
            int c = (int)value[firstLower];
            int srcCount;
            if ((c >= Character.MIN_HIGH_SURROGATE)
                    && (c <= Character.MAX_HIGH_SURROGATE)) {
                c = codePointAt(firstLower);
                srcCount = Character.charCount(c);
            } else {
                srcCount = 1;
            }
            int upperCaseChar = Character.toUpperCaseEx(c);
            if ((upperCaseChar == Character.ERROR)
                    || (c != upperCaseChar)) {
                break scan;
            }
            firstLower += srcCount;
        }
        return this;
    }

    /* result may grow, so i+resultOffset is the write location in result */
    int resultOffset = 0;
    char[] result = new char[len]; /* may grow */

    /* Just copy the first few upperCase characters. */
    System.arraycopy(value, 0, result, 0, firstLower);

    String lang = locale.getLanguage();
    boolean localeDependent =
            (lang == "tr" || lang == "az" || lang == "lt");
    char[] upperCharArray;
    int upperChar;
    int srcChar;
    int srcCount;
    for (int i = firstLower; i < len; i += srcCount) {
        srcChar = (int)value[i];
        if ((char)srcChar >= Character.MIN_HIGH_SURROGATE &&
            (char)srcChar <= Character.MAX_HIGH_SURROGATE) {
            srcChar = codePointAt(i);
            srcCount = Character.charCount(srcChar);
        } else {
            srcCount = 1;
        }
        if (localeDependent) {
            upperChar = ConditionalSpecialCasing.toUpperCaseEx(this, i, locale);
        } else {
            upperChar = Character.toUpperCaseEx(srcChar);
        }
        if ((upperChar == Character.ERROR)
                || (upperChar >= Character.MIN_SUPPLEMENTARY_CODE_POINT)) {
            if (upperChar == Character.ERROR) {
                if (localeDependent) {
                    upperCharArray =
                            ConditionalSpecialCasing.toUpperCaseCharArray(this, i, locale);
                } else {
                    upperCharArray = Character.toUpperCaseCharArray(srcChar);
                }
            } else if (srcCount == 2) {
                resultOffset += Character.toChars(upperChar, result, i + resultOffset) - srcCount;
                continue;
            } else {
                upperCharArray = Character.toChars(upperChar);
            }

            /* Grow result if needed */
            int mapLen = upperCharArray.length;
            if (mapLen > srcCount) {
                char[] result2 = new char[result.length + mapLen - srcCount];
                System.arraycopy(result, 0, result2, 0, i + resultOffset);
                result = result2;
            }
            for (int x = 0; x < mapLen; ++x) {
                result[i + resultOffset + x] = upperCharArray[x];
            }
            resultOffset += (mapLen - srcCount);
        } else {
            result[i + resultOffset] = (char)upperChar;
        }
    }
    return new String(result, 0, len + resultOffset);
}

public String toUpperCase() {
    return toUpperCase(Locale.getDefault());
}

public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

public char[] toCharArray() {
    // Cannot use Arrays.copyOf because of class initialization order issues
    char result[] = new char[value.length];
    System.arraycopy(value, 0, result, 0, value.length);
    return result;
}

public static String format(String format, Object... args) {
    return new Formatter().format(format, args).toString();
}

public static String format(Locale l, String format, Object... args) {
    return new Formatter(l).format(format, args).toString();
}

public static String valueOf(char data[]) {
    return new String(data);
}

public static String valueOf(char data[], int offset, int count) {
    return new String(data, offset, count);
}

public static String copyValueOf(char data[]) {
    return new String(data);
}

public static String valueOf(float f) {
    return Float.toString(f);
}

public static String valueOf(double d) {
    return Double.toString(d);
}

public native String intern();
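The toLowerCase(Locale) and toUpperCase(Locale) listings above contain three behaviors that are easy to miss: the early "return this" when nothing needs to change, the locale-dependent branch for "tr", "az" and "lt", and the code that grows the result array when one character maps to several. The following sketch exercises each of them through the public API; the class name CaseLocaleDemo and the constant TURKISH are illustrative assumptions.

```java
import java.util.Locale;

class CaseLocaleDemo {
    // Assumed for illustration: a Turkish locale, one of the special-cased languages.
    static final Locale TURKISH = Locale.forLanguageTag("tr");

    public static void main(String[] args) {
        // Nothing to change: the same object comes back.
        String abc = "ABC";
        System.out.println(abc.toUpperCase(Locale.ROOT) == abc);                 // true

        // Locale-neutral lowercasing.
        System.out.println("TITLE".toLowerCase(Locale.ROOT));                    // title

        // In Turkish, 'I' lowercases to the dotless i (U+0131).
        System.out.println("TITLE".toLowerCase(TURKISH).equals("t\u0131tle"));   // true

        // U+00DF (sharp s) uppercases to "SS", so the result is one char longer.
        String upper = "stra\u00dfe".toUpperCase(Locale.ROOT);
        System.out.println(upper);                                               // STRASSE
        System.out.println(upper.length());                                      // 7
    }
}
```

Passing an explicit locale such as Locale.ROOT is a common way to keep results independent of the machine's default locale, which the no-argument overloads use.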
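Removing leading and trailing whitespace

The trim() method removes leading and trailing whitespace. Whitespace here includes the space, \t, \r and \n (more precisely, as the trim() source above shows, every character less than or equal to ' '):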

" \tHello\r\n ".trim(); // "Hello"
Note: trim() does not change the content of the string; it returns a new string.
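The trim() source shown earlier compares each character with ' ', which has a few visible consequences. The sketch below is a small demonstration under that reading of the source; the class name TrimDemo and its helper returnsSameObject() are hypothetical.

```java
class TrimDemo {
    // Hypothetical helper: true when trim() had nothing to remove and returned the receiver.
    static boolean returnsSameObject(String s) {
        return s.trim() == s;
    }

    public static void main(String[] args) {
        // Nothing to trim: the method returns this, not a copy.
        System.out.println(returnsSameObject("Hello"));              // true

        // The original string is untouched; only the returned value is shorter.
        String padded = " \tHello\r\n ";
        System.out.println(padded.trim());                           // Hello
        System.out.println(padded.length());                         // 10

        // Control characters such as U+0001 are <= ' ' and are removed too.
        System.out.println("\u0001Hi\u0001".trim());                 // Hi

        // U+3000 (ideographic space) is greater than ' ', so trim() keeps it.
        System.out.println("\u3000Hi\u3000".trim().length());        // 4
    }
}
```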

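Another method, strip(), also removes leading and trailing whitespace. Unlike trim(), it also removes Unicode whitespace such as the Chinese (ideographic) space \u3000: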

"\u3000Hello\u3000".strip(); // "Hello"
" Hello ".stripLeading(); // "Hello "
" Hello ".stripTrailing(); // " Hello"
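String also provides isEmpty() and isBlank() to test whether a string is empty or blank:

"".isEmpty(); // true, because the length is 0
" ".isEmpty(); // false, because the length is not 0
" \n".isBlank(); // true, because it contains only whitespace
" Hello ".isBlank(); // false, because it contains non-whitespace characters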
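There are two ways to create a string object:

① Assign a literal directly

String str = "hello";
② Create an object by calling a constructor with the new keyword

String str = new String("hello");
So what is the difference between the two? Before explaining, let us first review the JVM memory layout before JDK 1.7 (not including 1.7):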

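① Program counter: also called the PC register. It holds the address of the instruction currently being executed (or, equivalently, the address of the storage unit that holds the next instruction). When the CPU needs to execute an instruction, it reads the address from the program counter and fetches the instruction at that address; the program counter is then incremented by 1 or set from a branch target to give the address of the next instruction. This repeats until all instructions have been executed. Thread-private.

② Virtual machine stack: primitive values and object references are stored here. Thread-private.

③ Native method stack: the virtual machine stack serves the execution of Java methods, while the native method stack serves the execution of native methods. The JVM specification does not mandate any particular implementation or data structure for the native method stack, so a virtual machine is free to implement it as it sees fit. The HotSpot virtual machine simply merges the native method stack with the virtual machine stack.

④ Method area: stores per-class information (class name, method information, field information), static variables, constants, and the code produced by the compiler. Note: besides the descriptions of a class's fields, methods and interfaces, a class file also contains a constant pool, which stores the literals and symbolic references generated at compile time.

⑤ Heap: stores the objects themselves and arrays (array references, of course, live on the Java stack).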

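From JDK 1.7 onward, the string constant pool was moved out of the method area and into the heap.

Constant pool: the Java runtime maintains a String pool (also called the "string buffer"). The pool holds the strings produced at run time, and no two strings in it have the same content.

① A string created from a literal, or from a concatenation of pure string constants, is first looked up in the string pool. If no equal object exists, it is created in the pool; if one exists, the pooled reference is used directly, which avoids creating a duplicate object.

② Creating a string with the new keyword always allocates a new object on the heap, and the variable refers to that new object. If the content already exists in the constant pool, the heap object points at the pooled data (in the String(String original) constructor shown at the top of this article, the new object simply reuses original.value). Conversely, if the content is not in the constant pool, the object created with new is not added to the pool automatically.

③ An expression that contains a variable is evaluated at run time. The literals it mentions are still looked up in the string pool, but the result is a new object on the heap, and the variable refers to that heap object.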

String str1 = "hello";
String str2 = "hello";
String str3 = new String("hello");
System.out.println(str1==str2);//true
System.out.println(str1==str3);//false
System.out.println(str2==str3);//false
System.out.println(str1.equals(str2));//true
System.out.println(str1.equals(str3));//true
System.out.println(str2.equals(str3));//true
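In the example above, String str1 = "hello" first checks whether "hello" exists in the constant pool. It does not, so a "hello" object is created in the pool and the pooled reference is assigned to str1. The second literal, String str2 = "hello", finds the object in the pool, and the same reference is assigned directly to str2. The third string is created with the new keyword: the pool already contains the string, so nothing is created in the pool; a new object is created on the heap, its reference is assigned to str3, and that heap object in turn points at the pooled string. The figure in the original article illustrates this.

Note the red arrow in that figure: when a string object is created with new and the string already exists in the constant pool, the object created on the heap refers to the pooled string. This can be verified with the intern() method, introduced below.

Creating objects from expressions that contain variables: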

String str1 = "hello";
String str2 = "helloworld";
String str3 = str1+"world"; // not a compile-time constant (a String object is created on the heap)
String str4 = "hello"+"world"; // a compile-time constant, taken directly from the constant pool

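System.out.println(str2==str3);//false
System.out.println(str2==str4);//true
System.out.println(str3==str4);//false
Because str3 involves the variable str1, the compiler cannot treat it as a constant, so a String object is created on the heap. str4 is the sum of two constants, so it refers directly to the object in the constant pool.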
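Whether the compiler can treat an operand as a constant depends on how the variable is declared. As a sketch of that distinction (the class name ConstantFoldingDemo and both method names are made up for this example): a final local variable initialized with a literal counts as a constant, so the concatenation is folded at compile time, while the same code without final is evaluated at run time.

```java
class ConstantFoldingDemo {
    // final + literal initializer: prefix is a constant variable, so
    // prefix + "world" is a compile-time constant and comes from the pool.
    static boolean finalLocalIsFolded() {
        final String prefix = "hello";
        String joined = prefix + "world";
        return joined == "helloworld";
    }

    // Without final the concatenation happens at run time and yields a new heap object.
    static boolean plainLocalIsFolded() {
        String prefix = "hello";
        String joined = prefix + "world";
        return joined == "helloworld";
    }

    public static void main(String[] args) {
        System.out.println(finalLocalIsFolded());  // true
        System.out.println(plainLocalIsFolded());  // false
    }
}
```

The == comparisons here are only used to observe object identity; equals() remains the right way to compare string content.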

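The intern() method

When the intern method is invoked, if the pool already contains a string equal to this String object as determined by the equals(Object) method, then the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to this String object is returned.

What does this mean? When you call intern() on a String object, if the pool already holds an equal string, the pooled reference is returned (from JDK 1.7 onward that reference may itself point at an object on the heap). If not, the string is added to the pool and the pooled reference is returned.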

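String str1 = "hello"; // a literal creates its object only in the constant pool
String str2 = str1.intern();
System.out.println(str1==str2);//true

String str3 = new String("world"); // new always creates a separate object on the heap
String str4 = str3.intern(); // "world" is already pooled because of the literal
System.out.println(str3 == str4);//false

String str5 = str1 + str2; // concatenating variables creates "hellohello" on the heap only
String str6 = str5.intern(); // JDK 1.7+: no equal string is pooled yet, so the pool records str5 itself and returns it
System.out.println(str5 == str6);//true (false on JDK 1.6 and earlier, where the pool kept its own copy)

String str7 = "hello1" + "world1"; // constant concatenation is folded at compile time and lives only in the pool
String str8 = str7.intern();
System.out.println(str7 == str8);//true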
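One practical consequence of the pool is that intern() maps equal strings built at run time to a single shared object. The sketch below illustrates this; the class name InternDemo, the helper build(), and the sample text are assumptions for the example only.

```java
class InternDemo {
    // Hypothetical helper: builds a string at run time so it is not a compile-time constant.
    static String build(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        String x = build("string-", "pool");
        String y = build("string-", "pool");

        System.out.println(x == y);                    // false: two separate heap objects
        System.out.println(x.equals(y));               // true: same content
        System.out.println(x.intern() == y.intern());  // true: both map to one pooled string
    }
}
```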
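Java uses Unicode to represent String and char;

Converting between encodings means converting between String and byte[], and the encoding must be specified;

When converting to byte[], always prefer UTF-8.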

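Character encoding

In early computer systems, the American National Standards Institute (ANSI) defined an encoding for English letters, digits and common symbols. It occupies one byte, covers the range 0 to 127 with the highest bit always 0, and is called ASCII. For example, the character 'A' is encoded as 0x41 and the character '1' as 0x31.

To bring Chinese characters into computer encoding, one byte is clearly not enough. The GB2312 standard uses two bytes per Chinese character, and the highest bit of the first byte is always 1 so that it can be told apart from ASCII. For example, the GB2312 encoding of '中' is 0xd6d0.

Similarly, Japanese has Shift_JIS and Korean has EUC-KR. Because these standards are not unified, using them together causes conflicts.

To unify the encoding of all the world's languages, the Unicode Consortium published Unicode, which brings the world's major languages into a single encoding so that Chinese, Japanese, Korean and other languages no longer conflict.

Unicode needs two or more bytes per character. We can compare how English and Chinese characters are encoded in ASCII, GB2312 and Unicode:

The ASCII and Unicode encodings of the English character 'A':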

         ┌────┐
ASCII:   │ 41 │
         └────┘
         ┌────┬────┐
Unicode: │ 00 │ 41 │
         └────┴────┘
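The Unicode encoding of an English character is simply its ASCII byte with a 00 byte added in front.

The GB2312 and Unicode encodings of the Chinese character '中':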

         ┌────┬────┐
GB2312:  │ d6 │ d0 │
         └────┴────┘
         ┌────┬────┐
Unicode: │ 4e │ 2d │
         └────┴────┘
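So what is the UTF-8 encoding we use all the time? Because the high byte of the Unicode encoding of an English character is always 00, text that is mostly English wastes space. UTF-8 was created to address this: it is a variable-length encoding that turns fixed-length Unicode values into sequences of 1 to 4 bytes. In UTF-8, the English character 'A' is encoded as 0x41, exactly the same as ASCII, while the Chinese character '中' is encoded as the 3 bytes 0xe4b8ad.

Another benefit of UTF-8 is its robustness. If some characters are corrupted in transit, later characters are not affected, because UTF-8 uses the high bits of each byte to mark how many bytes a character occupies. For this reason it is often used as a transfer encoding.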
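The byte values quoted above can be reproduced with getBytes(). The following sketch makes two assumptions worth stating: the two-byte "Unicode" form in the diagrams is taken to be UTF-16 in big-endian order (StandardCharsets.UTF_16BE), and GB2312 is left out because that charset is not guaranteed to be available on every Java runtime. The class name EncodingBytesDemo and the helper hex() are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytesDemo {
    // Hypothetical helper: encode s with cs and render the bytes as lowercase hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // the Chinese character used in the text above

        System.out.println(hex("A", StandardCharsets.US_ASCII));    // 41
        System.out.println(hex("A", StandardCharsets.UTF_16BE));    // 0041
        System.out.println(hex("A", StandardCharsets.UTF_8));       // 41
        System.out.println(hex(zhong, StandardCharsets.UTF_16BE));  // 4e2d
        System.out.println(hex(zhong, StandardCharsets.UTF_8));     // e4b8ad
    }
}
```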

In Java, the char type is in fact a two-byte Unicode (UTF-16) code unit. To convert a string to another encoding by hand, you can do this:

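byte[] b1 = "Hello".getBytes(); // uses the platform default charset; not recommended
byte[] b2 = "Hello".getBytes("UTF-8"); // encode as UTF-8
byte[] b3 = "Hello".getBytes("GBK"); // encode as GBK
byte[] b4 = "Hello".getBytes(StandardCharsets.UTF_8); // encode as UTF-8
Note: after the conversion the data is no longer char values but an array of byte values.

To convert a byte[] in a known encoding back to a String, you can do this: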

byte[] b = ...
String s1 = new String(b, "GBK"); // decode as GBK
String s2 = new String(b, StandardCharsets.UTF_8); // decode as UTF-8
Always remember: a Java String or char is always represented as Unicode in memory.
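Decoding with a charset other than the one used for encoding is the usual source of garbled text. As a sketch of why the charset has to be specified (the class name RoundTripDemo and the helpers misread() and repair() are invented for this example), the code below decodes UTF-8 bytes as ISO-8859-1 on purpose and then reverses the mistake. The repair only works here because ISO-8859-1 maps every byte to exactly one char; with most other charset pairs the damage is not reversible.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Decode UTF-8 bytes with the wrong charset (ISO-8859-1) on purpose.
    static String misread(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo misread(): recover the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // one char, three bytes in UTF-8

        String garbled = misread(zhong);
        System.out.println(garbled.length());               // 3: each byte became its own char
        System.out.println(garbled.equals(zhong));          // false
        System.out.println(repair(garbled).equals(zhong));  // true
    }
}
```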

Different JDK versions optimize the in-memory layout of String differently. Specifically, early JDK versions always stored a String as a char[], defined like this:

public final class String {
    private final char[] value;
    private final int offset;
    private final int count;
}
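Newer JDK versions store a String as a byte[] instead: if the String contains only Latin-1 characters (which covers all of ASCII), each byte stores one character; otherwise every two bytes store one character. The goal is to save memory, because the many short strings in a typical program usually contain only ASCII characters: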

public final class String {
    private final byte[] value;
    private final byte coder; // 0 = LATIN1, 1 = UTF16
}
For users of the class, this internal optimization does not affect any existing code, because the public method signatures are unchanged.
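The coder field is private, so the choice between the two layouts cannot be read directly through the public API. The sketch below only approximates the decision described above and does not reproduce the JDK's internal code: the class name CompactStringSketch and the helpers fitsInLatin1() and estimatedPayloadBytes() are hypothetical, and the byte counts ignore object headers and array overhead.

```java
class CompactStringSketch {
    // Hypothetical check: every char is at most 0xFF, so one byte per char would be enough.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Rough payload size under the byte[] layout: 1 byte per char for Latin-1, else 2.
    static int estimatedPayloadBytes(String s) {
        return fitsInLatin1(s) ? s.length() : 2 * s.length();
    }

    public static void main(String[] args) {
        System.out.println(fitsInLatin1("Hello"));               // true
        System.out.println(fitsInLatin1("caf\u00e9"));           // true: U+00E9 is still Latin-1
        System.out.println(fitsInLatin1("\u4e2d"));              // false
        System.out.println(estimatedPayloadBytes("Hello"));      // 5
        System.out.println(estimatedPayloadBytes("Hi\u4e2d"));   // 6: one wide char doubles the whole string
    }
}
```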

Reposted from CSDN.

Original link:

https://blog.csdn.net/qq_35029061/article/details/100277396
