Pinyin and Initial Letters of Chinese Characters

I. Principles

1. Origin

Every character is drawn from a vector glyph, and these glyphs are stored together in a font file (for example, a .ttf file).

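2. Storage

Every character (in every script, Chinese characters included) is stored as a code value that is unique within a given encoding. The same character, however, has a different byte representation in each encoding. Take 中 (code point U+4E2D): in UTF-16 little-endian, which is what Windows loosely calls "Unicode", it is stored as the two bytes 2D 4E; in the national standard encoding (GB2312) it is D6 D0, also two bytes; in UTF-8 it is E4 B8 AD, three bytes.
Unicode is an international standard character set. Its UTF-16 encoding form uses two bytes for the most common characters (four for supplementary characters) and is not byte-compatible with ASCII, although the first 128 code points have the same values as ASCII. It is widely used on the network, in Windows, and in many large software systems.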
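The byte values quoted above can be checked with a short program. This is a minimal sketch; the class name EncodingDemo and the helper toHex are illustrative names, not part of the original code. GB2312 is not one of the charsets every JVM is required to provide, so the sketch checks for it before using it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Render a byte array as upper-case hex, two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String zhong = "\u4E2D"; // the character 中, code point U+4E2D
        System.out.println("code point: U+" + Integer.toHexString(zhong.codePointAt(0)).toUpperCase());
        System.out.println("UTF-16BE:   " + toHex(zhong.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE:   " + toHex(zhong.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-8:      " + toHex(zhong.getBytes(StandardCharsets.UTF_8)));
        // GB2312 is an optional charset, so only use it when the JVM supports it.
        if (Charset.isSupported("GB2312")) {
            System.out.println("GB2312:     " + toHex(zhong.getBytes(Charset.forName("GB2312"))));
        }
    }
}
```

The UTF-16BE line prints 4E2D, the UTF-16LE line prints 2D4E, and the UTF-8 line prints E4B8AD, which shows that the "2D4E" form is simply the code point with its two bytes swapped.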

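II. Implementation

1. Algorithm

Chinese characters to pinyin

Every character (Chinese characters included) has a value in each encoding, and the number of Chinese characters is finite, so an exhaustive lookup table can map each character to its pinyin. Once the pinyin is known, the initial letter is simply its first letter.
In Java, converting a char to an int yields its UTF-16 code unit, which for the common Chinese characters equals the Unicode code point. Such a character takes 2 bytes in UTF-16 and 3 bytes in UTF-8. The code in this article uses the Unicode code point as the lookup key.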
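The table-lookup idea can be sketched in a few lines. The class LookupSketch, its three-entry table, and the method names below are hypothetical and serve only as an illustration; a real table would cover every supported character and every reading.

```java
import java.util.HashMap;
import java.util.Map;

class LookupSketch {
    // Hypothetical three-entry table keyed by code point.
    static final Map<Integer, String> TABLE = new HashMap<>();
    static {
        TABLE.put(0x4E2D, "zhong"); // 中
        TABLE.put(0x56FD, "guo");   // 国
        TABLE.put(0x4EBA, "ren");   // 人
    }

    // Returns the toneless pinyin for a character, or null if it is not in the table.
    static String pinyinOf(char ch) {
        // The cast matters: without it the char would be boxed as a Character
        // and would never match the Integer keys.
        return TABLE.get((int) ch);
    }

    // Concatenates the first letter of each known character; unknown characters are skipped.
    static String initialsOf(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String pinyin = pinyinOf(ch);
            if (pinyin != null) {
                sb.append(pinyin.charAt(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pinyinOf('\u4E2D'));               // zhong
        System.out.println(initialsOf("\u4E2D\u56FD\u4EBA")); // zgr
    }
}
```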

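Tones

In pinyin, the letters that can carry a tone mark are "aeiouü" (input methods usually substitute v for ü). Tone placement falls into the following cases:

If the syllable contains only one vowel, the tone mark goes on that vowel, e.g. 旅 (lǚ), 一 (yī), 个 (gè).
If the syllable contains several vowels and includes "a", "e", or "ou", the tone mark goes on that "a", on that "e", or on the "o" of "ou", e.g. 好 (hǎo), 游 (yóu), 美 (měi), 在 (zài). Otherwise the tone mark usually goes on the last vowel, e.g. 穷 (qióng).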
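The placement rules above can be expressed as a small function that returns the position of the vowel to mark. This is a sketch under the rules as stated here; ToneRuleSketch and toneVowelIndex are illustrative names, and the input is assumed to be a lower-case syllable with v standing in for ü and no tone digit.

```java
class ToneRuleSketch {
    static final String VOWELS = "aeiouv"; // v stands in for ü

    // Returns the index of the vowel that carries the tone mark, or -1 if there is no vowel.
    static int toneVowelIndex(String syllable) {
        int idx;
        if ((idx = syllable.indexOf('a')) != -1) return idx;  // "a" always takes the mark
        if ((idx = syllable.indexOf('e')) != -1) return idx;  // then "e"
        if ((idx = syllable.indexOf("ou")) != -1) return idx; // then the "o" of "ou"
        // Otherwise the last vowel in the syllable takes the mark.
        for (int i = syllable.length() - 1; i >= 0; i--) {
            if (VOWELS.indexOf(syllable.charAt(i)) != -1) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"hao", "you", "mei", "qiong", "lv"}) {
            System.out.println(s + " -> index " + toneVowelIndex(s));
        }
    }
}
```

For "hao", "you", "mei", and "lv" the function returns 1; for "qiong" it returns 2, the position of the "o".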

2. Implementation code

package xxx;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PinyinUtil {
    
    private static final Map<Character, String[]> cachedPinyinMap = new HashMap<Character, String[]>();
    private static final Properties unicodeProps = new Properties();

    public static void main(String[] args) throws IOException {
        String str = "茜绿旅,一、我世界";
        for (char ch : str.toCharArray()) {
            String[] pinyinArray = getPinyinArray(ch, true);
            if (pinyinArray != null) {
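                System.out.println("------" + ch + "------");
                String[] tonelessArray = getPinyinArray(ch, false);
                // Take the initial from the toneless form so it is always a plain ASCII letter
                System.out.println("Initial: " + tonelessArray[0].charAt(0));
                System.out.println("First reading: " + pinyinArray[0]);
                System.out.println("All readings: " + Arrays.toString(pinyinArray));
                System.out.println("All readings without tones: " + Arrays.toString(tonelessArray));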
            }
        }
    }
    
    public static String[] getPinyinArray(Character ch, boolean withTone) {
        if (cachedPinyinMap.get(ch) != null) {
            return format(cachedPinyinMap.get(ch), withTone);
        }
        
        String pinyin = unicodeProps.getProperty(Integer.toHexString(ch).toUpperCase());
        if (pinyin != null) {
            pinyin = pinyin.substring(1, pinyin.length() - 1);
            cachedPinyinMap.put(ch, pinyin.split(","));
            return format(cachedPinyinMap.get(ch), withTone);
        }
        return null;
    }
    
    private static String[] format(String[] pinyins, boolean withTone) {
        String[] formatedPinyin = new String[pinyins.length];
        int completeIndex = 0;
        if (!withTone) {
            for (String pinyin : pinyins) {
                formatedPinyin[completeIndex++] = pinyin.substring(0, pinyin.length()-1);
            }
            return formatedPinyin;
        }
        
        char unmarkedVowel = '/';
        int indexOfUnmarkedVowel = -1;
        final char charA = 'a';
        final char charE = 'e';
        final String ouStr = "ou";
        final String allUnmarkedVowelStr = "aeiouv";
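        // Five forms per vowel: tones 1-4, then the unmarked form. The third tone uses a caron (ǎ), not a breve (ă).
        final String allMarkedVowelStr = "āáǎàaēéěèeīíǐìiōóǒòoūúǔùuǖǘǚǜü";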
        
        
        for (String pinyin : pinyins) {
            // Rule 1: "a", then "e", then the "o" of "ou" takes the tone mark
            if ((indexOfUnmarkedVowel = pinyin.indexOf(charA)) != -1
                    || (indexOfUnmarkedVowel = pinyin.indexOf(charE)) != -1
                    || (indexOfUnmarkedVowel = pinyin.indexOf(ouStr)) != -1) {
                unmarkedVowel = pinyin.charAt(indexOfUnmarkedVowel);
            } else {
                // Rule 2: otherwise the last vowel takes the mark.
                // Scan down to index 0 so that a leading vowel (e.g. "o2") is found.
                for (int i = pinyin.length()-1; i >= 0; i--) {
                    if (allUnmarkedVowelStr.indexOf(pinyin.charAt(i)) != -1) {
                        indexOfUnmarkedVowel = i;
                        unmarkedVowel = pinyin.charAt(indexOfUnmarkedVowel);
                        break;
                    }
                }
            }
            if (indexOfUnmarkedVowel != -1) {
                int index = allUnmarkedVowelStr.indexOf(unmarkedVowel);
                // The trailing character of each entry is the tone digit
                int toneNumber = Character.getNumericValue(pinyin.charAt(pinyin.length() - 1));
                char markedVowel = allMarkedVowelStr.charAt(index*5+toneNumber-1);
                StringBuilder resultBuffer = new StringBuilder();
                resultBuffer.append(pinyin.substring(0, indexOfUnmarkedVowel).replaceAll("v", "ü"));
                resultBuffer.append(markedVowel);
                resultBuffer.append(pinyin.substring(indexOfUnmarkedVowel + 1, pinyin.length() - 1).replaceAll("v", "ü"));
                formatedPinyin[completeIndex++] = resultBuffer.toString();
            } else {
                // No vowel found: keep the entry unchanged. Using pinyins[completeIndex] here
                // would read the next element, because completeIndex++ is evaluated first.
                formatedPinyin[completeIndex++] = pinyin;
            }
        }
        return formatedPinyin;
    }
    
    static {
        try {
            init();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    
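    private static void init() throws IOException {
        // try-with-resources closes the stream; fail with a clear message if the data file is missing
        try (InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("pinyindb/unicode_to_pinyin.txt")) {
            if (is == null) {
                throw new IOException("Resource not found: pinyindb/unicode_to_pinyin.txt");
            }
            unicodeProps.load(is);
        }
    }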
    
    
    
    
}
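The data file itself is not shown in this article. Judging from how getPinyinArray strips the first and last character of each value and splits on commas, each line appears to hold an upper-case hexadecimal code point followed by a parenthesised, comma-separated list of readings with tone digits. The sketch below assumes that format; the two sample lines and the class name DataFormatSketch are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;

class DataFormatSketch {
    // Two sample lines in the assumed format "<HEX> (<reading>,<reading>...)".
    static Properties sample() throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("4E2D (zhong1,zhong4)\n4E00 (yi1)\n"));
        return props;
    }

    // Mirrors the parsing in getPinyinArray: look up by hex key, drop the parentheses, split on commas.
    static String[] readings(Properties props, char ch) {
        String value = props.getProperty(Integer.toHexString(ch).toUpperCase());
        if (value == null) {
            return null;
        }
        return value.substring(1, value.length() - 1).split(",");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(readings(sample(), '\u4E2D'))); // [zhong1, zhong4]
    }
}
```

Properties.load treats the first whitespace on a line as the key/value separator, which is why a plain space between the code point and the readings is enough.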

3. Attachment

Link: https://pan.baidu.com/s/1JPjSiRBOxllgdr-JoPMK3g  Extraction code: i5v3

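III. Extension

The same lookup can also be built on UTF-8 keys. The code below converts the pinyin file keyed by Unicode code points into a file keyed by UTF-8 byte sequences; interested readers can implement character-to-pinyin lookup on top of the generated file.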

    @SuppressWarnings("unused")
    private static void generateUTF8FileFromUnicode() throws IOException {
        InputStream is = Thread.currentThread().getContextClassLoader().getResourceAsStream("pinyindb/unicode_to_pinyin.txt");
        // Each input line is "<hex code point> <pinyin list>"
        Pattern pattern = Pattern.compile("(\\S+) (\\S+)");
        // try-with-resources closes both streams even if an exception is thrown
        try (BufferedReader bf = new BufferedReader(new InputStreamReader(is, "UTF-8"));
             BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream("src/main/resources/pinyindb/utf8_to_pinyin.txt"), "UTF-8"))) {
            String line;
            while ((line = bf.readLine()) != null) {
                Matcher matcher = pattern.matcher(line);
                if (matcher.find()) {
                    String utf8 = UnicodeToUTF8(matcher.group(1));
                    writer.write(utf8 + " " + matcher.group(2) + "\n");
                    System.out.println(matcher.group(1) + ":" + matcher.group(2));
                }
            }
        }
    }
    
    private static String UnicodeToUTF8(String hex) throws UnsupportedEncodingException {
        // Parse the hexadecimal code point, e.g. "4E2D"
        int i = Integer.parseInt(hex, 16);
        char ch = (char)i;
        byte[] bytes = String.valueOf(ch).getBytes("UTF-8");
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Zero-pad so that every byte contributes exactly two hex digits
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }
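A lookup against the generated file would need to build the same kind of key from each input character. The following is a minimal sketch of that step, assuming the generated file keeps the value format unchanged; Utf8KeySketch, utf8Key, and the one-entry table are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class Utf8KeySketch {
    // Builds the lookup key: the character's UTF-8 bytes as upper-case hex.
    static String utf8Key(char ch) {
        StringBuilder sb = new StringBuilder();
        for (byte b : String.valueOf(ch).getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical single entry standing in for a line of utf8_to_pinyin.txt.
        Map<String, String> table = new HashMap<>();
        table.put("E4B8AD", "(zhong1,zhong4)");
        String key = utf8Key('\u4E2D'); // 中
        System.out.println(key + " -> " + table.get(key));
    }
}
```

Using StandardCharsets.UTF_8 instead of a charset name avoids the checked UnsupportedEncodingException.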
