Fuzzy Search (Pinyin Initials or Full Pinyin) and Polyphonic Characters in an Android Contact List

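[Notice] This article is the original work of its author (蘑菇v5), who holds the copyright; infringement will be pursued. It was first published on Jianshu (简书). If you share it, credit the author and link to the source. Reposting without permission is strictly prohibited.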


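Some time ago, a company project needed a contacts feature. The contact data lives on the company server, so the mobile client uses the OkHttp framework to fetch the staff records as JSON over HTTPS.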

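During development I ran into a few problems. Search had to match people fuzzily by either full pinyin or pinyin initials, and surnames written with polyphonic characters had to be read correctly. I had previously used pinyin4j.jar for the conversion. pinyin4j is a popular Java library that converts between Chinese characters and pinyin, but it has no dedicated handling for polyphonic characters. Android's built-in Contacts app ships its own way of converting Chinese characters to pinyin, the HanziToPinyin utility class, and it correctly handles cases such as the surname 单, which can be read dan or shan. Although the class comment says polyphones are not supported in general, the constructor requests the ICU transliterator "Han-Latin/Names", which appears to be what selects the name reading. The HanziToPinyin class is as follows: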

import android.text.TextUtils;
import android.util.Log;
import java.util.ArrayList;
import libcore.icu.Transliterator;
/**
*An object to convert Chinese character to its corresponding pinyin string.
*For characters with multiple possible pinyin string, only one is selected
*according to ICU Transliterator class. Polyphone is not supported in this
*implementation.
*/
public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";
    private static HanziToPinyin sInstance;
    private Transliterator mPinyinTransliterator;
    private Transliterator mAsciiTransliterator;

    public static class Token {
        /**
         * Separator between target string for each source char
         */
        public static final String SEPARATOR = " ";
        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }
        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        /**
         * Type of this token: LATIN, PINYIN or UNKNOWN.
         */
        public int type;
        /**
         * Original string before translation.
         */
        public String source;
        /**
         * Translated string of source. For Han, target is corresponding Pinyin. Otherwise target is
         * original string in source.
         */
        public String target;
    }

    private HanziToPinyin() {
        try {
            mPinyinTransliterator = new Transliterator(
                    "Han-Latin/Names; Latin-Ascii; Any-Upper");
            mAsciiTransliterator = new Transliterator("Latin-Ascii");
        } catch (IllegalArgumentException e) {
            Log.w(TAG, "Han-Latin/Names transliterator data is missing,"
                    + " HanziToPinyin is disabled");
        }
    }

    public boolean hasChineseTransliterator() {
        return mPinyinTransliterator != null;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance == null) {
                sInstance = new HanziToPinyin();
            }
            return sInstance;
        }
    }

    private void tokenize(char character, Token token) {
        token.source = Character.toString(character);

        // ASCII
        if (character < 128) {
            token.type = Token.LATIN;
            token.target = token.source;
            return;
        }

        // Extended Latin. Transcode these to ASCII equivalents
        if (character < 0x250 || (0x1e00 <= character && character < 0x1eff)) {
            token.type = Token.LATIN;
            token.target = mAsciiTransliterator == null ? token.source :
                    mAsciiTransliterator.transliterate(token.source);
            return;
        }

        token.type = Token.PINYIN;
        token.target = mPinyinTransliterator.transliterate(token.source);
        if (TextUtils.isEmpty(token.target) ||
                TextUtils.equals(token.source, token.target)) {
            token.type = Token.UNKNOWN;
            token.target = token.source;
        }
    }

    public String transliterate(final String input) {
        if (!hasChineseTransliterator() || TextUtils.isEmpty(input)) {
            return null;
        }
        return mPinyinTransliterator.transliterate(input);
    }

    /**
     * Convert the input to an array of tokens. A sequence of ASCII or unknown characters without
     * a space is put into one Token. Each Hanzi character that has pinyin is treated as its own
     * Token. If there is no Chinese transliterator, an empty token array is returned.
     */
    public ArrayList<Token> getTokens(final String input) {
        ArrayList<Token> tokens = new ArrayList<Token>();
        if (!hasChineseTransliterator() || TextUtils.isEmpty(input)) {
            // return empty tokens.
            return tokens;
        }

        final int inputLength = input.length();
        final StringBuilder sb = new StringBuilder();
        int tokenType = Token.LATIN;
        Token token = new Token();

        // Go through the input and create a new token when
        // a. the token type changes,
        // b. the current character has a pinyin reading, or
        // c. the current character is a space.
        for (int i = 0; i < inputLength; i++) {
            final char character = input.charAt(i);
            if (Character.isSpaceChar(character)) {
                if (sb.length() > 0) {
                    addToken(sb, tokens, tokenType);
                }
            } else {
                tokenize(character, token);
                if (token.type == Token.PINYIN) {
                    if (sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    tokens.add(token);
                    token = new Token();
                } else {
                    if (tokenType != token.type && sb.length() > 0) {
                        addToken(sb, tokens, tokenType);
                    }
                    sb.append(token.target);
                }
                tokenType = token.type;
            }
        }
        if (sb.length() > 0) {
            addToken(sb, tokens, tokenType);
        }
        return tokens;
    }

    private void addToken(
            final StringBuilder sb, final ArrayList<Token> tokens, final int tokenType) {
        String str = sb.toString();
        tokens.add(new Token(tokenType, str, str));
        sb.setLength(0);
    }
    /**
     * General-purpose helper: takes Chinese text and returns its pinyin in upper case.
     */
    public static String getPinYin(String hanzi) {
        ArrayList<Token> tokens = HanziToPinyin.getInstance().getTokens(hanzi);
        StringBuilder sb = new StringBuilder();
        if (tokens != null && tokens.size() > 0) {
            for (Token token : tokens) {
                if (Token.PINYIN == token.type) {
                    sb.append(token.target);
                } else {
                    sb.append(token.source);
                }
            }
        }

        return sb.toString().toUpperCase();
    }
}

The Transliterator class used above must be placed in the libcore.icu package:

Figure 1

The code of the Transliterator class is as follows:

package libcore.icu;

public final class Transliterator {
    private long peer;
    /**
     * Creates a new Transliterator for the given id.
     */
    public Transliterator(String id) {
        peer = create(id);
    }
    @Override protected synchronized void finalize() throws Throwable {
        try {
            destroy(peer);
            peer = 0;
        } finally {
            super.finalize();
        }
    }
    /**
     * Returns the ids of all known transliterators.
     */
    public static native String[] getAvailableIDs();
    /**
     * Transliterates the specified string.
     */
    public String transliterate(String s) {
        return transliterate(peer, s);
    }
    private static native long create(String id);
    private static native void destroy(long peer);
    private static native String transliterate(long peer, String s);
}

In the project, call it like this:

String pinyin = HanziToPinyin.getPinYin(name);
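
To make the polyphonic-surname problem concrete, here is a minimal sketch of the underlying idea: prefer a name-specific reading for the first character and fall back to the general reading everywhere else. It does not use ICU or HanziToPinyin. The class `SurnamePinyinSketch` and both lookup tables are hypothetical, hand-written, and hold only a few entries for illustration.

```java
import java.util.Map;

final class SurnamePinyinSketch {
    // Hypothetical general-purpose readings (hand-written, not ICU data).
    static final Map<Character, String> DEFAULT_READING = Map.of(
            '\u5355', "DAN",     // general reading "dan" (as in "single")
            '\u5F20', "ZHANG",   // "zhang"
            '\u4E09', "SAN");    // "san" (three)

    // Hypothetical surname-specific readings that override the default
    // when the character is the first character of a name.
    static final Map<Character, String> SURNAME_READING = Map.of(
            '\u5355', "SHAN");   // the same character read "shan" as a surname

    static String toPinyin(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            String reading = null;
            if (i == 0) {
                // Only the first character is treated as a surname.
                reading = SURNAME_READING.get(c);
            }
            if (reading == null) {
                reading = DEFAULT_READING.get(c);
            }
            // Characters missing from both tables are copied through unchanged.
            sb.append(reading != null ? reading : String.valueOf(c));
        }
        return sb.toString();
    }
}
```

With these tables, `toPinyin("单三")` returns "SHANSAN", while `toPinyin("三单")` returns "SANDAN" because the override applies only in the surname position. A real implementation would also need compound surnames and a far larger table, which is the work the transliterator data does for you.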

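That covers polyphonic characters in the contact list. Next comes fuzzy search. There are plenty of articles on this online, but most cover only full-pinyin and text matching; this one also covers searching by pinyin initials.
First, write an entity class, for example SortModel, so that individual fields of a contact are easy to read and store: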

import java.io.Serializable;

public class SortModel extends Contact implements Serializable {
    public SortModel() {
        super();
    }

    public SortModel(String id, String usercode, String name, String pinyin, String status, String serverTime) {
        super(id, usercode, name, pinyin, status, serverTime);
    }

    public SortModel(String id, String usercode, String name, String pinyin, String status) {
        super(id, usercode, name, pinyin, status);
    }

    public String sortLetters; // first letter of the pinyin, shown as the index letter
    public SortToken sortToken = new SortToken(); // Chinese full name, full spelling, simple spelling
}

The Contact class is as follows:

import java.io.Serializable;

public class Contact implements Serializable {
    public String id;
    public String usercode;
    public String name;
    public String pinyin;
    // Contact status
    public String status; // add, del, or update
    public boolean isChecked;
    // Server time
    public String serverTime;
    public Contact(){}
    public Contact(String id,String usercode,String name,String pinyin){
        this.id=id;
        this.usercode=usercode;
        this.name=name;
        this.pinyin=pinyin;
    }

    public Contact(String id, String usercode, String name, String pinyin, String status) {
        this.id = id;
        this.usercode = usercode;
        this.name = name;
        this.pinyin = pinyin;
        this.status = status;
    }

    public Contact(String id, String usercode, String name, String pinyin, String status, String serverTime) {
        this.id = id;
        this.usercode = usercode;
        this.name = name;
        this.pinyin = pinyin;
        this.status = status;
        this.serverTime = serverTime;
    }
    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getUsercode() {
        return usercode;
    }

    public void setUsercode(String usercode) {
        this.usercode = usercode;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getPinyin() {
        return pinyin;
    }

    public void setPinyin(String pinyin) {
        this.pinyin = pinyin;
    }
    public String getStatus() {
        return status;
    }

    public void setStatus(String status) {
        this.status = status;
    }

    public void setIsChecked(boolean isChecked) {
        this.isChecked = isChecked;
    }

    public boolean isChecked() {
        return isChecked;
    }
    public String getServerTime() {
        return serverTime;
    }

    public void setServerTime(String serverTime) {
        this.serverTime = serverTime;
    }

    @Override
    public String toString() {
        return "Contact{" +
                "id='" + id + '\'' +
                ", usercode='" + usercode + '\'' +
                ", name='" + name + '\'' +
                ", pinyin='" + pinyin + '\'' +
                ", status='" + status + '\'' +
                ", isChecked=" + isChecked +
                '}';
    }
}

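The SortToken class:

import java.io.Serializable;

/**
 * Pinyin spellings of a contact name.
 */
public class SortToken implements Serializable {
    public String simpleSpell = ""; // simple spelling (pinyin initials)
    public String wholeSpell = "";  // full spelling (full pinyin)
    public String chName = "";      // full name in Chinese
}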

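The simple and full spellings come from ordinary string handling, so I won't go into detail here. I work from a sortKey in the format SHI 世 JIE 界 NI 你 HAO 好, and wrote a utility class, PinyinUtils, to process the pinyin: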

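import java.util.Locale;

public class PinyinUtils {
    /**
     * Returns the index letter for a contact: the first letter of its pinyin,
     * or "#" when there is no usable letter.
     * @param name   the contact's name
     * @param pinyin the pinyin of the name
     * @return an upper-case letter A-Z, or "#"
     */
    public static String getSortLetter(String name, String pinyin) {
        String letter = "#";
        // Guard against a missing name or pinyin; substring(0, 1) throws on an empty string.
        if (name == null || pinyin == null || pinyin.isEmpty()) {
            return letter;
        }
        String sortString = pinyin.substring(0, 1).toUpperCase(Locale.CHINESE);

        // Use a regular expression to check whether the first character is an English letter
        if (sortString.matches("[A-Z]")) {
            letter = sortString;
        }
        return letter;
    }

    private static final String chReg = "[\\u4E00-\\u9FA5]+"; // matches runs of Chinese characters
    // String chReg = "[^\\u4E00-\\u9FA5]"; // matches anything that is not a Chinese character

    /**
     * Parses a sort key and fills in the simple spelling and the full spelling.
     * @param sortKey for example "SHI 世 JIE 界 NI 你 HAO 好"
     * @return a SortToken holding both spellings
     */
    public static SortToken parseSortKey(String sortKey) {
        SortToken token = new SortToken();
        if (sortKey != null && sortKey.length() > 0) {
            // Drop the spaces, then split on the Chinese characters so only the pinyin pieces remain
            String[] enStrs = sortKey.replace(" ", "").split(chReg);
            StringBuilder simple = new StringBuilder();
            StringBuilder whole = new StringBuilder();
            for (String piece : enStrs) {
                if (piece.length() > 0) {
                    simple.append(piece.charAt(0)); // simple spelling: first letter of each piece
                    whole.append(piece);            // full spelling: every piece joined together
                }
            }
            token.simpleSpell = simple.toString();
            token.wholeSpell = whole.toString();
        }
        return token;
    }
}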
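
Since the step that produces the sortKey is skipped above, here is a minimal end-to-end sketch of one way it could work: build a sortKey in the "PINYIN character" format, then split it into the two spellings with the same rule as parseSortKey. The class `SortKeySketch`, the method `buildSortKey`, and the four-entry `PINYIN` table are hypothetical stand-ins; in the real project the readings would come from HanziToPinyin.

```java
import java.util.Map;

final class SortKeySketch {
    // Hypothetical per-character pinyin table standing in for HanziToPinyin.
    static final Map<Character, String> PINYIN = Map.of(
            '\u4E16', "SHI",   // "shi" (world, first character)
            '\u754C', "JIE",   // "jie" (world, second character)
            '\u4F60', "NI",    // "ni" (you)
            '\u597D', "HAO");  // "hao" (good)

    /** Builds a sort key: each known character becomes "PINYIN", a space, then the character. */
    static String buildSortKey(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            String pinyin = PINYIN.get(c);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (pinyin != null) {
                sb.append(pinyin).append(' ').append(c);
            } else {
                sb.append(c); // unknown characters are copied through unchanged
            }
        }
        return sb.toString();
    }

    /** Returns {simpleSpell, wholeSpell}, using the same split rule as parseSortKey. */
    static String[] parse(String sortKey) {
        StringBuilder simple = new StringBuilder();
        StringBuilder whole = new StringBuilder();
        // Remove spaces, then split on runs of Chinese characters.
        for (String part : sortKey.replace(" ", "").split("[\\u4E00-\\u9FA5]+")) {
            if (!part.isEmpty()) {
                simple.append(part.charAt(0));
                whole.append(part);
            }
        }
        return new String[] {simple.toString(), whole.toString()};
    }
}
```

For the name 世界你好, `buildSortKey` yields "SHI 世 JIE 界 NI 你 HAO 好", and `parse` turns that into the simple spelling "SJNH" and the full spelling "SHIJIENIHAO".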

Now for the fuzzy search method itself:

/**
 * Searches contacts by name or pinyin.
 * @param str the search text
 * @param mAllContactsList the full contact list to filter
 * @return the contacts that match
 */
public List<SortModel> searchContact(final String str, List<SortModel> mAllContactsList) {
    List<SortModel> filterList = new ArrayList<SortModel>(); // the filtered list
    // if (str.matches("^([0-9]|[/+])*$")) { // earlier regex: matched phone numbers only
    if (str.matches("^([0-9]|[/+]).*")) {
        // The input starts with a digit, '/' or '+' (this also covers numbers
        // split by spaces or '-'): compare it with the name as-is.
        for (SortModel contact : mAllContactsList) {
            if (contact.name != null && contact.name.contains(str)) {
                if (!filterList.contains(contact)) {
                    filterList.add(contact);
                }
            }
        }
    } else {
        // Lower-case the query once instead of once per comparison
        final String query = str.toLowerCase(Locale.CHINESE);
        for (SortModel contact : mAllContactsList) {
            if (contact.name != null) {
                // Match on the name itself, on the pinyin initials, or on the full pinyin
                if (contact.name.toLowerCase(Locale.CHINESE).contains(query)
                        || contact.sortToken.simpleSpell.toLowerCase(Locale.CHINESE).contains(query)
                        || contact.sortToken.wholeSpell.toLowerCase(Locale.CHINESE).contains(query)) {
                    if (!filterList.contains(contact)) {
                        filterList.add(contact);
                    }
                }
            }
        }
    }
    return filterList;
}
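
The matching rule can be tried outside Android with a small self-contained sketch. The `Entry` class below is a hypothetical, trimmed-down stand-in for SortModel plus SortToken, the sample data is made up, and `Locale.ROOT` is used here only to keep the demo independent of the device locale.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class ContactSearchSketch {
    /** Hypothetical stand-in for SortModel: a name plus its two pinyin spellings. */
    static final class Entry {
        final String name;
        final String simpleSpell;
        final String wholeSpell;

        Entry(String name, String simpleSpell, String wholeSpell) {
            this.name = name;
            this.simpleSpell = simpleSpell;
            this.wholeSpell = wholeSpell;
        }
    }

    /** Returns the names whose text, initials, or full pinyin contain the query. */
    static List<String> search(String query, List<Entry> all) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Entry e : all) {
            if (e.name.toLowerCase(Locale.ROOT).contains(q)
                    || e.simpleSpell.toLowerCase(Locale.ROOT).contains(q)
                    || e.wholeSpell.toLowerCase(Locale.ROOT).contains(q)) {
                hits.add(e.name);
            }
        }
        return hits;
    }

    /** Made-up sample data: Zhang San, Li Si, Wang San. */
    static List<Entry> sample() {
        List<Entry> list = new ArrayList<>();
        list.add(new Entry("\u5F20\u4E09", "ZS", "ZHANGSAN"));
        list.add(new Entry("\u674E\u56DB", "LS", "LISI"));
        list.add(new Entry("\u738B\u4E09", "WS", "WANGSAN"));
        return list;
    }
}
```

With this sample, "zs" finds only 张三 through the initials, and "san" finds 张三 and 王三 through the full pinyin. Note that "gs" also finds both of them, because the full spelling is searched as one string and ZHANGSAN contains "gs" across a syllable boundary. Whether that looseness is acceptable depends on how fuzzy the search is meant to be.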

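This approach solved the problems I ran into in the project. Here are some screenshots of it running on my own phone, with the icon and title edited out:

Figure 2
Figure 3
Figure 4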