背景
今天用mysql5.7 client 登录数据库,用query 查询展示汉字没问题,但命令行输入中文会直接变空格,比如我输入'测试'
MySQL [(none)]> select 'ce shi'; # 这里本应该是汉字 '测试'
MySQL [(none)]> select ''; #这里本应该是汉字 '测试'
+--+
| |
+--+
| |
+--+
1 row in set (0.00 sec)
分析
1、mysql 数据库字符集问题
MySQL [(none)]> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
client、connection、database、results、server 都是utf8mb4
2、系统字符集问题
[root@db-dev ~]# echo $LANG
en_US.UTF-8
[root@db-dev ~]# 测试
系统字符集LANG 是en_US.UTF-8,与数据库配置一致,并且在系统是可以输入汉字的
3、客户端问题
3.1 切换到mysql5.6 client 后,现象一样,无法输入汉字
3.2 用ldd 命令对比/usr/local/mysql/bin/mysql 引用的动态库,引用一致
[root@db-dev ~]# ldd /usr/local/mysql/bin/mysql
linux-vdso.so.1 => (0x00007ffc8473d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f87531dc000)
librt.so.1 => /lib64/librt.so.1 (0x00007f8752fd4000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f8752dd0000)
libncurses.so.5 => /lib64/libncurses.so.5 (0x00007f8752ba9000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f87528a2000)
libm.so.6 => /lib64/libm.so.6 (0x00007f87525a0000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f875238a000)
libc.so.6 => /lib64/libc.so.6 (0x00007f8751fbc000)
libtinfo.so.5 => /lib64/libtinfo.so.5 (0x00007f8751d92000)
/lib64/ld-linux-x86-64.so.2 (0x00007f87533f8000)
4、切换用户测试
这台服务器还有一套postgresql 12 的数据库,在postgres系统用户下登录pg数据库是可以输入汉字的
[postgres@db-dev ~]$ psql
psql (12.2)
Type "help" for help.
postgres=# select '测试';
?column?
----------
测试
(1 row)
postgres=#
平时我登录mysql 数据库都是在root系统账号下,切换postgres 系统用户下测试一下
MySQL [(none)]> show variables like '%char%';
+--------------------------+----------------------------------+
| Variable_name | Value |
+--------------------------+----------------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/local/mysql/share/charsets/ |
+--------------------------+----------------------------------+
8 rows in set (0.01 sec)
MySQL [(none)]> select '测试';
+--------+
| 测试 |
+--------+
| 测试 |
+--------+
1 row in set (0.00 sec)
发现在postgres 用户下登录mysql是可以输入汉字的,问题基本确定是环境变量引起的,对比 $LANG
[postgres@db-dev ~]$ echo $LANG
en_US.UTF-8
[postgres@db-dev ~]$ exit
logout
[root@db-dev ~]# echo $LANG
en_US.UTF-8
发现LANG 一致,奇怪了,那我在root 系统用户下登录pg数据库试试
[root@db-dev ~]# psql -Upostgres
psql (12.2)
Type "help" for help.
postgres=# select '������';
postgres-# ;
;
--------
测试
(1 row)
postgres=#
发现输入汉字变成了乱码,但可以正常展示中文,对比一下pg字符集参数
[root@db-dev ~]# psql -Upostgres
psql (12.2)
Type "help" for help.
postgres=# select name,setting from pg_settings where name like '%coding%';
name | setting
-----------------+-----------
client_encoding | SQL_ASCII
server_encoding | UTF8
(2 rows)
postgres=# \q
[root@db-dev ~]# su - postgres
Last login: Wed May 13 09:38:00 CST 2020 on pts/2
[postgres@db-dev ~]$ psql
psql (12.2)
Type "help" for help.
postgres=# select name,setting from pg_settings where name like '%coding%';
name | setting
-----------------+---------
client_encoding | UTF8
server_encoding | UTF8
(2 rows)
postgres=#
发现在root 系统用户下登录pg数据库,client_encoding 是SQL_ASCII,而在postgres系统用户下登录pg数据库,client_encoding 是 UTF8,能确定还是系统字符集引起的问题
5、探究系统字符集
之前看两个系统用户下的$LANG是一致的,那问题在哪呢?用locale 命令看一下
[root@db-dev ~]# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US.UTF-8
LC_CTYPE=UTF-8
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
[root@db-dev ~]#
[root@db-dev ~]# su - postgres
Last login: Wed May 13 09:47:02 CST 2020 on pts/2
[postgres@db-dev ~]$
[postgres@db-dev ~]$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
发现在root 用户下,执行locale 会有提示
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
这时会发现,除了LC_CTYPE 这个有差别外,其他的都是en_US.UTF-8,问题基本确定是LC_CTYPE 引起的,修改LC_CTYPE 试一下
[root@db-dev ~]# export LC_CTYPE="en_US.UTF-8"
[root@db-dev ~]# mysql
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 5409
Server version: 5.7.28-log MySQL Community Server (GPL)
Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> select '测试';
+--------+
| 测试 |
+--------+
| 测试 |
+--------+
1 row in s
问题处理,困惑多时的心落地了,为了防止每次都需要重新设置LC_CTYPE,
直接在centos7 下将LC_CTYPE="en_US.UTF-8" 添加到/etc/locale.conf
6、LC_ALL、LANG和LC_*的关系
locale把按照所涉及到的使用习惯的各个方面分成12个大类,设定locale就是设定12大类的locale分类属性,即 12个LC_*。
1、语言符号及其分类(LC_CTYPE)
2、数字(LC_NUMERIC)
3、比较和习惯(LC_COLLATE)
4、时间显示格式(LC_TIME)
5、货币单位(LC_MONETARY)
6、信息主要是提示信息,错误信息,状态信息,标题,标签,按钮和菜单等(LC_MESSAGES)
7、姓名书写方式(LC_NAME)
8、地址书写方式(LC_ADDRESS)
9、电话号码书写方式(LC_TELEPHONE)
10、度量衡表达方式 (LC_MEASUREMENT)
11、默认纸张尺寸大小(LC_PAPER)
12、对locale自身包含信息的概述(LC_IDENTIFICATION)。
除了这12个变量可以设定以外,还有两个变量: LC_ALL和LANG。
它们之间的优先级关系是: LC_ALL>LC_*>LANG
LC_ALL是最上级设定或者强制设定,而LANG是默认设定值
LANG是LC_*的默认值,LC_*变量可以单独设置而可以与LANG不同。
LC_ALL比LC_*的优先级别高,设置完LC_ALL之后,会强制重置LC_*各个值,如果不将LC_ALL重新设置为空,则无法设置LC_*的单个值
总结
无法输入中文 或 中文乱码 基本上都是由字符集引起的,需要系统字符集、数据库配置字符集 及 数据库对象字符集保持一致