前言
好久没碰MySQL了,这次碰巧在研究superset的时候需要将一份csv格式的数据文件导入到数据库中。正好借此机会可以重温下MySQL。
数据来源
网盘密码 : g5xa
开发环境
- Mac OS 10.13
- MySQL 8.0
准备工作
在开始之前需要对源数据做一次清洗:
- 去除”,“等影响数据导入的符号
- 去除第一行索引值,因为它不是数据
- 如果需要咋数据库中加上id,那么为了对应在源数据中也要加上,excel中加上id还是很方便的。
- 把数据转为utf-8格式的csv文件
在这之前先简单阅读下官方文档:MySQL Documentation
可以很方便的找到LOAD DATA的表达式:
LOAD DATA
[LOW_PRIORITY | CONCURRENT] [LOCAL]
INFILE 'file_name'
[REPLACE | IGNORE]
INTO TABLE tbl_name
[PARTITION (partition_name [, partition_name] ...)]
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[(col_name_or_user_var
[, col_name_or_user_var] ...)]
[SET col_name={expr | DEFAULT},
[, col_name={expr | DEFAULT}] ...]
Get Started
我们知道导入数据首先得有表,所以在开始导入之前,我们先建个表。这个简单:
mysql> create table da_inv(
id int not null auto_increment primary key,
city varchar(20) not null,
companyFullName varchar(255) not null, companyId varchar(100) not null, companyLabelList varchar(255) not null default '',
companyShortName varchar(100) not null default '',
companySize varchar(100) not null, businessZones varchar(255) not null,
firstType varchar(100) not null,
secondType varchar(100) not null,
education varchar(100) not null,
industruField varchar(100) not null,
positionId varchar(100) not null, positionAdvantage varchar(255) not null, positionName varchar(255) not null, positonLabels varchar(255) not null,
salary varchar(100) not null,
workYear varchar(100) not null,
topSalary varchar(100) not null,
bottomSalary varchar(100) not null, averageSalary varchar(100) not null);
ok...
那么问题来了,表我们建好了,数据也准备好了 。
- 数据文件放哪里?
带着这两个问题继续查看文档:
If LOCAL is not specified, the file must be located on the server host and is read directly by the server.
我们先照着上面这句的方式,把文件放在MySQL Server中:
mysql> select @@datadir;
+-----------------------+
| @@datadir |
+-----------------------+
| /usr/local/var/mysql/ |
+-----------------------+
1 row in set (0.01 sec)
ok.. 把数据放到对应的数据库目录下
cd /usr/local/var/mysql/inv_data
➜ inv_data mkdir data
为了区分,我在这里新建了个data的文件夹,把数据放到这里了。
- 切换到对应的数据库
mysql> use inv_data;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql>
- 开始导入
load data infile 'inv_data/data/DataAnalyst.csv' into table da_inv fields terminated by ',' lines terminated by '\r\n';
ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
报了个错误,emmm
MySQL还加了什么限制吗?看下文档:
mysql> select @@secure_file_priv;
+--------------------+
| @@secure_file_priv |
+--------------------+
| NULL |
+--------------------+
1 row in set (0.00 sec)
secure_file_priv may be set as follows:
* If empty, the variable has no effect. This is not a secure setting.
* If set to the name of a directory, the server limits import and export operations to work only with files in that directory. The directory must exist; the server will not create it.
* If set to `NULL`, the server disables import and export operations.
文档中解释到由于secure_file_priv
为NULL
,无法进行导入导出操作。我们需要把它改为指定路径。
通过修改my.cnf文件,找到mysqld
设置secure-file-priv
为/usr/local/var/mysql/inv_data
mdfind -name "my.cnf"
/private/etc/my.cnf
找到mysqld,添加secure-file-priv = /usr/local/var/mysql/inv_data
[mysqld]
# skip-grant-tables
character-set-server=utf8
init_connect='SET NAMES utf8
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
character-set-server=utf8
init_connect='SET NAMES utf8'
secure-file-priv = /usr/local/var/mysql/inv_data
保存并重启mysql
mysql.server restart
Shutting down MySQL
..... SUCCESS!
Starting MySQL
.... SUCCESS!
mysql> load data infile 'inv_data/data/DataAnalyst.csv' into table da_inv fields terminated by ',' lines terminated by '\r\n';
Query OK, 6876 rows affected (0.80 sec)
Records: 6876 Deleted: 0 Skipped: 0 Warnings: 0
搞定!
对于LOAD DATA的工作原理和一些思考
- 这次是把文件放在MySQL Server中的,如果把文件放在本地该如何处理?
先读文档:LOCAL INFILE Request
可以看到下面这张图表:
基本上可以这么解读:
- 由我们的客户端(命令行)发送
COM_QUERY
来执行SQL语句到服务端(MySQL server) - 服务端接收请求后通过MySQL交互协议将文件名发送给客户端去读取文件
- 客户端以文件流的形式将内容和一个空的数据包发送给服务端,空数据包代表传输结束。
问题来了
- 为什么简单的发送文件,它需要客户端发送给服务端,然后服务端再把文件名返回给客户端之后才开始向服务端传送文件?多出来的两个步骤是为了什么?
客户端如果如果想要发送文件必须先知道这个文件的完整路径,但是在本例中我们只告诉了客户端inv_data/data/DataAnalyst.csv
这样不完整的路径。所以需要服务端再返回给客户端该文件的详细地址才能把文件发送过去。就好比快递员接受的你的订单,但是你给他的地址并不齐全这个时候快递员就会打电话过来问你要具体的地址然后才能把快递寄到你家里。
MySQL Server的原理解释清楚了,回到一开始的问题:
如何发送本地文件到MySQL Server
If
LOCAL
is specified, the file is read by the client program on the client host and sent to the server. The file can be given as a full path name to specify its exact location. If given as a relative path name, the name is interpreted relative to the directory in which the client program was started.
load data local infile '/Users/weijiezhu/Desktop/DataAnalyst.txt' into table da_inv fields terminated by ',' lines terminated by '\r\n';
ERROR 1148 (42000): The used command is not allowed with this MySQL version
发现并不能顺利使用,反而告诉我这条命令被禁掉了。
可以看这篇文档 Security Issues with LOAD DATA LOCAL
For the mysql client, local data loading is disabled by default. To disable or enable it explicitly, use the
--local-infile=0
or--local-infile[=1]
option.
这里按照官方文档设置为1没用,不知道什么原因。改为ON
就没问题了
SET GLOBAL local_infile = 'ON';
load data local infile '/Users/weijiezhu/Desktop/DataAnalyst.txt' into table da_inv fields terminated by ',' lines terminated by '\r\n';
Query OK, 0 rows affected, 6876 warnings (0.25 sec)
Records: 6876 Deleted: 0 Skipped: 6876 Warnings: 6876
ok 搞定了!
导入本地文件的总结
- --local-infile参数决定了我们能否通过客户端来使用LOAD DATA LOCAL INFILE的关键,如果在设置为1是无效,那么可以强制设置为
ON
也是个好办法。