Sqoop
Sqoop is a tool designed to transfer data between Hadoop and relational databases or mainframes. You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle, or from a mainframe, into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
Sqoop automates most of this process, relying on the database to describe the schema for the data to be imported. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
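The commands below cover the import direction. For the reverse direction, a minimal export sketch could look like the following, assuming the data sits in a Hive warehouse directory using Hive's default field delimiter (\001); all connection values are placeholders:
$ sqoop export --connect jdbc:mysql://${ip}:${port}/${db} --username ${username} --password ${password} --table ${table} --export-dir /user/hive/warehouse/${hive_table} --input-fields-terminated-by '\001'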
List databases
$ sqoop list-databases --connect jdbc:mysql://${ip}:${port} --username ${username} --password ${password}
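With the database name appended to the JDBC URL, the tables inside it can be listed the same way:
$ sqoop list-tables --connect jdbc:mysql://${ip}:${port}/${db} --username ${username} --password ${password}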
MySQL to Hive
$ sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:mysql://${ip}:${port}/{db} --username ${username} --password ${password} --table ${table} --hive-import --hive-table ${hive_table} --map-column-hive ${col}=string --target-dir hdfs://user/hive/warehouse/xxx
Here --map-column-hive maps a MySQL column whose type has no direct Hive equivalent to an explicit Hive type (string in this example).
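Several columns can be mapped at once by passing a comma-separated list; the column names here are hypothetical:
--map-column-hive created_time=string,update_time=string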
Hiding the password
Generate a password alias:
$ hadoop credential create mysql.pwd.alias -provider jceks://hdfs/user/xxx/mysql.pwd.jceks
Enter the database password when prompted.
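The stored alias can be verified afterwards:
$ hadoop credential list -provider jceks://hdfs/user/xxx/mysql.pwd.jceks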
After that, syncs only need to reference the alias:
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xxx/mysql.pwd.jceks \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --connect jdbc:mysql://${ip}:${port}/${db} --username ${username} --password-alias mysql.pwd.alias \
    --table ${table} --hive-import --hive-table ${hive_table} --map-column-hive ${col}=string \
    --delete-target-dir --target-dir hdfs://ft-bdp-test/user/hive/warehouse/xxx
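For recurring syncs, an incremental import avoids re-copying existing rows. A sketch assuming an auto-increment column named id (hypothetical); note that --delete-target-dir cannot be combined with incremental mode:
$ sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/xxx/mysql.pwd.jceks \
    --connect jdbc:mysql://${ip}:${port}/${db} --username ${username} --password-alias mysql.pwd.alias \
    --table ${table} --incremental append --check-column id --last-value 0 \
    --target-dir hdfs://ft-bdp-test/user/hive/warehouse/xxx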