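1. Copy the CREATE TABLE statement. Adjust it as needed, replacing each column type with one that Hive supports. Then start hive and paste the statement.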
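CREATE TABLE dw_ft_barrage_record (
  p string COMMENT 'from os: 1,2,5,9',
  roomid string COMMENT 'room number',
  uid string COMMENT 'user id',
  categoryid string COMMENT 'category id',
  sendtime DATETIME COMMENT 'send time',
  isarmy int COMMENT 'is paid poster',
  isreal int COMMENT 'is real user'
)
COMMENT 'barrage fact table (based on mongo_barrage)'
PARTITIONED BY (pt string COMMENT 'date partition column');

Hive table options to end the statement with:
PARTITIONED BY (pt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;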
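As a rough sketch of what the Hive version of this statement could look like, the snippet below writes it to a file. It assumes sendtime becomes TIMESTAMP and the two int flags become BIGINT, so that they line up with the column types in the hdfs writer config. The file name create_dw_ft_barrage_record.hql is made up for illustration.

```shell
#!/bin/sh
# Sketch only: write a Hive-compatible DDL to a file.
# Assumptions: DATETIME -> TIMESTAMP, int -> BIGINT (to match the writer's
# column types); the output file name is hypothetical.
cat > create_dw_ft_barrage_record.hql <<'EOF'
CREATE TABLE IF NOT EXISTS dw_ft_barrage_record (
  p string COMMENT 'from os: 1,2,5,9',
  roomid string COMMENT 'room number',
  uid string COMMENT 'user id',
  categoryid string COMMENT 'category id',
  sendtime TIMESTAMP COMMENT 'send time',
  isarmy BIGINT COMMENT 'is paid poster',
  isreal BIGINT COMMENT 'is real user'
)
COMMENT 'barrage fact table (based on mongo_barrage)'
PARTITIONED BY (pt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF
echo "wrote create_dw_ft_barrage_record.hql"
```

The file can then be run with `hive -f create_dw_ft_barrage_record.hql`, from the database that the sync job writes into.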
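Note:
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' sets the column delimiter. It must match the delimiter configured in the hdfs-write plugin (fieldDelimiter); only then can the data be loaded.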
vim odpsToKylin
today=$(date +%Y%m%d)
hive<
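The script above is cut off after `hive<`. Purely as a sketch, and assuming its job is to add the current day's partition to dw_ft_barrage_record, a daily script might look something like this. The function name build_add_partition_sql is made up for illustration, and the snippet only prints the statement rather than running it.

```shell
#!/bin/sh
# Sketch only: build the ALTER TABLE statement for one partition value.
# Assumption: the daily job adds a pt=YYYYMMDD partition for the current day.
build_add_partition_sql() {
  # $1: partition value, e.g. 20240101
  printf "ALTER TABLE dw_ft_barrage_record ADD IF NOT EXISTS PARTITION (pt='%s');\n" "$1"
}

today=$(date +%Y%m%d)

# Print the statement for today's partition (dry run).
build_add_partition_sql "$today"

# To actually apply it, something like the following could be used:
#   hive -e "$(build_add_partition_sql "$today")"
```

ADD IF NOT EXISTS keeps the script safe to re-run on the same day.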
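The script above prepares the partition directories for future partitions. Historical partitions can be added along the lines of the code below.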
vim history.sh
alter table dw_ft_barrage_record add partition(pt='starttime');
alter table dw_ft_barrage_record add partition(pt="xxx");
alter table dw_ft_barrage_record add partition(pt="xxx");
alter table dw_ft_barrage_record add partition(pt="xxx");
alter table dw_ft_barrage_record add partition(pt="xxx");
alter table dw_ft_barrage_record add partition(pt="now");
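Writing one statement per day by hand gets tedious for long ranges. One possible way to generate them is sketched below. It assumes GNU date (for `date -d`) and pt values in YYYYMMDD form; the function name gen_history_partitions and the sample dates are placeholders, not values from the original setup.

```shell
#!/bin/sh
# Sketch only: print one ALTER TABLE statement per day, from $1 to $2 inclusive.
# Assumptions: GNU date is available; both arguments are YYYYMMDD.
gen_history_partitions() (
  d=$1
  end=$2
  while [ "$d" -le "$end" ]; do
    printf "ALTER TABLE dw_ft_barrage_record ADD IF NOT EXISTS PARTITION (pt='%s');\n" "$d"
    # Advance by one calendar day (handles month and year boundaries).
    d=$(date -d "$d +1 day" +%Y%m%d)
  done
)

# Dry run with placeholder dates: prints three statements.
gen_history_partitions 20240101 20240103
```

The output can be redirected to a file and run with `hive -f`, which makes it easy to review the statements first.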
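chmod 777 fileName
Set up a scheduled task for odpsToKylin by creating a cron job for the current user:
1. Run crontab -e to edit the crontab file.
2. Enter the following line (a user crontab takes five time fields, then the command):
0 0 * * * /bin/sh /home/qmbd/odpsToKylin.sh
3. Start the service. Normally this is /sbin/service crond start; for the root user's cron service you can use sudo service crond start.
4. Check that the crontab entry for this user was created: crontab -l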
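If you would rather not edit the crontab interactively, the entry can also be added from a script. The sketch below is one way to do it without creating duplicates. It uses five time fields, which is what a per-user crontab expects; the helper name merge_cron_line is made up for illustration, and the snippet itself only previews the result.

```shell
#!/bin/sh
# Sketch only: append the cron entry to an existing crontab exactly once.
CRON_LINE='0 0 * * * /bin/sh /home/qmbd/odpsToKylin.sh'

# Reads a crontab on stdin and prints it, adding CRON_LINE if it is missing.
merge_cron_line() {
  existing=$(cat)
  if printf '%s\n' "$existing" | grep -qxF -- "$CRON_LINE"; then
    printf '%s\n' "$existing"
  elif [ -n "$existing" ]; then
    printf '%s\n%s\n' "$existing" "$CRON_LINE"
  else
    printf '%s\n' "$CRON_LINE"
  fi
}

# Preview against an empty crontab.
printf '' | merge_cron_line

# To install for the current user:
#   crontab -l 2>/dev/null | merge_cron_line | crontab -
```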
{
  "configuration": {
    "reader": {
      "plugin": "odps",
      "parameter": {
        "partition": "pt=${bdp.system.bizdate}",
        "datasource": "odps_first",
        "column": ["*"],
        "table": "dw_ft_barrage_record"
      }
    },
    "writer": {
      "plugin": "hdfs",
      "parameter": {
        "path": "/user/hive/warehouse/dw.db/dw_ft_barrage_record/pt=${bdp.system.bizdate}",
        "fileName": "dw_ft_barrage_record",
        "compress": "GZIP",
        "column": [
          {"name": "p", "type": "string"},
          {"name": "roomid", "type": "string"},
          {"name": "uid", "type": "string"},
          {"name": "categoryid", "type": "string"},
          {"name": "sendtime", "type": "timestamp"},
          {"name": "isarmy", "type": "bigint"},
          {"name": "isreal", "type": "bigint"}
        ],
        "defaultFS": "hdfs://10.7.20.15:8020",
        "writeMode": "append",
        "fieldDelimiter": ",",
        "encoding": "UTF-8",
        "fileType": "text"
      }
    },
    "setting": {
      "errorLimit": {"record": "0"},
      "speed": {"concurrent": "10", "mbps": "20"}
    }
  },
  "type": "job",
  "version": "1.0"
}
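Note: the above is a demo for syncing a partitioned table. For a non-partitioned table, start from the same config and make sure the following holds:
3. Remove pt=${bdp.system.bizdate} from the hdfs-write plugin config.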
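1. Create the table carefully.
2. Configure the plugins carefully.
3. The scheduled (partition-creating) task must run before the sync task.
4. The command in the scheduled task must start at the very beginning of the line, with no leading whitespace.
5. Adjust the column types in the hdfs-write plugin to match the column types Hive supports.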