- Reading and writing ES data from Hive: http://blog.csdn.net/u013063153/article/details/60757307
- Official documentation, Hive integration with ES: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html#hive-type-conversion
- Using the Hive complex data types array, map, and struct: http://blog.csdn.net/gamer_gyt/article/details/52169441
- To map ES data types onto Hive types, use Hive complex data types such as map and array to hold nested JSON from ES.
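-- Register the elasticsearch-hadoop Hive jar for this session.
add jar file:///home/liuxiaowen/elasticsearch-hadoop-2.2.0-beta1/dist/elasticsearch-hadoop-hive-2.2.0-beta1.jar;
-- The 'es.nodes' value below is kept as it appears in the source (the host list is elided); set it to your Elasticsearch nodes.
-- 'es.mapping.names' maps each Hive column to its Elasticsearch field; cookieid is taken from the document _id exposed by 'es.read.metadata'.
CREATE EXTERNAL TABLE lxw1234_es_tags (
cookieid string,
area string,
media_view_tags string,
interest string,
userInfo map<string,string>
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
'es.nodes' = ',',
'es.index.auto.create' = 'false',
'es.resource' = 'lxw1234/tags',
'es.read.metadata' = 'true',
'es.mapping.names' = 'cookieid:_metadata._id, area:area, media_view_tags:media_view_tags, interest:interest, userInfo:userInfo');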
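Once the table exists, the map column can be queried like any other Hive map. The following is a minimal sketch, assuming the `lxw1234_es_tags` table above has been created successfully; the key `age` is a hypothetical entry in `userInfo`, chosen only for illustration.

```sql
-- Sketch only: 'age' is an assumed key inside the userInfo map.
SELECT cookieid,
       area,
       userInfo['age'] AS user_age,    -- look up a single key in the map column
       size(userInfo)  AS info_fields  -- number of key/value pairs in the map
FROM lxw1234_es_tags
WHERE userInfo['age'] IS NOT NULL
LIMIT 10;

-- Flatten the map into one row per key/value pair.
SELECT cookieid, info_key, info_value
FROM lxw1234_es_tags
LATERAL VIEW explode(userInfo) info AS info_key, info_value
LIMIT 10;
```

Looking up a key that is absent from the map returns NULL, which is why the first query filters on `IS NOT NULL`.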
1. es.resource
Elasticsearch resource location, where data is read from and written to. Requires the format <index>/<type>, relative to the Elasticsearch host/port (see below).
es.resource = twitter/tweet # index 'twitter', type 'tweet'
es.resource.read (defaults to es.resource)
Elasticsearch resource used for reading (but not writing) data. Useful when reading and writing data to different Elasticsearch indices within the same job. Typically set automatically (except for the Map/Reduce module which requires manual configuration).
es.resource.write (defaults to es.resource)
Elasticsearch resource used for writing (but not reading) data. Used typically for dynamic resource writes or when writing and reading data to different Elasticsearch indices within the same job. Typically set automatically (except for the Map/Reduce module which requires manual configuration).
Note that multiple indices and/or types are allowed only for reading. Use _all/types to search types in all indices or index/ to search all types within index. Do note that reading multiple indices/types typically works only when they have the same structure and only with some libraries. Integrations that require a strongly typed mapping (such as a table like Hive or SparkSQL) are likely to fail.
'es.resource' = '_all/types'
'es.resource' = 'index/'
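In Hive, reading from one index and writing to another within the same job can be expressed with two external tables, each carrying its own `es.resource`. The following is a sketch under stated assumptions: the table names `es_tags_source` and `es_tags_copy` and the target index `lxw1234_copy` are hypothetical, and `es.nodes` is omitted, so the connector falls back to its default host.

```sql
-- Source table: reads from the existing lxw1234/tags resource.
CREATE EXTERNAL TABLE es_tags_source (
  cookieid string,
  area     string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'lxw1234/tags');

-- Target table: writes to a different (hypothetical) index.
CREATE EXTERNAL TABLE es_tags_copy (
  cookieid string,
  area     string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'lxw1234_copy/tags',
               'es.index.auto.create' = 'true');

-- One job that reads from the first index and writes to the second.
INSERT OVERWRITE TABLE es_tags_copy
SELECT cookieid, area
FROM es_tags_source;
```

The write target names a single index and type, since multiple indices or types are allowed only for reading.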
2. es.query (default none)
Holds the query used for reading data from the specified es.resource. By default it is not set/empty, meaning the entire data under the specified index/type is returned. es.query can have three forms:
uri query
using the form ?uri_query, one can specify a query string. Notice the leading ?.
query dsl
using the form query_dsl; note that the query dsl needs to start with { and end with }.
external resource
if neither of the two forms above matches, elasticsearch-hadoop will try to interpret the parameter as a path within the HDFS file-system. If that is not the case, it will try to load the resource from the classpath or, if that fails, from the Hadoop DistributedCache. The resource should contain either a uri query or a query dsl.
1) Query by URI
es.query = ?q=98E5D2DE059F1D563D8565
2) Query by DSL
es.query = { "query" : { "term" : { "user" : "costinl" } } }
'es.query'='{"query": {"match_all": { }}}',
3) Query from an external resource
es.query = org/mypackage/myquery.json
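In a Hive table, `es.query` is passed as one more entry in `TBLPROPERTIES`. The following is a minimal sketch, assuming the `lxw1234/tags` resource from earlier; the table name `lxw1234_es_tags_filtered` and the value `beijing` are hypothetical, and `es.nodes` is omitted, so the connector falls back to its default host.

```sql
-- Sketch only: the table name and the filter value are illustrative.
CREATE EXTERNAL TABLE lxw1234_es_tags_filtered (
  cookieid string,
  area     string
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.resource' = 'lxw1234/tags',
  -- URI form: note the leading '?'.
  'es.query'    = '?q=area:beijing'
  -- DSL form would instead be, for example:
  --   'es.query' = '{"query": {"term": {"area": "beijing"}}}'
  -- External-resource form would point at a file holding either of the above:
  --   'es.query' = 'org/mypackage/myquery.json'
);

-- Only documents matching es.query are visible through this table.
SELECT cookieid, area FROM lxw1234_es_tags_filtered LIMIT 10;
```

Because the query is fixed in the table definition, a different filter needs either a second table or an `ALTER TABLE ... SET TBLPROPERTIES` on this one.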