Impala

Setup

Download and run

Download a pre-compiled Impala distribution, start the Impala daemons (impalad, statestored, catalogd), and connect with impala-shell.


Overview of Impala Tables

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_tables.html

Tables are the primary containers for data in Impala. They have the familiar row and column layout similar to other database systems, plus some features such as partitioning often associated with higher-end data warehouse systems.

Logically, each table has a structure based on the definition of its columns, partitions, and other properties.

Physically, each table that uses HDFS storage is associated with a directory in HDFS. The table data consists of all the data files underneath that directory:

  • Internal tables are managed by Impala, and use directories inside the designated Impala work area.

  • External tables use arbitrary HDFS directories, where the data files are typically shared between different Hadoop components.

  • Large-scale data is usually handled by partitioned tables, where the data files are divided among different HDFS subdirectories.

  • Table format: how Impala works with Hadoop file formats.
    Each table in Impala has an associated file format:
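
The three table types above can be illustrated with CREATE TABLE statements; all table names and HDFS paths below are hypothetical:

```sql
-- Internal (managed) table: Impala owns the data files;
-- DROP TABLE also deletes the underlying HDFS directory.
CREATE TABLE page_views (
  url    STRING,
  views  BIGINT
);

-- External table: data lives in an arbitrary HDFS directory,
-- typically shared with Hive, Spark, or other components.
-- DROP TABLE removes only the metadata, not the files.
CREATE EXTERNAL TABLE raw_events (
  event_id STRING,
  payload  STRING
)
LOCATION '/shared/raw_events';

-- Partitioned table: data files are split across one HDFS
-- subdirectory per partition value, e.g. .../year=2016/month=1/
CREATE TABLE daily_logs (
  message STRING
)
PARTITIONED BY (year INT, month INT);
```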

Different file formats and compression codecs work better for different data sets. While Impala typically provides performance gains regardless of file format, choosing the proper format for your data can yield further performance improvements. Use the following considerations to decide which combination of file format and compression to use for a particular table.
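For example, the file format is chosen per table with a STORED AS clause, and the compression codec used by subsequent inserts can be set per session; the table names and codec below are illustrative:

```sql
-- Parquet: columnar format, usually the best choice for
-- large analytic scans in Impala.
CREATE TABLE metrics_parquet (ts TIMESTAMP, value DOUBLE)
STORED AS PARQUET;

-- Text: the default format, human-readable but slower to scan.
CREATE TABLE metrics_text (ts TIMESTAMP, value DOUBLE)
STORED AS TEXTFILE;

-- Compression codec applied to INSERT statements in this session.
SET COMPRESSION_CODEC=snappy;
```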

Create table
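
A minimal sketch, assuming a comma-delimited text file will be loaded in the next step; the database, table, and column names are illustrative:

```sql
CREATE DATABASE IF NOT EXISTS demo;
USE demo;

-- Text table with an explicit field delimiter, suitable for
-- loading CSV-style files from HDFS.
CREATE TABLE users (
  id   BIGINT,
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Verify the definition.
DESCRIBE users;
```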

Load data
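
Data can be brought in either by moving existing HDFS files or by inserting rows; the staging path and the `users` / `users_parquet` tables below are hypothetical:

```sql
-- Move files already in HDFS into the table's directory
-- (the source files are moved, not copied).
LOAD DATA INPATH '/staging/users.csv' INTO TABLE users;

-- Or insert rows directly (fine for small tests, but each
-- INSERT ... VALUES statement produces a tiny data file).
INSERT INTO users VALUES (1, 'alice', 30), (2, 'bob', 25);

-- Or copy from another table, e.g. to convert formats.
INSERT INTO users_parquet SELECT * FROM users;
```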

Show table statistics
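
A sketch using the hypothetical `users` table from above:

```sql
-- Gather table and column statistics so the query planner
-- can choose good join orders and degrees of parallelism.
COMPUTE STATS users;

-- Per-table / per-partition statistics (row counts, file sizes).
SHOW TABLE STATS users;

-- Per-column statistics (distinct values, max/avg size).
SHOW COLUMN STATS users;
```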

Query
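
For example, against the hypothetical `users` table:

```sql
-- An ordinary aggregate query.
SELECT age, COUNT(*) AS n
FROM users
GROUP BY age
ORDER BY n DESC
LIMIT 10;

-- Inspect the plan (and whether statistics are available)
-- before running an expensive query.
EXPLAIN SELECT COUNT(*) FROM users;
```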


Impala with Parquet

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_parquet.html#parquet_performance
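
A common pattern from the page above is converting an existing table to Parquet with a single INSERT ... SELECT; the table names here are hypothetical:

```sql
-- Clone the schema of an existing table, but store as Parquet.
CREATE TABLE users_parquet LIKE users STORED AS PARQUET;

-- Rewrite the data in Parquet format.
INSERT OVERWRITE users_parquet SELECT * FROM users;

-- Snappy (Impala's default for Parquet) trades some
-- compression ratio for faster decompression.
SET COMPRESSION_CODEC=snappy;
```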

