Impala

Setup

Download and run

Download a pre-compiled Impala distribution, start the Impala daemons (impalad, statestored, catalogd), and connect with impala-shell.


Overview of Impala Tables

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_tables.html

Tables are the primary containers for data in Impala. They have the familiar row and column layout similar to other database systems, plus some features such as partitioning often associated with higher-end data warehouse systems.

Logically, each table has a structure based on the definition of its columns, partitions, and other properties.

Physically, each table that uses HDFS storage is associated with a directory in HDFS. The table data consists of all the data files underneath that directory:

  • Internal tables are managed by Impala, and use directories inside the designated Impala work area.

  • External tables use arbitrary HDFS directories, where the data files are typically shared between different Hadoop components.

  • Large-scale data is usually handled by partitioned tables, where the data files are divided among different HDFS subdirectories.

  • Table format: how Impala works with Hadoop file formats.
    Each table in Impala has an associated file format:
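
The three table types above can be illustrated with CREATE TABLE statements; all table names and HDFS paths below are hypothetical:

```sql
-- Internal (managed) table: Impala owns the data files;
-- DROP TABLE also deletes the underlying HDFS directory.
CREATE TABLE page_views (
  url    STRING,
  views  BIGINT
);

-- External table: data lives in an arbitrary HDFS directory,
-- typically shared with Hive, Spark, or other components.
-- DROP TABLE removes only the metadata, not the files.
CREATE EXTERNAL TABLE raw_events (
  event_id STRING,
  payload  STRING
)
LOCATION '/shared/raw_events';

-- Partitioned table: data files are split across one HDFS
-- subdirectory per partition value, e.g. .../year=2016/month=1/
CREATE TABLE daily_logs (
  message STRING
)
PARTITIONED BY (year INT, month INT);
```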

Different file formats and compression codecs work better for different data sets. While Impala typically provides performance gains regardless of file format, choosing the proper format for your data can yield further performance improvements. Use the following considerations to decide which combination of file format and compression to use for a particular table.
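For example, the file format is chosen per table with a STORED AS clause, and the compression codec used by subsequent inserts can be set per session; the table names and codec below are illustrative:

```sql
-- Parquet: columnar format, usually the best choice for
-- large analytic scans in Impala.
CREATE TABLE metrics_parquet (ts TIMESTAMP, value DOUBLE)
STORED AS PARQUET;

-- Text: the default format, human-readable but slower to scan.
CREATE TABLE metrics_text (ts TIMESTAMP, value DOUBLE)
STORED AS TEXTFILE;

-- Compression codec applied to INSERT statements in this session.
SET COMPRESSION_CODEC=snappy;
```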

Create table
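
A minimal sketch, assuming a comma-delimited text file will be loaded in the next step; the database, table, and column names are illustrative:

```sql
CREATE DATABASE IF NOT EXISTS demo;
USE demo;

-- Text table with an explicit field delimiter, suitable for
-- loading CSV-style files from HDFS.
CREATE TABLE users (
  id   BIGINT,
  name STRING,
  age  INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Verify the definition.
DESCRIBE users;
```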

Load data
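
Data can be brought in either by moving existing HDFS files or by inserting rows; the staging path and the `users` / `users_parquet` tables below are hypothetical:

```sql
-- Move files already in HDFS into the table's directory
-- (the source files are moved, not copied).
LOAD DATA INPATH '/staging/users.csv' INTO TABLE users;

-- Or insert rows directly (fine for small tests, but each
-- INSERT ... VALUES statement produces a tiny data file).
INSERT INTO users VALUES (1, 'alice', 30), (2, 'bob', 25);

-- Or copy from another table, e.g. to convert formats.
INSERT INTO users_parquet SELECT * FROM users;
```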

Show table statistics
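
A sketch using the hypothetical `users` table from above:

```sql
-- Gather table and column statistics so the query planner
-- can choose good join orders and degrees of parallelism.
COMPUTE STATS users;

-- Per-table / per-partition statistics (row counts, file sizes).
SHOW TABLE STATS users;

-- Per-column statistics (distinct values, max/avg size).
SHOW COLUMN STATS users;
```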

Query
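
For example, against the hypothetical `users` table:

```sql
-- An ordinary aggregate query.
SELECT age, COUNT(*) AS n
FROM users
GROUP BY age
ORDER BY n DESC
LIMIT 10;

-- Inspect the plan (and whether statistics are available)
-- before running an expensive query.
EXPLAIN SELECT COUNT(*) FROM users;
```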


Impala with Parquet

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/impala_parquet.html#parquet_performance
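
A common pattern from the page above is converting an existing table to Parquet with a single INSERT ... SELECT; the table names here are hypothetical:

```sql
-- Clone the schema of an existing table, but store as Parquet.
CREATE TABLE users_parquet LIKE users STORED AS PARQUET;

-- Rewrite the data in Parquet format.
INSERT OVERWRITE users_parquet SELECT * FROM users;

-- Snappy (Impala's default for Parquet) trades some
-- compression ratio for faster decompression.
SET COMPRESSION_CODEC=snappy;
```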

