Impala Usage Guide
Updated at: 2025-11-03
Impala
Impala is an MPP (massively parallel processing) SQL query engine designed for handling large datasets stored in Hadoop clusters. It is open-source software written in C++ and Java. Compared with other SQL engines for Hadoop, Impala offers higher performance and lower query latency.
Installation steps
Install metastore
Refer to the “Presto Access Based on S3” section of the Presto User Guide to install and configure the metastore.
Install Impala
- Download the RPM repository tarball from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz
After extracting it with tar -zxvf cdh5.14.0-centos6.tar.gz, change into the repository directory with cd cdh/5.14.0 and start a local HTTP server there:
Bash
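# Serve the current directory on port 8092 (Python 2 module; with Python 3 use: python3 -m http.server 8092)
python -m SimpleHTTPServer 8092 &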
- Configure the local YUM repository
Bash
vim /etc/yum.repos.d/localimp.repo
[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
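If editing the file interactively is inconvenient, the same repository definition can be written from a script. The following is a minimal sketch: the function name write_local_repo and its two parameters are illustrative and not part of the original procedure.

```shell
#!/usr/bin/env bash
# Write a yum repository definition that points at a local HTTP server.
# Arguments (both hypothetical helper parameters): <repo_file> <port>
write_local_repo() {
  local repo_file="$1" port="$2"
  cat > "$repo_file" <<EOF
[localimp]
name=localimp
baseurl=http://127.0.0.1:${port}/
gpgcheck=0
enabled=1
EOF
}

# Example (needs root):
#   write_local_repo /etc/yum.repos.d/localimp.repo 8092
#   yum clean all && yum repolist enabled
```

Running yum repolist enabled afterwards should list localimp if the local HTTP server is reachable.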
- Install the software using the following command.
Bash
yum install -y impala impala-server impala-state-store impala-catalog impala-shell
- Copy the Hive configuration file (metastore-site.xml) to the Impala configuration path:
Bash
# Copy the configured metastore-site.xml to /etc/impala/conf/ as hive-site.xml
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
- Add the S3 configuration to /etc/impala/conf/core-site.xml (for example with vim /etc/impala/conf/core-site.xml), referring to the impala-s3 configuration:
XML
<configuration>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
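The values ${bucket}, ${AK} and ${SK} above are placeholders that must be replaced with real values before Impala is started. As a rough sketch, a small check such as the following can catch placeholders that were left in by mistake. The function name check_placeholders is hypothetical.

```shell
#!/usr/bin/env bash
# Print every line of the given file that still contains a literal "${"
# and return 1 if any is found; return 0 otherwise.
check_placeholders() {
  local file="$1"
  if grep -n -F '${' "$file"; then
    echo "unresolved placeholders in $file" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_placeholders /etc/impala/conf/core-site.xml
```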
- Modify the Bigtop configuration: set JAVA_HOME in /etc/default/bigtop-utils and make sure the impala user has permission to access that path. Apply this change on all 3 machines.
Bash
vim /etc/default/bigtop-utils
export JAVA_HOME=/export/servers/jdk1.8.0_65
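A quick way to confirm that the configured JAVA_HOME is usable is to test for an executable java binary under it. This is only a sketch: the helper name check_java_home is an assumption, and the commented sudo line assumes the service account is named impala.

```shell
#!/usr/bin/env bash
# Return 0 if <java_home>/bin/java exists and is executable by the current user.
check_java_home() {
  local java_home="$1"
  if [ ! -x "${java_home}/bin/java" ]; then
    echo "no executable java under ${java_home}" >&2
    return 1
  fi
}

# Example:
#   check_java_home /export/servers/jdk1.8.0_65
# To run the same test as the impala user (assumed account name):
#   sudo -u impala test -x /export/servers/jdk1.8.0_65/bin/java
```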
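- Create a symbolic link for the MySQL driver. The link target below is relative, so it is resolved against /usr/share/java and the JAR is expected to be in that directory: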
Bash
ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
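If the driver JAR lives somewhere else, linking to its absolute path avoids a dangling link. The sketch below is one possible variant, not the original procedure: link_driver is a hypothetical helper, it relies on readlink -f, and unlike the plain ln -s above it overwrites an existing link.

```shell
#!/usr/bin/env bash
# Create (or replace) a symlink at <link> pointing to the absolute path of <jar>,
# then fail if the resulting link does not resolve to an existing file.
link_driver() {
  local jar="$1" link="$2"
  ln -sfn "$(readlink -f "$jar")" "$link"
  [ -e "$link" ]
}

# Example (the source directory is an assumption):
#   link_driver /opt/drivers/mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar
```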
- Start Impala
Bash
service impala-state-store start
service impala-catalog start
service impala-server start
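The three services can also be started from a small script that keeps the order shown above and stops at the first failure. This is a sketch only: start_impala_services and the SERVICE_CMD override are illustrative names, introduced so that the loop can be dry-run with echo.

```shell
#!/usr/bin/env bash
# Command used to control services; override for a dry run, e.g. SERVICE_CMD=echo.
SERVICE_CMD="${SERVICE_CMD:-service}"

# Start state store, catalog and server in that order; stop at the first failure.
start_impala_services() {
  local svc
  for svc in impala-state-store impala-catalog impala-server; do
    if ! "$SERVICE_CMD" "$svc" start; then
      echo "failed to start $svc" >&2
      return 1
    fi
  done
}

# Example (needs root):
#   start_impala_services
```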
After starting, you can check the logs in the /var/log/impala directory.
Run the impala-shell command:
Bash
[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan 6 13:27:16 PST 2018)
When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.
Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
You can see the newly generated files in the corresponding path:
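One way to inspect those files and read the rows back without an interactive session is sketched below. It assumes that the table directory is the database location followed by the table name, and that the hadoop client on the node uses the same S3 settings. The function names and the HADOOP_CMD and IMPALA_SHELL_CMD overrides are hypothetical and exist only to allow a dry run.

```shell
#!/usr/bin/env bash
# Commands to invoke; override for a dry run, e.g. HADOOP_CMD=echo.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"
IMPALA_SHELL_CMD="${IMPALA_SHELL_CMD:-impala-shell}"

# List the files under <db_location>/<table> (assumed directory layout).
list_table_files() {
  local db_location="$1" table="$2"
  "$HADOOP_CMD" fs -ls "${db_location%/}/${table}"
}

# Run a single SELECT non-interactively: -i selects the impalad, -q passes the query.
select_all() {
  local impalad="$1" table="$2"
  "$IMPALA_SHELL_CMD" -i "$impalad" -q "select * from ${table}"
}

# Example:
#   list_table_files s3a://my-bigdata/impala/s3 hive_test
#   select_all my-node:21000 db_on_s3.hive_test
```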

