Impala Usage Guide

Updated at: 2025-11-03

Impala

Impala is an MPP (Massively Parallel Processing) SQL query engine for querying large datasets stored in Hadoop clusters. It is open-source software written in C++ and Java. Compared with batch-oriented SQL engines for Hadoop, such as Hive on MapReduce, Impala is designed for low-latency, interactive queries.

Installation steps

Install metastore

Install and configure the metastore by following the “Presto Access Based on S3” section of the Presto User Guide.

Install Impala

Download the CDH 5.14.0 repository tarball, which contains the RPM packages, from http://archive.cloudera.com/cdh5/repo-as-tarball/5.14.0/cdh5.14.0-centos6.tar.gz. Extract it with tar -zxvf cdh5.14.0-centos6.tar.gz, change to the cdh/5.14.0 directory (cd cdh/5.14.0), and start a local HTTP server there to serve the packages:

Bash

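python -m SimpleHTTPServer 8092 &

SimpleHTTPServer is a Python 2 module. On a host that only has Python 3, run python3 -m http.server 8092 & instead.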

Configure the local YUM repository by creating /etc/yum.repos.d/localimp.repo with the following content:

Bash

vim /etc/yum.repos.d/localimp.repo

[localimp]
name=localimp
baseurl=http://127.0.0.1:8092/
gpgcheck=0
enabled=1
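If the same repository file has to be created on several machines, writing it non-interactively avoids repeated editing in vim. The following is a minimal sketch; the function name write_local_repo and its two optional arguments are illustrative assumptions, and the defaults mirror the file above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes the local YUM repository definition shown above.
# $1: output file (default: /etc/yum.repos.d/localimp.repo)
# $2: base URL of the local HTTP server (default: http://127.0.0.1:8092/)
write_local_repo() {
  local repo_file="${1:-/etc/yum.repos.d/localimp.repo}"
  local base_url="${2:-http://127.0.0.1:8092/}"
  # Unquoted EOF so that ${base_url} is expanded into the file.
  cat > "$repo_file" <<EOF
[localimp]
name=localimp
baseurl=${base_url}
gpgcheck=0
enabled=1
EOF
}
```

For example, run write_local_repo as root on the machine that serves the packages; on other machines, pass the serving host's address as the second argument (write_local_repo /etc/yum.repos.d/localimp.repo http://<repo-host>:8092/, where <repo-host> is a placeholder). Afterwards, yum clean all followed by yum repolist should list the localimp repository.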
            

Install the software using the following command.

Bash

yum install -y impala impala-server impala-state-store impala-catalog impala-shell

Copy the metastore configuration file (metastore-site.xml) into the Impala configuration directory, renaming it to hive-site.xml:

Bash

# Copy the configured file to /etc/impala/conf/
cp metastore/conf/metastore-site.xml /etc/impala/conf/hive-site.xml
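Hive clients, including Impala, locate a remote metastore through the hive.metastore.uris property in hive-site.xml, so it can be worth confirming that the copied file defines it. If the file only sets the address under another key (for example metastore.thrift.uris, which the standalone Hive metastore uses), Impala may not pick it up. The sketch below is a line-based check rather than a real XML parser and assumes the usual one-element-per-line layout; the function name check_metastore_uris is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the hive.metastore.uris value from a
# hive-site.xml file, or fails if the property is not defined.
# Line-based match only; assumes <name> comes before its <value>.
# $1: config file (default: /etc/impala/conf/hive-site.xml)
check_metastore_uris() {
  local conf="${1:-/etc/impala/conf/hive-site.xml}"
  local uris
  uris=$(awk '
    /<name>hive\.metastore\.uris<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, ""); sub(/<\/value>.*/, ""); print; exit
    }' "$conf")
  if [ -z "$uris" ]; then
    echo "hive.metastore.uris is not set in $conf" >&2
    return 1
  fi
  echo "$uris"
}
```

Running check_metastore_uris without arguments reads /etc/impala/conf/hive-site.xml and prints the configured URI, or returns an error if the property is missing.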

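Add the S3 settings to /etc/impala/conf/core-site.xml (for example, with vim /etc/impala/conf/core-site.xml); for details, refer to the impala-s3 configuration documentation. Replace ${bucket}, ${AK} and ${SK} with your bucket name, Access Key and Secret Key: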

XML

<configuration>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.azure.user.agent.prefix</name>
    <value>User-Agent: APN/1.0 Hortonworks/1.0 HDP/None</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://${bucket}</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>s3.bj.bcebos.com</value>
    <description>endpoint</description>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>${AK}</value>
    <description>AK</description>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>${SK}</value>
    <description>SK</description>
  </property>
</configuration>
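Because ${bucket}, ${AK} and ${SK} are placeholders, you may prefer to render the file from environment variables rather than editing it by hand. The sketch below assumes the hypothetical variable names BOS_BUCKET, BOS_AK and BOS_SK, escapes XML special characters in the values, and writes only the S3-related properties from the example above (the fs.azure.user.agent.prefix property is left out because it applies to the Azure connector, not S3A).

```bash
#!/usr/bin/env bash
# Hypothetical helper: renders core-site.xml from environment variables
# instead of hand-editing the bucket, AK and SK placeholders.
# Assumed (illustrative) variables: BOS_BUCKET, BOS_AK, BOS_SK.

# Escape the characters that are special in XML text content.
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# $1: output file (default: /etc/impala/conf/core-site.xml)
# $2: S3 endpoint (default: s3.bj.bcebos.com, as in the example above)
render_core_site() {
  local out="${1:-/etc/impala/conf/core-site.xml}"
  local endpoint="${2:-s3.bj.bcebos.com}"
  # Refuse to write a file with missing values.
  if [ -z "${BOS_BUCKET:-}" ] || [ -z "${BOS_AK:-}" ] || [ -z "${BOS_SK:-}" ]; then
    echo "BOS_BUCKET, BOS_AK and BOS_SK must be set" >&2
    return 1
  fi
  cat > "$out" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>s3a://$(xml_escape "$BOS_BUCKET")</value>
  </property>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>$(xml_escape "$endpoint")</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>$(xml_escape "$BOS_AK")</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>$(xml_escape "$BOS_SK")</value>
  </property>
  <property>
    <name>fs.s3a.block.size</name>
    <value>134217728</value>
  </property>
  <property>
    <name>fs.s3a.connection.maximum</name>
    <value>1500</value>
  </property>
</configuration>
EOF
}
```

Because the generated file holds the secret key in plain text, keep it readable only by root and the impala user.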
            

Modify the Bigtop configuration on all three machines: set JAVA_HOME in /etc/default/bigtop-utils to your JDK path, and make sure the impala user has permission to access that JDK.

Bash

vim /etc/default/bigtop-utils
# Add the following line, adjusting the path to your JDK installation:
export JAVA_HOME=/export/servers/jdk1.8.0_65
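To confirm on each machine that the configured JAVA_HOME points to a usable JDK, a small check like the following can be run. The function name check_bigtop_java is hypothetical; it sources the defaults file in a subshell so the caller's environment is not modified.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verifies that JAVA_HOME set in a Bigtop defaults file
# points to a directory containing an executable bin/java.
# $1: defaults file (default: /etc/default/bigtop-utils)
check_bigtop_java() {
  local defaults_file="${1:-/etc/default/bigtop-utils}"
  (
    # Source in a subshell so JAVA_HOME does not leak into the caller.
    unset JAVA_HOME
    . "$defaults_file" || exit 1
    if [ -z "${JAVA_HOME:-}" ]; then
      echo "JAVA_HOME is not set in $defaults_file" >&2
      exit 1
    fi
    if [ ! -x "$JAVA_HOME/bin/java" ]; then
      echo "$JAVA_HOME/bin/java is missing or not executable" >&2
      exit 1
    fi
    echo "JAVA_HOME=$JAVA_HOME"
  )
}
```

This checks the path as the current user only. To check the impala user's access as well, you can run the JDK as that user, for example sudo -u impala /export/servers/jdk1.8.0_65/bin/java -version.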

Create a symbolic link for the MySQL JDBC driver:

Bash

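# A relative target is resolved from the link's directory (/usr/share/java),
# so the jar must be located there; otherwise use the jar's absolute path.
ln -s mysql-connector-java-5.1.32.jar /usr/share/java/mysql-connector-java.jar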
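ln -s does not check that the link target exists, so a wrong path silently produces a dangling link. The sketch below checks that the link resolves to an existing file; the helper name check_jar_link is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: checks that a symbolic link exists and resolves
# to an existing regular file.
# $1: link path (default: /usr/share/java/mysql-connector-java.jar)
check_jar_link() {
  local link="${1:-/usr/share/java/mysql-connector-java.jar}"
  if [ ! -L "$link" ]; then
    echo "$link is not a symbolic link" >&2
    return 1
  fi
  # [ -f ] follows the link, so it fails for a dangling link.
  if [ ! -f "$link" ]; then
    echo "$link is dangling (target: $(readlink "$link"))" >&2
    return 1
  fi
  echo "$link -> $(readlink -f "$link")"
}
```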

Start Impala

Bash

service impala-state-store start
service impala-catalog start
service impala-server start
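The same three commands can be wrapped in a loop that keeps the order shown above (statestore, catalog, then the Impala daemon) and stops at the first service that fails to start. This is a minimal sketch; the function name start_impala is hypothetical, and the SERVICE_CMD override exists only so the logic can be dry-run without root privileges.

```bash
#!/usr/bin/env bash
# Hypothetical helper: starts the Impala services in the order used above
# and stops at the first one that fails to start.
# SERVICE_CMD defaults to "service"; override it only for a dry run.
start_impala() {
  local svc
  for svc in impala-state-store impala-catalog impala-server; do
    if ! "${SERVICE_CMD:-service}" "$svc" start; then
      echo "failed to start $svc; check the logs in /var/log/impala" >&2
      return 1
    fi
  done
}
```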

After the services start, you can check their logs in the /var/log/impala directory. Then run impala-shell to connect and verify the setup:

Bash

[root@my-node impala]# impala-shell
Starting Impala Shell without Kerberos authentication
Connected to my-node:21000
Server version: impalad version 2.11.0-cdh5.14.0 RELEASE (build d68206561bce6b26762d62c01a78e6cd27aa7690)
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v2.11.0-cdh5.14.0 (d682065) built on Sat Jan  6 13:27:16 PST 2018)
When pretty-printing is disabled, you can use the '--output_delimiter' flag to set
the delimiter for fields in the same row. The default is ','.
***********************************************************************************
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 2 row(s) in 0.16s
[my-node:21000] > CREATE DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3';
Query: create DATABASE db_on_s3 LOCATION 's3a://my-bigdata/impala/s3'
WARNINGS: Path 's3a://my-bigdata/impala' cannot be reached: Path does not exist.
Fetched 0 row(s) in 2.51s
[my-node:21000] > show databases;
Query: show databases
+------------------+----------------------------------------------+
| name             | comment                                      |
+------------------+----------------------------------------------+
| _impala_builtins | System database for Impala builtin functions |
| db_on_s3         |                                              |
| default          | Default Hive database                        |
+------------------+----------------------------------------------+
Fetched 3 row(s) in 0.01s
[my-node:21000] > use db_on_s3;
Query: use db_on_s3
[my-node:21000] > create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Query: create table hive_test (a int, b string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
Fetched 0 row(s) in 2.11s
[my-node:21000] > insert into hive_test(a, b) values(1,'tom');
Query: insert into hive_test(a, b) values(1,'tom')
Query submitted at: 2023-09-13 19:20:26 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=ec4463f20d37dfe4:5192e94f00000000
Modified 1 row(s) in 7.57s
[my-node:21000] > insert into hive_test(a, b) values(2,'jerry');
Query: insert into hive_test(a, b) values(2,'jerry')
Query submitted at: 2023-09-13 19:20:42 (Coordinator: http://my-node:25000)
Query progress can be monitored at: http://my-node:25000/query_plan?query_id=694061adf492a154:4a24912d00000000
Modified 1 row(s) in 1.02s
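The same test can be scripted with impala-shell's -i (impalad host:port) and -f (query file) options, which is convenient for repeating it on another node. A minimal sketch, assuming the host name and S3 path from the session above; the function name run_s3_smoke_test and the IMPALA_SHELL override (used only for dry runs) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: writes the statements from the session above into a
# SQL file and runs it with impala-shell -f against the given impalad.
# $1: impalad host:port (default: my-node:21000, as in the session above)
# $2: S3 location for the database (default: the path used above)
run_s3_smoke_test() {
  local impalad="${1:-my-node:21000}"
  local location="${2:-s3a://my-bigdata/impala/s3}"
  local sql_file status
  sql_file=$(mktemp) || return 1
  cat > "$sql_file" <<EOF
CREATE DATABASE IF NOT EXISTS db_on_s3 LOCATION '${location}';
USE db_on_s3;
CREATE TABLE IF NOT EXISTS hive_test (a INT, b STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
INSERT INTO hive_test (a, b) VALUES (1, 'tom'), (2, 'jerry');
SELECT * FROM hive_test;
EOF
  "${IMPALA_SHELL:-impala-shell}" -i "$impalad" -f "$sql_file"
  status=$?
  # Remove the temporary query file and propagate impala-shell's exit code.
  rm -f "$sql_file"
  return "$status"
}
```

Thanks to IF NOT EXISTS, the script can be run more than once, although each run inserts the two rows again.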
            

The newly generated data files can then be found under the table's path in the bucket (by default, s3a://my-bigdata/impala/s3/hive_test).
