Variable

Last Updated：2021-11-02

The Variable in Palo refer to the variable settings in MySQL.

Users can set session level variables and globally effective scalars.

Some variables, however, are only used to be compatible with some MySQL client protocols, and do not have their practical significance in MySQL database.

Variable Setting and Viewing

Variable viewing

SHOW VARIABLES;
SHOW VARIABLES LIKE '%time_zone%';

Variable setting

Some variables can be set to be effective globally or only for the current session. After the setting of being effective globally, the setting value will be used in subsequent new session connections. The setting is only effective for the current session, and the variable will only affect the current session.

The setting of being effective for only the current session:

SET exec_mem_limit = 137438953472;
SET forward_to_master = true;
SET time_zone = "Asia/Shanghai";

The setting of being effective globally:

SET GLOBAL exec_mem_limit = 137438953472

Note 1： Only the user admin can set globally effective variables. Note 2： Globally effective variables only affect the variables in the new session rather than the variable values of the current session. The setting is effective after disconnecting and logging in again.

Variables that support both the current session and global effectiveness include:

time_zone
wait_timeout
sql_mode
is_report_success
query_timeout
exec_mem_limit
batch_size
parallel_fragment_exec_instance_num
parallel_exchange_instance_num
enable_fold_constant_by_be
enable_cost_based_join_reorder

Variables that support only global effectiveness include:

default_rowset_type

At the same time, constant expressions are supported for variable settings. For example:

SET exec_mem_limit = 10 * 1024 * 1024 * 1024;
SET forward_to_master = concat('tr', 'u', 'e');

Set Variables in Query Statements

We, in some scenarios, may need to set variables targeting certain queries. Session variables can be set in the query through SET_VAR prompt（effective within a single statement）. Examples are as follows:

SELECT /*+ SET_VAR(exec_mem_limit = 8589934592) */ name FROM people ORDER BY name;
SELECT /*+ SET_VAR(query_timeout = 1) */ sleep(3);

Note: Notes must start with /*+ and only must come after SELECT.

Supported Variables

SQL_AUTO_IS_NULL

Used for compatibility with JDBC connection pool C3P0. No practical effect.
auto_increment_increment

Used for compatibility with MySQL client. No practical effect.
autocommit

Used for compatibility with MySQL client. No practical effect.
batch_size

Used to specify the number of rows of a single data packet transmitted by each nodes during query execution. The number of rows in a data packet, by default, is 1024, which means that every 1024 rows of data generated by the source node will be packaged and sent to destination node.

A larger number of rows will enhance the query throughput in the scenario of scanning large data amount, but may increase the query delay in the scenario of small query. At the same time, it also increases the memory cost of the query. The recommended setting range is 1024 to 4096.
character_set_client

Used for compatibility with MySQL client. No practical effect.
character_set_connection

Used for compatibility with MySQL client. No practical effect.
character_set_results

Used for compatibility with MySQL client. No practical effect.
character_set_server

Used for compatibility with MySQL client. No practical effect.
codegen_level

Used to set LLVM codegen level（Currently not effective）.
collation_connection

Used for compatibility with MySQL client. No practical effect.
collation_database

Used for compatibility with MySQL client. No practical effect.
collation_server

Used for compatibility with MySQL client. No practical effect.
disable_colocate_join

Used to control whether to start Colocation Join function. The default value is false, indicating that the function is started. While true means that the function is prohibited. When this function is prohibited, query plan will not attempt to execute Colocation Join.
disable_streaming_preaggregations

Used to control whether to start streaming preaggregations. The default value is false, indicating that the function is started. It cannot be set currently and is started by default.
enable_cost_based_join_reorder

Whether to start Join Reorder optimization based on cost model. It is off by default. After starting, Palo will use a better planning method to determine the order of Join. The function is still experimental, though, for complex table join queries, the user can try to start the function to observe the optimization effect.
enable_fold_constant_by_be

Whether to evaluate constant expressions in SQL through ComputeNode. The default is false.

The Leader Node will calculate some constant expressions in SQL by default, which is helpful for query planner to perform query optimizations such as predicate push down and partition clipping for the calculation of constant expressions. The calculation ability of Leader Node, however, is limited, which can not support some complex constant expressions. At this time, the user can set this variable to true and Palo will use Compute Node to calculate the expression. However, this method will increase additional RPC between nodes. The performance impact on the cluster should be observed for high concurrency scenarios.
enable_insert_strict

Used to set whether strictmode is started when importing data through INSERT statement. The default is false, which means strict mode is off. Refer to Here for mode instructions.
enable_spilling

Used to set whether to start large data volume drop sorting. The default is false, which means the function is off. When the user does not specify the LIMIT condition of the ORDER BY clause, the drop sorting can be started with enable_spilling set to true simultaneously. When the function is started, temporary drop data will be stored in doris-scratch/ directory in BE data directory, and the temporary data will be cleared after the query is finished.

This function is mainly used to sort large amount of data with limited memory.

Note that this function is experimental and its stability can not be guaranteed. Please start it carefully.
exec_mem_limit

Used to set the memory limit for a single query. The default is 2GB, the unit isB/K/KB/M/MB/G/GB/T/TB/P/PB, and the default is B.

This parameter is used to limit the memory used in a single query plan instance in a query plan. A query plan may have multiple instances, and a BE node may execute one or more instances. Therefore, this parameter can not accurately limit the memory usage of a query in the whole cluster, nor can it accurately limit the memory usage of a query in a single BE node. The details need to be determined according to the generated query plan.

Often, only some blocking nodes (such as sorting node, aggregation node and Join node) will consume more memory, while other nodes (such as scan node) will not consume more memory because the data is streaming.

When Memory Exceed Limit error occurs, the user can try to increase the parameter exponentially, such as 4G, 8G, 16g, etc.

forward_to_master

User to set whether to forward some commands to Master FE node for execution. The default is false, which means no forwarding. Palo has several FE nodes, one of which is the Master node. Generally, the user can connect any FE node for full function operation. However, only the Master FE node can obtain the detailed information for some information viewing instructions.

SHOW BACKENDS; command, for example, if the user does not forward to the Master FE node, the user can only see some basic information, such as whether the node is alive or not, while if the user forward to the Master FE node, the user can get more detailed information, including the start time of the node, the last heartbeat time and so on.

Commands currently affected by this parameter are as follows:
1. SHOW FRONTENDS;
  
  Forward to Master to view the last heartbeat information.
2. SHOW BACKENDS;
  
  Forward to Master to view the start time, last heartbeat information and disk capacity information.
3. SHOW BROKER;
  
  Forward to Master to view the start time, last heartbeat information
4. SHOW TABLET;/ADMIN SHOW REPLICA DISTRIBUTION;/ADMIN SHOW REPLICA STATUS;
  
  Forward to Master to view the tablet information stored in Master FE metadata. Normally, the tablet information in different FE metadata should be consistent. Compare the differences between the current FE and Master FE metadata with this method when problems occur.
5. SHOW PROC;
  
  Forward to Master to view the relevant PROC information stored in the Master FE metadata. It is mainly used for metadata comparison.
init_connect

Used for compatibility with MySQL client. No practical effect.
interactive_timeout

Used for compatibility with MySQL client. No practical effect.
is_report_success

Used to set whether to view the profile of query. The default is false, which means no profile is required.

By default, BE will send profile to FE to view the error only when there is an error in the query. Normally ended queries do not send profile. Sending profile will cause some network overhead, which is adverse for high concurrent query scenarios. When the user wants to analyze the profile of a query, the user can set this variable to true and send the query. After the query, the user can view the profile on the web page of the currently connected FE:

fe_host:fe_http_port/query

The latest 100 items of profile queried when is_report_success is started will be displayed.
language

Used for compatibility with MySQL client. No practical effect.
license

Used to display the License of Palo. No other effects.
load_mem_limit

Used to specify the memory limit for the load operation. The default value is 0, which means that the variable is not used as the memory limit of the loading operation, but exec_mem_limit is used.

This variable is only used for INSERT operations. Because INSERT operation is designed for 2 parts: query and loading, and if the user does not set this variable, the memory limit of the query and loading operation isexec_mem_limit respectively. Otherwise, the memory limit of INSERT query is exec_mem_limit, and the loading limit is load_mem_limit.

Memory limits of other loading methods such as BROKER LOAD and STREAM LOAD still use exec_mem_limit.
lower_case_table_names

Used for compatibility with MySQL client. Cannot be set. The current table names in Palo are case sensitive by default.
max_allowed_packet

Used for compatibility with JDBC connection pool C3P0. No practical effect.
net_buffer_length

Used for compatibility with MySQL client. No practical effect.
net_read_timeout

Used for compatibility with MySQL client. No practical effect.
net_write_timeout

Used for compatibility with MySQL client. No practical effect.
parallel_exchange_instance_num

Used to set the number of exchange nodes used by an upper node to receive data from a lower node in the execution plan. The default value is - 1, which means that the number of exchange nodes is equal to the number of execution instances of the lower level nodes (the default behavior). When the setting is greater than 0 and less than the number of execution instances of lower level nodes, the number of exchange nodes is equal to the set value.

In a distributed query execution plan, the upper node usually has one or more exchange nodes to receive data from the execution instances of the lower level nodes on different BE. Usually, the number of exchange nodes is equal to the number of execution instances of the lower level nodes.

In some aggregation query scenarios, on DUPLICATE KEY detail model, for example, if the amount of data that needs to be scanned at the bottom is large, but the amount of data after aggregation is small, the user can try to modify this variable to a smaller value to reduce the resource cost of such queries.
parallel_fragment_exec_instance_num

For Scan nodes, Used to set the number of execution instances on each BE node for Scan nodes. The default is 1.

Usually, a query plan generates a set of scan ranges, which are the data ranges to be scanned. These data are distributed on multiple BE nodes and a BE node will have one or more scan ranges. A set of scan ranges for each BE node, by default, is handled by only one execution instance. To improve the query efficiency, this variable can be added, when the machine resources are abundant, to allow more execution instances to process a set of scan ranges simultaneously.

The number of scan instances determines the number of other execution nodes in the upper layer, such as aggregation nodes and join nodes. It is, therefore, equivalent to increasing the concurrency of the whole query plan execution. Modifying this parameter will improve the efficiency of large query, but larger values will consume more machine resources, such as CPU, memory, disk IO.
query_cache_size

Used for compatibility with MySQL client. No practical effect.
query_cache_type

Used for compatibility with JDBC connection pool C3P0. No practical effect.
query_timeout

Used to set the query timeout. This variable will act on all the query statements like INSERT statement in the current connection. The default is 5 minutes in seconds.
resource_group

Out of use for the time being.
sql_mode

Used to specify the SQL mode to accommodate certain SQL dialects. Refer to Here for SQL mode.
sql_safe_updates

Used for compatibility with MySQL client. No practical effect.
sql_select_limit

Used for compatibility with MySQL client. No practical effect.
system_time_zone

Use to display the current system time zone. It is unalterable.
time_zone

Used to set the time zone, which affects the results of some time functions, for the current session. Refer to Here for time zone.
tx_isolation

Used for compatibility with MySQL client. No practical effect.
version

Used for compatibility with MySQL client. No practical effect.
performance_schema

Used for compatibility with MySQL JDBC with 8.0.16 and above. No practical effect.
version_comment

Used to display Palo version. It is unalterable.
wait_timeout

Used to set the connection duration of idle connections. Palo will initiatively disconnect the connection when an idle connection has no interaction with Palo during the period. The default is 8 hours in seconds.
rewrite_count_distinct_to_bitmap_hll

Used to decide whether to rewrite count distinct queries of type bitmap and hll as bitmap_union_count and hll_union_agg.
prefer_join_method

Used to decide that when selecting the specific implementation mode of a join as broadcast join or shuffle join and if broadcast join cost and shuffle join cost are equal, which join mode is preferred.

The currently optional values of this variable are broadcast or shuffle.

Resource Management

Time Zone

百度智能云

Data Warehouse