Using Hints to Adjust the Join Shuffle Method

Last updated: 2025-08-21

Overview

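PALO supports using hints to change the type of data shuffle performed in a Join, which can improve query performance. This section describes how to use hints in PALO to specify the Join shuffle type.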

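Note: PALO works well out of the box. In the vast majority of scenarios it adaptively optimizes performance, so users do not need to control hints manually to tune their workloads. This section is aimed mainly at tuning specialists; business users only need a passing familiarity with it.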

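Currently, PALO supports two independent Distribute Hints, [shuffle] and [broadcast], which specify the Distribute Type of the right table in a Join. The Distribute Type is written in square brackets [] and placed immediately before the right table of the Join. PALO can also specify the shuffle method by combining a Leading Hint with a Distribute Hint.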

For example:

SQL

SELECT COUNT(*) FROM t2 JOIN [broadcast] t1 ON t1.c1 = t2.c2;
SELECT COUNT(*) FROM t2 JOIN [shuffle] t1 ON t1.c1 = t2.c2;
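
The combination of a Leading Hint with a Distribute Hint mentioned above might look like the following sketch. The comment-style `/*+ leading(...) */` syntax is borrowed from Apache Doris and is an assumption here; this page does not confirm it, so check it against the documentation for your PALO version before relying on it.

```sql
-- Assumed syntax (Apache Doris style), not confirmed by this page:
-- leading(...) fixes the join order, and the keyword placed between the
-- two tables sets the distribution type of the right-hand table.
SELECT /*+ leading(t2 broadcast t1) */ COUNT(*) FROM t2 JOIN t1 ON t1.c1 = t2.c2;
SELECT /*+ leading(t2 shuffle t1) */ COUNT(*) FROM t2 JOIN t1 ON t1.c1 = t2.c2;
```

If this form is accepted, it is intended to express the same choices as the two bracket-style statements, while also pinning the join order.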

Case Study

The following walks through a single example to show how Distribute Hints are used:

SQL

EXPLAIN SHAPE PLAN SELECT COUNT(*) FROM t1 JOIN t2 ON t1.c1 = t2.c2;

The plan for the original SQL is shown below. The join between t1 and t2 uses hash distribution, that is, DistributionSpecHash.

SQL

+----------------------------------------------------------------------------------+
| Explain String (Nereids Planner)                                                 |
+----------------------------------------------------------------------------------+
| PhysicalResultSink                                                               |
| --hashAgg [GLOBAL]                                                               |
| ----PhysicalDistribute [DistributionSpecGather]                                  |
| ------hashAgg [LOCAL]                                                            |
| --------PhysicalProject                                                          |
| ----------hashJoin [INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=()|
| ------------PhysicalProject                                                      |
| --------------PhysicalOlapScan [t1]                                              |
| ------------PhysicalDistribute [DistributionSpecHash]                            |
| --------------PhysicalProject                                                    |
| ----------------PhysicalOlapScan [t2]                                            |
+----------------------------------------------------------------------------------+

After adding the [broadcast] hint:

SQL

EXPLAIN SHAPE PLAN SELECT COUNT(*) FROM t1 JOIN [broadcast] t2 ON t1.c1 = t2.c2;

The distribution method for the join between t1 and t2 has changed to broadcast, that is, DistributionSpecReplicated.

SQL

+----------------------------------------------------------------------------------+
| Explain String (Nereids Planner)                                                 |
+----------------------------------------------------------------------------------+
| PhysicalResultSink                                                               |
| --hashAgg [GLOBAL]                                                               |
| ----PhysicalDistribute [DistributionSpecGather]                                  |
| ------hashAgg [LOCAL]                                                            |
| --------PhysicalProject                                                          |
| ----------hashJoin [INNER_JOIN] hashCondition=((t1.c1 = t2.c2)) otherCondition=()|
| ------------PhysicalProject                                                      |
| --------------PhysicalOlapScan [t1]                                              |
| ------------PhysicalDistribute [DistributionSpecReplicated]                      |
| --------------PhysicalProject                                                    |
| ----------------PhysicalOlapScan [t2]                                            |
+----------------------------------------------------------------------------------+
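
For comparison, the same query can be run with the [shuffle] hint instead. This is a minimal sketch that reuses the t1 and t2 tables from the example. No output is shown because the resulting plan depends on the table definitions and data; the expectation, not verified here, is that the PhysicalDistribute node above the t2 scan reports DistributionSpecHash.

```sql
-- Ask for a hash shuffle of the right table t2, then inspect the plan shape.
EXPLAIN SHAPE PLAN SELECT COUNT(*) FROM t1 JOIN [shuffle] t2 ON t1.c1 = t2.c2;
```

Comparing the PhysicalDistribute line across the hinted and unhinted plans is a quick way to confirm whether a hint took effect.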

Summary

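Used appropriately, Distribute Hints let you control the shuffle method of a Join and improve query performance. In practice, first analyze the query execution plan with EXPLAIN, then specify a suitable shuffle type based on what the plan shows.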
