

# Hints
<a name="sql-commands-hints-spark"></a>

Hints for SQL analyses provide optimization directives that guide query execution strategies in AWS Clean Rooms, enabling you to improve query performance and reduce compute costs. Hints suggest how the Spark analytics engine should generate its execution plan.

## Syntax
<a name="hints-syntax"></a>

```
SELECT /*+ hint_name(parameters), hint_name(parameters) */ column_list
FROM table_name;
```

Hints are embedded in SQL queries using comment-style syntax and must be placed directly after the SELECT keyword.

## Supported hint types
<a name="supported-hint-types"></a>

AWS Clean Rooms supports two categories of hints: Join hints and Partitioning hints.

**Topics**
+ [Join hints](join-hints.md)
+ [Partitioning hints](partitioning-hints.md)

# Join hints
<a name="join-hints"></a>

Join hints suggest join strategies for query execution. The syntax, arguments, and some examples come from the [Apache Spark SQL Reference](https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html#join-hints) for more information

### BROADCAST
<a name="broadcast-hint"></a>

Suggests that AWS Clean Rooms use broadcast join. The join side with the hint will be broadcast regardless of autoBroadcastJoinThreshold. If both sides of the join have the broadcast hints, the one with the smaller size (based on stats) will be broadcast.

*Aliases:* BROADCASTJOIN, MAPJOIN

*Parameters:* Table identifiers (optional)

*Examples:*

```
-- Broadcast a specific table
SELECT /*+ BROADCAST(students) */ e.name, s.course
FROM employees e JOIN students s ON e.id = s.id;

-- Broadcast multiple tables
SELECT /*+ BROADCASTJOIN(s, d) */ *
FROM employees e
JOIN students s ON e.id = s.id
JOIN departments d ON e.dept_id = d.id;
```

### MERGE
<a name="merge-hint"></a>

Suggests that AWS Clean Rooms use shuffle sort merge join.

*Aliases:* SHUFFLE\$1MERGE, MERGEJOIN

*Parameters:* Table identifiers (optional)

*Examples:*

```
-- Use merge join for a specific table
SELECT /*+ MERGE(employees) */ *
FROM employees e JOIN students s ON e.id = s.id;

-- Use merge join for multiple tables
SELECT /*+ MERGEJOIN(e, s, d) */ *
FROM employees e
JOIN students s ON e.id = s.id
JOIN departments d ON e.dept_id = d.id;
```

### SHUFFLE\$1HASH
<a name="shuffle-hash-hint"></a>

Suggests that AWS Clean Rooms use shuffle hash join. If both sides have the shuffle hash hints, the query optimizer chooses the smaller side (based on stats) as the build side.

*Parameters:* Table identifiers (optional)

*Examples:*

```
-- Use shuffle hash join
SELECT /*+ SHUFFLE_HASH(students) */ *
FROM employees e JOIN students s ON e.id = s.id;
```

### SHUFFLE\$1REPLICATE\$1NL
<a name="shuffle-replicate-nl-hint"></a>

Suggests that AWS Clean Rooms use shuffle-and-replicate nested loop join.

*Parameters:* Table identifiers (optional)

*Examples:*

```
-- Use shuffle-replicate nested loop join
SELECT /*+ SHUFFLE_REPLICATE_NL(students) */ *
FROM employees e JOIN students s ON e.id = s.id;
```

### Troubleshooting Hints in Spark SQL
<a name="join-hint-warning-cases"></a>

The following table shows common scenarios where hints are not applied in SparkSQL. For additional information, see [Considerations and limitations](sql-commands-hints-spark.md#hints-usage-notes).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/clean-rooms/latest/sql-reference/join-hints.html)

# Partitioning hints
<a name="partitioning-hints"></a>

Partitioning hints control data distribution across executor nodes. When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.

### COALESCE
<a name="coalesce-hint"></a>

Reduces the number of partitions to the specified number of partitions.

*Parameters:* Numeric value (required) - must be a positive integer between 1 and 2147483647

*Examples:*

```
-- Reduce to 5 partitions
SELECT /*+ COALESCE(5) */ employee_id, salary
FROM employees;
```

### REPARTITION
<a name="repartition-hint"></a>

Repartitions data to the specified number of partitions using the specified partitioning expressions. Uses round-robin distribution.

*Parameters:*
+ Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647
+ Column identifiers (optional) - columns to partition by; These columns must exist in the input schema.
+ If both are specified, numeric value must come first

*Examples:*

```
-- Repartition to 10 partitions
SELECT /*+ REPARTITION(10) */ *
FROM employees;

-- Repartition by column
SELECT /*+ REPARTITION(department) */ *
FROM employees;

-- Repartition to 8 partitions by department
SELECT /*+ REPARTITION(8, department) */ *
FROM employees;

-- Repartition by multiple columns
SELECT /*+ REPARTITION(8, department, location) */ *
FROM employees;
```

### REPARTITION\$1BY\$1RANGE
<a name="repartition-by-range-hint"></a>

Repartitions data to the specified number of partitions using range partitioning on the specified columns.

*Parameters:*
+ Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647
+ Column identifiers (optional) - columns to partition by; These columns must exist in the input schema.
+ If both are specified, numeric value must come first

*Examples:*

```
SELECT /*+ REPARTITION_BY_RANGE(10) */ *
FROM employees;

-- Repartition by range on age column
SELECT /*+ REPARTITION_BY_RANGE(age) */ *
FROM employees;

-- Repartition to 5 partitions by range on age
SELECT /*+ REPARTITION_BY_RANGE(5, age) */ *
FROM employees;

-- Repartition by range on multiple columns
SELECT /*+ REPARTITION_BY_RANGE(5, age, salary) */ *
FROM employees;
```

### REBALANCE
<a name="rebalance-hint"></a>

Rebalances the query result output partitions so that every partition is of a reasonable size (not too small and not too big). This is a best-effort operation: if there are skews, AWS Clean Rooms will split the skewed partitions to make them not too big. This hint is useful when you need to write the result of a query to a table to avoid too small or too big files.

*Parameters:*
+ Numeric value (optional) - number of partitions; Must be a positive integer between 1 and 2147483647
+ Column identifiers (optional) - columns must appear in the SELECT output list
+ If both are specified, numeric value must come first

*Examples:*

```
-- Rebalance to 10 partitions
SELECT /*+ REBALANCE(10) */ employee_id, name
FROM employees;

-- Rebalance by specific columns in output
SELECT /*+ REBALANCE(employee_id, name) */ employee_id, name
FROM employees;

-- Rebalance to 8 partitions by specific columns
SELECT /*+ REBALANCE(8, employee_id, name) */ employee_id, name, department
FROM employees;
```

## Combining multiple hints
<a name="combining-multiple-hints"></a>

You can specify multiple hints in a single query by separating them with commas:

```
-- Combine join and partitioning hints
SELECT /*+ BROADCAST(d), REPARTITION(8) */ e.name, d.dept_name
FROM employees e JOIN departments d ON e.dept_id = d.id;

-- Multiple join hints
SELECT /*+ BROADCAST(s), MERGE(d) */ *
FROM employees e
JOIN students s ON e.id = s.id
JOIN departments d ON e.dept_id = d.id;

-- Hints within separate hint blocks within the same query
SELECT /*+ REPARTITION(100) */ /*+ COALESCE(500) */ /*+ REPARTITION_BY_RANGE(3, c) */ * FROM t;
```

## Considerations and limitations
<a name="hints-usage-notes"></a>
+ Hints are optimization suggestions, not commands. The query optimizer may ignore hints based on resource constraints or execution conditions.
+ Hints are embedded directly in SQL query strings for both CreateAnalysisTemplate and StartProtectedQuery APIs.
+ Hints must be placed directly after the SELECT keyword.
+ Named parameters are not supported with hints and will throw an exception.
+ Column names in REPARTITION amd REPARTITION\$1BY\$1RANGE hints must exist in the input schema.
+ Column names in REBALANCE hints must appear in the SELECT output list.
+ Numeric parameters must be positive integers between 1 and 2147483647. Scientific notations like *1e1* are not supported
+ Hints are not supported in Differential Privacy SQL queries.
+ Hints for SQL queries are not supported in PySpark jobs. To provide directives for execution plans in a PySpark job, use the data frame API. See [Apache Spark DataFrame API Docs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.hint.html) for more information. 