A JOIN clause, also known as a join, combines the results of two or more table expressions based on certain conditions, such as whether the values of some columns are equal.
=
) and one inequality condition (>=
, >
, <=
, or <
). The inequality condition applies to all data types that support inequality comparison while a time-related type is commonly used.
ASOF join is currently supported for streaming operations only.
For example, suppose you have two tables:
stock_prices
: Contains stock price data at certain timestamps.
stock_name | stock_time | price |
---|---|---|
TSLA | 2024-09-24 09:30:00 | 250 |
TSLA | 2024-09-24 10:30:00 | 252 |
TSLA | 2024-09-24 11:30:00 | 255 |
AMZN | 2024-09-24 09:30:00 | 3300 |
AMZN | 2024-09-24 10:30:00 | 3310 |
AMZN | 2024-09-24 11:30:00 | 3320 |
GOOG | 2024-09-24 09:30:00 | 1400 |
GOOG | 2024-09-24 10:30:00 | 1410 |
GOOG | 2024-09-24 11:30:00 | 1420 |
market_data
: Contains market sentiment data at different timestamps.
stock_name | market_time | sentiment |
---|---|---|
TSLA | 2024-09-24 09:00:00 | 0.7 |
TSLA | 2024-09-24 10:00:00 | 0.8 |
TSLA | 2024-09-24 11:00:00 | 0.9 |
AMZN | 2024-09-24 09:00:00 | 0.6 |
AMZN | 2024-09-24 10:00:00 | 0.65 |
AMZN | 2024-09-24 11:00:00 | 0.7 |
NVDA | 2024-09-24 09:00:00 | 0.55 |
NVDA | 2024-09-24 10:00:00 | 0.6 |
NVDA | 2024-09-24 11:00:00 | 0.65 |
market_data
where the market_time
is less than or equal to the stock_time
:
stock_name | stock_time | price | sentiment |
---|---|---|---|
TSLA | 2024-09-24 09:30:00 | 250 | 0.7 |
TSLA | 2024-09-24 10:30:00 | 252 | 0.8 |
TSLA | 2024-09-24 11:30:00 | 255 | 0.9 |
AMZN | 2024-09-24 09:30:00 | 3300 | 0.6 |
AMZN | 2024-09-24 10:30:00 | 3310 | 0.65 |
AMZN | 2024-09-24 11:30:00 | 3320 | 0.7 |
stock_name | stock_time | price | sentiment |
---|---|---|---|
TSLA | 2024-09-24 09:30:00 | 250 | 0.7 |
TSLA | 2024-09-24 10:30:00 | 252 | 0.8 |
TSLA | 2024-09-24 11:30:00 | 255 | 0.9 |
AMZN | 2024-09-24 09:30:00 | 3300 | 0.6 |
AMZN | 2024-09-24 10:30:00 | 3310 | 0.65 |
AMZN | 2024-09-24 11:30:00 | 3320 | 0.7 |
GOOG | 2024-09-24 09:30:00 | 1400 | NULL |
GOOG | 2024-09-24 10:30:00 | 1410 | NULL |
GOOG | 2024-09-24 11:30:00 | 1420 | NULL |
market_data
, so they show the closest preceding sentiment.
GOOG has no corresponding data in market_data
, so the sentiment column is NULL.
join_conditions
must be an equality condition based on the watermarks of the two table expressions. For the syntax of <time_window_expression>
, see Time window functions.
For example, suppose you have these two sources:
interval_condition
must be a watermark-based range.
For example, for sources s1
and s2
used in the above section, you can create an interval join:
FOR SYSTEM_TIME AS OF PROCTIME()
is included in the right table expression.transaction_id | product_id | quantity | sale_date | process_time |
---|---|---|---|---|
1 | 101 | 3 | 2023-06-18 | 2023-06-18 10:15:00 |
2 | 102 | 2 | 2023-06-19 | 2023-06-19 15:30:00 |
3 | 101 | 1 | 2023-06-20 | 2023-06-20 11:45:00 |
products
:
id | product_name | price | valid_from | valid_to |
---|---|---|---|---|
101 | Product A | 20 | 2023-06-01 00:00:00 | 2023-06-15 23:59:59 |
101 | Product A | 25 | 2023-06-16 00:00:00 | 2023-06-19 23:59:59 |
101 | Product A | 22 | 2023-06-20 00:00:00 | NULL |
102 | Product B | 15 | 2023-06-01 00:00:00 | NULL |
products
table and form a wider table. To further improve performance, you can create an index for table products
, and join sales
with the index instead.
transaction_id | product_id | quantity | sale_date | product_name | price |
---|---|---|---|---|---|
1 | 101 | 3 | 2023-06-18 | Product A | 25 |
2 | 102 | 2 | 2023-06-19 | Product B | 15 |
3 | 101 | 1 | 2023-06-20 | Product A | 22 |
sales
:
transaction_id | product_id | quantity | sale_date | product_name | price |
---|---|---|---|---|---|
1 | 101 | 4 | 2023-06-18 | Product A | 25 |
2 | 102 | 3 | 2023-06-19 | Product B | 15 |
3 | 101 | 2 | 2023-06-20 | Product A | 22 |