This guide describes how to sink data from RisingWave to Apache Iceberg using the Iceberg sink connector in RisingWave. Apache Iceberg is a table format designed to support huge tables. For more information, see Apache Iceberg.
| Parameter name | Description |
|---|---|
| type | Required. Allowed values: append-only and upsert. |
| force_append_only | Optional. If true, forces the sink to be append-only even if the upstream is not: update messages are converted to inserts and delete messages are ignored. |
| s3.endpoint | Optional. Endpoint of the S3-compatible object store. Either s3.endpoint or s3.region must be specified. |
| s3.region | Optional. The region where the S3 bucket is hosted. Either s3.endpoint or s3.region must be specified. |
| s3.access.key | Required. Access key of the S3 compatible object store. |
| s3.secret.key | Required. Secret key of the S3 compatible object store. |
| database.name | Required. The database of the target Iceberg table. |
| table.name | Required. The name of the target Iceberg table. |
| catalog.name | Conditional. The name of the Iceberg catalog. It can be omitted for the storage catalog but is required for other catalog types. |
| catalog.type | Optional. The catalog type used in this table. Currently, the supported values are storage, rest, hive, jdbc, and glue. If not specified, storage is used. For details, see Catalogs. |
| warehouse.path | Conditional. The path of the Iceberg warehouse. Currently, only S3-compatible object storage systems, such as AWS S3 and MinIO, are supported. It is required if catalog.type is not rest. |
| catalog.url | Conditional. The URL of the catalog. It is required when catalog.type is not storage. |
| primary_key | Conditional. The primary key for an upsert sink. It is required by, and only applicable to, the upsert mode. |
| commit_checkpoint_interval | Optional. Commit every N checkpoints (N > 0). Default value is 10. The behavior of this field also depends on the sink_decouple setting: if sink_decouple is true (the default), commit_checkpoint_interval defaults to 10; if sink_decouple is false, it defaults to 1, and setting it to a value larger than 1 results in an error. |
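Putting these parameters together, a complete upsert sink definition might look like the following sketch, which assumes the default storage catalog. The source table demo_table, its primary key id, and the credential placeholders are illustrative, not part of this guide:

```sql
CREATE SINK demo_upsert_sink FROM demo_table
WITH (
    connector = 'iceberg',
    type = 'upsert',
    primary_key = 'id',           -- required in upsert mode
    catalog.type = 'storage',     -- the default catalog type
    warehouse.path = 's3://my-iceberg-bucket/path/to/warehouse',
    s3.region = 'ap-southeast-1', -- either s3.region or s3.endpoint must be set
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    database.name = 'dev',
    table.name = 'table'
);
```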
The sink converts RisingWave data types to Iceberg types as follows:

| RisingWave Type | Iceberg Type |
|---|---|
| boolean | boolean |
| int | integer |
| bigint | long |
| real | float |
| double | double |
| varchar | string |
| date | date |
| timestamptz | timestamptz |
| timestamp | timestamp |
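As an illustration of this mapping, the columns of the hypothetical RisingWave table below would land in Iceberg with the types noted in the comments:

```sql
CREATE TABLE demo_events (
    id         bigint PRIMARY KEY, -- Iceberg: long
    is_valid   boolean,            -- Iceberg: boolean
    score      real,               -- Iceberg: float
    payload    varchar,            -- Iceberg: string
    created_at timestamptz         -- Iceberg: timestamptz
);
```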
RisingWave supports the Hive catalog. Set catalog.type to hive to use it, as in the sketch below.
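A minimal sketch of a Hive-catalog sink follows, using the catalog.url parameter described in the table above. The catalog name demo, the metastore address, and the credential placeholders are illustrative:

```sql
CREATE SINK demo_hive_sink FROM demo_table
WITH (
    connector = 'iceberg',
    type = 'append-only',
    catalog.type = 'hive',
    catalog.name = 'demo',
    catalog.url = 'thrift://metastore-host:9083', -- placeholder metastore address
    warehouse.path = 's3://my-iceberg-bucket/path/to/warehouse',
    s3.region = 'ap-southeast-1',
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    database.name = 'dev',
    table.name = 'table'
);
```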
Examples
A spark-sql command can create an Iceberg table named table under the database dev in AWS S3. The table lives in an S3 bucket named my-iceberg-bucket in the region ap-southeast-1, under the path path/to/warehouse. The table has the property format-version=2, so it supports the upsert option. After creation, there should be a folder named s3://my-iceberg-bucket/path/to/warehouse/dev/table/metadata.
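The exact command is not reproduced here; a minimal sketch of such a statement, assuming a Spark Iceberg catalog named demo has already been configured (via spark-sql --conf options) to use s3://my-iceberg-bucket/path/to/warehouse as its warehouse, might look like this. The id and name columns are illustrative:

```sql
-- Run inside spark-sql with an Iceberg catalog named `demo` configured.
-- `table` is backtick-quoted because it is a reserved word in Spark SQL.
CREATE TABLE demo.dev.`table` (
    id   bigint,
    name string
) USING iceberg
TBLPROPERTIES ('format-version' = '2'); -- format version 2 enables upsert
```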
Note that only S3-compatible object stores, such as AWS S3 or MinIO, are supported.
To create an append-only Iceberg sink from an append-only upstream, set type = append-only in the CREATE SINK SQL query, as in the sketch below.
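For example, assuming an append-only upstream source named demo_source (illustrative, as are the credential placeholders):

```sql
CREATE SINK demo_append_sink FROM demo_source
WITH (
    connector = 'iceberg',
    type = 'append-only',
    warehouse.path = 's3://my-iceberg-bucket/path/to/warehouse',
    s3.region = 'ap-southeast-1',
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    database.name = 'dev',
    table.name = 'table'
);
```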
To create an append-only Iceberg sink from an upstream that is not append-only, set type = append-only and force_append_only = true. This ignores delete messages from the upstream and turns upstream update messages into insert messages. A sketch follows.
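Assuming an upsert upstream table named demo_table (illustrative, as are the credential placeholders), a forced append-only sink might look like this:

```sql
CREATE SINK demo_forced_sink FROM demo_table
WITH (
    connector = 'iceberg',
    type = 'append-only',
    force_append_only = 'true', -- drop deletes, rewrite updates as inserts
    warehouse.path = 's3://my-iceberg-bucket/path/to/warehouse',
    s3.region = 'ap-southeast-1',
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    database.name = 'dev',
    table.name = 'table'
);
```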