This article provides guidance for improving the performance of Lakehouse Federation queries.
The JDBC fetch size determines the number of rows to fetch per round trip. By default, most JDBC connectors fetch all rows at once, which can cause the amount of data retrieved to exceed the available memory. To avoid out-of-memory errors, set the `fetchSize` parameter. When `fetchSize` is set to a non-zero value, the connector reads data in batches, with at most `fetchSize` rows per batch. Databricks recommends specifying a large `fetchSize` value, for example `1000`.
```sql
SELECT * FROM catalog.schema.jdbcTable WITH ('fetchSize' 1000)
```
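As a rough sketch of why a larger fetch size helps, the number of round trips to the remote database scales inversely with `fetchSize`. The helper below is hypothetical (not part of any connector API) and only illustrates the arithmetic:

```python
import math

def round_trips(total_rows: int, fetch_size: int) -> int:
    """Estimate the JDBC round trips needed to fetch total_rows
    when the driver pulls at most fetch_size rows per trip."""
    if fetch_size <= 0:
        # A fetch size of zero means the driver fetches everything at once.
        return 1
    return math.ceil(total_rows / fetch_size)

# With fetchSize = 1000, a 1,000,000-row result takes 1000 batched
# round trips instead of one all-at-once transfer that must fit in memory.
print(round_trips(1_000_000, 1000))  # 1000
print(round_trips(1_000_000, 0))     # 1
```

Each batch only needs to hold `fetchSize` rows in memory at a time, which is what prevents the out-of-memory errors described above.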
Set the `partition_size_in_mb` parameter for large query result sets that need to be split into multiple DataFrame partitions. This option specifies the recommended uncompressed size of each DataFrame partition. To reduce the number of partitions, specify a larger value. The default is `100` (MB).

`partition_size_in_mb` is used only as a recommendation; the actual size of partitions might be smaller or larger. This option applies only when the `use_copy_unload` parameter is set to `FALSE`.
```sql
SELECT * FROM catalog.schema.snowflakeTable WITH ('partition_size_in_mb' 1000)
```
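Because `partition_size_in_mb` is a recommended size rather than a hard limit, the resulting partition count can only be estimated. A hypothetical sketch of that estimate (the helper name and signature are ours, not part of any API):

```python
import math

def estimated_partitions(result_size_mb: float,
                         partition_size_in_mb: int = 100) -> int:
    """Estimate the DataFrame partition count for a result set of
    result_size_mb uncompressed megabytes, given the recommended
    per-partition size (default 100 MB, matching the option's default)."""
    return max(1, math.ceil(result_size_mb / partition_size_in_mb))

# A 10 GB uncompressed result with the default 100 MB target yields
# roughly 100 partitions; raising the target to 1000 MB cuts that
# to roughly 10, reducing per-partition scheduling overhead.
print(estimated_partitions(10_000))        # 100
print(estimated_partitions(10_000, 1000))  # 10
```

Fewer, larger partitions mean less task-scheduling overhead but more memory per task, so the right value depends on the cluster's executor memory.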