Best practices for the Kusto Ingest library
Applies to: ✅ Azure Data Explorer
This article explains the best practices for data ingestion with the Kusto Ingest library.
Prefer queued over direct ingestion
For production scenarios, use the queued ingest client. For more information, see Queued ingestion and Direct ingestion.
Use a single ingest client instance
Kusto Ingest client implementations are thread-safe and reusable. For each target database, use a single instance of either a queued or direct ingest client per process. Running multiple instances can overload the database, causing it to become unresponsive or slow to respond to valid requests.
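Here is a minimal Python sketch of the single-instance rule. It assumes your code builds clients through a factory function. IngestClientRegistry and the factory are hypothetical names used for illustration and are not part of the Kusto Ingest library. In a real service, the factory would construct a queued ingest client. The registry creates at most one client per target database and returns that same object to every thread.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class IngestClientRegistry(Generic[T]):
    """Hypothetical helper: hold one ingest client per target database per process.

    `factory` is an assumed callable that builds a client for a database name.
    Because ingest clients are thread-safe, the same instance is shared by all threads.
    """

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._clients: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, database: str) -> T:
        # The lock guarantees that concurrent first calls create only one client.
        with self._lock:
            if database not in self._clients:
                self._clients[database] = self._factory(database)
            return self._clients[database]
```

Create one registry at process startup, for example as a module-level object. Then call get wherever data is ingested, instead of constructing a new client for each request or thread.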
Limit tracking operation status
For high-volume data streams, limit the use of positive notifications for ingestion requests. Excessive tracking can increase ingestion latency and can even make ingestion completely unresponsive. For more information, see Operation status.
Optimize for throughput
When planning your ingestion pipeline, consider the following factors, because they can significantly affect ingestion throughput.
Factor | Description |
---|---|
Data size | Ingestion is more efficient when done in large chunks. We recommend sending data in batches of 100 MB to 1 GB (uncompressed). |
Data format | Prefer formats optimized for maximum throughput: CSV or any other delimited text format such as PSV or TSV, as well as Parquet, JSON, or AVRO. For more information, see Data formats supported for ingestion. |
Table width | Only ingest essential data. Each column must be encoded and indexed, so wider tables can have lower throughput. Control which fields are ingested by providing an ingestion mapping. |
Source data location | Avoid cross-region reads to speed up the ingestion. |
Load on the database | When a database experiences a high query load, ingestion takes longer to complete. |
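The following Python sketch illustrates the data-size guidance. It greedily groups source files into batches whose total uncompressed size stays at or below 1 GB. The function plan_batches and its (name, uncompressed_size) input format are hypothetical. The sketch assumes you already know each file's uncompressed size, and it treats 1 GB as 1024 MB. It caps batch size but does not enforce the 100 MB lower bound. You reach that bound by accumulating enough data before you ingest.

```python
from typing import Iterable, List, Tuple

MB = 1024 * 1024
MAX_BATCH_BYTES = 1024 * MB  # assumed reading of "1 GB (uncompressed)"


def plan_batches(
    items: Iterable[Tuple[str, int]], max_bytes: int = MAX_BATCH_BYTES
) -> List[List[str]]:
    """Greedily group (name, uncompressed_size) pairs into batches.

    Each batch's total uncompressed size stays at or below max_bytes, except
    that a single item larger than max_bytes is placed in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in items:
        # Close the current batch if adding this item would exceed the cap.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

A greedy pass preserves file order and is easy to reason about. A bin-packing approach could fill batches more evenly, but that is rarely worth the complexity when individual files are small relative to the 1 GB cap.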
Note
The queued ingest client splits large data sets into chunks and aggregates them, which is useful when the data can't be batched prior to ingestion.
Optimize for cost
Using the Kusto client libraries to ingest data into your database remains the cheapest and most robust option. We urge customers to review their ingestion methods to optimize for cost and to take advantage of Azure Storage pricing, which makes blob transactions significantly more cost effective.
For cost-effective ingestion:
- Limit the number of ingested data chunks, such as files, blobs, and streams.
- Ingest large chunks of up to 1 GB of uncompressed data.
- Opt for batching.
- Provide exact, uncompressed data size to avoid extra storage transactions.
- Avoid setting FlushImmediately to true.
- Avoid sending small amounts of data with ingest-by or drop-by extent tags.
Note
Overusing FlushImmediately or ingest-by and drop-by extent tags can disrupt data aggregation, lead to extra storage transactions, and harm ingestion and query performance.
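Providing the exact uncompressed size avoids extra storage transactions. For gzip-compressed files, one option is to stream-decompress the file once and count the bytes. The following minimal Python sketch does this and assumes a standard gzip source. The function name uncompressed_size is hypothetical, and how you pass the resulting value to the ingest client depends on the client library you use. The 4-byte size field at the end of a gzip file is not a safe shortcut. Per RFC 1952, it stores the size modulo 2^32, and in a multi-member file it describes only the last member.

```python
import gzip
from typing import BinaryIO, Union


def uncompressed_size(source: Union[str, BinaryIO], chunk_size: int = 1024 * 1024) -> int:
    """Return the exact uncompressed size of a gzip file or binary file object.

    Streams through the data in chunks, so memory use stays bounded, and
    counts every member of a multi-member gzip file.
    """
    total = 0
    with gzip.open(source, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Reading the whole file costs one decompression pass. That pass is usually cheap compared with the storage transactions it saves, especially if you compute the size while you already have the file locally before uploading it.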