Schema optimization best practices

A table schema defines the names and data types of all the columns in the table. The schema can be set during table creation, or as part of the data ingestion process through the applicable ingestion mapping. The way a table schema is defined can significantly affect query performance. The ideal schema for your data depends on many factors, including the use case, data access patterns, and the specific data you plan to store. This article describes best practices for optimizing performance by designing efficient schemas.

Data types

For general information on data types, see scalar data types.

  • Commonly used fields should be typed columns rather than the dynamic type.

  • JSON properties in a dynamic column that are frequently searched or aggregated should be converted to regular typed columns in the table, with a more specific type such as string, long, or real (see the table sketch after this list).

  • Sparse columns that aren't commonly used for filters and aggregations should be collected as a property bag in a dynamic column, using the DropMappedFields mapping transformation (see the mapping sketch after this list).

  • Date and time values should be stored in datetime columns, not as long or other data types.

  • The decimal type provides exact precision, which makes it most suitable for financial and other applications that require exact accuracy. However, it's much slower than the real type. Only use the decimal type when required.

  • All ID (identification) columns should be typed as string, not numeric. This makes the index much more effective and can significantly improve search time. It also enables data partitioning, since a hash partition key can only be defined on a string column. If the only query filters used on this column are equality filters, for example when the column holds GUIDs, you can use the Identifier encoding profile (see the encoding sketch after this list). For more information, see encoding policy.
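
The following sketch ties the type recommendations above together. The table and column names (RawEvents, DeviceId, Props, and so on) are illustrative assumptions, not part of any existing schema: commonly used fields get typed columns, and sparse attributes are kept in a single dynamic property bag.

```kusto
// Typed columns for commonly used fields: datetime for time,
// string for the ID, real for the measurement, and one dynamic
// property bag (Props) for sparse, rarely queried attributes.
.create table RawEvents (
    Timestamp: datetime,
    DeviceId: string,
    Temperature: real,
    Props: dynamic
)
```

A property that is frequently filtered or aggregated is better promoted to its own typed column. It can still be read from the bag at query time, as shown below, but at a higher cost than a dedicated typed column:

```kusto
// Extract a JSON property from the dynamic bag at query time.
// Promoting errorCode to a typed ErrorCode column avoids this parsing cost.
RawEvents
| extend ErrorCode = tostring(Props.errorCode)
| summarize count() by ErrorCode
```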
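
The DropMappedFields transformation mentioned above can be expressed in a JSON ingestion mapping roughly as follows. The mapping name and JSON paths are illustrative and assume source documents shaped like the hypothetical RawEvents table.

```kusto
// Explicitly mapped fields land in typed columns; the last element maps
// the whole document to the Props bag, and DropMappedFields removes the
// fields that the other mappings already consumed.
.create table RawEvents ingestion json mapping "RawEvents_Mapping"
'['
'  {"column": "Timestamp",   "Properties": {"Path": "$.timestamp"}},'
'  {"column": "DeviceId",    "Properties": {"Path": "$.deviceId"}},'
'  {"column": "Temperature", "Properties": {"Path": "$.temperature"}},'
'  {"column": "Props",       "Properties": {"Path": "$", "Transform": "DropMappedFields"}}'
']'
```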
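
For a string ID column that's only filtered with equality, the Identifier encoding profile and a hash partitioning policy might be applied as sketched below. This assumes the hypothetical RawEvents table and DeviceId column from the sketch above, and the partitioning property values are placeholders to tune per workload.

```kusto
// Identifier encoding profile for a GUID-like string column that is
// only ever filtered with equality predicates.
.alter column RawEvents.DeviceId policy encoding type='identifier'

// Hash partitioning on the same string column; MaxPartitionCount and the
// other property values are illustrative.
.alter table RawEvents policy partitioning '{"PartitionKeys": [{"ColumnName": "DeviceId", "Kind": "Hash", "Properties": {"Function": "XxHash64", "MaxPartitionCount": 128, "PartitionAssignmentMode": "Uniform"}}]}'
```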

Tables

  • Prefer narrow tables over wide tables with hundreds of columns.
  • To avoid expensive joins at query time, denormalize dimension data by enriching it during ingestion. If the dimension table used for enrichment is updated and the scenario requires the latest value, use materialized views to keep only the latest value (see the sketch after this list).
  • If there are more than 20 sparse columns, meaning that most of their values are null, and these columns are rarely used for searches or aggregations, group them as a JSON property bag in a dynamic column using the DropMappedFields mapping transformation (see the mapping sketch in the Data types section).
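
A sketch of keeping only the latest dimension record per key with a materialized view; DimDevices, DeviceId, and UpdatedOn are hypothetical names for a dimension table, its key, and its update timestamp.

```kusto
// The view exposes only the most recent record per DeviceId, so
// enrichment lookups against it return the latest dimension values.
.create materialized-view DimDevicesLatest on table DimDevices
{
    DimDevices
    | summarize arg_max(UpdatedOn, *) by DeviceId
}
```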

Indexing

Indexing can be disabled for columns that are never searched. Use the encoding policy with the BigObject profile to disable indexing on string or dynamic typed columns, as in the sketch below.
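
A minimal sketch, assuming a hypothetical RawPayload column on the RawEvents table that is stored for completeness but never searched:

```kusto
// Disable indexing on a column that is never searched, to reduce
// indexing overhead during ingestion.
.alter column RawEvents.RawPayload policy encoding type='bigobject'
```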