DateTime columns in Kusto
Last modified: 11/08/2018
When you ingest batches of data into Kusto, it gets stored in a sharded column store, where each column is compressed and indexed. The scope of each index is a single data shard (extent), and the implementation of the index depends on the type of the column.
For datetime columns, the minimum and maximum values for each data shard are stored as part of the extent’s metadata. This is done automatically for any datetime column, no user configuration is required.
With that information available, when you apply a datetime-filter in a query (e.g. MyTable | where Timestamp > ago(1d)
), the query planner can identify which extents are irrelevant for serving the query, and filters them out in advance, to avoid excessive scanning of indexes or data.
Thus, even if your table stores 1 year’s worth of data (let’s assume that’s 100s of billions of records), when you query for last week’s records - the irrelevant extents are filtered out upfront, and your query returns much faster compared to if it had to scan the entire data set.
The same is true if you want to query historical data, e.g. a specific week’s worth of data from 6 months ago - only data shards which contain records from that week will be queried.
Things to keep in mind
- If your data includes a some kind of time representation which is not in a datetime format (e.g. epoch (
long
), ticks (long
), orstring
etc.) - it is recommended that you either reformat your data so that it does include datetime values in a supported format, or that you convert these values to datetime values at ingestion time (e.g., using an update policy). In any case - your table’s schema should have this column defined with thedatetime
data type.- Here’s one example on how to achieve such a conversion from a
long
value to adatetime
value:let FromUnixTime = (t:long) { datetime(1970-01-01) + t * 1sec }; print FromUnixTime(1537181567)
This returns
2018-09-17 10:52:47.0000000
. - And another example - converting from a
string
value to adatetime
value:let FromString = (t:string) { todatetime(t) }; print FromString("8 Nov 18 15:05:02 GMT")
This returns
2018-11-08 15:05:02.0000000
.
- Here’s one example on how to achieve such a conversion from a
- If your data doesn’t include datetime values, but you still want to be able to filter it according to when it got ingested, you can use the ingestion_time() function, assuming the ingestion time policy is enabled on your table (which is true, by default).
- When the policy is enabled, a hidden and internal datetime column (whose values are obtained using the
ingestion_time()
function) is added to the table, and it gets populated for each record that gets ingested into the table.
- When the policy is enabled, a hidden and internal datetime column (whose values are obtained using the
- Due to the fact only the minimum and maximum values for datetime column are stored in the extent’s metadata (as previously explained), if you have skewed values from the past or future being ingested alongside ones from the present, this can limit the ability to pre-filter irrelevant extents at query planning time.
- In case you’re back-filling historical data - have a look at this post, which refers to the
creationTime
ingestion property. - If you’re not sure whether or not your datetime values are skewed, a query like the following one can help you figure that out:
MyTable | where ingestion_time() > ago(3h) | summarize count() by bin(DateTimeColumnIWantToCheck, 1h)
If my data is OK, I’d expect to get 3 or 4 hourly buckets (depending on when I run the query). For instance, if run the query above at
2018-11-09 05:17:31
, I’ll get:DateTimeColumnIWantToCheck count_ 2018-11-09 05:00:00.0000000 1,707,930,236 2018-11-09 04:00:00.0000000 8,363,683,151 2018-11-09 03:00:00.0000000 8,353,791,287 2018-11-09 02:00:00.0000000 6,545,498,120
- In case you’re back-filling historical data - have a look at this post, which refers to the