Point-in-Time History
Overview
Cybersyn provides point-in-time tables of all data historically published in our data products. We refer to these point-in-time tables as “history tables.” This is a premium feature for our public domain datasets, available in the Cybersyn Foundations paid product.
History tables create an auditable record of when and how data changed over time. Whether data changed because the underlying source restated historical values, Cybersyn backfilled older history, or a data vendor corrected a previously-undetected error in their methodology, accessing a full history of what data was available when allows for point-in-time backtesting and creates accountability and transparency that helps clients understand changes in underlying data.
Why are point-in-time history tables useful?
- You want to conduct backtesting and need an accurate representation of what data would have been available to you at a historical point in time
- You want to understand how the underlying data source has revised values over time
- You want to audit what your downstream systems were consuming from the Cybersyn datashare at any point in time
Example
Each month the US Bureau of Labor Statistics releases an Employment Situation report that includes the change in the number of nonfarm payroll jobs nationally. These reports come out monthly, but the data is noted to be a “preliminary estimate” until finalized numbers are published 2 months later. Consider the real example below using reports released on Aug 4, Sep 1, and Oct 6, 2023:
Description | Report date | Value |
---|---|---|
Change in nonfarm payroll employment, July 2023 | 2023-08-04 | 187,000 |
Change in nonfarm payroll employment, July 2023 | 2023-09-01 | 157,000 |
Change in nonfarm payroll employment, July 2023 | 2023-10-06 | 236,000 |
If someone asked “How many jobs were added in the US in July 2023?” the best known answer would vary depending on when the person asked. If they asked on August 15th the answer would be 187k, but if they asked today, the revised answer would be 236k. This is critical for understanding why historical numbers might change over time and why you need to back-test data (since you would not have known future revisions when the data was first published).
The numerous revisions complicate what should otherwise be a simple univariate timeseries of employment numbers by month. Cybersyn’s history tables solve this.
Snapshot and History Tables
Cybersyn publishes the most up to date data in snapshot tables which reflect the best known values as of today. We also record and timestamp each value we have historically published in history tables. History tables, which are denoted with the suffix _history
, can be queried to recreate what Cybersyn’s data would have looked like at any given time.
The history tables include all of the same fields as the current snapshot tables in addition to _effective_start_timestamp
and _effective_end_timestamp
fields. These timestamps represent the time that Cybersyn first published that particular row-level data and the last time the row was valid in the data, respectively.
Note that the latest, currently valid data has an _effective_end_timestamp
value of NULL. In addition to updated rows receiving start and end time timestamps, deleted rows also receive the same with the _effective_end_timestamp
being the run time during which the row or value was deleted. The effective timestamps are recorded in Eastern Time and include the offset from UTC.
Using the above employment data as an example, the rows in the history table would look like this:
VARIABLE | DATE | VALUE | _EFFECTIVE_START_TIMESTAMP | _EFFECTIVE_END_TIMESTAMP |
---|---|---|---|---|
Change in nonfarm payroll employment | 2023-07-31 | 187,000 | 2023-08-04 08:43:28 -04:00 | 2023-09-01 08:34:54 -04:00 |
Change in nonfarm payroll employment | 2023-07-31 | 157,000 | 2023-09-01 08:34:54 -04:00 | 2023-10-06 08:39:06 -04:00 |
Change in nonfarm payroll employment | 2023-07-31 | 236,000 | 2023-10-06 08:39:06 -04:00 | NULL |
Sample Queries
From this history table a user can understand when data was updated, how data has changed over time, and what data looked like at a given date. Assuming the above sample data is in a table called t1_history
, the following queries are examples of how a user could use the point-in-time history tables:
Pull all historical revisions to a particular data point
Check the history for a single datapoint. Pull all past values that were previously reported for the particular variable and date combination.
SELECT *
FROM t1_history
WHERE date = '2023-07-31'
AND variable = 'Change in Employees'
ORDER BY _effective_start_timestamp;
Filter to current data
Note, you could equivalently use the snapshot tables to recover this result. This query is for demonstration purposes.
SELECT *
FROM t1_history
WHERE _effective_end_timestamp IS NULL;
View values as of a specific date or time
View all data as it would have appeared on 2023-09-15 at 9:30am ET. The logic here filters to records that were both 1) uploaded before that time, and 2) last valid after that time or are still valid.
SELECT *
FROM t1_history
WHERE _effective_start_timestamp <= '2023-09-15 09:30:00'
AND (_effective_end_timestamp >= '2023-09-15 09:30:00' OR _effective_end_timestamp IS NULL);
Get timestamps for all updates to Cybersyn’s data
SELECT DISTINCT _effective_start_timestamp
FROM t1_history
ORDER BY _effective_start_timestamp;