Point-in-Time History

Overview

Cybersyn provides point-in-time tables of all data historically published in our data products. We refer to these point-in-time tables as “history tables.” This is a premium feature for our public domain datasets, available in Cybersyn's Enterprise product.

History tables create an auditable record of when and how data changed over time. Whether data changed because the underlying source restated historical values, Cybersyn backfilled older history, or a data vendor corrected a previously-undetected error in their methodology, accessing a full history of what data was available when allows for point-in-time backtesting and creates accountability and transparency that helps clients understand changes in underlying data.

Why are point-in-time history tables useful?

You want to conduct backtesting and need an accurate representation of what data would have been available to you at a historical point in time
You want to understand how the underlying data source has revised values over time
You want to audit what your downstream systems were consuming from the Cybersyn datashare at any point in time

Example

Each month the US Bureau Of Labor Statistics releases an Employment Situation report that includes the change in the number of nonfarm payroll jobs nationally. These reports come out monthly, but the data is noted to be a “preliminary estimate” until finalized numbers are published 2 months later. Consider the real example below using reports released on Aug 4, Sep 1, and Oct 6, 2023:

Description

Report date

Value

If someone asked “How many jobs were added in the US in July 2023?” the best known answer would vary depending on when the person asked. If they asked on August 15th the answer would be 187k, but if they asked today, the revised answer would be 236k. This is critical for understanding why historical numbers might change over time and why you need to back-test data (since you would not have known future revisions when the data was first published).

The numerous revisions complicate what should otherwise be a simple univariate timeseries of employment numbers by month. Cybersyn’s history tables solve this.

Snapshot and History Tables

Cybersyn publishes the most up to date data in snapshot tables which reflect the best known values as of today. We also record and timestamp each value we have historically published in history tables. History tables, which are denoted with the suffix _history, can be queried to recreate what Cybersyn’s data would have looked like at any given time.

The history tables include all of the same fields as the current snapshot tables in addition to _effective_start_timestamp and _effective_end_timestamp fields. These timestamps represent the time that Cybersyn first published that particular row-level data and the last time the row was valid in the data, respectively.

Note that the latest, currently valid data has an _effective_end_timestamp value of NULL. In addition to updated rows receiving start and end time timestamps, deleted rows also receive the same with the _effective_end_timestamp being the run time during which the row or value was deleted. The effective timestamps are recorded in Eastern Time and include the offset from UTC.

Using the above employment data as an example, the rows in the history table would look like this:

VARIABLE

DATE

VALUE

_EFFECTIVE_START_TIMESTAMP

_EFFECTIVE_END_TIMESTAMP

Examples & Sample Queries

From this history table a user can understand when data was updated, how data has changed over time, and what data looked like at a given date. Assuming the above sample data is in a table called t1_history, the following queries are examples of how a user could use the point-in-time history tables:

Pull all historical revisions to a particular data point

Check the history for a single datapoint. Pull all past values that were previously reported for the particular variable and date combination.

SELECT *
FROM t1_history
WHERE date = '2023-07-31'
  AND variable = 'Change in Employees'
ORDER BY _effective_start_timestamp;

Filter to current data

Note, you could equivalently use the snapshot tables to recover this result. This query is for demonstration purposes.

SELECT *
FROM t1_history
WHERE _effective_end_timestamp IS NULL;

View values as of a specific date or time

View all data as it would have appeared on 2023-09-15 at 9:30am ET. The logic here filters to records that were both 1) uploaded before that time, and 2) last valid after that time or are still valid.

SELECT *
FROM t1_history
WHERE _effective_start_timestamp <= '2023-09-15 09:30:00'
AND (_effective_end_timestamp >= '2023-09-15 09:30:00' OR _effective_end_timestamp IS NULL);

Get timestamps for all updates to Cybersyn’s data

SELECT DISTINCT _effective_start_timestamp
FROM t1_history
ORDER BY _effective_start_timestamp;

Last updated 1 month ago