Accuracy Benchmarking Playbook
Cybersyn's Methodology for Evaluating Accuracy
We evaluate the performance of consumer spending projections based on:
- How accurately we predict government data (such as Monthly Advanced Retail Sales from the US Census)
- How accurately we predict and track publicly reported metrics such as sales from public companies
- How accurately we predict sales data sourced from direct partnerships with retailers or point-of-sale providers
Playbook
This playbook walks through how we can leverage Cybersyn’s SEC data to benchmark consumer spending estimates from the Consumer Current product.
Set up a connection to the Snowflake database:
import snowflake.connector
import pandas as pd
import plotly.express as px
conn_params = {
    'user': '',
    'password': '',
    'account': '',
    'warehouse': '',
    'database': '',
    'schema': '',
    'role': ''
}
conn = snowflake.connector.connect(**conn_params)
Define a dataframe with the specific business lines that you want to compare.
- merchant_name - corresponds to the 'merchant_name' field in the 'consumer_spending_merchant_timeseries_trial' table
- company_id - corresponds to the 'company_id' field in the 'sec_metrics_timeseries' table and identifies the corporate entity
- variable_name - corresponds to the 'variable_name' field in 'sec_metrics_timeseries' and specifies the business line to pull for comparison
sec_data_cols = ['merchant_name', 'company_id', 'variable_name']
sec_data_params = [
['TARGET', '17aaf933f7a91e45e1d1096ceb594f45', 'REVENUE | QUARTERLY'],
['CHIPOTLE MEXICAN GRILL', 'dd690f97f24fb9c10001c167df35fd3d', 'REVENUE | QUARTERLY'],
['DOLLAR GENERAL', '9737f8671e7db0d4cade5833401874f4', 'NET SALES | QUARTERLY'],
['OLIVE GARDEN', 'f003b5a8c195ec08aa7fade7630b5892', 'SALES | SEGMENT: OLIVE GARDEN | QUARTERLY']
]
df_sec_params = pd.DataFrame(sec_data_params, columns=sec_data_cols)
Pull quarterly financial results from Cybersyn's SEC data
sec_sql_list = []
for _, row_select in df_sec_params.iterrows():
    sec_sql_list.append(f"""
        SELECT '{row_select['merchant_name']}' AS merchant_name, company_name, variable_name,
               period_start_date, period_end_date,
               CONCAT(fiscal_year, ' ', fiscal_period) AS fiscal_quarter,
               value AS sec_value
        FROM cybersyn.sec_metrics_timeseries
        WHERE company_id = '{row_select['company_id']}'
          AND variable_name = '{row_select['variable_name']}'
          AND period_start_date >= '2019-01-01'
    """)
sec_sql = ' UNION ALL '.join(sec_sql_list)
df_sec_vals = pd.read_sql(sec_sql, conn)
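The query-building loop above can be exercised without a Snowflake connection. This is a minimal sketch of the same per-row SELECT plus UNION ALL pattern; the merchant names, company_ids, and variable_names below are invented placeholders:

```python
import pandas as pd

# Invented parameters mirroring the shape of df_sec_params
df_params_toy = pd.DataFrame(
    [['TOY CO', 'abc123', 'REVENUE | QUARTERLY'],
     ['ACME', 'def456', 'NET SALES | QUARTERLY']],
    columns=['merchant_name', 'company_id', 'variable_name'])

sql_parts = []
for _, row in df_params_toy.iterrows():
    sql_parts.append(f"""
        SELECT '{row['merchant_name']}' AS merchant_name, value AS sec_value
        FROM cybersyn.sec_metrics_timeseries
        WHERE company_id = '{row['company_id']}'
          AND variable_name = '{row['variable_name']}'
    """)
sec_sql_toy = ' UNION ALL '.join(sql_parts)

# Two parameter rows produce two SELECTs joined by one UNION ALL
print(sec_sql_toy.count('UNION ALL'))  # 1
```

Building the SQL from a parameter dataframe keeps the merchant list in one place, so adding a comparison is a one-row change.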
Pull Cybersyn sales estimates for the merchants that we have selected
merchant_list = df_sec_params['merchant_name']
merchant_list_cond = "'" + "', '".join(merchant_list) + "'"
query = f"""SELECT merchant_name, date - 6 AS period_start_date, date AS period_end_date, value,
-- Working with weekly data, so divide by 7 to get per day metrics (useful later on when going to fiscal quarters)
value / 7 AS value_per_day
FROM cybersyn.consumer_spending_merchant_timeseries_trial
WHERE consumer_geo_level = 'Country'
AND merchant_geo_level = 'Country'
AND channel = 'All Channels'
AND member_age = 'All Age Brackets'
AND member_income = 'All Income Brackets'
AND frequency = 'Week - Monday Start'
AND measure = 'Sales'
-- Let's start in 2019 to only use full years ('date' is the week-ending date; filtering on it avoids relying on a column alias in WHERE)
AND date >= '2019-01-01'
AND merchant_name IN ({merchant_list_cond})
ORDER BY merchant_name, period_end_date"""
df_weekly = pd.read_sql(query, conn)
min_cybersyn_date = min(df_weekly['PERIOD_START_DATE'])
max_cybersyn_date = max(df_weekly['PERIOD_END_DATE'])
Create a dataframe where each row is a single day, then join it to Cybersyn's weekly data
# Span the full range of available weekly data rather than hardcoding dates
df_daily_dates = pd.DataFrame(pd.date_range(start=min_cybersyn_date, end=max_cybersyn_date), columns=['DATE'])
df_daily_weekly_merge = (df_daily_dates.merge(df_weekly, how='cross')
                         .query('PERIOD_START_DATE <= DATE <= PERIOD_END_DATE')
                         .reset_index(drop=True))
df_cybersyn_daily = df_daily_weekly_merge[['DATE', 'MERCHANT_NAME', 'VALUE_PER_DAY']]
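The cross-join-and-filter expansion above can be sanity-checked offline. This is a minimal sketch on synthetic weekly data (the merchant name and values are made up): each week's per-day value should appear once per calendar day it covers, so the daily total reconstructs the weekly total.

```python
import pandas as pd

# Synthetic weekly series: one merchant, two Monday-start weeks (hypothetical values)
df_weekly_toy = pd.DataFrame({
    'MERCHANT_NAME': ['TOY CO', 'TOY CO'],
    'PERIOD_START_DATE': pd.to_datetime(['2019-01-07', '2019-01-14']),
    'PERIOD_END_DATE': pd.to_datetime(['2019-01-13', '2019-01-20']),
    'VALUE_PER_DAY': [10.0, 20.0],
})

# One row per calendar day, cross-joined to the weekly rows, then filtered
# so each day keeps only the week that contains it
df_days = pd.DataFrame(pd.date_range('2019-01-07', '2019-01-20'), columns=['DATE'])
df_daily = (df_days.merge(df_weekly_toy, how='cross')
            .query('PERIOD_START_DATE <= DATE <= PERIOD_END_DATE')
            .reset_index(drop=True))

print(len(df_daily))                     # 14 days, one row each
print(df_daily['VALUE_PER_DAY'].sum())   # 7*10 + 7*20 = 210.0
```

Note that `merge(how='cross')` requires pandas 1.2 or later; the daily grain is what lets weekly estimates be re-cut to each company's fiscal calendar in the next step.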
Join fiscal quarters to Cybersyn daily data and set up apples-to-apples comparisons
df_fiscal_qs = df_sec_vals[['MERCHANT_NAME', 'PERIOD_START_DATE', 'PERIOD_END_DATE', 'FISCAL_QUARTER']]
df_daily_quarterly_merge = (df_cybersyn_daily.merge(df_fiscal_qs, on='MERCHANT_NAME')
                            .query('PERIOD_START_DATE <= DATE <= PERIOD_END_DATE')
                            .reset_index(drop=True))
df_cybersyn_q = (df_daily_quarterly_merge
                 .groupby(['MERCHANT_NAME', 'PERIOD_START_DATE', 'PERIOD_END_DATE', 'FISCAL_QUARTER'])
                 .agg(CYBERSYN_VALUE=('VALUE_PER_DAY', 'sum'))
                 .reset_index())
df_combo = df_cybersyn_q.merge(df_sec_vals, on=['MERCHANT_NAME', 'PERIOD_START_DATE', 'PERIOD_END_DATE', 'FISCAL_QUARTER'])
df_combo_complete_qs = df_combo[(df_combo['PERIOD_START_DATE'] >= min_cybersyn_date) & (df_combo['PERIOD_END_DATE'] <= max_cybersyn_date)]
Calculate correlations between Cybersyn and SEC sales numbers
corr_factors = (df_combo_complete_qs.groupby('MERCHANT_NAME')
                .apply(lambda group: group['CYBERSYN_VALUE'].corr(group['SEC_VALUE']))
                .reset_index(name='CORR'))
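For intuition, the same per-merchant correlation can be run on toy data. MAPE is included here as a complementary, level-aware metric (it is not part of the pipeline above, and the merchant names and values are invented): correlation rewards tracking the shape of the series, while MAPE penalizes a constant level offset that correlation ignores.

```python
import pandas as pd

# Invented quarterly pairs for two hypothetical merchants; merchant 'A' tracks
# the shape of its benchmark closely but at roughly half the level
df_toy = pd.DataFrame({
    'MERCHANT_NAME': ['A', 'A', 'A', 'B', 'B', 'B'],
    'CYBERSYN_VALUE': [100.0, 110.0, 125.0, 50.0, 48.0, 55.0],
    'SEC_VALUE':      [200.0, 222.0, 250.0, 60.0, 58.0, 70.0],
})

# Pearson correlation per merchant, as in the pipeline above
corr = (df_toy.groupby('MERCHANT_NAME')
        .apply(lambda g: g['CYBERSYN_VALUE'].corr(g['SEC_VALUE']))
        .reset_index(name='CORR'))

# Mean absolute percentage error: sensitive to level, not just shape
mape = (df_toy.assign(APE=lambda d: (d['CYBERSYN_VALUE'] - d['SEC_VALUE']).abs() / d['SEC_VALUE'])
        .groupby('MERCHANT_NAME')['APE'].mean().reset_index(name='MAPE'))

print(corr)  # both merchants correlate above 0.99
print(mape)  # merchant 'A' shows a ~50% level gap despite high correlation
```

Reporting both views guards against declaring victory on a series that moves in lockstep with the benchmark but misses its magnitude.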
Plot the time series comparison
df_combo_long = pd.DataFrame()
for source in ['CYBERSYN', 'SEC']:
    df_source = (df_combo_complete_qs[['MERCHANT_NAME', 'PERIOD_END_DATE', 'FISCAL_QUARTER', f'{source}_VALUE']]
                 .rename(columns={f'{source}_VALUE': 'VALUE'})
                 .assign(SOURCE=source))
    df_combo_long = pd.concat([df_combo_long, df_source], ignore_index=True)
fig = px.line(df_combo_long, x='PERIOD_END_DATE', y='VALUE', color='SOURCE',
              facet_col='MERCHANT_NAME', facet_col_wrap=2, height=800, width=800)
fig.update_yaxes(matches=None, rangemode='tozero')
fig.show(renderer='svg')