Government Essentials

Government spending, demographic, economic, & environmental statistics; reference data on holidays and administrative boundaries

Overview

This product serves as a central source of government statistics and reference data. It includes a collection of demographic, economic, government spending, and environmental timeseries data along with commonly used reference data about geographies and holidays. A single, unified schema joins data across the various publishing agencies. Where applicable, data is available at the national, state, county, and municipal levels.
Example topics covered:
  • GDP
  • Unemployment
  • Household income
  • US government contracts
  • Crime and disease incidences
  • Population
  • Public holidays
  • Production, supply, distribution & export of agricultural commodities
The data is sourced from Data Commons (an aggregator of government data sources that powers contextual Google Search), the American Community Survey (ACS), the US Census Bureau, the System for Award Management (SAM.gov), Statistics Canada, the World Trade Organization (WTO), the World Health Organization (WHO), the United States Postal Service (USPS), the US Department of Agriculture (USDA), and the Python-Holidays package on GitHub. Data Commons itself aggregates data from the US Bureau of Labor Statistics, the World Bank, the United Nations, the IMF, the CDC, and other sources.

Key attributes

Geographic Coverage
Global
Entities Covered
Geographic, Government Contracts, Government Contract Awards
Time Granularity
Various, depending on source
Update Frequency
Depending on source; see table below
History
Depending on source

Description

All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g. geography, company). Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics, measures) in a timeseries table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
The majority of the data, including that from Data Commons, centers around timeseries containing demographic, economic, government spending, and environmental statistics. This data primarily revolves around geographic entities at the national, state, county, municipal, zip code, and census tract levels. Variable attributes can be joined to the timeseries data for additional metadata about the variables themselves (measurement category, units, frequency, etc.).
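The index/timeseries split described above can be pictured with a small in-memory sketch. This is only an illustration, not Cybersyn's implementation: the rows, the "Example Measure" variable, and the values are placeholders, and only the `geo_id`, `geo_name`, `level`, `variable_name`, `date`, and `value` column names are taken from the sample queries on this page.

```python
# Hypothetical miniature of the index/timeseries layout. GEO_IDs follow the
# 'country/XXX' form used in this page's sample queries; values are placeholders.
geography_index = [
    {"geo_id": "country/USA", "geo_name": "United States", "level": "Country"},
    {"geo_id": "country/CAN", "geo_name": "Canada", "level": "Country"},
]
timeseries = [
    {"geo_id": "country/USA", "variable_name": "Example Measure", "date": "2022-12-31", "value": 1.0},
    {"geo_id": "country/CAN", "variable_name": "Example Measure", "date": "2022-12-31", "value": 2.0},
    {"geo_id": "country/MEX", "variable_name": "Example Measure", "date": "2022-12-31", "value": 3.0},
]


def join_on_geo_id(ts_rows, index_rows):
    """Inner-join timeseries rows to index rows on geo_id."""
    index_by_id = {row["geo_id"]: row for row in index_rows}
    return [
        {**index_by_id[ts["geo_id"]], **ts}
        for ts in ts_rows
        if ts["geo_id"] in index_by_id
    ]


# Rows whose geo_id is missing from the index (country/MEX here) drop out,
# exactly as they would with an inner JOIN in SQL.
joined = join_on_geo_id(timeseries, geography_index)
```

The same join key is what makes tables from different Cybersyn products combinable: any table carrying a GEO_ID can be attached to the same index rows.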
The geography reference data consists of an index of geographic entities at different levels (e.g., countries, cities, counties, census tracts, etc.); relationships between these geographies (e.g., which cities are contained within which counties); and the characteristics of those geographies (e.g., geospatial boundaries, coordinates, name abbreviations, etc.). The boundaries are sourced from the US Census and Statistics Canada.
The population data is sourced from Data Commons and the American Community Survey (ACS) published by the US Census Bureau. Data Commons aggregates population data from a range of government agencies and international organizations (e.g. World Bank, OECD). The American Community Survey is an ongoing survey that provides population information annually in the US. This is different from the US Census (also available in this dataset) which is published every 10 years. ACS data covers top-line US population figures and detailed population variables (e.g. age, race, income, employment status, immigration status, household status) for ~500K geographies of various levels across the United States since 2005. Geographic entity levels include: country, states, counties, cities, zip codes, core-based statistical areas (CBSAs), census tracts, census block groups.
The public holiday reference data contains government-designated holidays for 119 countries, joinable to Cybersyn’s other geographic entities, as well as the financial market holidays for the European Central Bank (ECB) and NY Stock Exchange (NYSE).
Additional calendar reference data in the calendar_index table currently includes regular calendar periods (days, weeks, months, quarters, and years) and 4-5-4 retail calendar periods (4-5-4 retail months, quarters, and years). The 4-5-4 retail calendar is a standardized accounting and reporting calendar system used by many retailers, where each fiscal quarter is divided into 13 weeks grouped into months of 4, 5, and 4 weeks, aiming to align with seasonal variations and facilitate more accurate financial comparisons. Users can select which calendar type they want to use.
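The arithmetic behind the 4-5-4 pattern is easy to check. The following is a minimal sketch of the week counts only; the constant and variable names are illustrative and are not columns of calendar_index.

```python
# Weeks in the three "months" of one retail quarter under the 4-5-4 pattern.
WEEKS_PER_RETAIL_MONTH = (4, 5, 4)
QUARTERS_PER_YEAR = 4

weeks_per_quarter = sum(WEEKS_PER_RETAIL_MONTH)           # 4 + 5 + 4 = 13
weeks_per_year = weeks_per_quarter * QUARTERS_PER_YEAR    # 13 * 4 = 52
days_per_year = weeks_per_year * 7                        # 52 * 7 = 364

# Week counts for the 12 retail months of a standard (52-week) year.
retail_month_weeks = list(WEEKS_PER_RETAIL_MONTH) * QUARTERS_PER_YEAR
```

Because every period is a whole number of weeks, each retail month contains the same number of each weekday, which is what makes period-over-period comparisons cleaner than with calendar months.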
US government contracts data, sourced from SAM.gov, revolves around two main entities: Contracts and Contract Awards. Contracts represent listings soliciting bids on goods and services that the Federal US government is seeking from contractors. Metadata about contracts includes the department of the Federal government that oversees the contract, the date the contract was originally posted, the deadline for response, and the location where the contract will be fulfilled. The contract_solicitation_id field can be used to find the original contract on sam.gov.
Contract Awards represent accepted bids or solicitations from third-party contractors to fulfill a contract. Metadata about the award contract includes the name of the recipient of the award (business or individual), the value of the award, the date of the award, description of the award, and the primary contact from the government who awarded the contract. Note that the description of the contract award may differ from that of the original contract if the government reopened the contract or awarded multiple awards from a single original contract solicitation. The contract_award_id corresponds to the award number on SAM.gov and can be used to search for the award in the sam.gov portal.
The US Treasury provides a daily overview of net federal revenue collections from income tax deposits, customs duties, fees for government services, fines, and loan repayments. These collections and the channel through which they are processed, such as mail, internet, banking, and over-the-counter transactions, are incorporated within this dataset.
The US Department of Agriculture's (USDA) Foreign Agricultural Service (FAS) publishes US Export Sales Reporting (ESR) data on export sales activity for 40+ US agricultural commodities sold abroad. Weekly ESR data is the most currently available source of US export sales data and gives an indication of the potential impact foreign sales may have on US supplies and prices. The USDA FAS also provides monthly production, supply, and distribution data on agricultural commodities for the United States and other key producing and consuming countries since 1960. The international portion of the data is updated with input from agricultural attachés stationed at U.S. embassies around the world, FAS commodity analysts, and country and commodity analysts with the USDA's Economic Research Service (ERS). The U.S. domestic data is updated with input from analysts in FAS, ERS, the National Agricultural Statistical Service, and USDA's Farm Service Agency (FSA).
Data from the US Department of Commerce's International Trade Administration (ITA) includes trade events, trade leads, export business service providers, ITA export assistance centers, and export restricted entities.
The World Trade Organization (WTO) publishes data on global trade flows, imposed tariffs, and trade interactions between countries. The data details export and import figures for goods and services across different countries and regions, tariff rates and structures that WTO member countries apply to imports from other nations over time, trade dependencies between countries, and the balance of trade between specific pairs of nations. The data is sourced specifically from the WTO's International Trade Statistics, Tariff Indicators (Applied), and Bilateral Imports indicators.
The World Health Organization (WHO) publishes an annual report of 1,100+ health-related indicators for its 194 members and their associated country groups and global regions. The publication provides a detailed overview of global health trends and issues. Example metrics include alcohol consumption among adolescents and adults, tobacco control policies, abortion rates, accessibility of dementia care services, and adolescent fertility rates. Environmental health indicators (e.g. air pollution's impact on mortality rates and disability-adjusted life years (DALYs), and deaths attributable to the environment) are also included.

Data Dictionary

Data Sources & Release Frequency

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Table Names | Source | Source Schedule
DATACOMMONS_ATTRIBUTES, DATACOMMONS_TIMESERIES | Data Commons | Data Commons is an aggregator of government data sources. Release calendars vary by underlying source.
GOVERNMENT_CONTRACT_AWARD_INDEX, GOVERNMENT_CONTRACT_INDEX | SAM.gov | Monthly, within the first week
AMERICAN_COMMUNITY_SURVEY_ATTRIBUTES, AMERICAN_COMMUNITY_SURVEY_TIMESERIES | American Community Survey (US Census Bureau) | 1-year estimates for geographies with a population of 65K+: mid-September. 1-year estimates for geographies with a population of 20K+: mid-October. 5-year estimates for all geographies: mid-December.
US_TREASURY_REVENUE_COLLECTIONS_TIMESERIES, US_TREASURY_REVENUE_COLLECTIONS_ATTRIBUTES | US Treasury | Weekdays (excl. federal holidays)
PUBLIC_HOLIDAY_CALENDAR | Python-Holidays | Ad hoc
DEPARTMENT_OF_AGRICULTURE_COMMODITIES_ATTRIBUTES, DEPARTMENT_OF_AGRICULTURE_COMMODITIES_TIMESERIES | USDA | Production, Supply, Distribution: updated by commodity, typically monthly. U.S. Export Sales: weekly (Thursday), 8:30am ET.
CALENDAR_INDEX | Cybersyn | Ad hoc
INTERNATIONAL_TRADE_ADMINISTRATION_TRADE_LEADS_INDEX, INTERNATIONAL_TRADE_ADMINISTRATION_TRADE_EVENTS_INDEX, INTERNATIONAL_TRADE_ADMINISTRATION_EXPORT_SCREENED_ENTITIES_INDEX, INTERNATIONAL_TRADE_ADMINISTRATION_EXPORT_ASSISTANCE_CENTERS_INDEX, TRADE_ADMINISTRATION_BUSINESS_SERVICE_PROVIDERS_INDEX | International Trade Administration (ITA) | Daily, with the exception of export assistance centers, which update weekly
WORLD_HEALTH_ORGANIZATION_ATTRIBUTES, WORLD_HEALTH_ORGANIZATION_TIMESERIES | World Health Organization (WHO) | Annually, typically in May
WORLD_TRADE_ORGANIZATION_ATTRIBUTES, WORLD_TRADE_ORGANIZATION_TIMESERIES | World Trade Organization (WTO) | Varies depending on variable (monthly, quarterly, or annually)
GEOGRAPHY_RELATIONSHIPS, GEOGRAPHY_CHARACTERISTICS, GEOGRAPHY_INDEX | Data Commons, US Census Bureau, Statistics Canada | Data Commons is an aggregator of government data sources; release calendars vary by underlying source. The US Census Bureau publishes datasets about the US people and its economy; release schedules vary by dataset. Statistics Canada is Canada’s national statistical office; release schedules vary by dataset.

Notes & Methodology

Geographic coverage

Data Commons’ unique geographic identifiers were used to form the core of the geography_index and related tables. Cybersyn built off of the Data Commons core to expand geographic coverage using sources such as Statistics Canada, the US Census Bureau, and the American Community Survey.
Note that the ACS population data does not include all geographies published by the source. Today, we provide geographies at the country, state, county, city, zip code, statistical area, and census tract levels.

Variable selection

In cases where a single measure is reported by more than one source, the variable_name includes both the variable being measured and the source for the data. For example, “Total Population, un.org” and “Total Population, census.gov” both exist for US population estimates.
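When working with these names programmatically, it can help to separate the measure from its source. The sketch below is a heuristic under stated assumptions, not an official parsing rule: it assumes the source, when present, is the text after the last ", " and looks like a domain name (contains a dot, no spaces). The function name is hypothetical.

```python
from typing import Optional, Tuple


def split_variable_name(variable_name: str) -> Tuple[str, Optional[str]]:
    """Split 'Measure, source.domain' into (measure, source).

    Returns (variable_name, None) when no domain-like suffix is present.
    """
    measure, separator, suffix = variable_name.rpartition(", ")
    looks_like_domain = "." in suffix and " " not in suffix
    if separator and looks_like_domain:
        return measure, suffix
    return variable_name, None


# Names taken from the examples on this page.
un_measure, un_source = split_variable_name("Total Population, un.org")
census_measure, census_source = split_variable_name("Total Population, census.gov")
plain_measure, plain_source = split_variable_name("Median Income for All Households")
```

Grouping by the measure part makes it straightforward to see which sources report the same statistic before choosing one.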

ACS population estimates and history

1 year estimates are based on 12 months of collected data (e.g. January 1, 2022 to December 31, 2022) and provided annually for geographies with a population of 20K+. This data has the smallest sample size but is most current. 5 year estimates are based on 60 months of collected data (e.g. January 1, 2018 to December 31, 2022) and provided annually for all geographies. This data is based on the largest sample size but is the least current.
Note that the population variables and history covered vary by geographic entity level (e.g. zip code, state).
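The collection windows described above follow directly from the estimate's end year and span. A minimal sketch, assuming windows always run from January 1 to December 31 as in the examples given (the function name is hypothetical):

```python
from datetime import date
from typing import Tuple


def acs_collection_window(end_year: int, span_years: int) -> Tuple[date, date]:
    """Return the (start, end) dates of data collection for an ACS estimate.

    span_years is 1 for 1-year estimates and 5 for 5-year estimates.
    """
    start = date(end_year - span_years + 1, 1, 1)
    end = date(end_year, 12, 31)
    return start, end


one_year_window = acs_collection_window(2022, 1)   # Jan 1, 2022 to Dec 31, 2022
five_year_window = acs_collection_window(2022, 5)  # Jan 1, 2018 to Dec 31, 2022
```

Note that consecutive 5-year estimates overlap by four years of collected data, so year-over-year changes in 5-year series are smoother than in 1-year series.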

Contract and contract award identifiers

The contract_award_id and contract_solicitation_id fields can be used to find the award and the original contract on sam.gov, respectively.

Restated 4-5-4 retail calendars

The 4-5-4 retail calendar included in the calendar_index table is a standardized accounting and reporting calendar system used by many retailers, where each fiscal quarter is divided into 13 weeks grouped into months of 4, 5, and 4 weeks, aiming to align with seasonal variations and facilitate more accurate financial comparisons. The retail calendar is further broken down to include periods for restated 4-5-4 retail calendar years. A 4-5-4 retail year is typically 52 weeks, but every 5 to 6 years there is a 53-week year. Such 4-5-4 retail years get a restated 52-week version of the year in order to maintain comparability to the following year. For each of the 4-5-4 retail calendar years with 53 weeks, the calendar_index provides restated 4-5-4 retail periods in addition to the regular 4-5-4 retail periods. The 53-week years since 2010 have been 2012, 2017, and 2023. Only the restated periods appear in the restated retail 4-5-4 calendar types (i.e., only periods in 2012, 2017, and 2023).
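A 52-week retail year covers 364 days, slightly less than a calendar year, which is why an extra week is needed periodically. The following back-of-the-envelope check assumes the mean Gregorian year length of 365.2425 days; it does not model the retail calendar's actual rules for placing 53-week years.

```python
DAYS_IN_RETAIL_YEAR = 52 * 7       # 364 days in a standard 4-5-4 retail year
MEAN_GREGORIAN_YEAR = 365.2425     # average calendar-year length in days

# Days by which a 52-week retail year falls short of a calendar year, on average.
shortfall_per_year = MEAN_GREGORIAN_YEAR - DAYS_IN_RETAIL_YEAR

# Years until the accumulated shortfall amounts to one full week.
years_until_extra_week = 7 / shortfall_per_year
```

The result falls between 5 and 6, consistent with the 5-year and 6-year gaps between the 53-week years listed above (2012, 2017, 2023).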
Use cases for the restated 4-5-4 periods include comparing a 52 week year (2018) to a 53 week year (2017). In this case, users would compare 2018 periods to the corresponding period (ORDINAL_POSITION_IN_ANNUAL_PERIOD) from the restated 2017 calendar; the non-restated 2017 calendar would be compared to 2016 numbers.
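The comparison rule above can be sketched as a small helper. The "restated" and "regular" labels are hypothetical placeholders, not actual CALENDAR_ID values; the set of 53-week years comes from the text above.

```python
from typing import Tuple

# 53-week 4-5-4 retail years since 2010, as listed in this section.
FIFTY_THREE_WEEK_YEARS = {2012, 2017, 2023}


def prior_year_calendar(current_year: int) -> Tuple[int, str]:
    """Pick the prior-year calendar to compare current_year against.

    If the prior year had 53 weeks, use its restated 52-week version so that
    periods line up by ordinal position; otherwise use the regular calendar.
    """
    prior_year = current_year - 1
    kind = "restated" if prior_year in FIFTY_THREE_WEEK_YEARS else "regular"
    return prior_year, kind


comparison_for_2018 = prior_year_calendar(2018)  # compare against restated 2017
comparison_for_2017 = prior_year_calendar(2017)  # compare against regular 2016
```

In a query, the chosen calendar type would then be matched period by period using ORDINAL_POSITION_IN_ANNUAL_PERIOD.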

Streamlit Demo

Cybersyn builds Streamlit demos to visualize the data available in this product and provide a jumping off point.

Examples & Sample Queries

Rank zip codes by population growth
Find the 10 fastest-growing zip codes in each year since 2012, by year-over-year percentage growth in the 5-year ACS total population estimate, among zip codes with a population above 25,000.
WITH zip_stats AS (
    SELECT
        YEAR(ts.date) AS year,
        ts.geo_id AS zip,
        rship.related_geo_name AS state,
        ts.value AS population,
        -- Previous year's population for the same zip code
        LAG(ts.value, 1) OVER (PARTITION BY zip ORDER BY year ASC) AS prev_year_population,
        population / prev_year_population - 1 AS pct_growth,
        population - prev_year_population AS absolute_change
    FROM cybersyn.american_community_survey_timeseries AS ts
    JOIN cybersyn.american_community_survey_attributes AS att
        ON ts.variable = att.variable
    JOIN cybersyn.geography_index AS geo
        ON ts.geo_id = geo.geo_id
    JOIN cybersyn.geography_relationships AS rship
        ON ts.geo_id = rship.geo_id AND rship.related_level = 'State'
    WHERE
        att.series_type = 'Total Population'
        AND att.measurement_type = 'Estimate'
        AND att.measurement_period = '5YR'
        AND geo.level = 'CensusZipCodeTabulationArea'
        AND ts.value > 25000
)
SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY year ORDER BY pct_growth DESC NULLS LAST) AS annual_rank
FROM zip_stats
WHERE year >= 2012
-- Keep only the 10 fastest-growing zip codes per year
QUALIFY ROW_NUMBER() OVER (PARTITION BY year ORDER BY pct_growth DESC NULLS LAST) <= 10
ORDER BY year, annual_rank;
Find an upcoming trade event
Pick an upcoming in-person trade event to attend in the software industry.
SELECT
    trade_event_name,
    start_date,
    end_date,
    relevant_industries,
    trade_event_url,
    trade_event_description,
    registration_type,
    registration_url,
    primary_venue_geo_id_country,
    primary_venue_geo_id_state,
    primary_venue_name,
    primary_venue_street,
    primary_venue_city,
    contacts
FROM cybersyn.international_trade_administration_trade_events_index
WHERE
    start_date > CURRENT_DATE()
    -- relevant_industries is an array; match events tagged with 'Software'
    AND ARRAY_CONTAINS('Software'::VARIANT, relevant_industries)
    AND trade_event_type = 'In-Person'
ORDER BY start_date;
Compare economic statistics across different geographic levels
Show unemployment rates in New York City vs. New York state
SELECT
    ts.date,
    geo.geo_name,
    geo.level,
    ts.value
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.geography_index AS geo
    ON (ts.geo_id = geo.geo_id)
WHERE geo.geo_name = 'New York'
    AND geo.level IN ('State', 'City')
    AND ts.variable_name ILIKE 'Unemployment Rate%'
    AND date >= '2015-01-01'
ORDER BY date;
Show populations of various geographies
Search populations of the United States, Canada, and Mexico since 2000, including human-readable names. The timeseries table contains the core data and the geography_index table contains human-readable names for geographies. The variable name can be found in the attributes table.
SELECT
    att.variable_name,
    geo.geo_name,
    geo.geo_id,
    date,
    value
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.datacommons_attributes AS att
    ON (ts.variable = att.variable)
JOIN cybersyn.geography_index AS geo
    ON (ts.geo_id = geo.geo_id)
WHERE att.variable_group = 'Total Population'
    AND geo.geo_id IN ('country/USA', 'country/CAN', 'country/MEX')
    AND date >= '2000-01-01'
ORDER BY date DESC;
Display available measures for cities
Explore all of the variables that are available at the core-based statistical area (CBSA) level. A CBSA is a geographic region in the US that contains a large population - typically cities and their surrounding areas.
SELECT DISTINCT variable_name
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.geography_index AS geo ON (ts.geo_id = geo.geo_id)
WHERE level = 'CensusCoreBasedStatisticalArea';
Explore geographic relationships and hierarchies
Find all counties and zip codes in New York.
SELECT
    related_geo_id,
    related_geo_name,
    related_level
FROM cybersyn.geography_relationships
WHERE geo_name = 'New York'
    AND level = 'State'
    AND related_level IN ('County', 'CensusZipCodeTabulationArea')
ORDER BY related_geo_name;
Compare median income to median age by zip code
The complexity here comes from using latest available data for each variable. We filter independently for the latest value for each of the comparisons we want to make.
WITH income_age_data AS (
    SELECT
        geo.geo_name,
        ts.geo_id,
        ts.variable_name,
        ts.value
    FROM cybersyn.datacommons_timeseries AS ts
    INNER JOIN cybersyn.geography_index AS geo
        ON (ts.geo_id = geo.geo_id)
    WHERE
        geo.level = 'CensusZipCodeTabulationArea'
        AND ts.variable_name IN ('Median Income for All Households', 'Median Age of Population, census.gov')
    -- Keep the latest available value for each zip code and variable
    QUALIFY ROW_NUMBER() OVER (PARTITION BY ts.geo_id, ts.variable_name ORDER BY ts.date DESC) = 1
)
-- Pivot the two variables into one row per zip code
SELECT
    geo_name,
    geo_id,
    MAX(CASE WHEN variable_name = 'Median Income for All Households' THEN value END) AS median_income,
    MAX(CASE WHEN variable_name = 'Median Age of Population, census.gov' THEN value END) AS median_age
FROM income_age_data
GROUP BY
    geo_name,
    geo_id
HAVING median_income IS NOT NULL AND median_age IS NOT NULL
ORDER BY geo_name;
Pull details about the highest-value contracts awarded by government agencies
Find descriptions of the largest Missile Defense Agency contracts awarded since 2021.
SELECT
    awards.award_name,
    awards.award_description,
    contracts.department,
    contracts.agency,
    contracts.agency_office,
    contracts.first_posted_date,
    awards.award_date,
    awards.award_amount,
    contracts.naics_description
FROM cybersyn.government_contract_index AS contracts
JOIN cybersyn.government_contract_award_index AS awards
    ON (contracts.contract_solicitation_id = awards.contract_solicitation_id)
WHERE contracts.department = 'Dept Of Defense'
    AND contracts.agency_office = 'Missile Defense Agency (Mda)'
    AND YEAR(awards.award_date) >= 2021
ORDER BY award_amount DESC NULLS LAST
LIMIT 50;
Use geography boundaries to filter entities
Find the largest county by area in each U.S. state, then list the 10 largest of those.
WITH county_areas AS (
    SELECT
        geo.geo_id,
        geo.geo_name AS county,
        states.related_geo_name AS state,
        countries.related_geo_name AS country,
        -- Area computed from the county's GeoJSON boundary
        ST_AREA(TRY_TO_GEOGRAPHY(value)) AS county_area
    FROM cybersyn.geography_index AS geo
    JOIN cybersyn.geography_relationships AS states
        ON (geo.geo_id = states.geo_id AND states.related_level = 'State')
    JOIN cybersyn.geography_relationships AS countries
        ON (geo.geo_id = countries.geo_id AND countries.related_level = 'Country')
    JOIN cybersyn.geography_characteristics AS chars
        ON (geo.geo_id = chars.geo_id AND chars.relationship_type = 'coordinates_geojson')
    WHERE geo.level = 'County'
)
SELECT *
FROM county_areas
WHERE country = 'United States'
-- Keep the single largest county per state
QUALIFY ROW_NUMBER() OVER (PARTITION BY country, state ORDER BY county_area DESC, geo_id) = 1
ORDER BY county_area DESC
LIMIT 10;
Find IRS net tax collections distribution by different period lengths
Explore net tax collections by the IRS from 2015 onwards on a weekly, monthly, quarterly, and annual basis.
WITH daily_tax_net_collections AS (
    -- Total daily IRS net collections across all matching variables
    SELECT date, SUM(value) AS irs_tax_net_collections_amount
    FROM cybersyn.us_treasury_revenue_collections_timeseries
    WHERE variable_name RLIKE 'IRS Tax Net Collections Amount:.*'
    GROUP BY date
)
SELECT
    cals.calendar_name AS period_type,
    cals.period_start_date,
    cals.period_end_date,
    SUM(taxes.irs_tax_net_collections_amount) AS irs_tax_net_collections_amount
FROM daily_tax_net_collections AS taxes
LEFT JOIN cybersyn.calendar_index AS cals
    ON taxes.date BETWEEN cals.period_start_date AND cals.period_end_date
WHERE cals.calendar_id IN ('week_monday_start', 'month', 'quarter', 'year')
    AND taxes.date >= '2015-01-01'
GROUP BY cals.calendar_name, cals.period_start_date, cals.period_end_date
ORDER BY cals.calendar_name, cals.period_start_date, cals.period_end_date;
Find commodity production data by country over time
Determine how olive oil production is changing over time by country by market year.
SELECT
    geo.geo_name,
    ts.date,
    ts.value,
    ts.unit
FROM cybersyn.us_department_of_agriculture_commodities_timeseries AS ts
JOIN cybersyn.geography_index AS geo
    ON ts.geo_id = geo.geo_id
WHERE ts.variable_name = 'Olive Oil: Production'
-- Order chronologically within each country to make the trend readable
ORDER BY geo.geo_name, ts.date;

Releases & Changelog

11/30/23 - Added global trade, tariff, and import relationship data from the World Trade Organization (WTO)
Added global trade flows, imposed tariffs, and trade interactions between countries from the World Trade Organization (WTO). The data details export and import figures for goods and services across different countries and regions, tariff rates and structures that WTO member countries apply to imports from other nations, trade dependencies between countries, and the balance of trade between specific pairs of nations.
  • The WORLD_TRADE_ORGANIZATION_ATTRIBUTES table details the global trade, tariff, and import relationship statistics tracked by the World Trade Organization (WTO).
  • The WORLD_TRADE_ORGANIZATION_TIMESERIES table provides timeseries values by date for the reported trade statistics by country, country group, and global region (as defined by the World Trade Organization).
11/30/23 - Added global health indicators from the World Health Organization (WHO)
Added 1,100+ health-related indicators for 194 members of the World Health Organization (WHO) and their associated country groups and global regions. Example metrics include alcohol consumption among adolescents and adults, tobacco control policies, abortion rates, accessibility of dementia care services, and adolescent fertility rates. Environmental health indicators including air pollution's impact on mortality rates and disability-adjusted life years (DALYs) as well as deaths attributable to the environment are also included.
  • The WORLD_HEALTH_ORGANIZATION_ATTRIBUTES table details the health statistics tracked by the World Health Organization (WHO).
  • The WORLD_HEALTH_ORGANIZATION_TIMESERIES table provides timeseries values by date for the reported health indicators by country, country group, or global region (as defined by global organizations like UNICEF, the United Nations, the World Bank, and the World Health Organization).
11/30/23 - Added country groups and regions from the WHO, WTO, and UN to geography tables
Added additional country groups and geography types to the GEOGRAPHY_INDEX from the World Health Organization (WHO), World Trade Organization (WTO), and United Nations (UN). The member countries of the added geographies are mapped in the GEOGRAPHY_RELATIONSHIPS table. Select new geographic regions include:
  • BRICS members
  • World Trade Organization (WTO) members
  • Association of Southeast Asian Nations (ASEAN)
  • UNICEF regions
  • United Nations regions
  • United Nations Sustainable Development Goal (SDG) regions
  • World Bank regions
  • World Health Organization (WHO) regions and income regions
  • World Bank regions and income groups
11/28/23 - Added US agricultural export sales data from the USDA
Added US Export Sales Reporting (ESR) data on weekly export sales activity for 40+ US agricultural commodities sold abroad from the US Department of Agriculture's (USDA) Foreign Agricultural Service (FAS).
  • The US_DEPARTMENT_OF_AGRICULTURE_COMMODITIES_ATTRIBUTES now includes the export of commodities in addition to the existing production, supply, and distribution variables.
  • The US_DEPARTMENT_OF_AGRICULTURE_COMMODITIES_TIMESERIES table provides the reported metrics for each commodity by GEO_ID.
11/20/23 - Expanded American Community Survey (ACS) history for 1,400+ population variables since 2005 for ~500K geographies
Added historical data from the American Community Survey (ACS) to the AMERICAN_COMMUNITY_SURVEY_ATTRIBUTES and AMERICAN_COMMUNITY_SURVEY_TIMESERIES tables for over 1,400 population variables dating back to 2005 at the following geographic entity levels: country, states, counties, cities, zip codes, core-based statistical areas (CBSAs), census tracts, and census block groups. Example population variable additions include age, race, income, employment status, immigration status, and household status.
Data is as up to date as the latest ACS publication.
11/3/23 - Added global agricultural commodity production and distribution data from the USDA. Added calendar index table
Added two tables sourced from the US Department of Agriculture's (USDA) Foreign Agricultural Service (FAS) which provides production, supply, and distribution data on agricultural commodities for both the United States and other producing and consuming countries since 1960.
  • us_department_of_agriculture_commodities_attributes describes the production, supply, and distribution metrics tracked for each commodity by the USDA.
  • us_department_of_agriculture_commodities_timeseries table provides the reported metrics for each commodity and country.
Added the calendar_index table which compiles common calendars into a single table. Each calendar type has a unique CALENDAR_ID, which allows users to select which calendar type they want to use. Individual periods within each calendar type include period start and end dates.
The calendar_index currently includes regular calendar periods (days, weeks, months, quarters, and years) and 4-5-4 retail calendar periods (4-5-4 retail months, quarters, and years).
The 4-5-4 retail calendar is a standardized accounting and reporting calendar system used by many retailers, where each fiscal quarter is divided into 13 weeks grouped into months of 4, 5, and 4 weeks, aiming to align with seasonal variations and facilitate more accurate financial comparisons.
10/19/23 - Added US Federal Government Revenue Collections from the US Treasury Fiscal Data
The US Treasury provides a daily overview of net federal revenue collections from income tax deposits, customs duties, fees for government services, fines, and loan repayments. These collections undergo electronic and/or non-electronic processing, involving various channels such as mail, internet, banking, and over-the-counter transactions, all of which are comprehensively incorporated within this dataset.
The us_treasury_revenue_collections_timeseries table provides daily net collections amounts broken down by tax category and processing channel. The us_treasury_revenue_collections_attributes table details each collection method reported by the US Treasury.
10/11/23 - Added population variables to the American Community Survey tables
Expanded the american_community_survey_attributes and american_community_survey_timeseries tables to include additional population variables related to income, age, and educational attainment.
New series include Household Income in the Past 12 Months (Inflation-Adjusted), Educational Attainment for the Population 25 Years and Over, and Age of Householder By Household Income in the Past 12 Months (Inflation-Adjusted). These series are available by multiple breakdowns (ex. income, age, gender, etc.).
10/2/23 - Added population data from the American Community Survey
Expanded our population dataset to include annual estimates from the American Community Survey (ACS) for 2021 and 2022 at multiple geographic levels in the United States.
9/15/23 - Added FIPS 10-4 country codes and state abbreviations
Expanded the geography_characteristics table to include mappings of FIPS 10-4 country codes and U.S. state abbreviations to country and state-level GEO_IDs, respectively.
8/11/23 – Added geospatial boundaries data for territories in the US and Canada
The Census Bureau and Statistics Canada publish geospatial boundaries data for their territories at multiple geographic levels. We added a table geography_characteristics with the boundary coordinates from the most recent releases in both WKT and GeoJSON formats. The table is joinable at different levels using Cybersyn's GEO_ID. This GEO_ID is compatible with all Cybersyn listings that have geographic identifiers. Currently, the geographic levels covered include:
  • State (US and Canada)
  • County (US only)
  • Census Tract (US only)
  • ZIP Code (US only)
  • Dissemination Area and Aggregate Dissemination Area (Canada only)
  • Census Division and Census Subdivision (Canada only)
  • Census Agglomeration and Census Agglomeration Part (Canada only)
  • Census Metropolitan Division and Census Metropolitan Division Part (Canada only)
8/7/23 – Added text-based US government contracts data from SAM.gov
The US government publishes contract opportunities and proposals to do business with the federal government via the System for Award Management (sam.gov) for contracts and awards with a value of at least $25,000. The data goes back to January 2002 and includes metadata providing descriptions of government contracts and the corresponding awards granted for those contracts.
6/1/23 – Added 3,000 new US zip codes from USPS and US Census
  • Using USPS address change data, we added 3,000 zip codes (mostly PO Box) to the dc_geo_index.
  • Using both the USPS address change and US Census Bureau data, we increased the coverage in geography_relationships table with 6,500 new zip and city relationships. We now map 86% of zip codes to a city.
5/19/23 – Updated product name from Cybersyn Data Commons to Cybersyn Government Essentials
Rebranded Cybersyn Data Commons as Cybersyn Government Essentials. We updated the naming conventions for schemas, tables, and column names to make them consistent across all of Cybersyn’s existing and future data products.
Cybersyn will continue to support and update your older version of the Data Commons tables.

Errata & Future Improvements

We note known issues and planned future improvements. If you would like to submit a bug report or feature request, email us at [email protected]
  • Add additional calendars to calendar_index including specific fiscal calendars and holidays.

Disclaimers

The data in this dataset is sourced here. Links to provider license, terms and disclaimers are provided where appropriate:
Python-Holidays: License
SAM.gov: Disclaimer
Statistics Canada: Terms & Conditions
USA.gov
WHO
WTO
Cybersyn is not endorsed by or affiliated with any of these providers. Contact [email protected] for questions.