Skip to main content

Data Commons

Overview

Data Commons is an open source project that integrates data from various public sources to power contextual Google Search. Data Commons covers a variety of topics including demographic, economic, government spending, and environmental statistics at national, state, county, and municipal levels. Select sources include the World Bank, US Bureau of Labor Statistics, Center for Disease Control, United Nations, and US Drug Enforcement Agency.

Example topics covered:

  • GDP
  • Unemployment
  • Household income
  • Population
  • Annual electricity consumption

Key Attributes

Geographic CoverageGlobal
Entity LevelCountry, State, County, City, Census Tract, Zip Code, Core Based Statistical Area - Varies by underlying source
Time GranularityDaily, Weekly, Monthly, Quarterly - Varies by underlying source
Release FrequencyVaries by underlying source
HistoryVaries by underlying source
info

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.

Notes & Methodology

The majority of the data centers around timeseries containing demographic, economic, government spending, and environmental statistics at national, state, county, and municipal levels. This data primarily revolves around geographic entities from the national, state, county, municipal, zip code, and census tract levels.

The timeseries table contains the core data and the GEOGRAPHY_INDEX table contains human readable names for geographies. Variable attributes can be joined to the timeseries data for additional metadata about the variables themselves (measurement category, units, frequency, etc.).

In cases where a single measure is reported by more than one source, the variable_name includes both the variable being measured and the source for the data. For example, “Total Population, un.org” and “Total Population, census.gov” both exist for US population estimates.

Cybersyn Products

The Data Commons tables above are available in the following Cybersyn data products.

Sample Queries

Compare economic statistics across different geographic levels

Show unemployment rates in New York City vs. New York state.

SELECT ts.date,
geo.geo_name,
geo.level,
ts.value
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.geography_index AS geo ON (ts.geo_id = geo.geo_id)
WHERE geo.geo_name = 'New York'
AND geo.level IN ('State', 'City')
AND ts.variable_name ILIKE 'Unemployment Rate%'
AND date >= '2015-01-01'
ORDER BY date;

Show populations of various geographies

Search populations of the United States, Canada, and Mexico since 2000, including human readable names.

SELECT att.variable_name,
geo.geo_name,
geo.geo_id,
date,
value
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.datacommons_attributes AS att ON (ts.variable = att.variable)
JOIN cybersyn.geography_index AS geo ON (ts.geo_id = geo.geo_id)
WHERE att.variable_group ='Total Population'
AND geo.geo_id IN ('country/USA', 'country/CAN', 'country/MEX')
AND date >= '2000-01-01'
ORDER BY date DESC;

Display available measures for cities

Explore all of the variables that are available at the core-based statistical area (CBSA) level. A CBSA is a geographic region in the US that contains a large population - typically cities and their surrounding areas.

SELECT DISTINCT variable_name
FROM cybersyn.datacommons_timeseries AS ts
JOIN cybersyn.geography_index AS geo ON (ts.geo_id = geo.geo_id)
WHERE level = 'CensusCoreBasedStatisticalArea';

Compare median income to median age by zip code

The complexity here comes from using latest available data for each variable. We filter independently for the latest value for each of the comparisons we want to make.

WITH income_age_data AS (
SELECT
geo.geo_name,
ts.geo_id,
ts.variable_name,
ts.value
FROM cybersyn.datacommons_timeseries AS ts
INNER JOIN cybersyn.geography_index AS geo
ON (ts.geo_id = geo.geo_id)
WHERE
geo.level = 'CensusZipCodeTabulationArea'
AND ts.variable_name IN ('Median Income for All Households, USD', 'Median Age of Population, census.gov')
QUALIFY ROW_NUMBER() OVER (PARTITION BY ts.geo_id, ts.variable_name ORDER BY ts.date DESC) = 1
)

SELECT
geo_name,
geo_id,
MAX(CASE WHEN variable_name = 'Median Income for All Households, USD' THEN value END) AS median_income,
MAX(CASE WHEN variable_name = 'Median Age of Population, census.gov' THEN value END) AS median_age
FROM income_age_data
GROUP BY
geo_name,
geo_id
HAVING median_income IS NOT NULL AND median_age IS NOT NULL
ORDER BY geo_name;

Disclaimers

The data in this product is sourced from Data Commons 2024, CDC Places, electronic dataset. Cybersyn has reformatted the data from Data Commons as licensed here.

Cybersyn is not endorsed by or affiliated with any of these providers. Contact snowflake-public-data@snowflake.com for questions.