Links
Comment on page

US Housing & Real Estate Essentials

US housing valuation & financing, addresses, change of address requests, points of interest, and income levels

Overview

US Housing & Real Estate Essentials serves as a central source of housing and real estate data covering the United States. This product provides housing valuation and financing, migration, addresses, points of interest (POI), and income statistics across the US.
Example topics covered:
  • Housing market values and appraisals
  • Mortgage lending, bankruptcies, and rates
  • Change of address requests
  • Income levels
  • Business locations and points of interest
  • Geographic boundaries
The data is sourced from the Federal Housing Finance Agency (FHFA), USPS, IRS, CFPB, US Census Bureau, National Address Database (NAD), Overture Maps, and Data Commons.

Key Attributes

Geographic Coverage
United States
Entity Level
Zip code
Release Frequency
Depending on source; see table below
History
Varies depending on source

Description

All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. geography, company). Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics, measure) in a timeseries table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
The Federal Housing Finance Agency (FHFA) publishes monthly and quarterly reports of housing finance markets.
  • Uniform Appraisal Dataset (UAD): Trends found in single family appraisals (i.e. home values) collected by Fannie Mae and Freddie Mac. The appraisal statistics may be grouped by neighborhood characteristics, building details, and geographic levels (e.g. national, state, Metropolitan Statistical Areas (MSAs), county, and census tract) and is collected for both new purchases and refinances.
  • National Mortgage Database (NMDB): Aggregate survey of data made up of 5% of residential mortgages indicating rates of delayed payments, foreclosures and bankruptcies, and forbearance. The data is grouped by geographic levels (national, state, CBSA/MSD level) and includes both Enterprise and non-Enterprise individual mortgages.
  • House Price Index (HPI): Repeated-sales index that measures average price changes in repeat sales or refinances on the same properties collected by Fannie Mae and Freddie Mac. The data is useful for estimating changes in house price fluctuations and affordability in specific geographic areas (national, state, CBSA/MSD level).
Overture Maps Foundation is an open data project steered by Amazon, Meta, Microsoft, and TomTom that aggregates map data from multiple sources and provides locations of points of interest (POIs). The data includes the name, location, and category or type of place or business. POIs can be mapped back to addresses. Refer to US Points of Interest & Addresses documentation to learn more about the 150 million POIs, addresses, and geographic administrative areas included in this product.
The USPS provides the total number of inbound and outbound change of address requests at the zip code level. An individual, family, or business may make a change of address request for mail-forwarding purposes when moving from one mailing address to another.
The IRS publishes aggregated individual income tax statistics and migration data based on annual returns. This data illustrates underlying economic conditions at a zip code and state level as well as measuring population and aggregate income changes between counties and states. It also allows for specific measurement of county-to-county population flow.
The Home Mortgages Disclosure Act (HMDA) requires financial institutions to disclose mortgage data and is reported jointly by the Consumer Financial Protection Bureau (CFPB) and the Federal Financial Institutions Examination Council (FFIEC). Cybersyn’s dataset includes details for every home loan application in the US from financial institutions required to file HMDA reports. Institutions that meet criteria such as certain asset thresholds must report HMDA data.

Data Dictionary

Data Sources & Release Frequency

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Tables Names
Source
Source Schedule
fhfa_house_price_attributes fhfa_house_price_timeseries fhfa_mortgage_performance_attributes fhfa_mortgage_performance_timeseries fhfa_uniform_appraisal_attributes fhfa_uniform_appraisal_timeseries
FHFA
Monthly on last Tuesday Monthly on last Tuesday Quarterly on the last day of quarter Quarterly on the last day of quarter Quarterly on the last day of quarter Quarterly on the last day of quarter
home_mortgage_disclosure_attributes
HMDA
Annually in June
geography_index geography_relationships geography_characteristics
Data Commons is an aggregator of government data sources. Release calendars vary by underlying source.
The US Census Bureau publishes datasets about the US people and it’s economy, release schedules vary by dataset.
point_of_interest_index point_of_interest_addresses_relationships
OpenAddresses refreshes weekly on Sunday ~3:30pm ET. Overture intends to release open map data on a regular cadence though the dates of future releases have not been established yet
irs_individual_income_attributes irs_individual_income_timeseries irs_migration_by_characteristic_attributes irs_migration_by_characteristic_timeseries irs_origin_destination_migration_timeseries
IRS
Annually
USPS_address_change_attributes USPS_address_change_timeseries
USPS
Monthly
us_addresses
OpenAddresses refreshes weekly on Sunday ~3:30pm ET. The NAD is refreshed about every quarter.

Notes & Methodology

US points of interest and addresses

This product includes all datasets available in US Points of Interest & Addresses. Refer to the product documentation page for the latest on POIs and addresses.
Note that more than one point of interest can map to a single address. For example, a fast food restaurant might share a location with a gas station or numerous doctors might have their own practices at a single address.

Change of address aggregations

In the USPS data, Cybersyn aggregates measures to different geographic levels. The number of address changes are calculated at the zip code, city, state, and national levels. Note that the USPS data at the zip/city level when there are fewer than 11 requests. For this reason, zip/city totals may not sum to aggregated state-level totals. In addition to aggregating to different geographic levels, Cybersyn calculates the net change in addresses (inflows - outflows).

Changes to HMDA reporting

In 2018, the CFPB changed the reporting standards for HMDA disclosures. As a result, some fields such as the financial institution and loan term are unavailable for records before 2018.

Streamlit Demos

Cybersyn builds Streamlit demos to visualization the data available in this product and provide a jumping off point.
US Home Appraisals
IRS Income & Migration
USPS Change of Address

Example Use Cases & Queries

Find the nearest competitor to a given merchant
Find the closest Lowe’s to any given Home Depot location
WITH joined_data AS (
SELECT poi.poi_id, poi.poi_name, addr.longitude, addr.latitude,
addr.number, addr.street_directional_prefix, addr.street,
addr.street_type, addr.street_directional_suffix,
addr.unit, addr.city, addr.state, addr.zip
FROM cybersyn.point_of_interest_index AS poi
JOIN cybersyn.point_of_interest_addresses_relationships AS map
ON (poi.poi_id = map.poi_id)
JOIN cybersyn.us_addresses AS addr
ON (map.address_id = addr.address_id)
)
SELECT *,
ST_DISTANCE(
ST_MAKEPOINT(home_depot.longitude, home_depot.latitude),
ST_MAKEPOINT(lowes.longitude, lowes.latitude)
) / 1609 AS distance_miles
FROM joined_data AS home_depot
JOIN joined_data AS lowes
WHERE home_depot.poi_name = 'The Home Depot'
AND lowes.poi_name = 'Lowe''s Home Improvement'
QUALIFY ROW_NUMBER() OVER (PARTITION BY home_depot.poi_id ORDER BY distance_miles NULLS LAST) = 1;
Track house prices and gross income inflow in a specific city
Compare Phoenix house prices vs gross income inflow
WITH county_map AS (
SELECT
geo_id,
geo_name,
related_geo_id,
related_geo_name
FROM cybersyn.geography_relationships
WHERE geo_name = 'Phoenix-Mesa-Scottsdale, AZ Metro Area'
AND related_level = 'County'
), gross_income_data AS (
SELECT
geo_id,
geo_name AS msa,
date,
SUM(value) AS gross_income_inflow
FROM cybersyn.irs_origin_destination_migration_timeseries AS ts
JOIN county_map ON (county_map.related_geo_id = ts.to_geo_id)
WHERE ts.variable_name = 'Adjusted Gross Income'
GROUP BY geo_id, msa, date
), home_price_data AS (
SELECT LAST_DAY(date, 'year') AS end_date, AVG(value) AS home_price_index
FROM cybersyn.fhfa_house_price_timeseries AS ts
JOIN cybersyn.fhfa_house_price_attributes AS att
ON (ts.variable = att.variable)
WHERE geo_id IN (SELECT geo_id FROM county_map)
AND att.index_type = 'purchase-only'
AND att.seasonally_adjusted = TRUE
GROUP BY end_date
)
SELECT
msa,
gid.date,
gross_income_inflow,
home_price_index
FROM gross_income_data AS gid
JOIN home_price_data AS hpi ON (gid.date = hpi.end_date)
ORDER BY date;
Determine the percent of conventional mortgages at or nearing default in California
Create a timeseries chart with percent of conventional mortgages either: 90-180 days past due, in forbearance, or in bankruptcy in California
SELECT
ts.date,
geo.geo_name,
SUM(ts.value) AS pct_near_default
FROM
cybersyn.fhfa_mortgage_performance_timeseries AS ts
JOIN
cybersyn.fhfa_mortgage_performance_attributes AS att ON (ts.variable = att.variable)
JOIN cybersyn.geography_index AS geo ON (geo.geo_id = ts.geo_id)
WHERE
att.variable_group IN (
'Percent 90 to 180 Days Past Due Date',
'Percent in Forbearance',
'Percent in the Process of Foreclosure, Bankruptcy, or Deed in Lieu'
)
AND att.market = 'All Mortgages'
AND geo.geo_name = 'California'
GROUP BY
ts.date,
geo.geo_name
ORDER BY
ts.date,
geo.geo_name;
Find zip codes with top adjusted gross income per capita
Show the top 5 zip codes by gross income that have a population of at least 10k people
SELECT
geo.geo_name AS zip_code,
ROUND(agi.value / NULLIF(pop.value, 0), 0) AS per_capita_income
FROM cybersyn.irs_individual_income_timeseries agi -- Adjusted gross income values
JOIN cybersyn.irs_individual_income_timeseries pop -- Population (number of individuals) values
ON (pop.geo_id = agi.geo_id
AND pop.date = agi.date
AND pop.value IS NOT NULL
AND pop.date = '2020-12-31'
AND pop.variable_name = 'Number of individuals, AGI bin: Total')
JOIN cybersyn.geography_index geo
ON (agi.geo_id = geo.geo_id
AND geo.level = 'CensusZipCodeTabulationArea')
WHERE agi.variable_name = 'Adjusted gross income (AGI), AGI bin: Total'
AND agi.value IS NOT NULL
AND pop.value > 10000
ORDER BY per_capita_income DESC
LIMIT 5;
Find addresses in metros that experienced the top house price growth over 5 years to target for a marketing campaign to drive home equity lines of credit
WITH base_price AS (
SELECT
ts.geo_id,
ts.value
FROM
cybersyn.fhfa_house_price_timeseries ts
WHERE
ts.date = '2018-03-31'
AND ts.variable = 'FHFA_HPI_traditional_purchase-only_quarterly_SA'
),
current_price AS (
SELECT
ts.geo_id,
ts.value
FROM
cybersyn.fhfa_house_price_timeseries ts
WHERE
ts.date = '2023-03-31'
AND ts.variable = 'FHFA_HPI_traditional_purchase-only_quarterly_SA'
),
metros AS (
SELECT
geo.geo_name AS metro_name,
geo.geo_id,
100 * (c.value - b.value) / NULLIFZERO(b.value) AS growth
FROM
base_price b
JOIN current_price c ON c.geo_id = b.geo_id
JOIN cybersyn.geography_index geo ON geo.geo_id = b.geo_id
AND geo.level IN (
'CensusCoreBasedStatisticalArea',
'CensusMetropolitanDivision'
)
ORDER BY
growth DESC
LIMIT
3
)
SELECT
metros.metro_name,
addresses.number,
addresses.street,
addresses.street_type,
addresses.city,
addresses.state,
addresses.zip
FROM
metros
JOIN cybersyn.geography_relationships cities ON metros.geo_id = cities.geo_id
and cities.related_level = 'City'
JOIN cybersyn.geography_relationships zips ON metros.geo_id = zips.geo_id
AND zips.related_level = 'CensusZipCodeTabulationArea'
JOIN cybersyn.us_addresses addresses ON addresses.id_zip = zips.related_geo_id
AND addresses.id_city = cities.related_geo_id
LIMIT
12;

Check mortgage approval rates by financial institution

Find financial institution with >10k applications that had the highest approval rates in 2021
WITH approved_apps AS (
SELECT financial_institution_name, COUNT(*) AS approved_count
FROM cybersyn.home_mortgage_disclosure_attributes
WHERE action_taken IN ('Application approved but not accepted',
'Loan originated',
'Preapproval request approved but not accepted')
AND year = '2021'
GROUP BY financial_institution_name
), all_apps AS (
SELECT financial_institution_name, COUNT(*) AS total_count
FROM cybersyn.home_mortgage_disclosure_attributes
WHERE year = '2021'
GROUP BY financial_institution_name
)
SELECT all_apps.financial_institution_name,
approved_count,
total_count,
approved_count / total_count AS pct_approved
FROM all_apps
JOIN approved_apps ON (all_apps.financial_institution_name = approved_apps.financial_institution_name)
WHERE total_count > 1E4
ORDER BY pct_approved DESC;
Create a map to visualize average home prices across the country
Join the house prices time series data with the geospatial boundaries
SELECT
ts.*,
char.value AS coordinates_geojson
FROM cybersyn.fhfa_house_price_timeseries AS ts
JOIN cybersyn.geography_characteristics AS char
ON (
ts.geo_id = char.geo_id
AND char.relationship_type = 'coordinates_geojson'
)
LIMIT 10;

Releases & Changelog

10/09/23 - Added U.S. Census Regions & Divisions.
Expanded the geography_index table to include U.S. Census Regions and Divisions.
Expanded the geography_hierarchy table to include the relationships between U.S. Census regions and U.S. Census divisions; U.S. Census regions and U.S. states; and U.S. Census divisions and U.S. states.
Census regions include the United States Northeast, Midwest, etc. and census divisions include the United States Middle Atlantic, East North Central, etc.
8/27/23 - Added points of interest data from Overture Maps Foundation
Added the point_of_interest_index table, which includes names and categories for points of interest in the US. Each POI is uniquely identified by a POI_ID.
To tie POIs to addresses, we added a new column, ADDRESS_ID, to the us_addresses table to uniquely identify each individual address. This column allows users to join addresses to POIs using the new point_of_interest_addresses_relationships table with POI_ID and ADDRESS_ID as the join keys for the point_of_interest_index table and us_addresses table, respectively.
8/27/23 - Added 7.2M new addresses, removed 49.8M duplicate addresses, deleted 1.2M addresses with Null STREET value
Added 7.2M new addresses covering points of interest from Overture Maps Foundation to the us_addresses table.
Removed 49.8M addresses that were duplicative aside from minor variability in coordinates. Removed 1.2M rows from rows from the us_addresses table where the STREET value contained a string with value Null.
8/27/23 - Added country-level geospatial boundaries to the geography_characteristics table
Added country-level geospatial boundaries to the geography_characteristics table with data from Overture Maps Foundation.
8/11/23 - Added geospatial boundaries data for territories in the US and Canada
The Census Bureau and Statistics Canada publish geospatial boundaries data for their territories at multiple geographic levels. We added a table geography_characteristics with the boundary coordinates from the most recent releases in both WKT and GeoJSON formats. The table is joinable at different levels using Cybersyn's GEO_ID. This GEO_ID is compatible with all Cybersyn listings that have geographic identifiers. Currently, the geographic levels covered include:
  • State (US and Canada)
  • County (US only)
  • Census Tract (US only)
  • ZIP Code (US only)
  • Dissemination Area and Aggregate Dissemination Area (Canada only)
  • Census Division and Census Subdivision (Canada only)
  • Census Agglomeration and Census Agglomeration Part (Canada only)
  • Census Metropolitan Division and Census Metropolitan Division Part (Canada only)

Errata & Future Improvements

We note known issues and planned future improvements. If you would like to submit a bug report or feature request, email us at [email protected].
  • The addresses for a small fraction of locations are incorrectly parsed. In particular, addresses of non-standard format such street intersections may be parsed incorrectly. The STATE, CITY, ZIP, and coordinates for these addresses are generally correct, but the STREET, NUMBER and UNIT may contain errors in these cases.

Disclaimers

The data in this dataset is sourced here. Links to provider license, terms and disclaimers are provided where appropriate:
Cybersyn is not endorsed by or affiliated with any of these providers.
Last modified 29d ago