
US Points of Interest & Addresses

Over 150 million US points of interest, addresses, and geographic administrative areas

Overview

This product serves as a master point of interest (POI), address, and geographic reference dataset. The points of interest data contains the name, location, and category of 11M points of interest ranging from restaurants and commercial brands to hospitals and parks. The address data includes 145M US residential and commercial addresses covering the United States and Puerto Rico. The geographic data contains Cybersyn’s standardized geographic entities (e.g. cities, counties), relationships between these geographies (e.g. cities contained within counties), and the characteristics of these geographies (e.g. geospatial boundaries, coordinates, abbreviations).
Example topics covered:
  • Points of interest
  • Business locations
  • Street names
  • House numbers
  • Postal codes
  • Longitude and latitude coordinates
The address data is sourced from OpenAddresses, the National Address Database (NAD) and Overture Maps Foundation. The geography reference data is sourced from the US Census and the Overture Maps Foundation.

Key attributes

Geographic Coverage: United States
Entities Covered: Geographic
Update Frequency: Depends on source; see Data Sources & Release Frequency below

Description

All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g. geography, company). Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics, measures) in a timeseries table. Data is joinable across all Cybersyn products that have a GEO_ID at the zip code, city, and state levels. Refer to Cybersyn Concepts for more details.
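As a rough illustration of how the index and relationships tables fit together, the sketch below looks up the state that contains a given ZIP Code Tabulation Area. It reuses only the table names, column names, and level values that appear in the sample queries later on this page; the ZIP code itself is an arbitrary example.

```sql
-- Sketch: find the state related to one ZIP Code Tabulation Area.
-- Table, column, and level names are taken from the sample queries on this page.
SELECT
    geo.geo_id,
    geo.geo_name AS zip,
    states.related_geo_name AS state
FROM cybersyn.geography_index AS geo
JOIN cybersyn.geography_relationships AS states
    ON (geo.geo_id = states.geo_id AND states.related_level = 'State')
WHERE geo.level = 'CensusZipCodeTabulationArea'
    AND geo.geo_name = '02114';  -- arbitrary example ZIP code
```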
Each address record includes latitude and longitude coordinates for geolocation. These coordinates can be combined with the geospatial boundaries included in the geography_characteristics table. Geospatial boundaries are provided as GeoJSON and WKT polygons and are represented as coordinates. These geospatial boundaries are also referred to as “shapefiles,” “geographic boundaries,” “bounding coordinates,” and “geographic area coordinates.”
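One way to combine address coordinates with a boundary is a point-in-polygon test. The following is a minimal sketch, assuming the GeoJSON boundary is stored under relationship_type = 'coordinates_geojson' in the value column, as in the boundary sample query later on this page. The ZIP code and the state filter are illustrative choices; the state filter is only there to limit how many addresses are tested.

```sql
-- Sketch: addresses whose coordinates fall inside one ZIP Code Tabulation Area boundary.
WITH boundary AS (
    SELECT
        geo.geo_id,
        geo.geo_name,
        TRY_TO_GEOGRAPHY(chars.value) AS shape  -- NULL if the value cannot be parsed
    FROM cybersyn.geography_index AS geo
    JOIN cybersyn.geography_characteristics AS chars
        ON (geo.geo_id = chars.geo_id AND chars.relationship_type = 'coordinates_geojson')
    WHERE geo.level = 'CensusZipCodeTabulationArea'
        AND geo.geo_name = '10003'  -- illustrative ZIP code
)
SELECT addr.address_id, addr.number, addr.street, addr.city, addr.state, addr.zip
FROM cybersyn.us_addresses AS addr
JOIN boundary
    ON ST_CONTAINS(boundary.shape, ST_MAKEPOINT(addr.longitude, addr.latitude))
WHERE addr.state = 'NY'  -- assumption: narrows the scan before the geospatial test
LIMIT 10;
```

Because a tabulation area boundary is an approximation of a postal ZIP code, the ZIP column on a matched address may not always equal the boundary's name.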
Each POI includes the name, location, category or type of place or business, and a POI_ID unique identifier. POIs can be mapped back to addresses using the relationships table. Refer back to Cybersyn Concepts mentioned above for more details. Overture Maps Foundation provides locations of points of interest. It is an open data project steered by Amazon, Meta, Microsoft, and TomTom that aggregates map data from multiple sources.

Data Dictionary

Data Sources & Release Frequency

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Sources and source schedules by table:
US_ADDRESSES
  • National Address Database (NAD): updated roughly once a quarter; exact timing varies.
  • OpenAddresses: refreshes weekly on Sunday at ~3:30pm ET.
  • Overture Maps Foundation: intends to release open map data on a regular cadence, though the dates of future releases have not been established yet.
POINT_OF_INTEREST_INDEX, POINT_OF_INTEREST_ADDRESSES_RELATIONSHIPS
  • OpenAddresses: refreshes weekly on Sunday at ~3:30pm ET.
  • Overture Maps Foundation: intends to release open map data on a regular cadence, though the dates of future releases have not been established yet.
GEOGRAPHY_INDEX, GEOGRAPHY_RELATIONSHIPS, GEOGRAPHY_CHARACTERISTICS
  • Data Commons: an aggregator of government data sources; release calendars vary by underlying source.
  • US Census Bureau: publishes datasets about the US population and its economy; release schedules vary by dataset.

Notes & Methodology

Address normalization

In the us_addresses table, Cybersyn normalizes the street names, city, state, and zip codes for each address line in the dataset using our geography_index to create consistency across city names (e.g., “Saint Paul, MN” vs. “St. Paul, MN”) and to verify accuracy. Street abbreviations are also standardized in the data (e.g., “Rd” -> “Road”).
Zip codes are determined using the address coordinates in combination with geospatial data from the US Census Bureau and are validated using data from the US Postal Service (USPS).
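To see what the standardized street types look like in practice, one option is to count addresses by STREET_TYPE within a small area. This is only an exploratory sketch; the ZIP code is an arbitrary example, and the column names are those used in the sample queries on this page.

```sql
-- Sketch: inspect the distribution of street types within one ZIP code.
SELECT
    addr.street_type,
    COUNT(*) AS address_count
FROM cybersyn.us_addresses AS addr
WHERE addr.zip = '02114'  -- arbitrary example ZIP code
GROUP BY addr.street_type
ORDER BY address_count DESC;
```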

Point of interest to address mapping

Note that more than one point of interest can map to a single address. For example, a fast food restaurant might share a location with a gas station, or numerous doctors might each have their own practice at a single address.
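A quick way to see this many-to-one mapping is to count POIs per address in the relationships table. The sketch below assumes only the POI_ID and ADDRESS_ID join keys described on this page.

```sql
-- Sketch: list addresses that have more than one point of interest mapped to them.
SELECT
    map.address_id,
    COUNT(DISTINCT map.poi_id) AS poi_count
FROM cybersyn.point_of_interest_addresses_relationships AS map
GROUP BY map.address_id
HAVING COUNT(DISTINCT map.poi_id) > 1
ORDER BY poi_count DESC
LIMIT 10;
```

When joining POIs to addresses, keep this in mind: a row count per address after the join can exceed one without indicating duplicate data.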

Examples & Sample Queries

Find the nearest competitor to a given merchant
Find the closest Lowe’s to any given Home Depot location
-- Pair every POI with its address and coordinates.
WITH joined_data AS (
    SELECT poi.poi_id, poi.poi_name, addr.longitude, addr.latitude,
           addr.number, addr.street_directional_prefix, addr.street,
           addr.street_type, addr.street_directional_suffix,
           addr.unit, addr.city, addr.state, addr.zip
    FROM cybersyn.point_of_interest_index AS poi
    JOIN cybersyn.point_of_interest_addresses_relationships AS map
        ON (poi.poi_id = map.poi_id)
    JOIN cybersyn.us_addresses AS addr
        ON (map.address_id = addr.address_id)
)
SELECT *,
    -- ST_DISTANCE returns meters for GEOGRAPHY points; 1 mile = 1609.344 meters.
    ST_DISTANCE(
        ST_MAKEPOINT(home_depot.longitude, home_depot.latitude),
        ST_MAKEPOINT(lowes.longitude, lowes.latitude)
    ) / 1609.344 AS distance_miles
FROM joined_data AS home_depot
-- Compare every Home Depot with every Lowe's, then keep the closest Lowe's per Home Depot.
CROSS JOIN joined_data AS lowes
WHERE home_depot.poi_name = 'The Home Depot'
    AND lowes.poi_name = 'Lowe''s Home Improvement'
QUALIFY ROW_NUMBER() OVER (PARTITION BY home_depot.poi_id ORDER BY distance_miles NULLS LAST) = 1;
Query all POIs of a specific type (e.g., coffee shop) within a given ZIP code
Generate a list of all coffee shops in a given ZIP code along with their addresses
SELECT
    poi.poi_name,
    poi.category_main,
    poi.category_alternate,
    addr.number,
    addr.street_directional_prefix,
    addr.street,
    addr.street_type,
    addr.street_directional_suffix,
    addr.unit,
    addr.city,
    addr.state,
    addr.zip
FROM cybersyn.point_of_interest_index AS poi
JOIN cybersyn.point_of_interest_addresses_relationships AS map
    ON (poi.poi_id = map.poi_id)
JOIN cybersyn.us_addresses AS addr
    ON (map.address_id = addr.address_id)
WHERE addr.zip = '10003'
    AND poi.category_main = 'Coffee Shop';
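The filter above depends on knowing the exact category label. If you are unsure which CATEGORY_MAIN values exist, a sketch like the following lists the categories present in a ZIP code along with their POI counts. It uses the same tables and join keys as the query above; the ZIP code is just an example.

```sql
-- Sketch: discover which main categories appear in a ZIP code before filtering on one.
SELECT
    poi.category_main,
    COUNT(DISTINCT poi.poi_id) AS poi_count
FROM cybersyn.point_of_interest_index AS poi
JOIN cybersyn.point_of_interest_addresses_relationships AS map
    ON (poi.poi_id = map.poi_id)
JOIN cybersyn.us_addresses AS addr
    ON (map.address_id = addr.address_id)
WHERE addr.zip = '10003'
GROUP BY poi.category_main
ORDER BY poi_count DESC;
```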
Find addresses within a zip code to send direct mail campaign to
Query US addresses by zip code to find relevant addresses in your target area
SELECT NUMBER, STREET, STREET_TYPE, CITY, STATE, ZIP
FROM CYBERSYN.US_ADDRESSES
WHERE ZIP = '02114'
LIMIT 5;
Reverse geocoding
Query addresses near longitude and latitude coordinates to find nearby addresses
SELECT LONGITUDE, LATITUDE, NUMBER, STREET, STREET_TYPE, CITY, STATE, ZIP
FROM CYBERSYN.US_ADDRESSES
WHERE LONGITUDE BETWEEN -74.5 AND -74
AND LATITUDE BETWEEN 40.0 AND 40.5
LIMIT 5;
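A bounding box returns addresses in an area but does not rank them. To get the addresses closest to a specific point, one possible approach is to keep a small bounding box as a cheap prefilter and then order by distance. The target coordinates and box size below are arbitrary assumptions for illustration.

```sql
-- Sketch: the five addresses nearest to an example point (longitude -74.25, latitude 40.25).
SELECT
    addr.number, addr.street, addr.street_type, addr.city, addr.state, addr.zip,
    -- ST_DISTANCE on GEOGRAPHY points returns the distance in meters.
    ST_DISTANCE(
        ST_MAKEPOINT(addr.longitude, addr.latitude),
        ST_MAKEPOINT(-74.25, 40.25)
    ) AS distance_meters
FROM cybersyn.us_addresses AS addr
-- Prefilter to a small box around the point so fewer distances are computed.
WHERE addr.longitude BETWEEN -74.26 AND -74.24
    AND addr.latitude BETWEEN 40.24 AND 40.26
ORDER BY distance_meters
LIMIT 5;
```

If the box contains no addresses, widen it; the box only bounds the search and does not affect the ranking of rows inside it.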
Use geographic boundaries to filter addresses
Query addresses in the largest zip code within a US state (e.g., Florida) by total tabulation area
-- Compute the area of every ZIP Code Tabulation Area from its GeoJSON boundary.
WITH zip_areas AS (
    SELECT
        geo.geo_id,
        geo.geo_name AS zip,
        states.related_geo_name AS state,
        countries.related_geo_name AS country,
        ST_AREA(TRY_TO_GEOGRAPHY(chars.value)) AS area
    FROM cybersyn.geography_index AS geo
    JOIN cybersyn.geography_relationships AS states
        ON (geo.geo_id = states.geo_id AND states.related_level = 'State')
    JOIN cybersyn.geography_relationships AS countries
        ON (geo.geo_id = countries.geo_id AND countries.related_level = 'Country')
    JOIN cybersyn.geography_characteristics AS chars
        ON (geo.geo_id = chars.geo_id AND chars.relationship_type = 'coordinates_geojson')
    WHERE geo.level = 'CensusZipCodeTabulationArea'
),
-- Rank the areas within each state, largest first.
zip_area_ranks AS (
    SELECT
        *,
        ROW_NUMBER() OVER (PARTITION BY country, state ORDER BY area DESC, geo_id) AS zip_area_rank
    FROM zip_areas
)
SELECT addr.number, addr.street, addr.street_type, addr.city, addr.state, addr.zip, areas.country
FROM cybersyn.us_addresses AS addr
JOIN zip_area_ranks AS areas
    ON (addr.id_zip = areas.geo_id)
WHERE addr.state = 'FL' AND areas.country = 'United States' AND areas.zip_area_rank = 1
LIMIT 10;

Errata & Future Improvements

We note known issues and planned future improvements. If you would like to submit a bug report or feature request, email us at [email protected].
  • The addresses for a small fraction of locations are incorrectly parsed. In particular, addresses in a non-standard format, such as street intersections, may be parsed incorrectly. The STATE, CITY, ZIP, and coordinates for these addresses are generally correct, but the STREET, NUMBER, and UNIT may contain errors in these cases.

Releases & Changelog

9/15/23 - Added FIPS 10-4 country codes and state abbreviations
Expanded the geography_characteristics table to include mappings of FIPS 10-4 country codes and U.S. state abbreviations to country and state-level GEO_IDs, respectively.
8/27/23 - Added points of interest data from Overture Maps Foundation
Added the point_of_interest_index table, which includes names and categories for points of interest in the US. Each POI is uniquely identified by a POI_ID.
To tie POIs to addresses, we added a new column, ADDRESS_ID, to the us_addresses table to uniquely identify each individual address. This column allows users to join addresses to POIs using the new point_of_interest_addresses_relationships table with POI_ID and ADDRESS_ID as the join keys for the point_of_interest_index table and us_addresses table, respectively.
8/27/23 - Added 7.2M new addresses, removed 49.8M duplicate addresses, deleted 1.2M addresses with Null STREET value
Added 7.2M new addresses covering points of interest from Overture Maps Foundation to the us_addresses table.
Removed 49.8M addresses that were duplicates aside from minor variability in coordinates. Removed 1.2M rows from the us_addresses table where the STREET value contained the string Null.
8/27/23 - Added country-level geospatial boundaries to the geography_characteristics table
Added country-level geospatial boundaries to the geography_characteristics table with data from Overture Maps Foundation.
8/11/23 - Added geospatial boundaries data for territories in the US and Canada
The Census Bureau and Statistics Canada publish geospatial boundaries data for their territories at multiple geographic levels. We added a table geography_characteristics with the boundary coordinates from the most recent releases in both WKT and GeoJSON formats. The table is joinable at different levels using Cybersyn's GEO_ID. This GEO_ID is compatible with all Cybersyn listings that have geographic identifiers. Currently, the geographic levels covered include:
  • State (US and Canada)
  • County (US only)
  • Census Tract (US only)
  • ZIP Code (US only)
  • Dissemination Area and Aggregate Dissemination Area (Canada only)
  • Census Division and Census Subdivision (Canada only)
  • Census Agglomeration and Census Agglomeration Part (Canada only)
  • Census Metropolitan Division and Census Metropolitan Division Part (Canada only)
5/19/23 - Added source data from National Address Database (NAD)
Added the National Address Database (NAD) as a source to increase our US address coverage:
  • Increased the coverage from 140 million addresses to more than 188 million.
  • There is now at least one address in more than 85% of zip codes, up from 74% previously.
  • Increased the portion of cities that are mapped to distinct IDs joinable to our other datasets from 24% to over 77%.

Disclaimers

The data in this dataset is sourced here. Links to provider license, terms, and disclaimers are provided where appropriate.
Cybersyn is not endorsed by or affiliated with any of these providers. Contact [email protected] for questions.