City Police Departments
Overview
The New York, Los Angeles, San Francisco, Houston, Chicago, and Seattle Police Departments publish data on local crime incidents.
Topics covered:
- Date of occurrence
- Offense category (e.g. theft, narcotics, battery) and original description
- City and estimated zip code
Key Attributes
Geographic Coverage | Select US cities |
Entity Level | Zip Code, City, Incident ID |
Time Granularity | Daily |
Release Frequency | San Francisco, Chicago, Seattle: Daily |
As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Notes
Cybersyn has normalized the data with the following changes:
- Each jurisdiction currently uses a different offense classification system: Houston uses NIBRS (the new national standard), Chicago uses IUCR, and NYC uses the NY State Penal Code. As a result, different cities have different offense codes for similar crimes. Cybersyn mapped these granular crime codes to broad "offense categories" using Chicago's IUCR system. Most cities are expected to transition to NIBRS in the near future. See the ‘reporting_system’ column for the code system used.
- Jurisdictions that use NIBRS may log crimes at the offense level starting in 2021. For these jurisdictions, an incident may have multiple rows, one for each offense reported. Jurisdictions using older classification systems have only one row per incident, classified by the most serious offense recorded. See the ‘reporting_level’ column for the level at which the incident was reported.
- When unavailable in the source data, zip codes are mapped based on incident lat/long or reported address location.
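Because offense-level reporting can produce several rows for a single incident, a raw row count can overstate incidents for NIBRS jurisdictions. The sketch below illustrates counting distinct incidents instead. It runs on inline sample rows rather than Cybersyn data: the `sample_rows` CTE, its values, and the `incident_id` and `offense_category` column names are made up for illustration. Only `reporting_level` is a column named in the notes above, and the values 'offense' and 'incident' used here are assumed.

```sql
-- Hypothetical sample rows; not real Cybersyn data, tables, or column values.
WITH sample_rows AS (
    SELECT *
    FROM (VALUES
        ('A-1', 'theft',   'offense'),   -- incident A-1, first offense
        ('A-1', 'battery', 'offense'),   -- incident A-1, second offense
        ('B-7', 'theft',   'incident')   -- older system: one row per incident
    ) AS v (incident_id, offense_category, reporting_level)
)
SELECT
    reporting_level,
    COUNT(*)                    AS row_count,      -- counts every row, including repeated incidents
    COUNT(DISTINCT incident_id) AS incident_count  -- counts each incident once
FROM sample_rows
GROUP BY reporting_level
ORDER BY reporting_level;
```

For the offense-level sample rows, `row_count` is 2 while `incident_count` is 1, which is the gap to watch for when comparing jurisdictions that report at different levels.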
All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. geography, company) that Cybersyn provides data on. All timeseries' dates and values that refer to the entity are included in a timeseries table. Descriptors of the timeseries are included in an attributes table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
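In this layout each timeseries row holds one value for one entity, variable, and date, so a reasonable first step is to list which variables exist before filtering on one. The following is a minimal sketch that uses only the table and columns appearing in the sample queries on this page (`cybersyn.urban_crime_timeseries` with `geo_id`, `date`, and `variable_name`); the output column aliases are illustrative.

```sql
-- List every variable in the timeseries table with its coverage.
SELECT
    variable_name,
    COUNT(DISTINCT geo_id) AS geo_count,   -- number of distinct geographies reporting this variable
    MIN(date)              AS first_date,  -- earliest observation
    MAX(date)              AS last_date    -- latest observation
FROM cybersyn.urban_crime_timeseries
GROUP BY variable_name
ORDER BY variable_name;
```

The `variable_name` values returned here can then be copied into a `WHERE` clause like the ones used under Sample Queries.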
Cybersyn Products
Tables above are available in the following Cybersyn data products:
Sample Queries
Historical crime incidents in a location
Timeseries of crime incidents in zip code 60620
SELECT
geo.geo_name,
ts.date,
ts.variable_name,
ts.value
FROM cybersyn.urban_crime_timeseries AS ts
JOIN cybersyn.geography_index AS geo
ON (ts.geo_id = geo.geo_id)
WHERE geo.geo_name = '60620' --zip code of interest
AND ts.variable_name = 'Daily count of incidents, all incidents'
ORDER BY ts.date;
Locations with the highest level of a specific crime
List of zip codes with highest levels of theft in 2020
SELECT
geo_id,
YEAR(date)::STRING AS year,
SUM(value) AS annual_incidents
FROM cybersyn.urban_crime_timeseries
WHERE YEAR(date) = 2020 --YEAR() returns a number, so compare against a numeric literal
AND variable_name = 'Daily count of incidents, theft'
GROUP BY geo_id, year
ORDER BY annual_incidents DESC;
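The query above returns `geo_id` values only. A possible variant, sketched below, joins to `cybersyn.geography_index` (as in the first sample query) to show readable names and keeps only the top ten rows. It assumes `geo_name` holds the zip code for zip-level entities, as the first sample query suggests. It does not filter by entity level, so city-level rows may also appear if the timeseries table contains them.

```sql
-- Top 10 geographies by theft incidents in 2020, labeled with geo_name.
SELECT
    geo.geo_id,
    geo.geo_name,
    SUM(ts.value) AS annual_incidents
FROM cybersyn.urban_crime_timeseries AS ts
JOIN cybersyn.geography_index AS geo
    ON (ts.geo_id = geo.geo_id)
WHERE ts.date >= '2020-01-01'   -- half-open date range covering calendar year 2020
  AND ts.date <  '2021-01-01'
  AND ts.variable_name = 'Daily count of incidents, theft'
GROUP BY geo.geo_id, geo.geo_name
ORDER BY annual_incidents DESC
LIMIT 10;
```

Filtering on a date range rather than on `YEAR(date)` is a stylistic choice here; both forms select the same rows.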
Disclaimers
The data in this product is sourced from the following:
- Los Angeles Police Department (LAPD)
- New York Police Department (NYPD)
- San Francisco Police Department (SFPD)
- Houston Police Department (HPD)
- Seattle Police Department (SPD)
- Chicago Police Department: This site provides applications using data that has been modified for use from its original source, the official website of the City of Chicago. The City of Chicago makes no claims as to the content, accuracy, timeliness, or completeness of any of the data provided at this site. The data provided at this site is subject to change at any time. It is understood that the data provided at this site is being used at one’s own risk.
Cybersyn is not endorsed by or affiliated with any of these providers. Contact snowflake-public-data@snowflake.com for questions.