Crime Statistics

Crime counts & classifications for major cities at zip code level

Overview

Police department crime data from New York, Los Angeles, San Francisco, Houston, Chicago, and Seattle. This dataset has been reformatted to cover date of occurrence, offense classification/description, and estimated zip code location for crimes.

Key Attributes

Geographic Coverage: Select Cities in the US
Entity Level: Zip Code
Time Granularity: Daily
Update Frequency: Depending on source; see table below

Description

All Cybersyn products follow the EAV (entity, attribute, value) model with a unified schema. Entities are tangible objects (e.g. geography, company). Entities may have characteristics (i.e. descriptors of the entity) in an index table and values (i.e. statistics, measures) in a timeseries table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.
Cybersyn is expanding this dataset to include other cities. If there are additional cities you would like prioritized, contact us at [email protected].
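
As a rough illustration of the EAV layout described above, the sketch below lists the measures recorded for one zip code by joining the timeseries table to the geography index. It assumes the same `cybersyn` schema, tables, and column names that appear in the example queries on this page. The zip code is only a placeholder, and the variables returned may differ by city.

```sql
-- Sketch only: list the distinct measures (the "attribute" part of the EAV
-- model) available for a single zip code.
-- Assumes the table and column names used in this page's example queries.
SELECT DISTINCT
    ts.variable_name
FROM cybersyn.urban_crime_timeseries AS ts
JOIN cybersyn.geography_index AS geo
    ON (ts.geo_id = geo.geo_id)
WHERE geo.geo_name = '60620' -- placeholder zip code
ORDER BY ts.variable_name;
```

Any `variable_name` returned this way can then be used as a filter in a timeseries query.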

Data Dictionary

Data Sources & Release Frequency

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Table Names / Source / Source Schedule

geography_index, geography_relationships
  • Data Commons is an aggregator of government data sources. Release calendars vary by underlying source.
  • The US Census Bureau publishes datasets about the US population and its economy. Release schedules vary by dataset.

urban_crime_attributes, urban_crime_incident_log, urban_crime_timeseries
  • NYC reports quarterly, in the last month of the quarter
  • LA reports weekly on Wednesdays
  • SF reports daily
  • Houston reports monthly, ~1-2 weeks before month end
  • Chicago reports daily
  • Seattle reports daily

Notes & Methodology

Crime Normalization

Cybersyn has normalized the data with the following changes:
  • Each jurisdiction currently uses a different offense classification system: Houston uses NIBRS (the new national standard), Chicago uses IUCR, and NYC uses the NY State Penal Code. As a result, different cities have different offense codes for similar crimes. Cybersyn mapped these granular crime codes to broad "offense categories" using Chicago's IUCR system. Most cities are expected to transition to NIBRS in the near future. See the ‘reporting_system’ column for the code system used.
  • Jurisdictions that use NIBRS may log crimes at the offense level starting in 2021. For these jurisdictions, an incident may have multiple rows, one for each offense reported. Jurisdictions using older classification systems will only have one row per incident, classified by the most serious offense recorded. See the ‘reporting_level’ column for the level at which the incident was reported.
  • When unavailable in the source data, zip codes are mapped based on incident lat/long or reported address location.
  • Chicago, San Francisco, and Seattle data is updated daily; Los Angeles data is updated weekly; Houston data is updated monthly; NYC data is updated quarterly.
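
Because the reporting system and reporting level affect how rows should be counted, it can help to check which combinations are present before aggregating. The query below is a minimal sketch of that check. It assumes that the ‘reporting_system’ and ‘reporting_level’ columns mentioned above live in the urban_crime_incident_log table under the `cybersyn` schema, which this page does not confirm, so adjust the table name if they are stored elsewhere.

```sql
-- Sketch only: count rows per reporting system and reporting level.
-- ASSUMPTION: reporting_system and reporting_level are columns of
-- cybersyn.urban_crime_incident_log; verify against the data dictionary.
SELECT
    reporting_system,
    reporting_level,
    COUNT(*) AS row_count
FROM cybersyn.urban_crime_incident_log
GROUP BY reporting_system, reporting_level
ORDER BY row_count DESC;
```

Where a jurisdiction reports at the offense level, one incident may account for several of the rows counted here, so row counts are not directly comparable with jurisdictions that report one row per incident.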

Example Use Cases & Queries

Use Case: Historical crime incidents in a location
Time series of crime incidents in a specific zip code
SELECT
    geo.geo_name,
    ts.date,
    ts.variable_name,
    ts.value
FROM cybersyn.urban_crime_timeseries AS ts
JOIN cybersyn.geography_index AS geo
    ON (ts.geo_id = geo.geo_id)
WHERE geo.geo_name = '60620' -- zip code of interest
    AND ts.variable_name = 'Daily count of incidents, all incidents'
ORDER BY ts.date;
Use Case: Locations with the highest level of crime
List of zip codes with highest levels of specific crimes (e.g., theft) in 2020
SELECT
    geo_id,
    YEAR(date)::STRING AS year,
    SUM(value) AS annual_incidents
FROM cybersyn.urban_crime_timeseries
WHERE YEAR(date) = 2020 -- YEAR() returns a number, so compare to a numeric literal
    AND variable_name = 'Daily count of incidents, theft'
GROUP BY geo_id, year
ORDER BY annual_incidents DESC;

Releases & Changelog

There are no updates at this time.

Disclaimer

The data in this dataset is sourced here. Links to provider license, terms and disclaimers are provided where appropriate.
Data Commons: License
Cybersyn is not endorsed by or affiliated with any of these providers. Contact [email protected] for questions.