US Securities and Exchange Commission (SEC)
Overview
A subset of SEC Filings submitted by corporations, funds, and individuals. The dataset includes raw (plain) text, parsed and unparsed XBRL, and HTML. SEC Filings are available in both the Free and Cybersyn Foundations paid products. Coverage varies between tiers, with Cybersyn Foundations paid having additional forms and features, such as 10-K/Qs parsed into sections for easier input into LLMs. Cybersyn's company reference spine of ~100K public and private companies and Cybersyn's OpenFIGI and PermID security master are also included.
Key Attributes
SEC Filings (Free) vs. Cybersyn Foundations (Paid)
*Form N-PORT - limited history because N-PORTs replaced Form N-Qs in 2019 (moved from quarterly to monthly reporting)
As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.
Description
All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. geography, company) that Cybersyn provides data on. All timeseries' dates and values that refer to the entity are included in a timeseries table. Descriptors of the timeseries are included in an attributes table. Data is joinable across all Cybersyn products that have a GEO_ID
. Refer to Cybersyn Concepts for more details.
SEC data revolves around Company and Report entities. A Central Index Key (CIK) identifies the Company entity and is a unique number that the SEC assigns to each registrant (e.g. company, individual, foreign government) submitting a filing. The Report entity is represented by an ADSH, an accession number assigned to each individual filing. The ADSH can be used to track all records from a single filing/report.
The SEC_CIK_INDEX
table contains the distinct Company entities associated with SEC filings and provides a map between COMPANY_ID
and SIC
, along with geographic and SIC details. Fields include the company name, SIC (Standard Industrial Classification assigned by the SEC to classify the primary industry), SIC category, SIC description, EIN (Employee Identification Number assigned by the IRS), and location.
The SEC_REPORT_INDEX
table contains each Report (e.g. 10-K, 10-Q, 8-K) identified by an ADSH that a Company submits. Each row represents a distinct Report submission type and date by Company. These filings are self-reported details on a company’s business, the risks it faces, and the operating and financial results for the fiscal year or quarter, as applicable. Reports can be joined to the SEC_CIK_INDEX
table via the CIK identifier.
Attributes (the parsed XBRL line items from financial statements) associated with each Report can be found in the SEC_REPORT_ATTRIBUTES
table. A Report’s full text can be found in the SEC_REPORT_TEXT_ATTRIBUTES
table, and again, is joinable to the SEC_REPORT_INDEX
via CIK and ADSH identifiers. Each Report in the SEC_REPORT_TEXT_ATTRIBUTES
table also includes a SEC_DOCUEMNT_ID
, which is a Cybersyn-created ID to uniquely identify report exhibits.
The SEC_HOLDING_FILING_INDEX
table contains an overview of 13-F fund holding reports by ADSH with associated entities (CIK). Fields include the submission type, filing date, reporting period, filing manager, number of holdings, total holding value, and any additional information. Attributes associated with each report can be found in the SEC_HOLDING_FILING_ATTRIBUTES
table. Attributes include security name, market value, number of shares, and OpenFIGI IDs.
SEC_METRICS_TIMESERIES
includes parsed quarterly and annual segment revenues from 10-Qs and 10-Ks. Cybersyn translates raw XBRL data into a cleaned version of SEC revenue segments over time. The new table allows you to easily identify revenue trends across the specific segments (brand, category, location etc.) a company chooses to report on. Target, for example, reports revenue by category (Apparel & Accessories, Beauty & Household Essentials etc.) whereas Darden Restaurants reports by brand (Olive Garden, Longhorn Steakhouse etc.). This premium dataset is available in our Cybersyn Foundations - PAID product.
SEC_CORPORATE_REPORT_ITEM_ATTRIBUTES
provides a wide format breakdown of 10-Ks and 10-Qs, which are parsed by item into plaintext, HTML, and JSON structure. Each row represents an "Item", or section, within an SEC 10-K or 10-Q. This is helpful for inputting smaller chunks of data into an LLM for data extraction. For example, you want to run GenAI on the Management Discussion & Analysis section of every company that reported this week. Since every company’s filing, titling, section naming, and structure is slightly different and the full-text of these sections is large enough to be inconvenient to put the entire thing into a context window - it is helpful to input smaller chunks of the data. Note - only filings post-2023 are included in this table today.
Each company’s fiscal calendars can be found in the SEC_FISCAL_CALENDARS
table. A fiscal calendar is a financial calendar year mapped to the company's quarterly and annual reporting period.
SEC data is also used in the reference spine Cybersyn is building out to support joining companies, subsidiaries, and their brands together in a hierarchy. Click here to learn about the COMPANY_INDEX
table which aggregates commonly used Company identifiers (CIKs, EINs, LEIs) into a single COMPANY_ID
.
Notes & Methodology
Covered quarters
In the sec_report_attributes
table, the covered_qtrs
denotes how many periods the value covers. For example, a year-to-date measure for revenue from Q1-Q3 would be 3. A point-in-time measure that is “as of” a specific date has a 0 for covered_qtrs
.
Text formatting
In the sec_report_text_attributes
table, the raw text is pulled from the originally-filed reports. The raw text is stripped of HTML formatting and is presented as a single block of text. Note that this text includes both text-based sections of the reports (such as Management Discussions & Analysis) as well as non-formatted text of values from financial statements.
Geography mapping
Cybersyn mapped portions of companies’ addresses (city, state, country) to a unique geo_id
that corresponds to that location. This is a feature of all Cybersyn datasets to allow for easy comparisons across datasets that use geographic identifiers. Learn more about Cybersyn's geography tables here.
Sample Queries
Find performance of a company's business segments over time
View Target's quarterly category mix over time
WITH target_category_segments AS (
SELECT
company_name,
business_segment,
date,
value AS segment_value
FROM cybersyn.sec_metrics_timeseries
WHERE
cik = '0000027419'
AND measure = 'REVENUE'
AND frequency = 'QUARTERLY'
AND business_segment IN ('HARDLINES','HOME FURNISHINGS AND DECOR','APPAREL AND ACCESSORIES','BEAUTY AND HOUSEHOLD ESSENTIALS','FOOD AND BEVERAGE')
)
, quarterly_performance AS (
SELECT
company_name,
date,
SUM(segment_value) AS quarterly_total_value
FROM target_category_segments
GROUP BY 1,2
)
SELECT *, segment_value / quarterly_total_value AS category_mix
FROM target_category_segments
JOIN quarterly_performance
USING (company_name, date)
ORDER BY date, business_segment;
Find the performance of Darden Restaurants' brands over time
SELECT *
FROM cybersyn.sec_metrics_timeseries
WHERE
cik = '0000940944'
AND frequency = 'QUARTERLY'
AND business_segment IN ('OLIVE GARDEN','LONGHORN STEAKHOUSE','FINE DINING','ALL OTHER');
Aggregate company identifies and metadata about companies
Search for commonly-used corporate identifiers such as CIK and EIN as well as stock-specific identifiers like ticker symbol and OpenFIGI ID
SELECT
idx.company_id,
idx.company_name,
idx.cik,
idx.ein,
idx.lei,
openfigi.openfigi_share_class_id,
openfigi.primary_ticker,
idx.permid_company_id
FROM cybersyn.company_index AS idx
JOIN cybersyn.company_characteristics AS char
ON (idx.company_id = char.company_id)
JOIN cybersyn.company_security_relationships AS rship
ON (idx.company_id = rship.company_id)
JOIN cybersyn.openfigi_security_index AS openfigi
ON ARRAY_CONTAINS(rship.security_id::VARIANT, openfigi.openfigi_share_class_id)
WHERE char.relationship_type = 'sic_description'
AND char.value = 'Air transportation, scheduled';
Compare financial metrics across companies
Compare revenue numbers from for 2022 for different airline companies.
SELECT i.cik, i.company_name, r.period_start_date, r.period_end_date, r.measure_description, TO_NUMERIC(r.value) AS value
FROM cybersyn.sec_cik_index AS i
JOIN cybersyn.sec_report_attributes AS r ON (r.cik = i.cik)
WHERE i.sic_code_description = 'AIR TRANSPORTATION, SCHEDULED'
AND r.statement = 'Income Statement'
AND r.period_end_date = '2022-12-31'
AND r.covered_qtrs = 4
AND r.metadata IS NULL
AND r.measure_description IN ('Total operating revenues', 'Total operating revenue');
View historical trends for companies
Measure Chipotle’s store count growth over time
SELECT i.cik, i.company_name, r.period_end_date, r.measure_description, MAX(TO_NUMBER(r.value)) AS value
FROM cybersyn.sec_cik_index AS i
JOIN cybersyn.sec_report_attributes AS r ON (r.cik = i.cik)
WHERE company_name = 'CHIPOTLE MEXICAN GRILL INC'
AND r.measure_description = 'Number of restaurants'
GROUP BY i.cik, i.company_name, i.cik, r.period_end_date, r.measure_description;
Pull fiscal calendars for public companies
Pull Walmart’s fiscal calendar does not align with the calendar year. Pull their quarter start and end dates
SELECT company_name, fiscal_year, fiscal_period, period_start_date, period_end_date
FROM cybersyn.sec_fiscal_calendars
WHERE company_name = 'WALMART INC.'
ORDER BY period_end_date;
Track portfolios of fund managers
See Berkshire Hathaway’s most recent public holdings disclosure and join associated tickers under which each holding may be traded globally on all exchanges
WITH latest_filing AS (
SELECT adsh
FROM cybersyn.sec_holding_filing_index
WHERE filing_manager_name = 'Berkshire Hathaway Inc'
ORDER BY filing_date DESC
LIMIT 1
)
SELECT att.*,
securities.global_tickers,
securities.permid_quote_id
FROM cybersyn.sec_holding_filing_attributes AS att
LEFT JOIN cybersyn.permid_security_index AS securities
ON att.permid_security_id = securities.permid_security_id
WHERE att.adsh IN (SELECT * FROM latest_filing)
ORDER BY att.market_value DESC;
Disclaimers
The data in this product is sourced from the US Securities and Exchange Commission (SEC).
Cybersyn is not endorsed by or affiliated with any of these providers. Contact snowflake-public-data@snowflake.com for questions.G