US Securities and Exchange Commission (SEC)

Overview

A subset of SEC Filings submitted by corporations, funds, and individuals. The dataset includes raw (plain) text, parsed and unparsed XBRL, and HTML. SEC Filings are available in both the Free and Enterprise products. Coverage varies between tiers, with Enterprise having additional forms and features, such as 10-K/Qs parsed into sections for easier input into LLMs. Cybersyn's company reference spine of ~100K public and private companies and Cybersyn's OpenFIGI and PermID security master are also included.

Key Attributes

*Form N-PORT - limited history because N-PORTs replaced Form N-Qs in 2019 (moved from quarterly to monthly reporting)

As with all Public Domain datasets, Cybersyn aims to release data on Snowflake Marketplace as soon as the underlying source releases new data. We check periodically for changes to the underlying source and, upon detecting a change, propagate the data to Snowflake Marketplace immediately. See our release process for more details.

Description

All Cybersyn products follow the EAV (entity, attributes, value) model with a unified schema. Entities are tangible objects (e.g. geography, company) that Cybersyn provides data on. All timeseries' dates and values that refer to the entity are included in a timeseries table. Descriptors of the timeseries are included in an attributes table. Data is joinable across all Cybersyn products that have a GEO_ID. Refer to Cybersyn Concepts for more details.

SEC data revolves around Company and Report entities. A Central Index Key (CIK) identifies the Company entity and is a unique number that the SEC assigns to each registrant (e.g. company, individual, foreign government) submitting a filing. The Report entity is represented by an ADSH, an accession number assigned to each individual filing. The ADSH can be used to track all records from a single filing/report.

The SEC_CIK_INDEX table contains the distinct Company entities associated with SEC filings and provides a map between COMPANY_ID and SIC, along with geographic and SIC details. Fields include the company name, SIC (Standard Industrial Classification assigned by the SEC to classify the primary industry), SIC category, SIC description, EIN (Employee Identification Number assigned by the IRS), and location.

The SEC_REPORT_INDEX table contains each Report (e.g. 10-K, 10-Q, 8-K) identified by an ADSH that a Company submits. Each row represents a distinct Report submission type and date by Company. These filings are self-reported details on a company’s business, the risks it faces, and the operating and financial results for the fiscal year or quarter, as applicable. Reports can be joined to the SEC_CIK_INDEX table via the CIK identifier.

Attributes (the parsed XBRL line items from financial statements) associated with each Report can be found in the SEC_REPORT_ATTRIBUTES table. A Report’s full text can be found in the SEC_REPORT_TEXT_ATTRIBUTES table, and again, is joinable to the SEC_REPORT_INDEX via CIK and ADSH identifiers. Each Report in the SEC_REPORT_TEXT_ATTRIBUTES table also includes a SEC_DOCUEMNT_ID, which is a Cybersyn-created ID to uniquely identify report exhibits.

The SEC_HOLDING_FILING_INDEX table contains an overview of 13-F fund holding reports by ADSH with associated entities (CIK). Fields include the submission type, filing date, reporting period, filing manager, number of holdings, total holding value, and any additional information. Attributes associated with each report can be found in the SEC_HOLDING_FILING_ATTRIBUTES table. Attributes include security name, market value, number of shares, and OpenFIGI IDs.

SEC_METRICS_TIMESERIES includes parsed quarterly and annual segment revenues from 10-Qs and 10-Ks. Cybersyn translates raw XBRL data into a cleaned version of SEC revenue segments over time. The new table allows you to easily identify revenue trends across the specific segments (brand, category, location etc.) a company chooses to report on. Target, for example, reports revenue by category (Apparel & Accessories, Beauty & Household Essentials etc.) whereas Darden Restaurants reports by brand (Olive Garden, Longhorn Steakhouse etc.). This premium dataset is available in our Enterprise product.

SEC_CORPORATE_REPORT_ITEM_ATTRIBUTES provides a wide format breakdown of 10-Ks and 10-Qs, which are parsed by item into plaintext, HTML, and JSON structure. Each row represents an "Item", or section, within an SEC 10-K or 10-Q. This is helpful for inputting smaller chunks of data into an LLM for data extraction. For example, you want to run GenAI on the Management Discussion & Analysis section of every company that reported this week. Since every company’s filing, titling, section naming, and structure is slightly different and the full-text of these sections is large enough to be inconvenient to put the entire thing into a context window - it is helpful to input smaller chunks of the data. Note - only filings post-2023 are included in this table today.

Each company’s fiscal calendars can be found in the SEC_FISCAL_CALENDARS table. A fiscal calendar is a financial calendar year mapped to the company's quarterly and annual reporting period.

SEC data is also used in the reference spine Cybersyn is building out to support joining companies, subsidiaries, and their brands together in a hierarchy. Click here to learn about the COMPANY_INDEX table which aggregates commonly used Company identifiers (CIKs, EINs, LEIs) into a single COMPANY_ID.

Tables, Products, & Sources

Source: U.S. Securities & Exchange Commission

Notes & Methodology

Covered quarters

In the sec_report_attributes table, the covered_qtrs denotes how many periods the value covers. For example, a year-to-date measure for revenue from Q1-Q3 would be 3. A point-in-time measure that is “as of” a specific date has a 0 for covered_qtrs.

Text formatting

In the sec_report_text_attributes table, the raw text is pulled from the originally-filed reports. The raw text is stripped of HTML formatting and is presented as a single block of text. Note that this text includes both text-based sections of the reports (such as Management Discussions & Analysis) as well as non-formatted text of values from financial statements.

Geography mapping

Cybersyn mapped portions of companies’ addresses (city, state, country) to a unique geo_id that corresponds to that location. This is a feature of all Cybersyn datasets to allow for easy comparisons across datasets that use geographic identifiers. Learn more about Cybersyn's geography tables here.

Examples & Sample Queries

Find performance of a company's business segments over time

View Target's quarterly category mix over time

WITH target_category_segments AS (
    SELECT
        company_name,
        business_segment,
        date,
            value AS segment_value
    FROM cybersyn.sec_metrics_timeseries
    WHERE 
        cik = '0000027419'
        AND measure = 'REVENUE'
        AND frequency = 'QUARTERLY' 
        AND business_segment IN ('HARDLINES','HOME FURNISHINGS AND DECOR','APPAREL AND ACCESSORIES','BEAUTY AND HOUSEHOLD ESSENTIALS','FOOD AND BEVERAGE')
)

, quarterly_performance AS (
    SELECT
        company_name,
        date,
        SUM(segment_value) AS quarterly_total_value
    FROM target_category_segments
    GROUP BY 1,2
)

SELECT *, segment_value / quarterly_total_value AS category_mix
FROM target_category_segments
JOIN quarterly_performance
USING (company_name, date)
ORDER BY date, business_segment;

Find the performance of Darden Restaurants' brands over time

SELECT *
FROM cybersyn.sec_metrics_timeseries
WHERE 
    cik = '0000940944'
    AND frequency = 'QUARTERLY'
    AND business_segment IN ('OLIVE GARDEN','LONGHORN STEAKHOUSE','FINE DINING','ALL OTHER');

Aggregate company identifies and metadata about companies

Search for commonly-used corporate identifiers such as CIK and EIN as well as stock-specific identifiers like ticker symbol and OpenFIGI ID

SELECT
    idx.company_id,
    idx.company_name,
    idx.cik,
    idx.ein,
    idx.lei,
    openfigi.openfigi_share_class_id,
    openfigi.primary_ticker,
    idx.permid_company_id
FROM cybersyn.company_index AS idx
JOIN cybersyn.company_characteristics AS char
    ON (idx.company_id = char.company_id)
JOIN cybersyn.company_security_relationships AS rship
    ON (idx.company_id = rship.company_id)
JOIN cybersyn.openfigi_security_index AS openfigi
    ON ARRAY_CONTAINS(rship.security_id::VARIANT, openfigi.openfigi_share_class_id)
WHERE char.relationship_type = 'sic_description'
  AND char.value = 'Air transportation, scheduled';

Compare financial metrics across companies

Compare revenue numbers from for 2022 for different airline companies.

SELECT i.cik, i.company_name, r.period_start_date, r.period_end_date, r.measure_description, TO_NUMERIC(r.value) AS value
FROM cybersyn.sec_cik_index AS i
JOIN cybersyn.sec_report_attributes AS r ON (r.cik = i.cik)
WHERE i.sic_code_description = 'AIR TRANSPORTATION, SCHEDULED'
  AND r.statement = 'Income Statement'
  AND r.period_end_date = '2022-12-31'
  AND r.covered_qtrs = 4
  AND r.metadata IS NULL
  AND r.measure_description IN ('Total operating revenues', 'Total operating revenue');

View historical trends for companies

Measure Chipotle’s store count growth over time

SELECT i.cik, i.company_name, r.period_end_date, r.measure_description, MAX(TO_NUMBER(r.value)) AS value
FROM cybersyn.sec_cik_index AS i
JOIN cybersyn.sec_report_attributes AS r ON (r.cik = i.cik)
WHERE company_name = 'CHIPOTLE MEXICAN GRILL INC'
  AND r.measure_description = 'Number of restaurants'
GROUP BY i.cik, i.company_name, i.cik, r.period_end_date, r.measure_description;

Pull fiscal calendars for public companies

Pull Walmart’s fiscal calendar does not align with the calendar year. Pull their quarter start and end dates

SELECT company_name, fiscal_year, fiscal_period, period_start_date, period_end_date
FROM cybersyn.sec_fiscal_calendars
WHERE company_name = 'WALMART INC.'
ORDER BY period_end_date;

Track portfolios of fund managers

See Berkshire Hathaway’s most recent public holdings disclosure and join associated tickers under which each holding may be traded globally on all exchanges

WITH latest_filing AS (
    SELECT adsh
    FROM cybersyn.sec_holding_filing_index
    WHERE filing_manager_name = 'Berkshire Hathaway Inc'
    ORDER BY filing_date DESC
    LIMIT 1
)
SELECT att.*,
    securities.global_tickers,
    securities.permid_quote_id
FROM cybersyn.sec_holding_filing_attributes AS att
LEFT JOIN cybersyn.permid_security_index AS securities
    ON att.permid_security_id = securities.permid_security_id
WHERE att.adsh IN (SELECT * FROM latest_filing)
ORDER BY att.market_value DESC;

Disclaimers

The data in this product is sourced from the US Securities and Exchange Commission (SEC).

Cybersyn is not endorsed by or affiliated with any of these providers. Contact support@cybersyn.com for questions.G

Last updated

Copyright © 2024 Cybersyn