
Key Concepts & Notes

This section covers the core concepts and data structures behind the Consumer Current product: how merchants are organized and categorized, how spending is estimated across products and geographies, and how demographic insights are incorporated to enhance the data. Understanding these concepts is a prerequisite for using the data effectively in analysis and decision-making.


Spend by Company

Company-level estimates are provided at the most specific subsidiary or business unit available. For instance, transactions at Whole Foods are attributed to Whole Foods Market Inc., rather than to Amazon Inc. If no distinct subsidiary exists for a particular merchant, transactions are attributed to the top-level holding company.
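The attribution rule above can be sketched as a simple fallback. This is a minimal illustration, not the product's actual schema: the `Merchant` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Merchant:
    """Hypothetical merchant record; field names are assumptions."""
    name: str
    holding_company: str
    subsidiary: Optional[str] = None  # most specific business unit, if any


def attributed_company(merchant: Merchant) -> str:
    """Return the most specific entity: the subsidiary when one exists,
    otherwise the top-level holding company."""
    if merchant.subsidiary is not None:
        return merchant.subsidiary
    return merchant.holding_company


whole_foods = Merchant("Whole Foods", "Amazon Inc.", "Whole Foods Market Inc.")
standalone = Merchant("Retailer ABC", "ABC Holdings")  # made-up example

print(attributed_company(whole_foods))
print(attributed_company(standalone))
```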


Spend by Product

Product-level estimates are available when product information can be extracted directly from the transaction description. This is common with online subscriptions such as Netflix, OpenAI, and Peloton, where the product or service can be explicitly identified from the payment details.



Merchant Categorization

Merchant category codes (MCCs) are assigned by card processors to classify merchants based on the types of goods or services they sell. Cybersyn maps these MCCs to North American Industry Classification System (NAICS) codes to create industry-level aggregations that are directly comparable to the U.S. Census Bureau’s Advance Monthly Retail Trade Survey (MARTS) estimates.
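As a rough sketch of the idea, spend can be rolled up from MCCs to NAICS sectors with a lookup table. The crosswalk below is a small illustrative assumption (grocery and restaurant MCCs mapped to 3-digit NAICS sectors), not Cybersyn's actual mapping, and the function name is hypothetical.

```python
from collections import defaultdict

# Illustrative crosswalk only; the real mapping is far larger.
MCC_TO_NAICS = {
    "5411": "445",  # grocery stores -> food and beverage stores
    "5499": "445",  # misc. food stores -> food and beverage stores
    "5812": "722",  # restaurants -> food services and drinking places
    "5814": "722",  # fast food -> food services and drinking places
}


def spend_by_naics(rows):
    """Sum (mcc, amount) rows into NAICS-level totals.

    MCCs missing from the crosswalk are collected under "unmapped"
    so that no spend is silently dropped.
    """
    totals = defaultdict(float)
    for mcc, amount in rows:
        totals[MCC_TO_NAICS.get(mcc, "unmapped")] += amount
    return dict(totals)


rows = [("5411", 120.0), ("5499", 30.0), ("5812", 50.0), ("9999", 5.0)]
totals = spend_by_naics(rows)
print(totals)
```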

Customer Retention

Retention measures help quantify the rate at which consumers continue shopping with a merchant over time. For example, a 20% retention rate at the 12th month indicates that 20% of shoppers returned 12 months after their first purchase. These measures are calculated across key metrics such as customer counts, sales, and transactions.
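The customer-count version of this measure can be sketched as follows. The sketch assumes that "returned 12 months after their first purchase" means a purchase in the calendar month exactly 12 months after the first-purchase month; the product's exact definition may differ. The input format (customer id plus an integer month index) and the function name are hypothetical.

```python
def retention_rate(purchases, offset):
    """Share of customers who purchase again exactly `offset` months
    after the month of their first purchase.

    purchases: iterable of (customer_id, month_index) pairs, where
    month_index is an integer count of months from any fixed origin.
    """
    first_month = {}
    active_months = {}
    for customer, month in purchases:
        first_month[customer] = min(month, first_month.get(customer, month))
        active_months.setdefault(customer, set()).add(month)

    if not first_month:
        return 0.0

    returned = sum(
        1
        for customer, start in first_month.items()
        if start + offset in active_months[customer]
    )
    return returned / len(first_month)


purchases = [
    ("a", 0), ("a", 12),  # returns in month 12
    ("b", 0),
    ("c", 1), ("c", 5),   # returns, but not at the 12-month mark
    ("d", 2),
    ("e", 3),
]
rate = retention_rate(purchases, offset=12)
print(rate)
```

Sales and transaction retention would follow the same cohort logic with amounts or transaction counts in place of customer counts.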


Geographic Granularity

Geographic data can be analyzed from the perspective of either the merchant or the consumer:

  • Merchant: Sales estimates are available down to the individual point-of-interest (POI) level for brick-and-mortar stores.
  • Consumer: Sales estimates are available down to the consumer zip-code level, which is based on the cardholder’s billing address. We are also exploring the integration of additional data sources, such as shipping addresses, to improve the accuracy of consumer location data.

Certain granular combinations, such as a specific MERCHANT x CONSUMER_GEO or MERCHANT x MERCHANT_GEO, may not be exposed if they do not meet a minimum average number of monthly transactions. For example, spend data for Retailer ABC may be available for New York City but not for Akron, OH, because of sample size limitations. This is most likely to happen for regionally concentrated companies.
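A minimal sketch of this kind of threshold check is shown below. The threshold value, the transaction counts, and the function name are all made up for illustration; the actual minimum is not stated here.

```python
from statistics import mean

# Hypothetical threshold; the real minimum is not published in this section.
MIN_AVG_MONTHLY_TXNS = 100


def exposable(monthly_txns, threshold=MIN_AVG_MONTHLY_TXNS):
    """Flag each (merchant, geo) series as exposable when its average
    monthly transaction count meets the threshold."""
    return {
        key: mean(counts) >= threshold
        for key, counts in monthly_txns.items()
    }


panel = {
    ("Retailer ABC", "New York City"): [950, 1020, 990],
    ("Retailer ABC", "Akron, OH"): [12, 8, 15],
}
flags = exposable(panel)
print(flags)
```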

At present, cross-references of CONSUMER_GEO x MERCHANT_GEO (e.g., consumers living in New York but shopping in Los Angeles) are not available, but this is on the roadmap.


Demographics

Demographics currently available in the product include age and income. We are also working to include additional demographics such as gender, ethnicity, political affiliation, and children in household.

We provide additional insights into consumer demographics through data inference:

  • Zip-Code Inference: We probabilistically assign demographics based on the characteristics of the zip code associated with the cardholder’s billing address.
  • Behavioral Inference: Certain purchases can reveal consumer preferences or demographics. For example, shopping at AAFES is restricted to military-affiliated customers, so a purchase there indicates a specific customer demographic.
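One way to picture zip-code inference is as fractional weighting: a cardholder's spend is split across demographic buckets in proportion to the zip code's population shares. This is only a sketch of one possible approach, not a description of the actual model. The bucket names, shares, and function name are assumptions.

```python
def allocate_spend(spend, zip_shares):
    """Split a cardholder's spend across demographic buckets in
    proportion to the bucket shares of their billing zip code."""
    total = sum(zip_shares.values())
    if total <= 0:
        raise ValueError("zip code shares must sum to a positive number")
    return {
        bucket: spend * share / total
        for bucket, share in zip_shares.items()
    }


# Hypothetical income distribution for one zip code.
ZIP_INCOME_SHARES = {
    "under_50k": 0.25,
    "50k_to_100k": 0.5,
    "over_100k": 0.25,
}

allocated = allocate_spend(200.0, ZIP_INCOME_SHARES)
print(allocated)
```

Behavioral signals, where available, could then replace the zip-level prior for an individual cardholder.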

Combining Geographic and Demographic Data

Geographic spend estimates can be combined with demographic variables, allowing for insights into spending patterns by consumer age, income, and other demographic factors. For instance, we can analyze spending at a specific merchant by consumers residing in a particular zip code and further break this down by demographic attributes.


4-5-4 Retail Calendar

The 4-5-4 retail calendar is a standardized system used by many retailers. It divides each 13-week fiscal quarter into three months of four, five, and four weeks. Because every period contains whole weeks, the structure accounts for shifts in weekends and holidays, ensuring more accurate financial comparisons. Typically, a 4-5-4 year consists of 52 weeks, though every 5-6 years there is a 53-week year, as seen in 2012, 2017, and 2023.
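The week-to-month layout can be sketched as below. The function name is hypothetical, and the sketch assumes that the extra week of a 53-week year is appended to the final fiscal month.

```python
from itertools import accumulate

# Four quarters, each split into months of 4, 5, and 4 weeks (52 weeks).
WEEKS_PER_MONTH = [4, 5, 4] * 4


def fiscal_month(week, weeks_in_year=52):
    """Map a fiscal week number (1-based) to a fiscal month (1-12)."""
    if weeks_in_year not in (52, 53):
        raise ValueError("a 4-5-4 year has 52 or 53 weeks")
    if not 1 <= week <= weeks_in_year:
        raise ValueError(f"week must be between 1 and {weeks_in_year}")
    if week == 53:
        return 12  # assumption: the 53rd week extends the last month
    # accumulate() yields the last week of each month: 4, 9, 13, 17, ...
    for month, last_week in enumerate(accumulate(WEEKS_PER_MONTH), start=1):
        if week <= last_week:
            return month


print([fiscal_month(w) for w in (4, 5, 9, 10, 13, 14, 52)])
```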


Access to Raw Data

We do not share raw transaction data because, without proper panelization and projection, raw data holds limited value. Our expertise lies in processing this data to maximize its usefulness for our clients. Additionally, for most firms the benefit of working with raw data does not justify the effort required, so we focus on providing insights that offer a better return on investment. Limiting access to raw data also helps minimize the risk of privacy breaches or security attacks. To maintain flexibility, we offer User-Defined Functions (UDFs) and native applications that provide similar functionality to raw data access while preserving data privacy and security.