One of MaRS Data Catalyst’s major initiatives is the development of a “data layer” for Ontario’s innovation economy. For some time now, MaRS has been aware that a data gap exists around startups in the province. We had always struggled to answer questions like the following:
- How many startups are there in Ontario?
- Are they clustering by industry and by geography? If so, how?
- How are Ontario startups developing over time? What’s their success rate? Does it differ by geography? By industry?
- What percentage of startups is high-growth? 
- How do our startups compare to those in other jurisdictions?
The data we did find to answer these questions was weak—patchy, out of date, or not specific enough to the kinds of startups MaRS works with as a member of the Ontario Network of Excellence (ONE).
So when the opportunity to address this gap arose, we jumped on it. To make sure, however, that we weren’t at risk of duplicating existing data sets or developing something that couldn’t link to other data programs, we surveyed the data landscape more systematically.
Data about business
Data that provides company-level details about startups—such as profile, funding and history—can be categorized along these lines.
Commercial company directory data:
You may already be familiar with company data from providers like Dun & Bradstreet/Hoover’s, OneSource or Scott’s Business Directories. While very useful for understanding more mature businesses, these sources do not always capture very young firms like the ones Data Catalyst is targeting. The data is largely self-reported, which may compromise its quality, and there is a fee to access it.
Startup or tech company directories:
These are directories of startups typically maintained by support organizations, media organizations or members of the startup community. Well-known examples include StartupNorth’s Index and CrunchBase. AngelList’s API can also be included as a data source within this category. While free and often rich resources, they can also be limited. Their company listings are infrequently refreshed or updated and tend to skew heavily to startups in the Information and Communications Technologies (ICT) sector. Moreover, the definition of a “startup” may be very loose.
Government funding data:
A number of federal and provincial government departments and programs fund startups. Current examples include the National Research Council’s Industrial Research Assistance Program, the Federal Economic Development Agency for Southern Ontario (FedDev Ontario) and the Ministry of Economic Development and Innovation (MEDI). Unfortunately, almost all of these programs a) release their data only through press releases or websites, b) release incomplete information (for example, without a company description or industry classification), or c) release only aggregate data. This means data must be scraped, compiled and patched together before it can be integrated with other data sources.
Funding data—venture capital:
A number of data providers, such as Thomson Reuters, Dow Jones and CB Insights, track venture capital investing, all for a fee. While useful sources for understanding startups, they obviously capture data only about companies that have received venture capital funding, and only those for which the investment has been disclosed publicly. Angel investment and the companies that receive it are also not always captured by these databases.
Small business statistics
A number of sources provide statistics or aggregate data about small business activity. These sources can provide higher-level or trend-focused insight into startups and young companies.
Business counts data:
Most official statistical agencies provide counts or numbers of businesses by location, by employment and/or by industry.
For Canada, the core product is Statistics Canada’s Canadian Business Patterns, which uses government administrative data like tax records to derive counts. While this data is useful for providing a view of business activity, it does not capture firm age as a variable, so new business creation over time can only be inferred. There may also be some lag time between when a startup forms and when it is recognized as a business entity by Statistics Canada. The industry classification used (NAICS) also does not align well with the kinds of sectors and clusters we see in Ontario’s R&D-intensive or knowledge-based startups.
Business dynamics data:
Business dynamics data tracks business “births,” “deaths” and survival rates, typically at a country level. Because it is age-based, this kind of data also allows you to look at trends around high-growth and “gazelle” firms. Unfortunately, the business dynamics data sets available in Canada are out of date, with new data only in the early stages of development. More recent data is available from the US (via the Census Bureau) and the UK (via Nesta).
Indicators and benchmarks
Indicator and benchmark sources provide comparative data about startups. If Ontario startup support organizations like ONE or startups themselves want to know where they stand collectively or individually against their peers elsewhere in the world, this kind of data can help.
Country-level indicators or benchmarks:
The OECD produces a set of Entrepreneurship Indicators on a regular basis, with the most recent released in June 2012. These indicators aggregate and, to a certain extent, normalize data from OECD-member countries that is related to entrepreneurship, business startup and exit rates, high-growth and gazelle firm rates, and other special topics. Unfortunately, the data is not free, it does not always derive from the same time period, and it may not be collected in the same way across various jurisdictions.
Global Entrepreneurship Monitor (GEM) is a multi-year, multi-country study of entrepreneurship. Unlike other studies, it explores the entrepreneurial activity, attitudes and aspirations of individuals. Canada had not participated in the study since 2003 but rejoined in 2011. Because this study does not measure startups per se and recent Canadian data is lacking, GEM’s value is limited as a benchmark data source.
Other benchmark data sources:
The Kauffman Firm Survey is the world’s largest longitudinal study of new businesses. The survey is a panel study of about 4,900 US-based businesses that were started in 2004. With over six years’ worth of data now collected, the survey offers insights into strategy, operations, employment and founders themselves over the early years of a startup. Because the study contains high-tech sample strata, its data is particularly relevant as a potential benchmark.
SME Benchmarking Tool is a free Industry Canada resource that offers financial performance data for planning and benchmarking purposes. It allows Canadian startups to see industry-specific income statement and balance sheet data based on industry averages. The tool does not, however, allow users to filter results by firm age, or to access the underlying complete data set for reuse or additional analysis. The age of the data itself (2008) is also a concern.
Startup Compass was launched in 2011 as a benchmarking and normative data set for Internet startups. It offers key performance indicators on topics ranging from teams to customer acquisition to time spent in various development stages, based on performance data supplied by startups. Startup Compass is interesting in that it is both descriptive and prescriptive; it not only provides benchmarks, it also attempts to identify red flags that signal developmental challenges or risk of failure. Startup Compass offers value to individual startups in the online space, but the complete underlying data is not available for further reuse or analysis, and the tool is limited, of course, by its single-industry focus.
Here’s a quick look at all of the data sets we identified, in one easy chart:
Did we miss anything? Are there other sources we should consider?
If you are interested in working with us to address this knowledge gap, send us an email.
1. Data Catalyst’s working definition of a “startup” starts with Steve Blank’s own definition: “an organization created to search for a repeatable and scalable business model.” Because we are interested in innovation, we are limiting our scope further to startups in knowledge-based and/or R&D intensive industries. These sectors encompass life sciences, clean technology, advanced manufacturing, and information and communications technology. Startups are also typically young companies; Data Catalyst is focused on generating insights into Ontario companies five years old or less.
2. High-growth firms can be defined in a number of ways but we are using the OECD definition as our benchmark: “enterprises with average annualized growth in employees (or in turnover) greater than 20% a year, over a three-year period, and with ten or more employees at the beginning of the observation period.” (OECD: Measuring entrepreneurship: A collection of indicators, 2009)
3. CB Insights offers select data for free.
4. The OECD defines “gazelles” as “high-growth enterprises born five years or less before the end of the three-year observation period. In other words, measured in terms of employment (or of turnover) gazelles are enterprises which have been employers for a period of up to five years, with average annualized growth in employees (or in turnover) greater than 20% a year over a three-year period and with ten or more employees at the beginning of the observation period.” (OECD: Measuring entrepreneurship: A collection of indicators, 2009)
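The two OECD definitions quoted in notes 2 and 4 can be read as simple tests on a firm's employee counts. The sketch below is only an illustration of that reading, not the OECD's or Data Catalyst's actual method. It assumes that "average annualized growth" means the compound annual rate over the three-year period, which makes the 20% threshold equivalent to roughly 72.8% total growth. The function names and the `age_at_end_years` parameter are hypothetical.

```python
# Illustrative sketch of the OECD high-growth and gazelle tests.
# Assumption: "average annualized growth" is the compound annual rate.

GROWTH_THRESHOLD = 0.20      # more than 20% a year
MIN_START_EMPLOYEES = 10     # ten or more employees at the start
OBSERVATION_YEARS = 3        # three-year observation period
MAX_GAZELLE_AGE_YEARS = 5    # born five years or less before period end


def annualized_growth(start_employees, end_employees, years=OBSERVATION_YEARS):
    """Compound annual growth rate in employees over the period."""
    return (end_employees / start_employees) ** (1 / years) - 1


def is_high_growth(start_employees, end_employees):
    """True if the firm meets the size floor and the growth threshold."""
    if start_employees < MIN_START_EMPLOYEES:
        return False
    return annualized_growth(start_employees, end_employees) > GROWTH_THRESHOLD


def is_gazelle(start_employees, end_employees, age_at_end_years):
    """True if the firm is high-growth and young enough at period end."""
    return (age_at_end_years <= MAX_GAZELLE_AGE_YEARS
            and is_high_growth(start_employees, end_employees))


if __name__ == "__main__":
    # A firm growing from 10 to 18 employees over three years.
    print(is_high_growth(10, 18))
    print(is_gazelle(10, 18, age_at_end_years=4))
```

The same tests could be applied to turnover instead of employment, since the OECD definition allows either measure.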
Helen Kula is the former Manager of Data Product at MaRS Data Catalyst. While with Data Catalyst, she worked on acquiring, evaluating and linking data to build products that generate insight into Ontario’s startups and the province’s innovation economy.