Data, Insights, Knowledge

“There's no library where you can go and find good solutions to scale. Information is spread out. And it's often very hard to identify a solution, especially when there's not enough aggregated data to look at and compare projects in say Congo and South Africa. We don't have data sets to compare like for like. This makes it harder to get to scale.” - interviewee

That observation was made in 2022. It was accurate then. It is less accurate now - and the change is worth accounting for honestly.

From scarcity to uneven access

The original analysis described the knowledge ecosystem for African scaling ventures as characterised by deep and consequential data scarcity. The empirical evidence base was thin, fragmented, and predominantly focused on funding rounds rather than operational dynamics.

Several years of subsequent work have changed the landscape. State of Startup Innovation reports now exist for Kenya, Ethiopia, and Rwanda, mapping venture valuations, investment flows, and employment data at a level of granularity that did not previously exist. Scale diagnostics, data foundations studies, GESI research, ecosystem voices, venture case studies - the library our 2022 interviewee said did not exist now has at least one substantial shelf.

But the architecture of the problem has changed more than the problem itself. The constraint has shifted from the absence of data to three related and persistent challenges: the geographic concentration of available data in East Africa while West Africa remains largely uncharted; the gap between data generation and data use - evidence now exists, but practitioners, investors, and policymakers do not consistently access or act on it; and the absence of institutional infrastructure that would make data collection, curation, and dissemination a sustained and self-improving function rather than a project-by-project activity.

The analytical frame for thinking about this kind of structural problem sits in the knowledge-commons literature. Yochai Benkler's The Wealth of Networks - the foundational contemporary treatment of commons-based knowledge production - names the two distinct production economies: market production through proprietary firms, and commons production through coordinated voluntary contribution. Both can produce knowledge; they produce categorically different kinds of knowledge with different distributional consequences. Hess and Ostrom's Understanding Knowledge as a Commons extends Ostrom's commons-governance framework - the empirical work cited in Feedback Loops on collective-action problems - to knowledge resources specifically. The central finding: knowledge commons are sustained by institutional design, not by goodwill. Where the design principles are absent, contributions decline, free-riding rises, and the commons fragments back into proprietary silos. The African scaling ecosystem's knowledge problem is not a data-availability problem in 2026. It is a knowledge-commons governance problem - and the institutional design that would sustain a functioning commons has not been built.

What has been built at the continental level - and its limits

The most significant attempt to build continental data infrastructure is the Africa Entrepreneurial Ecosystem Index (AEEI), launched in April 2024. Produced by a consortium of Utrecht University, the Allan Gray Centre for Africa Entrepreneurship (AGCAE) at Stellenbosch University, and i4Policy, the AEEI measures ecosystem quality across 29 African countries using 21 indicators across seven dimensions: governance, culture, infrastructure, finance, support, human capital, and market access. It is the first measurement tool calibrated to African ecosystem realities rather than adapted from OECD frameworks. The AEEI represents genuine progress, but it has limits. A 2026 critique in the Journal of Technology Transfer by Wim Naudé identified conceptual, methodological, and empirical flaws, and the AEEI's own documentation describes the first public version as a "conversation starter and minimum viable product." The AGCAE, launched at Stellenbosch in 2024 with Allan and Gill Gray Philanthropies, has a broader remit: building a "Pan-African Data Hub of hubs" and integrating entrepreneurial ecosystem data across the continent.

These are the most substantive institutional efforts at continental ecosystem knowledge infrastructure since the 2022 analysis. They matter. But they are ecosystem-condition indices - measuring the structural environment for entrepreneurship broadly, not the operational dynamics of scaling ventures specifically. The AEEI tells you about the quality of a country's finance conditions, governance environment, and cultural attitudes toward entrepreneurship. It does not tell you what happens inside ventures navigating the transition from startup to scale, what management decisions drive or stall that transition, or why ventures in similar structural environments produce radically different outcomes. That operational granularity - the scaling-specific data the 2022 analysis called for - remains almost entirely absent.

GEM (Global Entrepreneurship Monitor), Startup Genome, and a number of commercial providers track entrepreneurial activity and funding flows with valuable granularity. None is equivalent to the country-level work of the East African Data Collaborative (EADC) on how ventures actually scale, fail, and navigate structural constraints.

The "data hub of hubs" framing AGCAE has adopted points toward a specific institutional architecture that the contemporary data-governance literature has named explicitly. Delacroix and Lawrence's International Data Privacy Law paper on data trusts, and the Open Data Institute's data-trust framework, describe a governance form in which an independent fiduciary holds and stewards data on behalf of contributors who would not otherwise share with each other directly. Data trusts solve the cooperation problem the African ecosystem-data field consistently runs into: ventures, ESOs, and investors generate data that they will not share laterally because they do not trust each other, and that instead flows only vertically, to funders for compliance reporting. A trust architecture - with fiduciary obligations, transparent governance, and contributor benefit guaranteed by structural design rather than goodwill - is the institutional form that could plausibly resolve the cooperation failure. The AGCAE's call points conceptually toward this. The architectural question is whether a commitment to a fiduciary structure follows the rhetorical commitment to a hub of hubs.

Collaboration is becoming more normalised across the ecosystem. But institutional coordination remains weak and data collaboration practices remain rare. Siloed behaviours persist. Academic institutions continue to default to competition over collaboration. The benchmarked data that would allow ecosystem actors to understand how their organisations should be running, and that would allow scaling programmes to be evaluated for the benefit of investors, donors, and corporate sponsors, still does not flow freely through the ecosystem.

Entrepreneurs are required to collect donor-specified data that primarily serves funder reporting requirements, diverting scarce time and resources from strategic reflection. The extractive data relationship - where ESOs and ventures generate data that flows upward to funders rather than sideways to peers and back to the ventures themselves - is the operational form of the Loop 1 dynamic treated in Feedback Loops. The data architecture is captured by the same incentives that produce the programme-rich, capability-thin equilibrium. Solving the data problem is not separable from solving the donor-architecture problem.

What has changed is the proof of concept. The EADC demonstrated that a data collaborative model - bringing together governments, investors, ecosystem actors, and research institutions around a shared data infrastructure - is operationally achievable. The AGCAE's call for a "data hub of hubs" is the most explicit acknowledgement yet that this proof of concept needs to be scaled. The challenge is not architectural but political and financial: sustaining the commitment to share data, and sustaining the funding to maintain the infrastructure, beyond the lifecycle of an individual grant-funded programme.

Paucity of scaling data: partially addressed, structurally unsolved

The phenomenon of commercial scaling at the granular level - operational dynamics, management practices, failure patterns, scaling trajectory determinants - remains under-researched across most of the continent. The EADC has produced country-level data for Kenya, Ethiopia, and Rwanda. Nigeria - Africa's largest startup ecosystem - has no equivalent. West Africa broadly, and francophone markets specifically, remain almost entirely uncharted at this level of analytical depth.

The structural reason for this paucity sits in the distinction the knowledge-management literature draws between codified and tacit knowledge. Polanyi's The Tacit Dimension - the foundational treatment - established that human knowledge has two distinct forms: explicit knowledge that can be codified and transmitted through documentation, and tacit knowledge that resides in operational experience and can be transferred only through direct engagement. Nonaka and Takeuchi's The Knowledge-Creating Company operationalised this for firm-level knowledge production: organisations that systematically convert tacit knowledge into codified form, through structured documentation and case work, accumulate institutional capability that organisations relying on tacit transmission alone do not.

The implication for African scaling-ecosystem knowledge is direct. The expensive, hard-won lessons of how ventures actually scale in African operating conditions sit in the tacit knowledge of scaled founders, experienced operators, and the small number of investors who have navigated multiple cycles. That knowledge does not flow to next-generation founders by ambient diffusion. It requires active codification - case studies, structured interviews, comparative analysis, longitudinal tracking - to enter the institutional knowledge base. Six EADC venture case studies across three markets are a starting point for exactly this codification work. The ecosystem needs hundreds of cases to make the codification function structural rather than illustrative.

Without sustained institutional effort, it remains very hard to bring together the most useful data for understanding problems and how to solve them. Data remains in private companies. Academic researchers lack the capacity or resources for continent-wide research, meta-analysis, or cross-country comparisons. The result is that policy interventions remain blunt, unverified, and frequently ineffectual. The substantive treatment of what to build sits in Recommendations and the African Scaleup Lab proposition; the implication for this section is that the case for an equivalent institution at the scaling-dynamics level has strengthened with every year of its absence.

The gap between data generation and data use

Even where data exists, it does not consistently reach the practitioners, investors, and policymakers who could act on it. The knowledge-translation literature names the structural mechanism. Lavis and colleagues' Milbank Quarterly paper on knowledge transfer in health policy - the foundational treatment in policy-research transfer - found that research evidence reaches practitioners reliably only where dedicated institutional infrastructure exists to translate research findings into accessible, actionable, contextually relevant guidance. Knowledge does not flow naturally from researchers to practitioners. Where the translation infrastructure is absent, even high-quality research routinely fails to influence practice.

The African scaling ecosystem has the same structural condition. Data is now being generated at materially greater quality and granularity than in 2022. The translation infrastructure that would route it to founders, investors, and policymakers in the form they actually need is essentially absent. Reports sit on websites. Findings appear in academic journals behind paywalls. Briefings circulate within donor networks. The pipeline that would convert research into the operational guidance founders and investors actually use does not exist as a sustained institutional function.

The DGGF systemic review identifies centralised data platforms and shared measurement infrastructure as one of four priority leverage points for the ESO field. The same logic applies at the ecosystem level. The institutional infrastructure to translate, curate, and route data to the actors who need it is the missing piece - and the most plausible candidates to build it are precisely the institutions whose donor-funded incentives reward report production over translation infrastructure. The structural condition is what produces the gap.

AI is changing the architecture of what a knowledge commons can be

It is still no one's job to orchestrate and curate knowledge, evidence, and data for the African scaling ecosystem. Activities remain largely reactive rather than strategic. What has changed is the technological infrastructure available to support a knowledge commons. In 2022, combining AI, machine learning, and distributed data systems into a living knowledge architecture was an emerging concept. In 2026, it is an operational possibility.

The substantive treatment of how AI changes ecosystem capability sits in What AI changes about African scaling; the implication for knowledge infrastructure is that AI tools now make it feasible to build a living, searchable repository of scaling knowledge that updates continuously; to deploy personalised knowledge to founders at the moment they need it, based on their sector, market, and stage; to surface patterns across the scaling ecosystem with a rigour and speed that manual analysis cannot match; and to translate knowledge across languages, making the insights of anglophone research accessible to francophone founders and vice versa.

The Hess-Ostrom commons framework names what AI specifically changes in this architecture. The historical objection to ambitious knowledge-commons projects was that the curation, translation, and pattern-recognition work required exceeded what voluntary contribution could sustainably produce. AI tools materially reduce that cost. The work that previously required dedicated human capacity - monitoring contribution flows, identifying gaps, surfacing emerging patterns, translating across formats and languages - can now be partially automated. The institutional design problem remains (who governs the commons, who funds its infrastructure, who has fiduciary responsibility to contributors), but the operational cost has fallen materially. The technology is not the constraint. The institutional will to build the infrastructure is.

Few lessons learned are captured and diffused

Six EADC venture case studies across three markets are a starting point, not a learning infrastructure. The ecosystem needs hundreds of cases - across all major markets, sectors, and scaling trajectories, including failure cases, which remain almost entirely absent from the public record. One reason for that absence is that failure remains stigmatised in most African cultures.

"There's very strong negativity around failure. So when people fail, often they don't share the underlying story." - interviewee

The structural mechanism operating here has been studied directly in the entrepreneurship-failure literature. Cope's Journal of Business Venturing paper on learning from failure identifies the learning that follows entrepreneurial failure as substantial, multi-dimensional, and high-quality - for those who experience it. The learning rarely diffuses beyond the founder personally. Shepherd's Academy of Management Review work on grief and learning from failure names why: the emotional cost of failure produces grief responses that delay and constrain the learning process, and the social cost of disclosure further constrains the willingness to share lessons publicly. Eggers and Song's Academy of Management Journal paper on serial entrepreneurship extends the empirical analysis: founders who attribute their failure internally - recognising that their decisions contributed - produce better subsequent ventures than founders who attribute it externally to the economy, government, or fate. The internal-attribution group, which generates the highest learning value, is also the group most exposed to the social cost of disclosure. This is the mechanism by which the most valuable lessons are systematically the least likely to be shared.

The African context compounds this. There is no safety net, and the human cost of failure can be far higher than in markets where failure is normalised or celebrated. The stigma is not irrational; it is a rational response to the consequences. But the consequence at the ecosystem level is that the cases that would teach the next generation of founders the most - internal-attribution failure cases from operators who know exactly what went wrong - are precisely the cases most likely to remain undocumented.

Mercy Corps Ventures' annual impact reporting models the alternative: systematically publishing failures and lessons alongside portfolio highlights, treating honest reflection on what went wrong as a core accountability function rather than a reputational risk. The structural innovation Mercy Corps has demonstrated is institutional: failure reporting becomes safe to produce when it is required rather than discretionary, when it is aggregated rather than individualised, and when the institution that publishes it has explicitly committed to using it as learning input rather than reputational ammunition. This is what scaled failure-learning infrastructure would look like at the ecosystem level. None currently exists.

Public dialogue models

The UNDP Accelerator Labs and AfriLabs Africa Innovation Policy Dialogues series, launched in 2021, represents a helpful start. But it functions more as a broadcast model than a genuinely interactive one. Pathways that cultivate and sustain functional working relationships between those who build solutions and those who regulate them are a structural need, and a series of events cannot meet it.

The deeper requirement is shifting from a mindset of knowing - where funders and intermediaries extract data to report what they already believe - to a mindset of learning, where diverse stakeholders co-create metrics and generate insights together. The WDI Participatory Methods Community of Practice, co-led with LEAP Africa, is one active attempt at the practical infrastructure for this shift.

The Ecosystem Provocations series that Systemic Innovation has published since 2022 - covering the VC landscape, ecosystem collaboration, and innovation infrastructure - has been one attempt to model the kind of evidence-to-practice dialogue the ecosystem needs: grounded in data, provocative in framing, and designed to generate debate rather than merely inform. More of this is needed from more actors.

The deeper structural question the dialogue formats do not yet resolve is the one the Hess-Ostrom commons framework names. Sustained knowledge production requires sustained institutional commitment, governance arrangements that protect contributors, and benefit structures that reward contribution. Dialogue events provide visibility but not infrastructure. The infrastructure is what the African scaling ecosystem still does not have, and what the next phase of knowledge-commons building will either deliver or fail to deliver.
