Part 3 of a series. The first post diagnosed the gap between what language models can produce when appropriately directed and what they deliver by default. The second post described the reasoning architecture we have built to address it. This post draws out the implications for public-sector capacity building in contexts where experienced practitioners are scarce. A fourth post shows the mechanisms in concrete paired cases.
State capability in developing-country and public-sector contexts is constrained less by access to technical knowledge than by the scarcity of practitioners whose judgement translates knowledge into effective action. Policy templates, institutional models, regulatory frameworks, and technical assistance guidance have been available in quantity for decades. What is scarcer, and what determines whether any of those resources gets used well, is the accumulated professional judgement that lets someone recognise which template fits this ministry's actual conditions, which framework's assumptions hold here and which do not, which model in the literature addresses this specific kind of institutional problem rather than one that looks superficially similar.
The previous posts in this series described a structural failure mode in AI-assisted knowledge work – the intent deficit – and an architecture for addressing it. The argument of this post is that the same architecture, if designed with deliberate attention to what it externalises, has direct implications for the scarce-judgement problem that sits at the centre of state capability research. An AI scaffold designed only to produce better outputs is one thing. An AI scaffold designed to externalise expert reasoning in a form that the practitioners working alongside it can absorb is potentially something else: a piece of capacity infrastructure for contexts where the traditional mechanism of building judgement – extended apprenticeship to an experienced practitioner – is not available at the scale that would be required.
This is a proposition, not a demonstrated finding. The scaffold architecture exists and works for the first purpose; the second use has theoretical grounding but has not been built to specification and has not been evaluated. What follows is the case for why the extension is coherent, what it would require in practice, and what it would not substitute for.
Andrews, Pritchett, and Woolcock's research on state capability and the capability trap established that the gap between capable and non-capable states is not primarily a gap in access to institutional knowledge. The forms of capable institutions – org charts, procedure manuals, regulatory frameworks, reform templates – are available, and have been distributed by development organisations, consultants, and peer governments for more than half a century. What Andrews, Pritchett, and Woolcock document is a persistent failure mode in which governments adopt the surface forms of capable institutions without developing the underlying function those forms require. They call the pattern isomorphic mimicry: the forms are present; the judgement that would make them operational is not.
Their analysis locates the operative shortage not in technical knowledge but in the practitioners who can apply it – the people who can look at the specific conditions of a specific ministry and make the design choices that follow from those conditions, rather than inheriting the choices a template made for someone else. Lin and Monga's analysis of African development failure reaches the same diagnosis from inside the policy process. Every governance template, institutional model, and development strategy that arrives as external prescription carries embedded choices – about what problems matter, what sequencing makes sense, what administrative structures are assumed to exist, what economic and cultural conditions are taken as given. Those choices were made for someone else's context. The template looks complete because for the conditions it was designed for it is complete: the choices are appropriate, the assumptions hold, form and function align. Transplanted without adaptation, the form arrives with all those embedded choices already made, which for the new context may be wrong in ways that are not visible until implementation fails.
Banerjee and Duflo's work on development interventions finds repeatedly that technical assistance transfers documents and models, not capability. The government that received the policy blueprint is not thereby equipped to implement it, and the question of who has the judgement to make the implementation work remains open. Pritchett and colleagues estimate, from cross-country administrative capacity evidence, that some low-capability countries at their observed rates of improvement would take centuries to reach moderate-capability levels. This is not a metaphor; it is a projection from data. The gap is not closing through the continuing transfer of forms, because the bottleneck is not form-related.
Seen through this lens, the connection to the session-level failure mode described in the first post of this series is structural rather than merely illustrative. The intent deficit in AI knowledge work and isomorphic mimicry at the institutional scale share a common formal pattern – pattern-completion on surface representations in the absence of reasoning from underlying function. A model producing a policy brief with the expected structure without the reasoning about why this specific audience needs this specific structure is doing, at a different scale, what a ministry with the expected org chart but without the administrative judgement to operate within it is doing. The legitimacy signal – the structure, the chart, the template – is produced; the function is absent; the signal's presence actively inhibits recognition of the absence, because outputs that look like capable work successfully camouflage the reasoning that is not there.
What matters for the capacity-building argument is not whether these are the same mechanism operating at different scales or structurally analogous mechanisms with a shared formal description. What matters is that the diagnostic is the same and the architectural response has the same shape. In both cases, the surface form is cheap; the reasoning from function is the scarce thing; more forms do not close the gap; the only thing that closes the gap is conditions under which reasoning from purpose becomes the default operation rather than an occasional intervention.
A natural response to a judgement shortage, given the availability of AI tools that appear to provide expert-level knowledge on demand, is to propose that AI supplies what local practitioners lack. This framing is mistaken in two distinct ways, and understanding both is necessary for thinking about what deliberate capacity design would actually require.
The first concerns skill development. Evidence from structured studies of AI use in learning contexts finds that unstructured AI access does not build analytical capacity; in some conditions it actively degrades it. Bastani and colleagues' 2025 study on AI use in high-school mathematics found that students with unrestricted AI access improved their immediate performance substantially but showed worse performance than a control group when AI access was subsequently removed. The mechanism, the authors argue, is substitution rather than development: the AI did the work the student would otherwise have done, and the student's analytical capacity did not develop because the developmental load had been offloaded. The finding is specific and important: the harm was not from AI access itself but from the absence of structure that would have required the learner to engage rather than receive. Design determines whether AI interaction builds capacity or bypasses the reasoning it is meant to support. An AI deployment that answers questions is not the same as one that develops the questioner.
The second limitation is more immediate and concerns outputs rather than learning. AI can automate portions of documents, plans, and analyses, and can supply technical knowledge that no local practitioner may possess. But outputs without judgement to evaluate, direct, and select among them are of severely constrained utility. The model that produces a policy analysis does not know whether the framing suits this ministry's political context, whether the evidence is weighted correctly for this specific audience's prior beliefs, or whether the recommendation is implementable given constraints the model cannot fully see. These are judgement calls that require a practitioner who has the analytical capacity to engage critically with what the model produces rather than accept it as oracle output. Expanding AI access in low-capacity contexts without addressing the judgement question does not replace the missing practitioner layer. It produces outputs that require a practitioner layer to be usable, which is the same problem transposed to a different point in the workflow.
AI amplifies judgement; it does not generate it. A context rich in judgement can use AI to scale; a context short on judgement gets outputs it cannot evaluate at scale, because evaluating them requires exactly the judgement that is missing. The argument here is not against AI deployment in low-capacity contexts but for being explicit about what deployment is supposed to accomplish, and for designing deployments with that goal in view, rather than assuming that access alone will do the work.
To restate the caveat from the opening: this is a proposition, not a demonstrated finding. No systematic evaluation exists of a scaffold of the kind described in the previous post deployed with deliberate capacity-transfer design, and to our knowledge no such deployment has been built to this specification. What exists is theoretical grounding sufficient to make the proposition coherent, and a research direction worth pursuing.
The proposition rests on a body of learning-science work that identifies the mechanism by which expert judgement transfers when it does. Collins, Brown, and Newman's cognitive apprenticeship framework is the foundational account. On their analysis, traditional schooling fails to build expertise at the rate genuine apprenticeship does not because the knowledge is unavailable but because expert reasoning is invisible: the student sees the problem and the answer, not the process connecting them. In traditional craft apprenticeship, by contrast, the master's reasoning was externalised in observable action: the apprentice watched the master work, heard the master's thinking in real time as problems were addressed, and absorbed not only the craft's techniques but the underlying judgement about when and why to use them. The reasoning was not a separate curriculum; it was built into the apprenticeship's structure. Apprenticeship worked as a mechanism of expertise transfer because the master's reasoning was available for observation.
Vygotsky's zone of proximal development, developed in learning science decades earlier, defines the region in which this kind of transfer occurs – between what a learner can accomplish independently and what they can accomplish with appropriate support. In that zone, given genuine challenge and structured assistance, capacity develops; outside it, in either direction, it does not. The apprentice working on real problems under the master's visible reasoning is in that zone. The student solving pre-digested textbook problems is typically not.
A scaffold that makes AI reasoning visible – through explicit reasoning traces, through required framing artifacts, through per-decision checkpoints presented to the user, through the typology and scratchpad produced as visible outputs – externalises reasoning that would otherwise be hidden. A practitioner working alongside such a scaffold who reads how the model constructs audience models, interrogates framing, and checks each element against purpose is, in the terms Collins and colleagues describe, observing expert reasoning in real time through a task they are also performing. They are in the zone of proximal development with respect to that reasoning: the task is real, the reasoning required is genuine, the AI's structured engagement provides support that makes the work achievable without making it trivial.
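To make the mechanism concrete: a minimal sketch, in Python, of what reasoning-made-visible might look like as data. The names here – `FramingArtifact`, `DecisionCheckpoint`, `SessionTrace` – are illustrative, not the scaffold's actual interface; the point is that every field is something a practitioner can read, contest, and correct during the work itself.

```python
from dataclasses import dataclass, field

@dataclass
class FramingArtifact:
    """The scaffold's explicit statement of how it has framed the task,
    presented to the user before generation proceeds."""
    audience_model: str        # who the output is for and what they already believe
    purpose: str               # what the output must accomplish for that audience
    assumptions: list[str]     # conditions the framing takes as given

@dataclass
class DecisionCheckpoint:
    """A single design decision surfaced for confirmation or correction."""
    decision: str              # e.g. "lead with fiscal risk, not legal mandate"
    rationale: str             # the reasoning from purpose, made visible
    confirmed: bool = False    # set when the practitioner accepts or corrects it

@dataclass
class SessionTrace:
    """Everything the practitioner can observe while working: the framing,
    each checkpoint, and the running scratchpad of reasoning."""
    framing: FramingArtifact
    checkpoints: list[DecisionCheckpoint] = field(default_factory=list)
    scratchpad: list[str] = field(default_factory=list)
```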
This is the mechanism that cognitive apprenticeship identifies as the primary vehicle of expert judgement transfer. It is what traditional craft apprenticeship operationalised physically; it is what the schooling model has struggled to reproduce because classrooms separate the work of reasoning from the product that reasoning produces; it is what structured engagement with a reasoning-visible scaffold could, in principle, reintroduce.
The two-way correction loop matters here. The practitioner reading the scaffold's reasoning is not a passive observer. They have knowledge of the specific context – this ministry, this political environment, this decision-maker – that the model does not and cannot have. When they push back on a framing that does not match what they know about the context, the model's subsequent reasoning incorporates that correction. The practitioner who reads how the model adjusts is simultaneously improving the immediate output and developing the judgement required to push back more effectively next time. The judgement that accumulates across repeated interactions is the capacity the development sector has been unable to build at scale through information transfer and institutional-form transplantation.
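The loop itself is simple enough to sketch. The interfaces below are hypothetical stand-ins, not the tool's API; what the sketch shows is the structural point that the visible trace is an input to the practitioner's review, and the review an input to the model's next round of reasoning.

```python
from typing import Optional, Protocol

class Model(Protocol):
    def generate_with_trace(self, task: str) -> tuple[str, str]: ...
    def revise_with_trace(self, draft: str, pushback: str) -> tuple[str, str]: ...

class Practitioner(Protocol):
    def review(self, draft: str, trace: str) -> Optional[str]: ...

def correction_loop(model: Model, practitioner: Practitioner, task: str) -> str:
    """The two-way loop: the visible trace informs the pushback, and the
    pushback reshapes the next round of reasoning. Each pass improves the
    immediate output; across sessions, it is the reading and the pushing
    back that build the practitioner's judgement."""
    draft, trace = model.generate_with_trace(task)
    while (pushback := practitioner.review(draft, trace)) is not None:
        draft, trace = model.revise_with_trace(draft, pushback)
    return draft
```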
One way to see what the scaffold changes, in capacity-building terms, is to notice that the relationship between a practitioner and a scaffold-running AI has a natural developmental arc that mirrors the arc of a junior practitioner becoming a senior one.
Early in deployment, both the AI and the practitioner are working under the scaffold's discipline in the way junior practitioners work under a senior supervisor's direction. The scaffold supplies what neither has fully internalised: the AI lacks accumulated judgement; the junior practitioner has not yet accumulated their own. The scaffold's framing check, per-decision checkpoints, and reasoning-visibility requirements provide the discipline that a senior supervisor would normally supply to a junior practitioner at project start and throughout execution. In this phase, the AI contributes breadth: its reasoning traces make visible not only the reasoning process but the knowledge the model draws on – frameworks, applicable research, technical constraints, domain conventions – in the context of a real task where their relevance is shown rather than stated. The practitioner who reads how the model reaches for a particular theoretical framework, identifies a specific failure mode, or applies a visual design principle to a concrete brief is absorbing knowledge in the mode that cognitive apprenticeship theory identifies as most effective: observing its application on real work, rather than receiving it as abstracted instruction.
Over time, as judgement develops, the practitioner's relationship to the scaffold shifts naturally. They push back more confidently, override with better grounds, correct the model's framings with increasing precision. The feedback they give becomes more specific, and therefore more informative to the session's subsequent work. Eventually they are operating more like the supervisor – delegating to the AI, setting direction, reviewing outputs against their own judgement rather than following the scaffold's prompts – while continuing to use the reasoning traces to track what the model is doing and to catch the cases where its generation has drifted from the task's actual requirements.
The scaffold itself does not recede as the practitioner develops. What changes is who is exercising judgement within it. Early, the scaffold and the AI jointly supply judgement the practitioner does not yet have; late, the practitioner supplies judgement to steer both the scaffold and the AI. The scaffold remains useful at the later stage for a different reason: it maintains reasoning discipline on the AI side regardless of the practitioner's current level of engagement, which means the practitioner can delegate with more confidence and review more efficiently than they could if the AI were generating under default conditions. This is a different pattern from either pure automation or pure replacement of human practitioners. It is a pattern in which the practitioner and the AI are both operating within a structure that supports reasoning from purpose, and the practitioner's role evolves from co-learner to supervisor as their judgement develops.
The scaffold as we have built it is designed to maintain the model's reasoning quality, not to develop the practitioner's. A tool built with capacity transfer in view would require additional design choices, each of which is absent or optional in the current version.
Reasoning visibility would need to be mandatory, not configurable. The value of visible reasoning for capacity transfer depends on practitioners actually reading it, which requires it to be present by default and legible rather than hidden behind settings most users will never find. A tool where reasoning traces are on by default and prominent in the interface produces substantially more of the capacity-building effect than the same tool where reasoning is available only if the user has configured it.
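In configuration terms, this is the difference between a preference and an invariant. A sketch of the distinction, with illustrative names and values:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceDisplay:
    """Reasoning visibility as an invariant, not a preference: the only
    configurable choice is where the trace is rendered, never whether."""
    prominence: str = "inline"   # "inline" or "side_panel"; there is no "hidden"

    def __post_init__(self):
        if self.prominence not in ("inline", "side_panel"):
            raise ValueError("traces are always shown; only placement varies")
```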
The structure would need to enforce critical engagement rather than allow passive reception. The framing check as currently implemented requires the model to produce a framing for the user to confirm or correct. A capacity-oriented design would require the practitioner to produce their own audience model before seeing the model's, so that the comparison is between two specific positions rather than between a blank and a provided answer. Prompts for evaluation and contestation – is this audience model right for this ministry? does this framing reflect what you know about how this decision-maker thinks? – would be built into the workflow as required steps rather than as optional interventions. This is the mechanism through which capacity transfer occurs rather than passive receipt of outputs, and it does not happen unless the design requires it.
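The required ordering is easy to state precisely, even though nothing currently implements it. A minimal sketch, assuming hypothetical callables for eliciting the practitioner's framing and generating the model's:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FramingComparison:
    practitioner_framing: str   # committed first, sight unseen
    model_framing: str          # revealed only after the practitioner commits

def framing_check(elicit_practitioner_framing: Callable[[], str],
                  generate_model_framing: Callable[[], str]) -> FramingComparison:
    """Enforce the capacity-oriented ordering: the practitioner produces
    their own audience model before the model's framing is shown, so the
    comparison is between two specific positions rather than between a
    blank and a provided answer."""
    practitioner_framing = elicit_practitioner_framing()
    if not practitioner_framing.strip():
        raise ValueError("a practitioner framing is required before the model's is revealed")
    return FramingComparison(practitioner_framing, generate_model_framing())
```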
Institutional context references would need to accumulate locally. The domain reference architecture described in the previous post is designed to be extended. A capacity-building deployment in a ministry or an agency would benefit from references that are not generic to the domain but specific to the institution – its actual document formats, its historical patterns of implementation failure, the audiences it repeatedly addresses, the framings that have worked and the framings that have not. A deployment that builds these references over time, drawing on the institution's actual correction history, develops an asset whose value accumulates and which is not easily transferable to another context because its value comes from the specificity of what it captures. This is the analogue, for AI deployment, of the institutional memory that capacity research repeatedly identifies as the shortage most binding on state performance.
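In the simplest case, such a reference could be nothing more elaborate than an append-only record of framings, the corrections they attracted, and the settings in which each arose. A sketch, with hypothetical file layout and field names:

```python
import json
from datetime import datetime
from pathlib import Path

class InstitutionalReferenceStore:
    """Append-only local store of institution-specific context: the
    framings the model proposed, the corrections they attracted, and the
    settings in which each arose. Its value comes from specificity, which
    is also why it does not transfer to another institution."""

    def __init__(self, root: Path):
        self.path = root / "institutional_references.jsonl"
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record_correction(self, framing: str, correction: str, context: str) -> None:
        entry = {
            "recorded": datetime.now().isoformat(),
            "framing": framing,        # what the model proposed
            "correction": correction,  # how the practitioner corrected it
            "context": context,        # audience, document type, decision setting
        }
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    def load(self) -> list[dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f]
```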
Complementary human mentorship would remain irreplaceable. The process described here supplements in-person mentorship; it does not substitute for it. Periodic access to expert practitioners – through visits, training, conferences, structured engagements, peer exchanges – provides forms of context that daily AI interaction cannot: direct feedback on the practitioner's own reasoning rather than correction of AI outputs, exposure to tacit norms visible only in how experienced practitioners behave and interact, membership in a professional community with its own accumulated standards of what good work looks like. The two modes are complementary. Daily sustained engagement with a scaffolded model supplements the intermittent access to human expertise that low-capacity contexts can realistically provide; the occasional in-person engagement provides what the AI interaction cannot.
The right framing is not "AI replaces mentorship" but "AI mitigates the shortage of mentorship in the intervals between access to mentors, in a form that makes that access more valuable when it happens."
This is an argument that a deliberately-designed scaffold could be a useful piece of capacity infrastructure in contexts where the judgement gap is the binding constraint. It is not an argument that any current AI deployment is already serving this function, and it is not an argument that the existing scaffold, as built for output quality, will transfer into this role without modification. The specific design choices required for capacity transfer – mandatory reasoning visibility, required practitioner framing production before the model's, locally-accumulated institutional references, enforced critical engagement – are not yet implemented as defaults in any tool we know of. Building them is a tractable engineering problem. Evaluating whether they produce the capacity-transfer effect at a level that matters is a research problem that has not been undertaken.
The proposition is also narrower than a claim about AI in development generally. It is specific to a particular class of knowledge work – the kind where professional judgement about audience, context, and purpose is the load-bearing skill – and to a particular class of deployment – where the AI is a working partner whose reasoning is visible, not an oracle whose outputs are received. AI deployments that do not fit this pattern may have their own uses in public-sector contexts, but they are not what is under discussion. The mechanism by which a reasoning-visible scaffold could build judgement – externalising expert reasoning in a form that makes it observable during real work – belongs to that design pattern and does not generalise beyond it.
What we would want to see, as a research agenda, is paired-comparison evaluation of practitioners working on substantive knowledge work tasks: with a reasoning-visible scaffold, without it, and with alternative tool configurations. The outcome measures would not be immediate output quality alone, which standard AI benchmarks can already approximate, but the specifically capacity-building dimensions: whether practitioners in the scaffold condition, after repeated use, perform better on subsequent tasks without AI support than practitioners in the unstructured-access condition; whether they identify drift in AI reasoning more precisely; whether they correct with greater specificity; whether, over a period of six months or a year, measurable indicators of professional judgement have changed.
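The skeleton of that comparison is straightforward to write down, even though running it is not. A sketch of the conditions and the capacity-specific outcome measures, with all names illustrative:

```python
from dataclasses import dataclass
from enum import Enum

class Condition(Enum):
    SCAFFOLD = "reasoning-visible scaffold"    # full capacity-oriented design
    UNSTRUCTURED = "unstructured AI access"    # same model, no scaffold
    ALTERNATIVE = "alternative configuration"  # e.g. outputs without traces

@dataclass
class CapacityOutcomes:
    """Measured after repeated use, not on immediate outputs."""
    unaided_score: float            # later task performance with AI removed
    drift_detection_rate: float     # share of seeded reasoning drift caught
    correction_specificity: float   # rated specificity of feedback to the model
    judgement_delta: float          # change on a judgement rubric over 6-12 months
```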
The design of such an evaluation is itself a non-trivial piece of work – quality in knowledge work is multidimensional and partly tacit, which makes automated proxy metrics inadequate. But the outlines are clear enough to build, and the evaluation would produce what this post cannot: evidence about whether the mechanism we have described theoretically produces the effect we have described in practice, under what conditions, and at what magnitude.
The isomorphic mimicry literature identified a structural problem in state capability that has resisted solution through decades of effort focused on form transfer. The structural analogy between that problem and the intent deficit in AI knowledge work is not incidental. Both are failures of surface pattern-completion in the absence of reasoning from function, operating at different scales. The architectural response in both cases has the same shape: conditions under which reasoning from purpose becomes the default operation rather than an occasional intervention.
What the AI case adds is a particular kind of mechanism – structured reasoning externalisation – that is harder to implement for human institutions than it is for AI deployments. A ministry cannot be asked to externalise its reasoning; an AI tool can be asked, and can be designed to do so by default. If the externalised reasoning is visible to the practitioners who work alongside the tool, and if the design enforces critical engagement rather than passive reception, then the tool is not only producing better outputs. It is also, potentially, making expert reasoning available for observation in a form that the traditional mechanisms of capacity development – information transfer, template distribution, periodic training – have not been able to match.
The claim is not that AI is a substitute for the slow patient work of building institutional capability, but that a specifically-designed AI scaffold could be one component of that work, doing something none of the other available components does well: externalising expert judgement during the performance of real tasks, in a form that practitioners working alongside it can observe, contest, correct, and absorb. The extent to which this matters in practice is an empirical question we do not yet have the evidence to answer. The theoretical grounding is coherent, the architecture required is tractable, and the implications for contexts where the practitioner shortage is the binding constraint are large enough to make the evaluation worth undertaking.
The scaffold described in the previous post was built to solve a quality problem in AI-assisted knowledge work. What this post has argued is that, with deliberate attention to what it externalises and how it requires practitioners to engage, the same architecture may have a second use that is harder to measure and larger in consequence than the first. Whether that second use can be realised in practice, and at what scale, depends on whether AI developers and development organisations treat the capacity-transfer potential as a design goal worth pursuing, rather than as a side effect that may or may not emerge from deployments designed for other purposes.