Context Language Validation: The Second Lifecycle for AI in Regulated Systems


The assignment is the kind the team takes on gladly: develop software for a robotic device that cultures cells. The system controls movement and timing and adjusts critical process parameters based on what it sees: an AI model reads the live cell population against a controlled image bank. The promise is more reliable batches of advanced therapies for cancer and rare diseases, at lower cost.

Today the work is for research. Later comes clinical-material production, then commercial production at the point of care, something AI now makes buildable.

But this is a computerized system first, and regulated systems must be validated and maintained under quality expectations. In the US, 21 CFR Part 11 governs electronic records and electronic signatures. In Europe, EU GMP Annex 11 covers computerized systems used in GMP activities. GAMP 5 is not a regulation but the industry’s framework for risk-based validation of such systems. FDA’s Computer Software Assurance (CSA) guidance favors risk-based assurance over documentation for its own sake.

The foundation holds: the system must perform as intended and stay controlled across its life. That is Computer System Validation (CSV): a staged lifecycle that qualifies the system and maintains its validated state under change control and review. CSV is essential. But for AI, it is not enough.

CSV is strongest within defined boundaries, where the input format is part of the control: fixed fields, set values, controlled configurations. The AI part breaks that. Its input is not a form or a value but a picture of living cells, and what the system does next depends on how that picture is read.

A camera images the cells through the clear wall of the culture bag. The AI compares what it sees to a controlled bank of reference images: healthy, early stress, abnormal, unknown. Then it acts: adjusting movement, holding a timing, or flagging a culture. The seeing is not new. Label-free imaging already monitors cells without stains or dyes. The hard part is closing the loop: letting what it sees decide what it does.

Before it moves a parameter, the model leans on more than software. It leans on the image bank, built and labeled by people. On the criteria behind it, someone’s judgment of what each state looks like. On the instructions for how to weigh what it sees: which signals count, how much, and how sure to be before acting. That is not the software. That is context, and context shapes every call the system makes.

And context does not hold still. As the work matures, the bank grows: new cell shapes, sharper labels, criteria tuned with each run. What the model learns from in month one is not what it should rely on in month twelve. A fixed bank would be safe and useless. A learning bank is valuable, but only if its learning is governed.

CSV controls the computerized system, but not the lifecycle of a reference set meant to improve. That gap needs its own lifecycle: Context Language Validation, or CLV.

Context language is more than the inputs a model reads. It is how they are expressed so the model reads them as meant. Two teams can hold the same correct image and describe it in conflicting ways. One writes “abnormal.” The other writes “early-stage stress, treat as needs-review.” Same picture. Different context language. Different behavior from the model.

So correctness is only the floor. Good context language meets five conditions:

  • Correct: the input is accurate.
  • Clear: it carries one meaning, not several.
  • Bounded: it says what the thing is and is not, including look-alikes at its edges.
  • Consistent: the same condition is described the same way everywhere.
  • Calibrated: it tells the model how sure to be, and when to hand the call to a human.
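
One way to make the five conditions checkable is to store each reference image with its context language attached and test the entries before anything is locked. What follows is a minimal sketch, assuming a hypothetical four-state vocabulary and hypothetical field names (`ContextEntry`, `act_threshold`, `verified_by`). Correctness itself still needs a human, so the sketch only checks that a sign-off was recorded.

```python
from dataclasses import dataclass

# Hypothetical controlled vocabulary of culture states, for illustration only.
VOCABULARY = {"healthy", "early_stress", "abnormal", "unknown"}


@dataclass(frozen=True)
class ContextEntry:
    """One reference image plus the context language attached to it (hypothetical schema)."""
    image_id: str
    label: str                   # Clear: must be one term from VOCABULARY
    definition: str              # what the label means, in words
    look_alikes: tuple = ()      # Bounded: states this is NOT, at its edges
    act_threshold: float = 0.0   # Calibrated: confidence needed to act; below it, a human decides
    verified_by: str = ""        # Correct: human sign-off that the entry is accurate


def check_entries(entries):
    """Return (image_id, condition, problem) findings for a list of entries."""
    findings = []
    label_by_image = {}       # image_id -> first label seen for that image
    definition_by_label = {}  # label -> first definition seen for that label
    for e in entries:
        if not e.verified_by:
            findings.append((e.image_id, "correct", "no human verification recorded"))
        if e.label not in VOCABULARY:
            findings.append((e.image_id, "clear", f"label {e.label!r} is not in the vocabulary"))
        if not e.look_alikes:
            findings.append((e.image_id, "bounded", "no look-alikes named"))
        # Consistent: one image gets one label, and one label gets one definition.
        first_label = label_by_image.setdefault(e.image_id, e.label)
        if first_label != e.label:
            findings.append((e.image_id, "consistent", f"labeled both {first_label!r} and {e.label!r}"))
        first_def = definition_by_label.setdefault(e.label, e.definition)
        if first_def != e.definition:
            findings.append((e.image_id, "consistent", f"{e.label!r} is defined two ways"))
        if not 0.0 < e.act_threshold <= 1.0:
            findings.append((e.image_id, "calibrated", "no valid confidence threshold for acting"))
    return findings
```

In this sketch, the two teams’ descriptions of the same image from the earlier example (“abnormal” versus “early-stage stress, treat as needs-review”) would be reported as a `consistent` finding before either could be locked.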

CLV is the discipline that holds context to those conditions. CSV’s validated state shows the system performs within its range. CLV’s shows something more: that the context meets those conditions, and that the locked version performs at or above its qualified standard, with no unacceptable regression against predefined acceptance criteria, risk controls, and intended use.

The CLV lifecycle takes CSV’s shape and points its stages at the context, not the machine: Define, Lock, Apply, Monitor, Review. They run like familiar controls, but only to a point.

Define names the context as a controlled set: the image bank, its labels and criteria, the weighing instructions, the dashboards, and the provenance behind each element. It is where the team decides what counts as healthy or stressed, what may move a parameter, and when to ask a human.

Lock approves and freezes each version of what the machine is told. When a scientist refines a label, that is a proposed next version routed through approval, not a live edit.
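
The rule “propose, don’t edit” can be sketched in a few lines. This is a minimal sketch, assuming the whole context set can be serialized to JSON. The names (`ContextVersion`, `propose`, `approve`) and the rule that an author cannot approve their own change are illustrative assumptions, not a prescribed workflow. The SHA-256 fingerprint gives each locked version an identity that an audit trail can reference.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextVersion:
    """One immutable version of the context set (hypothetical structure)."""
    version: int
    content: str           # canonical JSON of the whole context set
    author: str
    approved_by: str = ""  # empty until approved; approval locks the version

    @property
    def fingerprint(self) -> str:
        return hashlib.sha256(self.content.encode("utf-8")).hexdigest()

    @property
    def locked(self) -> bool:
        return bool(self.approved_by)


def canonical(context: dict) -> str:
    # sort_keys makes identical context always serialize to identical text,
    # so identical context always gets the same fingerprint.
    return json.dumps(context, sort_keys=True, separators=(",", ":"))


def propose(current: ContextVersion, new_context: dict, author: str) -> ContextVersion:
    """A refinement never edits the current version; it creates the next one, unapproved."""
    return ContextVersion(current.version + 1, canonical(new_context), author)


def approve(proposal: ContextVersion, approver: str) -> ContextVersion:
    """Return a locked copy of the proposal (assumed rule: no self-approval)."""
    if proposal.locked:
        raise ValueError("version is already locked")
    if approver == proposal.author:
        raise ValueError("author cannot approve their own change")
    return ContextVersion(proposal.version, proposal.content, proposal.author, approver)
```

Because versions are frozen, a scientist’s refined label can only ever appear as `version + 1`; the approved version it came from stays byte-for-byte what was signed.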

If FDA reviews the AI as part of an AI-enabled device software function, a Predetermined Change Control Plan may define approved boundaries for certain iterative modifications. In other settings, the mechanism may be change control, model governance, data governance, and performance monitoring. The pathway may differ, but the principle is the same: learning cannot be uncontrolled.

Apply guarantees that only the locked context runs in production. Obvious, and often where failure begins: the easy thing is to point at the latest bank, not the approved one.
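
A minimal sketch of the Apply control, assuming production pins an exact version number together with its approved SHA-256 fingerprint instead of asking for the newest bank. `ContextStore` and its methods are hypothetical. The point is that `latest()` exists but production never calls it.

```python
import hashlib


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ContextStore:
    """Hypothetical store: version number -> (content, approved flag)."""

    def __init__(self):
        self._versions = {}

    def add(self, version: int, content: str, approved: bool) -> str:
        self._versions[version] = (content, approved)
        return sha256(content)

    def latest(self) -> int:
        # Convenient, and exactly what production must not use.
        return max(self._versions)

    def load_for_production(self, version: int, approved_sha256: str) -> str:
        """Return the pinned context only if it is approved and unchanged."""
        if version not in self._versions:
            raise LookupError(f"context version {version} does not exist")
        content, approved = self._versions[version]
        if not approved:
            raise PermissionError(f"context version {version} is not approved")
        if sha256(content) != approved_sha256:
            raise PermissionError(f"context version {version} changed after approval")
        return content
```

Pinning both the number and the fingerprint means a live edit to an approved version fails loudly instead of running silently.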

Monitor shows what the model is doing: which images it matched, which it flagged, and where cultures stopped resembling anything in the image bank, a sign the process moved somewhere the model was never taught to see. The system should not pretend to know. It should surface the doubt.
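
Surfacing doubt can be as simple as refusing to force a label when nothing in the bank is close. This is a minimal sketch, assuming each image has already been turned into a numeric embedding by some upstream model (not shown) and that the bank stores one embedding per reference image. The cosine-similarity nearest-neighbor rule and the 0.8 threshold are illustrative assumptions; a real system would define and qualify its own out-of-distribution test.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_or_flag(embedding, bank, min_similarity=0.8):
    """Match an image against the bank, or flag it as resembling nothing there.

    bank: list of (label, reference_embedding) pairs (hypothetical format).
    Returns (label, similarity, needs_human). If even the nearest reference
    is below min_similarity, report "unknown" and hand the call to a human.
    """
    best_label, best_sim = None, -1.0
    for label, ref in bank:
        sim = cosine(embedding, ref)
        if sim > best_sim:
            best_label, best_sim = label, sim
    if best_sim < min_similarity:
        return "unknown", best_sim, True
    return best_label, best_sim, False
```

Logging every `(label, similarity, needs_human)` result gives the dashboard both halves of Monitor: what the model matched and where it stopped recognizing the culture.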

Up to here, CLV runs parallel to CSV: same approvals, audit trail, and change control. Two lifecycles, one quality umbrella. Then comes the stage with no true CSV twin.

Review, in CLV, is not periodic. It is continuous. Every run produces signal, evidence for the next locked version: the shapes the model kept missing are added, the labels that predicted nothing are retired, and the criteria are sharpened, run after run, under human control. If a change stays within an approved modification plan, predefined acceptance criteria, and the applicable change-control pathway, it deploys without reopening the validation. Outside those bounds, it becomes a larger change.
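
The decision at the end of each review cycle can be sketched as a gate. This is a minimal sketch, assuming per-state recall has been measured for the locked version and the candidate on the same held-out run data. The criteria values, change-type names, and the three outcomes are hypothetical; in practice they would come from the approved modification plan or change-control procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    """Hypothetical predefined criteria from an approved modification plan."""
    min_recall: float                # floor every state must meet
    max_regression: float            # largest allowed drop vs. the locked version
    allowed_change_types: frozenset  # kinds of change the plan covers


def review_decision(baseline, candidate, change_types, criteria):
    """Decide what happens to a candidate context version.

    baseline, candidate: dict of state -> recall on the same held-out data.
    Returns "deploy" (within plan and criteria), "reject" (fails criteria),
    or "escalate" (outside the plan, so it becomes a larger change).
    """
    if not set(change_types) <= criteria.allowed_change_types:
        return "escalate"
    if set(candidate) != set(baseline):
        # Adding or removing a whole state is treated as out of plan here.
        return "escalate"
    for state, old in baseline.items():
        new = candidate[state]
        if new < criteria.min_recall or old - new > criteria.max_regression:
            return "reject"
    return "deploy"
```

Only `"deploy"` lets a version through without reopening validation; everything else routes back to people, which is where the sketch keeps the human control the process depends on.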

That is the real difference, and it is speed. CSV is the slower cycle; CLV is the faster one. Because review runs continuously, the audit trail stops chasing drift and starts running ahead of it. But neither cycle works alone: a perfect machine fed drifting context produces drift, and sharp context on an unqualified machine produces nothing you can trust.

None of this is free. CLV is continuous work, and it needs people in a new place in the process, quality assurance included. AI did not remove the humans here. It moved them out of repetitive execution and into CLV, where judgment now lives.

The reason is not efficiency. It is the outcome: healthier cultures and more therapies that reach patients, but only if people govern what the system learns. The software can be perfect. If the context drifts, the output drifts. If the context cannot improve, neither can the system.

CSV is the lifecycle around the machine. CLV is the lifecycle around what the machine is told. One holds the system steady. The other keeps it learning, under control.

That’s the Minerva Way.