AI-Assisted Scenario Generation Without Losing Determinism
Determinism was never the generator's job — it belongs to the artifact a non-deterministic author emits, and the whole discipline lives at the gate where a draft becomes a frozen, validated, replayable object.
TL;DR
The title states a worry that dissolves once the boundary is drawn correctly: you never needed the generator to be deterministic; you need the artifact it emits to be. AI-assisted scenario generation is a non-deterministic authoring process feeding a deterministic system of record, and the entire discipline lives at the seam where a probabilistic draft is frozen into a validated, versioned, replayable object. Determinism guarantees a scenario runs the same way every time; it says nothing about whether the scenario was worth admitting. That is a separate gate, and it is the one that actually protects the analysis.
Determinism Was Never the Generator's Job
The phrasing "without losing determinism" implies that adding AI threatens a property the simulation already had. It does not, provided the boundary is in the right place. A language model is non-deterministic by construction, and more thoroughly than most people assume. Even with sampling disabled, the same request can produce different output: inference depends on how operations are batched and accumulated on the hardware underneath, and floating-point addition is not associative, so a different accumulation order can change the low-order bits of a result and occasionally the token that is chosen. Chasing a bit-for-bit reproducible generator is fighting the wrong battle.
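The order sensitivity of floating-point accumulation can be seen in a few lines of plain Python. This is only an illustration of the arithmetic, not of any particular inference stack; the helper name `accumulate` is introduced here for the example.

```python
# Floating-point addition is not associative: the grouping changes the rounding.
grouped_left = (0.1 + 0.2) + 0.3    # 0.6000000000000001
grouped_right = 0.1 + (0.2 + 0.3)   # 0.6


def accumulate(values):
    """Plain left-to-right accumulation, with one rounding per step."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four numbers, summed in two different orders.
values = [1e16, 1.0, -1e16, 1.0]
forward = accumulate(values)              # 1.0
reordered = accumulate(sorted(values))    # 0.0

print(grouped_left == grouped_right)  # False
print(forward, reordered)             # 1.0 0.0
```

Neither order is wrong in the sense of a bug; each is a correctly rounded sequence of additions. A parallel reduction that changes its accumulation order with batch size or load has the same freedom, which is why identical requests can diverge.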
It is also unnecessary. The generator's output is not executed; it is materialized into an artifact, and that artifact is what runs. The generator can therefore be as exploratory as you want — high sampling temperature, model upgrades, broad search — without touching the determinism of the analysis, as long as nothing it produces reaches the simulation except through a frozen, validated object. The two common errors both come from misplacing the boundary: trying to force the model to be deterministic, which is the wrong target and brittle, or letting the variance of generation leak into evaluation, which is the actual failure. What must be deterministic is the artifact and its execution. What must stay free to vary is the exploration that produced it.
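A minimal sketch of that boundary, under stated assumptions: the `Scenario` fields, `draft_scenario`, and `run` are hypothetical names invented for this example, the generator is stood in for by an unseeded random source, and any randomness inside the simulation is assumed to be driven by a seed stored in the artifact itself.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Scenario:
    seed: int
    n_entities: int
    max_speed: float


def draft_scenario() -> Scenario:
    """Stand-in for the non-deterministic author; differs on every call."""
    rng = random.SystemRandom()
    return Scenario(
        seed=rng.randrange(2**32),
        n_entities=rng.randint(2, 10),
        max_speed=rng.uniform(5.0, 50.0),
    )


def run(scenario: Scenario) -> float:
    """Deterministic execution: reads only the artifact, never the generator."""
    rng = random.Random(scenario.seed)  # local RNG, no global state
    total = 0.0
    for _ in range(scenario.n_entities):
        total += rng.uniform(0.0, scenario.max_speed)
    return total


candidate = draft_scenario()            # free to vary
print(run(candidate) == run(candidate)) # True: the frozen artifact replays
```

The design choice the sketch tries to show is that `run` takes nothing but the artifact. The exploratory part can change freely because it has no path into execution other than the object it emitted.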
The Artifact Is the Unit of Record, Not the Prompt
A tempting shortcut is to treat the request as the thing of value: store the prompt, the intent, the context, and regenerate the scenario whenever it is needed. This quietly destroys reproducibility. Because models drift between versions and inference varies between calls, "the scenario" then has no stable identity — two analysts asking for the same thing receive different objects, and a result cited one quarter cannot be reconstructed the next.
The materialized scenario — the concrete entities, parameters, and structure the generator emitted — is the unit of record, not the request that produced it. That object is what gets versioned, hashed, reviewed, and run. The prompt and any retrieved context are worth keeping as provenance, an account of where the artifact came from, but they are not the source of truth and must never be the thing re-executed. The source of truth is the frozen output. Generation is how a candidate enters the world; it is not where the candidate lives.
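One way to give the frozen output a stable identity is to hash a canonical serialization of it. The sketch below assumes the scenario can be represented as JSON-compatible data; `canonical_bytes`, `artifact_id`, and the example fields are hypothetical and not taken from any particular system.

```python
import hashlib
import json


def canonical_bytes(scenario: dict) -> bytes:
    """Serialize with sorted keys and fixed separators so equal content
    always yields equal bytes, regardless of key insertion order."""
    text = json.dumps(scenario, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def artifact_id(scenario: dict) -> str:
    """Content hash of the materialized scenario, usable as its identity."""
    return hashlib.sha256(canonical_bytes(scenario)).hexdigest()


# Hypothetical scenario content, written in two different key orders.
a = {"name": "patrol-01", "entities": [{"id": 1, "speed_mps": 12.5}]}
b = {"entities": [{"speed_mps": 12.5, "id": 1}], "name": "patrol-01"}

print(artifact_id(a) == artifact_id(b))  # True: same content, same identity
```

With an identifier like this, a cited result can name the exact object it ran against, and the prompt can be stored alongside as provenance without ever being the thing that is re-executed.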
Schema-Valid Is Not the Same as Admissible
The natural first defense is to validate generated output against a schema, and constrained decoding can even guarantee schema conformance as the output is generated. This is necessary but badly insufficient. Forcing output to satisfy a structure guarantees shape, not sense: a model constrained to emit valid fields will, when it has no good answer, emit locally valid but meaningless ones. The scenario passes every structural check and still describes something the modeled world cannot contain: a sensor with a range its physics does not support, a track that violates its own kinematics, a force laydown that contradicts itself.
So the admission gate has to test more than form. It has to test admissibility: does this scenario describe a situation the simulation can legitimately represent, with internally consistent parameters and physically plausible relationships? That check is domain logic, not schema validation, and it is where generation quality is actually enforced. A structurally perfect scenario that fails admissibility is not a near-miss to be patched; it is exactly the kind of confident, well-formed error a generator produces, and the gate exists to stop it at the door rather than to clean up after it inside the analysis.
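The difference between the two checks can be sketched with the kinematics case. Everything here is an assumption made for illustration: the track layout, the field names (`max_speed_mps`, `t_s`, `x_m`, `y_m`), and the single rule that the speed implied by consecutive waypoints must not exceed the track's own declared maximum. A real gate would carry many such rules.

```python
import math


def schema_valid(track: dict) -> bool:
    """Shape only: required keys exist and hold numbers."""
    if not isinstance(track.get("max_speed_mps"), (int, float)):
        return False
    waypoints = track.get("waypoints")
    if not isinstance(waypoints, list) or len(waypoints) < 2:
        return False
    return all(
        isinstance(w, dict)
        and all(isinstance(w.get(k), (int, float)) for k in ("t_s", "x_m", "y_m"))
        for w in waypoints
    )


def admissibility_errors(track: dict) -> list:
    """Domain logic: is the track consistent with its own kinematics?"""
    errors = []
    points = track["waypoints"]
    for i, (a, b) in enumerate(zip(points, points[1:])):
        dt = b["t_s"] - a["t_s"]
        if dt <= 0:
            errors.append(f"segment {i}: time does not advance")
            continue
        speed = math.hypot(b["x_m"] - a["x_m"], b["y_m"] - a["y_m"]) / dt
        if speed > track["max_speed_mps"]:
            errors.append(f"segment {i}: implied speed exceeds declared maximum")
    return errors


def admit(track: dict) -> bool:
    """A candidate is admitted only if it passes both form and sense."""
    return schema_valid(track) and not admissibility_errors(track)


# Well-formed but impossible: 500 m in 1 s with a 10 m/s limit.
bad = {"max_speed_mps": 10.0,
       "waypoints": [{"t_s": 0, "x_m": 0, "y_m": 0},
                     {"t_s": 1, "x_m": 500, "y_m": 0}]}
print(schema_valid(bad), admit(bad))  # True False
```

The `bad` track is the case the text describes: it passes the structural check cleanly and is rejected only because the gate also asks whether the situation can exist.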
Plausible-but-Wrong Is the Specific Hazard of Generation at Scale
The characteristic output of a capable generator is fluent and plausible. That is precisely what makes invalid scenarios dangerous: they read as reasonable, survive a skim, and accumulate in the corpus as confident nonsense. Determinism offers no protection here — it only makes the same nonsense reproducible, which is worse than useless, because reproducibility lends the error the appearance of a stable finding.
Volume sharpens the problem. The reason to use generation at all is to produce more scenarios than a team could author by hand, which means no one is reading each one closely; the analysis now leans on the gate rather than on human attention. Grounding generation in a curated domain corpus reduces invention and is worth doing, but it lowers the error rate rather than eliminating it. The gate still has to treat the generator as a confident draftsman and not an authority, and admit nothing on plausibility alone. The discipline scales inversely with trust: the more you generate, the less you can afford to believe any single output, and the more the admission gate carries the weight that line-by-line review once did.
Reproducing the Draft Is a Different Problem From Reproducing the Run
There is a legitimate, separate question hiding inside all of this: can you reproduce the generation itself, for audit? You can record the model version, the retrieved context, and the sampling configuration — but the same hardware-level batching effects that make the generator non-deterministic mean a recorded call may not reproduce later, on different infrastructure or under different load. Treating "rerun the prompt to get the scenario back" as an audit mechanism is therefore unsafe.
The honest answer is to capture rather than to regenerate. Provenance records that this exact artifact was produced from this context by this model version — the output is preserved, not promised. This keeps the audit trail truthful about what is actually reproducible: the run, always, because the artifact is frozen and execution is deterministic; the draft, only insofar as its exact output was stored. Conflating the two — assuming a recorded prompt can stand in for a recorded scenario — is how reproducibility quietly turns into a story a system tells about itself rather than a property it has.
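Capture-rather-than-regenerate can be sketched as a provenance record that stores the exact output next to the context that produced it. The record layout and the names `Provenance`, `capture`, and `verify` are hypothetical, as are the example model version and prompt; the only claim is the shape of the idea.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Provenance:
    model_version: str   # which model produced the draft
    prompt: str          # kept as an account of origin, never re-executed
    temperature: float   # sampling configuration at generation time
    output: str          # the exact emitted text, preserved verbatim
    output_sha256: str   # digest used to detect later alteration


def capture(model_version: str, prompt: str, temperature: float,
            output: str) -> Provenance:
    """Record what was actually produced at the moment it was produced."""
    digest = hashlib.sha256(output.encode("utf-8")).hexdigest()
    return Provenance(model_version, prompt, temperature, output, digest)


def verify(record: Provenance) -> bool:
    """Audit check: the stored output still matches its recorded digest.
    Note that this never calls a model."""
    actual = hashlib.sha256(record.output.encode("utf-8")).hexdigest()
    return actual == record.output_sha256


record = capture("example-model-v1", "Draft a patrol scenario.", 0.9,
                 '{"name":"patrol-01"}')
print(verify(record))  # True
```

The audit path in this sketch has no generation step at all, which keeps the trail truthful: it can show what was emitted and that it has not changed, without promising that the same call would emit it again.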
Where the Probabilistic Becomes the Permanent
The contribution of AI to scenario work is breadth and speed of authoring; the contribution of the deterministic core is that anything admitted behaves identically, indefinitely. Those are complementary, but only across a clean seam. The seam is the admission gate — the point where a probabilistic draft stops being a sample from a model's distribution and becomes a fixed, validated, versioned object the analysis can stand on. Get that seam right and a freely exploratory author safely feeds a defensible study. Get it wrong and the result is one of two failures: a brittle effort to tame a generator that was never meant to be tamed, or a growing library of reproducible mistakes.
The worry in the title is real, but it points the wrong way. Generation does not endanger determinism; it relocates the question. The discipline is not making the machine that writes scenarios behave predictably — it is deciding, deliberately and with domain logic, what a generated scenario must satisfy before it is allowed to become permanent.