Configuration
SIM-PANEL runs are configured with YAML files. A generation config specifies where panelist and product records come from, which exposure policy to use, how many periods to generate, how outcomes should be produced, and where outputs should be written.
The configuration layer is the bridge between static YAML and runtime objects: it loads records, validates required sections, wires policies and outcome models, and constructs the generator used by the CLI.
Required structure
A generation config has three required top-level sections:
- panelists
- products
- policy
Optional top-level sections include:
- generator
- selection
- execution
- outcomes_model
- questionnaire
- backend
- output_dir
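The required-section rule can be sketched as a check on an already-parsed config. This is a minimal illustration, not SIM-PANEL's loader: the function name `missing_sections` is hypothetical, and the config is assumed to have been parsed into a plain dictionary.

```python
REQUIRED_SECTIONS = ("panelists", "products", "policy")


def missing_sections(config: dict) -> list[str]:
    """Return the required top-level sections absent from a parsed config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


# A parsed config that forgot the products section.
config = {
    "panelists": {"source": "examples/data/panelists.jsonl"},
    "policy": {"name": "random"},
}
print(missing_sections(config))  # ['products']
```

A loader built this way can report every missing section at once instead of stopping at the first one.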
A small random-assignment config may look like this:
panelists:
  source: examples/data/panelists.jsonl
  variant: default
products:
  source: examples/data/products.jsonl
  variant: default
policy:
  name: random
generator:
  schema_version: "0.1.0"
  seed: 42
  n_periods: 3
  validate_on_finish: true
  max_errors: 50
  event_namespace: minimal-random
outcomes_model:
  name: deterministic
questionnaire:
  outcomes:
    fields:
      rating:
        type: int
        choices: [1, 2, 3, 4, 5]
        question: "Overall, how much do you like this product?"
  traces:
    fields:
      review_text:
        type: text
        question: "Write a short review in 2–4 sentences."
output_dir: outputs/minimal_random
This configuration loads panelists and products from JSONL files, assigns
products randomly, generates three periods, fills a deterministic questionnaire,
and records the intended output directory as outputs/minimal_random.
User-facing YAML and runtime config
The public YAML layout keeps major concerns as top-level sections. During loading, SIM-PANEL normalizes these sections into runtime dataclasses.
Typical mapping:
| YAML section | Runtime role |
|---|---|
| panelists | Persona records, selected persona-text variant, optional enrichment and panelist settings. |
| products | Product records, selected display-text variant, optional enrichment. |
| policy | Exposure policy configuration. |
| generator | Run-level generation settings. |
| selection | Prompting and parsing behavior for self-selection. |
| execution | Generator-side operational rules applied after selection. |
| outcomes_model | Outcome model selection. |
| questionnaire | Structured outcome and trace fields. |
| backend | Optional local or server-style chat backend. |
| output_dir | Intended output directory stored in the normalized run config. |
This separation matters: policies decide exposure, panelists may perform selection or evaluation, outcomes parse and validate questionnaire responses, and generators own event construction and validation.
panelists
The panelists section identifies the persona records used in a run.
panelists:
  source: examples/data/panelists.jsonl
  variant: default
Common fields:
| Field | Required | Description |
|---|---|---|
| source | Yes | Path to a panelist/persona JSONL file. |
| variant | No | Persona-text variant to use. Defaults to |
| enrich | No | Optional persisted persona-text enrichment settings. |
| eval_settings | No | Optional settings passed to panelists during evaluation. |
| select_settings | No | Optional settings passed to panelists during self-selection. |
Panelist records may contain structured attributes, persona text, or both. The
selected variant determines which rendered persona representation is used at
runtime.
Runtime panelists are built from the loaded records. Their structured attributes
are attached as identity features and may be emitted as panelist_features in
evaluation events.
products
The products section identifies the product or intervention records used in a
run.
products:
  source: examples/data/products.jsonl
  variant: default
Common fields:
| Field | Required | Description |
|---|---|---|
| source | Yes | Path to a product JSONL file. |
| variant | No | Product display-text variant to use. Defaults to |
| enrich | No | Optional persisted display-text enrichment settings. |
The selected product variant controls what the panelist sees during evaluation or self-selection.
Runtime products are built from the loaded records. Product attributes may be
emitted as product_features in events.
policy
The policy section controls exposure: how panelists and products are paired.
policy:
  name: random
Supported policy names:
| Policy | Pairing mechanism |
|---|---|
| random | Products are assigned to panelists exogenously. |
| manual | Product-panelist assignments are loaded from a user-provided schedule. |
| self_selection | Panelists choose products from a candidate set. |
Policies control exposure. Outcome generation is handled separately by the outcome model.
Random assignment
Random assignment is the simplest policy:
policy:
  name: random
The generator uses the configured random seed to control exposure sampling and assignment shuffles. This mode is useful for controlled experiments, RCT-style baselines, and deterministic sanity checks.
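The role of the seed can be illustrated with a small sketch. It assumes one product per panelist per period and uses a hypothetical helper, `assign_randomly`; SIM-PANEL's actual sampling logic may differ.

```python
import random


def assign_randomly(panelist_ids, product_ids, n_periods, seed):
    """Pair every panelist with one randomly drawn product per period."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    schedule = []
    for t in range(n_periods):
        for panelist_id in panelist_ids:
            schedule.append((t, panelist_id, rng.choice(product_ids)))
    return schedule


panelists = ["panelist_001", "panelist_002"]
products = ["product_001", "product_002", "product_003"]
first = assign_randomly(panelists, products, n_periods=3, seed=42)
second = assign_randomly(panelists, products, n_periods=3, seed=42)
print(first == second)  # True
```

Using a dedicated `random.Random` instance keeps the exposure schedule repeatable even when other code draws from the global generator.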
Manual assignment
Manual assignment uses a file-backed schedule. This is useful for scripted experiments, designed interventions, or ablations.
policy:
  name: manual
  manual:
    format: csv_long
    path: examples/policies/manual_schedule.csv
For manual policy runs, policy.manual.format and policy.manual.path are
required.
Supported schedule formats are:
| Format | Description |
|---|---|
| csv_long | Long CSV schedule. |
|  | JSON schedule. |
A long CSV schedule typically identifies the period, panelist, and product to expose:
t,panelist_id,product_id
0,panelist_001,product_001
0,panelist_002,product_003
1,panelist_001,product_002
The configuration loader validates the schedule against available panelist and product IDs. Invalid IDs should fail at configuration-load time rather than during generation.
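A load-time check of this kind might look like the sketch below. The function `validate_schedule` and its error format are hypothetical; only the column names come from the schedule shown above.

```python
import csv
import io


def validate_schedule(csv_text, panelist_ids, product_ids):
    """Return error strings for schedule rows that reference unknown IDs."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["panelist_id"] not in panelist_ids:
            errors.append(f"line {line_no}: unknown panelist_id {row['panelist_id']!r}")
        if row["product_id"] not in product_ids:
            errors.append(f"line {line_no}: unknown product_id {row['product_id']!r}")
    return errors


schedule = (
    "t,panelist_id,product_id\n"
    "0,panelist_001,product_001\n"
    "0,panelist_002,product_999\n"
)
errors = validate_schedule(
    schedule,
    panelist_ids={"panelist_001", "panelist_002"},
    product_ids={"product_001"},
)
print(errors)  # ["line 3: unknown product_id 'product_999'"]
```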
Self-selection
Self-selection allows panelists to choose which products to interact with. This
mode introduces endogenous exposure and may emit both selection and
evaluation events.
A minimal self-selection policy uses policy.name: self_selection:
policy:
  name: self_selection
Selection prompting and parsing behavior is configured separately under
selection, while generator-side execution rules are configured under
execution.rules.
selection
The optional selection section controls selection prompt rendering and response
parsing. It governs what the panelist is asked to return, not what the generator
ultimately executes.
selection:
  allow_empty: true
  include_features: true
  require_json_only: true
  max_selected_soft: null
  include_raw_text: true
Fields:
| Field | Default | Description |
|---|---|---|
| allow_empty |  | Whether an empty selection is permissible after parsing. |
| include_features |  | Whether product features are included in the selection prompt. |
| require_json_only |  | Whether the prompt requires strict JSON-only output. |
| max_selected_soft |  | Optional soft hint in the prompt; not a hard execution constraint. |
| include_raw_text |  | Whether to keep raw model text in the parsed selection result for debugging. |
|  |  | Optional few-shot example used when |
Selection expects JSON only:
{"selected_product_ids": ["product_001", "product_003"], "traces": {"notes": "..."}}
The parsed result records the product IDs requested by the panelist. The generator may later apply execution rules to filter invalid IDs, enforce caps, or handle empty selections.
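Parsing a response of this shape could be sketched as follows. The function `parse_selection`, its return structure, and the choice to raise on an empty selection are assumptions for illustration, not the library's parser.

```python
import json


def parse_selection(raw_text, allow_empty=True, include_raw_text=True):
    """Parse a JSON-only selection response into a plain dict."""
    payload = json.loads(raw_text)  # non-JSON text raises json.JSONDecodeError
    if not isinstance(payload, dict):
        raise ValueError("selection response must be a JSON object")
    selected = payload.get("selected_product_ids")
    if not isinstance(selected, list) or not all(isinstance(x, str) for x in selected):
        raise ValueError("selected_product_ids must be a list of strings")
    if not selected and not allow_empty:
        raise ValueError("empty selection is not allowed")
    result = {"selected_product_ids": selected, "traces": payload.get("traces", {})}
    if include_raw_text:
        result["raw_text"] = raw_text  # kept only for debugging
    return result


raw = '{"selected_product_ids": ["product_001", "product_003"], "traces": {"notes": "..."}}'
parsed = parse_selection(raw)
print(parsed["selected_product_ids"])  # ['product_001', 'product_003']
```

Note that the sketch does not check the IDs against the shown choice set; in the design described here, that filtering belongs to the execution rules.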
execution
The optional execution section controls generator-side operational rules for
self-selection runs. These are not panelist-facing constraints; they are applied
after selection parsing.
Execution rules are nested under execution.rules:
execution:
  rules:
    enforce_subset_of_choice_set: true
    max_evals_per_panelist_per_t: null
    allow_empty: true
    keep_strategy: keep_first
Fields:
| Field | Default | Description |
|---|---|---|
| enforce_subset_of_choice_set |  | Drop selected product IDs that were not in the shown choice set. |
| max_evals_per_panelist_per_t |  | Optional cap on executed evaluations per panelist per period. |
| allow_empty |  | Whether the generator may execute no evaluations after filtering. |
| keep_strategy |  | Strategy used when applying a cap. In v0, |
Selection is what the panelist requests. Execution is what the system actually evaluates.
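The gap between requested and executed products can be sketched as a small filter. The function name `apply_execution_rules` is hypothetical, the cap is applied in the keep_first sense (earliest requested IDs win), and raising an error when an empty result is disallowed is an assumption.

```python
def apply_execution_rules(
    selected_ids,
    choice_set,
    enforce_subset_of_choice_set=True,
    max_evals_per_panelist_per_t=None,
    allow_empty=True,
):
    """Turn a panelist's requested IDs into the IDs that are actually evaluated."""
    executed = list(selected_ids)
    if enforce_subset_of_choice_set:
        # Drop IDs that were never shown to the panelist.
        executed = [pid for pid in executed if pid in choice_set]
    if max_evals_per_panelist_per_t is not None:
        # keep_first: preserve request order and truncate at the cap.
        executed = executed[:max_evals_per_panelist_per_t]
    if not executed and not allow_empty:
        raise ValueError("no executable evaluations after applying rules")
    return executed


requested = ["product_003", "product_009", "product_001", "product_002"]
shown = {"product_001", "product_002", "product_003"}
executed = apply_execution_rules(requested, shown, max_evals_per_panelist_per_t=2)
print(executed)  # ['product_003', 'product_001']
```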
generator
The optional generator section controls run-level generation behavior.
generator:
  schema_version: "0.1.0"
  seed: 42
  n_periods: 5
  validate_on_finish: true
  max_errors: 50
  include_panelist_features_in_events: true
  include_product_features_in_events: true
  include_product_features_in_selection_prompt: true
  event_namespace: sim_panel.v0
  max_workers: 1
  prompting_strategy: persona
  row_meta:
    experiment: beer-demo
Fields:
| Field | Default | Description |
|---|---|---|
| schema_version |  | Event schema version to emit. |
| seed |  | Random seed for deterministic non-LLM generation. |
| n_periods |  | Number of time periods to generate. |
| validate_on_finish |  | Whether to validate rows after generation. |
| max_errors |  | Maximum validation errors to report before stopping. |
| include_panelist_features_in_events |  | Whether evaluation events include panelist features. |
| include_product_features_in_events |  | Whether events include product features. |
| include_product_features_in_selection_prompt |  | Whether product features may be forwarded into selection prompts. |
| event_namespace |  | Namespace used for stable event-id generation. |
| max_workers |  | Number of concurrent decision workers. |
| prompting_strategy |  | Prompting strategy. Supported values include |
| row_meta |  | Small metadata dictionary merged into each emitted row. |
For reproducibility, keep seed, schema_version, input files, policy settings,
questionnaire settings, and backend settings fixed across runs.
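One way a namespace can yield stable event IDs is a name-based UUID. The sketch below is purely illustrative: the helper `stable_event_id`, the key layout, and the use of UUIDv5 are assumptions, and SIM-PANEL's actual ID scheme is not described here.

```python
import uuid


def stable_event_id(event_namespace, t, panelist_id, product_id):
    """Derive a repeatable event ID from the namespace and the event's key fields."""
    name = f"{event_namespace}/{t}/{panelist_id}/{product_id}"
    # uuid5 is a name-based UUID: the same inputs always give the same ID.
    return str(uuid.uuid5(uuid.NAMESPACE_URL, name))


a = stable_event_id("sim_panel.v0", 0, "panelist_001", "product_001")
b = stable_event_id("sim_panel.v0", 0, "panelist_001", "product_001")
c = stable_event_id("beer-demo", 0, "panelist_001", "product_001")
print(a == b, a == c)  # True False
```

Under a scheme like this, changing event_namespace changes every ID, which keeps events from different experiments from colliding.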
outcomes_model
The optional outcomes_model section controls how product evaluations are
converted into structured outcomes and traces.
SIM-PANEL supports deterministic and LLM-backed outcome models.
A deterministic model is suitable for tests, CI, and CPU-only pipeline debugging:
outcomes_model:
  name: deterministic
The deterministic outcome model fills the questionnaire using stable hashes of
panelist_id, product_id, and t.
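The idea of hash-based deterministic answers can be sketched with a cryptographic digest. The helper `stable_choice` is hypothetical, and mixing the field name into the key is an added assumption so that different fields do not always receive the same index.

```python
import hashlib


def stable_choice(panelist_id, product_id, t, field_name, choices):
    """Pick one of `choices` deterministically from the event's key fields."""
    key = f"{panelist_id}|{product_id}|{t}|{field_name}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and map it onto the choices.
    index = int.from_bytes(digest[:8], "big") % len(choices)
    return choices[index]


rating = stable_choice("panelist_001", "product_001", 0, "rating", [1, 2, 3, 4, 5])
again = stable_choice("panelist_001", "product_001", 0, "rating", [1, 2, 3, 4, 5])
print(rating == again)  # True
```

A digest such as SHA-256 is used rather than Python's built-in `hash()`, because string hashing in Python is salted per process and would not be stable across runs.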
LLM-backed evaluation uses the configured panelist backend:
outcomes_model:
  name: llm
backend:
  name: ollama
  model: qwen2.5:7b
If outcomes_model.name is llm, a top-level backend section is required.
LLM-backed outcomes may not be exactly reproducible unless the backend, model version, prompts, decoding parameters, and runtime behavior are controlled.
questionnaire
The questionnaire section defines the structured fields collected during
evaluation.
Outcome fields are stored under event["outcomes"]. Trace fields are stored
under event["traces"].
Each field is defined by a name, type, question, and optional validation rules.
questionnaire:
  outcomes:
    fields:
      rating:
        type: int
        choices: [1, 2, 3, 4, 5]
        question: "Overall, how much do you like this product?"
      purchase_intent:
        type: categorical
        choices: ["no", "maybe", "yes"]
        question: "How likely are you to purchase in 30 days?"
  traces:
    fields:
      review_text:
        type: text
        question: "Write a short review in 2–4 sentences."
      rationale:
        type: text
        question: "Explain the main reasons for your responses."
Supported field types:
| Type | Meaning |
|---|---|
| int | Integer-valued response. |
|  | Floating-point response. |
| categorical | Response from a fixed set of choices. |
|  | Boolean response. |
| text | Free-text response. |
|  | JSON-valued response. |
Each field may include:
| Field | Description |
|---|---|
| type | Field type. Required. |
| question | User-facing prompt. Required. |
|  | Optional formatting guidance. |
| choices | Optional allowed values. Recommended for categorical and discrete integer fields. |
For categorical or discrete integer fields, choices is recommended because it
allows validation to catch malformed model outputs.
LLM evaluation must return JSON only, with the expected shape:
{
  "outcomes": {
    "rating": 5,
    "purchase_intent": "yes"
  },
  "traces": {
    "review_text": "Short review text.",
    "rationale": "Brief rationale."
  }
}
Field names must match the YAML keys exactly.
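Checking a response against the questionnaire might be sketched as below. The function `validate_evaluation` and its error strings are hypothetical; the sketch only checks field names and `choices`, not field types.

```python
import json


def validate_evaluation(raw_text, questionnaire):
    """Return a list of problems found in an LLM evaluation response."""
    payload = json.loads(raw_text)  # non-JSON text raises json.JSONDecodeError
    if not isinstance(payload, dict):
        return ["response must be a JSON object"]
    errors = []
    for section in ("outcomes", "traces"):
        fields = questionnaire.get(section, {}).get("fields", {})
        values = payload.get(section, {})
        if not isinstance(values, dict):
            errors.append(f"{section}: expected an object")
            continue
        for name in sorted(fields.keys() - values.keys()):
            errors.append(f"{section}.{name}: missing")
        for name in sorted(values.keys() - fields.keys()):
            errors.append(f"{section}.{name}: not in questionnaire")
        for name, spec in fields.items():
            choices = spec.get("choices")
            if name in values and choices is not None and values[name] not in choices:
                errors.append(f"{section}.{name}: {values[name]!r} not in {choices}")
    return errors


questionnaire = {
    "outcomes": {"fields": {"rating": {"type": "int", "choices": [1, 2, 3, 4, 5]}}},
    "traces": {"fields": {"rationale": {"type": "text"}}},
}
response = '{"outcomes": {"rating": 7}, "traces": {"rationale": "Too bitter."}}'
print(validate_evaluation(response, questionnaire))
# ['outcomes.rating: 7 not in [1, 2, 3, 4, 5]']
```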
backend
The optional backend section configures a local or server-style chat backend.
It is required when a run uses LLM-backed enrichment or LLM-backed outcomes.
Example local backend:
backend:
  name: ollama
  model: qwen2.5:7b
Example server-style backend:
backend:
  name: server
  model: local-model-name
  base_url: http://localhost:8000/v1
The backend interface is provider-agnostic: other modules depend on the SIM-PANEL backend contract rather than vendor-specific SDKs.
Persisted enrichment
SIM-PANEL can persistently enrich panelist or product records before generation. This is useful when structured records need rendered text variants for prompts.
Panelist enrichment:
panelists:
  source: examples/data/panelists.jsonl
  variant: default
  enrich:
    enabled: true
    save: in_place
Product enrichment:
products:
  source: examples/data/products.jsonl
  variant: default
  enrich:
    enabled: true
    save:
      path: outputs/enriched_products.jsonl
Enrichment requires a configured backend because it calls the backend chat
interface.
If save: in_place, the source file is overwritten with enriched records. If
save: {path: ...} is used, enriched records are written to the specified path
and the normalized run config is updated to use that path.
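The two save modes can be summarized in a short sketch that resolves which file the run reads after enrichment. The helper `resolve_records_path` is hypothetical, and rejecting a missing or unrecognized `save` value is an assumption rather than documented behavior.

```python
def resolve_records_path(section):
    """Return the path a run would read records from after optional enrichment."""
    enrich = section.get("enrich", {})
    if not enrich.get("enabled"):
        return section["source"]  # no enrichment: read the source file as-is
    save = enrich.get("save")
    if save == "in_place":
        return section["source"]  # source file is overwritten with enriched records
    if isinstance(save, dict) and "path" in save:
        return save["path"]  # enriched records live at the new path
    raise ValueError("enrich.save must be 'in_place' or a mapping with a 'path' key")


products = {
    "source": "examples/data/products.jsonl",
    "variant": "default",
    "enrich": {"enabled": True, "save": {"path": "outputs/enriched_products.jsonl"}},
}
print(resolve_records_path(products))  # outputs/enriched_products.jsonl
```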
output_dir
The optional output_dir field records the intended output directory:
output_dir: outputs/run_001
The configuration loader stores this value in the normalized run config. The CLI
may still override or construct run directories and pass them to the IO writers,
for example via --out.
A typical generation run writes:
events.jsonl
metadata.json
data_dictionary.json
Optional outputs may include:
events.csv
events.jsonl is the primary dataset output. events.csv is optional and
intended for convenience; nested structures are JSON-serialized into string
cells.
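The flattening of nested structures into string cells can be sketched with the standard library. The function `rows_to_csv` and the sample row are hypothetical; the actual writer may choose columns and ordering differently.

```python
import csv
import io
import json


def rows_to_csv(rows):
    """Serialize event rows to CSV text, JSON-encoding nested dicts and lists."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)  # preserve first-seen column order
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        flat = {
            key: json.dumps(value, sort_keys=True) if isinstance(value, (dict, list)) else value
            for key, value in row.items()
        }
        writer.writerow(flat)
    return buffer.getvalue()


rows = [{
    "t": 0,
    "panelist_id": "panelist_001",
    "product_id": "product_001",
    "outcomes": {"rating": 4},
    "traces": {"review_text": "Crisp and light."},
}]
csv_text = rows_to_csv(rows)
print(csv_text)
```

Readers of such a file need to call `json.loads` on the nested columns to recover the original structures, which is why events.jsonl remains the primary output.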
metadata.json records run bookkeeping such as generation time, schema version,
seed, row counts, panelist/product/period counts, policy name, optional config
snapshot, and config hash.
data_dictionary.json records schema version and JSONable snapshots of key
configs/specs, including generator config, policy config, selection config,
execution rules, and outcome config.
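A config hash like the one recorded in metadata.json is typically computed over a canonical serialization, so that key order does not affect the result. The sketch below shows one such approach; the function `config_hash` and the choice of SHA-256 over sorted-key JSON are assumptions, not the documented algorithm.

```python
import hashlib
import json


def config_hash(config):
    """Hash a parsed config so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = {"generator": {"seed": 42, "n_periods": 3}, "policy": {"name": "random"}}
b = {"policy": {"name": "random"}, "generator": {"n_periods": 3, "seed": 42}}
c = {"generator": {"seed": 43, "n_periods": 3}, "policy": {"name": "random"}}
print(config_hash(a) == config_hash(b))  # True
print(config_hash(a) == config_hash(c))  # False
```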
Complete deterministic example
The following example combines random assignment, deterministic outcomes, schema validation, and an explicit output directory.
panelists:
  source: examples/data/panelists.jsonl
  variant: default
products:
  source: examples/data/products.jsonl
  variant: default
policy:
  name: random
generator:
  schema_version: "0.1.0"
  seed: 42
  n_periods: 3
  validate_on_finish: true
  max_errors: 50
  event_namespace: quickstart
  max_workers: 1
  prompting_strategy: persona
outcomes_model:
  name: deterministic
questionnaire:
  outcomes:
    fields:
      rating:
        type: int
        choices: [1, 2, 3, 4, 5]
        question: "Overall, how much do you like this product?"
      purchase_intent:
        type: categorical
        choices: ["no", "maybe", "yes"]
        question: "How likely are you to purchase in 30 days?"
  traces:
    fields:
      review_text:
        type: text
        question: "Write a short review in 2–4 sentences."
output_dir: outputs/quickstart
Run it with:
sim-panel generate --config examples/configs/minimal.yaml
or override the output directory from the command line:
sim-panel generate \
  --config examples/configs/minimal.yaml \
  --out outputs/quickstart_override
Complete self-selection example
The following example enables self-selection and applies explicit execution rules after panelist choice.
panelists:
  source: examples/data/panelists.jsonl
  variant: default
  select_settings:
    temperature: 0
  eval_settings:
    temperature: 0
products:
  source: examples/data/products.jsonl
  variant: default
policy:
  name: self_selection
selection:
  allow_empty: true
  include_features: true
  require_json_only: true
  max_selected_soft: 2
  include_raw_text: true
execution:
  rules:
    enforce_subset_of_choice_set: true
    max_evals_per_panelist_per_t: 2
    allow_empty: true
    keep_strategy: keep_first
generator:
  schema_version: "0.1.0"
  seed: 42
  n_periods: 2
  validate_on_finish: true
  event_namespace: self-selection-example
  prompting_strategy: persona
outcomes_model:
  name: deterministic
questionnaire:
  outcomes:
    fields:
      rating:
        type: int
        choices: [1, 2, 3, 4, 5]
        question: "Overall, how much do you like this product?"
  traces:
    fields:
      rationale:
        type: text
        question: "Explain the main reason for your rating."
output_dir: outputs/self_selection_example
In this example, selection.max_selected_soft is only a prompt-level hint.
The hard cap is execution.rules.max_evals_per_panelist_per_t.
Complete LLM-backed example
The following example uses an LLM backend for questionnaire evaluation.
panelists:
  source: examples/data/panelists.jsonl
  variant: default
  eval_settings:
    temperature: 0
products:
  source: examples/data/products.jsonl
  variant: default
policy:
  name: random
generator:
  schema_version: "0.1.0"
  seed: 42
  n_periods: 2
  validate_on_finish: true
  event_namespace: llm-example
  prompting_strategy: persona
outcomes_model:
  name: llm
questionnaire:
  outcomes:
    fields:
      rating:
        type: int
        choices: [1, 2, 3, 4, 5]
        question: "Overall, how much do you like this product?"
  traces:
    fields:
      rationale:
        type: text
        question: "Explain the main reason for your rating."
backend:
  name: ollama
  model: qwen2.5:7b
output_dir: outputs/llm_example
This configuration requires a working backend. LLM results may vary if backend settings, model versions, prompts, or runtime behavior change.
Fail-fast behavior
Configuration loading should fail early when required sections are missing, required keys are absent, manual schedules reference unavailable IDs, enrichment is requested without a backend, or LLM outcomes are requested without a backend.
This is intentional. SIM-PANEL treats YAML files as executable research specifications, so invalid assumptions should be caught before generation begins.
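Several of these conditions depend only on the parsed config and can be checked before any records are loaded. The sketch below illustrates that idea; the function `fail_fast_problems` and its messages are hypothetical and cover only the backend and manual-policy rules stated in this document.

```python
def fail_fast_problems(config):
    """Collect configuration problems that should stop a run before generation."""
    problems = []
    has_backend = "backend" in config
    if config.get("outcomes_model", {}).get("name") == "llm" and not has_backend:
        problems.append("outcomes_model.name is llm but no backend is configured")
    for section in ("panelists", "products"):
        enrich = config.get(section, {}).get("enrich", {})
        if enrich.get("enabled") and not has_backend:
            problems.append(f"{section}.enrich is enabled but no backend is configured")
    policy = config.get("policy", {})
    if policy.get("name") == "manual":
        manual = policy.get("manual", {})
        for key in ("format", "path"):
            if key not in manual:
                problems.append(f"policy.manual.{key} is required for manual policy runs")
    return problems


config = {
    "panelists": {"source": "examples/data/panelists.jsonl"},
    "products": {"source": "examples/data/products.jsonl", "enrich": {"enabled": True}},
    "policy": {"name": "manual", "manual": {"format": "csv_long"}},
    "outcomes_model": {"name": "llm"},
}
for problem in fail_fast_problems(config):
    print(problem)
```

Collecting all problems before raising lets a user fix a config in one pass rather than rerunning after each error.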