Concepts¶
This page defines the main terms used throughout SIM-PANEL.
SIM-PANEL is organized around event-level panel simulation: panelists encounter products or interventions over time, exposure is governed by a policy, and each interaction is recorded as a schema-valid event.
Panelists¶
A panelist is a simulated respondent, customer, user, or agent.
Panelists have stable identifiers and may carry structured attributes, persona text, or both. During generation, panelists can evaluate products and, under self-selection designs, choose which products they want to interact with.
Panelist features may be copied into event rows as panelist_features.
Products¶
A product is the item, intervention, treatment, or candidate object being shown to panelists.
Products have stable identifiers, display text, and optional structured attributes. Product records can be hand-written, generated, enriched, or imported from external sources.
Product features may be copied into event rows as product_features.
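As a rough illustration of the two record types described so far, the sketch below models panelists and products as small dataclasses and shows features being copied into the `panelist_features` and `product_features` fields. The class names, the field names other than those two, and the helper `feature_columns` are assumptions made for this example, not SIM-PANEL's actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Panelist:
    # Hypothetical record shape: a stable ID plus optional attributes/persona.
    panelist_id: str
    attributes: Dict[str, Any] = field(default_factory=dict)
    persona_text: Optional[str] = None


@dataclass(frozen=True)
class Product:
    # Hypothetical record shape: a stable ID, display text, optional attributes.
    product_id: str
    display_text: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def feature_columns(panelist: Panelist, product: Product) -> Dict[str, Dict[str, Any]]:
    """Copy structured attributes into the two event-row feature fields."""
    return {
        "panelist_features": dict(panelist.attributes),
        "product_features": dict(product.attributes),
    }


panelist = Panelist("p001", {"age_band": "25-34"}, persona_text="Enjoys citrus drinks.")
product = Product("sku-42", "Sparkling lemon water", {"category": "beverage"})
columns = feature_columns(panelist, product)
```

Copying the attributes (rather than referencing them) keeps each event row self-contained, which is one plausible reason to store features on the row at all.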
Events¶
An event is one row in events.jsonl.
SIM-PANEL currently supports two event types:
| Event type | Meaning |
|---|---|
| evaluation | A panelist evaluates one product at one period. |
| selection | A panelist is shown a choice set and requests products to evaluate. |
Most generated rows are evaluation events. Selection events appear under the self_selection policy.
Each event includes a period index t, so even simple runs preserve panel-style sequence structure.
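To make the row-per-event idea concrete, here is a minimal sketch that builds evaluation rows and serializes them as JSON Lines. Only the period index `t` comes from the text above; the other keys (`event_type`, `panelist_id`, `product_id`, `outcomes`) and the helper name are assumptions, and the real schema may differ.

```python
import json
from typing import Any, Dict


def make_evaluation_event(
    panelist_id: str, product_id: str, t: int, outcomes: Dict[str, Any]
) -> Dict[str, Any]:
    """Build one event row as a plain dict (key names are illustrative)."""
    return {
        "event_type": "evaluation",  # assumed label for an evaluation row
        "panelist_id": panelist_id,
        "product_id": product_id,
        "t": t,  # period index, as described in the text
        "outcomes": outcomes,
    }


# Two periods for the same panelist-product pair preserve sequence structure.
rows = [make_evaluation_event("p001", "sku-42", t, {"rating": 4}) for t in range(2)]

# One JSON object per line, as in a .jsonl file.
jsonl = "\n".join(json.dumps(row, sort_keys=True) for row in rows)
```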
Policies¶
A policy controls exposure: how panelists and products are paired.
SIM-PANEL currently supports three policy names:
| Policy | Exposure logic |
|---|---|
|  | Products are assigned to panelists exogenously. |
|  | Product-panelist assignments are loaded from a schedule or mapping. |
| self_selection | Panelists choose products from a shown choice set. |
Policies are pure exposure logic. They do not call LLMs, define outcomes, write files, or build schema rows.
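One way to picture "pure exposure logic" is a function that takes IDs and a seed and returns a pairing, with no I/O and no outcome generation. The sketch below does this for an exogenous assignment; the function name, its parameters, and the use of seeded random sampling are assumptions for illustration, not the project's actual policy interface.

```python
import random
from typing import Dict, List


def assign_exogenously(
    panelist_ids: List[str], product_ids: List[str], per_panelist: int, seed: int
) -> Dict[str, List[str]]:
    """Return a panelist -> products mapping without touching files or models."""
    rng = random.Random(seed)  # local RNG keeps the function reproducible
    k = min(per_panelist, len(product_ids))
    return {pid: rng.sample(product_ids, k) for pid in panelist_ids}


assignment = assign_exogenously(["p1", "p2"], ["a", "b", "c"], per_panelist=2, seed=7)
repeat = assign_exogenously(["p1", "p2"], ["a", "b", "c"], per_panelist=2, seed=7)
```

Because the function depends only on its arguments, calling it twice with the same seed yields the same pairing, which makes exposure easy to test separately from outcomes.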
Selection and execution¶
Self-selection separates what the panelist requests from what the generator actually evaluates.
Selection records the panelist’s requested product IDs.
Execution applies generator-side rules such as subset enforcement and maximum evaluations per panelist-period.
This distinction is important because a panelist may request invalid, duplicated, or too many products. Execution rules make the final evaluated subset explicit and auditable.
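The split between selection and execution can be sketched as a small filter over the requested IDs. This is an illustrative version under stated assumptions: the function name, the rejection reason strings, and the first-come ordering rule are hypothetical, and the generator's real rules may differ.

```python
from typing import List, Tuple


def execute_selection(
    requested_ids: List[str], shown_ids: List[str], max_evaluations: int
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Apply subset enforcement, de-duplication, and a per-period cap."""
    shown = set(shown_ids)
    seen = set()
    accepted: List[str] = []
    rejected: List[Tuple[str, str]] = []
    for product_id in requested_ids:
        if product_id not in shown:
            rejected.append((product_id, "not_in_choice_set"))
        elif product_id in seen:
            rejected.append((product_id, "duplicate"))
        elif len(accepted) >= max_evaluations:
            rejected.append((product_id, "over_limit"))
        else:
            accepted.append(product_id)
            seen.add(product_id)
    return accepted, rejected


accepted, rejected = execute_selection(
    requested_ids=["a", "a", "z", "b", "c"],
    shown_ids=["a", "b", "c"],
    max_evaluations=2,
)
# accepted == ["a", "b"]
# rejected == [("a", "duplicate"), ("z", "not_in_choice_set"), ("c", "over_limit")]
```

Returning the rejected requests alongside the accepted ones is what keeps the final evaluated subset auditable: the raw request and the reason for each drop can both be recorded.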
Outcomes¶
Outcomes are structured responses produced for evaluation events.
Examples include:
{
  "rating": 5,
  "purchase_intent": "yes"
}
Outcomes are configured through the questionnaire and generated by the configured outcome model. Deterministic outcome models are useful for testing; LLM-backed outcome models are optional.
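A deterministic outcome model for testing could be as simple as hashing the event key, so the same panelist, product, and period always produce the same response. The sketch below is one such stand-in; the function name, the 1 to 5 rating scale, and the rule that maps ratings to `purchase_intent` are assumptions for this example rather than the project's built-in model.

```python
import hashlib
from typing import Any, Dict


def deterministic_outcome(panelist_id: str, product_id: str, t: int) -> Dict[str, Any]:
    """Derive a repeatable outcome from the event key (no randomness, no LLM)."""
    key = f"{panelist_id}|{product_id}|{t}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    rating = digest[0] % 5 + 1  # map the first byte onto an assumed 1-5 scale
    return {
        "rating": rating,
        "purchase_intent": "yes" if rating >= 4 else "no",  # illustrative rule
    }


first = deterministic_outcome("p001", "sku-42", 0)
second = deterministic_outcome("p001", "sku-42", 0)
```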
Traces¶
Traces are optional auxiliary fields attached to events.
They may contain review text, rationales, source provenance, parsing errors, or execution details. For example:
{
  "review_text": "A crisp and refreshing option.",
  "rationale": "The panelist likes citrus notes."
}
Traces are not the primary reward signal, but they are useful for inspection, debugging, and qualitative diagnostics.
Sources¶
Sources ingest external observational datasets and convert them into SIM-PANEL artifacts.
For example, the Amazon Reviews’23 source converts local review and metadata files into:
events.jsonl
products.jsonl
personas.jsonl
metadata.json
data_dictionary.json
stats.json
Sources do not generate synthetic events. They convert external data into the same artifact contracts used elsewhere in the project.
Benchmarks¶
Benchmarks freeze real-data subsets of imported source artifacts so that they can serve as stable references.
The benchmark layer does not compute comparison metrics. It selects and writes a stable reference subset, typically by choosing products with enough rating-bearing events.
The intended workflow is:
sources -> imported artifacts
benchmarks -> frozen reference subset
comparison -> synthetic-vs-reference evaluation
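The "products with enough rating-bearing events" rule mentioned above might look like the following sketch. The function name, the `min_rated_events` threshold, and the row keys `product_id` and `outcomes` are assumptions; the benchmark layer's actual selection criteria are not specified here.

```python
from collections import Counter
from typing import Any, Dict, List


def select_reference_products(
    events: List[Dict[str, Any]], min_rated_events: int
) -> List[str]:
    """Keep products that have at least `min_rated_events` rows with a rating."""
    rated_counts = Counter(
        event["product_id"]
        for event in events
        if (event.get("outcomes") or {}).get("rating") is not None
    )
    # Sorting makes the frozen subset stable across runs.
    return sorted(p for p, n in rated_counts.items() if n >= min_rated_events)


events = [
    {"product_id": "a", "outcomes": {"rating": 5}},
    {"product_id": "a", "outcomes": {"rating": 4}},
    {"product_id": "b", "outcomes": {"rating": 3}},
    {"product_id": "b", "outcomes": None},  # no rating: not counted
    {"product_id": "c"},  # no outcomes at all: not counted
]
reference_products = select_reference_products(events, min_rated_events=2)
# reference_products == ["a"]
```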
Analysis¶
Analysis inspects one run.
It reads a single generated or imported run directory and produces summaries, metrics, plots, reports, and optional regression diagnostics.
Use analysis to answer questions such as:
Did the run produce the expected number of events?
Are outcomes missing or malformed?
What do rating distributions look like?
Which products or panelists dominate the run?
Do simple regressions reveal useful feature-outcome structure?
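Several of the questions above reduce to simple counts over event rows. The sketch below computes a few of them for rows already loaded into memory; `summarize_run` and the row keys it reads are hypothetical, and the real analysis layer produces richer summaries, plots, and reports.

```python
from collections import Counter
from typing import Any, Dict, List


def summarize_run(events: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Count events, missing ratings, rating frequencies, and dominant products."""
    ratings = [(event.get("outcomes") or {}).get("rating") for event in events]
    present = [r for r in ratings if r is not None]
    return {
        "n_events": len(events),
        "n_missing_rating": len(ratings) - len(present),
        "rating_counts": dict(Counter(present)),
        "top_products": Counter(e["product_id"] for e in events).most_common(2),
    }


events = [
    {"product_id": "a", "outcomes": {"rating": 5}},
    {"product_id": "a", "outcomes": {"rating": 4}},
    {"product_id": "b", "outcomes": {}},  # malformed: rating missing
]
summary = summarize_run(events)
```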
Comparison¶
Comparison evaluates multiple synthetic conditions against one another or against a real reference.
It lives under analysis/compare/ and has its own CLI command:
sim-panel compare --config path/to/compare.yaml
Comparison currently supports:
| Mode | Meaning |
|---|---|
|  | Compare synthetic conditions against one another. |
|  | Compare synthetic conditions against one real reference. |
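As one example of what a synthetic-vs-reference comparison could measure, the sketch below computes the total variation distance between two rating distributions. This metric is chosen purely for illustration; it is not claimed to be one that `sim-panel compare` reports, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def rating_distribution(ratings: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Turn a non-empty collection of ratings into relative frequencies."""
    counts = Counter(ratings)
    total = sum(counts.values())
    return {rating: n / total for rating, n in counts.items()}


def total_variation(p: Dict[Hashable, float], q: Dict[Hashable, float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


synthetic = rating_distribution([5, 5, 4, 4])
reference = rating_distribution([5, 4, 4, 4])
distance = total_variation(synthetic, reference)  # 0.25 for these inputs
```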
Artifact layers¶
A useful way to remember SIM-PANEL’s modularity is:
| Layer | Role |
|---|---|
| Sources | Ingest external data. |
| Benchmarks | Freeze reference subsets. |
|  | Simulate event rows. |
|  | Validate event contracts. |
| Analysis | Inspect one run. |
| Comparison | Compare conditions or references. |
|  | Expose reproducible workflows. |