Concepts

This page defines the main terms used throughout SIM-PANEL.

SIM-PANEL is organized around event-level panel simulation: panelists encounter products or interventions over time, exposure is governed by a policy, and each interaction is recorded as a schema-valid event.

Panelists

A panelist is a simulated respondent, customer, user, or agent.

Panelists have stable identifiers and may carry structured attributes, persona text, or both. During generation, panelists can evaluate products and, under self-selection designs, choose which products they want to interact with.

Panelist features may be copied into event rows as panelist_features.
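A minimal sketch of a panelist record and of copying its features into an event row. The class name `Panelist`, the helper `with_panelist_features`, and the row keys `panelist_id`, `type`, and `t` are hypothetical illustrations, not SIM-PANEL's actual API. Only the `panelist_features` key comes from the text above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass(frozen=True)
class Panelist:
    """Hypothetical panelist record: a stable ID plus optional attributes and persona text."""

    panelist_id: str
    attributes: dict[str, Any] = field(default_factory=dict)
    persona: Optional[str] = None


def with_panelist_features(row: dict[str, Any], panelist: Panelist) -> dict[str, Any]:
    """Return a new event row carrying the panelist's ID and a copy of its attributes."""
    out = dict(row)  # shallow copy so the caller's row is not mutated
    out["panelist_id"] = panelist.panelist_id
    out["panelist_features"] = dict(panelist.attributes)  # copy, not a shared reference
    return out


# Example: a panelist with both structured attributes and persona text.
panelist = Panelist("p001", {"age": 34, "region": "west"}, persona="Prefers citrus drinks.")
base_row = {"type": "evaluation", "t": 0}
row = with_panelist_features(base_row, panelist)
```

Copying the attributes into each row keeps `events.jsonl` self-describing, at the cost of some duplication across rows.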

Products

A product is the item, intervention, treatment, or candidate object being shown to panelists.

Products have stable identifiers, display text, and optional structured attributes. Product records can be hand-written, generated, enriched, or imported from external sources.

Product features may be copied into event rows as product_features.
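As an illustration, product records imported from a `products.jsonl` file could be loaded and attached to event rows roughly as follows. This is a sketch that assumes each line carries `product_id`, `display_text`, and an optional `attributes` object. Those field names, the `Product` class, and both helper functions are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Iterable


@dataclass(frozen=True)
class Product:
    """Hypothetical product record: stable ID, display text, optional attributes."""

    product_id: str
    display_text: str
    attributes: dict[str, Any] = field(default_factory=dict)


def load_products(lines: Iterable[str]) -> dict[str, Product]:
    """Parse JSON Lines product records (assumed field names) into a dict keyed by ID."""
    products: dict[str, Product] = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        rec = json.loads(line)
        products[rec["product_id"]] = Product(
            product_id=rec["product_id"],
            display_text=rec["display_text"],
            attributes=rec.get("attributes", {}),
        )
    return products


def with_product_features(row: dict[str, Any], product: Product) -> dict[str, Any]:
    """Return a new event row carrying the product's ID and a copy of its attributes."""
    out = dict(row)
    out["product_id"] = product.product_id
    out["product_features"] = dict(product.attributes)
    return out


lines = [
    '{"product_id": "sku-1", "display_text": "Lemon sparkling water", "attributes": {"flavor": "citrus"}}',
    '{"product_id": "sku-2", "display_text": "Cola"}',
]
products = load_products(lines)
row = with_product_features({"type": "evaluation", "t": 0}, products["sku-1"])
```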

Events

An event is one row in events.jsonl.

SIM-PANEL currently supports two event types:

Event type    Meaning
evaluation    A panelist evaluates one product at one period.
selection     A panelist is shown a choice set and requests products to evaluate.

Most generated rows are evaluation events. Selection events appear under the self_selection policy.

Each event includes a period index t, so even simple runs preserve panel-style sequence structure.
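The sketch below writes and reads JSON Lines event rows and applies a deliberately small check. It requires a known event type and a non-negative integer period index `t`. The keys `type`, `shown`, `requested`, and `outcome` and the `check_event` helper are assumptions made for illustration. The actual event contract lives in the project's schema layer and is likely stricter.

```python
import io
import json
from typing import Any, Iterable, TextIO

EVENT_TYPES = {"evaluation", "selection"}


def check_event(row: dict[str, Any]) -> None:
    """Illustrative check only; the real schema defines the full contract."""
    if row.get("type") not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {row.get('type')!r}")
    t = row.get("t")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(t, int) or isinstance(t, bool) or t < 0:
        raise ValueError(f"period index t must be a non-negative int, got {t!r}")


def write_events(rows: Iterable[dict[str, Any]], fp: TextIO) -> None:
    """Validate each row, then write it as one JSON object per line."""
    for row in rows:
        check_event(row)
        fp.write(json.dumps(row) + "\n")


rows = [
    {"type": "selection", "t": 0, "panelist_id": "p001",
     "shown": ["sku-1", "sku-2", "sku-3"], "requested": ["sku-2"]},
    {"type": "evaluation", "t": 0, "panelist_id": "p001",
     "product_id": "sku-2", "outcome": {"rating": 4}},
]
buf = io.StringIO()  # stands in for an open events.jsonl file
write_events(rows, buf)
parsed = [json.loads(line) for line in buf.getvalue().splitlines()]
```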

Policies

A policy controls exposure: how panelists and products are paired.

SIM-PANEL currently supports three policy names:

Policy            Exposure logic
random            Products are assigned to panelists exogenously.
manual            Product-panelist assignments are loaded from a schedule or mapping.
self_selection    Panelists choose products from a shown choice set.

Policies are pure exposure logic. They do not call LLMs, define outcomes, write files, or build schema rows.
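To make "pure exposure logic" concrete, here is a minimal sketch of the random and manual policies as plain functions. They take IDs or a schedule and return `(panelist_id, product_id, t)` pairings, with no LLM calls, file I/O, or row building. The function names, the triple shape, the `per_period` parameter, and the `{t: {panelist_id: [product_id, ...]}}` schedule layout are all hypothetical.

```python
import random
from typing import Mapping, Sequence

# An exposure is a (panelist_id, product_id, t) triple; this shape is an assumption.
Exposure = tuple[str, str, int]


def random_policy(
    panelist_ids: Sequence[str],
    product_ids: Sequence[str],
    periods: int,
    per_period: int,
    seed: int,
) -> list[Exposure]:
    """Assign `per_period` distinct products to every panelist in every period, at random."""
    rng = random.Random(seed)  # local RNG: reproducible and free of global side effects
    exposures: list[Exposure] = []
    for t in range(periods):
        for panelist_id in panelist_ids:
            for product_id in rng.sample(list(product_ids), k=per_period):
                exposures.append((panelist_id, product_id, t))
    return exposures


def manual_policy(schedule: Mapping[int, Mapping[str, Sequence[str]]]) -> list[Exposure]:
    """Flatten a {t: {panelist_id: [product_id, ...]}} schedule into exposures."""
    return [
        (panelist_id, product_id, t)
        for t, assignments in sorted(schedule.items())
        for panelist_id, product_ids in assignments.items()
        for product_id in product_ids
    ]
```

Because both functions only return pairings, the same generator code can turn either result into evaluation events.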

Selection and execution

Self-selection separates what the panelist requests from what the generator actually evaluates.

  • Selection records the panelist’s requested product IDs.

  • Execution applies generator-side rules such as subset enforcement and maximum evaluations per panelist-period.

This distinction is important because a panelist may request invalid, duplicated, or too many products. Execution rules make the final evaluated subset explicit and auditable.
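A minimal sketch of the execution step, assuming three illustrative rules applied in order: drop IDs outside the shown choice set, drop duplicates, then cap at a maximum number of evaluations per panelist-period. Every dropped ID is recorded with a reason so the executed subset stays auditable. The function name and the reason strings are hypothetical.

```python
from typing import Sequence


def execute_selection(
    requested: Sequence[str],
    shown: Sequence[str],
    max_evals: int,
) -> tuple[list[str], list[tuple[str, str]]]:
    """Turn requested product IDs into (executed subset, audit list of dropped IDs)."""
    shown_set = set(shown)
    executed: list[str] = []
    seen: set[str] = set()
    dropped: list[tuple[str, str]] = []
    for product_id in requested:
        if product_id not in shown_set:
            dropped.append((product_id, "not_in_choice_set"))  # invalid request
        elif product_id in seen:
            dropped.append((product_id, "duplicate"))
        elif len(executed) >= max_evals:
            dropped.append((product_id, "over_max_evals"))  # cap reached
        else:
            executed.append(product_id)
            seen.add(product_id)
    return executed, dropped


executed, dropped = execute_selection(
    requested=["b", "z", "b", "a", "c"], shown=["a", "b", "c"], max_evals=2
)
# executed == ["b", "a"]
# dropped == [("z", "not_in_choice_set"), ("b", "duplicate"), ("c", "over_max_evals")]
```

Keeping the selection row (what was requested) separate from this output (what was evaluated) is what makes the difference between the two inspectable.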

Outcomes

Outcomes are structured responses produced for evaluation events.

Examples include:

{
  "rating": 5,
  "purchase_intent": "yes"
}

Outcomes are configured through the questionnaire and generated by the configured outcome model. Deterministic outcome models are useful for testing; LLM-backed outcome models are optional.
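A deterministic outcome model is handy in tests because the same inputs always yield the same outcome. The sketch below derives an outcome of the shape shown above from a SHA-256 digest of the (panelist, product, period) key. The function name and the rating-to-intent rule are hypothetical. `hashlib` is used instead of the built-in `hash()` because Python randomizes string hashes per process, which would break reproducibility across runs.

```python
import hashlib


def deterministic_outcome(panelist_id: str, product_id: str, t: int) -> dict[str, object]:
    """Derive a reproducible outcome from the (panelist, product, period) key."""
    key = f"{panelist_id}|{product_id}|{t}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    rating = 1 + digest[0] % 5  # map the first digest byte onto a 1-5 scale
    # Illustrative rule: ratings of 4 or 5 imply purchase intent.
    return {"rating": rating, "purchase_intent": "yes" if rating >= 4 else "no"}
```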

Traces

Traces are optional auxiliary fields attached to events.

They may contain review text, rationales, source provenance, parsing errors, or execution details. For example:

{
  "review_text": "A crisp and refreshing option.",
  "rationale": "The panelist likes citrus notes."
}

Traces are not the primary reward signal, but they are useful for inspection, debugging, and qualitative diagnostics.
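One way to keep outcomes and traces apart is to split a raw model response into the two when parsing it. In this sketch, structured fields go to the outcome, free text goes to the trace, and a parse failure produces no outcome plus a trace recording the error. The function name, the assumption that responses are JSON objects, and the `parse_error` and `raw_response` keys are all hypothetical.

```python
import json
from typing import Any, Optional

OUTCOME_KEYS = ("rating", "purchase_intent")
TRACE_KEYS = ("review_text", "rationale")


def parse_response(raw: str) -> tuple[Optional[dict[str, Any]], dict[str, Any]]:
    """Split a raw model response into (outcome, trace)."""
    trace: dict[str, Any] = {}
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        trace["parse_error"] = str(exc)
        trace["raw_response"] = raw
        return None, trace
    if not isinstance(payload, dict):
        trace["parse_error"] = "expected a JSON object"
        trace["raw_response"] = raw
        return None, trace
    outcome = {k: payload[k] for k in OUTCOME_KEYS if k in payload}
    for key in TRACE_KEYS:
        if key in payload:
            trace[key] = payload[key]  # kept for inspection, not used as the outcome
    return outcome, trace
```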

Sources

Sources ingest external observational datasets and convert them into SIM-PANEL artifacts.

For example, the Amazon Reviews’23 source converts local review and metadata files into:

events.jsonl
products.jsonl
personas.jsonl
metadata.json
data_dictionary.json
stats.json

Sources do not generate synthetic events. They convert external data into the same artifact contracts used elsewhere in the project.
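As a rough illustration of such a conversion, the sketch below turns review rows into evaluation events. It assumes each row carries the Amazon Reviews'23 review fields `user_id`, `parent_asin`, `rating`, `timestamp`, and `text`, and it assigns the period index `t` by ordering each user's reviews by timestamp. The mapping, the output keys, and the `source` tag are hypothetical and may differ from what the actual source does.

```python
import json
from collections import defaultdict
from typing import Any, Iterable


def reviews_to_events(review_lines: Iterable[str]) -> list[dict[str, Any]]:
    """Hypothetical mapping from review rows to evaluation events."""
    reviews = [json.loads(line) for line in review_lines if line.strip()]
    by_user: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for review in reviews:
        by_user[review["user_id"]].append(review)
    events: list[dict[str, Any]] = []
    for user_id, rows in by_user.items():
        rows.sort(key=lambda r: r["timestamp"])  # earliest review becomes t = 0
        for t, review in enumerate(rows):
            events.append({
                "type": "evaluation",
                "t": t,
                "panelist_id": user_id,
                "product_id": review["parent_asin"],
                "outcome": {"rating": review["rating"]},
                "trace": {"review_text": review.get("text", ""), "source": "amazon_reviews_23"},
            })
    return events
```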

Benchmarks

Benchmarks freeze real-data subsets of imported source artifacts into stable, benchmark-ready references.

The benchmark layer does not compute comparison metrics. It selects and writes a stable reference subset, typically by choosing products with enough rating-bearing events.

The intended workflow is:

sources -> imported artifacts
benchmarks -> frozen reference subset
comparison -> synthetic-vs-reference evaluation
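The benchmark step of that workflow, choosing products with enough rating-bearing events, could be sketched as follows. The `min_ratings` threshold, the event keys, and both function names are assumptions for illustration. The returned ID list is sorted so that the frozen subset is stable across runs.

```python
from collections import Counter
from typing import Any


def has_rating(event: dict[str, Any]) -> bool:
    """True for evaluation events whose outcome carries a rating."""
    outcome = event.get("outcome") or {}
    return event.get("type") == "evaluation" and outcome.get("rating") is not None


def freeze_subset(
    events: list[dict[str, Any]], min_ratings: int
) -> tuple[list[str], list[dict[str, Any]]]:
    """Keep products with at least `min_ratings` rated events, plus all their events."""
    counts = Counter(e.get("product_id") for e in events if has_rating(e))
    keep = sorted(pid for pid, n in counts.items() if pid is not None and n >= min_ratings)
    keep_set = set(keep)
    subset = [e for e in events if e.get("product_id") in keep_set]
    return keep, subset
```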

Analysis

Analysis inspects one run.

It reads a single generated or imported run directory and produces summaries, metrics, plots, reports, and optional regression diagnostics.

Use analysis to answer questions such as:

  • Did the run produce the expected number of events?

  • Are outcomes missing or malformed?

  • What do rating distributions look like?

  • Which products or panelists dominate the run?

  • Do simple regressions reveal useful feature-outcome structure?
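Several of these questions reduce to simple counts over the event rows. The sketch below computes event totals, missing or malformed ratings, a rating distribution, and the most frequent products and panelists. It is a hypothetical helper under assumed event keys, not SIM-PANEL's analysis API, and it does not cover regressions.

```python
from collections import Counter
from typing import Any


def summarize(events: list[dict[str, Any]], top_n: int = 3) -> dict[str, Any]:
    """Basic single-run diagnostics over event rows (assumed keys)."""
    evals = [e for e in events if e.get("type") == "evaluation"]
    ratings: list[float] = []
    malformed = 0
    for e in evals:
        rating = (e.get("outcome") or {}).get("rating")
        # Count anything that is not a real number as missing or malformed.
        if isinstance(rating, (int, float)) and not isinstance(rating, bool):
            ratings.append(rating)
        else:
            malformed += 1
    return {
        "n_events": len(events),
        "n_evaluations": len(evals),
        "missing_or_malformed_rating": malformed,
        "rating_counts": dict(sorted(Counter(ratings).items())),
        "top_products": Counter(e.get("product_id") for e in evals).most_common(top_n),
        "top_panelists": Counter(e.get("panelist_id") for e in evals).most_common(top_n),
    }
```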

Comparison

Comparison evaluates multiple conditions or synthetic outputs against a real reference.

It lives under analysis/compare/ and has its own CLI command:

sim-panel compare --config path/to/compare.yaml

Comparison currently supports:

Mode         Meaning
cross        Compare synthetic conditions against one another.
benchmark    Compare synthetic conditions against one real reference.
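To show how the two modes differ in shape, the sketch below uses total variation distance between rating distributions. The metric is a stand-in chosen for illustration and is not necessarily what the compare module computes. In cross mode every pair of conditions is compared; in benchmark mode each condition is compared with the reference. The function names and the input layout (condition name mapped to a list of ratings) are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Hashable, Mapping, Optional, Sequence


def rating_distribution(ratings: Sequence[Hashable]) -> dict[Hashable, float]:
    """Empirical distribution of rating values."""
    if not ratings:
        raise ValueError("cannot build a distribution from zero ratings")
    counts = Counter(ratings)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation(p: Mapping[Hashable, float], q: Mapping[Hashable, float]) -> float:
    """Half the L1 distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in support)


def compare(
    conditions: Mapping[str, Sequence[Hashable]],
    mode: str,
    reference: Optional[Sequence[Hashable]] = None,
) -> dict:
    dists = {name: rating_distribution(r) for name, r in conditions.items()}
    if mode == "cross":
        # Every unordered pair of synthetic conditions.
        return {(a, b): total_variation(dists[a], dists[b]) for a, b in combinations(sorted(dists), 2)}
    if mode == "benchmark":
        if reference is None:
            raise ValueError("benchmark mode needs a reference")
        ref = rating_distribution(reference)
        return {name: total_variation(d, ref) for name, d in dists.items()}
    raise ValueError(f"unknown mode: {mode!r}")
```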

Artifact layers

A useful way to remember SIM-PANEL’s modularity is:

Layer                Role
sources/             Ingest external data.
benchmarks/          Freeze reference subsets.
generators/          Simulate event rows.
schema/              Validate event contracts.
analysis/            Inspect one run.
analysis/compare/    Compare conditions or references.
cli/                 Expose reproducible workflows.