Benchmarks

class sim_panel.benchmarks.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)[source]

Bases: object

Configuration for exporting a benchmark-ready real-data subset.

Parameters:
  • import_dir (str) – Directory containing imported source artifacts, expected to include at least events.jsonl and usually products.jsonl.

  • output_dir (str) – Directory to write the frozen benchmark subset.

  • seed (int) – Random seed for reproducible product sampling.

  • min_reviews_per_product (int) – Minimum number of rating-bearing events a product must have to be eligible for the subset.

  • max_products (Optional[int]) – Maximum number of products to keep. If None, keep all eligible products.

  • require_product_record (bool) – If True, only keep products that also appear in products.jsonl.
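The way these parameters might interact can be illustrated with a small sketch. The helper select_products below is hypothetical and not part of sim_panel. It assumes eligibility is a simple count threshold and that sampling is a uniform draw from a seeded generator, neither of which is confirmed by the reference above.

```python
import random
from typing import Dict, List, Optional, Set


def select_products(
    rating_counts: Dict[str, int],
    known_products: Set[str],
    seed: int = 0,
    min_reviews_per_product: int = 25,
    max_products: Optional[int] = 100,
    require_product_record: bool = True,
) -> List[str]:
    """Hypothetical helper: choose product ids as the parameters describe."""
    # Keep products with enough rating-bearing events and, if requested,
    # only those that also have a product record.
    eligible = sorted(
        pid
        for pid, n in rating_counts.items()
        if n >= min_reviews_per_product
        and (not require_product_record or pid in known_products)
    )
    # max_products=None means "keep all eligible products".
    if max_products is None or len(eligible) <= max_products:
        return eligible
    # Assumption: a seeded uniform sample; the same seed gives the same subset.
    rng = random.Random(seed)
    return sorted(rng.sample(eligible, max_products))


counts = {"p1": 40, "p2": 10, "p3": 25, "p4": 30}
known = {"p1", "p2", "p3"}
print(select_products(counts, known, max_products=None))  # ['p1', 'p3']
```

Here p2 is dropped for having too few rating-bearing events, and p4 is dropped because it has no product record while require_product_record is True.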

import_dir: str
output_dir: str
seed: int = 0
min_reviews_per_product: int = 25
max_products: int | None = 100
require_product_record: bool = True

sim_panel.benchmarks.load_benchmark_subset_config(path)[source]

Load a benchmark subset config from YAML.

Accepts either:
  • a top-level mapping containing benchmark_subset: {…}
  • the benchmark_subset fields directly at the top level

Return type:

BenchmarkSubsetConfig

Parameters:

path (str | Path)
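The two accepted layouts can be normalized with a single lookup. The sketch below works on an already-parsed mapping (for example, the result of a YAML loader such as PyYAML's yaml.safe_load) so that it stays dependency-free. SubsetConfigSketch and config_from_mapping are hypothetical stand-ins, not the library's implementation, and rejecting unknown keys is an assumption.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class SubsetConfigSketch:
    """Stand-in mirroring the documented BenchmarkSubsetConfig fields."""

    import_dir: str
    output_dir: str
    seed: int = 0
    min_reviews_per_product: int = 25
    max_products: Optional[int] = 100
    require_product_record: bool = True


def config_from_mapping(raw: Mapping[str, Any]) -> SubsetConfigSketch:
    # Layout 1: fields nested under a top-level "benchmark_subset" key.
    # Layout 2: fields given directly at the top level.
    section = raw.get("benchmark_subset", raw)
    # Assumption: unknown keys are treated as errors rather than ignored.
    allowed = {f.name for f in fields(SubsetConfigSketch)}
    unknown = set(section) - allowed
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SubsetConfigSketch(**section)


nested = {"benchmark_subset": {"import_dir": "data/import", "output_dir": "data/bench"}}
flat = {"import_dir": "data/import", "output_dir": "data/bench"}
assert config_from_mapping(nested) == config_from_mapping(flat)
```

Fields omitted from the mapping fall back to the documented defaults, such as seed=0 and max_products=100.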

sim_panel.benchmarks.build_benchmark_subset(config)[source]

Build a frozen benchmark subset directory from imported real-data artifacts.

Writes the following files into config.output_dir:
  • events.jsonl
  • products.jsonl
  • metadata.json
  • stats.json

Return type:

Dict[str, Any]

Parameters:

config (BenchmarkSubsetConfig)

Design

This is a streaming two-pass builder over events.jsonl:
  1. Pass 1 counts rating-bearing events per product.
  2. Pass 2 writes events for the selected products.

This avoids loading the full events table into memory.
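A minimal sketch of the two-pass idea follows. It is not the library's code: the function two_pass_subset is hypothetical, the event field names product_id and rating are assumed for illustration, and selection is reduced to a count threshold. Only the per-product counter is held in memory, while each event line is read, handled, and discarded.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def two_pass_subset(events_path: Path, out_path: Path, min_reviews: int) -> int:
    """Hypothetical two-pass filter; returns the number of events written."""
    # Pass 1: stream the file once, counting rating-bearing events per product.
    counts = Counter()
    with events_path.open() as src:
        for line in src:
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            if event.get("rating") is not None:  # assumed field name
                counts[event["product_id"]] += 1  # assumed field name

    selected = {pid for pid, n in counts.items() if n >= min_reviews}

    # Pass 2: stream again, copying every event of a selected product.
    written = 0
    with events_path.open() as src, out_path.open("w") as dst:
        for line in src:
            if not line.strip():
                continue
            if json.loads(line)["product_id"] in selected:
                dst.write(line.rstrip("\n") + "\n")
                written += 1
    return written


with tempfile.TemporaryDirectory() as tmp:
    events = Path(tmp) / "events.jsonl"
    rows = [
        {"product_id": "a", "rating": 5},
        {"product_id": "a", "rating": 3},
        {"product_id": "a"},
        {"product_id": "b", "rating": 4},
    ]
    events.write_text("".join(json.dumps(r) + "\n" for r in rows))
    print(two_pass_subset(events, Path(tmp) / "subset.jsonl", min_reviews=2))  # 3
```

Product a has two rating-bearing events and is selected, so all three of its events are copied, while product b falls below the threshold and is skipped.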

class sim_panel.benchmarks.config.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)[source]

Bases: object

Configuration for exporting a benchmark-ready real-data subset.

Parameters:
  • import_dir (str) – Directory containing imported source artifacts, expected to include at least events.jsonl and usually products.jsonl.

  • output_dir (str) – Directory to write the frozen benchmark subset.

  • seed (int) – Random seed for reproducible product sampling.

  • min_reviews_per_product (int) – Minimum number of rating-bearing events a product must have to be eligible for the subset.

  • max_products (Optional[int]) – Maximum number of products to keep. If None, keep all eligible products.

  • require_product_record (bool) – If True, only keep products that also appear in products.jsonl.

import_dir: str
output_dir: str
seed: int = 0
min_reviews_per_product: int = 25
max_products: int | None = 100
require_product_record: bool = True

sim_panel.benchmarks.config.load_benchmark_subset_config(path)[source]

Load a benchmark subset config from YAML.

Accepts either:
  • a top-level mapping containing benchmark_subset: {…}
  • the benchmark_subset fields directly at the top level

Return type:

BenchmarkSubsetConfig

Parameters:

path (str | Path)

sim_panel.benchmarks.subset.build_benchmark_subset(config)[source]

Build a frozen benchmark subset directory from imported real-data artifacts.

Writes the following files into config.output_dir:
  • events.jsonl
  • products.jsonl
  • metadata.json
  • stats.json

Return type:

Dict[str, Any]

Parameters:

config (BenchmarkSubsetConfig)

Design

This is a streaming two-pass builder over events.jsonl:
  1. Pass 1 counts rating-bearing events per product.
  2. Pass 2 writes events for the selected products.

This avoids loading the full events table into memory.