Benchmarks

class sim_panel.benchmarks.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)

Bases: object

Configuration for exporting a benchmark-ready real-data subset.

Parameters:

import_dir (str) – Directory containing imported source artifacts, expected to include at least events.jsonl and usually products.jsonl.
output_dir (str) – Directory to write the frozen benchmark subset.
seed (int) – Random seed for reproducible product sampling.
min_reviews_per_product (int) – Minimum number of rating-bearing events a product must have to be eligible for the subset.
max_products (Optional[int]) – Maximum number of products to keep. If None, keep all eligible products.
require_product_record (bool) – If True, only keep products that also appear in products.jsonl.

import_dir: str

output_dir: str

seed: int = 0

min_reviews_per_product: int = 25

max_products: int | None = 100

require_product_record: bool = True

sim_panel.benchmarks.load_benchmark_subset_config(path)

Load a benchmark subset config from YAML.

Accepts either:

- a top-level mapping containing benchmark_subset: {…}
- the benchmark_subset fields directly at the top level

Return type: BenchmarkSubsetConfig
Parameters: path (str | Path)
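
The two accepted layouts can be handled with a single unwrapping step. The following is a minimal sketch, assuming the YAML file has already been parsed into a plain mapping. `SubsetConfigSketch` and `config_from_mapping` are hypothetical stand-ins that only mirror the documented fields; they are not part of sim_panel, and rejecting unknown keys is an assumption of this sketch rather than documented behavior.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass
class SubsetConfigSketch:
    """Hypothetical stand-in mirroring the documented BenchmarkSubsetConfig fields."""

    import_dir: str
    output_dir: str
    seed: int = 0
    min_reviews_per_product: int = 25
    max_products: Optional[int] = 100
    require_product_record: bool = True


def config_from_mapping(data: Mapping[str, Any]) -> SubsetConfigSketch:
    """Accept either {"benchmark_subset": {...}} or the fields at the top level."""
    # Unwrap the nested form if present; otherwise treat the mapping as the fields.
    section = data.get("benchmark_subset", data)
    if not isinstance(section, Mapping):
        raise ValueError("benchmark_subset must be a mapping")
    # Assumption of this sketch: unknown keys are an error rather than ignored.
    known = {f.name for f in fields(SubsetConfigSketch)}
    unknown = set(section) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return SubsetConfigSketch(**section)


# Nested layout: fields live under the benchmark_subset key.
nested = config_from_mapping(
    {"benchmark_subset": {"import_dir": "imports/", "output_dir": "bench/", "max_products": None}}
)
# Flat layout: fields sit directly at the top level.
flat = config_from_mapping({"import_dir": "imports/", "output_dir": "bench/", "seed": 7})
```

Both calls produce the same kind of object; fields omitted from the mapping fall back to the documented defaults.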

sim_panel.benchmarks.build_benchmark_subset(config)

Build a frozen benchmark subset directory from imported real-data artifacts.

Writes the following files into config.output_dir:

- events.jsonl
- products.jsonl
- metadata.json
- stats.json

Return type: Dict[str, Any]
Parameters: config (BenchmarkSubsetConfig)

Design

This is a streaming two-pass builder over events.jsonl:

1. Pass 1 counts rating-bearing events per product.
2. Pass 2 writes events for the selected products.

This avoids loading the full events table into memory.
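
The two passes can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's implementation: the event field names `product_id` and `rating`, the helper names `iter_events` and `select_and_write`, and the use of `random.Random(seed).sample` for product sampling are all hypothetical. The products.jsonl filter and the metadata and stats files are omitted.

```python
import json
import random
from collections import Counter
from typing import Dict, Iterator, Optional


def iter_events(path: str) -> Iterator[dict]:
    """Yield one parsed event per non-empty line, so only one line is held at a time."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)


def select_and_write(
    events_path: str,
    out_path: str,
    min_reviews: int = 25,
    max_products: Optional[int] = 100,
    seed: int = 0,
) -> Dict[str, int]:
    # Pass 1: count rating-bearing events per product (memory grows with the
    # number of products, not the number of events).
    counts: Counter = Counter()
    for event in iter_events(events_path):
        pid = event.get("product_id")  # assumed field name
        if pid is not None and event.get("rating") is not None:  # assumed field name
            counts[pid] += 1

    # Keep eligible products; sort first so that sampling is reproducible for a given seed.
    eligible = sorted(pid for pid, n in counts.items() if n >= min_reviews)
    if max_products is not None and len(eligible) > max_products:
        eligible = random.Random(seed).sample(eligible, max_products)
    selected = set(eligible)

    # Pass 2: stream the file again and write only events for selected products.
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for event in iter_events(events_path):
            if event.get("product_id") in selected:
                out.write(json.dumps(event) + "\n")
                written += 1
    return {"products": len(selected), "events": written}
```

Reading the file twice trades extra I/O for a small memory footprint: only the per-product counter and the set of selected product IDs are kept in memory between passes.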

class sim_panel.benchmarks.config.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)