Benchmarks
- class sim_panel.benchmarks.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)
Bases: object
Configuration for exporting a benchmark-ready real-data subset.
- Parameters:
import_dir (str) – Directory containing imported source artifacts, expected to include at least events.jsonl and usually products.jsonl.
output_dir (str) – Directory to write the frozen benchmark subset.
seed (int) – Random seed for reproducible product sampling.
min_reviews_per_product (int) – Minimum number of rating-bearing events a product must have to be eligible for the subset.
max_products (Optional[int]) – Maximum number of products to keep. If None, keep all eligible products.
require_product_record (bool) – If True, only keep products that also appear in products.jsonl.
- import_dir: str
- output_dir: str
- seed: int = 0
- min_reviews_per_product: int = 25
- max_products: int | None = 100
- require_product_record: bool = True
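To illustrate how the selection fields could interact, here is a minimal sketch. It is not the library's implementation: `select_products`, its arguments `review_counts` and `known_products`, and the choice to sort eligible IDs before sampling with `random.Random(seed)` are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional, Set


def select_products(
    review_counts: Dict[str, int],
    known_products: Set[str],
    seed: int = 0,
    min_reviews_per_product: int = 25,
    max_products: Optional[int] = 100,
    require_product_record: bool = True,
) -> List[str]:
    """Hypothetical helper: choose product IDs as the config fields suggest."""
    # Keep products with enough rating-bearing events and, if required,
    # a matching product record. Sorting makes the input order irrelevant.
    eligible = sorted(
        pid
        for pid, count in review_counts.items()
        if count >= min_reviews_per_product
        and (not require_product_record or pid in known_products)
    )
    # max_products=None means "keep all eligible products".
    if max_products is None or len(eligible) <= max_products:
        return eligible
    # Assumed sampling strategy: a seeded draw without replacement.
    rng = random.Random(seed)
    return sorted(rng.sample(eligible, max_products))
```

Under these assumptions, the same seed and the same inputs always yield the same product list, which is what makes the subset reproducible.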
- sim_panel.benchmarks.load_benchmark_subset_config(path)
Load a benchmark subset config from YAML.
Accepts either:
- a top-level mapping containing benchmark_subset: {…}
- or the benchmark_subset fields directly at the top level
- Return type:
- Parameters:
path (str | Path)
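The two accepted layouts can be pictured with a small sketch that works on an already-parsed YAML mapping. `extract_subset_fields` is a hypothetical helper, not part of the documented API, and the directory paths are placeholder values; the library's own handling of the two layouts may differ.

```python
from typing import Any, Dict, Mapping


def extract_subset_fields(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return the benchmark_subset fields from either layout."""
    if "benchmark_subset" in raw:
        nested = raw["benchmark_subset"]
        if not isinstance(nested, dict):
            raise ValueError("benchmark_subset must be a mapping")
        return dict(nested)
    # No wrapper key: treat the top-level mapping as the fields themselves.
    return dict(raw)


# Placeholder paths, shown only to contrast the two layouts.
nested_layout = {
    "benchmark_subset": {"import_dir": "data/imported", "output_dir": "data/benchmark"}
}
flat_layout = {"import_dir": "data/imported", "output_dir": "data/benchmark"}

# Both layouts resolve to the same set of fields.
assert extract_subset_fields(nested_layout) == extract_subset_fields(flat_layout)
```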
- sim_panel.benchmarks.build_benchmark_subset(config)
Build a frozen benchmark subset directory from imported real-data artifacts.
Writes the following files into config.output_dir:
- events.jsonl
- products.jsonl
- metadata.json
- stats.json
- Return type:
Dict[str, Any]
- Parameters:
config (BenchmarkSubsetConfig)
Design
This is a streaming two-pass builder over events.jsonl:
1. Pass 1 counts rating-bearing events per product.
2. Pass 2 writes events for the selected products.
This avoids loading the full events table into memory.
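The two passes can be sketched as follows. This is an illustration of the streaming pattern, not the library's code: the function names are hypothetical, and the event field names `product_id` and `rating` are assumptions about the events.jsonl schema.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Set


def count_rated_events(events_path: Path) -> Counter:
    """Pass 1: count rating-bearing events per product, one line at a time."""
    counts: Counter = Counter()
    with events_path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            event = json.loads(line)
            # Assumed schema: an event is rating-bearing if "rating" is set.
            if event.get("rating") is not None:
                counts[event["product_id"]] += 1
    return counts


def write_selected_events(events_path: Path, out_path: Path, selected: Set[str]) -> int:
    """Pass 2: copy events for the selected products; return how many were written."""
    written = 0
    with events_path.open(encoding="utf-8") as src, out_path.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue
            if json.loads(line).get("product_id") in selected:
                dst.write(line if line.endswith("\n") else line + "\n")
                written += 1
    return written
```

Only the per-product counter and the set of selected IDs are held in memory; the events themselves are read and written line by line.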
- class sim_panel.benchmarks.config.BenchmarkSubsetConfig(import_dir, output_dir, seed=0, min_reviews_per_product=25, max_products=100, require_product_record=True)
Bases: object
Configuration for exporting a benchmark-ready real-data subset.
- Parameters:
import_dir (str) – Directory containing imported source artifacts, expected to include at least events.jsonl and usually products.jsonl.
output_dir (str) – Directory to write the frozen benchmark subset.
seed (int) – Random seed for reproducible product sampling.
min_reviews_per_product (int) – Minimum number of rating-bearing events a product must have to be eligible for the subset.
max_products (Optional[int]) – Maximum number of products to keep. If None, keep all eligible products.
require_product_record (bool) – If True, only keep products that also appear in products.jsonl.
- import_dir: str
- output_dir: str
- seed: int = 0
- min_reviews_per_product: int = 25
- max_products: int | None = 100
- require_product_record: bool = True
- sim_panel.benchmarks.config.load_benchmark_subset_config(path)
Load a benchmark subset config from YAML.
Accepts either:
- a top-level mapping containing benchmark_subset: {…}
- or the benchmark_subset fields directly at the top level
- Return type:
- Parameters:
path (str | Path)
- sim_panel.benchmarks.subset.build_benchmark_subset(config)
Build a frozen benchmark subset directory from imported real-data artifacts.
Writes the following files into config.output_dir:
- events.jsonl
- products.jsonl
- metadata.json
- stats.json
- Return type:
Dict[str, Any]
- Parameters:
config (BenchmarkSubsetConfig)
Design
This is a streaming two-pass builder over events.jsonl:
1. Pass 1 counts rating-bearing events per product.
2. Pass 2 writes events for the selected products.
This avoids loading the full events table into memory.