
Benchmark methodology

How the benchmark results are produced. The benchmarks are intentionally conservative: no number is published until the harness emits result_kind: "actual_benchmark" with is_placeholder: false and claims_allowed: true. Reproduce any result locally with yarn benchmark (raw output: benchmarks/results/latest.actual.json).
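The publication gate is a conjunction of three field checks. Below is a minimal TypeScript sketch. It assumes the three fields sit at the top level of latest.actual.json. ResultFileHeader, parseHeader, and canPublish are hypothetical names, not the harness's own API.

```typescript
// Hypothetical model of the three gating fields named above. The real
// results file carries more data, and top-level placement is an assumption.
interface ResultFileHeader {
  result_kind: string;
  is_placeholder: boolean;
  claims_allowed: boolean;
}

// Extract the gating fields from raw JSON text. Returns null when a field is
// missing or has the wrong type. JSON.parse itself throws on malformed JSON.
function parseHeader(jsonText: string): ResultFileHeader | null {
  const raw: unknown = JSON.parse(jsonText);
  if (typeof raw !== "object" || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const kind = record["result_kind"];
  const placeholder = record["is_placeholder"];
  const claims = record["claims_allowed"];
  if (
    typeof kind !== "string" ||
    typeof placeholder !== "boolean" ||
    typeof claims !== "boolean"
  ) {
    return null;
  }
  return { result_kind: kind, is_placeholder: placeholder, claims_allowed: claims };
}

// A number may be published only if all three conditions hold at once.
function canPublish(header: ResultFileHeader | null): boolean {
  return (
    header !== null &&
    header.result_kind === "actual_benchmark" &&
    !header.is_placeholder &&
    header.claims_allowed
  );
}
```

A file that lacks any of the fields is treated as not publishable. This matches the conservative stance: absence of proof blocks the claim.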

Methodology card: benchmark proof without fake numbers. Every comparison uses the same scenario and the same input size, warms up before timing, records real versions and the commit, and allows claims only from actual benchmark output.

Competitor set

Engine | Adapter key | Role
Neuron-JS | @sebasoft/neuron-js | First-party rules engine under test.
json-rules-engine | json-rules-engine | Closest default Node.js JSON rules-engine competitor.
JsonLogic | json-logic-js | Portable JSON predicate format competitor.
Hand-coded TypeScript | hand-coded-typescript | Baseline for direct conditional logic without engine overhead.
rule-engine-js | rule-engine-js | Smaller modern competitor, selected because it installs and builds in this repository.

Scenario matrix

Scenario | Inputs represented | Why it exists
pricing-discount | tier, region, coupon, cart total, account age | Shows business-rule pricing decisions and validation/explanation overhead.
eligibility-approval | age, country, verification status, risk score, account flags | Shows policy/approval-style decisions with clear pass/fail outcomes.
workflow-routing | channel, urgency, customer segment, confidence score, escalation flags | Shows deterministic workflow routing and trace usefulness.

Input-size matrix

Profile | Decisions | Usage
smoke | 100 | Correctness and trace sanity.
small | 1,000 | Local development feedback.
medium | 10,000 | Chartable throughput.
large | 100,000 | Optional; run only if runtime remains practical on CI or local machines.

How each metric is measured

  • Fairness gate. Before timing, every engine must reproduce the scenario's canonical decision (e.g. pricing finalTotal: 105, discountAmount: 20); the run aborts on any mismatch, so all engines are timed doing equivalent work.
  • Throughput / p50 / p95. Warmup iterations run untimed, then measured iterations run in batches; per-decision latency is averaged per batch (so per-call timer overhead does not dominate sub-microsecond engines). Throughput is total measured decisions over total measured seconds.
  • Cold start. This is the median wall-clock time, across several fresh Node processes, to import the engine and execute its first decision. The timer starts just before the import, so Node's own process startup is excluded.
  • Bundle size. esbuild bundles and minifies the engine's public surface (ESM format, node platform), and the output length in bytes is recorded. The hand-coded baseline has no library dependency, so its size is recorded as 0.
  • Validation / explanation overhead. Neuron-JS only. This is the per-decision latency delta from running validateScript (for validation) or explainExecution (for explanation) around an otherwise identical execution. The other engines provide no equivalent step, so their recorded delta is 0.
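The fairness gate and the batched timing loop can be sketched in TypeScript. This is an illustration under stated assumptions, not the harness's code:

- Engine, assertFair, measure, percentile, and overheadMs are hypothetical names.
- The clock is injected. In Node, () => performance.now() would be a natural choice.
- p50 and p95 are assumed to be nearest-rank percentiles of the per-batch average latencies.
- Overhead is assumed to be the difference in mean per-decision latency between an instrumented run and a plain run.

```typescript
// Decision objects are compared field by field against the canonical decision.
type Decision = Record<string, unknown>;

// Hypothetical adapter shape: a name plus a synchronous decide function.
interface Engine<I> {
  name: string;
  decide: (input: I) => Decision;
}

// Summary statistics for one engine/scenario/input-size run (times in ms).
interface Stats {
  throughput: number; // decisions per second
  meanMs: number; // mean per-decision latency over all measured decisions
  p50: number; // median of per-batch average latencies
  p95: number; // 95th percentile of per-batch average latencies
}

// Fairness gate: every engine must reproduce each canonical field before any
// timing happens. A single mismatch aborts the run.
function assertFair<I>(engines: Engine<I>[], input: I, canonical: Decision): void {
  for (const engine of engines) {
    const got = engine.decide(input);
    for (const [key, want] of Object.entries(canonical)) {
      if (got[key] !== want) {
        throw new Error(
          `fairness gate: ${engine.name} gave ${key}=${String(got[key])}, expected ${String(want)}`,
        );
      }
    }
  }
}

// Nearest-rank percentile over an ascending-sorted array (NaN if empty).
function percentile(sorted: number[], p: number): number {
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1] ?? NaN;
}

// Untimed warmup, then timed batches. Latency is averaged per batch, so the
// cost of reading the clock is amortized over batchSize decisions.
function measure<I>(
  decide: (input: I) => Decision,
  input: I,
  opts: { warmup: number; batches: number; batchSize: number },
  now: () => number, // clock in milliseconds
): Stats {
  for (let i = 0; i < opts.warmup; i++) decide(input);
  const batchAvgMs: number[] = [];
  let totalMs = 0;
  for (let b = 0; b < opts.batches; b++) {
    const start = now();
    for (let i = 0; i < opts.batchSize; i++) decide(input);
    const elapsedMs = now() - start;
    totalMs += elapsedMs;
    batchAvgMs.push(elapsedMs / opts.batchSize);
  }
  batchAvgMs.sort((a, b) => a - b);
  const measured = opts.batches * opts.batchSize;
  return {
    throughput: measured / (totalMs / 1000), // total decisions / total seconds
    meanMs: totalMs / measured,
    p50: percentile(batchAvgMs, 50),
    p95: percentile(batchAvgMs, 95),
  };
}

// Overhead: mean per-decision delta between an instrumented run (validation
// or tracing enabled) and an otherwise identical plain run.
function overheadMs(instrumented: Stats, plain: Stats): number {
  return instrumented.meanMs - plain.meanMs;
}
```

Injecting the clock keeps the loop independent of any one timer API. It also lets the batching arithmetic be checked with a fake clock. Cold start and bundle size are measured out of process (fresh Node processes, esbuild) and are not modeled here.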

Result fields

Each row in the results file carries these fields (units and definitions are mirrored in the machine-readable result schema):

Field | Unit | Meaning
engine | identifier | Engine under test (fixed enum).
scenario | identifier | Scenario slug.
input_size | profile | Named workload profile (smoke/small/medium/large).
warmup_iterations | decisions | Unmeasured warmup decisions before timing.
measured_iterations | decisions | Measured decisions in the timing window.
throughput_decisions_per_second | decisions/second | Measured decisions ÷ elapsed measured seconds.
p50_ms / p95_ms | milliseconds | Median / 95th-percentile per-decision latency.
cold_start_ms | milliseconds | Import + first decision in a fresh process.
bundle_size_minified_bytes | bytes | Minified bundle of the engine's public surface.
validation_overhead_ms | milliseconds | Validation-enabled vs. disabled per-decision delta (Neuron-JS).
explanation_overhead_ms | milliseconds | Trace-enabled vs. disabled per-decision delta (Neuron-JS).
node_version / package_version / commit_sha | provenance | Run environment and source state.
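A consumer reading the results file from TypeScript might model a row as shown below. This is a hypothetical sketch, not the project's machine-readable schema. The engine enum values are deliberately left as string, and the scenario is typed as a plain slug. rowProblems is an illustrative sanity check that tests for:

- non-negative integer counts
- positive throughput
- p50 not above p95
- a recorded commit

```typescript
// Named workload profiles and their decision counts, from the input-size matrix.
type InputSize = "smoke" | "small" | "medium" | "large";
const PROFILE_DECISIONS: Record<InputSize, number> = {
  smoke: 100,
  small: 1_000,
  medium: 10_000,
  large: 100_000,
};

// One row of the results file, mirroring the field table above.
interface BenchmarkResultRow {
  engine: string; // a fixed enum in the real schema; values not reproduced here
  scenario: string; // scenario slug, e.g. "pricing-discount"
  input_size: InputSize;
  warmup_iterations: number; // decisions
  measured_iterations: number; // decisions
  throughput_decisions_per_second: number;
  p50_ms: number;
  p95_ms: number;
  cold_start_ms: number;
  bundle_size_minified_bytes: number;
  validation_overhead_ms: number; // 0 for engines without a validation step
  explanation_overhead_ms: number; // 0 for engines without an explanation step
  node_version: string;
  package_version: string;
  commit_sha: string;
}

// Illustrative sanity checks; returns human-readable problems (empty if none).
function rowProblems(row: BenchmarkResultRow): string[] {
  const problems: string[] = [];
  // Runtime guard: data parsed from JSON may carry an unknown profile name.
  if (!Object.prototype.hasOwnProperty.call(PROFILE_DECISIONS, row.input_size)) {
    problems.push(`unknown input_size: ${row.input_size}`);
  }
  const counts: [string, number][] = [
    ["warmup_iterations", row.warmup_iterations],
    ["measured_iterations", row.measured_iterations],
    ["bundle_size_minified_bytes", row.bundle_size_minified_bytes],
  ];
  for (const [name, value] of counts) {
    if (!Number.isInteger(value) || value < 0) {
      problems.push(`${name} must be a non-negative integer`);
    }
  }
  if (!(row.throughput_decisions_per_second > 0)) problems.push("throughput must be positive");
  if (row.p50_ms > row.p95_ms) problems.push("p50_ms must not exceed p95_ms");
  if (row.commit_sha.length === 0) problems.push("commit_sha is required");
  return problems;
}
```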

No fabricated numbers

Numbers are published only from a measured actual_benchmark run; placeholder fixtures are never used for public claims. Results reflect one machine, one Node version, and one commit. Reproduce them before citing, and avoid "fastest"/"best" framing beyond what a named scenario and input size support.