Benchmarks

Among other metrics, the PaSh continuous integration infrastructure tracks performance and correctness metrics on the main branch over time. Some of these results—for the most recent commits—are shown below; longer commit history, other branches, and more configurations are part of the control subsystem.

One can also view these results in textual form: for example, to get the latest 5 builds, run curl -s ci.binpa.sh | head; to get details about a specific commit, run curl -s ci.binpa.sh/f821b4b.

Latest Snapshot

The next few plots show results only for the latest commit on the main branch.

Classic Unix One-liners

This benchmark set contains 9 pipelines written by Unix experts: a few pipelines are from Unix legends (e.g., Top-N, Spell), one is from a book on Unix scripting, and a few are from top Stack Overflow answers. Pipelines contain 2-7 stages (avg.: 5.2), ranging from scalable CPU-intensive stages (e.g., the grep stage in Nfa-regex) to non-parallelizable stages (e.g., the diff stage in Diff). Inputs are script-specific and average 10GB per benchmark.
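
To give a flavor of this style of script, here is a Top-N-style pipeline in the spirit of McIlroy's classic word-count one-liner. This is an illustrative sketch, not the exact benchmark code; the scalable stages (tr, sort, uniq) are the kind PaSh can parallelize, while the final head is a small sequential tail.

```shell
# Illustrative Top-N-style pipeline (not the actual benchmark script):
# split input into words, normalize case, count occurrences, keep the 3 most frequent.
printf 'Apple banana apple cherry BANANA apple\n' |
  tr -cs 'A-Za-z' '\n' |   # one word per line
  tr 'A-Z' 'a-z'       |   # lowercase everything
  sort                 |   # group identical words
  uniq -c              |   # count each word
  sort -rn             |   # most frequent first
  head -n 3                # top 3 words
```

On the sample input above this prints apple (3), banana (2), and cherry (1), one per line, with uniq -c's count prefix.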

Unix50 from Bell Labs

This benchmark includes 36 pipelines solving the Unix 50 game. The pipelines were designed to highlight Unix's modular philosophy, make extensive use of standard commands under a variety of flags, and appear to be written by non-experts. PaSh executes each pipeline as-is, without any modification.
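
To give a flavor of the game's style, the pipeline below is a hypothetical example in the same spirit (it is not one of the actual Unix 50 solutions): short chains of standard commands, each doing one small transformation.

```shell
# Hypothetical Unix50-style pipeline (illustrative only): from a list of
# "name year" records, print the earliest year.
printf 'ken 1969\ndmr 1972\nbwk 1974\n' |
  awk '{print $2}' |   # select the year field
  sort -n          |   # numeric sort, ascending
  head -n 1            # earliest year
```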

Mass-transit analytics during COVID-19

This benchmark set contains 4 pipelines that were used to analyze real telemetry data from bus schedules during the COVID-19 response in a large European city. The pipelines compute several average statistics on the transit system per day—such as daily serving hours and daily number of vehicles. Pipelines range between 9 and 10 stages (avg.: 9.2), use typical Unix staples sed, sort, and uniq, and operate on a fixed 34GB dataset that contains mass-transport data collected over a single year.
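
A minimal sketch of what such a per-day statistic might look like, assuming a hypothetical date,vehicle_id record format (this is not the actual dataset or benchmark script):

```shell
# Hypothetical sketch: count distinct vehicles per day from
# "date,vehicle_id" records (illustrative input format, not the real data).
printf '2020-03-01,bus7\n2020-03-01,bus9\n2020-03-01,bus7\n2020-03-02,bus9\n' |
  sed 's/,/ /'     |   # "date vehicle"
  sort -u          |   # deduplicate (date, vehicle) pairs
  awk '{print $1}' |   # keep only the date
  uniq -c              # distinct vehicles per day
```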

Performance

The next few plots show the performance of the benchmarks over time, as a time series (indexed by commit ID) on the main branch.

Classic Unix One-liners

(Same benchmark set as described under Latest Snapshot above.)

Unix50 from Bell Labs

(Same benchmark set as described under Latest Snapshot above.)

Mass-transit analytics during COVID-19

(Same benchmark set as described under Latest Snapshot above.)

Correctness

This section tracks pass/fail statistics from the test suites that check the correctness of individual PaSh components.

Benchmark          Passed    Failed   Untested   Unresolved   Unsupported   Other
Compiler Tests     18/18     0/18     0/18       0/18         0/18          0/18
Interface Tests    11/11     0/11     0/11       0/11         0/11          0/11
Annotation Tests   1/1       1/1      1/1        1/1          1/1           1/1
Aggregation Tests  10/5      0/5      0/5        0/5          0/5           0/5
Smoosh Tests       10/10     0/10     0/10       0/10         0/10          0/10
POSIX Tests        348/494   68/494   31/494     6/494        40/494        1/494