JSONL Sampler
100% client-side. No upload.
Take a representative slice of a big JSONL file without writing a one-liner: random sample (uses reservoir sampling so a 100M-line file still works), head, tail, every Nth record, or a stratified sample that keeps the same per-category ratio. Seed-controlled for reproducibility. 100% in-browser.
Sampling modes
Random N (reservoir)
Picks N records uniformly at random across the whole file using Algorithm R reservoir sampling. Single pass, constant memory — handles files much larger than the result. Each record has equal probability of being kept.
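A minimal sketch of Algorithm R in JavaScript (the function name and array input are illustrative; the tool itself streams the file line by line rather than holding it in memory):

```javascript
// Algorithm R reservoir sampling: single pass, O(k) memory.
// `rng` is any function returning a float in [0, 1).
function reservoirSample(lines, k, rng = Math.random) {
  const reservoir = [];
  lines.forEach((line, i) => {
    if (i < k) {
      // Fill the reservoir with the first k records.
      reservoir.push(line);
    } else {
      // Replace a random slot with probability k / (i + 1),
      // which keeps every record's inclusion probability equal.
      const j = Math.floor(rng() * (i + 1));
      if (j < k) reservoir[j] = line;
    }
  });
  return reservoir;
}
```

Because the reservoir is only replaced in place, memory stays constant no matter how long the input stream is.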
First N / Last N
Grab the first or last N records. Useful for spot-checking, smoke tests, or "show me the most recent log lines."
Stratified by key
Keeps the same per-category ratio as the input. Set the key path (e.g. category or user.tier) and a target N; the sampler partitions records by that key and takes a proportional random sample from each group. Categories smaller than their share are taken in full. Useful when one category dominates the file and a pure random sample would miss the rare classes.
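A sketch of the partition-then-sample step, assuming an array of parsed records (helper names and the rounding of each group's share are illustrative, not the tool's exact implementation):

```javascript
// Uniform sample of k items via a partial Fisher-Yates shuffle.
function sampleK(arr, k, rng) {
  const copy = arr.slice();
  const n = Math.min(k, copy.length);
  for (let i = 0; i < n; i++) {
    const j = i + Math.floor(rng() * (copy.length - i));
    [copy[i], copy[j]] = [copy[j], copy[i]];
  }
  return copy.slice(0, n);
}

function stratifiedSample(records, getKey, targetN, rng = Math.random) {
  // Partition by key; records missing the key go to "__missing__".
  const groups = new Map();
  for (const rec of records) {
    const key = getKey(rec) ?? "__missing__";
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rec);
  }
  const out = [];
  for (const members of groups.values()) {
    // Proportional share; groups smaller than their share are taken whole.
    const share = Math.round((members.length / records.length) * targetN);
    out.push(...sampleK(members, Math.min(members.length, share), rng));
  }
  return out;
}
```

With 90 "a" records and 10 "b" records and a target of 10, this yields 9 from "a" and 1 from "b", preserving the 9:1 input ratio.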
Every Nth record
Deterministic systematic sampling: keep record 1, N+1, 2N+1, … Good for downsampling time-ordered logs where you want even temporal coverage rather than random spikes.
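The systematic mode reduces to a one-line filter over record indices (0-based here, matching the "record 1, N+1, 2N+1, …" rule above):

```javascript
// Systematic sampling: keep every Nth record, starting from the first.
// Deterministic — no RNG involved, so no seed is needed.
function everyNth(lines, n) {
  return lines.filter((_, i) => i % n === 0);
}
```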
Seed
Random and stratified modes use a seeded PRNG so the same input + seed always produces the same sample. Leave the seed blank for a fresh non-deterministic sample on each run.
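Any small seeded PRNG gives this reproducibility; mulberry32 is one common in-browser choice (the tool's actual generator is not specified, so this is an illustrative stand-in):

```javascript
// mulberry32: a tiny 32-bit seeded PRNG. Same seed => same sequence,
// so a sample is reproducible across runs.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}
```

Passing `mulberry32(seed)` as the `rng` argument of a sampler makes the whole sample deterministic; omitting the seed and falling back to `Math.random` gives a fresh sample each run.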
Tips & common pitfalls
- Reservoir vs. shuffle. Reservoir sampling is the standard choice when the stream size is unknown, and it is much faster than a full shuffle on big files.
- Stratify needs the key to exist on every record. Records missing the key are bucketed under __missing__.
- Output order. Random mode preserves the original order of the kept records; stratified mode interleaves by group.