JSONL Sampler

updated 11 May 2026

100% client-side. No upload.

Take a representative slice of a big JSONL file without writing a one-liner. Choose a random sample (reservoir sampling, so a 100M-line file still works), the head, the tail, every Nth record, or a stratified sample that keeps the same per-category ratio. Random draws are seed-controlled for reproducibility, and everything runs in the browser.

— S., [email protected]

Sampling modes

Random N (reservoir)

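Picks N records uniformly at random from the whole file using reservoir sampling (Algorithm R). It makes a single pass and holds only N records in memory at a time, so memory use depends on the sample size rather than the file size, and it handles files much larger than the result. Every record has the same probability of being kept.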
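For readers who want to see the idea in code, here is a minimal Python sketch of Algorithm R. It is an illustration, not the tool's own source (the tool runs in the browser). The function name reservoir_sample and its parameters are hypothetical, and the final sort that restores input order is an assumption about how the order of kept records could be preserved.

```python
import random
from typing import Iterable, List, Optional, Tuple


def reservoir_sample(lines: Iterable[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Return up to n lines chosen uniformly at random in a single pass (Algorithm R)."""
    rng = random.Random(seed)  # seed=None falls back to system entropy
    reservoir: List[Tuple[int, str]] = []  # (original index, line)
    for i, line in enumerate(lines):
        if i < n:
            # Fill the reservoir with the first n records.
            reservoir.append((i, line))
        else:
            # Record i replaces a random slot with probability n / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = (i, line)
    reservoir.sort()  # indices are unique, so this restores input order
    return [line for _, line in reservoir]
```

Passing an open file object as lines streams the file line by line, so only the n kept records (plus the current line) are held in memory.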

First N / Last N

Grab the first or last N records. Useful for spot-checking, smoke tests, or "show me the most recent log lines."
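A rough Python equivalent of these two modes, assuming the input is read as an iterable of lines. The helper names first_n and last_n are hypothetical and only illustrate the approach.

```python
from collections import deque
from itertools import islice
from typing import Iterable, List


def first_n(lines: Iterable[str], n: int) -> List[str]:
    """Stop reading as soon as n lines have been collected."""
    return list(islice(lines, n))


def last_n(lines: Iterable[str], n: int) -> List[str]:
    """Read the whole input, but keep only the most recent n lines in memory."""
    return list(deque(lines, maxlen=n))
```

The first-N case can stop early, while the last-N case has to read to the end of the input; the bounded deque keeps its memory use at n lines.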

Stratified by key

Keeps the same per-category ratio as the input. Set the key path (e.g. category or user.tier) and a target N; the sampler partitions records by that key and takes a random sample from each group in proportion to the group's size. A group with fewer records than its allotted share is taken in full. Useful when one category dominates the file and a pure random sample would miss the rare classes.
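The partition-then-sample idea can be sketched in Python as follows. This is a sketch under stated assumptions, not the tool's implementation: the names get_path and stratified_sample are hypothetical, quotas are rounded with Python's round (so the total can differ from N by a record or two), the whole input is grouped in memory, and the output is returned group by group, which may not match the tool's output order.

```python
import json
import random
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional

MISSING = "__missing__"  # bucket name for records without the key


def get_path(record: Any, path: str) -> Any:
    """Resolve a dotted key path such as 'user.tier'; return MISSING if absent."""
    current = record
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return MISSING
        current = current[part]
    return current


def stratified_sample(lines: Iterable[str], key: str, n: int,
                      seed: Optional[int] = None) -> List[str]:
    rng = random.Random(seed)
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            groups[str(get_path(json.loads(line), key))].append(line)

    total = sum(len(members) for members in groups.values())
    sample: List[str] = []
    for name in sorted(groups):  # fixed group order keeps seeded runs repeatable
        members = groups[name]
        # Proportional share, capped at the group size (small groups are taken in full).
        quota = min(len(members), round(n * len(members) / total))
        sample.extend(rng.sample(members, quota))
    return sample
```

With 90 records in one group and 10 in another, a target of 20 gives quotas of 18 and 2, which matches the 9:1 ratio of the input.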

Every Nth record

Deterministic systematic sampling: keep record 1, N+1, 2N+1, … Good for downsampling time-ordered logs where you want even temporal coverage rather than random spikes.
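In Python, the same systematic pattern can be written with a stepped slice over the input stream. The function name every_nth is hypothetical, and n is assumed to be at least 1.

```python
from itertools import islice
from typing import Iterable, List


def every_nth(lines: Iterable[str], n: int) -> List[str]:
    """Keep records 1, n+1, 2n+1, ... (1-based), i.e. 0-based indices 0, n, 2n, ..."""
    return list(islice(lines, 0, None, n))
```

No random number generator is involved, so the result depends only on the input and N.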

Seed

Random and stratified modes use a seeded PRNG so the same input + seed always produces the same sample. Leave the seed blank for a fresh non-deterministic sample on each run.
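The effect of a seed can be illustrated with Python's standard random module. The tool's actual PRNG is not specified here, so this only demonstrates the general behavior; the helper name draw is hypothetical.

```python
import random
from typing import List, Optional


def draw(population: List[int], k: int, seed: Optional[int] = None) -> List[int]:
    """Sample k items; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # seed=None seeds from system entropy instead
    return rng.sample(population, k)


ids = list(range(100))
# Same input and same seed give the same sample on every run.
assert draw(ids, 5, seed=42) == draw(ids, 5, seed=42)
```

Calling draw(ids, 5) with no seed corresponds to leaving the seed field blank: each run is seeded afresh, so results are not expected to repeat.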

Tips & common pitfalls

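Reservoir vs. shuffle. Reservoir sampling is the standard approach when the stream size is not known in advance. It needs one pass and keeps only N records in memory, whereas a full shuffle has to load the whole file first, so reservoir sampling is much faster on big files.
Stratify works best when the key exists on every record. Records missing the key are bucketed together under __missing__.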
Output order. Random mode preserves the original order of the kept records; stratified mode interleaves by group.
