jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

Shuffle, Sample, or Split a JSONL Training Set

updated 16 May 2026

JSONL sampler. Take a random sample of N records from a JSONL (JSON Lines) file, or grab the first or last N. Stratified sampling by key. Reservoir sampling for large files. Up to 1 GB, in your browser.

100% client-side. No upload.


JSONL Sampler

Take a representative slice of a big JSONL file without writing a one-liner: random sample (uses reservoir sampling so a 100M-line file still works), head, tail, every Nth record, or a stratified sample that keeps the same per-category ratio. Seed-controlled for reproducibility. 100% in-browser.

Sampling modes

Random N (reservoir)

Picks N records uniformly at random across the whole file using Algorithm R reservoir sampling. It makes a single pass, and memory holds only the N kept records rather than the whole file, so it handles files much larger than the result. Each record is kept with the same probability: N divided by the total number of records.
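A minimal Python sketch of Algorithm R, to show why one pass and N records of memory are enough. This is not the tool's in-browser implementation; the function name and the choice to skip blank lines are assumptions made for illustration.

```python
import json
import random
from typing import Any, Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], n: int, seed: Optional[int] = None) -> List[Any]:
    """Algorithm R: uniform random sample of n records in a single pass.

    Only the n kept records are held in memory. seed=None lets Python's
    random module pick a fresh, non-reproducible seed on each call.
    """
    rng = random.Random(seed)
    reservoir: List[Any] = []
    seen = 0  # records read so far (blank lines are skipped, an assumption)
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)
        seen += 1
        if len(reservoir) < n:
            reservoir.append(record)  # fill phase: keep the first n records
        else:
            j = rng.randrange(seen)  # uniform integer in [0, seen)
            if j < n:
                reservoir[j] = record  # happens with probability n / seen
    return reservoir
```

Because an open text file iterates line by line, passing a file object (for example one opened from a hypothetical data.jsonl) streams the input without loading it all. Passing the same seed twice returns the same sample.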

First N / Last N

Grab the first or last N records. Useful for spot-checking, smoke tests, or "show me the most recent log lines."
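A short Python sketch of both modes, assuming blank lines are skipped (the helper names are hypothetical). Head can stop reading as soon as it has N records; tail still has to read the whole file, but a bounded deque means it never holds more than N records at once.

```python
import json
from collections import deque
from itertools import islice
from typing import Any, Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[Any]:
    """Parse non-blank JSONL lines lazily (skipping blanks is an assumption)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def head(lines: Iterable[str], n: int) -> List[Any]:
    # islice stops pulling lines once n records have been read.
    return list(islice(iter_records(lines), n))


def tail(lines: Iterable[str], n: int) -> List[Any]:
    # A deque with maxlen=n drops the oldest record whenever a new one arrives.
    return list(deque(iter_records(lines), maxlen=n))
```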

Stratified by key

Keeps the same per-category ratio as the input. Set the key path (e.g. category or user.tier) and a target N; the sampler partitions records by that key and takes a proportional random sample from each group. If a category has fewer records than its allotted share, all of its records are kept. Useful when one category dominates the file and a pure random sample could miss the rare classes by chance.
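One way to implement proportional stratified sampling, sketched in Python. How the tool rounds each category's share is not documented, so this sketch assumes the largest-remainder method (floor every share, then give the leftover slots to the categories with the biggest fractional parts), which makes the group sizes add up to exactly N and never allots a category more records than it has. The function names and the dotted-path lookup are hypothetical.

```python
import json
import random
from collections import defaultdict
from typing import Any, Dict, List, Optional


def get_path(record: Any, path: str) -> Any:
    """Follow a dotted key path such as 'user.tier'; a missing key gives None."""
    value = record
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def stratified_sample(records: List[Any], key_path: str, n: int,
                      seed: Optional[int] = None) -> List[Any]:
    rng = random.Random(seed)
    # Group by the key's value; json.dumps makes list/dict values usable as dict keys.
    groups: Dict[str, List[Any]] = defaultdict(list)
    for rec in records:
        groups[json.dumps(get_path(rec, key_path), sort_keys=True)].append(rec)

    total = len(records)
    n = min(n, total)
    if n <= 0:
        return []

    # Largest-remainder allocation: floor each exact share, then hand out the rest.
    exact = {k: n * len(g) / total for k, g in groups.items()}
    quota = {k: int(share) for k, share in exact.items()}
    leftover = n - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1

    # Uniform random sample without replacement inside each group.
    sample: List[Any] = []
    for k, group in groups.items():
        sample.extend(rng.sample(group, quota[k]))
    return sample
```

The result comes back grouped by category; shuffle it afterwards if the order matters downstream.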

Every Nth record

Deterministic systematic sampling: keep record 1, N+1, 2N+1, … Good for downsampling time-ordered logs where you want even temporal coverage rather than random spikes.
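In Python, systematic sampling is a single slice with a step: keeping records 1, N+1, 2N+1, … (1-based) means keeping 0-based indices 0, N, 2N, …. A minimal sketch, with a hypothetical function name and blank lines skipped as an assumption:

```python
import json
from itertools import islice
from typing import Any, Iterable, List


def every_nth(lines: Iterable[str], step: int) -> List[Any]:
    """Keep 0-based records 0, step, 2*step, ... in one streaming pass."""
    if step < 1:
        raise ValueError("step must be >= 1")
    records = (json.loads(line) for line in lines if line.strip())
    return list(islice(records, 0, None, step))
```

No random numbers are involved, so the seed has no effect on this mode.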

Seed

Random and stratified modes use a seeded PRNG so the same input + seed always produces the same sample. Leave the seed blank for a fresh non-deterministic sample on each run.

Tips & common pitfalls

Before you start

The sampler reads your JSONL and returns a smaller subset — useful when you need a representative slice for inspection, prototyping or sharing.

How to use it

  1. Drop a file or paste JSONL.
  2. Pick a mode: Random, Head (first N), Tail (last N), Every Nth, or Stratified (by a key).
  3. Set the sample size N.
  4. For stratified, set the stratify key (e.g. plan, country).
  5. Optionally pin a Seed for reproducible random samples.
  6. Click Sample, then Copy or Download.

Modes explained

Random

Uses reservoir sampling — single pass over the file, memory-efficient even for huge inputs. Every record has equal probability.

Head / Tail

The first or last N lines. Fast and exact.

Stratified

Keeps the distribution of a chosen key. If 80% of your rows have plan: free and 20% have plan: pro, a stratified sample of 100 returns roughly 80 free and 20 pro records.


Frequently asked questions

Is random sampling truly uniform?

Yes — reservoir sampling gives each row an equal probability, even when the file is too large to fit in memory.
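The standard argument for Algorithm R, assuming the random number generator is uniform. Let the file hold M records and the reservoir hold N. Record i (1-based, i > N) enters the reservoir with probability N/i, and each later record j evicts it with probability (N/j)(1/N) = 1/j, so the product telescopes:

$$
P(\text{record } i \text{ kept}) = \frac{N}{i}\prod_{j=i+1}^{M}\left(1-\frac{1}{j}\right) = \frac{N}{i}\prod_{j=i+1}^{M}\frac{j-1}{j} = \frac{N}{i}\cdot\frac{i}{M} = \frac{N}{M}.
$$

A record with i ≤ N enters with probability 1 and must survive records N+1 through M:

$$
P(\text{record } i \text{ kept}) = \prod_{j=N+1}^{M}\frac{j-1}{j} = \frac{N}{M}.
$$

Every record therefore ends up in the sample with the same probability N/M.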

Can I stratify by multiple keys?

Not directly. Combine the keys into one field upstream (for example, plan and country into plan_country) and stratify on that combined field.
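A small Python sketch of that upstream step, assuming the keys are top-level fields; the function name and parameters are hypothetical. A missing key becomes the string "None", so those records still land in a group of their own.

```python
import json
from typing import Sequence


def add_combined_key(line: str, keys: Sequence[str], new_key: str, sep: str = "_") -> str:
    """Return the JSONL line with an extra field joining the values of `keys`."""
    record = json.loads(line)
    record[new_key] = sep.join(str(record.get(k)) for k in keys)
    return json.dumps(record, ensure_ascii=False)
```

Run every line through it, then set the stratify key to plan_country. Pick a separator that cannot appear inside the values, otherwise two different key pairs could collapse into the same combined value.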

Related tools