jsonlkit.com
JSONL (JSON Lines) utilities, in the browser

Split JSONL into Train, Validation, Test

updated 16 May 2026 · for fine-tuning and ML datasets

A train, validation and test splitter for JSONL (JSON Lines) datasets: it splits one file into train, val and test sets using a seeded random shuffle, optional stratification by a key, and configurable ratios. One click gives you three downloads, and everything runs in the browser.

100% client-side. No upload.

Train / Val / Test Splitter

The reproducible split every ML pipeline needs. Drop in a JSONL file, set the ratios (80/10/10 by default), choose a seed so your results are repeatable, and optionally stratify by a label key to keep the class distribution approximately the same across all three splits. Three named files come out: train.jsonl, val.jsonl and test.jsonl. Everything runs 100% in the browser.

Random split

Shuffles the input with a seeded PRNG (so the same input + same seed always gives the same three files), then takes the first train% for train, the next val% for validation, and the rest for test. The seed defaults to 42 — change it if you want to try a different shuffle.
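The random split described above can be sketched in Python as follows. This is a minimal sketch, assuming Python's `random.Random` as the seeded PRNG and floor rounding for the train and val sizes; the page does not say which PRNG or rounding the tool uses, so the same seed will not reproduce the tool's files. `random_split` is a hypothetical helper name.

```python
import json
import random


def random_split(rows, train_pct=80, val_pct=10, seed=42):
    """Seeded shuffle, then slice: first train_pct% -> train,
    next val_pct% -> val, everything left -> test.

    Hypothetical helper; not the tool's actual code.
    """
    rows = list(rows)              # copy so the caller's list is not reordered
    rng = random.Random(seed)      # private PRNG: same seed -> same shuffle
    rng.shuffle(rows)

    n = len(rows)
    n_train = n * train_pct // 100  # assumption: floor rounding
    n_val = n * val_pct // 100
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]   # the remainder, including rounding leftovers
    return train, val, test


lines = [json.dumps({"id": i}) for i in range(100)]
train, val, test = random_split(lines, seed=42)
print(len(train), len(val), len(test))  # 80 10 10
```

Calling it twice with the same rows and seed returns the same three lists; change `seed` to get a different shuffle.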

Stratified split

Set a key (typically the label or class field, such as label, category or intent) and the splitter keeps each class's proportion approximately the same (up to rounding) in all three files. This matters most when classes are imbalanced: a purely random split can put zero examples of a rare class into val or test, silently invalidating your evaluation.
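One common way to stratify is to split each class on its own and then merge the pieces. The sketch below assumes that approach, Python's seeded `random.Random`, and floor rounding per class; the page does not describe the tool's exact algorithm, and `stratified_split` is a hypothetical name. Rows are parsed JSON objects, and rows missing the key are grouped under `None`.

```python
import random
from collections import defaultdict


def stratified_split(rows, key, train_pct=80, val_pct=10, seed=42):
    """Split each class separately so every class keeps roughly the
    same proportion in train, val and test. Hypothetical helper."""
    rng = random.Random(seed)

    # Group rows by label; dicts keep insertion order, so the loop below
    # visits classes in a deterministic order for a given input
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(key)].append(row)

    train, val, test = [], [], []
    for members in groups.values():
        rng.shuffle(members)
        n = len(members)
        n_train = n * train_pct // 100   # assumption: floor rounding per class
        n_val = n * val_pct // 100
        train += members[:n_train]
        val += members[n_train:n_train + n_val]
        test += members[n_train + n_val:]

    # Interleave classes within each split so files are not sorted by label
    for split in (train, val, test):
        rng.shuffle(split)
    return train, val, test


rows = [{"id": i, "label": "rare" if i < 10 else "common"} for i in range(100)]
train, val, test = stratified_split(rows, "label")
print([sum(r["label"] == "rare" for r in s) for s in (train, val, test)])  # [8, 1, 1]
```

Even so, a class with very few rows can still round to 0 in val or test (a 5-row class at 80/10/10 gives 4/0/1 here), so it is worth checking the smallest classes.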

Why a separate test set?

Standard ML hygiene: use train for fitting, val for hyperparameter tuning and early stopping, and test for the final unbiased evaluation. If you tune on test, your reported metrics will be optimistically biased, and the model is likely to fall short of them in production.

Tips & common pitfalls

Before you start

You need a single JSONL file representing your full dataset. The splitter shuffles it (using your seed) and produces three files: train, val and test.

How to use it

  1. Drop your JSONL file or paste its contents.
  2. Set the ratios — default 80/10/10 (train/val/test). Any three numbers that sum to 100 work.
  3. Pin a Seed for reproducible splits.
  4. Optional: enable Stratify by key to preserve a label distribution across splits.
  5. Click Split, then download each file.
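Step 2 comes down to turning three percentages into three row counts that add up to the dataset size. A minimal sketch, assuming largest-remainder rounding (the page doesn't say how the tool rounds) and a hypothetical `split_counts` helper; it also rejects ratios that are negative or don't sum to 100.

```python
import math


def split_counts(n, ratios):
    """Convert percentages such as (80, 10, 10) into row counts summing to n.

    Largest-remainder rounding: floor every share, then give the leftover
    rows to the shares with the largest fractional parts.
    Hypothetical helper; the tool's rounding rule is not documented.
    """
    if any(r < 0 for r in ratios):
        raise ValueError("ratios must be non-negative")
    if not math.isclose(sum(ratios), 100):
        raise ValueError(f"ratios must sum to 100, got {sum(ratios)}")

    exact = [n * r / 100 for r in ratios]
    counts = [math.floor(x) for x in exact]
    leftover = n - sum(counts)

    # The leftover never exceeds the number of shares with a non-zero
    # remainder, so a 0% share (remainder 0) stays at 0
    by_remainder = sorted(range(len(ratios)),
                          key=lambda i: exact[i] - counts[i], reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


print(split_counts(1000, (80, 10, 10)))  # [800, 100, 100]
print(split_counts(7, (90, 10, 0)))      # [6, 1, 0]
```

The second call mirrors the FAQ case of a 0% test ratio: every row lands in train or val.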

Stratified split

For classification or labelled data, stratify by the label key (e.g. label, category). The split keeps the proportion of each label roughly equal in all three sets — important when a class is rare.
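To check that a stratified split kept the label mix, compare per-label shares across the output files. A minimal sketch; `label_shares` and `max_drift` are hypothetical helpers, and `label` stands in for whatever key you stratified on.

```python
import json
from collections import Counter


def label_shares(jsonl_text, key):
    """Return {label: fraction of rows} for the contents of one JSONL file."""
    rows = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    counts = Counter(row.get(key) for row in rows)
    total = sum(counts.values())
    return {label: c / total for label, c in counts.items()}


def max_drift(a, b):
    """Largest absolute difference in any label's share between two splits."""
    labels = set(a) | set(b)
    return max((abs(a.get(label, 0.0) - b.get(label, 0.0)) for label in labels),
               default=0.0)


train_text = '{"label": "spam"}\n{"label": "ham"}\n{"label": "ham"}\n\n{"label": "ham"}\n'
val_text = '{"label": "ham"}\n{"label": "spam"}\n'

train_shares = label_shares(train_text, "label")
val_shares = label_shares(val_text, "label")
print(train_shares)                         # {'spam': 0.25, 'ham': 0.75}
print(max_drift(train_shares, val_shares))  # 0.25
```

With real output, pass each file's text (for example `open("train.jsonl", encoding="utf-8").read()`). Small drifts come from rounding; a label missing entirely from val or test shows up as a drift equal to its full share in the other file.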

Frequently asked questions

Can I do a 90/5/5 or 70/15/15 split?

Yes. Any three ratios that sum to 100 work, and the test ratio can even be 0.

Can I skip the test set?

Yes. Set the test ratio to 0 and the tool produces only train and val.

Is k-fold cross-validation supported?

Not yet — on the roadmap.
