JSONL Generator
JSONL generator. Generate synthetic JSONL (JSON Lines) data from a tiny schema. Useful for testing pipelines, mocking APIs and stress-testing fine-tuning code without real data. Runs in your browser.
Schema
Before you start
The generator is for synthetic test data — not real data. It's reproducible: the same Seed always produces the same output, so you can pin your fixtures.
How to use it
- Edit the Schema: one line per field as
name: type. Types accept arguments in parens. - Set how many records you want.
- Optionally pin a Seed to make output reproducible.
- Click Generate, then Copy or Download.
Field types
uuid— RFC-4122-ish v4 UUID stringint(min, max)— integer in[min, max]inclusive. Default0..100.float(min, max)— float to 4 decimal places.bool— random true/false.name,firstName,lastName— built-in name pool.email— [email protected] style.url— random URL on a built-in domain pool.date(year_from, year_to)— ISO dateYYYY-MM-DD.datetime— ISO 8601 timestamp.lorem(word_count)— placeholder text.choice(a, b, c, …)— pick one literal value.regex(pattern)— tiny generator covering\d,\wand character classes like[A-Z].
Example
Schema:
id: uuid
user: name
email: email
age: int(18, 80)
plan: choice(free, pro, team)
signed_up: date(2023, 2026)
Output (seed 42, 2 records):
{"id":"d36f...","user":"Maya Jones","email":"[email protected]","age":31,"plan":"pro","signed_up":"2024-08-14"}
{"id":"4b21...","user":"Hugo Lee","email":"[email protected]","age":67,"plan":"free","signed_up":"2025-12-02"}
Tips & common pitfalls
- Nested fields use dot notation.
profile.bio: lorem(10)produces{"profile": {"bio": "lorem ipsum…"}}. - Always pin the seed when committing fixtures, so two devs get the same data.
- Pair with the validators. Generate a chat-shaped dataset, then run it through the matching OpenAI / Anthropic validator to confirm the shape before wiring it into your training script.
- 500k cap. Above that we run out of browser memory. For huge synthetic sets, generate in chunks or use a CLI tool.
Frequently asked questions
Is the data real?
No — purely synthetic. No external API calls, no real PII. Safe to use in public demos.
Can I generate fine-tune chat samples?
Indirectly. Define a flat schema for prompts and answers, then wrap with messages arrays in a tiny post-processing step. A dedicated visual editor exists for hand-authoring chat rows.
Why does the same seed not match other tools?
The seed only matches inside this generator. We use a small mulberry32 PRNG; other tools use different algorithms.