JSONL Token Counter
Estimate token totals and per-record token counts across a JSONL file, with a model picker and fine-tune cost estimate. Flags records that exceed the chosen model's context window before you waste a training run on them.
How the estimate works
Real tokenizers (OpenAI's tiktoken, Anthropic's tokenizer) require shipping a
~2 MB WASM blob to the browser. To keep this tool fast and offline-capable, it uses a
characters-per-token heuristic calibrated per model family:
- GPT-family models — ~4.0 chars/token for typical English chat data.
- Claude models — ~3.6 chars/token; Claude's tokenizer splits a bit more aggressively on punctuation.
In practice, the estimate is within a few percent of tiktoken on natural-language
English. Code, non-Latin scripts, and emoji-heavy text produce more tokens per
character under every tokenizer; for those, treat the number as a lower bound and
add ~30% headroom for budgeting.
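For concreteness, here is a minimal TypeScript sketch of the heuristic. The ratios are the ones listed above; the function and constant names are illustrative, not the tool's actual code:

```ts
// Chars-per-token ratios calibrated per model family (from the list above).
const CHARS_PER_TOKEN = {
  gpt: 4.0,    // GPT-family, typical English chat data
  claude: 3.6, // Claude's tokenizer splits a bit more aggressively on punctuation
} as const;

type ModelFamily = keyof typeof CHARS_PER_TOKEN;

// Round up so the estimate errs toward an upper bound.
function estimateTokens(text: string, family: ModelFamily): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN[family]);
}
```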
Fine-tune cost estimate
For OpenAI models that have a published fine-tune training price, the tool computes
cost = total_tokens × epochs × price_per_1M ÷ 1,000,000 (see the sketch after the
pricing list below). For models without published training pricing
(Claude, GPT-4 Turbo currently), the cost row shows a note instead of a fake number.
Pricing reference (subject to change — check the provider's pricing page before committing):
- GPT-4o: $25 / 1M training tokens
- GPT-4o mini: $3 / 1M training tokens
- GPT-3.5 Turbo: $8 / 1M training tokens
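A sketch of the cost math in TypeScript, assuming the prices above. The table keys and function name are hypothetical; a null price renders as a note rather than a number:

```ts
// USD per 1M training tokens; null = no published fine-tune price.
const TRAINING_PRICE_PER_1M: Record<string, number | null> = {
  "gpt-4o": 25,
  "gpt-4o-mini": 3,
  "gpt-3.5-turbo": 8,
  "claude": null, // shows a note instead of a fake number
};

function trainingCostUSD(totalTokens: number, epochs: number, model: string): number | null {
  const pricePer1M = TRAINING_PRICE_PER_1M[model];
  if (pricePer1M == null) return null; // unknown or unpublished price
  return (totalTokens * epochs * pricePer1M) / 1_000_000;
}
```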
Context-window check
Each model's per-request context window is fixed (e.g., 128k for GPT-4o, 200k for Claude). Records longer than that will fail at training or inference time. The summary flags how many records in your file exceed the limit so you can split or trim them before submitting — the JSONL Splitter won't help here (records stay intact), but the JSONL Viewer can help you find the long ones.
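Continuing the earlier sketch (it reuses estimateTokens; the window sizes are the ones quoted above, everything else is illustrative):

```ts
// Per-request context windows quoted above (tokens).
const CONTEXT_WINDOW: Record<string, number> = {
  "gpt-4o": 128_000,
  "claude": 200_000,
};

// Count JSONL records whose estimated length exceeds the model's window.
function countOversized(lines: string[], family: ModelFamily, model: string): number {
  return lines.filter((line) => estimateTokens(line, family) > CONTEXT_WINDOW[model]).length;
}
```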
Tips & common pitfalls
- Estimate the message body, not the JSON wrapper. This tool counts the whole line as-is, which slightly overcounts: keys like "role" and "content" aren't actually tokenized as part of the model input. For chat fine-tunes the overcount is consistent (a few percent), so treat it as a safe upper bound.
- Epochs matter for cost, not for context. Every epoch trains on the full file, so cost scales linearly with epochs. The default is 3 because that's what OpenAI's fine-tune jobs use unless you override it.
- Run after dedup. Duplicate examples pay tokens twice and hurt fine-tune quality. Use the JSONL Deduplicator first.
- Validate the structure first. If you're fine-tuning, confirm the file passes the OpenAI Fine-Tune Validator before estimating cost — invalid records get rejected without refund.
Example
Input — a small fine-tune dataset:
{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}
{"messages":[{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4."}]}
At 3 epochs on GPT-4o mini, the summary will show ~50 tokens × 3 = 150 training tokens, costing fractions of a cent. Real datasets are millions of tokens; the estimate scales linearly.
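Plugging the two records into the earlier sketches reproduces roughly the numbers above (exact values depend on rounding):

```ts
const records = [
  '{"messages":[{"role":"user","content":"Hi"},{"role":"assistant","content":"Hello!"}]}',
  '{"messages":[{"role":"user","content":"What is 2+2?"},{"role":"assistant","content":"4."}]}',
];

// ~176 chars total / 4.0 chars per token ≈ 45 tokens, the same ballpark as "~50" above.
const total = records.reduce((sum, line) => sum + estimateTokens(line, "gpt"), 0);

// 45 tokens × 3 epochs × $3 / 1M ≈ $0.0004: fractions of a cent.
console.log(total, trainingCostUSD(total, 3, "gpt-4o-mini"));
```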
Frequently asked questions
Why not use the real tokenizer?
It's a tradeoff. The real tokenizer is more accurate but adds a ~2 MB WASM download. For budgeting fine-tunes the estimate is close enough — within a few percent on English chat data — and the tool stays instant on big files.
Does this work for inference cost (input/output token billing)?
Not directly — fine-tune training pricing and inference pricing differ. This tool focuses on training. For inference cost, multiply the input-token total by the published per-million input price for the model.
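In sketch form, under the same naming assumptions as the snippets above:

```ts
// Inference input cost: look up the per-1M input price on the provider's pricing page.
const inferenceInputCostUSD = (inputTokens: number, pricePer1M: number) =>
  (inputTokens * pricePer1M) / 1_000_000;
```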
Why don't all Claude / GPT-4 Turbo entries show a price?
Anthropic doesn't publish a self-serve fine-tune price for Claude as of the date on this page; GPT-4 Turbo's fine-tune isn't generally available. Rather than make up numbers, those rows show "—".
Is my data sent to a server?
Never. Counting happens in your browser. See the privacy policy.