Japanese Tokenizer Comparison

This demo compares Japanese tokenizers. It shows, side by side, the tokenization results of tools that can each be installed in Python with nothing more than pip install.

Results

Janome - pip install janome

nagisa - pip install nagisa

SudachiPy - pip install sudachipy sudachidict_core

mecab-python3 - pip install mecab-python3

fugashi (IPAdic) - pip install fugashi ipadic

fugashi (UniDic) - pip install fugashi unidic-lite

tiktoken (GPT-4o) - pip install tiktoken

tiktoken (GPT-5) - pip install tiktoken
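
The page does not show how the demo calls each library, so the following is only a minimal sketch of how such a comparison could be wired up for the six morphological analyzers listed above. The names LOADERS, load_available and compare are hypothetical, and each loader uses the basic usage documented by its library. The sketch assumes that mecab-python3 can find a dictionary package (for example unidic-lite from the list) and skips any tool that fails to load.

```python
from typing import Callable, Dict, List

Tokenize = Callable[[str], List[str]]


def _janome() -> Tokenize:
    from janome.tokenizer import Tokenizer
    t = Tokenizer()
    return lambda text: [tok.surface for tok in t.tokenize(text)]


def _nagisa() -> Tokenize:
    import nagisa
    return lambda text: list(nagisa.tagging(text).words)


def _sudachipy() -> Tokenize:
    from sudachipy import dictionary, tokenizer
    t = dictionary.Dictionary().create()
    mode = tokenizer.Tokenizer.SplitMode.C  # longest split units
    return lambda text: [m.surface() for m in t.tokenize(text, mode)]


def _mecab() -> Tokenize:
    import MeCab
    tagger = MeCab.Tagger("-Owakati")  # space-separated surface forms
    return lambda text: tagger.parse(text).split()


def _fugashi_ipadic() -> Tokenize:
    import fugashi
    import ipadic
    tagger = fugashi.GenericTagger(ipadic.MECAB_ARGS)
    return lambda text: [w.surface for w in tagger(text)]


def _fugashi_unidic() -> Tokenize:
    import fugashi
    tagger = fugashi.Tagger()  # picks up unidic-lite when it is installed
    return lambda text: [w.surface for w in tagger(text)]


# Hypothetical registry: display name -> function that builds a tokenizer.
LOADERS: Dict[str, Callable[[], Tokenize]] = {
    "Janome": _janome,
    "nagisa": _nagisa,
    "SudachiPy": _sudachipy,
    "mecab-python3": _mecab,
    "fugashi (IPAdic)": _fugashi_ipadic,
    "fugashi (UniDic)": _fugashi_unidic,
}


def load_available(
    loaders: Dict[str, Callable[[], Tokenize]] = LOADERS,
) -> Dict[str, Tokenize]:
    """Build every tokenizer that loads; skip missing packages or dictionaries."""
    available: Dict[str, Tokenize] = {}
    for name, loader in loaders.items():
        try:
            available[name] = loader()
        except Exception:
            continue
    return available


def compare(text: str, tokenizers: Dict[str, Tokenize]) -> Dict[str, List[str]]:
    """Run the same text through each tokenizer and collect the token lists."""
    return {name: tokenize(text) for name, tokenize in tokenizers.items()}


if __name__ == "__main__":
    sample = "すもももももももものうち"
    for name, tokens in compare(sample, load_available()).items():
        print(f"{name}: {' | '.join(tokens)}")
```

Building each tokenizer once and reusing it matters in practice, since loading a dictionary is usually far slower than tokenizing a sentence.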

How to install each library

Janome:

pip install janome

nagisa:

pip install nagisa

SudachiPy:

pip install sudachipy sudachidict_core

mecab-python3:

pip install mecab-python3

fugashi (IPAdic):

pip install fugashi ipadic

fugashi (UniDic):

pip install fugashi unidic-lite

tiktoken (GPT-4o):

pip install tiktoken

tiktoken (GPT-5):

pip install tiktoken
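
The two tiktoken entries differ from the other tools: tiktoken is a byte-level BPE tokenizer for OpenAI models, not a morphological analyzer, so a single token may end in the middle of a multi-byte Japanese character. The sketch below shows one way to list the byte pieces of each token. The helper names token_pieces and render_piece are hypothetical. o200k_base is the encoding tiktoken uses for GPT-4o; that the GPT-5 entry uses the same encoding is an assumption here, and tiktoken.encoding_for_model in your installed version is the place to check.

```python
from typing import List


def token_pieces(text: str, encoding_name: str = "o200k_base") -> List[bytes]:
    """Return the raw bytes of every token tiktoken produces for text."""
    import tiktoken  # imported lazily so render_piece works without it

    enc = tiktoken.get_encoding(encoding_name)
    return [enc.decode_single_token_bytes(t) for t in enc.encode(text)]


def render_piece(piece: bytes) -> str:
    """Show a token as text, or as hex bytes if it is not valid UTF-8 alone."""
    try:
        return piece.decode("utf-8")
    except UnicodeDecodeError:
        return "<" + " ".join(f"0x{b:02X}" for b in piece) + ">"
```

Calling [render_piece(p) for p in token_pieces("日本語のテキスト")] gives a display list comparable to the word lists of the other tools, with partial characters shown as hex bytes.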