The `kuromoji_tokenizer` tokenizer uses characters from the MeCab-IPADIC dictionary to split text into tokens. The dictionary includes some full-width characters, such as `ｏ` and `ｆ`. If a text contains full-width characters, the tokenizer can produce unexpected tokens.
For example, the `kuromoji_tokenizer` tokenizer converts the text `Ｃｕｌｔｕｒｅ ｏｆ Ｊａｐａｎ` to the tokens `[ culture, o, f, japan ]` instead of `[ culture, of, japan ]`.
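You can reproduce this behavior directly with the `_analyze` API. The following request is a minimal sketch that runs the tokenizer on a full-width sample string; it assumes the `analysis-kuromoji` plugin is installed:

```console
POST _analyze
{
  "tokenizer": "kuromoji_tokenizer",
  "text": "Ｃｕｌｔｕｒｅ ｏｆ Ｊａｐａｎ"
}
```

The response lists `o` and `f` as separate tokens rather than a single `of` token.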
To avoid this, add the [`icu_normalizer` character filter](/reference/elasticsearch-plugins/analysis-icu-normalization-charfilter.md) to a custom analyzer based on the `kuromoji` analyzer. The `icu_normalizer` character filter converts full-width characters to their normal equivalents.
First, duplicate the `kuromoji` analyzer to create the basis for a custom analyzer. Then add the `icu_normalizer` character filter to the custom analyzer. For example:
```console
PUT index-00001
{
  "settings": {
    "index": {
      "analysis": {
        "analyzer": {
          "kuromoji_normalize": { <1>
            "char_filter": [
              "icu_normalizer" <2>
            ],
            "tokenizer": "kuromoji_tokenizer",
            "filter": [
              "kuromoji_baseform",
              "kuromoji_part_of_speech",
              "cjk_width",
              "ja_stop",
              "kuromoji_stemmer",
              "lowercase"
            ]
          }
        }
      }
    }
  }
}
```
1. Creates a new custom analyzer, `kuromoji_normalize`, based on the `kuromoji` analyzer.
2. Adds the `icu_normalizer` character filter to the analyzer.
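To check the result, you can run the custom analyzer against the new index with the `_analyze` API. This sketch assumes the `analysis-icu` and `analysis-kuromoji` plugins are installed and that the index was created with the settings above:

```console
GET index-00001/_analyze
{
  "analyzer": "kuromoji_normalize",
  "text": "Ｃｕｌｔｕｒｅ ｏｆ Ｊａｐａｎ"
}
```

Because the `icu_normalizer` character filter converts the full-width characters to their normal equivalents before tokenization, `ｏｆ` is no longer split into separate `o` and `f` tokens.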