Wolfram Research

Function Repository Resource:

KanjiQ

Test if a string is composed of kanji characters

Contributed by: Richard Hennigan (Wolfram Research)

ResourceFunction["KanjiQ"][string]

yields True if all the characters in string are kanji characters, and yields False otherwise.

Details and Options

Kanji (漢字) are the adopted logographic Chinese characters that are used in the Japanese writing system. They are used alongside the Japanese syllabic scripts hiragana and katakana.
ResourceFunction["KanjiQ"][string] gives False by default if string contains any whitespace or punctuation characters.
ResourceFunction["KanjiQ"] has the following options:
IgnorePunctuation     False    whether to ignore PunctuationCharacter in the string
"IgnoreWhitespace"    False    whether to ignore WhitespaceCharacter in the string
ResourceFunction["KanjiQ"] automatically threads over lists.
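
The behavior described above can be approximated with a short sketch. This is not the repository's source code: kanjiCodeQ and kanjiSketchQ are hypothetical names, both options are spelled as strings for simplicity, and the code point ranges are assumed from the Possible Issues section of this page.

```wolfram
(* Hypothetical sketch, not the actual KanjiQ implementation. *)
(* Code point ranges assumed from the Possible Issues section. *)
kanjiCodeQ[c_Integer] :=
  16^^3400 <= c <= 16^^4DB6 || 16^^4E00 <= c <= 16^^9FAF

Options[kanjiSketchQ] = {"IgnorePunctuation" -> False, "IgnoreWhitespace" -> False};

kanjiSketchQ[str_String, OptionsPattern[]] :=
  Module[{s = str},
    (* Optionally drop punctuation and whitespace before testing *)
    If[TrueQ[OptionValue["IgnorePunctuation"]],
      s = StringDelete[s, PunctuationCharacter]];
    If[TrueQ[OptionValue["IgnoreWhitespace"]],
      s = StringDelete[s, WhitespaceCharacter]];
    (* AllTrue gives True for an empty list, so "" yields True *)
    AllTrue[ToCharacterCode[s], kanjiCodeQ]
  ]

(* Thread over lists of strings *)
kanjiSketchQ[list_List, opts : OptionsPattern[]] :=
  kanjiSketchQ[#, opts] & /@ list
```

With these definitions, kanjiSketchQ["間隔 文字"] should give False, while kanjiSketchQ["間隔 文字", "IgnoreWhitespace" -> True] should give True, since the space is removed before the characters are tested.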

Examples

Basic Examples

Test whether a character is kanji:

In[1]:=
ResourceFunction["KanjiQ"]["字"]
Out[1]=
In[2]:=
ResourceFunction["KanjiQ"]["a"]
Out[2]=
In[3]:=
ResourceFunction["KanjiQ"]["あ"]
Out[3]=

Test if a string contains only kanji characters:

In[4]:=
ResourceFunction["KanjiQ"]["漢字"]
Out[4]=
In[5]:=
ResourceFunction["KanjiQ"]["かんじ"]
Out[5]=
In[6]:=
ResourceFunction["KanjiQ"]["kanji"]
Out[6]=
In[7]:=
ResourceFunction["KanjiQ"]["カンジ"]
Out[7]=

Scope

KanjiQ yields False when the string contains spaces:

In[8]:=
ResourceFunction["KanjiQ"]["間隔 文字"]
Out[8]=
False
In[9]:=
ResourceFunction["KanjiQ"]["間隔文字"]
Out[9]=

KanjiQ yields False when the string contains punctuation characters:

In[10]:=
ResourceFunction["KanjiQ"]["句読点?"]
Out[10]=
False

KanjiQ threads over lists:

In[11]:=
ResourceFunction["KanjiQ"][{"字", "あ", "日本語", "ア", "二"}]
Out[11]=

Options

IgnorePunctuation

By default, any character that matches PunctuationCharacter causes KanjiQ to yield False:

In[12]:=
ResourceFunction["KanjiQ"]["句読点?"]
Out[12]=
False

Ignore punctuation:

In[13]:=
ResourceFunction["KanjiQ"]["句読点?", IgnorePunctuation -> True]
Out[13]=

IgnoreWhitespace

By default, any character that matches WhitespaceCharacter causes KanjiQ to yield False:

In[14]:=
ResourceFunction["KanjiQ"]["間隔 文字"]
Out[14]=
False

Ignore whitespace characters:

In[15]:=
ResourceFunction["KanjiQ"]["間隔 文字", "IgnoreWhitespace" -> True]
Out[15]=

Properties and Relations

The empty string yields True, since it contains no non-kanji characters:

In[16]:=
ResourceFunction["KanjiQ"][""]
Out[16]=
True

Get a list of kanji characters:

In[17]:=
Select[FromCharacterCode /@ Range[0, 65535], ResourceFunction["KanjiQ"]]
Out[17]=

Test if a character name corresponds to a kanji character:

In[18]:=
kanjiNameQ = ResourceFunction["KanjiQ"]@*ResourceFunction["FromCharacterName"]
Out[18]=
In[19]:=
kanjiNameQ["CJKUnifiedIdeograph5B57"]
Out[19]=
In[20]:=
kanjiNameQ["HiraganaLetterA"]
Out[20]=

Possible Issues

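Although there are over 50,000 kanji in existence[1], KanjiQ recognizes only characters in the Unicode code point ranges U+3400–U+4DB6 and U+4E00–U+9FAF, which fall within the CJK Unified Ideographs Extension A and CJK Unified Ideographs blocks: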

In[21]:=
FromCharacterCode /@ Join[
  Range[FromDigits["3400", 16], FromDigits["4DB6", 16]],
  Range[FromDigits["4E00", 16], FromDigits["9FAF", 16]]
  ]
Out[21]=
In[22]:=
% === Select[FromCharacterCode /@ Range[0, 65535], ResourceFunction["KanjiQ"]]
Out[22]=
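
For a sense of scale, the size of the two ranges can be worked out directly from their endpoints. The variable names below are illustrative only; 16^^ marks a hexadecimal literal.

```wolfram
(* Number of code points in each range, endpoints inclusive *)
extensionACount = 16^^4DB6 - 16^^3400 + 1;  (* 6583 *)
unifiedCount = 16^^9FAF - 16^^4E00 + 1;     (* 20912 *)

extensionACount + unifiedCount
(* 27495 *)
```

The ranges therefore cover 27,495 code points, roughly half of the 50,000 figure quoted above.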

Neat Examples

Synthesize speech for a piece of text, using a Japanese voice for the Japanese portions:

In[23]:=
text = "This text is in English. このテキストは日本語です。"
Out[23]=
In[24]:=
voice = First[VoiceStyleData[#Language == "Japanese" &]]
Out[24]=
In[25]:=
japaneseQ = Or[
   ResourceFunction["KanjiQ"][#],
   ResourceFunction["HiraganaQ"][#],
   ResourceFunction["KatakanaQ"][#]
   ] &
Out[25]=
In[26]:=
Replace[StringSplit[text, h : (Longest[__?(japaneseQ)] ~~ PunctuationCharacter) :> SpeechSynthesize[h, voice]], s_String :> SpeechSynthesize[s], {1}]
Out[26]=
In[27]:=
AudioPlay[AudioJoin[%]]
Out[27]=
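
The segmentation step can be inspected on its own, without producing audio. The following is a rough sketch under stated assumptions: japaneseCodeQ and segmentByScript are hypothetical helpers, and the hiragana and katakana tests are approximated by the Unicode Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF) blocks, which may not match exactly what HiraganaQ and KatakanaQ accept.

```wolfram
(* Hypothetical stand-in for japaneseQ, based on code point ranges *)
japaneseCodeQ[c_Integer] := Or[
  16^^3040 <= c <= 16^^309F, (* Hiragana block (assumed) *)
  16^^30A0 <= c <= 16^^30FF, (* Katakana block (assumed) *)
  16^^3400 <= c <= 16^^4DB6, (* kanji ranges from Possible Issues *)
  16^^4E00 <= c <= 16^^9FAF
]

(* Group consecutive characters by whether they are Japanese *)
segmentByScript[text_String] :=
  FromCharacterCode /@ SplitBy[ToCharacterCode[text], japaneseCodeQ]

segmentByScript["abc漢字かな"]
(* {"abc", "漢字かな"} *)
```

Unlike the StringSplit pattern above, this sketch does not attach trailing punctuation to the Japanese segment, so a character such as 。 would end up in a segment of its own.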

Requirements

Wolfram Language 11.3 (March 2018) or above
