Function Repository Resource:

KanjiQ


Test if a string is composed of kanji characters

Contributed by: Richard Hennigan (Wolfram Research)

ResourceFunction["KanjiQ"][string]

yields True if all the characters in string are kanji characters, and yields False otherwise.

Details and Options

Kanji (漢字) are logographic characters adopted from Chinese for use in the Japanese writing system, where they appear alongside the syllabic scripts hiragana and katakana.
ResourceFunction["KanjiQ"][string] by default gives False if string contains any space or punctuation characters.
ResourceFunction["KanjiQ"] has the following options:
IgnorePunctuation     False    whether to ignore characters matching PunctuationCharacter in the string
"IgnoreWhitespace"    False    whether to ignore characters matching WhitespaceCharacter in the string
ResourceFunction["KanjiQ"] automatically threads over lists.

Examples

Basic Examples (2) 

Test whether a character is kanji:

In[1]:=
ResourceFunction["KanjiQ"]["字"]
Out[1]=
In[2]:=
ResourceFunction["KanjiQ"]["a"]
Out[2]=
In[3]:=
ResourceFunction["KanjiQ"]["あ"]
Out[3]=

Test if a string contains only kanji characters:

In[4]:=
ResourceFunction["KanjiQ"]["漢字"]
Out[4]=
In[5]:=
ResourceFunction["KanjiQ"]["かんじ"]
Out[5]=
In[6]:=
ResourceFunction["KanjiQ"]["kanji"]
Out[6]=
In[7]:=
ResourceFunction["KanjiQ"]["カンジ"]
Out[7]=

Scope (3) 

KanjiQ yields False when the string contains spaces:

In[8]:=
ResourceFunction["KanjiQ"]["間隔 文字"]
Out[8]=
In[9]:=
ResourceFunction["KanjiQ"]["間隔文字"]
Out[9]=

KanjiQ yields False when the string contains punctuation characters:

In[10]:=
ResourceFunction["KanjiQ"]["句読点?"]
Out[10]=

KanjiQ threads over lists:

In[11]:=
ResourceFunction["KanjiQ"][{"字", "あ", "日本語", "ア", "二"}]
Out[11]=

Options (4) 

IgnorePunctuation (2) 

By default, the presence of characters that match PunctuationCharacter will cause KanjiQ to yield False:

In[12]:=
ResourceFunction["KanjiQ"]["句読点?"]
Out[12]=

Ignore punctuation:

In[13]:=
ResourceFunction["KanjiQ"]["句読点?", IgnorePunctuation -> True]
Out[13]=

IgnoreWhitespace (2) 

By default, the presence of characters that match WhitespaceCharacter will cause KanjiQ to yield False:

In[14]:=
ResourceFunction["KanjiQ"]["間隔 文字"]
Out[14]=

Ignore whitespace characters:

In[15]:=
ResourceFunction["KanjiQ"]["間隔 文字", "IgnoreWhitespace" -> True]
Out[15]=
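
The two options are shown separately above; a call that sets both at once might look like the following. This is a sketch that assumes the options can be combined in a single call, which this page does not demonstrate, so no output is given.

```wolfram
(* Assumption: IgnorePunctuation and "IgnoreWhitespace" may be passed together *)
ResourceFunction["KanjiQ"]["間隔 文字?",
 IgnorePunctuation -> True, "IgnoreWhitespace" -> True]
```

With both options set, only the characters other than the space and the question mark would be tested.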

Properties and Relations (3) 

The empty string will yield True:

In[16]:=
ResourceFunction["KanjiQ"][""]
Out[16]=

Get a list of kanji characters:

In[17]:=
Select[FromCharacterCode /@ Range[0, 65535], ResourceFunction[
 "KanjiQ"]]
Out[17]=
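
The same Select idiom can be applied to the characters of a mixed string to pick out its kanji. The sample sentence is only an illustration; given the behavior documented on this page, the hiragana, katakana and punctuation characters would be dropped, leaving 日, 本 and 語.

```wolfram
(* Split the string into single-character strings, then keep those that pass KanjiQ *)
Select[Characters["このテキストは日本語です。"], ResourceFunction["KanjiQ"]]
```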

Test if a character name corresponds to a kanji character:

In[18]:=
kanjiNameQ = ResourceFunction["KanjiQ"]@*ResourceFunction["FromCharacterName"]
Out[18]=
In[19]:=
kanjiNameQ["CJKUnifiedIdeograph5B57"]
Out[19]=
In[20]:=
kanjiNameQ["HiraganaLetterA"]
Out[20]=

Possible Issues (1) 

Although there are over 50,000 kanji in existence[1], KanjiQ only identifies characters in the Unicode code point ranges U+3400–U+4DB6 and U+4E00–U+9FAF:

In[21]:=
FromCharacterCode /@ Join[
  Range[FromDigits["3400", 16], FromDigits["4DB6", 16]],
  Range[FromDigits["4E00", 16], FromDigits["9FAF", 16]]
  ]
Out[21]=
In[22]:=
% === Select[FromCharacterCode /@ Range[0, 65535], ResourceFunction[
  "KanjiQ"]]
Out[22]=
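
The range check described above can be sketched as a standalone predicate. The name kanjiRangeQ is hypothetical and this is not the repository implementation; it only assumes the two hexadecimal ranges quoted above, and it ignores the options and list threading that KanjiQ provides.

```wolfram
(* Hypothetical helper: True if every character code of the string lies
   in 3400–4DB6 or 4E00–9FAF (hexadecimal) *)
kanjiRangeQ[s_String] := AllTrue[
  ToCharacterCode[s],
  (16^^3400 <= # <= 16^^4DB6) || (16^^4E00 <= # <= 16^^9FAF) &
 ]

kanjiRangeQ["漢字"]    (* True: U+6F22 and U+5B57 both lie in the second range *)
kanjiRangeQ["かんじ"]  (* False: hiragana lie outside both ranges *)
kanjiRangeQ[""]        (* True: AllTrue gives True for an empty list *)
```

Characters outside the Basic Multilingual Plane, such as those in the later CJK extension blocks, fall outside both ranges and would therefore give False in this sketch.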

Neat Examples (1) 

Use SpeechSynthesize to speak a piece of text, with a Japanese voice for the Japanese portions:

In[23]:=
text = "This text is in English. このテキストは日本語です。"
Out[23]=
In[24]:=
voice = First[VoiceStyleData[#Language == "Japanese" &]]
Out[24]=
In[25]:=
japaneseQ = Or[
   ResourceFunction["KanjiQ"][#],
   ResourceFunction["HiraganaQ"][#],
   ResourceFunction["KatakanaQ"][#]
   ] &
Out[25]=
In[26]:=
Replace[
 StringSplit[text,
  h : (Longest[__?(japaneseQ)] ~~ PunctuationCharacter) :>
   SpeechSynthesize[h, voice]],
 s_String :> SpeechSynthesize[s],
 {1}
 ]
Out[26]=
In[27]:=
AudioPlay[AudioJoin[%]]
Out[27]=

Requirements

Wolfram Language 11.3 (March 2018) or above

Version History

  • 1.0.0 – 04 March 2019
