aspell: Unsupported

1 
1 B.2 Unsupported
1 ===============
1 
1 These languages, when written in the given script, are currently
1 unsupported by Aspell for one reason or another.
1 
1 Code   Language Name   Script
1 ja     Japanese        Japanese
1 km     Khmer           Khmer
1 ko     Korean          Han, Hangul
1 lo     Lao             Lao
1 th     Thai            Thai
1 zh     Chinese         Han
1 
1 B.2.1 The Thai, Khmer, and Lao Scripts
1 --------------------------------------
1 
1 The Thai, Khmer, and Lao scripts present a different problem for
1 Aspell.  The problem is not that there are more than 210 unique symbols,
1 but that there are no spaces between words.  This means that there is no
1 easy way to split a sentence into individual words.  It is still
1 possible to spell check these scripts; it is just a lot more difficult.
1 I will be happy to work with someone who is interested in adding Thai,
1 Khmer, or Lao support to Aspell, but it is not likely something I will
1 do on my own in the foreseeable future.
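
To illustrate why missing word boundaries make tokenization hard, here
is a minimal sketch in Python.  It is not how Aspell works: the function
name `segmentations` and the toy word list are hypothetical, and Latin
letters stand in for Thai, Khmer, or Lao text.  It simply lists every
way an unspaced string can be split into words from a dictionary.

```python
def segmentations(text, words):
    """Return every way to split `text` into entries of `words`.

    Hypothetical helper for illustration only; not part of Aspell.
    """
    if not text:
        return [[]]  # exactly one way to split the empty string
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in words:
            # Keep this word and segment the remainder recursively.
            for rest in segmentations(text[end:], words):
                results.append([head] + rest)
    return results


# Toy dictionary; Latin letters stand in for an unspaced script.
toy_words = {"no", "now", "here", "where", "nowhere"}
print(segmentations("nowhere", toy_words))
# [['no', 'where'], ['now', 'here'], ['nowhere']]
```

Even this tiny input has three valid readings, so a real word splitter
would need some additional way to choose among them before any spell
checking could begin.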
1 
1 B.2.2 Languages which use Hànzi Characters
1 ------------------------------------------
1 
1 Hànzi characters are used to write Chinese, Japanese, and Korean, and
1 were once used to write Vietnamese.  Each hànzi character represents a
1 syllable of a spoken word and also has a meaning.  Since there are
1 around 3,000 of them in common usage, it is unlikely that Aspell will
1 be able to support spell checking languages written using hànzi until
1 full Unicode support is implemented.  However, I am not even sure
1 whether these languages need spell checking, since hànzi characters are
1 generally not entered directly.  Furthermore, even if Aspell could
1 spell check hànzi, the existing suggestion strategy would not work well
1 at all, and thus a completely new strategy would need to be developed.
1 However, if hànzi does need to be spell checked and you know something
1 about the issues involved, please feel free to contact me.
1 
1 B.2.3 Japanese
1 --------------
1 
1 Modern Japanese is written in a mixture of "hiragana", "katakana",
1 "kanji", and sometimes "romaji".  "Hiragana" and "katakana" are both
1 syllabaries unique to Japan, "kanji" is a modified form of hànzi, and
1 "romaji" uses the Latin alphabet.  With some work, Aspell should be
1 able to check the non-kanji part of Japanese text.  However, based on
1 my limited understanding of Japanese, hiragana is often used at the end
1 of kanji.  Thus, if Aspell were to simply separate out the hiragana from
1 the kanji, it would end up with a lot of word endings which are not
1 proper words and would thus be flagged as misspellings.  However, this
1 can be fairly easily rectified, as text is tokenized into words before
1 it is converted into Aspell's internal encoding.  In fact, some Japanese
1 text is written entirely in one script.  For example, books for children
1 and foreigners are sometimes written entirely in hiragana.  Thus,
1 Aspell, in its current state, could prove at least somewhat useful for
1 spell checking Japanese.
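
The word-ending problem described above can be seen with a small Python
sketch that splits text into runs of the same script.  This is only an
illustration under simplifying assumptions, not Aspell's tokenizer: the
names `script_of` and `script_runs` are hypothetical, and only the basic
Unicode blocks for hiragana, katakana, and CJK Unified Ideographs are
considered.

```python
from itertools import groupby


def script_of(ch):
    """Classify one character by its Unicode block (basic blocks only)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"  # CJK Unified Ideographs, extensions ignored
    return "other"


def script_runs(text):
    """Group consecutive characters of the same script into runs."""
    return [(script, "".join(chars))
            for script, chars in groupby(text, key=script_of)]


# A verb written as one kanji followed by a hiragana ending.
print(script_runs("食べる"))
# [('kanji', '食'), ('hiragana', 'べる')]
```

The hiragana run in this example is only the ending of a word, so a
checker that looked at hiragana runs in isolation would flag it.  Text
written entirely in hiragana comes back as a single run and does not
have this problem.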
1 
1 B.2.4 Hangul
1 ------------
1 
1 Korean is generally written in Hangul or a mixture of Han and Hangul.
1 In Hangul, individual letters, known as jamo, are grouped together in
1 syllable blocks.  Unicode allows Hangul to be stored in one of three
1 ways: (A) individual jamo letters (Hangul Compatibility Jamo, U+3130 -
1 U+318F), (D) decomposed jamo (Hangul Jamo, U+1100 - U+11FF), and (C)
1 precomposed syllable blocks (Hangul Syllables, U+AC00 - U+D7AF).  In
1 order for Aspell to work with Hangul, the text needs to be in form A.
1 Unfortunately, the existing normalization code in Aspell is not able to
1 adequately deal with converting Hangul from forms D and C to form A and
1 back again.  However, once this code is written, Aspell should be able
1 to spell check Hangul without any problem.
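
As a rough sketch of the conversion from forms D and C to form A, the
Python code below first composes decomposed jamo into syllable blocks
with Unicode NFC normalization, then splits each block arithmetically
and maps the parts to Hangul Compatibility Jamo.  This is an assumption
about how such a conversion could look, not Aspell code: the function
name `to_compat_jamo` and the table names are hypothetical.  The
reverse direction (form A back to C) is not shown; it is harder because
form A does not record where one syllable ends and the next begins.

```python
import unicodedata

S_BASE = 0xAC00            # first precomposed syllable
V_COUNT, T_COUNT = 21, 28  # vowels, trailing consonants (incl. "none")
S_COUNT = 19 * V_COUNT * T_COUNT  # 11,172 precomposed syllables

# Compatibility jamo in the order used by the syllable arithmetic.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")


def to_compat_jamo(text):
    """Convert Hangul in form D or C to compatibility jamo (form A)."""
    # NFC composes conjoining jamo (form D) into syllables (form C).
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        index = ord(ch) - S_BASE
        if 0 <= index < S_COUNT:
            lead, rest = divmod(index, V_COUNT * T_COUNT)
            vowel, tail = divmod(rest, T_COUNT)
            out.append(LEADS[lead] + VOWELS[vowel] + TAILS[tail])
        else:
            out.append(ch)  # leave everything else untouched
    return "".join(out)


print(to_compat_jamo("한글"))
# ㅎㅏㄴㄱㅡㄹ
```

The same function accepts form D input, since the normalization step
turns a decomposed sequence such as U+1112 U+1161 U+11AB into the single
syllable U+D55C before it is split.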
1