aspell: The Language Data File


7.1 The Language Data File
==========================

The basic format of the language data file is the same as that of the
Aspell configuration file.  It is named `LANG.dat' and is located in
the architecture-independent data directory for Aspell (option
`data-dir'), which is usually `PREFIX/share/aspell'.  Use
`aspell config' to find out where it is in your installation.  By
convention, the language name should be the two-letter ISO 639
language code if one exists; otherwise, use the three-letter code.

   The language data file has two mandatory fields and several optional
ones.  All field names are case sensitive and should be in all
lowercase.

   The mandatory fields are `name' and `charset'.

   `name' is the name of the language and should be the same as the
file name (without the `.dat').

   `charset' is the 8-bit character set Aspell will expect the word
lists to be formatted in.  If possible, choose one of the standard
character sets provided with Aspell: `iso-8859-*', `koi8-*', or
`viscii'.  If your language does not require any non-ASCII characters,
choose `iso-8859-1'.  If none of these standard character sets is
suitable for your language, you can create a new one
(⇒Creating A New Character Set).

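   Because a language data file is just a list of `field value' lines,
the rules above can be checked mechanically.  The following Python
sketch is not part of Aspell.  It assumes the usual Aspell
configuration-file conventions (one option per line, the field name
first, `#' starting a comment), and the function names `parse_dat' and
`check_dat' are hypothetical.

```python
from pathlib import PurePath

MANDATORY_FIELDS = ("name", "charset")


def parse_dat(text):
    """Parse the text of a LANG.dat file into a {field: value} dict.

    Assumes Aspell config-file conventions: one option per line, the
    field name first, the rest of the line as the value, and '#'
    starting a comment (so '#' is assumed never to occur in a value).
    """
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        parts = line.split(None, 1)
        fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def check_dat(path, text):
    """Return a list of problems with a language data file (empty if none)."""
    fields = parse_dat(text)
    problems = []
    for key in fields:
        if key != key.lower():
            problems.append(f"field {key!r} should be all lowercase")
    for key in MANDATORY_FIELDS:
        if key not in fields:
            problems.append(f"missing mandatory field {key!r}")
    stem = PurePath(path).stem  # "en.dat" -> "en"
    if "name" in fields and fields["name"] != stem:
        problems.append(
            f"name {fields['name']!r} does not match file name {stem!r}")
    return problems
```

   For example, `check_dat("en.dat", "name en\ncharset iso-8859-1\n")'
returns an empty list, while the same text saved as `de.dat' is
reported as a name/file-name mismatch.
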
   The optional fields are as follows:

`data-encoding'
     The encoding the language data files are expected to be in, as
     well as the default encoding to use when saving the personal
     dictionaries.  It can be either `utf-8' or any of the 8-bit
     encodings that Aspell supports.  If not set, it defaults to the
     value of `charset'.

`special'
     Non-letter characters that can appear in words of your language,
     such as `'' and `-'.  The value is a space-separated list in which
     each item has the following format:

          <char> <begin><middle><end>

     CHAR is the non-letter character in question.  BEGIN, MIDDLE, and
     END are each either a `-' or a `*'.  A `*' for BEGIN means that the
     character can begin a word; a `-' means it can't.  The same is true
     for MIDDLE and END.  For example, the entry for `'' in English is:

          ' -*-

     To include more than one special character, list them one after
     another on the same line.  For example, to make both `'' and `-'
     middle characters, use the following line in the language data
     file:

          special ' -*- - -*-

     However, be aware that adding special characters can have
     unintended consequences due to limitations of Aspell.  For example,
     if `-' were accepted as a middle character, then _every_ word
     containing a `-' would be flagged as a spelling error unless that
     exact word is in the dictionary, even if both parts are in the
     dictionary.  Also, having `.' as an end character will cause the
     `.' to become part of any misspelled word, which can get very
     annoying if you misspell a word at the end of a sentence.

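     The `special' value format is simple enough to parse by hand.  The
     following Python sketch, with the hypothetical function name
     `parse_special', turns a value into a table of begin/middle/end
     permissions.  It only illustrates the format described above and
     is not Aspell's own parser.

```python
def parse_special(value):
    """Parse a `special' value such as "' -*- - -*-".

    Returns {char: (begin, middle, end)}, where each flag is True for
    '*' (allowed in that position) and False for '-' (not allowed).
    """
    items = value.split()
    if len(items) % 2 != 0:
        raise ValueError("expected '<char> <begin><middle><end>' pairs")
    table = {}
    # Items alternate: character, then its three-letter flag string.
    for char, flags in zip(items[0::2], items[1::2]):
        if len(char) != 1 or len(flags) != 3 or set(flags) - {"-", "*"}:
            raise ValueError(f"bad entry: {char} {flags}")
        table[char] = tuple(flag == "*" for flag in flags)
    return table
```

     For the two-character example above, both `'' and `-' map to
     `(False, True, False)': allowed only in the middle of a word.
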
`soundslike'
     The name of the soundslike data for the language.  The data is
     expected to be in the file `NAME_phonet.dat'.

     If NAME is `simple', then a very simple soundslike is used.  This
     is not as powerful as a full phonetic soundslike, but it can be
     computed much faster.  (⇒The Simple Soundslike)

     If the soundslike name is `none', or this option is not specified,
     then no soundslike data is used.  The effective soundslike is then
     the word converted to all lowercase and possibly with accents
     stripped, depending on the `store-as' option.  For languages with
     phonetic spelling, the difference in suggestion quality will not be
     very noticeable.  However, for languages with non-phonetic
     spelling, there will be a noticeable difference.  How much you
     notice will depend on the quality of the soundslike data file.  If
     you do not notice much of a difference for a language with
     non-phonetic spelling, that is a good indication that the
     soundslike data is not rough enough, or that the words you are
     trying are not badly misspelled.

`invisible-soundslike'
     Avoid storing the soundslike information with the word; instead,
     compute it as needed.  This option defaults to true if the
     soundslike is `none' or `simple', and to false when a phonetic
     soundslike is used.

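     The defaults for `soundslike' and `invisible-soundslike' can be
     summarized as a small function.  This Python sketch is an
     illustration only: `soundslike_settings' is a hypothetical name,
     it takes field values as plain strings, it assumes boolean values
     are written `true' or `false', and it writes the simple soundslike
     as `simple'.

```python
# Soundslike names that do not use a NAME_phonet.dat file.
NON_PHONETIC = ("none", "simple")


def soundslike_settings(fields):
    """Resolve soundslike-related fields to (name, invisible, data_file).

    `fields` maps field names to string values, e.g. {"soundslike": "en"}.
    """
    name = fields.get("soundslike", "none")  # unset behaves like "none"
    if "invisible-soundslike" in fields:
        # Assumption: booleans are spelled "true" or "false".
        invisible = fields["invisible-soundslike"] == "true"
    else:
        # Default: computed on demand for the cheap soundslikes,
        # stored with the word for phonetic ones.
        invisible = name in NON_PHONETIC
    data_file = None if name in NON_PHONETIC else f"{name}_phonet.dat"
    return name, invisible, data_file
```

     For instance, `{"soundslike": "en"}' resolves to a stored
     (visible) soundslike read from `en_phonet.dat', while an empty
     field set resolves to `none' with the soundslike computed on
     demand.
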
`repl-table'
     ⇒Replacement Tables.

`keyboard'
     The base name of the keyboard definition file to use.  For more
     information, see ⇒Notes on Typo-Analysis.

`sug-split-char'
     The characters to insert between the two words when a misspelled
     word is split into two words for a suggestion.  This is a list
     option.

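     To illustrate what `sug-split-char' controls, the following Python
     sketch builds split suggestions for a run-together misspelling.  It
     is a hypothetical simplification, not Aspell's suggestion
     algorithm: `split_suggestions' is an invented name, the dictionary
     is a plain set, and the split characters passed in are only
     examples.

```python
def split_suggestions(word, dictionary, split_chars):
    """Suggest ways to split `word` into two dictionary words.

    Each valid split is offered once per character in `split_chars`,
    which plays the role of the sug-split-char list.
    """
    suggestions = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        if left in dictionary and right in dictionary:
            suggestions.extend(left + sep + right for sep in split_chars)
    return suggestions


# Prints ['the cat', 'the-cat']
print(split_suggestions("thecat", {"the", "cat"}, [" ", "-"]))
```

     Each character in the list multiplies the number of split
     suggestions, so a short list keeps the suggestion list focused.
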
`affix'
`affix-compress'
`partially-expand'
     ⇒Affix Compression.

`store-as'
     How the words are indexed in the dictionary.  If `stripped', the
     word is indexed in lowercase, de-accented form.  If `lower', the
     word is indexed in lowercase form but with accent information
     still intact.  This only controls how the word is indexed, not how
     it is stored.  The default is `stripped' unless affix compression
     is used.

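     The difference between the two `store-as' modes can be approximated
     in Python.  This sketch uses Unicode decomposition (NFD) to strip
     accents, which is an assumption made for illustration; Aspell
     works from its own character-set data rather than Python's
     `unicodedata' module.  The function name `index_key' is
     hypothetical.  With `soundslike' set to `none', the same form also
     serves as the effective soundslike.

```python
import unicodedata


def index_key(word, store_as="stripped"):
    """Return the form under which `word` is indexed for a store-as mode."""
    key = word.lower()
    if store_as == "stripped":
        # Split accented letters into base letter + combining mark,
        # then drop the marks: "é" -> "e" + U+0301 -> "e".
        decomposed = unicodedata.normalize("NFD", key)
        key = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    elif store_as != "lower":
        raise ValueError(f"unknown store-as value: {store_as!r}")
    return key


print(index_key("Résumé"))           # resume
print(index_key("Résumé", "lower"))  # résumé
```

     Under `stripped', `Résumé', `résumé', and `resume' all map to the
     key `resume'; under `lower', only the first two share a key.
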
`norm-required'
     Should be set to true if your language makes use of private-use
     characters or if Normalization Form C is not the same as full
     composition.

`normalize'

`norm-form'

   Additional options include those that control how run-together words
are handled; they work the same way as in the normal configuration
files.  For more information, see ⇒Controlling the Behavior of
Run-together Words.
