liblouis: Translation Opcodes

2.8 Translation Opcodes
=======================

These opcodes define the braille representations for character
sequences.  Each of them defines an entry within the contraction table.
These entries may be defined in any order except, as noted below, when
they define alternate representations for the same character sequence.

   Each of these opcodes specifies a condition under which the
translation is legal, and each also has a characters operand and a dots
operand.  The text being translated is processed strictly from left to
right, character by character, with the most eligible entry for each
position being used.  If there is more than one eligible entry for a
given position in the text, then the one with the longest character
string is used.  If there is more than one eligible entry for the same
character string, then the one defined first is tested for legality
first.  (This is the only case in which the order of the entries makes a
difference.)

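   The selection rule just described can be sketched in Python.  This
is only an illustration of the rule as stated here, not the liblouis
implementation.  'Entry', 'pick_entry', 'anywhere', 'whole_word' and
'TABLE' are hypothetical names, the dot patterns are placeholders, and
falling back to a shorter entry when a longer one is not legal is an
assumption of the sketch.

```python
from typing import Callable, List, NamedTuple, Optional


class Entry(NamedTuple):
    opcode: str   # e.g. "always" or "word"
    chars: str    # characters operand
    dots: str     # dots operand (placeholder values in this sketch)
    legal: Callable[[str, int, int], bool]  # (text, start, length) -> bool


def anywhere(text: str, start: int, length: int) -> bool:
    """Condition of an 'always'-style entry: legal wherever it matches."""
    return True


def whole_word(text: str, start: int, length: int) -> bool:
    """Simplified 'word' condition: no letter directly before or after."""
    end = start + length
    before = start > 0 and text[start - 1].isalpha()
    after = end < len(text) and text[end].isalpha()
    return not before and not after


def pick_entry(entries: List[Entry], text: str, pos: int) -> Optional[Entry]:
    """Return the entry to use at text[pos], or None if none applies."""
    matching = [(i, e) for i, e in enumerate(entries)
                if text.startswith(e.chars, pos)]
    # Longest characters operand first; equal strings keep definition order.
    matching.sort(key=lambda pair: (-len(pair[1].chars), pair[0]))
    for _, entry in matching:
        if entry.legal(text, pos, len(entry.chars)):
            return entry
    return None


# Hypothetical table: two entries share the string "the".
TABLE = [
    Entry("always", "th", "1456", anywhere),
    Entry("word", "the", "2346", whole_word),
    Entry("always", "the", "1456-15", anywhere),
]
```

   With this table, 'pick_entry(TABLE, "the cat", 0)' selects the
'word' entry because it is defined before the 'always' entry for the
same string, while at position 1 of "other" the 'word' entry is not
legal and the later 'always' entry is used instead.
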
   The characters operand is a sequence or string of characters preceded
and followed by whitespace.  Each character can be entered in the normal
way, or it can be defined as a four-digit hexadecimal number preceded by
'\x'.

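   As an illustration of the '\x' notation, the following Python sketch
expands four-digit hexadecimal escapes into the characters they denote.
The function name 'decode_characters_operand' is hypothetical, and the
sketch handles only the '\x' form described above; any other escape,
such as the '\s' that appears in a later example, is left untouched.

```python
import re

# A backslash, the letter x, then exactly four hexadecimal digits.
_HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{4})")


def decode_characters_operand(operand: str) -> str:
    """Expand each \\xHHHH escape into the character with that code point."""
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), operand)
```
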
   The dots operand defines the braille representation for the
characters operand.  It may also be specified as an equals sign ('=').
This means that the default representation for each character (⇒
Character-Definition Opcodes) within the sequence is to be used.  Note,
however, that the '=' shortcut for dot patterns is deprecated.  Dot
patterns should be written out; otherwise back-translation may not be
correct.

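   To make the dot notation used in the examples of this section
concrete, here is a small Python sketch that turns a written-out dots
operand such as '456-2456' into Unicode braille pattern characters.  It
assumes that cells are separated by hyphens, that each cell lists dot
numbers 1 through 8, and that '0' stands for a blank cell; virtual dots
are not handled.  The name 'dots_to_unicode' is hypothetical, and the
mapping to the Unicode block starting at U+2800 is used only for
display.

```python
def dots_to_unicode(dots: str) -> str:
    """Convert e.g. '456-2456' into Unicode braille pattern characters."""
    cells = []
    for cell in dots.split("-"):
        bits = 0
        for digit in cell:
            if digit == "0":
                continue  # assumed to mean "no dot" (blank cell)
            number = int(digit)  # raises ValueError for non-digits
            if not 1 <= number <= 8:
                raise ValueError(f"unsupported dot {digit!r} in {dots!r}")
            bits |= 1 << (number - 1)  # dot n is bit n-1 in the Unicode block
        cells.append(chr(0x2800 + bits))
    return "".join(cells)
```
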
   In what follows the word 'characters' means a sequence of one or more
consecutive letters between spaces and/or punctuation marks.

'noback opcode ...'
     This is an opcode prefix, that is to say, it modifies the operation
     of the opcode that follows it on the same line.  noback specifies
     that no back-translation is to be done using this line.

          noback always ;\s; 0

'nofor opcode ...'
     This is an opcode prefix which modifies the operation of the opcode
     following it on the same line.  nofor specifies that forward
     translation is not to use the information on this line.

'compbrl characters'
     If the characters are found within a block of text surrounded by
     whitespace, the entire block is translated according to the default
     braille representations defined by the character definition
     opcodes (⇒ Character-Definition Opcodes) if 8-dot computer braille
     is enabled, or according to the dot patterns given in the 'comp6'
     opcode (⇒ comp6 opcode) if 6-dot computer braille is enabled.  For
     example:

          compbrl www translate URLs in computer braille

'comp6 character dots'
     This opcode specifies the translation of characters in 6-dot
     computer braille.  It is necessary because the translation of a
     single character may require more than one cell.  The first operand
     must be a character with a decimal representation from 0 to 255
     inclusive.  The second operand may specify as many cells as
     necessary.  The opcode is somewhat of a misnomer, since any dots,
     not just dots 1 through 6, can be specified.  This even includes
     virtual dots.

'nocont characters'
     Like 'compbrl', except that the string is uncontracted.  The rules
     of the 'prepunc' opcode (⇒ prepunc opcode) and the 'postpunc'
     opcode (⇒ postpunc opcode) are still applied, however.  This is
     useful for specifying that foreign words should not be contracted
     in an entire document.

'replace characters {characters}'
     Replace the first set of characters, no matter where they appear,
     with the second.  Note that the second operand is _NOT_ a dot
     pattern.  It is also optional.  If it is omitted, the character(s)
     in the first operand will be discarded.  This is useful for
     ignoring characters.  It is possible that the "ignored" characters
     may still affect the translation indirectly.  Therefore, it is
     preferable to use the 'correct' opcode (⇒ correct opcode).

'always characters dots'
     Replace the characters with the dot pattern no matter where they
     appear.  Do _NOT_ use an entry such as 'always a 1'.  Use the
     'uplow', 'letter', etc. character definition opcodes instead.  For
     example:

          always world 456-2456 unconditional translation

'repeated characters dots'
     Replace the characters with the dot pattern no matter where they
     appear.  Ignore any consecutive repetitions of the same character
     sequence.  This is useful for shortening long strings of spaces or
     hyphens or periods.  For example:

          repeated --- 36-36-36 shorten separator lines made with hyphens

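     The effect of ignoring consecutive repetitions can be sketched at
     the level of the input text.  This Python fragment only collapses
     runs of the same character sequence into a single occurrence; it
     does not produce dots, and 'collapse_repeats' is a hypothetical
     name, not part of liblouis.

```python
import re


def collapse_repeats(text: str, chars: str) -> str:
    """Keep one occurrence of chars wherever it is repeated back to back."""
    run = re.compile("(?:" + re.escape(chars) + ")+")
    # A callable replacement avoids any escape processing of chars.
    return run.sub(lambda m: chars, text)
```
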
'repword characters dots'
     When characters are encountered, check to see if the word before
     this string matches the word after it.  If so, replace characters
     with dots and eliminate the second word and any word following
     another occurrence of characters that is the same.  This opcode is
     used in Malaysian braille.  In this case the rule is:

          repword - 123456

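     A rough Python sketch of the basic case follows: a word, the
     characters, and the same word again.  It does not cover the
     further repetitions mentioned above.  'apply_repword' is a
     hypothetical name, the sample word is chosen only for
     illustration, and the Unicode character U+283F is used to display
     dots 123456.

```python
import re


def apply_repword(text: str, chars: str, cell: str) -> str:
    """Replace 'word<chars>word' by 'word' followed by the given cell."""
    pattern = re.compile(r"\b(\w+)" + re.escape(chars) + r"\1\b")
    return pattern.sub(lambda m: m.group(1) + cell, text)


FULL_CELL = "\u283f"  # dots 123456 shown as a Unicode braille pattern
```
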
'largesign characters dots'
     Replace the characters with the dot pattern no matter where they
     appear.  In addition, if two words defined as large signs follow
     each other, remove the space between them.  For example, in
     'en-us-g2.ctb' the words 'and' and 'the' are both defined as large
     signs.  Thus, in the phrase 'the cat and the dog' the space would
     be deleted between 'and' and 'the', with the result 'the cat andthe
     dog'.  Of course, 'and' and 'the' would be properly contracted.
     The term 'largesign' is a bit of braille jargon that pleases
     braille experts.

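     The space removal in this example can be reproduced with a short
     Python sketch.  It works on the print text only and leaves the
     contraction itself aside; 'join_large_signs' is a hypothetical
     name, and words are assumed to be separated by single spaces.

```python
from typing import List, Set


def join_large_signs(text: str, large_signs: Set[str]) -> str:
    """Drop the space between two adjacent words that are large signs."""
    joined: List[str] = []
    previous_was_large = False
    for word in text.split(" "):
        is_large = word in large_signs
        if joined and previous_was_large and is_large:
            joined[-1] += word  # glue onto the preceding large sign
        else:
            joined.append(word)
        previous_was_large = is_large
    return " ".join(joined)
```
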
'word characters dots'
     Replace the characters with the dot pattern if they are a word,
     that is, are surrounded by whitespace and/or punctuation.

'syllable characters dots'
     As its name indicates, this opcode defines a "syllable" which must
     be represented by exactly the dot patterns given.  Contractions may
     not cross the boundaries of this "syllable" either from left or
     right.  The character string defined by this opcode need not be a
     lexical syllable, though it usually will be.  The equal sign in the
     following example means that the default representation for each
     character within the sequence is to be used (⇒
     Character-Definition Opcodes):

          syllable horse = sawhorse, horseradish

'nocross characters dots'
     Replace the characters with the dot pattern if the characters are
     all in one syllable (do not cross a syllable boundary).  For this
     opcode to work, a hyphenation table must be included.  If this is
     not done, 'nocross' behaves like the 'always' opcode (⇒ always
     opcode).  For example, if the English Grade 2 table is being used
     and the appropriate hyphenation table has been included, 'nocross
     sh 146' will cause the 'sh' in 'monkshood' not to be contracted.

'joinword characters dots'
     Replace the characters with the dot pattern if they are a word
     which is followed by whitespace and a letter.  In addition, remove
     the whitespace.  For example, 'en-us-g2.ctb' has 'joinword to 235'.
     This means that if the word 'to' is followed by another word, the
     contraction is to be used and the space is to be omitted.  If these
     conditions are not met, the word is translated according to any
     other opcodes that may apply to it.

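     The condition can be approximated with a regular expression, as in
     the following Python sketch.  'apply_joinword' is a hypothetical
     name, a letter is taken to be any Unicode letter, and the Unicode
     character U+2816 is used to display dots 235.

```python
import re


def apply_joinword(text: str, word: str, cell: str) -> str:
    """Replace word plus following whitespace by cell if a letter follows."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\s+(?=[^\W\d_])")
    return pattern.sub(lambda m: cell, text)


TO_CELL = "\u2816"  # dots 235 shown as a Unicode braille pattern
```

     In this sketch 'to' followed by a digit or by punctuation is left
     alone, which corresponds to the case where other opcodes would
     apply.
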
'lowword characters dots'
     Replace the characters with the dot pattern if they are a word
     preceded and followed by whitespace.  No punctuation either before
     or after the word is allowed.  The term 'lowword' derives from the
     fact that in English these contractions are written in the lower
     part of the cell.  For example:

          lowword were 2356

'contraction characters'
     If you look at 'en-us-g2.ctb' you will see that some words are
     actually contracted into some of their own letters.  A famous
     example among braille transcribers is 'also', which is contracted
     as 'al'.  But this is also the name of a person.  To take another
     example, 'altogether' is contracted as 'alt', but this is the
     abbreviation for the alternate key on a computer keyboard.
     Similarly 'could' is contracted into 'cd', but this is the
     abbreviation for compact disk.  To prevent confusion in such cases,
     the letter sign (⇒ letsign opcode) is placed before such letter
     combinations when they actually are abbreviations, not
     contractions.  The 'contraction' opcode tells the translator to do
     this.

'sufword characters dots'
     Replace the characters with the dot pattern if they are either a
     word or at the beginning of a word.

'prfword characters dots'
     Replace the characters with the dot pattern if they are either a
     word or at the end of a word.

'begword characters dots'
     Replace the characters with the dot pattern if they are at the
     beginning of a word.

'begmidword characters dots'
     Replace the characters with the dot pattern if they are either at
     the beginning or in the middle of a word.

'midword characters dots'
     Replace the characters with the dot pattern if they are in the
     middle of a word.

'midendword characters dots'
     Replace the characters with the dot pattern if they are either in
     the middle or at the end of a word.

'endword characters dots'
     Replace the characters with the dot pattern if they are at the end
     of a word.

'partword characters dots'
     Replace the characters with the dot pattern if the characters are
     anywhere in a word, that is, if they are preceded or followed by a
     letter.

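     The word-position conditions of the preceding opcodes, from 'word'
     through 'partword', can be summarized in a small Python sketch.
     It classifies a match by whether a letter stands directly before
     or after it.  The names 'position_in_word', 'LEGAL_POSITIONS' and
     'is_legal' are hypothetical, and the sketch assumes that
     "beginning" and "end" exclude the case where the characters form
     a whole word, as the wording of 'sufword' and 'prfword' suggests.

```python
def position_in_word(text: str, start: int, length: int) -> str:
    """Classify text[start:start+length] as word, begin, middle or end."""
    end = start + length
    letter_before = start > 0 and text[start - 1].isalpha()
    letter_after = end < len(text) and text[end].isalpha()
    if letter_before and letter_after:
        return "middle"
    if letter_before:
        return "end"
    if letter_after:
        return "begin"
    return "word"


# Positions in which each opcode is legal, as read from the descriptions.
LEGAL_POSITIONS = {
    "word": {"word"},
    "sufword": {"word", "begin"},
    "prfword": {"word", "end"},
    "begword": {"begin"},
    "begmidword": {"begin", "middle"},
    "midword": {"middle"},
    "midendword": {"middle", "end"},
    "endword": {"end"},
    "partword": {"begin", "middle", "end"},
}


def is_legal(opcode: str, text: str, start: int, length: int) -> bool:
    """Tell whether the opcode's position condition holds for this match."""
    return position_in_word(text, start, length) in LEGAL_POSITIONS[opcode]
```
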
'exactdots @dots'
     Note that the operand must begin with an at sign ('@').  The dot
     pattern following it is evaluated for validity.  If it is valid,
     whenever an at sign followed by this dot pattern appears in the
     source document it is replaced by the characters corresponding to
     the dot pattern in the output.  This opcode is intended for use in
     liblouisutdml semantic-action files to specify exact dot patterns,
     as in mathematical codes.  For example:

          exactdots @4-46-12356

     will produce the characters with these dot patterns in the output.

'prepunc characters dots'
     Replace the characters with the dot pattern if they are part of
     punctuation at the beginning of a word.

'postpunc characters dots'
     Replace the characters with the dot pattern if they are part of
     punctuation at the end of a word.

'begnum characters dots'
     Replace the characters with the dot pattern if they are at the
     beginning of a number, that is, before all its digits.  For
     example, in 'en-us-g1.ctb' we have 'begnum # 4'.

'midnum characters dots'
     Replace the characters with the dot pattern if they are in the
     middle of a number.  For example, 'en-us-g1.ctb' has 'midnum . 46'.
     This is because the decimal point has a different dot pattern than
     the period.

'endnum characters dots'
     Replace the characters with the dot pattern if they are at the end
     of a number.  For example, 'en-us-g1.ctb' has 'endnum th 1456'.
     This handles things like '4th'.  A letter sign is _NOT_ inserted.

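     The three number-related conditions can be told apart by looking
     at the neighbouring digits, as in this Python sketch.
     'number_position' is a hypothetical name, and only the characters
     directly adjacent to the match are inspected, which is a
     simplification.

```python
from typing import Optional


def number_position(text: str, start: int, length: int) -> Optional[str]:
    """Return 'begnum', 'midnum', 'endnum', or None for a match."""
    end = start + length
    digit_before = start > 0 and text[start - 1].isdigit()
    digit_after = end < len(text) and text[end].isdigit()
    if digit_before and digit_after:
        return "midnum"
    if digit_after:
        return "begnum"
    if digit_before:
        return "endnum"
    return None
```
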
'joinnum characters dots'
     Replace the characters with the dot pattern.  In addition, if
     whitespace and a number follow, omit the whitespace.  This opcode
     can be used to join currency symbols to numbers, for example:

          joinnum \x20AC 15 (EURO SIGN)
          joinnum \x0024 145 (DOLLAR SIGN)
          joinnum \x00A3 1234 (POUND SIGN)
          joinnum \x00A5 13456 (YEN SIGN)

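     The whitespace handling can be sketched on the print text as
     follows.  The fragment only removes the whitespace between the
     symbol and a following digit and does not produce dots;
     'apply_joinnum' is a hypothetical name.

```python
import re


def apply_joinnum(text: str, symbol: str) -> str:
    """Remove whitespace between symbol and a number that follows it."""
    pattern = re.compile(re.escape(symbol) + r"\s+(?=\d)")
    return pattern.sub(lambda m: symbol, text)
```
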