liblouis: The Context and Multipass Opcodes

2.11 The Context and Multipass Opcodes
======================================

The 'context' and multipass opcodes ('pass2', 'pass3' and 'pass4')
provide translation capabilities beyond those of the basic translation
opcodes (⇒Translation Opcodes) discussed previously.  The
multipass opcodes cause additional passes to be made over the string to
be translated.  The number after the word 'pass' indicates in which pass
the entry is to be applied.  If no multipass opcodes are given, only the
first translation pass is made.  The 'context' opcode is basically a
multipass opcode for the first pass.  It differs slightly from the
multipass opcodes per se.  The format of all these opcodes is 'opcode
test action'.  The specific opcodes are invoked as follows:

'context test action'
'pass2 test action'
'pass3 test action'
'pass4 test action'

   The 'test' and 'action' operands have suboperands.  Each suboperand
begins with a non-alphanumeric character and ends when another
non-alphanumeric character is encountered.  The suboperands and their
initial characters are as follows:

'" (double quote)'
     a string of characters.  This string must be terminated by another
     double quote.  It may contain any characters.  If a double quote is
     needed within the string, it must be preceded by a backslash ('\').
     If a space is needed, it must be represented by the escape sequence
     '\s'.  This suboperand is valid only in the test part of the
     'context' opcode.
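
The escaping rules above can be illustrated with a small Python sketch.
It is not liblouis code: the function name is hypothetical, and only the
two escapes named in the text (a backslash before a double quote, and \s
for a space) are decoded.  Any other backslash sequence is left as it is.

```python
def unescape_string_suboperand(token: str) -> str:
    """Decode a double-quoted string suboperand (illustrative sketch only)."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("string suboperand must be enclosed in double quotes")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            if nxt == "s":
                out.append(" ")   # \s stands for a space
            elif nxt == '"':
                out.append('"')   # \" stands for a literal double quote
            else:
                # Other escapes are outside this sketch; keep them verbatim.
                out.append(ch + nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For instance, the suboperand "to\sbe" stands for the five characters
't', 'o', space, 'b', 'e'.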

'@ (at sign)'
     a sequence of dot patterns.  Cells are separated by hyphens as
     usual.  This suboperand is not valid in the test part of the
     'context' and 'correct' opcodes.
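
To make the cell notation concrete, the following Python sketch renders
a dot-pattern suboperand as Unicode braille characters.  It is an
illustration, not part of liblouis: the function name is hypothetical,
and the sketch accepts only dots 1 to 8.  It relies on the Unicode
braille block, where dot n corresponds to bit n-1 above U+2800.

```python
def dots_to_unicode(pattern: str) -> str:
    """Render a dot-pattern suboperand such as '@123-45' as Unicode braille."""
    if not pattern.startswith("@"):
        raise ValueError("dot pattern suboperand must start with '@'")
    cells = []
    for cell in pattern[1:].split("-"):   # hyphens separate cells
        if not cell:
            raise ValueError("empty cell in dot pattern")
        bits = 0
        for dot in cell:
            if dot not in "12345678":
                raise ValueError(f"unsupported dot {dot!r} in this sketch")
            bits |= 1 << (int(dot) - 1)   # dot n sets bit n-1
        cells.append(chr(0x2800 + bits))
    return "".join(cells)
```

With this helper, '@123-45' yields two cells: dots 1-2-3 followed by
dots 4-5.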

'` (accent mark)'
     True if the current position is the beginning of the string being
     translated.  It is valid only in the test part and must be the
     first thing in this operand.

'~ (tilde)'
     True if the current position is the end of the string being
     translated.  It is valid only in the test part and must be the
     last thing in this operand.

'$ (dollar sign)'
     a string of attributes, such as 'd' for digit, 'l' for letter, etc.
     More than one attribute can be given.  If you wish to check
     characters with any attribute, use the letter 'a'.  Input
     characters are checked to see if they have at least one of the
     attributes.  The attribute string can be followed by numbers
     specifying how many characters are to be checked.  If no numbers
     are given, 1 is assumed.  If two numbers separated by a hyphen are
     given, the input is checked to make sure that at least the first
     number of characters with the attributes are present, but no more
     than the second number.  If only one number is present, then
     exactly that many characters must have the attributes.  A period
     instead of the numbers indicates an indefinite number of characters
     (for technical reasons the number of characters that are actually
     matched is limited to 65535).

     This suboperand is valid in all test parts but not in action parts.
     For the characters which can be used in attribute strings, see the
     following table.
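
The counting rules for the dollar-sign suboperand can be sketched in
Python as follows.  This is a hedged illustration rather than the
liblouis implementation.  The function names are hypothetical, the
lower bound of 1 for the period form is an assumption (the text only
says "indefinite"), and the attribute checks use Python string methods,
whereas in liblouis the attributes come from the translation table.
Only a few attribute letters are modelled.

```python
import re

MAX_REPEAT = 65535  # upper bound quoted in the text for the '.' form

_ATTR_RE = re.compile(r"^\$([A-Za-z]+)(?:(\d+)(?:-(\d+))?|(\.))?$")

# Stand-in checks for a few attribute letters (assumption of this sketch).
_CHECKS = {
    "a": lambda c: True,
    "d": str.isdigit,
    "l": str.isalpha,
    "s": str.isspace,
    "U": str.isupper,
    "u": str.islower,
}


def parse_attribute_suboperand(token):
    """Return (attributes, minimum, maximum) for '$d', '$l3', '$dl1-3' or '$a.'."""
    m = _ATTR_RE.match(token)
    if m is None:
        raise ValueError(f"not an attribute suboperand: {token!r}")
    attrs, first, second, dot = m.groups()
    if dot:
        return attrs, 1, MAX_REPEAT      # lower bound of 1 is assumed
    if first is None:
        return attrs, 1, 1               # no numbers: exactly one character
    lo = int(first)
    hi = int(second) if second is not None else lo
    if hi < lo:
        raise ValueError("second number must not be smaller than the first")
    return attrs, lo, hi


def match_attributes(text, pos, attrs, lo, hi):
    """Count characters at text[pos:] that have at least one of the attributes.

    Returns the number of characters matched (at most hi), or None if
    fewer than lo characters qualify.
    """
    n = 0
    while (n < hi and pos + n < len(text)
           and any(_CHECKS[a](text[pos + n]) for a in attrs)):
        n += 1
    return n if n >= lo else None
```

Here '$dl1-3' parses to the attributes 'dl' with between one and three
characters, each of which must be a digit or a letter.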

'! (exclamation point)'
     reverses the logical meaning of the suboperand which follows.  For
     example, !$d is true only if the character is _NOT_ a digit.  This
     suboperand is valid in test parts only.

'% (percent sign)'
     the name of a class defined by the 'class' opcode (⇒class opcode)
     or the name of a swap set defined by the swap opcodes (⇒Swap
     Opcodes).  Names may contain only letters.  The letters may be
     upper or lower-case, and case is significant.  Class names may be
     used in test parts only.  Swap names are valid everywhere.

'{ (left brace)'
     followed by the name of a grouping pair.  The left brace indicates
     that the first (or left) member of the pair is to be used in
     matching.  If this is between replacement brackets it must be the
     only item.  This is also valid in the action part.

'} (right brace)'
     followed by the name of a grouping pair.  The right brace indicates
     that the second (or right) member is to be used in matching.  See
     the remarks on the left brace immediately above.

'/ (slash)'
     Search the input for the expression following the slash and return
     true if found.  This can be used to set a variable.

'_ (underscore)'
     Move backward.  If a number follows, move backward that number of
     characters.  The program never moves backward beyond the beginning
     of the input string.  This suboperand is valid only in test parts.

'[ (left bracket)'
     start replacement here.  This suboperand must always be paired with
     a right bracket and is valid only in test parts.  Multiple pairs of
     square brackets in a single expression are not allowed.

'] (right bracket)'
     end replacement here.  This suboperand must always be paired with a
     left bracket and is valid only in test parts.

'# (number sign or crosshatch)'
     test or set a variable.  Variables are referred to by numbers 1 to
     50, for example, '#1', '#2', '#25'.  Variables may be set by one
     'context' or multipass opcode and tested by another.  Thus, an
     operation that occurs at one place in a translation can tell an
     operation that occurs later about itself.  This feature will be
     used in math translation, and it may also help to alleviate the
     need for new opcodes.  This suboperand is valid everywhere.

     Variables are set in the action part.  To set a variable use an
     expression like '#1=1', '#2=5', etc.  Variables are also
     incremented and decremented in the action part with expressions
     like '#1+', '#3-', etc.  These operators increment or decrement the
     variable by 1.

     Variables are tested in the test part with expressions like '#1=2',
     '#3<4', '#5>6', etc.
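
The difference between action-part and test-part variable expressions
can be modelled with a short Python sketch.  It is a toy model, not
liblouis code: the class and method names are hypothetical, the initial
value of 0 is an assumption, and only the three comparison forms shown
in the text ('=', '<' and '>') are handled.

```python
import re

_ACTION_RE = re.compile(r"#(\d+)(?:=(\d+)|([+-]))")
_TEST_RE = re.compile(r"#(\d+)([=<>])(\d+)")


class PassVariables:
    """Toy model of the numbered variables #1 to #50."""

    def __init__(self):
        # Assumption of this sketch: every variable starts at 0.
        self.values = {number: 0 for number in range(1, 51)}

    def _lookup(self, digits):
        number = int(digits)
        if number not in self.values:
            raise ValueError(f"variable #{number} is outside 1..50")
        return number

    def apply_action(self, expression):
        """Handle action-part forms: '#1=1' (set), '#1+' and '#1-' (step by 1)."""
        m = _ACTION_RE.fullmatch(expression)
        if m is None:
            raise ValueError(f"not an action expression: {expression!r}")
        number = self._lookup(m.group(1))
        if m.group(2) is not None:
            self.values[number] = int(m.group(2))
        elif m.group(3) == "+":
            self.values[number] += 1
        else:
            self.values[number] -= 1

    def test(self, expression):
        """Handle test-part forms: '#1=2', '#3<4', '#5>6'."""
        m = _TEST_RE.fullmatch(expression)
        if m is None:
            raise ValueError(f"not a test expression: {expression!r}")
        number = self._lookup(m.group(1))
        operator, operand = m.group(2), int(m.group(3))
        current = self.values[number]
        if operator == "=":
            return current == operand
        if operator == "<":
            return current < operand
        return current > operand
```

Note that '#1=1' assigns when it appears in an action part, whereas the
same spelling in a test part compares.  The sketch keeps the two
readings in separate methods for that reason.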

'* (asterisk)'
     Copy the characters or dot patterns in the input within the
     replacement brackets into the output and discard anything else that
     may match.  This feature is used, for example, for handling numeric
     subscripts in Nemeth.  This suboperand is valid only in action
     parts.

'? (question mark)'
     Valid only in the action part.  The characters to be replaced are
     simply ignored.  That is, they are replaced with nothing.  If
     either member of a grouping pair is in the replace brackets the
     other member at the same level is also removed.

   The characters which can be used in attribute strings are as follows:

'a'
     any attribute
'd'
     digit
'D'
     literary digit
'l'
     letter
'm'
     math
'p'
     punctuation
'S'
     sign
's'
     space
'U'
     uppercase
'u'
     lowercase
'w'
     first user-defined class
'x'
     second user-defined class
'y'
     third user-defined class
'z'
     fourth user-defined class

   The following outlines how text is evaluated with multipass
expressions:

Loop over context, pass2, pass3 and pass4 and do the following for each
pass:

  a. Match the text following the cursor against all expressions in the
     current pass.
  b. If there is no match, shift the cursor one position to the right
     and continue the loop.
  c. If there is a match, choose the longest match.
  d. Do the replacement (everything between square brackets).
  e. Place the cursor after the replaced text.
  f. Continue the loop.
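
The steps above can be sketched in Python.  This is a simplified model
under stated assumptions, not the liblouis implementation: every rule is
reduced to literal strings, with hypothetical fields 'before', 'target'
and 'after' standing for the text in front of, inside and behind the
square brackets.  All passes work on plain strings here, and a character
at which no rule matches is copied to the output unchanged.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    before: str       # literal text that must precede the bracketed part
    target: str       # literal text inside the square brackets
    after: str        # literal text that must follow the bracketed part
    replacement: str  # text produced by the action


def run_pass(text: str, rules: List[Rule]) -> str:
    """Apply one pass: longest match wins, only the bracketed part is replaced."""
    for rule in rules:
        if not rule.target:
            raise ValueError("this sketch requires a non-empty target")
    out = []
    cursor = 0
    while cursor < len(text):
        candidates = [r for r in rules
                      if text.startswith(r.before + r.target + r.after, cursor)]
        if not candidates:
            # Step b: no match, copy one character and move right.
            out.append(text[cursor])
            cursor += 1
            continue
        # Step c: choose the longest match.
        best = max(candidates, key=lambda r: len(r.before + r.target + r.after))
        # Step d: replace only the bracketed part; 'before' is kept as is.
        out.append(best.before + best.replacement)
        # Step e: place the cursor after the replaced text.
        cursor += len(best.before) + len(best.target)
    return "".join(out)


def run_all_passes(text: str, passes: List[List[Rule]]) -> str:
    """Run the passes in order; each pass sees the output of the previous one."""
    for rules in passes:
        text = run_pass(text, rules)
    return text
```

Because the cursor is placed directly after the replaced text, anything
matched as 'after' is still available to the next match attempt in the
same pass.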