gettext: General Problems

1 
1 15.5.18.1 General Problems Parsing Perl Code
1 ............................................
1 
1    It is often heard that only Perl can parse Perl.  This is not true.
1 Perl cannot be _parsed_ at all; it can only be _executed_.  Perl has
1 various built-in ambiguities that can only be resolved at runtime.
1 
1    The following example may illustrate one common problem:
1 
1      print gettext "Hello World!";
1 
1    Although this example looks like a bullet-proof case of a function
1 invocation, it is not:
1 
1      open gettext, ">testfile" or die;
1      print gettext "Hello world!";
1 
1    In this context, the string ‘gettext’ looks more like a file handle.
1 But not necessarily:
1 
1      use Locale::Messages qw (:libintl_h);
1      open gettext ">testfile" or die;
1      print gettext "Hello world!";
1 
1    Now, the file is probably syntactically incorrect, provided that the
1 module ‘Locale::Messages’ found first in the Perl include path exports a
1 function ‘gettext’.  But what if the module ‘Locale::Messages’ really
1 looks like this?
1 
1      use vars qw (*gettext);
1 
1      1;
1 
1    In this case, the string ‘gettext’ will be interpreted as a file
1 handle again, and the above example will create a file ‘testfile’ and
1 write the string “Hello world!” into it.  Even advanced control flow
1 analysis will not really help:
1 
1      if (0.5 < rand) {
1         eval "use Sane";
1      } else {
1         eval "use InSane";
1      }
1      print gettext "Hello world!";
1 
1    If the module ‘Sane’ exports a function ‘gettext’ that does what we
1 expect, and the module ‘InSane’ opens a file for writing and associates
1 the _handle_ ‘gettext’ with this output stream, we are clueless again
1 about what will happen at runtime.  It is completely unpredictable.  The
1 truth is that Perl has so many ways to fill its symbol table at runtime
1 that it is impossible to interpret a particular piece of code without
1 executing it.
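
   As an illustration of how the symbol table can change while the program
runs, here is a minimal sketch.  The helper name 'has_gettext_sub' and the
body of the installed subroutine are assumptions made for this example
only; they are not part of any real module.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether package main currently has a
# subroutine named gettext.
sub has_gettext_sub { return main->can('gettext') ? 1 : 0 }

my $before = has_gettext_sub();    # nothing has defined gettext yet

# Install the subroutine at runtime through a glob assignment.  A static
# scanner reading this file sees only a string and an anonymous sub.
{
    no strict 'refs';
    *{'main::gettext'} = sub { return "translated: $_[0]" };
}

my $after  = has_gettext_sub();
my $result = main->can('gettext')->('Hello world!');

print "before=$before after=$after result=$result\n";
```

   Whether 'gettext' names a function here depends on code that has
already run, which is exactly the information a scanner that does not
execute the program lacks.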
1 
1    Of course, ‘xgettext’ will not execute your Perl sources while
1 scanning for translatable strings, but rather use heuristics in order to
1 guess what you meant.
1 
1    Another problem is the ambiguity of the slash and the question mark.
1 Their interpretation depends on the context:
1 
1      # A pattern match.
1      print "OK\n" if /foobar/;
1 
1      # A division.
1      print 1 / 2;
1 
1      # Another pattern match.
1      print "OK\n" if ?foobar?;
1 
1      # Conditional.
1      print $x ? "foo" : "bar";
1 
1    The slash may either act as the division operator or introduce a
1 pattern match, whereas the question mark may act as the ternary
1 conditional operator or as a pattern match, too.  Other programming
1 languages like ‘awk’ present similar problems, but the consequences of a
1 misinterpretation are particularly nasty with Perl sources.  In ‘awk’
1 for instance, a statement can never exceed one line and the parser can
1 recover from a parsing error at the next newline and interpret the rest
1 of the input stream correctly.  Perl is different, as a pattern match is
1 terminated by the next appearance of the delimiter (the slash or the
1 question mark) in the input stream, regardless of the semantic context.
1 If a slash is really a division sign but is misinterpreted as a pattern
1 match, the rest of the input file is most probably parsed incorrectly.
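
   The following sketch shows how the token before a slash decides its
meaning.  The variable names are chosen for this example only.

```perl
use strict;
use warnings;

my $x = 10;

# After a scalar variable Perl expects an operator, so both slashes divide.
# A scanner that took the first slash for a pattern delimiter would read
# " 2 " as a regular expression and lose track of what follows.
my $quotient = $x / 2 / 5;

# After "=" Perl expects a term, so the slash opens a pattern match on $_.
$_ = 'foobar';
my $matched = /foo/ ? 1 : 0;

print "quotient=$quotient matched=$matched\n";
```

   Both statements use the same character, and only the preceding token
tells the two readings apart.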
1 
1    There are certain cases where the ambiguity cannot be resolved at
1 all:
1 
1      $x = wantarray ? 1 : 0;
1 
1    The Perl built-in function ‘wantarray’ does not accept any arguments.
1 The Perl parser therefore knows that the question mark does not start a
1 regular expression but is the ternary conditional operator.
1 
1      sub wantarrays {}
1      $x = wantarrays ? 1 : 0;
1 
1    Now the situation is different.  The function ‘wantarrays’ takes a
1 variable number of arguments (like any non-prototyped Perl function).
1 The question mark is now the delimiter of a pattern match, and hence the
1 piece of code does not compile.
1 
1      sub wantarrays() {}
1      $x = wantarrays ? 1 : 0;
1 
1    Now that the function is prototyped, Perl knows that it does not
1 accept any arguments, and the question mark is therefore interpreted as
1 the ternary operator again.  But that unfortunately outsmarts ‘xgettext’.
1 
1    The Perl parser in ‘xgettext’ cannot know whether a function has a
1 prototype and what that prototype would look like.  It therefore makes
1 an educated guess.  If a function is known to be a Perl built-in and
1 this function does not accept any arguments, a following question mark
1 or slash is treated as an operator, otherwise as the delimiter of a
1 following regular expression.  The Perl built-ins that do not accept
1 arguments are ‘wantarray’, ‘fork’, ‘time’, ‘times’, ‘getlogin’,
1 ‘getppid’, ‘getpwent’, ‘getgrent’, ‘gethostent’, ‘getnetent’,
1 ‘getprotoent’, ‘getservent’, ‘setpwent’, ‘setgrent’, ‘endpwent’,
1 ‘endgrent’, ‘endhostent’, ‘endnetent’, ‘endprotoent’, and ‘endservent’.
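
   The guess described above can be sketched as a simple table lookup.
This is only an illustration of the rule as stated in the text, not the
actual code of 'xgettext'; the names '%NO_ARG_BUILTIN' and
'follows_as_operator' are hypothetical.

```perl
use strict;
use warnings;

# The built-ins listed in the text as accepting no arguments.
my %NO_ARG_BUILTIN = map { $_ => 1 } qw(
    wantarray fork time times getlogin getppid getpwent getgrent
    gethostent getnetent getprotoent getservent setpwent setgrent
    endpwent endgrent endhostent endnetent endprotoent endservent
);

# Hypothetical helper: given the bareword immediately before a '/' or '?',
# return 1 if that character should be read as an operator and 0 if it
# should be read as the delimiter of a regular expression.
sub follows_as_operator {
    my ($bareword) = @_;
    return $NO_ARG_BUILTIN{$bareword} ? 1 : 0;
}

my @guesses = map { follows_as_operator($_) } qw(wantarray wantarrays time);
print "@guesses\n";
```

   Under this rule a user-defined function such as 'wantarrays' is always
assumed to take arguments, whether or not it has an empty prototype.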
1 
1    If you find that ‘xgettext’ fails to extract strings from portions of
1 your sources, you should therefore look out for slashes and/or question
1 marks preceding these sections.  You may have come across a bug in the
1 Perl parser of ‘xgettext’ (and of course you should report that bug).  In
1 the meantime you should consider reformulating your code in a manner
1 less challenging to ‘xgettext’.
1 
1    In particular, if the parser is too dumb to see that a function does
1 not accept arguments, use parentheses:
1 
1      $x = somefunc() ? 1 : 0;
1      $y = (somefunc) ? 1 : 0;
1 
1    In fact the Perl parser itself has similar problems and warns you
1 about such constructs.
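
   The same remedy applies to the slash.  A minimal sketch, assuming a
hypothetical zero-argument function 'somefunc' that returns a number:

```perl
use strict;
use warnings;

# Hypothetical function taking no arguments, declared without a prototype.
sub somefunc { return 6 }

# The explicit parentheses end the call, so what follows is an operator.
my $x     = somefunc() ? 1 : 0;    # ternary conditional
my $half  = somefunc() / 2;        # division
my $third = (somefunc) / 3;        # division, alternative spelling

print "$x $half $third\n";
```

   With the parentheses in place, neither Perl nor a heuristic scanner has
to guess whether 'somefunc' consumes the tokens that follow it.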
1