+
+
+
+ANTLR(1) PCCTS Manual Pages ANTLR(1)
+
+
+
+NAME
+ antlr - ANother Tool for Language Recognition
+
+SYNTAX
+ antlr [options] grammar_files
+
+DESCRIPTION
+ Antlr converts an extended form of context-free grammar into
+ a set of C functions which directly implement an efficient
+ form of deterministic recursive-descent LL(k) parser.
+ Context-free grammars may be augmented with predicates to
+ allow semantics to influence parsing; this allows a form of
+ context-sensitive parsing. Selective backtracking is also
+ available to handle non-LL(k) and even non-LALR(k)
+ constructs. Antlr also produces a definition of a lexer which
+ can be automatically converted into C code for a DFA-based
+ lexer by dlg. Hence, antlr serves a function much like that
+ of yacc; however, it is notably more flexible and is more
+ integrated with a lexer generator (antlr directly generates
+ dlg code, whereas yacc and lex are given independent
+ descriptions). Unlike yacc, which accepts LALR(1) grammars,
+ antlr accepts LL(k) grammars in an extended BNF notation,
+ which eliminates the need for precedence rules.
+
+ Like yacc grammars, antlr grammars can use
+ automatically-maintained symbol attribute values referenced
+ as dollar variables. Further, because antlr generates top-down
+ parsers, arbitrary values may be inherited from parent rules
+ (passed like function parameters). Antlr also has a
+ mechanism for creating and manipulating abstract-syntax-trees.
+
+ There are various other niceties in antlr, including the
+ ability to spread one grammar over multiple files or even
+ multiple grammars in a single file, the ability to generate
+ a version of the grammar with actions stripped out (for
+ documentation purposes), and lots more.
+
+OPTIONS
+ -ck n
+ Use up to n symbols of lookahead when using compressed
+ (linear approximation) lookahead. This type of lookahead
+ is very cheap to compute and is attempted before
+ full LL(k) lookahead, which is of exponential complexity
+ in the worst case. In general, the compressed lookahead
+ can be much deeper (e.g., -ck 10) than the full
+ lookahead (which usually must be less than 4).
+
+ -CC Generate C++ output from both ANTLR and DLG.
+
+ -cr Generate a cross-reference for all rules. For each
+ rule, print a list of all other rules that reference
+ it.
+
+ -e1 Ambiguities/errors shown in low detail (default).
+
+ -e2 Ambiguities/errors shown in more detail.
+
+ -e3 Ambiguities/errors shown in excruciating detail.
+
+ -fe file
+ Rename err.c to file.
+
+ -fh file
+ Rename stdpccts.h header (turns on -gh) to file.
+
+ -fl file
+ Rename lexical output, parser.dlg, to file.
+
+ -fm file
+ Rename file with lexical mode definitions, mode.h, to
+ file.
+
+ -fr file
+ Rename file which remaps globally visible symbols,
+ remap.h, to file.
+
+ -ft file
+ Rename tokens.h to file.
+
+ -ga Generate ANSI-compatible code (default case). This has
+ not been rigorously tested to be ANSI X3J11 C compliant,
+ but it is close. The normal output of antlr is
+ currently compilable under K&R C, ANSI C, and C++;
+ this option does nothing because antlr generates a
+ bunch of #ifdef's to do the right thing depending on
+ the language.
+
+ -gc Indicates that antlr should generate no C code, i.e.,
+ only perform analysis on the grammar.
+
+ -gd C code is inserted in each of the antlr generated parsing
+ functions to provide for user-defined handling of a
+ detailed parse trace. The inserted code consists of
+ calls to the user-supplied macros or functions called
+ zzTRACEIN and zzTRACEOUT. The only argument is a char *
+ pointing to a C-style string which is the grammar
+ rule recognized by the current parsing function. If no
+ definition is given for the trace functions, upon rule
+ entry and exit, a message will be printed indicating
+ that a particular rule has been entered or exited.
+
+ -ge Generate an error class for each non-terminal.
+
+ -gh Generate stdpccts.h for non-ANTLR-generated files to
+ include. This file contains all defines needed to
+ describe the type of parser generated by antlr (e.g.
+ how much lookahead is used and whether or not trees are
+ constructed) and contains the header action specified
+ by the user.
+
+ -gk Generate parsers that delay lookahead fetches until
+ needed. Without this option, antlr generates parsers
+ which always have k tokens of lookahead available.
+
+ -gl Generate line info about grammar actions in C parser of
+ the form # line "file", which makes error messages from
+ the C/C++ compiler make more sense as they will point
+ into the grammar file, not the resulting C file.
+ Debugging is easier as well, because you will step
+ through the grammar, not the C file.
+
+ -gs Do not generate sets for token expression lists;
+ instead generate a ||-separated sequence of
+ LA(1)==token_number. The default is to generate sets.
+
+ -gt Generate code for Abstract-Syntax Trees.
+
+ -gx Do not create the lexical analyzer files (dlg-related).
+ This option should be given when the user wishes to
+ provide a customized lexical analyzer. It may also be
+ used in make scripts to cause only the parser to be
+ rebuilt when a change not affecting the lexical
+ structure is made to the input grammars.
+
+ -k n Set k of LL(k) to n; i.e. set tokens of look-ahead
+ (default==1).
+
+ -o dir
+ Directory where output files should go (default=".").
+ This is very nice for keeping the source directory
+ clear of ANTLR and DLG spawn.
+
+ -p The complete grammar, collected from all input grammar
+ files and stripped of all comments and embedded
+ actions, is listed to stdout. This is intended to aid
+ in viewing the entire grammar as a whole and to elim-
+ inate the need to keep actions concisely stated so that
+ the grammar is easier to read. Hence, it is preferable
+ to embed even complex actions directly in the grammar,
+ rather than to call them as subroutines, since the sub-
+ routine call overhead will be saved.
+
+ -pa This option is the same as -p except that the output is
+ annotated with the first sets determined from grammar
+ analysis.
+
+ -prc on
+ Turn on the computation and hoisting of predicate con-
+ text.
+
+ -prc off
+ Turn off the computation and hoisting of predicate con-
+ text. This option makes 1.10 behave like the 1.06
+ release with option -pr on. Context computation is off
+ by default.
+
+ -rl n
+ Limit the maximum number of tree nodes used by grammar
+ analysis to n. Occasionally, antlr is unable to
+ analyze a grammar submitted by the user. This rare
+ situation can only occur when the grammar is large and
+ the amount of lookahead is greater than one. A
+ nonlinear analysis algorithm is used by PCCTS to handle
+ the general case of LL(k) parsing. The average
+ complexity of analysis, however, is near linear due to
+ some fancy footwork in the implementation which reduces
+ the number of calls to the full LL(k) algorithm. An
+ error message will be displayed, if this limit is
+ reached, which indicates the grammar construct being
+ analyzed when antlr hit a non-linearity. Use this
+ option if antlr seems to go out to lunch and your disk
+ starts thrashing; try n=10000 to start. Once the
+ offending construct has been identified, try to remove
+ the ambiguity that antlr was trying to overcome with
+ large lookahead analysis. The introduction of (...)?
+ backtracking blocks eliminates some of these problems:
+ antlr does not analyze alternatives that begin with
+ (...)? (it simply backtracks, if necessary, at run
+ time).
+
+ -w1 Set low warning level. Do not warn if semantic
+ predicates and/or (...)? blocks are assumed to cover
+ ambiguous alternatives.
+
+ -w2 Ambiguous parsing decisions yield warnings even if
+ semantic predicates or (...)? blocks are used. Warn if
+ predicate context computed and semantic predicates
+ incompletely disambiguate alternative productions.
+
+ - Read grammar from standard input and generate stdin.c
+ as the parser file.
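+
+ As an illustration of the -gd option above, the following is a
+ minimal sketch, not taken from the PCCTS sources, of user-supplied
+ zzTRACEIN and zzTRACEOUT macros that record an indented trace.
+ The names trace_depth, trace_log, trace_append, rule_stat and
+ rule_expr are hypothetical; the two rule functions are hand-written
+ stand-ins for antlr-generated parsing functions, and it is assumed
+ that the macro definitions are made visible to the generated
+ parser (for instance through the header action).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-side trace state; none of this is part of PCCTS. */
static int  trace_depth = 0;
static char trace_log[1024];

/* Append one indented line such as "  enter expr" to trace_log. */
static void trace_append(const char *tag, const char *rule)
{
    char line[128];
    snprintf(line, sizeof line, "%*s%s %s\n", trace_depth * 2, "", tag, rule);
    strncat(trace_log, line, sizeof trace_log - strlen(trace_log) - 1);
}

/* User-supplied trace hooks: the only argument is the rule name (char *). */
#define zzTRACEIN(rule) \
    do { trace_append("enter", (rule)); trace_depth++; } while (0)
#define zzTRACEOUT(rule) \
    do { assert(trace_depth > 0); trace_depth--; trace_append("exit", (rule)); } while (0)

/* Stand-ins for generated parsing functions (assumed shape, for illustration). */
static void rule_expr(void)
{
    zzTRACEIN("expr");
    /* ... token matching would happen here ... */
    zzTRACEOUT("expr");
}

static void rule_stat(void)
{
    zzTRACEIN("stat");
    rule_expr();
    zzTRACEOUT("stat");
}
```

+ Calling rule_stat() once leaves four lines in trace_log, with the
+ entry and exit of expr nested inside those of stat. Writing to a
+ buffer rather than stdout is only a design choice of this sketch.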
+
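+ The difference between the default set-based tests and the
+ ||-separated tests selected by -gs can be sketched as follows.
+ This is an illustration only: the token numbers, the variable
+ current_token, the LA() stand-in and the bit-vector layout are all
+ assumptions, and the code antlr actually emits may differ.

```c
#include <assert.h>

/* Hypothetical token numbers; a real tokens.h would define its own. */
enum { TOK_PLUS = 3, TOK_MINUS = 4, TOK_STAR = 9, TOK_ID = 12, TOK_MAX = 16 };

static int current_token;          /* stand-in for the parser's lookahead */
#define LA(i) (current_token)      /* only i == 1 is modeled in this sketch */

/* Set form: one bit per token number, eight tokens per byte. */
static const unsigned char op_set[TOK_MAX / 8] = {
    (1u << TOK_PLUS) | (1u << TOK_MINUS),   /* tokens 0..7  */
    (1u << (TOK_STAR - 8))                  /* tokens 8..15 */
};

/* Membership test against the set (the default style of test). */
static int match_with_set(void)
{
    int t = LA(1);
    if (t < 0 || t >= TOK_MAX)
        return 0;
    return (op_set[t / 8] >> (t % 8)) & 1;
}

/* The same decision written as a ||-separated sequence (the -gs style). */
static int match_with_chain(void)
{
    return LA(1)==TOK_PLUS || LA(1)==TOK_MINUS || LA(1)==TOK_STAR;
}
```

+ Both functions accept exactly the same tokens; the set form keeps
+ the test a constant size as the token list grows, while the chain
+ is easier to read in the generated code.
+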
+SPECIAL CONSIDERATIONS
+ Antlr works... we think. There is no implicit guarantee of
+ anything. We reserve no legal rights to the software known
+ as the Purdue Compiler Construction Tool Set (PCCTS) - PCCTS
+ is in the public domain. An individual or company may do
+ whatever they wish with source code distributed with PCCTS
+ or the code generated by PCCTS, including the incorporation
+ of PCCTS, or its output, into commercial software. We
+ encourage users to develop software with PCCTS. However, we
+ do ask that credit is given to us for developing PCCTS. By
+ "credit", we mean that if you incorporate our source code
+ into one of your programs (commercial product, research pro-
+ ject, or otherwise) that you acknowledge this fact somewhere
+ in the documentation, research report, etc... If you like
+ PCCTS and have developed a nice tool with the output, please
+ mention that you developed it using PCCTS. As long as these
+ guidelines are followed, we expect to continue enhancing
+ this system and expect to make other tools available as they
+ are completed.
+
+FILES
+ *.c output C parser.
+
+ *.cpp
+ output C++ parser when C++ mode is used.
+
+ parser.dlg
+ output dlg lexical analyzer.
+
+ err.c
+ token string array, error sets and error support rou-
+ tines. Not used in C++ mode.
+
+ remap.h
+ file that redefines all globally visible parser sym-
+ bols. The use of the #parser directive creates this
+ file. Not used in C++ mode.
+
+ stdpccts.h
+ list of definitions needed by C files, not generated by
+ PCCTS, that reference PCCTS objects. This is not gen-
+ erated by default. Not used in C++ mode.
+
+ tokens.h
+ output #defines for tokens used and function prototypes
+ for functions generated for rules.
+
+
+SEE ALSO
+ dlg(1), pccts(1)
+
+
+
+
+