X-Git-Url: https://pd.if.org/git/?p=pccts;a=blobdiff_plain;f=antlr%2Fantlr.1;fp=antlr%2Fantlr.1;h=6ace1b2a938d7f11c8296c008e2da1fed2909265;hp=0000000000000000000000000000000000000000;hb=780a935d52ff31d98a3f1083ab0f363a7aafb30d;hpb=ca5cea5c2f4e781582ae8f220c83018d17cb418d

diff --git a/antlr/antlr.1 b/antlr/antlr.1
new file mode 100755
index 0000000..6ace1b2
--- /dev/null
+++ b/antlr/antlr.1
@@ -0,0 +1,209 @@
+.TH ANTLR 1 "September 1995" "ANTLR" "PCCTS Manual Pages"
+.SH NAME
+antlr \- ANother Tool for Language Recognition
+.SH SYNTAX
+.LP
+\fBantlr\fR [\fIoptions\fR] \fIgrammar_files\fR
+.SH DESCRIPTION
+.PP
+\fIAntlr\fP converts an extended form of context-free grammar into a
+set of C functions which directly implement an efficient form of
+deterministic recursive-descent LL(k) parser. Context-free grammars
+may be augmented with predicates to allow semantics to influence
+parsing; this allows a form of context-sensitive parsing. Selective
+backtracking is also available to handle non-LL(k) and even
+non-LALR(k) constructs. \fIAntlr\fP also produces a definition of a
+lexer which can be automatically converted into C code for a DFA-based
+lexer by \fIdlg\fR. Hence, \fIantlr\fR serves a function much like
+that of \fIyacc\fR; however, it is notably more flexible and is more
+integrated with a lexer generator (\fIantlr\fR directly generates
+\fIdlg\fR code, whereas \fIyacc\fR and \fIlex\fR are given independent
+descriptions). Unlike \fIyacc\fR, which accepts LALR(1) grammars,
+\fIantlr\fR accepts LL(k) grammars in an extended BNF notation \(em
+which eliminates the need for precedence rules.
+.PP
+Like \fIyacc\fR grammars, \fIantlr\fR grammars can use
+automatically-maintained symbol attribute values referenced as dollar
+variables. Further, because \fIantlr\fR generates top-down parsers,
+arbitrary values may be inherited from parent rules (passed like
+function parameters). \fIAntlr\fP also has a mechanism for creating
+and manipulating abstract-syntax-trees.
+.PP
+There are various other niceties in \fIantlr\fR, including the ability to
+spread one grammar over multiple files or even multiple grammars in a single
+file, the ability to generate a version of the grammar with actions stripped
+out (for documentation purposes), and lots more.
+.SH OPTIONS
+.IP "\fB-ck \fIn\fR"
+Use up to \fIn\fR symbols of lookahead when using compressed (linear
+approximation) lookahead. This type of lookahead is very cheap to
+compute and is attempted before full LL(k) lookahead, which is of
+exponential complexity in the worst case. In general, the compressed
+lookahead can be much deeper (e.g., \f(CW-ck 10\fP) than the full
+lookahead (which usually must be less than 4).
+.IP \fB-CC\fP
+Generate C++ output from both ANTLR and DLG.
+.IP \fB-cr\fP
+Generate a cross-reference for all rules. For each rule, print a list
+of all other rules that reference it.
+.IP \fB-e1\fP
+Ambiguities/errors shown in low detail (default).
+.IP \fB-e2\fP
+Ambiguities/errors shown in more detail.
+.IP \fB-e3\fP
+Ambiguities/errors shown in excruciating detail.
+.IP "\fB-fe\fP file"
+Rename \fBerr.c\fP to file.
+.IP "\fB-fh\fP file"
+Rename \fBstdpccts.h\fP header (turns on \fB-gh\fP) to file.
+.IP "\fB-fl\fP file"
+Rename lexical output, \fBparser.dlg\fP, to file.
+.IP "\fB-fm\fP file"
+Rename file with lexical mode definitions, \fBmode.h\fP, to file.
+.IP "\fB-fr\fP file"
+Rename file which remaps globally visible symbols, \fBremap.h\fP, to file.
+.IP "\fB-ft\fP file"
+Rename \fBtokens.h\fP to file.
+.IP \fB-ga\fP
+Generate ANSI-compatible code (default case). This has not been
+rigorously tested to be ANSI X3J11 C compliant, but it is close. The
+normal output of \fIantlr\fP is currently compilable under K&R,
+ANSI C, and C++\(emthis option does nothing because \fIantlr\fP
+generates a bunch of #ifdef's to do the right thing depending on the
+language.
+.IP \fB-gc\fP
+Indicates that \fIantlr\fP should generate no C code, i.e., only
+perform analysis on the grammar.
+.IP \fB-gd\fP
+C code is inserted in each of the \fIantlr\fR generated parsing functions to
+provide for user-defined handling of a detailed parse trace. The inserted
+code consists of calls to the user-supplied macros or functions called
+\fBzzTRACEIN\fR and \fBzzTRACEOUT\fP. The only argument is a
+\fIchar *\fR pointing to a C-style string which is the grammar rule
+recognized by the current parsing function. If no definition is given
+for the trace functions, upon rule entry and exit, a message will be
+printed indicating that a particular rule has been entered or exited.
+.IP \fB-ge\fP
+Generate an error class for each non-terminal.
+.IP \fB-gh\fP
+Generate \fBstdpccts.h\fP for non-ANTLR-generated files to include.
+This file contains all defines needed to describe the type of parser
+generated by \fIantlr\fP (e.g. how much lookahead is used and whether
+or not trees are constructed) and contains the \fBheader\fP action
+specified by the user.
+.IP \fB-gk\fP
+Generate parsers that delay lookahead fetches until needed. Without
+this option, \fIantlr\fP generates parsers which always have \fIk\fP
+tokens of lookahead available.
+.IP \fB-gl\fP
+Generate line info about grammar actions in C parser of the form
+\fB#\ \fIline\fP\ "\fIfile\fP"\fR which makes error messages from
+the C/C++ compiler make more sense as they will \*Qpoint\*U into the
+grammar file, not the resulting C file. Debugging is easier as well,
+because you will step through the grammar, not the C file.
+.IP \fB-gs\fR
+Do not generate sets for token expression lists; instead generate a
+\fB||\fP-separated sequence of \fBLA(1)==\fItoken_number\fR. The
+default is to generate sets.
+.IP \fB-gt\fP
+Generate code for Abstract-Syntax Trees.
+.IP \fB-gx\fP
+Do not create the lexical analyzer files (dlg-related).
+This option should be given when the user wishes to provide a customized lexical
+analyzer. It may also be used in \fImake\fR scripts to cause only the
+parser to be rebuilt when a change not affecting the lexical structure
+is made to the input grammars.
+.IP "\fB-k \fIn\fR"
+Set k of LL(k) to \fIn\fR; i.e., set the number of tokens of lookahead (default is 1).
+.IP "\fB-o\fP dir"
+Directory where output files should go (default="."). This is very
+nice for keeping the source directory clear of ANTLR and DLG spawn.
+.IP \fB-p\fP
+The complete grammar, collected from all input grammar files and
+stripped of all comments and embedded actions, is listed to
+\fBstdout\fP. This is intended to aid in viewing the entire grammar
+as a whole and to eliminate the need to keep actions concisely stated
+so that the grammar is easier to read. Hence, it is preferable to
+embed even complex actions directly in the grammar, rather than to
+call them as subroutines, since the subroutine call overhead will be
+saved.
+.IP \fB-pa\fP
+This option is the same as \fB-p\fP except that the output is
+annotated with the first sets determined from grammar analysis.
+.IP "\fB-prc on\fR"
+Turn on the computation and hoisting of predicate context.
+.IP "\fB-prc off\fR"
+Turn off the computation and hoisting of predicate context. This
+option makes 1.10 behave like the 1.06 release with option \fB-pr\fR
+on. Context computation is off by default.
+.IP "\fB-rl \fIn\fR"
+Limit the maximum number of tree nodes used by grammar analysis to
+\fIn\fP. Occasionally, \fIantlr\fP is unable to analyze a grammar
+submitted by the user. This rare situation can only occur when the
+grammar is large and the amount of lookahead is greater than one. A
+nonlinear analysis algorithm is used by PCCTS to handle the general
+case of LL(k) parsing. The average complexity of analysis, however, is
+near linear due to some fancy footwork in the implementation which
+reduces the number of calls to the full LL(k) algorithm.
+An error message will be displayed, if this limit is reached, which indicates
+the grammar construct being analyzed when \fIantlr\fP hit a
+non-linearity. Use this option if \fIantlr\fP seems to go out to
+lunch and your disk starts thrashing; try \fIn\fP=10000 to start. Once
+the offending construct has been identified, try to remove the
+ambiguity that \fIantlr\fP was trying to overcome with large lookahead
+analysis. The introduction of (...)? backtracking blocks eliminates
+some of these problems\ \(em \fIantlr\fP does not analyze alternatives
+that begin with (...)? (it simply backtracks, if necessary, at run
+time).
+.IP \fB-w1\fR
+Set low warning level. Do not warn if semantic predicates and/or
+(...)? blocks are assumed to cover ambiguous alternatives.
+.IP \fB-w2\fR
+Ambiguous parsing decisions yield warnings even if semantic predicates
+or (...)? blocks are used. Warn if predicate context is computed and
+semantic predicates incompletely disambiguate alternative productions.
+.IP \fB-\fR
+Read grammar from standard input and generate \fBstdin.c\fP as the
+parser file.
+.SH "SPECIAL CONSIDERATIONS"
+.PP
+\fIAntlr\fP works... we think. There is no implicit guarantee of
+anything. We reserve no \fBlegal\fP rights to the software known as
+the Purdue Compiler Construction Tool Set (PCCTS) \(em PCCTS is in the
+public domain. An individual or company may do whatever they wish
+with source code distributed with PCCTS or the code generated by
+PCCTS, including the incorporation of PCCTS, or its output, into
+commercial software. We encourage users to develop software with
+PCCTS. However, we do ask that credit is given to us for developing
+PCCTS. By "credit", we mean that if you incorporate our source code
+into one of your programs (commercial product, research project, or
+otherwise) that you acknowledge this fact somewhere in the
+documentation, research report, etc...
+If you like PCCTS and have developed a nice tool with the output, please mention that you
+developed it using PCCTS. As long as these guidelines are followed,
+we expect to continue enhancing this system and expect to make other
+tools available as they are completed.
+.SH FILES
+.IP *.c
+output C parser.
+.IP *.cpp
+output C++ parser when C++ mode is used.
+.IP \fBparser.dlg\fP
+output \fIdlg\fR lexical analyzer.
+.IP \fBerr.c\fP
+token string array, error sets and error support routines. Not used in
+C++ mode.
+.IP \fBremap.h\fP
+file that redefines all globally visible parser symbols. The use of
+the #parser directive creates this file. Not used in
+C++ mode.
+.IP \fBstdpccts.h\fP
+list of definitions needed by C files, not generated by PCCTS, that
+reference PCCTS objects. This is not generated by default. Not used in
+C++ mode.
+.IP \fBtokens.h\fP
+output \fI#defines\fR for tokens used and function prototypes for
+functions generated for rules.
+.SH "SEE ALSO"
+.LP
+dlg(1), pccts(1)
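The -gd entry in the man page says that the generated parsing functions call user-supplied macros or functions named zzTRACEIN and zzTRACEOUT, each taking one char * argument that names the rule. The following is a minimal sketch of what such user-supplied definitions might look like. Only the two macro names and the single string argument come from the man page. The helpers my_trace_in, my_trace_out, trace_event, the trace_log buffer, and the rule functions expr and atom are hypothetical stand-ins written by hand for illustration; they are not antlr output, and where the definitions must be placed in a real grammar is not covered here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory trace log; a real parser might write to stderr. */
static char trace_log[256];
static int trace_depth = 0;

/* Append one indented "<tag> <rule>" line to the log. */
static void trace_event(const char *tag, const char *rule)
{
    size_t used = strlen(trace_log);
    snprintf(trace_log + used, sizeof trace_log - used, "%*s%s %s\n",
             2 * trace_depth, "", tag, rule);
}

/* Hypothetical handler for rule entry: log, then indent one level. */
static void my_trace_in(const char *rule)
{
    trace_event("enter", rule);
    trace_depth++;
}

/* Hypothetical handler for rule exit: unindent, then log. */
static void my_trace_out(const char *rule)
{
    assert(trace_depth > 0);
    trace_depth--;
    trace_event("exit", rule);
}

/* User-supplied definitions of the two names documented under -gd. */
#define zzTRACEIN(rule)  my_trace_in(rule)
#define zzTRACEOUT(rule) my_trace_out(rule)

/* Hand-written stand-ins for generated rule functions (illustrative only). */
static void atom(void)
{
    zzTRACEIN("atom");
    /* token matching would happen here */
    zzTRACEOUT("atom");
}

static void expr(void)
{
    zzTRACEIN("expr");
    atom();
    zzTRACEOUT("expr");
}
```

Calling expr() once in this sketch leaves a four-line nested trace in trace_log, with the atom lines indented by two spaces under expr. Routing the macros through ordinary functions is a design choice that keeps the macro bodies trivial and lets the trace format change without touching the generated code.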