This file contains the migration of PCCTS from 1.31 in the order that
changes were made. 1.32b7 is the last beta before full 1.32.
Terence Parr, Parr Research Corporation 1995.
======================================================================
Added Russell Quong to banner, changed banner for output slightly
Fixed it so that you have before / after actions for C++ in class def
Fixed bug in optimizer that made it sometimes forget to set internal
    token pointers. Only showed up when a {...} was in the "wrong spot".
======================================================================
Added fixes by Dave Seidel for PC compilers in 32 bit mode (config.h
======================================================================
Fixed hideous bug in code generator for wildcard and for ~token op.

    1. in antlr/main.c: change strcasecmp() to stricmp()

    2. in dlg/output.c: use DLEXER_C instead of "DLexer.C"

    3. in h/PBlackBox.h: use <iostream.h> instead of <stream.h>
======================================================================
When the -ft option was used, any path prefix screwed up
the gate on the .h files

Fixed yet another bug due to the optimizer.

The exception handling thing was a bit wacko:

caused an exception if "A C" was the input. In other words,
it found that A C didn't match the (A B)? pred and caused
an exception rather than trying the next alt. All I did
was to change the zzmatch_wsig() macros.

Fixed some problems in gen.c relating to the name of token
class bit sets in the output.

Added the tremendously cool generalized predicate. For the
moment, I'll give this brief description.

    a : <<predicate>>? blah
      | foo
      ;

This implies that (assuming blah and foo are syntactically
ambiguous) "predicate" indicates the semantic validity of
applying "blah". If "predicate" is false, "foo" is attempted.

Previously, you had to say:

    a : <<LA(1)==ID ? predicate : 1>>? ID

Now, you can simply use "predicate" without the ?: operator
if you turn on the ANTLR command-line option "-prc on". This
tells ANTLR to compute that all by itself. It computes n
tokens of lookahead where LT(n) or LATEXT(n) is the farthest

If you give a predicate using "-prc on" that is followed
by a construct that can recognize more than one n-sequence,
you will get a warning from ANTLR. For example,

    a : <<isTypeName(LT(1)->getText())>>? (ID|INT)

This is wrong because the predicate will be applied to INTs
as well as IDs. You should use this syntax to make
the predicate more specific:

    a : (ID)? => <<isTypeName(LT(1)->getText())>>? (ID|INT)

which says "don't apply the predicate unless ID is the
current lookahead context".
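
As a rough illustration of what the context guard buys you, the C++ sketch below spells out the test that "(ID)? => <<isTypeName(...)>>?" boils down to. The TokenType enum, the Token struct and the body of isTypeName() are hypothetical stand-ins, not code that ANTLR generates.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical token types; a generated parser defines its own.
enum TokenType { ID, INT };

// Hypothetical lookahead token: its type plus its text.
struct Token {
    TokenType type;
    const char *text;
};

// Hypothetical stand-in for the user's isTypeName(): pretends that "Foo"
// is the only type name in the symbol table.
bool isTypeName(const char *text) { return std::strcmp(text, "Foo") == 0; }

// Predicate guarded by the context (ID)? => : it is evaluated only when
// LA(1) is an ID; for any other token it does not veto the alternative.
// This is the same test as the hand-written <<LA(1)==ID ? pred : 1>>?.
bool guardedTypeNamePred(const Token &la1) {
    return la1.type == ID ? isTypeName(la1.text) : true;
}
```

With the guard, an INT in the lookahead leaves the alternative viable instead of failing a type-name check that was never meant for it.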

You cannot currently have anything in the "(context)? =>"
except sequences such as:

    ( LPAREN ID | LPAREN SCOPE )? => <<pred>>?

I haven't tested this THAT much, but it does work for the
======================================================================
Added getLine() to the ANTLRTokenBase and DLGBasedToken classes;
left line() for backward compatibility.

Removed SORCERER_TRANSFORM from the ast.h stuff.

Fixed bug in code gen of ANTLR such that nested syn preds work more
efficiently now. The ANTLRTokenBuffer was getting very large
with nested predicates.

Memory leak is now gone from ANTLRTokenBuffer; all tokens are deleted.
For backward compatibility reasons, you have to say parser->deleteTokens()
or mytokenbuffer->deleteTokens(), but later it will be the default mode.
Say this after the parser is constructed. E.g.,

    ParserBlackBox<DLGLexer, MyParser, ANTLRToken> p(stdin);
    p.parser()->deleteTokens();
    p.parser()->start_symbol();

==============================
Changed so that deleteTokens() will do a delete ((ANTLRTokenBase *))
on the ptr. This gets the virtual destructor.
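
Here is a small C++ sketch of why the cast matters, assuming a token hierarchy with a virtual destructor in the base. TokenBase and MyToken are hypothetical names standing in for ANTLRTokenBase and a user token class.

```cpp
#include <cassert>

// Counts MyToken destructor runs so the effect is observable (illustration only).
static int myTokenDestructorRuns = 0;

// Hypothetical stand-in for ANTLRTokenBase. Because its destructor is
// virtual, deleting through a TokenBase * also runs the derived destructor.
class TokenBase {
public:
    virtual ~TokenBase() {}
};

// Hypothetical user token type with cleanup of its own.
class MyToken : public TokenBase {
public:
    ~MyToken() override { ++myTokenDestructorRuns; }
};

// Deletes through the base-class pointer, the way deleteTokens() is
// described as doing; the virtual destructor dispatches to ~MyToken().
void deleteToken(TokenBase *t) {
    delete t;
}
```

Without the virtual keyword on ~TokenBase(), deleting a MyToken through a TokenBase * is undefined behavior in C++, and in practice the derived destructor is skipped.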

Fixed some weird things in the C++ header files (a few return types).

Made the AST routines correspond to the book and SORCERER stuff.

New token stuff: See testcpp/14/test.g

ANTLR accepts a #pragma gc_tokens which says
[1] Generate label = copy(LT(1)) instead of label=LT(1) for
    all labeled token references.
[2] User now has to define ANTLRTokenPtr (as a class or a typedef
    to just a pointer) as well as the ANTLRToken class itself.

To delete tokens in the token buffer, send the deleteTokens() message
to the parser.

    All tokens that fall off the ANTLRTokenBuffer get deleted,
    which is what currently happens when the deleteTokens() message
    has been sent to the token buffer.

We always generate ANTLRTokenPtr instead of 'ANTLRToken *' now.
If no pragma is set, ANTLR generates a

    typedef ANTLRToken *ANTLRTokenPtr;

Made a warning for x:rule_ref <<$x>>; still no warning for $i's, however.

    Antlr parser generator Version 1.32b6 1989-1995
    test.g, line 3: error: There are no token ptrs for rule references: '$x'

[With respect to token object garbage collection (GC), 1.32b7
backtracks from 1.32b6, but results in better and less intrusive GC.
This is the last beta version before full 1.32.]

o The "#pragma gc_tokens" is no longer used.

o .C files are now .cpp files (hence, makefiles will have to
  be changed; or you can rerun genmk). This is a good move,
  but causes some backward incompatibility problems. You can
  avoid this by changing CPP_FILE_SUFFIX to ".C" in pccts/h/config.h.

o The token object class hierarchy has been flattened to include
  only three classes: ANTLRAbstractToken, ANTLRCommonToken, and
  ANTLRCommonNoRefCountToken. The common token now does garbage
  collection via ref counting.

o "Smart" pointers are now used for garbage collection. That is,
  ANTLRTokenPtr is used instead of "ANTLRToken *".

o The antlr.1 man page has been cleaned up slightly.

o The SUN C++ compiler now complains less about C++ support code.

o Grammars which subclass ANTLRCommonToken must wrap all token
  pointer references in mytoken(token_ptr). This is the only
  serious backward incompatibility. See below.

--------------------------------------------------------

The deleteTokens() message to the parser or token buffer has been changed

    void noGarbageCollectTokens() { inputTokens->noGarbageCollectTokens(); }
    void garbageCollectTokens() { inputTokens->garbageCollectTokens(); }

The token buffer deletes all non-referenced tokens by default now.
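
Here is a simplified sketch of the buffer behavior described above. It assumes a fixed-size lookahead window and an integer reference count on each token. TokenBuffer and Token are hypothetical classes. Only the two method names come from the text above, and the real ANTLRTokenBuffer is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical token carrying a reference count (kept by smart pointers).
struct Token {
    int refCount = 0;
};

// Hypothetical, simplified token buffer. When a token falls off the front
// of the lookahead window, it is deleted only if garbage collection is on
// and nothing still references it.
class TokenBuffer {
public:
    explicit TokenBuffer(std::size_t window) : window_(window) {}
    TokenBuffer(const TokenBuffer &) = delete;             // owns raw pointers
    TokenBuffer &operator=(const TokenBuffer &) = delete;
    ~TokenBuffer() {
        if (gc_)
            for (Token *t : tokens_)
                if (t->refCount == 0) delete t;
    }

    void garbageCollectTokens()   { gc_ = true; }
    void noGarbageCollectTokens() { gc_ = false; }

    // Appends a token and returns how many old tokens were deleted.
    int push(Token *t) {
        tokens_.push_back(t);
        int deleted = 0;
        while (tokens_.size() > window_) {
            Token *old = tokens_.front();
            tokens_.pop_front();
            if (gc_ && old->refCount == 0) {
                delete old;
                ++deleted;
            }
            // Otherwise the token is left alone: its remaining referrers,
            // or the user when GC is off, are responsible for it.
        }
        return deleted;
    }

private:
    std::deque<Token *> tokens_;
    std::size_t window_;
    bool gc_ = true;   // collecting is the default, as stated above
};
```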

--------------------------------------------------------

The makeToken() message returns a new type. The function should look like:

    virtual ANTLRAbstractToken *makeToken(ANTLRTokenType tt,
                                          ANTLRChar *txt,
                                          int line)
    {
        ANTLRAbstractToken *t = new ANTLRCommonToken(tt,txt);
        t->setLine(line);
        return t;
    }
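
Returning ANTLRAbstractToken * means the lexer can create tokens without knowing their concrete class. Below is a minimal C++ sketch of that factory pattern. It assumes, as the scan.setToken() call in the initialization sequence suggests, that the lexer holds a prototype token and asks it for new ones. All class and function names here are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical token types.
enum TokenType { EOF_TOKEN = 1, ID = 2 };

// Hypothetical abstract token with a virtual factory, modeled on makeToken().
class AbstractToken {
public:
    virtual ~AbstractToken() {}
    virtual AbstractToken *makeToken(TokenType tt, const char *txt,
                                     int line) = 0;
    virtual TokenType getType() const = 0;
    virtual int getLine() const = 0;
};

// Hypothetical concrete token; its makeToken() creates another CommonToken.
class CommonToken : public AbstractToken {
public:
    CommonToken(TokenType tt = EOF_TOKEN, const char *txt = "")
        : type_(tt), text_(txt) {}

    AbstractToken *makeToken(TokenType tt, const char *txt,
                             int line) override {
        CommonToken *t = new CommonToken(tt, txt);
        t->line_ = line;
        return t;
    }
    TokenType getType() const override { return type_; }
    int getLine() const override { return line_; }
    const std::string &getText() const { return text_; }

private:
    TokenType type_;
    std::string text_;
    int line_ = 0;
};

// Hypothetical lexer step: it only sees the abstract interface, so the
// prototype it was given decides which concrete class is created.
AbstractToken *emitToken(AbstractToken &prototype, TokenType tt,
                         const char *txt, int line) {
    return prototype.makeToken(tt, txt, line);
}
```

A user token class that overrides makeToken() to return its own type would be picked up the same way, without touching the lexer.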

--------------------------------------------------------

Changed TokenType to ANTLRTokenType (this often forces changes in AST
definitions, because the #[] constructor calls AST(tokentype, string)).

--------------------------------------------------------

You must define AST(ANTLRTokenPtr t) now in your AST class definition.
You might also have to include ATokPtr.h above the definition; e.g.,
if AST is defined in a separate file, such as AST.h, it's a good idea
to include ATOKPTR_H (ATokPtr.h). For example,

    #include ATOKPTR_H

    class AST : public ASTBase {
    protected:
        ANTLRTokenPtr token;
    public:
        AST(ANTLRTokenPtr t) { token = t; }
        void preorder_action() {
            char *s = token->getText();
            printf(" %s", s);   /* e.g., print the token text */
        }
    };

Note the use of smart pointers rather than "ANTLRToken *".

--------------------------------------------------------

From robertb@oakhill.sps.mot.com Bob Bailey. Changed ANTLR C++ output
to avoid an error in Sun C++ 3.0.1. Made the structs created to hold
multiple return values public.

--------------------------------------------------------

Fixed genmk so that target List.* is not included anymore. It's
called SList.* anyway.

--------------------------------------------------------

Scott Vorthmann <vorth@cmu.edu> fixed antlr.g in ANTLR so that \r
is allowed as the return character as well as \n.

--------------------------------------------------------

Bug in exceptions attached to labeled token/tokclass references: ANTLR
didn't generate code for the exceptions. This didn't work:

        catch MismatchedToken : <<printf("eh?\n");>>

Now ANTLR generates (which is kinda big, but necessary):

    if ( !_match_wsig(ID) ) {
        if ( guessing ) goto fail;
        _signal=MismatchedToken;
        switch ( _signal ) {
        case MismatchedToken :
            printf("eh?\n");
            _signal = NoSignal;
            break;
        }
    }

which implies that you can recover and continue parsing after a missing/bad
token.

--------------------------------------------------------

genmk now correctly uses config file for CPP_FILE_SUFFIX stuff.

--------------------------------------------------------
10 general cleanup / PURIFY

Anthony Green <green@vizbiz.com> suggested a bunch of good general
clean up things for the code; he also suggested a few things to
help out the "PURIFY" memory allocation checker.

--------------------------------------------------------
11 $-variable references.

Manuel ORNATO indicated that a $-variable outside of a rule caused
ANTLR to crash. I fixed this.

12 Tom Moog suggestion

Fail action of semantic predicate needs "{}" envelope. FIXED.

13 references to LT(1).

I have enclosed all assignments such as:

    _t22 = (ANTLRTokenPtr)LT(1);

in "if ( !guessing )" so that during backtracking the reference count
for token objects is not increased.
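
This is a minimal sketch of what the "if ( !guessing )" guard does. It assumes the reference count is bumped whenever a label is bound. Token, Parser and bindLabel() are hypothetical. The manual ++refCount stands in for what ANTLRTokenPtr's assignment would do.

```cpp
#include <cassert>

// Hypothetical reference-counted token.
struct Token {
    int refCount = 0;
};

// Hypothetical parser state. guessing is nonzero while a syntactic
// predicate (...)? is being tried; lt1 stands in for LT(1).
struct Parser {
    int guessing = 0;
    Token *lt1 = nullptr;

    // Mirrors the guarded assignment: the label is bound, and the count
    // bumped, only during a real parse, never while guessing.
    void bindLabel(Token *&label) {
        if (!guessing) {
            label = lt1;
            ++label->refCount;   // what ANTLRTokenPtr's assignment would do
        }
    }
};
```

While guessing, nothing is bound and the count stays put. Presumably this is the point: the predicates use setjmp()/longjmp(), so a longjmp() out of one cannot leave behind an increment with no matching decrement.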

TOKEN OBJECT GARBAGE COLLECTION

The class ANTLRCommonToken is now garbage collected through a "smart"
pointer called ANTLRTokenPtr using reference counting. Any token
object not referenced by your grammar actions is destroyed by the
ANTLRTokenBuffer when it must make room for more token objects.
Referenced tokens are then destroyed in your parser when local
ANTLRTokenPtr objects are deleted. For example,

    a : label:ID ;

would be converted to something like:

    void yourclass::a(void)
    {
        ...
        ANTLRTokenPtr label=NULL;   // used to be ANTLRToken *label;
        ...
        label = (ANTLRTokenPtr)LT(1);
        ...
    }

When the "label" object is destroyed (it's just a pointer to your
input token object LT(1)), it decrements the reference count on the
object created for the ID. If the count goes to zero, the object
pointed to by label is deleted.

To correctly manage the garbage collection, you should use
ANTLRTokenPtr instead of "ANTLRToken *". Most ANTLR support code
(visible to the user) has been modified to use the smart pointers.
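
The reference counting described here can be sketched with a small hand-written smart pointer. This is a minimal illustration in the spirit of ANTLRTokenPtr, not its actual implementation. RefToken and TokenPtr are hypothetical names.

```cpp
#include <cassert>

// Counts destroyed tokens so the behavior can be observed (illustration only).
static int tokensDestroyed = 0;

// Hypothetical reference-counted token, in the spirit of ANTLRCommonToken.
class RefToken {
public:
    virtual ~RefToken() { ++tokensDestroyed; }
    int refCount = 0;
};

// Hypothetical smart pointer in the spirit of ANTLRTokenPtr: every pointer
// that refers to a token adds one to its count, and the token is deleted
// when the last such pointer lets go of it.
class TokenPtr {
public:
    TokenPtr(RefToken *t = nullptr) : t_(t) { ref(); }
    TokenPtr(const TokenPtr &other) : t_(other.t_) { ref(); }
    TokenPtr &operator=(const TokenPtr &other) {
        if (t_ != other.t_) {
            unref();          // release the old token (may delete it)
            t_ = other.t_;
            ref();            // claim the new one
        }
        return *this;
    }
    ~TokenPtr() { unref(); }

    RefToken *operator->() const { return t_; }

private:
    void ref() {
        if (t_) ++t_->refCount;
    }
    void unref() {
        if (t_) {
            assert(t_->refCount > 0);
            if (--t_->refCount == 0) delete t_;
            t_ = nullptr;
        }
    }
    RefToken *t_;
};
```

Binding a label and then taking a temporary copy raises the count to 2. Dropping the copy brings it back to 1. Destroying the label drops it to 0 and deletes the token, which is the sequence the paragraph above describes.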

***************************************************************
Remember that any local objects that you create are not deleted when a
longjmp() is executed. Unfortunately, the syntactic predicates (...)?
use setjmp()/longjmp(). There are some situations when a few tokens
***************************************************************

o The default is to perform token object garbage collection.
  You may use parser->noGarbageCollectTokens() to turn off

o The type ANTLRTokenPtr is always defined now (automatically).
  If you do not wish to use smart pointers, you will have to
  redefine ANTLRTokenPtr by subclassing, changing the header
  file or changing ANTLR's code generation (easy enough to

o If you don't use ParserBlackBox, the new initialization sequence is:

      ANTLRTokenPtr aToken = new ANTLRToken;
      scan.setToken(mytoken(aToken));

  where mytoken(aToken) gets an ANTLRToken * from the smart pointer.

o Define C++ preprocessor symbol DBG_REFCOUNTTOKEN to see a bunch of
  debugging stuff for reference counting if you suspect something.

3 WHY DO I HAVE TO TYPECAST ALL MY TOKEN POINTERS NOW??????

If you subclass ANTLRCommonToken and then attempt to refer to one of
your token members via a token pointer in your grammar actions, the
C++ compiler will complain that your token object does not have that
member. For example, if you used to do this

    class ANTLRToken : public ANTLRCommonToken {
        ...
    };

    a : t:ID << t->muck = ...; >> ;

Now, you must change the t->muck reference to:

    a : t:ID << mytoken(t)->muck = ...; >> ;

in order to downcast 't' to be an "ANTLRToken *" not the
"ANTLRAbstractToken *" resulting from ANTLRTokenPtr::operator->().
The macro is defined as:

    /*
     * Since you cannot redefine operator->() to return one of the user's
     * token object types, we must down cast. This is a drag. Here's
     * a macro that helps. template: "mytoken(a-smart-ptr)->myfield".
     */
    #define mytoken(tp) ((ANTLRToken *)(tp.operator->()))

You have to use macro mytoken(grammar-label) now because smart
pointers are not specific to a parser's token objects. In other
words, the ANTLRTokenPtr class has a pointer to a generic
ANTLRAbstractToken, not your ANTLRToken; the ANTLR support code must
use smart pointers too, but be able to work with any kind of
ANTLRToken. Sorry about this, but it's C++'s fault, not mine. Some
nebulous future version of the C++ compilers should obviate the need
to downcast smart pointers with runtime type checking (and by allowing
different return types of overridden functions).
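
The C++ sketch below reproduces the situation with hypothetical AbstractToken, TokenPtr and MyToken classes. The smart pointer's operator->() can only return the generic type, so reaching muck requires a downcast. The mytoken macro here has the same shape as the one above. The dynamic_cast variant is an extra, checked alternative added for comparison and is not part of ANTLR.

```cpp
#include <cassert>

// Hypothetical generic token, standing in for ANTLRAbstractToken.
class AbstractToken {
public:
    virtual ~AbstractToken() {}
};

// Hypothetical smart pointer: like ANTLRTokenPtr it only knows the generic
// token type, so operator->() can only return AbstractToken *.
class TokenPtr {
public:
    explicit TokenPtr(AbstractToken *t) : t_(t) {}
    AbstractToken *operator->() const { return t_; }
private:
    AbstractToken *t_;   // no ref counting here; only the downcast matters
};

// Hypothetical user token class with a field that grammar actions touch.
class MyToken : public AbstractToken {
public:
    int muck = 0;
};

// Same shape as the mytoken() macro: get the raw pointer through
// operator->() and cast it (unchecked) to the user's token type.
#define mytoken(tp) ((MyToken *)((tp).operator->()))

// Checked alternative: dynamic_cast yields nullptr if the token is not
// actually a MyToken.
inline MyToken *mytokenChecked(const TokenPtr &tp) {
    return dynamic_cast<MyToken *>(tp.operator->());
}
```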

A way to have backward compatible code is to shut off the token object
garbage collection; i.e., use parser->noGarbageCollectTokens() and
change the definition of ANTLRTokenPtr (that's why you get source code

PARSER EXCEPTION HANDLING

I've noticed some weird stuff with the exception handling. I intend
to give this top priority for the "book release" of ANTLR.

o Changed Token class hierarchy to be (Thanks to Tom Moog):

      ANTLRNoRefCountCommonToken

o Added virtual panic() to ANTLRAbstractToken. Made ANTLRParser::panic()

o Cleaned up the dup() stuff in AST hierarchy to use shallowCopy() to
  make node copies. John Farr at Medtronic suggested this. I.e.,
  if you want to use dup() with either ANTLR or SORCERER or -transform
  mode with SORCERER, you must define shallowCopy() as:

      virtual PCCTS_AST *shallowCopy()
      {
          return new AST(...);
      }

  e.g.,

      virtual PCCTS_AST *shallowCopy()
      {
          return new AST(*this);
      }

  if you have defined a copy constructor such as

      AST(const AST &t) // shallow copy constructor
      {
          /* copy this node's own fields, but not its child/sibling links */
      }

o Added a warning when -CC and -gk are used together. This is broken,
  hence a warning is appropriate.

o Added warning when #-stuff is used w/o -gt option.

o Updated MPW installation.

o "Miller, Philip W." <MILLERPW@f1groups.fsd.jhuapl.edu> suggested
  that genmk use RENAME_OBJ_FLAG and RENAME_EXE_FLAG instead of
  hardcoding "-o" in genmk.c.

o Made all exit() calls use EXIT_SUCCESS or EXIT_FAILURE.

===========================================================================
EXIT_FAILURE and EXIT_SUCCESS were not always defined. I had to modify
a bunch of files to use PCCTS_EXIT_XXX, which forces a new version. Sorry