Quick start notes on the use of Coco/R (C/C++ version)
======================================================

These notes apply directly to the MS-DOS versions of Coco/R (C/C++).

We know that you can't wait to begin!

Installation
============

Please read the file README.1ST for details of how to install the system.

Getting going
=============

Examples of input for Coco/R can be found in the case study source files in
this kit.  It is suggested that you experiment with these before developing
your own applications.

For each application, the user has to prepare a text file to contain the
attributed grammar.  Points to be aware of are that

 - it is sensible to work within a "project directory" (say C:\WORK) and not
   within the "system directory" (C:\COCO);

 - text file preparation must be done with an ASCII editor, and not with a
   word processor;

 - by convention the file is named with a primary name that is based on the
   grammar's goal symbol, and with an "ATG" extension, for example CALC.ATG.

Running Coco/R
==============

To start Coco/R, type COCOR, adding the name of the file that contains your
attribute grammar:

        COCOR TEST.ATG

A second parameter can be supplied to set compiler options, for example:

        COCOR TEST.ATG /CS

or, if you prefer the Unix form

        COCOR -CS TEST.ATG

For those who need reminding, the command

        COCOR

with no parameters will print a help screen something like the following,
and then abort.

  Coco/R Compiler-Compiler V1.xx (C version)
  Released by Frankie Arzu ReleaseDate

  Usage: COCOR [(/|-)Options]
  Example: COCOR -C -S Test.atg

  Options:
     A  - Trace automaton
     C  - Generate compiler module
     D  - Include source debugging information (#line)
     F  - Give Start and Follower sets
     G  - Print top-down graph
     L  - Force listing
     O  - Terminal conditions use OR only
     P  - Generate parser only
     S  - Print symbol table
     T  - Grammar tests only - no code generated
     X  - Generate C++ with classes
     Z  - Force extensions to .hpp and .cpp files

  Environment variables:
     CRFRAMES:  Search directory for frames file.
                If not specified, frames must be in the working directory.
     CRHEXT:    Extension for the '.h' generated files.
                If not specified, '.h' for C, '.hpp' for C++ (Dos and Unix).
     CRCEXT:    Extension for the '.c' generated files.
                If not specified, '.c' for C, '.cpp' for C++ (Dos and Unix).

  You can also set up these options by using -Dvarname=value

Input to Coco/R
===============

Coco/R takes five (or six) files as input, and produces six (or seven) files
as output.  These output files can then be combined with a main program and
any other auxiliary files needed, so as to produce a complete compiler.

The input files needed are

     grammar.ATG   - an attributed grammar (grammar used here for
                     illustration)
     PARSER_H.FRM
     PARSER_C.FRM  - the frame file for parser generation
     SCAN_H.FRM
     SCAN_C.FRM    - the frame file for scanner generation

optionally

     grammar.FRM   - an application specific frame file for complete
                     compiler generation

A "generic" version of this last frame file is given as

     COMPILER.FRM  - the generic frame file for complete compiler generation

and this is intended to act as a model for your own applications, a process
that will be helped by studying various application specific frame files
supplied in the kit.

(The other frame files are effectively standardized and should require
little if any alteration; they are fairly resilient, and any particular
configuring for specific applications will require some experience of the
internal workings of Coco/R itself.)

When using Coco/R, the frame files are assumed to exist in directories
specified by the environment variable CRFRAMES.  To set this variable, use
the SET command, for example

        SET CRFRAMES=C:\COCO\FRAMES                          (for DOS)

You may like to add this line to your AUTOEXEC.BAT file, so that it takes
effect every time you start your computer.

Unix users would set this variable something like

        CRFRAMES=/usr/lib/coco/frames; export CRFRAMES       (for Unix)

and possibly add this to the .profile file or equivalent.

As from version 1.08 you can also set these variables using a command line
option, for example

        -DCRFRAMES=/usr/lib/coco/frames

The frame file for the compiler itself is named as grammar.FRM, where
grammar is the grammar name.  This is searched for in the directory of the
input file.  If it is not found, a search is made for the generic
COMPILER.FRM in the directories specified in the environment variable
CRFRAMES.

The basic compiler frame file (COMPILER.FRM) that comes with the kit will
allow simple applications to be generated immediately, but it is sensible to
copy this basic file to the project directory, and then to rename and edit
it to suit the application.

Output from Coco/R
==================

The generated files are placed in the same directory as the grammar file.
Coco/R for C generates the files

     grammarS.C and .H   generated FSA scanner
     grammarP.C and .H   generated recursive descent parser
     grammarC.H          token numbers used in scanner and parser
     grammarE.H          error numbers and corresponding message texts
     grammar.LST         compilation history (if the /L option is used)

and, optionally, a file

     grammar.C           generated main module for the complete compiler

where grammar is the name of the attributed grammar (this grammar is
sensibly stored in the file grammar.ATG).

The C++ version of Coco/R generates similar files with extensions .CPP and
.HPP.

If .C/.H and/or .CPP/.HPP extensions are not acceptable to your compiler,
the extensions may alternatively be specified by defining the further
environment variables CRHEXT and CRCEXT, for example

        SET CRHEXT=HHH

The system should produce code acceptable to most C/C++ compilers.  A list
of those with which it is known to work appears in the file DOCS\COCO.

Compiling the generated compiler
================================

Once the components of the application have been generated, they are ready
to be compiled by your C or C++ compiler.  It is assumed that you are
familiar with the process of compiling such programs.  For a very simple
MS-DOS application using the Borland C++ system, one might be able to use
commands like

     BCC -ml -IC:\COCO\CPLUS2 -c CALC.CPP CALCS.CPP CALCP.CPP
     BCC -ml -LC:\COCO\CPLUS2 -eCALC.EXE CALC.OBJ CALCS.OBJ CALCP.OBJ CR_LIB.LIB

but for larger applications a better mechanism is to use the MAKE command in
conjunction with a "makefile".

Notice that if you are using the C++ system you will also need to
incorporate the base class library found in the directory CPLUS2 (please see
the README.1ST file for installation details).

If you are using Borland C++ you may need to set up a configuration file
TURBOC.CFG to reflect the correct paths and options for your compiler.

Coco/R options and pragmas
==========================

As implied above, various didactic output and useful variations may be
invoked by the use of compiler pragmas in the input grammar, or by the use
of a command line option.  Compiler pragmas take the form

        $String

and the optional command line parameter takes the form

        /String    or    -String

where String contains one or more of the letters ACDFGLPSTXZ in either upper
or lower case.

The C D L P T X and Z options are generally useful

  C - (Compiler) Generate complete compiler driving module, including source
      listing featuring interleaved error message reporting.  To use this
      option the file COMPILER.FRM (or grammar.FRM) must be available.

  D - (Debug) Generate source line numbers (#line) for each semantic action.
      This causes the semantic actions in the generated program to be
      labelled with reference to the original .ATG file, so that one can use
      a symbolic debugger on the .ATG file.

  L - (Listing) Force listing.
      Normally the listing of the grammar is suppressed if the compilation
      is error free; any errors are reported in a fairly cryptic form.

  P - (Parser only) Suppress generation of the scanner.
      Regeneration of the scanner is often tedious, and results in no
      changes from the one first generated.  This option must be used with
      care.  It can also be used if a hand-crafted scanner is to be supplied
      (see the notes on the use of hand-crafted scanners in the file COCOL).

  T - (Tests) Suppress generation of scanner and parser.
      If this option is exercised, the generation of the scanner and parser
      is suppressed, but the attributed grammar is parsed and checked for
      grammatical inconsistencies, LL(1) violations and so on.

  X - Generate parsers and scanners in the form of C++ classes.

  Z - Use .CPP/.HPP as extensions in preference to .C/.H.

The following options are really intended to help with debugging/teaching
applications.  Their effect may best be seen by judicious experimentation.

  A - Trace automaton
  F - Give First and Follow sets for each non-terminal in the grammar
  G - Print top-down graph
  S - Print symbol table

Grammar checks
==============

Coco/R performs several tests to check if the grammar is well-formed.  If
one of the following error messages is produced, no compiler parts are
generated.

  NO PRODUCTION FOR X
     The nonterminal X has been used, but there is no production for it.

  X CANNOT BE REACHED
     There is a production for nonterminal X, but X cannot be derived from
     the start symbol.

  X CANNOT BE DERIVED TO TERMINALS
     For example, if there is a production X = "(" X ")" .

  X - Y, Y - X
     X and Y are nonterminals with circular derivations.

  TOKENS X AND Y CANNOT BE DISTINGUISHED
     The terminal symbols X and Y are declared to have the same structure,
     e.g.

        integer = digit { digit } .
        real = digit { digit } ["." { digit } ].

     In this example, a digit string appears ambiguously to be recognized
     as an integer or as a real.

The following messages are warnings.  They may indicate an error but they
may also describe desired effects.  The generated compiler parts may still
be valid.  If an LL(1) error is reported for a construct X, one must be
aware that the generated parser will choose the first of several possible
alternatives for X.

  X NULLABLE
     X can be derived to the empty string, e.g. X = { Y } .

  LL(1) ERROR IN X:Y IS START OF MORE THAN ONE ALTERNATIVE
     Several alternatives in the production of X start with the terminal Y
     e.g.

        Statement = ident ":=" Expression | ident [ ActualParameters ] .

  LL(1) ERROR IN X:Y IS START AND SUCCESSOR OF NULLABLE STRUCTURE
     Nullable structures are [ ... ] and { ... }
     e.g.

        qualident = [ ident "." ] ident .
        Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] .

     The ELSE at the start of the else part may also be a successor of a
     statement.  This LL(1) conflict is known under the name "dangling
     else".
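
To illustrate how the "dangling else" warning plays out in practice, the
following is a minimal hand-written sketch in C.  It is not code generated by
Coco/R.  It assumes a toy token encoding invented for this example, in which
each token is one character: I for IF, T for THEN, E for ELSE, e for an
expression and s for a simple statement.  The name parse_shape and the
bracketed output format are likewise hypothetical.  The sketch also assumes
the usual recursive descent behaviour, namely that the optional part is
entered whenever its start token (ELSE) is the lookahead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *src;   /* remaining input, one character per token */
static char out[256];     /* bracketed rendering of the parse */

/* Appends s to out without overflowing the buffer. */
static void emit(const char *s)
{
    strncat(out, s, sizeof out - strlen(out) - 1);
}

static int statement(void);

/* Expression = "e" (toy stand-in for a real expression). */
static int expression(void)
{
    if (*src != 'e') return 0;
    src++;
    emit("e");
    return 1;
}

/* Statement = "IF" Expression "THEN" Statement [ "ELSE" Statement ] | "s". */
static int statement(void)
{
    if (*src == 's') { src++; emit("s"); return 1; }
    if (*src != 'I') return 0;
    src++;
    emit("(if ");
    if (!expression()) return 0;
    if (*src != 'T') return 0;
    src++;
    emit(" then ");
    if (!statement()) return 0;
    if (*src == 'E') {          /* LL(1) conflict: the option is simply taken */
        src++;
        emit(" else ");
        if (!statement()) return 0;
    }
    emit(")");
    return 1;
}

/* Returns the bracketed shape of the parse, or NULL on a syntax error. */
const char *parse_shape(const char *tokens)
{
    assert(tokens != NULL);
    src = tokens;
    out[0] = '\0';
    if (!statement() || *src != '\0') return NULL;
    return out;
}
```

With this sketch, parse_shape("IeTIeTsEs") gives
"(if e then (if e then s else s))": the ELSE is attached to the nearest IF,
which is usually the desired effect, so the warning can be accepted.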

The Parser Interface
====================

A parser generated by Coco/R defines various routines that may be called
from an application.  As for the scanner, the form of the interface depends
on the host system.  The parser generated by Coco/R for C has the following
simple interface:

  #define MinErrDist 2

  void Parse();
  /* Parses the source */

  int Successful();
  /* Returns 1 if no errors have been recorded while parsing */

  void LexString(char *Lex, int Size);
  /* Retrieves at most Size characters from the most recently parsed
     token into Lex */

  void LexName(char *Lex, int Size);
  /* Retrieves at most Size characters from the most recently parsed
     token into Lex, converted to upper case if IGNORE CASE was specified */

  void LookAheadString(char *Lex, int Size);
  /* Retrieves at most Size characters from the lookahead token into Lex */

  void LookAheadName(char *Lex, int Size);
  /* Retrieves at most Size characters from the lookahead token into Lex,
     converted to upper case if IGNORE CASE was specified */

  void SynError(int errNo);
  /* Reports syntax error denoted by errNo */

  void SemError(int errNo);
  /* Reports semantic error denoted by errNo */

For the C++ version, it effectively takes the form below.  (There is
actually an underlying class hierarchy, and the declarations are really
slightly different from those presented here).

  class grammarParser
  {
    public:
      grammarParser(AbsScanner *S, CRError *E);
      // Constructs parser associated with scanner S and error reporter E

      void Parse();
      // Parses the source

      int Successful();
      // Returns 1 if no errors have been recorded while parsing

    private:
      void LexString(char *Lex, int Size);
      // Retrieves at most Size characters from the most recently parsed
      // token into Lex

      void LexName(char *Lex, int Size);
      // Retrieves at most Size characters from the most recently parsed
      // token into Lex, converted to upper case if IGNORE CASE was specified

      long LexPos();
      // Retrieves the position of the most recently parsed token

      void LookAheadString(char *Lex, int Size);
      // Retrieves at most Size characters from the lookahead token into Lex

      void LookAheadName(char *Lex, int Size);
      // Retrieves at most Size characters from the lookahead token into Lex,
      // converted to upper case if IGNORE CASE was specified

      long LookAheadPos();
      // Retrieves the position of the lookahead token

      void SynError(int errNo);
      // Reports syntax error denoted by errNo

      void SemError(int errNo);
      // Reports semantic error denoted by errNo

      // ... Prototypes of functions for parsing each non-terminal in grammar
  };

The functionality provides for the parser to

 - initiate the parse for the goal symbol by calling Parse().

 - investigate whether the parse succeeded by calling Successful().

 - report on the presence of syntactic and semantic errors by calling
   SynError and SemError.

 - obtain the lexeme value of a particular token in one of four ways
   (LexString, LexName, LookAheadString and LookAheadName).  Calls to
   LexString are most common; the others are used for special variations.

A tailored frame file can be supplied, from which Coco/R can generate a main
program if the $C pragma/option is used.  Examples of this can be found in
the kit as well.
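
For readers who prefer to write the main module by hand rather than generate
it with $C, the following is a minimal sketch in C of such a driver.  It is
an illustration under stated assumptions, not the code that Coco/R produces
from COMPILER.FRM.  The Parse and Successful routines and the S_src variable
shown here are stand-in stubs so that the fragment compiles on its own; in a
real application they come from the generated parser and scanner.  The names
compile_file and error_count are invented for this example.  The use of
O_BINARY follows the advice in the scanner notes for MS-DOS; the fallback
definition is an assumption for systems that lack it.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>  /* read(), close(); DOS compilers usually use <io.h> */

#ifndef O_BINARY
#define O_BINARY 0   /* absent on Unix, where no CR/LF translation occurs */
#endif

/* --- Hypothetical stand-ins for the generated scanner and parser --- */
int S_src = -1;               /* source file handle, as in the C scanner */
static int error_count = 0;   /* invented for this sketch only */

void Parse(void)
{
    char ch;
    /* Stub rule, not a real grammar: an empty source counts as one error. */
    if (read(S_src, &ch, 1) != 1) error_count++;
}

int Successful(void)
{
    return error_count == 0;
}

/* --- The driver proper --- */
/* Returns 0 on success, 1 if errors were recorded,
   and -1 if the source file cannot be opened. */
int compile_file(const char *path)
{
    assert(path != NULL);
    error_count = 0;
    S_src = open(path, O_RDONLY | O_BINARY);  /* open before calling Parse */
    if (S_src < 0) {
        perror(path);
        return -1;
    }
    Parse();
    close(S_src);
    return Successful() ? 0 : 1;
}
```

A main function would typically pass argv[1] to compile_file and use the
result as the exit status of the program.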

The Scanner Interface
=====================

The scanner generated by Coco/R for C has the following interface (the C++
version is somewhat different)

  int S_src;            /* source file */
  int S_Line, S_Col;    /* line and column of current symbol */
  int S_Len;            /* length of current symbol */
  long S_Pos;           /* file position of current symbol */
  int S_NextLine;       /* line of lookahead symbol */
  int S_NextCol;        /* column of lookahead symbol */
  int S_NextLen;        /* length of lookahead symbol */
  long S_NextPos;       /* file position of lookahead symbol */
  int S_CurrLine;       /* current input line (may be higher than line) */
  long S_lineStart;     /* start position of current line */

  int S_Get();
  /* Gets next symbol from source file */

  void S_Reset();
  /* Reads and stores source file internally */
  /* Assert: S_src has been opened */

  void S_GetString(long pos, int len, char *s);
  /* Retrieves exact string of max length len at position pos in source
     file */

  void S_GetName(long pos, int len, char *s);
  /* Retrieves a string of max length len at position pos in source file.
     Each character in the string will be capitalized if IGNORE CASE is
     specified */

  unsigned char S_CurrentCh(long pos);
  /* Returns current character at specified file position */

Notes
-----

It is rarely necessary to make use of any of this interface directly.  The
parser interface discussed above exports most of the functionality that is
required when actions are required to retrieve token information.

The variables S_Line, S_Col, S_Pos, S_Len are apposite for the most recently
parsed token.

The variables S_NextLine, S_NextCol, S_NextPos, S_NextLen are apposite for
the most recently scanned token (the look-ahead token retrieved by the most
recent call to S_Get).

Tab characters (ASCII 9) are assumed to correspond to 8 character tab stops.
Although Borland C's editor allows the user to change the tab size to any
number (default 3), Coco/R uses 8 character long tabs for compatibility with
UNIX and DOS.
If you wish to change the tab size, set the defined constant TAB_SIZE in the
frame file scan_c.frm to the size you prefer.  Using an incorrect tab size
will cause the scanner to report the wrong column of a token (S_Col,
S_NextCol).

The main module is responsible for opening the source file S_src prior to
calling the parser.  If you are using MS-DOS add O_BINARY to the open mode
options.  Don't let the compiler convert CR/LF to LF, as this will cause an
invalid file position for reporting errors.

S_Reset is called by the parser to initialize the scanner.  S_Reset reads
the entire source into a large internal buffer, thus improving the
efficiency of the scanner very markedly.

S_Get is called repeatedly from the parser, to get the next token from the
source text.

S_GetString and S_GetName can be used to obtain the text of a token starting
at position pos and having length len.

For the C++ version, the interface is effectively that shown below, although
there is actually an underlying class hierarchy, so that the declarations
are not exactly the same as those shown.  Once again, it is rarely necessary
to make use of any of this interface directly.

  class grammarScanner
  {
    public:
      grammarScanner(int SourceFile, int ignoreCase);
      // Constructs scanner for grammar and associates this with a
      // previously opened SourceFile.  Specifies whether to IGNORE CASE

      int Get();
      // Retrieves next token from source

      void GetString(Token *Sym, char *Buffer, int Max);
      // Retrieves at most Max characters from Sym into Buffer

      void GetString(long Pos, char *Buffer, int Max);
      // Retrieves at most Max characters from Pos into Buffer

      void GetName(Token *Sym, char *Buffer, int Max);
      // Retrieves at most Max characters from Sym into Buffer
      // Buffer is capitalized if IGNORE CASE was specified

      long GetLine(long Pos, char *Line, int Max);
      // Retrieves at most Max characters (or until next line break)
      // from position Pos in source file into Line
  };

Automatically generated error explanations are written to a file grammarE.H
by Coco/R in the following form:

     "EOF expected",
     "ident expected",
     "string expected",
     "number expected",
     ...

This text can then be merged into a program to produce textual error
messages.  This is done automatically if the $C pragma (/C command line
option) is used.

Bootstrapping Coco
==================

The parser and scanner used by Coco/R were themselves generated by a
bootstrap process; if Coco/R is given the grammar CR.ATG as input, it will
reproduce the files CRS.C, CRS.H, CRP.C, CRP.H, CRE.H and CRC.H.  It can
also regenerate its own main program from the file SOURCES\CR.FRM if the $C
pragma is used.

This means that Coco/R can be extended and corrected by changing its grammar
and recompiling itself.  If you feel tempted to do this, please make sure
that you have kept copies of the original system in case you destroy or
corrupt the scanner and parser!

The TASTE package
=================

The distribution kit contains, in the "taste" and "taste_cp" directories,
three related applications of Coco/R: a compiler/interpreter, a
cross-reference generator, and a pretty-printer, for a simple Pascal-like
block structured language.  New users will find much of interest in these
applications, which exemplify the use of symbol table construction, code
generation, error handling and so on.
Versions are given for both straight C and also for C++, where the various
support modules are all defined as a simple set of hierarchical classes.

Trademarks
==========

All trademarks are acknowledged.

=END=