.\" $Id: yacc.1,v 1.43 2024/01/10 00:30:34 tom Exp $
.\"
.TH YACC 1 2024-01-09 "Berkeley Yacc" "User Commands"
.
.ds N Yacc
.ds n yacc
.
.ie n .ds CW R
.el \{
.ie \n(.g .ds CW CR
.el .ds CW CW
.\}
.
.de Ex
.RS +7
.PP
.nf
.ft \*(CW
..
.de Ee
.fi
.ft R
.RE
..
.\" Escape single quotes in literal strings from groff's Unicode transform.
.ie \n(.g \{\
.ds `` \(lq
.ds '' \(rq
.ds ' \(aq
.\}
.el \{\
.ie t .ds `` ``
.el .ds `` ""
.ie t .ds '' ''
.el .ds '' ""
.ie t .ds ' \(aq
.el .ds ' '
.\}
.\" Bulleted paragraph
.de bP
.ie n .IP \(bu 4
.el .IP \(bu 2
..
.SH NAME
\*N \-
an LALR(1) parser generator
.SH SYNOPSIS
.B \*n [ \-BdghilLPrstvVy ] [ \-b
.I file_prefix
.B ] [ \-H
.I defines_file
.B ] [ \-o
.I output_file
.B ] [ \-p
.I symbol_prefix
.B ]
.I filename
.SH DESCRIPTION
.B \*N
reads the grammar specification in the file
.I filename
and generates an LALR(1) parser for it.
The parser consists of a set of LALR(1) parsing tables and a driver routine
written in the C programming language.
.B \*N
normally writes the parse tables and the driver routine to the file
.IR y.tab.c .
.PP
The following options are available:
.TP 5
\fB\-b \fIfile_prefix\fR
The
.B \-b
option changes the prefix prepended to the output file names to
the string denoted by
.IR file_prefix .
The default prefix is the character
.IR y .
.TP
.B \-B
create a backtracking parser (compile-time configuration for \fBbtyacc\fP).
.TP
.B \-d
causes the header file
.B y.tab.h
to be written.
It contains #define's for the token identifiers.
.TP
.B \-h
print a usage message to the standard error.
.TP
\fB\-H \fIdefines_file\fR
causes #define's for the token identifiers
to be written to the given \fIdefines_file\fP rather
than the \fBy.tab.h\fP file used by the \fB\-d\fP option.
.TP
.B \-g
The
.B \-g
option causes a graphical description of the generated LALR(1) parser to
be written to the file
.B y.dot
in graphviz format, ready to be processed by
.BR dot (1).
.TP
.B \-i
The \fB\-i\fR option causes a supplementary header file
.B y.tab.i
to be written.
It contains extern declarations
and supplementary #define's as needed to map the conventional \fIyacc\fP
\fByy\fP-prefixed names to whatever the \fB\-p\fP option may specify.
The code file, e.g., \fBy.tab.c\fP, is modified to #include this file
as well as the \fBy.tab.h\fP file, enforcing consistent usage of the
symbols defined in those files.
.IP
The supplementary header file makes it simpler to compile
lex and yacc files separately.
.TP
.B \-l
If the
.B \-l
option is not specified,
.B \*n
will insert \fI#line\fP directives in the generated code.
The \fI#line\fP directives let the C compiler relate errors in the
generated code to the user's original code.
If the \fB\-l\fR option is specified,
.B \*n
will not insert the \fI#line\fP directives.
\&\fI#line\fP directives specified by the user will be retained.
.TP
.B \-L
enable position processing,
e.g., \*(``%locations\*('' (compile-time configuration for \fBbtyacc\fP).
.TP
\fB\-o \fIoutput_file\fR
specify the filename for the parser file.
If this option is not given, the output filename is
the file prefix concatenated with the file suffix, e.g., \fBy.tab.c\fP.
This overrides the \fB\-b\fP option.
.TP
\fB\-p \fIsymbol_prefix\fR
The
.B \-p
option changes the prefix prepended to yacc-generated symbols to
the string denoted by
.IR symbol_prefix .
The default prefix is the string
.BR yy .
.TP
.B \-P
create a reentrant parser, e.g., \*(``%pure\-parser\*(''.
.TP
.B \-r
The
.B \-r
option causes
.B \*n
to produce separate files for code and tables.
The code file is named
.IR y.code.c ,
and the tables file is named
.IR y.tab.c .
The prefix \*(``\fIy\fP\*('' can be overridden using the \fB\-b\fP option.
.TP
.B \-s
suppress \*(``\fB#define\fP\*('' statements generated for string literals in
a \*(``\fB%token\fP\*('' statement,
to more closely match original \fByacc\fP behavior.
.IP
Normally when \fB\*n\fP sees a line such as
.Ex
%token OP_ADD "ADD"
.Ee
.IP
it notices that the quoted \*(``ADD\*('' is a valid C identifier,
and generates a #define not only for OP_ADD,
but for ADD as well,
e.g.,
.Ex
#define OP_ADD 257
.br
#define ADD 258
.Ee
.IP
The original \fByacc\fP does not generate the second \*(``\fB#define\fP\*(''.
The \fB\-s\fP option suppresses this \*(``\fB#define\fP\*(''.
.IP
POSIX (IEEE 1003.1 2004) documents only names and numbers
for \*(``\fB%token\fP\*('',
though original \fByacc\fP and bison also accept string literals.
.TP
.B \-t
The
.B \-t
option changes the preprocessor directives generated by
.B \*n
so that debugging statements will be incorporated in the compiled code.
.IP
\fB\*N\fR sends debugging output to the standard output
(compatible with both the original \fByacc\fP and \fBbtyacc\fP),
while \fBbtyacc\fP writes debugging output to the standard error
(like \fBbison\fP).
.TP
.B \-v
The
.B \-v
option causes a human-readable description of the generated parser to
be written to the file
.IR y.output .
.TP
.B \-V
print the version number to the standard output.
.TP
.B \-y
\fB\*n\fP ignores this option,
which bison supports for ostensible POSIX compatibility.
.PP
The \fIfilename\fP parameter is not optional.
However, \fB\*n\fP accepts a single \*(``\-\*('' to read the grammar
from the standard input.
A double \*(``\-\-\*('' marker denotes the end of options.
A single \fIfilename\fP parameter is expected after a \*(``\-\-\*('' marker.
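.PP
As a rough illustration of how the
.BR \-b ,
.B \-d
and
.B \-p
options described above combine, the following invocation is a sketch
that assumes a hypothetical grammar file named
.IR calc.y :
.Ex
yacc \-d \-b calc \-p calc_ calc.y
.Ee
.PP
Going by the option descriptions, this should write the parser to
.I calc.tab.c
and the token definitions to
.IR calc.tab.h ,
with generated symbols such as
.B yyparse
renamed to
.BR calc_parse .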
.
.SH DIAGNOSTICS
If there are rules that are never reduced, the number of such rules is
reported on standard error.
If there are any LALR(1) conflicts, the number of conflicts is reported
on standard error.
.
.SH EXTENSIONS
.B \*N
provides some extensions for
compatibility with bison and other implementations of yacc.
It accepts several \fIlong options\fP which have equivalents in \*n.
The \fB%destructor\fP and \fB%locations\fP features are available
only if \fB\*n\fP has been configured and compiled to support the
back-tracking (\fBbtyacc\fP) functionality.
The remaining features are always available:
.TP
\fB %code\fP \fIkeyword\fP { \fIcode\fP }
Adds the indicated source \fIcode\fP at a given point in the output file.
The optional \fIkeyword\fP tells \fB\*n\fP where to insert the \fIcode\fP:
.RS 7
.TP 5
\fBtop\fP
just after the version-definition in the generated code-file.
.TP 5
\fBrequires\fP
just after the declaration of public parser variables.
If the \fB\-d\fP option is given, the code is inserted at the
beginning of the defines-file.
.TP 5
\fBprovides\fP
just after the declaration of private parser variables.
If the \fB\-d\fP option is given, the code is inserted at the
end of the defines-file.
.RE
.IP
If no \fIkeyword\fP is given, the code is inserted at the
beginning of the section of code copied verbatim from the source file.
Multiple \fB%code\fP directives may be given;
\fB\*n\fP inserts those into the corresponding code- or defines-file
in the order that they appear in the source file.
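.IP
A minimal sketch of the three keywords follows.
The \fBstruct node\fP type and the \fBreport_node\fP helper are
hypothetical names chosen for illustration; neither is supplied by \fB\*n\fP:
.Ex
%code top {
#include <stdio.h>
}
%code requires {
struct node;    /* hypothetical type used by the grammar */
}
%code provides {
void report_node(struct node *);    /* hypothetical helper */
}
.Ee
.IP
With \fB\-d\fP, the \fBrequires\fP and \fBprovides\fP blocks would be
placed at the beginning and end of the defines-file, respectively.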
.TP
\fB %debug\fP
This has the same effect as the \*(``\-t\*('' command-line option.
.TP
\fB %destructor\fP { \fIcode\fP } \fIsymbol+\fP
defines code that is invoked when a symbol is automatically
discarded during error recovery.
This code can be used to
reclaim dynamically allocated memory associated with the corresponding
semantic value for cases where user actions cannot manage the memory
explicitly.
.IP
On encountering a parse error, the generated parser
discards symbols on the stack and input tokens until it reaches a state
that will allow parsing to continue.
This error recovery approach results in a memory leak
if the \fBYYSTYPE\fP value is, or contains,
pointers to dynamically allocated memory.
.IP
The bracketed \fIcode\fP is invoked whenever the parser discards one of
the symbols.
Within \fIcode\fP, \*(``\fB$$\fP\*('' or
\*(``\fB$<\fItag\fB>$\fR\*('' designates the semantic value associated with the
discarded symbol, and \*(``\fB@$\fP\*('' designates its location (see
\fB%locations\fP directive).
.IP
A per-symbol destructor is defined by listing a grammar symbol
in \fIsymbol+\fP.
A per-type destructor is defined by listing
a semantic type tag (e.g., \*(``<some_tag>\*('') in \fIsymbol+\fP; in this
case, the parser will invoke \fIcode\fP whenever it discards any grammar
symbol that has that semantic type tag, unless that symbol has its own
per-symbol destructor.
.IP
Two categories of default destructor are supported that are
invoked when discarding any grammar symbol that has no per-symbol and no
per-type destructor:
.RS
.bP
the code for \*(``\fB<*>\fP\*('' is used
for grammar symbols that have an explicitly declared semantic type tag
(via \*(``\fB%type\fP\*('');
.bP
the code for \*(``\fB<>\fP\*('' is used
for grammar symbols that have no declared semantic type tag.
.RE
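.IP
As a sketch only, the following declarations free a string value
whenever error recovery discards it.
The tag \fBstr\fP and the tokens \fBSTRING\fP and \fBNAME\fP are hypothetical,
and the values are assumed to have been allocated with \fBmalloc\fP(3):
.Ex
%{
#include <stdlib.h>
%}
%union { char *str; }
%token <str> STRING NAME
/* per-symbol destructor for STRING */
%destructor { free($$); } STRING
/* per-type destructor for other symbols tagged <str> */
%destructor { free($$); } <str>
.Ee
.IP
Here \fBNAME\fP has no destructor of its own, so it would fall back to
the per-type destructor for \fB<str>\fP.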
.TP
\fB %empty\fP
ignored by \fB\*n\fP.
.TP
\fB %expect\fP \fInumber\fP
tells \fB\*n\fP the expected number of shift/reduce conflicts.
That makes it only report the number if it differs.
.TP
\fB %expect\-rr\fP \fInumber\fP
tells \fB\*n\fP the expected number of reduce/reduce conflicts.
That makes it only report the number if it differs.
Unlike bison, \fB\*n\fP allows this directive in LALR parsers.
.TP
\fB %locations\fP
tells \fB\*n\fP to enable management of position information associated
with each token, provided by the lexer in the global variable \fByylloc\fP,
similar to management of semantic value information provided in \fByylval\fP.
.IP
As for semantic values, locations can be referenced within actions using
\fB@$\fP to refer to the location of the left hand side symbol, and \fB@\fIN\fR
(\fIN\fP an integer) to refer to the location of one of the right hand side
symbols.
Also as for semantic values, when a rule is matched, a default
action is used to compute the location represented by \fB@$\fP as the
beginning of the first symbol and the end of the last symbol in the right
hand side of the rule.
This default computation can be overridden by
explicit assignment to \fB@$\fP in a rule action.
.IP
The type of \fByylloc\fP is \fBYYLTYPE\fP, which is defined by default as:
.Ex
typedef struct YYLTYPE {
    int first_line;
    int first_column;
    int last_line;
    int last_column;
} YYLTYPE;
.Ee
.IP
\fBYYLTYPE\fP can be redefined by the user
(\fBYYLTYPE_IS_DEFINED\fP must be defined, to inhibit the default)
in the declarations section of the specification file.
As in bison, the macro \fBYYLLOC_DEFAULT\fP is invoked
each time a rule is matched to calculate a position for the left hand side of
the rule, before the associated action is executed; this macro can be
redefined by the user.
.IP
This directive adds a \fBYYLTYPE\fP parameter to \fByyerror()\fP.
If the \fB%pure\-parser\fP directive is present,
a \fBYYLTYPE\fP parameter is added to \fByylex()\fP calls.
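.IP
The default location computation can also be written out by hand.
The rule below is a sketch, with hypothetical tokens \fBFIRST\fP and \fBLAST\fP
and the default \fBYYLTYPE\fP assumed;
its action should set \fB@$\fP to the same span that the default
computation is described as producing:
.Ex
%locations
%token FIRST LAST
%%
pair : FIRST LAST
       {
           /* start of the first symbol ... */
           @$.first_line   = @1.first_line;
           @$.first_column = @1.first_column;
           /* ... through the end of the last symbol */
           @$.last_line    = @2.last_line;
           @$.last_column  = @2.last_column;
       }
     ;
.Ee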
.TP
\fB %lex\-param\fP { \fIargument-declaration\fP }
By default, the lexer accepts no parameters, e.g., \fByylex()\fP.
Use this directive to add parameter declarations for your customized lexer.
.TP
\fB %parse\-param\fP { \fIargument-declaration\fP }
By default, the parser accepts no parameters, e.g., \fByyparse()\fP.
Use this directive to add parameter declarations for your customized parser.
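.IP
A sketch of \fB%parse\-param\fP and \fB%lex\-param\fP used together,
assuming a hypothetical \fBstruct ctx\fP that carries state shared by
the parser and the lexer:
.Ex
%parse\-param { struct ctx *cx }
%lex\-param { struct ctx *cx }
.Ee
.IP
Under these declarations the parser would be called as
\fByyparse(cx)\fP and would in turn call the lexer as \fByylex(cx)\fP.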
.TP
\fB %pure\-parser\fP
Most variables (other than \fByydebug\fP and \fByynerrs\fP) are
allocated on the stack within \fByyparse\fP, making the parser reasonably
reentrant.
.TP
\fB %token\-table\fP
Make the parser's names for tokens available in the \fByytname\fP array.
However,
.B \*n
does not predefine \*(``$end\*('', \*(``$error\*(''
or \*(``$undefined\*('' in this array.
.
.SH PORTABILITY
According to Robert Corbett,
.Ex
Berkeley Yacc is an LALR(1) parser generator. Berkeley Yacc
has been made as compatible as possible with AT&T Yacc.
Berkeley Yacc can accept any input specification that
conforms to the AT&T Yacc documentation. Specifications
that take advantage of undocumented features of AT&T Yacc
will probably be rejected.
.Ee
.PP
The rationale in
.Ex
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/yacc.html
.Ee
.PP
documents some features of AT&T yacc which are no longer required for POSIX
compliance.
.PP
That said, you may be interested in reusing grammar files with some
other implementation which is not strictly compatible with AT&T yacc.
For instance, there is bison.
Here are a few differences:
.bP
\fBYacc\fP accepts an equals sign preceding the left curly brace
of an action (as in the original grammar file \fBftp.y\fP):
.Ex
    |   STAT CRLF
        = {
                statcmd();
        }
.Ee
.bP
\fBYacc\fP and bison emit code in different order, and in particular bison
makes forward references to common functions such as yylex, yyparse and
yyerror without providing prototypes.
.bP
Bison's support for \*(``%expect\*('' is broken in more than one release.
For best results using bison, delete that directive.
.bP
Bison has no equivalent for some of \fB\*n\fP's command-line options,
relying on directives embedded in the grammar file.
.bP
Bison's \*(``\fB\-y\fP\*('' option does not affect bison's lack of support for
features of AT&T yacc which were deemed obsolescent.
.bP
\fBYacc\fP accepts multiple parameters
with \fB%lex\-param\fP and \fB%parse\-param\fP in two forms
.Ex
{type1 name1} {type2 name2} ...
{type1 name1, type2 name2 ...}
.Ee
.IP
Bison accepts the latter (though undocumented), but depending on the
release may generate bad code.
.bP
Like bison, \fB\*n\fP will add parameters specified via \fB%parse\-param\fP
to \fByyparse\fP, \fByyerror\fP and (if configured for back-tracking)
to the destructor declared using \fB%destructor\fP.
Bison puts the additional parameters \fIfirst\fP for
\fByyparse\fP and \fByyerror\fP but \fIlast\fP for destructors.
\fBYacc\fP matches this behavior.
.
.SH SEE ALSO
\fBbison\fP(1),
\fBbtyacc\fP(1),
\fBlex\fP(1),
\fBflex\fP(1),
\fByacc\fP(1)