mirror of
https://git.savannah.gnu.org/git/emacs.git
synced 2024-11-26 07:33:47 +00:00
480 lines
14 KiB
Plaintext
480 lines
14 KiB
Plaintext
\input texinfo @c -*-texinfo-*-
|
|
@c %**start of header
|
|
@setfilename ../../info/bovine.info
|
|
@set TITLE Bovine parser development
|
|
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
|
|
@settitle @value{TITLE}
|
|
@include docstyle.texi
|
|
|
|
@c *************************************************************************
|
|
@c @ Header
|
|
@c *************************************************************************
|
|
|
|
@c Merge all indexes into a single index for now.
|
|
@c We can always separate them later into two or more as needed.
|
|
@syncodeindex vr cp
|
|
@syncodeindex fn cp
|
|
@syncodeindex ky cp
|
|
@syncodeindex pg cp
|
|
@syncodeindex tp cp
|
|
|
|
@c @footnotestyle separate
|
|
@c @paragraphindent 2
|
|
@c @@smallbook
|
|
@c %**end of header
|
|
|
|
@copying
|
|
Copyright @copyright{} 1999--2004, 2012--2024 Free Software Foundation,
|
|
Inc.
|
|
|
|
@quotation
|
|
Permission is granted to copy, distribute and/or modify this document
|
|
under the terms of the GNU Free Documentation License, Version 1.3 or
|
|
any later version published by the Free Software Foundation; with no
|
|
Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
|
|
and with the Back-Cover Texts as in (a) below. A copy of the license
|
|
is included in the section entitled ``GNU Free Documentation License''.
|
|
|
|
(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
|
|
modify this GNU manual.''
|
|
@end quotation
|
|
@end copying
|
|
|
|
@dircategory Emacs misc features
|
|
@direntry
|
|
* Bovine: (bovine). Semantic bovine parser development.
|
|
@end direntry
|
|
|
|
@iftex
|
|
@finalout
|
|
@end iftex
|
|
|
|
@c @setchapternewpage odd
|
|
@c @setchapternewpage off
|
|
|
|
@titlepage
|
|
@sp 10
|
|
@title @value{TITLE}
|
|
@author by @value{AUTHOR}
|
|
@page
|
|
@vskip 0pt plus 1 fill
|
|
@insertcopying
|
|
@end titlepage
|
|
@page
|
|
|
|
@macro semantic{}
|
|
@i{Semantic}
|
|
@end macro
|
|
|
|
@c *************************************************************************
|
|
@c @ Document
|
|
@c *************************************************************************
|
|
@contents
|
|
|
|
@node top
|
|
@top @value{TITLE}
|
|
|
|
The @dfn{bovine} parser is the original @semantic{} parser, and is an
|
|
implementation of an @acronym{LL} parser. It is good for simple
|
|
languages. It has many conveniences making grammar writing easy. The
|
|
conveniences make it less powerful than a Bison-like @acronym{LALR}
|
|
parser. For more information, @pxref{Top,, Wisent Parser Development,
|
|
wisent}.
|
|
|
|
Bovine @acronym{LL} grammars are stored in files with a @file{.by}
|
|
extension. When compiled, the contents is converted into a file of
|
|
the form @file{NAME-by.el}. This, in turn is byte compiled.
|
|
@xref{top,, Grammar Framework Manual, grammar-fw}.
|
|
|
|
@ifnottex
|
|
@insertcopying
|
|
@end ifnottex
|
|
|
|
@menu
|
|
* Starting Rules:: The starting rules for the grammar.
|
|
* Bovine Grammar Rules:: Rules used to parse a language.
|
|
* Optional Lambda Expression:: Actions to take when a rule is matched.
|
|
* Bovine Examples:: Simple Samples.
|
|
* GNU Free Documentation License:: The license for this documentation.
|
|
@c * Index::
|
|
@end menu
|
|
|
|
@node Starting Rules
|
|
@chapter Starting Rules
|
|
|
|
In Bison, one and only one nonterminal is designated as the ``start''
|
|
symbol. In @semantic{}, one or more nonterminals can be designated as
|
|
the ``start'' symbol. They are declared following the @code{%start}
|
|
keyword separated by spaces. @xref{start Decl,, Grammar Framework
|
|
Manual, grammar-fw}.
|
|
|
|
If no @code{%start} keyword is used in a grammar, then the very first
|
|
is used. Internally the first start nonterminal is targeted by the
|
|
reserved symbol @code{bovine-toplevel}, so it can be found by the
|
|
parser harness.
|
|
|
|
To find locally defined variables, the local context handler needs to
|
|
parse the body of functional code. The @code{scopestart} declaration
|
|
specifies the name of a nonterminal used as the goal to parse a local
|
|
context, @pxref{scopestart Decl,, Grammar Framework Manual,
|
|
grammar-fw}. Internally the
|
|
scopestart nonterminal is targeted by the reserved symbol
|
|
@code{bovine-inner-scope}, so it can be found by the parser harness.
|
|
|
|
@node Bovine Grammar Rules
|
|
@chapter Bovine Grammar Rules
|
|
|
|
The rules are what allow the compiler to create tags from a language
|
|
file. Once the setup is done in the prologue, you can start writing
|
|
rules. @xref{Grammar Rules,, Grammar Framework Manual, grammar-fw}.
|
|
|
|
@example
|
|
@var{result} : @var{components1} @var{optional-semantic-action1})
|
|
| @var{components2} @var{optional-semantic-action2}
|
|
;
|
|
@end example
|
|
|
|
@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
|
|
@var{components} is a list of elements that are to be matched if @var{result}
|
|
is to be made. @var{optional-semantic-action} is an optional sequence
|
|
of simplified Emacs Lisp expressions for concocting the parse tree.
|
|
|
|
In bison, each time an element of @var{components} is found, it is
|
|
@dfn{shifted} onto the parser stack. (The stack of matched elements.)
|
|
When all @var{components}' elements have been matched, it is
|
|
@dfn{reduced} to @var{result}. @xref{Algorithm,,, bison, The GNU Bison Manual}.
|
|
|
|
A particular @var{result} written into your grammar becomes
|
|
the parser's goal. It is designated by a @code{%start} statement
|
|
(@pxref{Starting Rules}). The value returned by the associated
|
|
@var{optional-semantic-action} is the parser's result. It should be
|
|
a tree of @semantic{} @dfn{tags}, @pxref{Semantic Tags,, Semantic
|
|
Application Development, semantic-appdev}.
|
|
|
|
@var{components} is made up of symbols. A symbol such as @code{FOO}
|
|
means that a syntactic token of class @code{FOO} must be matched.
|
|
|
|
@menu
|
|
* How Lexical Tokens Match::
|
|
* Grammar-to-Lisp Details::
|
|
* Order of components in rules::
|
|
@end menu
|
|
|
|
@node How Lexical Tokens Match
|
|
@section How Lexical Tokens Match
|
|
|
|
A lexical rule must be used to define how to match a lexical token.
|
|
|
|
For instance:
|
|
|
|
@example
|
|
%keyword FOO "foo"
|
|
@end example
|
|
|
|
Means that @code{FOO} is a reserved language keyword, matched as such
|
|
by looking up into a keyword table, @pxref{keyword Decl,, Grammar
|
|
Framework Manual, grammar-fw}. This is because @code{"foo"} will be
|
|
converted to
|
|
@code{FOO} in the lexical analysis stage. Thus the symbol @code{FOO}
|
|
won't be available any other way.
|
|
|
|
If we specify our token in this way:
|
|
|
|
@example
|
|
%token <symbol> FOO "foo"
|
|
@end example
|
|
|
|
then @code{FOO} will match the string @code{"foo"} explicitly, but it
|
|
won't do so at the lexical level, allowing use of the text
|
|
@code{"foo"} in other forms of regular expressions.
|
|
|
|
In that case, @code{FOO} is a @code{symbol}-type token. To match, a
|
|
@code{symbol} must first be encountered, and then it must
|
|
@code{string-match "foo"}.
|
|
|
|
@table @strong
|
|
@item Caution:
|
|
Be especially careful to remember that @code{"foo"}, and more
|
|
generally the %token's match-value string, is a regular expression!
|
|
@end table
|
|
|
|
Non symbol tokens are also allowed. For example:
|
|
|
|
@example
|
|
%token <punctuation> PERIOD "[.]"
|
|
|
|
filename : symbol PERIOD symbol
|
|
;
|
|
@end example
|
|
|
|
@code{PERIOD} is a @code{punctuation}-type token that will explicitly
|
|
match one period when used in the above rule.
|
|
|
|
@table @strong
|
|
@item Please Note:
|
|
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
|
|
types, based on the @dfn{syntax class}-character associations
|
|
currently in effect.
|
|
@end table
|
|
|
|
@node Grammar-to-Lisp Details
|
|
@section Grammar-to-Lisp Details
|
|
|
|
For the bovinator, lexical token matching patterns are @emph{inlined}.
|
|
When the grammar-to-lisp converter encounters a lexical token
|
|
declaration of the form:
|
|
|
|
@example
|
|
%token <@var{type}> @var{token-name} @var{match-value}
|
|
@end example
|
|
|
|
It substitutes every occurrences of @var{token-name} in rules, by its
|
|
expanded form:
|
|
|
|
@example
|
|
@var{type} @var{match-value}
|
|
@end example
|
|
|
|
For example:
|
|
|
|
@example
|
|
%token <symbol> MOOSE "moose"
|
|
|
|
find_a_moose: MOOSE
|
|
;
|
|
@end example
|
|
|
|
Will generate this pseudo equivalent-rule:
|
|
|
|
@example
|
|
find_a_moose: symbol "moose" ;; invalid syntax!
|
|
;
|
|
@end example
|
|
|
|
Thus, from the bovinator point of view, the @var{components} part of a
|
|
rule is made up of symbols and strings. A string in the mix means
|
|
that the previous symbol must have the additional constraint of
|
|
exactly matching it, as described in @ref{How Lexical Tokens Match}.
|
|
|
|
@table @strong
|
|
@item Please Note:
|
|
For the bovinator, this task was mixed into the language definition to
|
|
simplify implementation, though Bison's technique is more efficient.
|
|
@end table
|
|
|
|
@node Order of components in rules
|
|
@section Order of components in rules
|
|
|
|
If a rule has multiple components, order is important, for example
|
|
|
|
@example
|
|
headerfile : symbol PERIOD symbol
|
|
| symbol
|
|
;
|
|
@end example
|
|
|
|
would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
|
|
The bovine parser will first attempt to match the long form, and then
|
|
the short form. If they were in reverse order, then the long form
|
|
would never be tested.
|
|
|
|
@c @xref{Default syntactic tokens}.
|
|
|
|
@node Optional Lambda Expression
|
|
@chapter Optional Lambda Expressions
|
|
|
|
The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
|
|
a bovine lambda. This lambda has special short-cuts to simplify
|
|
reading the semantic action definition. An @acronym{OLE} like this:
|
|
|
|
@example
|
|
( $1 )
|
|
@end example
|
|
|
|
results in a lambda return which consists entirely of the string
|
|
or object found by matching the first (zeroth) element of match.
|
|
An @acronym{OLE} like this:
|
|
|
|
@example
|
|
( ,(foo $1) )
|
|
@end example
|
|
|
|
executes @code{foo} on the first argument, and then splices its return
|
|
into the return list whereas:
|
|
|
|
@example
|
|
( (foo $1) )
|
|
@end example
|
|
|
|
executes @code{foo}, and that is placed in the return list.
|
|
|
|
Here are other things that can appear inline:
|
|
|
|
@table @code
|
|
@item $1
|
|
The first object matched.
|
|
|
|
@item ,$1
|
|
The first object spliced into the list (assuming it is a list from a
|
|
non-terminal).
|
|
|
|
@item '$1
|
|
The first object matched, placed in a list. I.e., @code{( $1 )}.
|
|
|
|
@item foo
|
|
The symbol @code{foo} (exactly as displayed).
|
|
|
|
@item (foo)
|
|
A function call to foo which is stuck into the return list.
|
|
|
|
@item ,(foo)
|
|
A function call to foo which is spliced into the return list.
|
|
|
|
@item '(foo)
|
|
A function call to foo which is stuck into the return list in a list.
|
|
|
|
@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
|
|
A list starting with @code{EXPAND} performs a recursive parse on the
|
|
token passed to it (represented by @samp{$1} above.) The
|
|
@dfn{semantic list} is a common token to expand, as there are often
|
|
interesting things in the list. The @var{nonterminal} is a symbol in
|
|
your table which the bovinator will start with when parsing.
|
|
@var{nonterminal}'s definition is the same as any other nonterminal.
|
|
@var{depth} should be at least @samp{1} when descending into a
|
|
semantic list.
|
|
|
|
@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
|
|
Is like @code{EXPAND}, except that the parser will iterate over
|
|
@var{nonterminal} until there are no more matches. (The same way the
|
|
parser iterates over the starting rule (@pxref{Starting Rules}). This
|
|
lets you have much simpler rules in this specific case, and also lets
|
|
you have positional information in the returned tokens, and error
|
|
skipping.
|
|
|
|
@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
|
|
This is used for creating an association list. Each @var{symbol} is
|
|
included in the list if the associated @var{value} is non-@code{nil}.
|
|
While the items are all listed explicitly, the created structure is an
|
|
association list of the form:
|
|
|
|
@example
|
|
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
|
|
@end example
|
|
|
|
@item (TAG @var{name} @var{class} [@var{attributes}])
|
|
This creates one tag in the current buffer.
|
|
|
|
@table @var
|
|
@item name
|
|
Is a string that represents the tag in the language.
|
|
|
|
@item class
|
|
Is the kind of tag being create, such as @code{function}, or
|
|
@code{variable}, though any symbol will work.
|
|
|
|
@item attributes
|
|
Is an optional set of labeled values such as @code{:constant-flag t :parent
|
|
"parenttype"}.
|
|
@end table
|
|
|
|
@item (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
|
|
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
|
|
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
|
|
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
|
|
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
|
|
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
|
|
Create a tag with @var{name} of respectively the class
|
|
@code{variable}, @code{function}, @code{type}, @code{include},
|
|
@code{package}, and @code{code}.
|
|
See @ref{Creating Tags,, Semantic Application Development,
|
|
semantic-appdev}, for the lisp functions these translate into.
|
|
@end table
|
|
|
|
If the symbol @code{%quotemode backquote} is specified, then use
|
|
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
|
|
This lets you send @code{$1} as a symbol into a list instead of having
|
|
it expanded inline.
|
|
|
|
@node Bovine Examples
|
|
@chapter Examples
|
|
|
|
The rule:
|
|
|
|
@example
|
|
any-symbol: symbol
|
|
;
|
|
@end example
|
|
|
|
is equivalent to
|
|
|
|
@example
|
|
any-symbol: symbol
|
|
( $1 )
|
|
;
|
|
@end example
|
|
|
|
which, if it matched the string @samp{"A"}, would return
|
|
|
|
@example
|
|
( "A" )
|
|
@end example
|
|
|
|
If this rule were used like this:
|
|
|
|
@example
|
|
%token <punctuation> EQUAL "="
|
|
@dots{}
|
|
assign: any-symbol EQUAL any-symbol
|
|
( $1 $3 )
|
|
;
|
|
@end example
|
|
|
|
it would match @samp{"A=B"}, and return
|
|
|
|
@example
|
|
( ("A") ("B") )
|
|
@end example
|
|
|
|
The letters @samp{A} and @samp{B} come back in lists because
|
|
@samp{any-symbol} is a nonterminal, not an actual lexical element.
|
|
|
|
To get a better result with nonterminals, use @asis{,} to splice lists
|
|
in like this:
|
|
|
|
@example
|
|
%token <punctuation> EQUAL "="
|
|
@dots{}
|
|
assign: any-symbol EQUAL any-symbol
|
|
( ,$1 ,$3 )
|
|
;
|
|
@end example
|
|
|
|
which would return
|
|
|
|
@example
|
|
( "A" "B" )
|
|
@end example
|
|
|
|
@node GNU Free Documentation License
|
|
@appendix GNU Free Documentation License
|
|
|
|
@include doclicense.texi
|
|
|
|
@c There is nothing to index at the moment.
|
|
@ignore
|
|
@node Index
|
|
@unnumbered Index
|
|
@printindex cp
|
|
@end ignore
|
|
|
|
@iftex
|
|
@contents
|
|
@summarycontents
|
|
@end iftex
|
|
|
|
@bye
|
|
|
|
@c Following comments are for the benefit of ispell.
|
|
|
|
@c LocalWords: bovinator inlined
|