emacs/doc/misc/bovine.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine.info
@set TITLE  Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@include docstyle.texi

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999--2004, 2012--2024 Free Software Foundation,
Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine).             Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  It is good for simple
languages.  It has many conveniences making grammar writing easy.  The
conveniences make it less powerful than a Bison-like @acronym{LALR}
parser.  For more information, @pxref{Top,, Wisent Parser Development,
wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension.  When compiled, the contents is converted into a file of
the form @file{NAME-by.el}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License::  The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol.  In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol.  They are declared following the @code{%start}
keyword separated by spaces.

If no @code{%start} keyword is used in a grammar, then the very first
is used.  Internally the first start nonterminal is targeted by the
reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code.  The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context.  Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file.  Once the setup is done in the prologue, you can start writing
rules.

@example
@var{result} : @var{components1} @var{optional-semantic-action1})
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made.  @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack.  (The stack of matched elements.)
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}.  @xref{Algorithm,,, bison, The GNU Bison Manual}.

A particular @var{result} written into your grammar becomes
the parser's goal.  It is designated by a @code{%start} statement
(@pxref{Starting Rules}).  The value returned by the associated
@var{optional-semantic-action} is the parser's result.  It should be
a tree of @semantic{} @dfn{tags}.

@var{components} is made up of symbols.  A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

Means that @code{FOO} is a reserved language keyword, matched as such
by looking up into a keyword table.  This is because @code{"foo"} will be
converted to
@code{FOO} in the lexical analysis stage.  Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token.  To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the %token's match-value string, is a regular expression!
@end table

Non symbol tokens are also allowed.  For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

It substitutes every occurrences of @var{token-name} in rules, by its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

Will generate this pseudo equivalent-rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator point of view, the @var{components} part of a
rule is made up of symbols and strings.  A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important, for example

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form.  If they were in reverse order, then the long form
would never be tested.

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda.  This lambda has special short-cuts to simplify
reading the semantic action definition.  An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda return which consists entirely of the string
or object found by matching the first (zeroth) element of match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
into the return list whereas:

@example
( (foo $1) )
@end example

executes @code{foo}, and that is placed in the return list.

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list.  I.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to foo which is stuck into the return list.

@item ,(foo)
A function call to foo which is spliced into the return list.

@item '(foo)
A function call to foo which is stuck into the return list in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above.)  The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list.  The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches.  (The same way the
parser iterates over the starting rule (@pxref{Starting Rules}).  This
lets you have much simpler rules in this specific case, and also lets
you have positional information in the returned tokens, and error
skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list.  Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being create, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @code{:constant-flag t :parent
"parenttype"}.
@end table

@item  (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
@end table

If the symbol @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @asis{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include doclicense.texi

@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c  LocalWords:  bovinator inlined