# Notes on FsLex and FsYacc

For better or worse, the F# compiler contains three tokenizers (`*.fsl`) and three
grammars (`*.fsy`) implemented using FsLex and FsYacc respectively, including the all-important F# grammar itself.
The canonical home for FsLex and FsYacc is https://github.com/fsprojects/FsLexYacc.
FsLex and FsYacc are themselves built using earlier versions of FsLex and FsYacc.

**If you would like to improve, modify, extend, test or document these
tools, please generally do so in that repository. There are some exceptions; see below.**

The `buildtools\fslex` and `buildtools\fsyacc` directories are _exact_ copies of `packages\FsLexYacc.XYZ\src\fslex` and `packages\FsLexYacc.XYZ\src\fsyacc` respectively. We should really verify this as part of our build.
This copy is done because we needed to have a build-from-source story.
In build-from-source, the only tool we can assume is an install of the .NET SDK.
That means we have to build up FsLex and FsYacc from scratch, _including_ their own generated `fslexlex.fs`, `fslexpars.fs` and so on.
We can't pick up the source from "packages" because in a build-from-source scenario we can't even fetch those
packages - we really have to build from just our source tree and .NET SDK.
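
No such verification exists in the build as described here. As a rough sketch only, a check of this kind might compare the two directory trees byte for byte. The Python below is illustrative: the function names `file_digests` and `tree_differences` are hypothetical, and a real check would need to be wired into the repo's own build scripts and would probably have to skip build outputs.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to the SHA-256 of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def tree_differences(copy_dir, package_dir):
    """Return the sorted relative paths that are missing on one side or differ."""
    ours = file_digests(copy_dir)
    theirs = file_digests(package_dir)
    return sorted(
        name
        for name in ours.keys() | theirs.keys()
        if ours.get(name) != theirs.get(name)
    )
```

Under these assumptions, `tree_differences("buildtools/fslex", "packages/FsLexYacc.XYZ/src/fslex")` (with `XYZ` standing for the package version, as above) would return an empty list when the copy is exact. The comparison is byte-exact, so differing line endings would be reported as differences.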

Please do _not_ modify the code in these directories except by copying over from an upgraded FsLexYacc package.
Without the testing and documentation in the `FsLexYacc` repo, this copied code is just a bunch of untested, undocumented and
largely generated code checked into our source tree.

## What if I want to modify/improve FsLex and FsYacc?

First, be clear on what you want to do:

1. You might want to update the _code generators_ for the fslex or fsyacc tools.

2. You might want to update the _runtime_ of the fslex or fsyacc tools.

For (1), to improve the code/table generators, make a PR to the `FsLexYacc` repository and go through the cycle of updating these files to match a package upgrade.

For (2), normally for FsLexYacc-based tools the runtime is either a source inclusion of `Lexing.fs`, `Lexing.fsi`, `Parsing.fs`, `Parsing.fsi` or a reference to the `FsLexYacc.Runtime` package.  The runtime contains `LexBuffer` and the lexing/parsing table interpreters.

However, long ago we decided to duplicate and ingest the _runtime_ files for FsLex and FsYacc into the F# compiler rather than taking them directly from the FsLexYacc project.  This was mainly because we wanted to squeeze optimizations out of them based on profiling and simplify them a bit.  The duplicated files are `prim-lexing.fs`, `prim-parsing.fs` and the corresponding `.fsi` files in `src/Compiler/Facilities`.  These files are sufficient to implement the contracts expected by the FsLex/FsYacc generated code, and require exactly the same table formats as generated by FsLex/FsYacc.

This means you can improve some aspects of the _runtime_ for FsLex and FsYacc by making direct changes to `prim-lexing.fs` and `prim-parsing.fs`.

For example, the _actual_ `LexBuffer` type being used in the F# compiler (for all three lexers and grammars) is this one: https://github.com/dotnet/fsharp/blob/bdb64624f0ca220ca4433c83d02dd5822fe767a5/src/Compiler/Facilities/prim-lexing.fsi#L102 .  (That version of the Lex/Yacc runtime has added some things: `BufferLocalStore` for example, which we use for the `XmlDoc` accumulator as we strip those out. It has also dropped any mention of async lexing, and any mention of `byte`. The generic parameter
in `LexBuffer<'Char>` is also superfluous, since `'Char` is always `char`, but it is kept because the FsLex/FsYacc generated code expects this type to be generic.)

## What if I want to eradicate our use of FsLex and FsYacc?

The use of FsLex and FsYacc in this repo is somewhat controversial since the C# compiler implementation uses hand-written lexers and parsers.

On balance, the use of FsLex is fairly reasonable and unlikely to change, though moving to an alternative tokenization technique wouldn't be
overly difficult given the declarative nature of `FsLex` tokenization.

The use of a table-driven LALR(1) parser is more controversial: there is a general feeling that it would be great to
somehow move on from FsYacc and do parsing some other way. However, it is not at all easy to do that and remain
fully compatible.  For this reason it is unlikely we will remove the use of FsYacc any time soon. However, incremental
modifications to extract more information from the grammar may yield good results.

## Why aren't FsLex and FsYacc just ingested into this repo if we depend on them (and even have an exact copy of them for build-from-source)?

FsLex and FsYacc are non-trivial tools that require documentation and testing.  Also, for external users, they require packaging. Changes to their design should be
considered carefully. While we are open to adding features to these tools specifically for use by the F# compiler, the tools are open source and available
independently.  For these reasons it is generally best that these tools live in their own repository.

The copy of the `fslex` and `fsyacc` source code in `buildtools` is an exact copy and is not tested or documented
apart from what's been done before in the FsLexYacc repo. Adjusting these copies is not allowed and would be wrong from an engineering perspective,
because there's no place to put documentation or tests.

Occasionally we discuss ingesting FsLex and FsYacc into this repository. This often comes up in the hope that by doing so
we can somehow eventually code-fold them away until we no longer require them at all, instead moving to hand-written parsers
and lexers. That's an admirable goal.  However, moving the tools into this repo doesn't actually help with eliminating their
use, and may indeed make it harder. This is because these tools use table generation
based on very specific lexer/grammar specifications. The tables are unreadable and unmaintainable.  You can't just
somehow "specialize" the tools to the F# grammar and then get rid of them as this doesn't give a useful, maintainable lexer or parser.
To our knowledge there is no way to convert an LALR(1) parser specification to readable, maintainable recursive descent parsing code.
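
To make the readability point concrete, here is a toy sketch in Python. It is not the FsLex or FsYacc table format, and every name in it (`TRANSITIONS`, `ACCEPT`, `char_class`, `table_driven`, `hand_written`) is invented for illustration. Both functions classify a string as an integer (`[0-9]+`) or an identifier (`[a-z]+`); the first interprets a flattened state table, the second is written by hand.

```python
# Character classes: 0 = ASCII digit, 1 = ASCII lowercase letter, 2 = anything else.
def char_class(c):
    if "0" <= c <= "9":
        return 0
    if "a" <= c <= "z":
        return 1
    return 2


# Flattened transition table: entry [state * 3 + char_class] is the next state.
# State 0 = start, 1 = inside an integer, 2 = inside an identifier, -1 = reject.
TRANSITIONS = [1, 2, -1,
               1, -1, -1,
               -1, 2, -1]
ACCEPT = [None, "INT", "IDENT"]  # token kind accepted in each state


def table_driven(text):
    """Generic interpreter: all of the language-specific logic lives in the tables."""
    state = 0
    for c in text:
        state = TRANSITIONS[state * 3 + char_class(c)]
        if state < 0:
            return None
    return ACCEPT[state]


def hand_written(text):
    """The same recognizer written directly, with the intent visible in the code."""
    if text and all("0" <= c <= "9" for c in text):
        return "INT"
    if text and all("a" <= c <= "z" for c in text):
        return "IDENT"
    return None
```

Even at this size, the intent behind `TRANSITIONS` is only recoverable from the comments. Specializing `table_driven` to its tables would yield code that still indexes into those numbers, not anything resembling `hand_written`.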

As a result, ingesting the tools into this repo (and modifying them here) would be counter-productive, as the tools would no longer be tested, documented or
maintained properly, and overall engineering quality would decrease.  Further, the bootstrap process for the repo would become very unwieldy.


