ANTLR4, .NET Core 2.1, and C#: Using the Visitor

October 16, 2018 by Michael

In the first post in this ANTLR4 series we went over setting up the tooling and tested everything with a simple grammar file.  This post will focus on using ANTLR4 to generate the C# classes needed to implement a simple visitor.

This post will use the grammar file created in the previous post.  You can work your way through that post or simply download the source code from GitHub.  Whatever works for you.



Before getting too far into the code, it’s probably a good idea to understand a bit more about how ANTLR works.  Grammar files are used by ANTLR to generate a lexer and a parser.  The lexer turns the raw input into a token stream.  The parser then validates the token stream and generates a syntax tree.  If you want to do more than validate the input, you must traverse the syntax tree using one of the two mechanisms ANTLR provides:  listeners and visitors.

By default ANTLR4 will generate the files necessary to provide you with a base listener in the language of your choice.  While the listener approach is perfectly acceptable, I find the visitor pattern to be a better fit for most of my use cases.  You can find a more detailed comparison here.
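To make the difference concrete, here is a minimal sketch that sums a tiny hand-built tree both ways.  The Node, Num, and Add types and the TraversalDemo class are hypothetical stand-ins invented for this illustration; they are not part of the ANTLR4 runtime, and the real generated classes have more moving parts.  The point is only the shape:  visitor methods return a value and decide when to descend into children, while listener callbacks return nothing and are fired by a walker, so intermediate results have to be kept somewhere such as a stack.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy tree types; unrelated to the ANTLR4 runtime.
public abstract class Node { }
public sealed class Num : Node { public int Value; }
public sealed class Add : Node { public Node Left; public Node Right; }

public static class TraversalDemo
{
    // Visitor style: each call returns a value and explicitly visits its children.
    public static int Visit(Node node)
    {
        switch (node)
        {
            case Num n: return n.Value;
            case Add a: return Visit(a.Left) + Visit(a.Right);
            default: throw new ArgumentException("Unknown node type");
        }
    }

    // Listener style: the walker drives the traversal and the callbacks return
    // nothing, so partial results are kept on an explicit stack.
    public static int WalkWithListener(Node root)
    {
        var stack = new Stack<int>();
        Walk(root,
            exitNum: n => stack.Push(n.Value),
            exitAdd: _ =>
            {
                int right = stack.Pop();
                int left = stack.Pop();
                stack.Push(left + right);
            });
        return stack.Pop();
    }

    // Depth-first walker that fires an "exit" callback after a node's children.
    private static void Walk(Node node, Action<Num> exitNum, Action<Add> exitAdd)
    {
        switch (node)
        {
            case Num n:
                exitNum(n);
                break;
            case Add a:
                Walk(a.Left, exitNum, exitAdd);
                Walk(a.Right, exitNum, exitAdd);
                exitAdd(a);
                break;
        }
    }

    public static void Main()
    {
        // Tree for 1 + (2 + 3)
        Node tree = new Add
        {
            Left = new Num { Value = 1 },
            Right = new Add { Left = new Num { Value = 2 }, Right = new Num { Value = 3 } }
        };

        Console.WriteLine(Visit(tree));            // prints 6
        Console.WriteLine(WalkWithListener(tree)); // prints 6
    }
}
```

Both approaches produce the same answer.  For a calculator, where every rule naturally evaluates to a number, returning values directly tends to read more simply than managing a stack.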

Generating C# Files With ANTLR4

In the previous post we generated the Java files for your grammar using the antlr4 batch file we created.  It turns out generating C# instead of Java is as simple as using one of many command line options the ANTLR4 tool supports (you can find a complete listing here).

Let’s create a batch file named gencsharp.bat in the same directory as our Calculator.g4 file:

REM gencsharp.bat
antlr4 -Dlanguage=CSharp -o csharp Calculator.g4 -no-listener -visitor

Let’s break down the second line piece by piece:

  • antlr4 – This is the batch file we set up in the first part of this series
  • -Dlanguage=CSharp – This option tells the ANTLR4 tool to ignore whatever language is configured in the grammar file and target the specified language instead
  • -o csharp – Tell ANTLR4 to put the output in a directory named csharp
  • -no-listener and -visitor – Tell ANTLR4 to generate a visitor instead of a listener

To simplify generating the Java files, let’s go ahead and set up a batch file for that as well.  Call it genjava.bat:

REM genjava.bat
antlr4 -Dlanguage=Java -o java Calculator.g4

Notice that we did not provide the -no-listener and -visitor command line options for the Java version.  The TestRig uses the default output generated by the ANTLR4 tooling.

The Java files will come in handy when we want to use the TestRig to test new features in our grammar files.

At this point you can generate the C# files by simply running the gencsharp.bat file we created above.  When ANTLR4 has finished running you should see the following files in your csharp folder:

csharp folder contents after running gencsharp.bat

Setting up Visual Studio

When it comes to developing solutions using ANTLR4 you have a lot of options.  You can develop a C# project using Visual Studio, VS Code, Rider, or the text editor of your choice and the command line.  In order to help ease into things, this post will use Visual Studio.  However, of the aforementioned options, Visual Studio has the weakest support for working with ANTLR4 grammar files.  In later posts we’ll look at some of the other options.

Step 1:  Create a New .NET Core Console Application

If you’re researching how to use ANTLR4 with C# and .NET Core, odds are you don’t need much help with this step.  Long story short:

  1. Launch Visual Studio
  2. Select File > New > Project...
  3. Select Console App (.NET Core) as the project type
  4. Name it Calculator
  5. Click on OK

Step 2:  Add the ANTLR4 Generated C# files to Your Project

Copy the .cs files generated by ANTLR4 (CalculatorLexer.cs, CalculatorParser.cs, CalculatorVisitor.cs, and CalculatorBaseVisitor.cs) and paste them into your Visual Studio project folder.  While not necessary, I created a Parsing folder in my project for these files.  The generated CalculatorVisitor.cs contains the ICalculatorVisitor interface, so I renamed it to ICalculatorVisitor.cs in order to adhere to C# naming conventions:  I recommend you do the same.  This is what my solution looks like:

ANTLR4 Project Folder Structure

For now, ignore CalculatorVisitor.cs and ThrowingErrorListener.cs.  They will be added shortly.

Implementing the Visitor

With the C# files generated and added to your project you are finally ready to do something interesting with the grammar we first created in the previous post.

The visitor implemented here is not intended to be used in production code.  It’s a simplified example to be used for learning.  Feel free to provide feedback, but let’s not focus on how elegant (or not) the visitor is.

The base visitor provided by ANTLR4 provides one method per parser rule  in the grammar file (the lowercase rules: operand and expression).  These methods can be overridden in a derived class and provide an insertion point for your logic.  In our case we only have two methods to override: VisitExpression and VisitOperand.  It is common to have one class derived from the base visitor class per rule implementation.  Given the simplicity of this example, we’ll stick to a single class.

Implementing VisitOperand

Let’s first address the implementation of VisitOperand.  Here is the code for that method in its entirety:

public override int VisitOperand([NotNull]OperandContext context)
{
    ITerminalNode digit = context.DIGIT();

    return digit != null
        ? int.Parse(digit.GetText())
        : HandleGroup(context.operand(), context.OPERATOR());
}

For reference, here is the associated parser rule:

operand: DIGIT | LPAREN operand (OPERATOR operand)+ RPAREN;

Based on our grammar, we know that operand will either be a DIGIT or a group containing multiple operators and operands (e.g. (1+2+3)).  We can use that knowledge to determine how to best walk the syntax tree.  If context.DIGIT() returns something other than null, then we know that we have a DIGIT.  Otherwise we can assume that we have a group we need to deal with.  The names generated by ANTLR4 are a bit deceiving.  Both context.operand() and context.OPERATOR() appear to be single values; however, both return arrays.

Handling Groups

Handling the DIGIT case is straightforward:  convert the string value into an integer value and return it.  Group handling is a bit more complex.  For that we’ll implement the HandleGroup method:

private int HandleGroup(OperandContext[] operandCtxs, ITerminalNode[] operatorNodes)
{
    List<int> operands = operandCtxs.Select(Visit).ToList();
    Queue<string> operators = new Queue<string>(operatorNodes.Select(o => o.GetText()));

    return operands.Aggregate((a, c) => _funcMap[operators.Dequeue()](a, c));
}

The first line of the method handles converting the operand nodes to a collection of integers.  Visit’s default implementation will eventually call the VisitOperand method we implemented above.  LINQ is used to map the OperandContext array to a list of integers via the aforementioned Visit method.

Operator nodes are all terminal nodes.  That means those nodes represent leaves in the syntax tree:  no need to visit them.  Once again LINQ is used to handle mapping values.  We know that we’ll want to use each operator once and only once.  A Queue provides a simple way to keep track of which operators we have and have not used.

Lastly we need to reduce the list of operands and the queue of operators down to a single calculated value.  That can be accomplished with LINQ’s equivalent of a reduce method:  Aggregate.  The last line of the method combines the accumulator with the next operand using a function map keyed off of the operator.  The result is stored back in the accumulator.  This process repeats, left to right, until all of the operands have been reduced down to one value.
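To see the fold in isolation, here is a small sketch that swaps the parse tree nodes for plain values.  FoldDemo and Fold are names made up for this example; the operator table mirrors the _funcMap shown in the complete listing further down.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HandleGroup that works on plain values
// instead of parse tree nodes.
public static class FoldDemo
{
    // Same operator table as _funcMap in the visitor.
    private static readonly Dictionary<string, Func<int, int, int>> FuncMap =
        new Dictionary<string, Func<int, int, int>>
        {
            {"+", (a, b) => a + b},
            {"-", (a, b) => a - b},
            {"*", (a, b) => a * b},
            {"/", (a, b) => a / b}
        };

    public static int Fold(IEnumerable<int> operands, IEnumerable<string> operatorSymbols)
    {
        var operators = new Queue<string>(operatorSymbols);

        // Aggregate seeds the accumulator with the first operand, then calls
        // the lambda once for each remaining operand, consuming one operator per call.
        return operands.Aggregate((acc, next) => FuncMap[operators.Dequeue()](acc, next));
    }

    public static void Main()
    {
        // 1+2-5 is evaluated as ((1 + 2) - 5)
        Console.WriteLine(Fold(new[] { 1, 2, 5 }, new[] { "+", "-" })); // prints -2
    }
}
```

Because Aggregate uses the first operand as the starting accumulator, a group of n operands consumes exactly n - 1 operators, which matches the shape of the grammar rule.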

This simple implementation does not have error handling, nor does it properly handle operator precedence.  Please don’t try to use it in production code.  Your boss won’t be happy.
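A short sketch makes those limitations visible.  It reuses the same left-to-right fold on plain values; PrecedenceDemo and EvaluateLeftToRight are hypothetical names used only here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo of the limitations of a strict left-to-right fold.
public static class PrecedenceDemo
{
    private static readonly Dictionary<string, Func<int, int, int>> FuncMap =
        new Dictionary<string, Func<int, int, int>>
        {
            {"+", (a, b) => a + b},
            {"-", (a, b) => a - b},
            {"*", (a, b) => a * b},
            {"/", (a, b) => a / b}
        };

    // Applies the operators strictly in the order they appear.
    public static int EvaluateLeftToRight(int[] operands, string[] operatorSymbols)
    {
        var operators = new Queue<string>(operatorSymbols);
        return operands.Aggregate((acc, next) => FuncMap[operators.Dequeue()](acc, next));
    }

    public static void Main()
    {
        // 1+2*3: conventional precedence gives 7, but the fold computes (1 + 2) * 3.
        Console.WriteLine(EvaluateLeftToRight(new[] { 1, 2, 3 }, new[] { "+", "*" })); // prints 9

        // 7/2: integer division silently truncates.
        Console.WriteLine(EvaluateLeftToRight(new[] { 7, 2 }, new[] { "/" })); // prints 3

        // 4/0: nothing guards against division by zero.
        try
        {
            EvaluateLeftToRight(new[] { 4, 0 }, new[] { "/" });
        }
        catch (DivideByZeroException)
        {
            Console.WriteLine("4/0 throws DivideByZeroException");
        }
    }
}
```

Until precedence is handled, explicit grouping such as 1+(2*3) is the way to get the conventional result out of this visitor.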

Implementing VisitExpression

Referencing the parser rules for expression and operand you can see that they have a few things in common:

expression: operand (OPERATOR operand)+;

operand: DIGIT | LPAREN operand (OPERATOR operand)+ RPAREN;

It seems fairly reasonable that we can leverage the code we wrote to handle groups in VisitOperand here as well.  Let’s just do that and move on:

public override int VisitExpression([NotNull]ExpressionContext context)
{
    return HandleGroup(context.operand(), context.OPERATOR());
}

At this point we have fully implemented our very basic visitor.  While the code doesn’t reflect production-ready practices, it should be more than enough to get you started.

The Complete CalculatorVisitor Class

As mentioned in the introduction, you can download the project source code from GitHub.  If you would prefer not to bother with that, here is the complete listing for the CalculatorVisitor.cs file:

using System;
using System.Collections.Generic;
using System.Linq;
using Antlr4.Runtime.Misc;
using Antlr4.Runtime.Tree;

using OperandContext = CalculatorParser.OperandContext;
using ExpressionContext = CalculatorParser.ExpressionContext;

namespace Calculator.Parsing
{
    internal class CalculatorVisitor : CalculatorBaseVisitor<int>
    {
        #region Member Variables
        private readonly Dictionary<string, Func<int, int, int>> _funcMap =
            new Dictionary<string, Func<int, int, int>>
            {
                {"+", (a, b) => a + b},
                {"-", (a, b) => a - b},
                {"*", (a, b) => a * b},
                {"/", (a, b) => a / b}
            };
        #endregion

        #region Base Class Overrides
        public override int VisitExpression([NotNull]ExpressionContext context)
        {
            return HandleGroup(context.operand(), context.OPERATOR());
        }

        public override int VisitOperand([NotNull]OperandContext context)
        {
            ITerminalNode digit = context.DIGIT();

            return digit != null
                ? int.Parse(digit.GetText())
                : HandleGroup(context.operand(), context.OPERATOR());
        }
        #endregion

        #region Utility Methods
        private int HandleGroup(OperandContext[] operandCtxs, ITerminalNode[] operatorNodes)
        {
            List<int> operands = operandCtxs.Select(Visit).ToList();
            Queue<string> operators = new Queue<string>(operatorNodes.Select(o => o.GetText()));

            return operands.Aggregate((a, c) => _funcMap[operators.Dequeue()](a, c));
        }
        #endregion
    }
}

Putting the Visitor to Use

The last thing we need to do is actually put the CalculatorVisitor class to use in some sort of meaningful way.  Using the visitor is straightforward.  First, use the lexer to tokenize the input stream.  Second, use the parser to build the syntax tree.  Finally, visit the syntax tree.

For a complete listing of Program.cs please visit the GitHub repository.  The relevant code is listed below:

private int EvaluateInput(string input)
{
    CalculatorLexer lexer = new CalculatorLexer(new AntlrInputStream(input));

    lexer.RemoveErrorListeners();
    lexer.AddErrorListener(new ThrowingErrorListener());

    CalculatorParser parser = new CalculatorParser(new CommonTokenStream(lexer));

    parser.RemoveErrorListeners();
    parser.AddErrorListener(new ThrowingErrorListener());

    return new CalculatorVisitor().Visit(parser.expression());
}

Assuming all has gone well, you should be able to build the program, run it, and calculate the result of simple expressions.  

-2, yeah… that looks right

Helpful Links

While I do my best to provide useful information, you should probably supplement what I’ve written above with some additional information:

