ANTLR4, .NET Core 2.1, and C#: Using the Visitor
October 16, 2018 by Michael
In the first post in this ANTLR4 series we went over setting up the tooling and tested everything with a simple grammar file. This post will focus on using ANTLR4 to generate the C# classes needed to implement a simple visitor.
This post will use the grammar file created in the previous post. You can work your way through that post or simply download the source code from GitHub. Whatever works for you.
Before getting too far into the code, it’s probably a good idea to understand a bit more about how ANTLR works. Grammar files are used by ANTLR to generate a lexer and a parser. The lexer turns the raw input into a token stream. The parser then validates the token stream and generates a syntax tree. If you want to do more than validate the input, you must traverse the syntax tree using one of the two mechanisms ANTLR provides: listeners and visitors.
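To make the lexer stage a bit more concrete, here is a minimal hand-written sketch of what "turning raw input into a token stream" means. This is not the code ANTLR generates: ToyLexer, Token, and TokenKind are hypothetical names, and the sketch assumes single-character DIGIT tokens, the four operators +, -, *, and /, and that whitespace is skipped.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds, named after the token types used in the grammar.
public enum TokenKind { Digit, Operator, LParen, RParen }

public sealed class Token
{
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
    public override string ToString() => $"{Kind}('{Text}')";
}

// Hypothetical stand-in for a generated lexer: raw characters in, tokens out.
public static class ToyLexer
{
    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
                continue; // assumption: whitespace carries no meaning
            if (c >= '0' && c <= '9')
                tokens.Add(new Token(TokenKind.Digit, c.ToString()));
            else if ("+-*/".IndexOf(c) >= 0)
                tokens.Add(new Token(TokenKind.Operator, c.ToString()));
            else if (c == '(')
                tokens.Add(new Token(TokenKind.LParen, "("));
            else if (c == ')')
                tokens.Add(new Token(TokenKind.RParen, ")"));
            else
                throw new FormatException($"Unexpected character '{c}'");
        }
        return tokens;
    }
}
```

Under these assumptions, ToyLexer.Tokenize("(1+2)*3") yields seven tokens, starting with an LParen and ending with a Digit. A parser would then check that this sequence matches the grammar's rules and build the syntax tree from it.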
By default ANTLR4 will generate the files necessary to provide you with a base listener in the language of your choice. While the listener approach is perfectly acceptable, I find the visitor pattern to be a better fit for most of my use cases. You can find a more detailed comparison here.
Generating C# Files With ANTLR4
In the previous post we generated the Java files for our grammar using the antlr4 batch file we created. It turns out generating C# instead of Java is as simple as using one of the many command line options the ANTLR4 tool supports (you can find a complete listing here).
Let’s create a batch file named gencsharp.bat in the same directory as our Calculator.g4 file:
REM gencsharp.bat
antlr4 -Dlanguage=CSharp -o csharp Calculator.g4 -no-listener -visitor
Let’s break down the second line piece by piece:
- antlr4 – This is the batch file we set up in the first part of this series
- -Dlanguage=CSharp – This option tells the ANTLR4 tool to ignore whatever language is configured in the grammar file and target the specified language instead
- -o csharp – Tell ANTLR4 to put the output in a directory named csharp
- -no-listener and -visitor – Tell ANTLR4 to generate a visitor instead of a listener
To simplify generating the Java files, let’s go ahead and set up a batch file for that as well. Call it genjava.bat:
REM genjava.bat
antlr4 -Dlanguage=Java -o java Calculator.g4
Notice that we did not provide the -no-listener and -visitor command line options for the Java version. The TestRig uses the default output generated by the ANTLR4 tooling.
The Java files will come in handy when we want to use the TestRig to test new features in our grammar files.
At this point you can generate the C# files by simply running the gencsharp.bat file we created above. When ANTLR4 has finished running you should see the following files in your csharp folder:
Setting up Visual Studio
When it comes to developing solutions using ANTLR4 you have a lot of options. You can develop a C# project using Visual Studio, VS Code, Rider, or the text editor of your choice and the command line. In order to help ease into things, this post will use Visual Studio. However, of the aforementioned options, Visual Studio has the weakest support for working with ANTLR4 grammar files. In later posts we’ll look at some of the other options.
Step 1: Create a New .NET Core Console Application
If you’re researching how to use ANTLR4 with C# and .NET Core, odds are you don’t need much help with this step. Long story short:
- Launch Visual Studio
- Select File > New > Project...
- Select Console App (.NET Core) as the project type
- Name it Calculator
- Click on OK
Step 2: Add the ANTLR4 Generated C# files to Your Project
Copy the .cs files generated by ANTLR4 (CalculatorLexer.cs, CalculatorParser.cs, CalculatorBaseVisitor.cs, CalculatorVisitor.cs) and paste them into your Visual Studio project folder. While not necessary, I created a Parsing folder in my project for these files. The generated CalculatorVisitor.cs, which contains the ICalculatorVisitor interface, was renamed to ICalculatorVisitor.cs in order to adhere to C# naming conventions; I recommend you do the same. This is what my solution looks like:
For now, ignore CalculatorVisitor.cs and ThrowingErrorListener.cs. They will be added shortly.
Implementing the Visitor
With the C# files generated and added to your project you are finally ready to do something interesting with the grammar we first created in the previous post.
The visitor implemented here is not intended to be used in production code. It’s a simplified example to be used for learning. Feel free to provide feedback, but let’s not focus on how elegant (or not) the visitor is.
The base visitor generated by ANTLR4 provides one method per parser rule in the grammar file (the lowercase rules: operand and expression). These methods can be overridden in a derived class and provide an insertion point for your logic. In our case we only have two methods to override: VisitExpression and VisitOperand. It is common to have one class derived from the base visitor class per rule implementation. Given the simplicity of this example, we’ll stick to a single class.
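If the "one overridable method per rule" idea is new to you, the following self-contained sketch may help. It does not use the ANTLR runtime at all: Node, ExpressionNode, OperandNode, ToyBaseVisitor, and DigitCounter are hypothetical stand-ins that only mimic the shape of the generated classes. The derived visitor simply counts digits, so that it does not get ahead of the calculator logic implemented later in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the per-rule context classes ANTLR generates.
public abstract class Node
{
    public List<Node> Children { get; } = new List<Node>();
    public abstract T Accept<T>(ToyBaseVisitor<T> visitor);
}

public sealed class ExpressionNode : Node
{
    public override T Accept<T>(ToyBaseVisitor<T> visitor) => visitor.VisitExpression(this);
}

public sealed class OperandNode : Node
{
    // Null when the operand is a parenthesized group rather than a digit.
    public string Digit { get; set; }
    public override T Accept<T>(ToyBaseVisitor<T> visitor) => visitor.VisitOperand(this);
}

// Hypothetical base visitor: one virtual method per rule, each of which
// visits the node's children and returns the last child's result by default.
public class ToyBaseVisitor<T>
{
    public T Visit(Node node) => node.Accept(this);
    public virtual T VisitExpression(ExpressionNode node) => VisitChildren(node);
    public virtual T VisitOperand(OperandNode node) => VisitChildren(node);

    protected virtual T VisitChildren(Node node)
    {
        T result = default(T);
        foreach (Node child in node.Children)
            result = Visit(child);
        return result;
    }
}

// A derived visitor overrides only the rule methods it cares about.
public sealed class DigitCounter : ToyBaseVisitor<int>
{
    public override int VisitExpression(ExpressionNode node) =>
        node.Children.Sum(child => Visit(child));

    public override int VisitOperand(OperandNode node) =>
        node.Digit != null ? 1 : node.Children.Sum(child => Visit(child));
}
```

For a hand-built tree shaped like (1+2)*3, that is, an ExpressionNode whose children are a group operand holding the digits 1 and 2 and a plain operand holding 3, new DigitCounter().Visit(tree) returns 3.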
Implementing VisitOperand
Let’s first address the implementation of VisitOperand. Here is the code for that method in its entirety:
public override int VisitOperand([NotNull]OperandContext context)
{
    ITerminalNode digit = context.DIGIT();
    return digit != null
        ? int.Parse(digit.GetText())
        : HandleGroup(context.operand(), context.OPERATOR());
}
For reference, here is the associated parser rule:
operand: DIGIT | LPAREN operand (OPERATOR operand)+ RPAREN;
Based on our grammar, we know that operand will either be a DIGIT or a group containing multiple operators and operands (e.g. (1+2+3)). We can use that knowledge to determine how to best walk the syntax tree. If context.DIGIT() returns something other than null, then we know that we have a DIGIT. Otherwise we can assume that we have a group we need to deal with. The names generated by ANTLR4 are a bit deceiving. Both context.operand() and context.OPERATOR() appear to be single values; however, both return arrays.
Handling Groups
Handling the DIGIT case is straightforward: convert the string value into an integer value and return it. Group handling is a bit more complex. For that we’ll implement the HandleGroup method:
private int HandleGroup(OperandContext[] operandCtxs, ITerminalNode[] operatorNodes)
{
    List<int> operands = operandCtxs.Select(Visit).ToList();
    Queue<string> operators = new Queue<string>(operatorNodes.Select(o => o.GetText()));
    return operands.Aggregate((a, c) => _funcMap[operators.Dequeue()](a, c));
}
The first line of the method handles converting the operand nodes to a collection of integers. Visit’s default implementation will eventually call the VisitOperand method we implemented above. LINQ is used to map the OperandContext array to a list of integers via the aforementioned Visit method.
Operator nodes are all terminal nodes. That means those nodes represent leaves in the syntax tree: no need to visit them. Once again LINQ is used to handle mapping values. We know that we’ll want to use each operator once and only once. A Queue provides a simple way to keep track of which operators we have and have not used.
Lastly we need to reduce the list of operands and the queue of operators down to a single calculated value. That can be accomplished with LINQ’s equivalent of a reduce method: Aggregate. The last line of the method combines the accumulator with the next operand using a function map keyed off of the operator. The result is stored in the accumulator. This process repeats until all of the operands have been reduced down to one value.
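The reduction step can be tried in isolation, without any ANTLR types. The sketch below is an assumption-laden stand-in: ReduceSketch and Reduce are hypothetical names, and plain integers and strings take the place of the visited operand contexts and operator nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReduceSketch
{
    // Same idea as the function map used by the visitor: operator text -> function.
    private static readonly Dictionary<string, Func<int, int, int>> FuncMap =
        new Dictionary<string, Func<int, int, int>>
        {
            {"+", (a, b) => a + b},
            {"-", (a, b) => a - b},
            {"*", (a, b) => a * b},
            {"/", (a, b) => a / b}
        };

    public static int Reduce(IEnumerable<int> operands, IEnumerable<string> operators)
    {
        // Each operator is dequeued exactly once, in the order it appeared.
        var queue = new Queue<string>(operators);

        // Aggregate seeds the accumulator with the first operand, then folds
        // in each remaining operand using the next operator from the queue.
        return operands.Aggregate((acc, next) => FuncMap[queue.Dequeue()](acc, next));
    }
}
```

For the input 8-3+2, ReduceSketch.Reduce(new[] { 8, 3, 2 }, new[] { "-", "+" }) first computes 8 - 3 = 5 and then 5 + 2 = 7. Note that Aggregate without a seed throws an InvalidOperationException on an empty sequence; the grammar guarantees at least two operands per group, so the visitor does not run into that case.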
This simple implementation does not have error handling, nor does it properly handle operator precedence. Please don’t try to use it in production code. Your boss won’t be happy.
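To see what those limitations look like in practice, here is a small stand-alone sketch that applies operators strictly left to right, which mirrors the evaluation order of the group handling above. LimitationsDemo and EvaluateLeftToRight are hypothetical names, and the explicit check for unknown operators is an addition of this sketch rather than something the visitor does.

```csharp
using System;

public static class LimitationsDemo
{
    // Applies each operator to the running result and the next operand,
    // strictly in the order given, with no notion of precedence.
    public static int EvaluateLeftToRight(int[] operands, string[] operators)
    {
        int result = operands[0];
        for (int i = 0; i < operators.Length; i++)
        {
            int next = operands[i + 1];
            switch (operators[i])
            {
                case "+": result += next; break;
                case "-": result -= next; break;
                case "*": result *= next; break;
                // Integer division: truncates, and throws
                // DivideByZeroException when next is 0.
                case "/": result /= next; break;
                // Assumption of this sketch: reject anything else explicitly.
                default: throw new ArgumentException($"Unknown operator '{operators[i]}'");
            }
        }
        return result;
    }
}
```

With this evaluation order, 1+2*3 comes out as (1 + 2) * 3 = 9 rather than the conventional 7, and an input such as 4/0 surfaces as an unhandled DivideByZeroException.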
Implementing VisitExpression
Referencing the parser rules for expression and operand you can see that they have a few things in common:
expression: operand (OPERATOR operand)+;
operand: DIGIT | LPAREN operand (OPERATOR operand)+ RPAREN;
It seems fairly reasonable that we can leverage the code we wrote to handle groups in VisitOperand here as well. Let’s just do that and move on:
public override int VisitExpression([NotNull]ExpressionContext context)
{
    return HandleGroup(context.operand(), context.OPERATOR());
}
At this point we have fully implemented our very basic visitor. While the code doesn’t reflect production-ready practices, it should be more than enough to get you started.
The Complete CalculatorVisitor Class
As mentioned in the introduction, you can download the project source code from GitHub. If you would prefer not to bother with that, here is the complete listing for the CalculatorVisitor.cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using Antlr4.Runtime.Misc;
using Antlr4.Runtime.Tree;
using OperandContext = CalculatorParser.OperandContext;
using ExpressionContext = CalculatorParser.ExpressionContext;

namespace Calculator.Parsing
{
    internal class CalculatorVisitor : CalculatorBaseVisitor<int>
    {
        #region Member Variables
        // Maps the text of an OPERATOR token to the function that applies it.
        private readonly Dictionary<string, Func<int, int, int>> _funcMap =
            new Dictionary<string, Func<int, int, int>>
            {
                {"+", (a, b) => a + b},
                {"-", (a, b) => a - b},
                {"*", (a, b) => a * b},
                {"/", (a, b) => a / b}
            };
        #endregion

        #region Base Class Overrides
        public override int VisitExpression([NotNull]ExpressionContext context)
        {
            return HandleGroup(context.operand(), context.OPERATOR());
        }

        public override int VisitOperand([NotNull]OperandContext context)
        {
            // A non-null DIGIT means a plain number; otherwise this is a group.
            ITerminalNode digit = context.DIGIT();
            return digit != null
                ? int.Parse(digit.GetText())
                : HandleGroup(context.operand(), context.OPERATOR());
        }
        #endregion

        #region Utility Methods
        private int HandleGroup(OperandContext[] operandCtxs, ITerminalNode[] operatorNodes)
        {
            List<int> operands = operandCtxs.Select(Visit).ToList();
            Queue<string> operators = new Queue<string>(operatorNodes.Select(o => o.GetText()));
            // Fold the operands left to right, consuming one operator per step.
            return operands.Aggregate((a, c) => _funcMap[operators.Dequeue()](a, c));
        }
        #endregion
    }
}
Putting the Visitor to Use
The last thing we need to do is to actually put the CalculatorVisitor class to use in some sort of meaningful way. Using the visitor is straightforward. First, use the lexer to tokenize the input stream. Second, use the parser to build the syntax tree. Finally, visit the syntax tree.
For a complete listing of Program.cs please visit the GitHub repository. The relevant code is listed below:
private int EvaluateInput(string input)
{
    // Step 1: tokenize the input.
    CalculatorLexer lexer = new CalculatorLexer(new AntlrInputStream(input));
    lexer.RemoveErrorListeners();
    lexer.AddErrorListener(new ThrowingErrorListener());

    // Step 2: parse the token stream into a syntax tree.
    CalculatorParser parser = new CalculatorParser(new CommonTokenStream(lexer));
    parser.RemoveErrorListeners();
    parser.AddErrorListener(new ThrowingErrorListener());

    // Step 3: visit the tree, starting from the expression rule.
    return new CalculatorVisitor().Visit(parser.expression());
}
Assuming all has gone well, you should be able to build the program, run it, and calculate the result of simple expressions.
Helpful Links
While I do my best to provide useful information, you should probably supplement what I’ve written above with some additional information: