How to unit and/or integration test parsers? - parsing

I am wondering what the general principles are for testing that a parser properly parses into an AST. I have tried in the past to test the structure of the AST, but there are two huge problems with this.
Readability. It's hard to read the test and understand what it's doing.
Maintainability. There are a huge number of tests to write, so compounds readability. And if you change the parser at all, you might need to rewrite all of your tests.
Say our parser is in JavaScript and our AST is in JSON. Instead of testing the JSON AST output structure matches some "expected" JSON object, like this:
assert.deepEqual(parse('x = 10'), {
type: 'assignment',
left: {
type: 'variable',
name: 'x',
},
right: {
type: 'integer',
value: 10
}
})
I would instead do something like this:
test('x = 10')
function test(str) {
let parsed = parse(str)
let generated = stringify(parsed)
assert.equal(str, generated)
}
Then it would be going full-circle, from string back to string, and you would only have to read the input string. What are the pros and cons of this, and is there a better standard approach? One con with this is that your stringifier has to format the string exactly the same way as the initially written string, which might be less than ideal. But one major pro with this is that you can simply write out your string once, and it is easy to read your tests. You just make sure your test function isn't cheating and returning the initial string, obviously :).
Note: The example I wrote of x = 10 is an extremely simple example. To robustly test the parser we are going to need complex full programs with all the features integrated, so the AST will become even more of a nightmare to "mentally parse" to figure out what the test does, if you compare ASTs to expected JSON structures.
Also note: This is just testing the AST generator works. A separate compiler will also have to eventually be tested. But here we don't concern ourselves with the compiler, and instead are only focused on the parser.

Actually, your first solution is "better" than your second idea.
However, don't check for exact equality. Do "contains" type tests. The result AST should contain "type: variable; name: x". That's one test. And it should contain "type: integer; value: 10". And it should contain "left { variable }". A third test. Now, all 3 tests could be parts of one test and there could be more parts. But think of what semantics you want to capture and make certain they are captured.
Moreover, turn those checks into assertions--not just in your test, but in your AST builder. The AST builder should assert that the lhs of an assignment AST is something assignable (initially just a variable, but more things as you loosen it).
You can do that with strings too, but it is actually harder unless you have control over your strings or you write a custom diff that checks them. Moreover, you don't really know in terms of semantics whether the string test has proven anything. You know you got the same string back, but that can be because you have corresponding flaws in how you read and write something. And that is a mistake you are actually likely to make.
Finally, in my last project we used json outputs for many things, and I wrote a json diff checker. So, that we could save the json once a test "worked" and then compare the json output for all subsequent runs. Yes, that is a "change detector" but when writing compilers, you really want to know when you have changed how something is parsed and not have it be an accident. You don't inspect the json output by hand, unless it has changed (and then if you are ok with the change) you simply save that file as the new regression check. And, if there are parts of json, you expect to vary, simply make your diff tool ignore those differences (perhaps just logging them). See my comparison examples, to see what kinds of things you might want to "match" in your diff tool.

One con with this is that your stringifier has to format the string exactly the same way as the initially written string, which might be less than ideal.
You should not run into this problem, in the sense if you do, JSON should in this scenario be stable one re-encoding.
That is if the AST is JSON text and the equality assertion is again JSON text, both JSON text can be decoded and then you can do your assertion:
actualJsonText = parse('x = 10');
expectedJsonText = lastAssertedSuccess();
assert.deepEqual(decode(actualJsonText), decode(expectedJsonText))
lastAsserted(actualJsonText);
Similar with varying scenarios, e.g. parse() actually has the AST:
actualJsonText = encode(parse('x = 10'));
expectedJsonText = lastAssertedSuccess();
assert.deepEqual(decode(actualJsonText), decode(expectedJsonText))
lastAsserted(actualJsonText);
these may not be the best unit tests as it creates a database that will grow over time. however it allows you to test output for input and if the JSON text is in a pretty-print format, a standard text differ will show what changed in a human readable format.
in case a test is a false positive, the new good master can be used to easily overwrite the last known good assertion. that is, you can relatively easily maintain this kind of test-suite.
I often create such known input -> expected output test-suites when I write a parser or similar (e.g. a tokenizer: binary string -> token stream) and I want to put the overall functionality under test and have a database of known goods to capture regressions.
For that it works pretty well and efficient. It requires when an assertion fails to understand the reason - the test is not telling you that.
What these kind of tests also don't tell you are the internals (of a parser state for example) and you may want more dedicated tests upfront, but these highly depend on the implementation details. And the implementation can be done test-driven, so this is already covered then.

Related

Data model for transforming source into AST and back?

I am working on a custom programming language. On compiling it, the parser first converts the text into a simple stream of tokens. The tokens are then converted into a simple tree. The tree is then converted into an object graph (with holes in it, as the types haven't yet been necessarily fully figured out). The holey tree is then transformed into a compact object graph.
Then we can go further and compile it to, say, JavaScript. The compact object graph is then transformed into a JavaScript AST. The JS AST is then transformed into a "concrete" syntax tree (with whitespace and such), and then that is converted into the JS text.
So in going from text to compact object graph, there are 5 transformation steps (text -> token_list -> tree -> holey_graph -> graph). In other situations (other languages), you might have more or less.
The way I am doing this transformation now is very ad-hoc and not keeping track of line numbers, so it's impossible to really tell where an error is coming from. I would like to fix that.
In my case, I am wondering how you could create a data model to keep track of the line of text where something was defined. This way, you could report any compilation errors nicely to the developer. The way I have modeled that so far is with a sort of "folding" model as I'm calling it. The initial "fold" is on the text -> token_list transformation. For each token, it keeps track of 3 things: the line, the column, and the text length, for the token. At first you may model it like this:
{
token: 'function',
line: 10,
column: 2,
size: 8
}
But that is tying two concepts into one object: the token itself, and the "fold" as I am calling it. Really it would be better like this:
fold = {
line: 10,
column: 2,
size: 8
}
token = {
value: 'function'
}
// bind the two together.
fold.data = token
token.fold = fold
Then, you transform from token to AST node in the simple tree. That might be like:
treeNode = {
type: 'function'
}
fold = {
previous: tokenFold,
data: treeNode
}
And so connecting the dots like this. In the end, you would have a fold list, which could be traversed theorertically from the compact object graph, to the text, so if there was a compile eror when doing typechecking for example, you could report the exact line number and everything to the developer. The navigation would look something like this:
data = compactObjectGraph
.fold
.previous.previous.previous.previous
.data
data.line
data.column
data.size
In theory. But the problem is, the "compact object graph" might have been created not from a simple linear chain of inputs, but from a suite of inputs. While I have modeled this on paper so far, I am starting to think there isn't actually in reality a clear way of mapping from object to object how it was transfformed, using this sort of "fold" sort of system.
The question is, how can I define the data model to allow for getting back to the source text line/column number, given there is a complex sequence of transformations from one data structure to the next? That is, at a high level, what is a way to model this that will allow you to isolate the transformation data structures, yet be able to map from the last generated one to the first, to find how some compact object graph node was actually represented in the original source text?
I would create a data structure containing the filename, line and column. In C++ it may work well to store a reference to this structure, rather than copy it to many places.
There isn't really that many ways to solve this, but having a single structure that is re-usable across your other data structures is almost certainly the right solution.
I answered your question on Quora in July, so maybe you missed it: https://qr.ae/pvkrwJ
Basically you have to stamp all the compiler artifacts with source information from which they are derived. Best represented a some kind of structure (Mats' response). Yeah, that takes effort, because
you have to do it everywhere in the compiler.
To do a perfect job, you'd need to stamp it with the complete set of source items that caused its generation; you're essentially producing a dependency graph. (You could represent such sets as trees of subsets to maximize sharing). Then any complaint the compiler issued could clearly identify the set of causes.
To do a less perfect job, you can pick any of of the contributing items and use that as the source location dependency. That means that a compiler complaint will only identify one source location that might be the cause, and the reader will have to guess at others if that isn't the principal source of the problem. Judicious choice of which-cause source information can arrange it so the answer is right much of the time and that's probably good enough.

Good practice to parse data in a custom format

I'm writing a program that takes in input a straight play in a custom format and then performs some analysis on it (like number of lines and words for each character). It's just for fun, and a pretext for learning cool stuff.
The first step in that process is writing a parser for that format. It goes :
####Play
###Act I
##Scene 1
CHARACTER 1. Line 1, he's saying some stuff.
#Comment, stage direction
CHARACTER 2, doing some stuff. Line 2, she's saying some stuff too.
It's quite a simple format. I read extensively about basic parser stuff like CFG, so I am now ready to get some work done.
I have written my grammar in EBNF and started playing with flex/bison but it raises some questions :
Is flex/bison too much for such a simple parser ? Should I just write it myself as described here : Is there an alternative for flex/bison that is usable on 8-bit embedded systems? ?
What is good practice regarding the respective tasks of the tokenizer and the parser itself ? There is never a single solution, and for such a simple language they often overlap. This is especially true for flex/bison, where flex can perform some intense stuff with regex matching. For example, should "#" be a token ? Should "####" be a token too ? Should I create types that carry semantic information so I can directly identify for example a character ? Or should I just process it with flex the simplest way then let the grammar defined in bison decide what is what ?
With flex/bison, does it makes sense to perform the analysis while parsing or is it more elegant to parse first, then operate on the file again with some other tool ?
This got me really confused. I am looking for an elegant, perhaps simple solution. Any guideline ?
By the way, about the programing language, I don't care much. For now I am using C because of flex/bison but feel free to advise me on anything more practical as long as it is a widely used language.
It's very difficult to answer those questions without knowing what your parsing expectations are. That is, an example of a few lines of text does not provide a clear understanding of what the intended parse is; what the lexical and syntactic units are; what relationships you would like to extract; and so on.
However, a rough guess might be that you intend to produce a nested parse, where ##{i} indicates the nesting level (inversely), with i≥1, since a single # is not structural. That violates one principle of language design ("don't make the user count things which the computer could count more accurately"), which might suggest a structure more like:
#play {
#act {
#scene {
#location: Elsinore. A platform before the castle.
#direction: FRANCISCO at his post. Enter to him BERNARDO
BERNARDO: Who's there?
FRANCISCO: Nay, answer me: stand, and unfold yourself.
BERNARDO: Long live the king!
FRANCISCO: Bernardo?
or even something XML-like. But that would be a different language :)
The problem with parsing either of these with a classic scanner/parser combination is that the lexical structure is inconsistent; the first token on a line is special, but most of the file consists of unparsed text. That will almost inevitably lead to spreading syntactic information between the scanner and the parser, because the scanner needs to know the syntactic context in order to decide whether or not it is scanning raw text.
You might be able to avoid that issue. For example, you might require that a continuation line start with whitespace, so that every line not otherwise marked with #'s starts with the name of a character. That would be more reliable than recognizing a dialogue line just because it starts with the name of a character and a period, since it is quite possible for a character's name to be used in dialogue, even at the end of a sentence (which consequently might be the first word in a continuation line.)
If you do intend for dialogue lines to be distinguished by the fact that they start with a character name and some punctuation then you will definitely have to give the scanner access to the character list (as a sort of symbol table), which is a well-known but not particularly respected hack.
Consider the above a reflection about your second question ("What are the roles of the scanner and the parser?"), which does not qualify as an answer but hopefully is at least food for thought. As to your other questions, and recognizing that all of this is opinionated:
Is flex/bison too much for such a simple parser ? Should I just write it myself...
The fact that flex and bison are (potentially) more powerful than necessary to parse a particular language is a red herring. C is more powerful than necessary to write a factorial function -- you could easily do it in assembler -- but writing a factorial function is a good exercise in learning C. Similarly, if you want to learn how to write parsers, it's a good idea to start with a simple language; obviously, that's not going to exercise every option in the parser/scanner generators, but it will get you started. The question really is whether the language you're designing is appropriate for this style of parsing, not whether it is too simple.
With flex/bison, does it makes sense to perform the analysis while parsing or is it more elegant to parse first, then operate on the file again with some other tool?
Either can be elegant, or disastrous; elegance has more to do with how you structure your thinking about the problem at hand. Having said that, it is often better to build a semantic structure (commonly referred to as an AST -- abstract syntax tree) during the parse phase and then analyse that structure using other functions.
Rescanning the input file is very unlikely to be either elegant or effective.

Partial parsing with flex/antlr

I encountered a problem while doing my student research project. I'm an electrical engineering student, but my project has somewhat to do with theoretical computer science: I need to parse a lot of pascal sourcecode-files for typedefinitions and constants and visualize all occurrences. The typedefinitions are spread recursively over various files, i.e. there is type a = byte in file x, in file y, there is a record (struct) b, that contains type a and then there is even a type c in file z that is an array of type b.
My idea so far was to learn about compiler construction, since the compiler has to resolve all typedefinitions and break them down to the elemental types.
So, I've read about compiler construction in two books (one of which is even written by the pascal inventor), but I'm lacking so many basics of theoretical computer science that it took me one week alone to work my way halfway through. What I've learned so far is that for achieving my goal, lexer and parser should be sufficient. Since this software is only a really smart part of the whole project, I can't spend so much time with it, so I started experimenting with flex and later with antlr.
My hope was, that parsing for typedefinitions only was such an easy task, that I could manage to do it with only using a scanner and let it do some parser's work: The pascal-files consist of 5 main-parts, each one being optional: A header with comments, a const-section, a type-section, a var-section and (in least cases) a code-section. Each section has a start-identifier but no clear end-identifier. So I started searching for the start of the type- and const-section (TYPE, CONST), discarding everything else. In flex, this is fairly easy, because it allows "start conditions". They can be used as various states like "INITIAL", "TYPE-SECTION", "CONST-SECTION" and "COMMENT" with different rules for each state. I wanted to get back a string from the scanner with following syntax " = ". There was one thing that made this task difficult: Some type contain comments like in this example: AuEingangsBool_t {PCMON} = MAX_AuEingangsFeld;. The scanner can not extract such type-definition with a regular expression.
My next step was to do it properly with scanner AND parser, so I searched for a parsergenerator and found antlr. Since I write the tool in C# anyway, I decided to use its scannergenerator, too, so that I do not have to communicate between different programs. Now I encountered following Problem: AFAIK, antlr does not support "start conditions" as flex do. That means, I have to scan the whole file (okay, comments still get discarded) and get a lot of unneccessary (and wrong) tokens. Because I don't use rules for the whole pascal grammar, the scanner would identify most keywords of the pascal syntax as user-identifiers and the parser would nag about all those series of tokens, that do not fit to type- and constant-defintions
Now, finally my question(s): Can anyone of you tell me, which approach leads anywhere for my project? Is there a possibility to scan only parts of the source-files with antlr? Or do I have to connect flex with antlr for that purpose? Can I tell antlr's parser to ignore every token that is not in the const- or type-section? Are those tools too powerful for my task and should I write own routines instead?
You'd be better off to find a compiler for Pascal, and simply modify to report the information you want. Presumably there is such a compiler for your Pascal, and often the source code for such compilers is available.
Otherwise you essentially need to build a parser. Building lexer, and then hacking around with the resulting lexemes, is essentially building a bad parser by ad hoc methods. ANTLR is a good way to go; you can define the lexemes (including means to pick up and ignore comments) pretty easily, especially for older dialects of Pascal. You'll need good BNF rules for the type information that you want, and translate those rules to the parser generator. What you can do to minimize work, is to cheat on rules for the parts of the language you don't care about. For instance, you could write an accurate subgrammar for assignment statements. Since you don't care about them, you can write a sloppy subgrammar that treats assignment statements as anything that begins with an identifier, is followed by arbitrary other tokens, and ends with semicolon. This kind of a grammar is called an "island grammar"; it is only accurate where it needs to be accurate.
I don't know about the recursive bit. Is there a reason you can't just process each file separately? The answer may depend on what information you want to know about each type declaration, and if you go deep enough, you may need a symbol table as well as an island parser. Parser generators offer you no help for this.
First, there can be type and const blocks within other blocks (procedures, in later Delphi versions also classes).
Moreover, I'm not entirely sure that you can actually simply scan for a const token, and then start parsing. Const is also used for other purposes in most common (Borland) Pascal dialects. Some keywords can be reused in a different context, and if you don't parse the global blockstructure, and only look for const and type in specific places you will erroneously start parsing there.
A base problem of course is the comments. Scanners cut out comments as early as possible, and don't regard them further. You probably have to setup the scanner so that comments are attached to the adjacent tokens as field (associate with token before or save them up till a certain token follows).
As far antlr vs flex, no clue. The only parsergenerator I have some minor experience in parsing Pascal with is Coco/R (a parsergenerator popular by Wirthians), but in general I (and many pascalians) prefer handcoded.

Using ANTLR to analyze and modify source code; am I doing it wrong?

I'm writing a program where I need to parse a JavaScript source file, extract some facts, and insert/replace portions of the code. A simplified description of the sorts of things I'd need to do is, given this code:
foo(['a', 'b', 'c']);
Extract 'a', 'b', and 'c' and rewrite the code as:
foo('bar', [0, 1, 2]);
I am using ANTLR for my parsing needs, producing C# 3 code. Somebody else had already contributed a JavaScript grammar. The parsing of the source code is working.
The problem I'm encountering is figuring out how to actually properly analyze and modify the source file. Each approach that I try to take in actually solving the problem leads me to a dead end. I can't help but think that I'm not using the tool as it's intended or am just too much of a novice when it comes to dealing with ASTs.
My first approach was to parse using a TokenRewriteStream and implement the EnterRule_* partial methods for the rules I'm interested in. While this seems to make modifying the token stream pretty easy, there is not enough contextual information for my analysis. It seems that all I have access to is a flat stream of tokens, which doesn't tell me enough about the entire structure of code. For example, to detect whether the foo function is being called, simply looking at the first token wouldn't work because that would also falsely match:
a.b.foo();
To allow me to do more sophisticated code analysis, my second approach was to modify the grammar with rewrite rules to produce more of a tree. Now, the first sample code block produces this:
Program
CallExpression
Identifier('foo')
ArgumentList
ArrayLiteral
StringLiteral('a')
StringLiteral('b')
StringLiteral('c')
This is working great for analyzing the code. However, now I am unable to easily rewrite the code. Sure, I could modify the tree structure to represent the code I want, but I can't use this to output source code. I had hoped that the token associated with each node would at least give me enough information to know where in the original text I would need to make the modifications, but all I get are token indexes or line/column numbers. To use the line and column numbers, I would have to make an awkward second pass through the source code.
I suspect I'm missing something in understanding how to properly use ANTLR to do what I need. Is there a more proper way for me to solve this problem?
What you are trying to do is called program transformation, that is, the automated generation of one program from another. What you are doing "wrong" is assuming is parser is all you need, and discovering that it isn't and that you have to fill in the gap.
Tools that do that this well have parsers (to build ASTs), means to modify the ASTs (both procedural and pattern directed), and prettyprinters which convert the (modified) AST back into legal source code. You seem to be struggling with the the fact that ANTLR doesn't come with prettyprinters; that's not part of its philosophy; ANTLR is a (fine) parser-generator. Other answers have suggested using ANTLR's "string templates", which are not by themselves prettyprinters, but can be used to implement one, at the price of implementing one. This harder to do than it looks; see my SO answer on compiling an AST back to source code.
The real issue here is the widely made but false assumption that "if I have a parser, I'm well on my way to building complex program analysis and transformation tools." See my essay on Life After Parsing for a long discussion of this; basically, you need a lot more tooling that "just" a parser to do this, unless you want to rebuild a significant fraction of the infrastructure by yourself instead of getting on with your task. Other useful features of practical program transformation systems include typically source-to-source transformations, which considerably simplify the problem of finding and replacing complex patterns in trees.
For instance, if you had source-to-source transformation capabilities (of our tool, the DMS Software Reengineering Toolkit, you'd be able to write parts of your example code changes using these DMS transforms:
domain ECMAScript.
tag replace; -- says this is a special kind of temporary tree
rule barize(function_name:IDENTIFIER,list:expression_list,b:body):
expression->expression
= " \function_name ( '[' \list ']' ) "
-> "\function_name( \firstarg\(\function_name\), \replace\(\list\))";
rule replace_unit_list(s:character_literal):
expression_list -> expression_list
replace(s) -> compute_index_for(s);
rule replace_long_list(s:character_list, list:expression_list):
expression_list -> expression_list
"\replace\(\s\,\list)-> "compute_index_for\(\s\),\list";
with rule-external "meta" procedures "first_arg" (which knows how to compute "bar" given the identifier "foo" [I'm guessing you want to do this), and "compute_index_for" which given a string literals, knows what integer to replace it with.
Individual rewrite rules have parameter lists "(....)" in which slots representing subtrees are named, a left-hand side acting as a pattern to match, and an right hand side acting as replacement, both usually quoted in metaquotes " which seperates rewrite-rule language text from target-language (e.g. JavaScript) text. There's lots of meta-escapes ** found inside the metaquotes which indicate a special rewrite-rule-language item. Typically these are parameter names, and represent whatever type of name tree the parameter represents, or represent an external meta procedure call (such as first_arg; you'll note the its argument list ( , ) is metaquoted!), or finally, a "tag" such as "replace", which is a peculiar kind of tree that represent future intent to do more transformations.
This particular set of rules works by replacing a candidate function call by the barized version, with the additional intent "replace" to transform the list. The other two transformations realize the intent by transforming "replace" away by processing elements of the list one at a time, and pushing the replace further down the list until it finally falls off the end and the replacement is done. (This is the transformational equivalent of a loop).
Your specific example may vary somewhat since you really weren't precise about the details.
Having applied these rules to modify the parsed tree, DMS can then trivially prettyprint the result (the default behavior in some configurations is "parse to AST, apply rules until exhaustion, prettyprint AST" because this is handy).
You can see a complete process of "define language", "define rewrite rules", "apply rules and prettyprint" at (High School) Algebra as a DMS domain.
Other program transformation systems include TXL and Stratego. We imagine DMS as the industrial strength version of these, in which we have built all that infrastructure including many standard language parsers and prettyprinters.
So it's turning out that I can actually use a rewriting tree grammar and insert/replace tokens using a TokenRewriteStream. Plus, it's actually really easy to do. My code resembles the following:
var charStream = new ANTLRInputStream(stream);
var lexer = new JavaScriptLexer(charStream);
var tokenStream = new TokenRewriteStream(lexer);
var parser = new JavaScriptParser(tokenStream);
var program = parser.program().Tree as Program;
var dependencies = new List<IModule>();
var functionCall = (
from callExpression in program.Children.OfType<CallExpression>()
where callExpression.Children[0].Text == "foo"
select callExpression
).Single();
var argList = functionCall.Children[1] as ArgumentList;
var array = argList.Children[0] as ArrayLiteral;
tokenStream.InsertAfter(argList.Token.TokenIndex, "'bar', ");
for (var i = 0; i < array.Children.Count(); i++)
{
tokenStream.Replace(
(array.Children[i] as StringLiteral).Token.TokenIndex,
i.ToString());
}
var rewrittenCode = tokenStream.ToString();
Have you looked at the string template library. It is by the same person who wrote ANTLR and they are intended to work together. It sounds like it would suit do what your looking for ie. output matched grammar rules as formatted text.
Here is an article on translation via ANTLR

Better way to test (automatically) a parser?

I’m recently writing a small programming language, and have finished writing its parser. I want to write an automated test for the parser (that its result is an abstract syntax tree), but I’m not sure which way is better.
First what I tried is just to serialize AST to S-expression text and compare it to the expected output text I wrote by hand, but it has some problems:
There are trivial meaningless differences between a serialized text and the expected output like whitespaces. For example, there is no difference between:
(attribute (symbol str) (symbol length))
(that is serialized) and:
(attribute (symbol str)
(symbol length))
(that is handwritten by me) in their meanings, but string comparison distincts them of course. Okay, I could resolve it by normalization.
When a test fails, it doesn’t show the difference between actual tree and expected tree concisely. I want to show only a difference node, not whole tree.
Second what I tried is to write S-expression parser and compare AST that parser (to be tested) generates to AST that S-expression parser (that I just implemented) generates from the handwritten expected output. However I realized that S-expression have to be tested also and it could be really nonsense.
I wonder what is the typical and easy way to test the parser.
PS. I am using Java, and dont’t want any dependencies to third-party libraries.
Providing you are looking for a completely automated and extensible unit testing framework for your parser I'd recommend the following approach:
Incorrect input
Create a set of samples of incorrect inputs. Then feed the parse with each of them making sure the parser rejects them. I's a good idea to provide metadata for each test case that defines the expected output — the specific error code / message the parser is supposed to produce.
Correct input
As in the previous case, create a set of samples representing various correct inputs. Besides the simple validation that the parser accepts all inputs, there's still the problem of validating that the actual Abstract Syntax Tree makes sense.
To address this problem I'd do the following: Describing the expected AST for each test case in some well-known format that can be safely parsed — deserialized into the actual in-memory AST structures — by a 3rd party parser considered bug-free (for your case). The natural choice is XML since most languages / programming frameworks cover XML support and provide the respective (de)serialization facilities. The best solution would be to deserialize right into the AST node types. Since convenient visual editing tools for XML exist it's feasible to construct even large test cases.
Then I'd construct an AST comparer using the visitor pattern which pair-up the two ASTs and compare both nodes in each pair for equality. However, equality is a per-AST-node-type specific operation.
Notes:
This approach would work with most unit-testing frameworks like JUnit.
AST to XML serialization is a welcome tool for debugging the compiler.
The visitor pattern implementation can easily serve as the backbone for multiple processing stages within the compiler.
There are compiler test suites freely available that can provide some inspiration to your project — see for example the Ada Conformity Assesment Test Suite for the Ada programming language, although this test suite deals with higher-level testing, not just parser testing.
Here's what. A grammar defines a language. The language is the set of string that the grammar generates, or that a parser for the grammar accepts.
Given that, more than testing if the ASTs seem right, it's important to test that the parser accepts strings intended to be in the language and rejects strings that in your mind shouldn't belong to it.
In that sense, a simple accept or reject (bonus point for input position for the rejection) is enough to build a nice and large set of test cases.
Examples:
()
(a)
((((((((((a))))))))))
((((((((((a)))))))))
(a (a (a (a (a (a (b)))))))
(((((((b) a) a) a) a) a) a)
(((((((b a) a) a) a) a) a)
((a)(a)(a)(a))
((a)(a a)(a))
(())
(()())
((()())(()())(()()))
((()())()()(()()))
...

Resources