How to generate code samples from a grammar?

I have access to a grammar, and want to use it to generate code samples that follow the grammar.
Is there some existing tool that does this? If not, what is the right approach?

You could try mutt, or some of the tools listed here might do what you are looking for.
If you wanted to implement a text generator yourself, you could randomly select a possible expansion of the current symbol and follow that path until you cannot expand any further or you reach a depth limit, then try to select a terminal symbol if one is available.
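For illustration, here is a minimal sketch of that approach in Python; the toy grammar and every name in it are made up for the example, not taken from any existing tool:
import random

# Toy grammar: each nonterminal maps to a list of alternative expansions.
# Any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "expr": [["expr", "op", "expr"], ["num"], ["(", "expr", ")"]],
    "op":   [["+"], ["-"], ["*"]],
    "num":  [["0"], ["1"], ["2"]],
}

def generate(symbol, depth=0, max_depth=8):
    """Randomly expand `symbol`, preferring short productions past max_depth."""
    if symbol not in GRAMMAR:          # terminal: emit it as-is
        return [symbol]
    alternatives = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the limit, take the shortest alternative so the derivation
        # winds down (this works when some short path reaches terminals).
        alternatives = [min(alternatives, key=len)]
    tokens = []
    for sym in random.choice(alternatives):
        tokens.extend(generate(sym, depth + 1, max_depth))
    return tokens

print(" ".join(generate("expr")))      # e.g. "( 1 + 2 ) * 0"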

Related

How can I detect dead productions from an ANTLR4 grammar?

I have an ANTLR4 grammar containing a large number of productions I don't want to use. I'd like to clean them out of the grammar file. ANTLR4 doesn't seem to allow you to specify a "goal" symbol, but if it could, I'd like to identify and remove any productions that aren't reachable from that goal symbol.
Is there a way to identify these kinds of unused productions so I can remove them from the grammar file?
There’s no such functionality in ANTLR itself. However, the ANTLR plugin for IntelliJ gives a warning when a production isn’t used.
Use Visual Studio Code along with my antlr4-vscode extension and enable code lenses (preferences: antlr4.referencesCodeLens.enabled). It will give you a reference count for each rule.
Or you can run AntlrLanguageSupport.countReferences(fileName, symbol) directly from the underlying antlr4-graps library in a Node shell. More API details are in the API doc file.
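If you'd rather script the check yourself, unreachable rules amount to a simple graph-reachability problem. Below is a rough Python sketch under heavy assumptions: it scans a .g4 file purely textually with regexes, so comments, string literals, and fragment rules will confuse it, and none of the names in it come from the ANTLR API; treat its output as a hint, not a verdict:
import re

# Matches a rule header like `ruleName :` at the start of a line.
RULE_HEADER = re.compile(r"^([a-zA-Z_]\w*)\s*:", re.MULTILINE)

def dead_rules(grammar_text, goal):
    """Return rule names not reachable from `goal` (naive textual scan)."""
    parts = RULE_HEADER.split(grammar_text)
    # parts = [preamble, name1, body1, name2, body2, ...]
    bodies = dict(zip(parts[1::2], parts[2::2]))
    reachable, todo = set(), [goal]
    while todo:
        rule = todo.pop()
        if rule in reachable or rule not in bodies:
            continue
        reachable.add(rule)
        for other in bodies:
            if re.search(r"\b%s\b" % re.escape(other), bodies[rule]):
                todo.append(other)
    return [name for name in bodies if name not in reachable]

with open("MyGrammar.g4") as f:        # hypothetical grammar file and goal rule
    print(dead_rules(f.read(), "compilationUnit"))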

Transform a tex source so that all macros are replaced by their definition

Is it possible to see the output of the TeX ‘pre-processor’, i.e. the intermediate step before the actual output is produced, in which all user-defined macros have been replaced and only a subset of TeX primitives is left?
Or is there no such intermediate step?
Write
\edef\xxx{Any text with any commands. For example, $\phantom x$.}
and then, for output in the log file,
\show\xxx
or, for output in your document,
\meaning\xxx
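As a complete toy example (the macro \greet and its contents are made up for illustration):
\documentclass{article}
\newcommand{\greet}{Hello}
\edef\xxx{\greet, world!}% \greet is expanded at definition time
\show\xxx % writes `> \xxx=macro:->Hello, world!.' to the log and pauses an interactive run
\begin{document}
\meaning\xxx % typesets `macro:->Hello, world!'
\end{document}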
There is no "pre-processor" in TeX. The replacement text for any control sequence at any stage can vary (this is used for a lot of things!). For example
\def\demo{\def\demo{cde}}
\demo
will first define \demo in one way and then change it. In the same way, you can redefine TeX primitives. For example, the LaTeX kernel saves \input under an internal name and redefines it. A simplified version:
\let\@@input\input
\def\input#1{\@@input#1 }
Try the Selective Macro Expander.
TeX has a lot of different tracing tools built in, including tracing of macro expansion. This only traces macros as they are actually expanded, but it's still quite useful. Full details are in The TeXbook and probably elsewhere.
When I'm trying to debug a macro problem I generally just use the big hammer:
\tracingall\tracingonline
then I dig in the output or the .log file for what I want to know.
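One refinement worth knowing: the tracing parameters are scoped like any other TeX assignment, so you can confine the flood of output to just the suspect spot by putting the switch inside a group (a minimal made-up example):
\documentclass{article}
\newcommand{\suspect}{\emph{something misbehaving}}
\begin{document}
Before the problem area.
{\tracingall\tracingonline \suspect}% tracing switches back off at the closing brace
After it, the log is quiet again.
\end{document}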
There's a lot of discussion of this issue in this question at tex.SE, and in this question. But I'll take the opportunity to note that the best answer (IMO) is to use the de-macro program, a Python script that comes with TeX Live. It's quite capable, and can handle arguments as well as simple replacements.
To use it, you move the macros that you want expanded into a <something>-private.sty file, include it in your document with \usepackage{<something>-private}, and then run de-macro <mydocument>. It writes out <mydocument>-clean.tex, which is the same as your original but with your private macros replaced by their definitions.
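For concreteness, a round trip might look like this (every file and macro name below is hypothetical):
% mydoc-private.sty --- the macros de-macro should expand
\newcommand{\version}{2.1}
\newcommand{\emphword}[1]{\textbf{#1}}

% mydoc.tex --- the document that uses them
\documentclass{article}
\usepackage{mydoc-private}
\begin{document}
\emphword{Release} \version\ is out.
\end{document}
Running de-macro mydoc should then produce mydoc-clean.tex with the body rewritten along the lines of \textbf{Release} 2.1\ is out.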

Define every symbol as a command in LaTeX

I'm working on a large project involving multiple documents typeset in LaTeX. I want to be consistent in my use of symbols, so it might be a nice idea to define a command for every symbol that has a specific meaning throughout the project. Does anyone have any experience with this? Are there issues I should pay attention to?
To be a little more specific: say that, throughout the project, I denote something called permeability by a script P. Would it be a good idea to define
\providecommand{\permeability}{\mathscr{P}}
or would this be more like the case "defining a command for $n$"?
A few tips:
Using \providecommand will define the command only if it hasn't been defined before. So if you're not getting the results you expected, you may be trying to define a command that's already defined elsewhere.
If you wrap the math in your commands with \ensuremath, it will do the right thing regardless of whether you're in math mode when you issue the command:
\providecommand{\permeability}{\ensuremath{\mathscr{P}}}
Now I can easily use \permeability in text or $\permeability$ in math mode.
Using your own commands allows you to easily change the typographical representation of something later. For instance:
\newcommand{\vect}[1]{\ensuremath{\mathbf{#1}}}
would print \vect{x} as a boldfaced x. If you later decide you prefer arrows above your vectors, you could change the command to:
\newcommand{\vect}[1]{\ensuremath{\vec{#1}}}
I have been doing this for anything that has a specific meaning and is longer than a single symbol, mostly to save typing (note that \xspace below comes from the xspace package):
\newcommand{\objId}{\mbox{$\mathit{objId}$}\xspace}
\newcommand{\insOp}[1]{#1\mbox{$^+$}\xspace}
\newcommand{\delOp}[1]{#1\mbox{$^-$}\xspace}
However, I then noticed that I stopped making inconsistency errors (objId vs ObjId vs ObjID), so I agree that it is a nice idea.
However, I am not sure it is a good idea when the symbol in the output is, well, a single Latin letter, as in:
\newcommand{\numOfObjs}{$n$}
It is too easy to type a single symbol and forget about it even though a command was defined for it.
EDIT: using your example, IMHO it would be a good idea to define \permeability, because it is more than the single P that you would otherwise have to type without the command. But it's a close call.
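One way to keep several documents consistent (a hypothetical layout, not something prescribed by the answers above) is to gather every such definition into one shared style file and load it everywhere:
% notation.sty --- hypothetical shared symbol definitions for the project
\ProvidesPackage{notation}
\RequirePackage{amssymb,mathrsfs,xspace}
% one command per symbol with a fixed meaning
\providecommand{\permeability}{\ensuremath{\mathscr{P}}\xspace}
\providecommand{\vect}[1]{\ensuremath{\mathbf{#1}}}
\providecommand{\numOfObjs}{\ensuremath{n}\xspace}
Each document then just says \usepackage{notation}, so changing a symbol's typography is a one-line edit that propagates to the whole project.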

Is it possible to write own "packages" for LaTeX?

As a programmer, I wonder if I could create my own package for LaTeX. I need something like the famous listings package, but much more capable for my needs. I'm looking for a listings-like solution that would watch for comment lines such as
// BEGIN LISTING 3122
// END LISTING 3122
No syntax highlighting, but intelligent support for tab indents. The package would then be given a file name or path, walk through the lines, and copy out just the snippets of interest.
I am 100% sure there is absolutely nothing like this on the market. So I want to program it for LaTeX, if that's possible. I have no idea how, or with what programming language/IDE. Where would I start looking?
This is certainly possible, but it is non-trivial in the TeX programming language. I don't have time to code it up at the moment but here's an algorithm; I suggest asking on comp.text.tex for more specific LaTeX programming advice.
Locally set all catcodes of special chars to "other" (\dospecials) and start reading in the input file line by line (\read)
for each line, compare its first however-many tokens (some iterative use of \if or \ifx; a package such as stringstrings or xstring might make this easier)
in the default state throw away the input line and read the next
unless it's // BEGIN LISTING, in which case save each subsequent line into a macro (something like \g@addto@macro)
until it's // END LISTING, obviously
keep going until the end of the file (\ifeof)
TeX by Topic is a good reference guide for this sort of work.
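To make those steps concrete, here is a very rough sketch using e-TeX's \readline and etoolbox's string test; everything except the primitives and the etoolbox macros is an invented name, and real code would need more care with leading spaces, catcodes, and typesetting the result:
\usepackage{etoolbox}
\newread\srcfile
\newcommand\grablisting[2]{% #1 = source file, #2 = listing id
  \def\collected{}\def\copying{0}%
  \openin\srcfile=#1\relax
  \loop\unless\ifeof\srcfile
    \readline\srcfile to \curline % whole line as catcode-12 tokens
    \ifdefstring{\curline}{// END LISTING #2}{\def\copying{0}}{}%
    \ifnum\copying=1 \eappto\collected{\curline^^J}\fi
    \ifdefstring{\curline}{// BEGIN LISTING #2}{\def\copying{1}}{}%
  \repeat
  \closein\srcfile}
% \collected now holds the snippet's lines separated by ^^J characters;
% typesetting them verbatim still needs something like \scantokens.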
The rather simple texments package shows how code can be piped into pdflatex: by writing your own shell-invocable filter, you should be able to do something similar with your idea.
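For instance, a minimal external filter along those lines (a hypothetical Python sketch, not part of texments) could extract a tagged snippet before LaTeX ever sees the file:
import sys

def extract(path, listing_id):
    """Print the lines between '// BEGIN LISTING <id>' and '// END LISTING <id>'."""
    copying = False
    with open(path) as src:
        for line in src:
            marker = line.strip()
            if marker == f"// END LISTING {listing_id}":
                break
            if copying:
                sys.stdout.write(line)  # tabs and indentation pass through untouched
            if marker == f"// BEGIN LISTING {listing_id}":
                copying = True

if __name__ == "__main__":
    extract(sys.argv[1], sys.argv[2])   # e.g. python extract.py main.cpp 3122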
I'm pretty certain you can't do this in LaTeX. Basically you can go nuts with anything that's either a command (\foo) or an environment (\begin{foo} ... \end{foo}) but not in the way you are describing here. Within environments or commands it is possible to turn off LaTeX's processing and handle everything yourself in some way. This is how verbatim and listings work. It's not very pretty though.
Basically, I think it might be possible if you make ‘/’ an active character (there is \makeactive, for example, but I imagine there are more solutions) and then build some good magic around it. (You will need to emulate/create an environment with your logic.) Similar things are done in some of the internationalisation packages to ease the input of letters with diacritics.
For a character like ‘/’ this might be even harder, as it could appear in other places in your text too, so you would have to take special care with that.

Will ANTLR Help? Different Suggestion?

Before I dive into ANTLR (because it is apparently not for the faint of heart), I just want to make sure I have made the right decision regarding its usage.
I want to create a grammar that will parse a text file with predefined tags so that I can populate values within my application. (The text file is generated by another application.) So, essentially, I want to be able to parse something like this:
Name: TheFileName
Values: 5 3 1 6 1 3
Other Values: 5 3 1 5 1
In my application, TheFileName is stored as a String, and both sets of values are stored in arrays. (This is just a sample; the file is much more complicated.) Anyway, am I at least going down the right path with ANTLR? Any other suggestions?
Edit
The files are created by the user and they define the areas via tags. So, it might look something like this.
Name: <string>TheFileName</string>
Values: <array>5 3 1 6 1 3</array>
Important Value: <double>3.45</double>
Something along those lines.
The basic question is: how is the file more complicated? Is it basically more of the same, with a tag, a colon, and one or more values, or is the basic structure of the other lines more complex? If it's basically just more of the same, code to recognize and read the data is pretty trivial, and a parser generator isn't likely to gain you much. If the other lines have substantially different structure, it'll depend primarily on how they differ.
Edit: Based on what you've added, I'd go one (tiny) step further, and format your file as XML. You can then use existing XML parsers (and such) to read the files, extract data, verify that they fit a specified format, etc.
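For example, here is a hedged sketch of that route using Python's standard library; the XML shape below is a hypothetical rendering of the sample above, not a format the question specifies:
import xml.etree.ElementTree as ET

# Hypothetical XML version of the tagged sample.
doc = """<file>
  <string key="Name">TheFileName</string>
  <array  key="Values">5 3 1 6 1 3</array>
  <double key="Important Value">3.45</double>
</file>"""

root = ET.fromstring(doc)
name = root.find("string").text                        # 'TheFileName'
values = [int(v) for v in root.find("array").text.split()]
important = float(root.find("double").text)            # 3.45
print(name, values, important)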
It depends on what control you have over the format of the file you are parsing. If you have no control then a parser-generator such as ANTLR may be valuable. (We do this ourselves for FORTRAN output files over which we have no control). It's quite a bit of work but we have now mastered the basic ANTLR lexer/parser strategy and it's starting to work well.
If, however, you have some or complete control over the format, then create it with as much markup as necessary. I would always create such a file in XML, as there are so many tools for processing it (not only parsing, but also XPath, databases, etc.). In general, we use ANTLR to parse semi-structured information into XML.
If you don't need for the format to be custom-built, then you should look into using an existing format such as JSON or XML, for which there are parsers available.
Even if you do need a custom format, you may be better off designing one that is dirt simple so that you don't need a full-blown grammar to parse it. Designing your own scripting grammar from scratch and doing a good job of it is a lot of work.
Writing grammar parsers can also be really fun, so if you're curious then you should go for it. But I don't recommend carelessly mixing learning exercises with practical work code.
Well, if it's "much more complicated", then, yes, a parser generator would be helpful. But, since you don't show the actual format of your file, how could anybody know what might be the right tool for the job?
I use the free GOLD Parser Builder, which is incredibly easy to use and can generate the parser itself in many different languages. There are also samples for parsing expressions like these.
If the format of the file is up to the user, can you even define a grammar for it?
Seems like you just want a lexer at best. Using ANTLR just for the lexer part is possible, but would seem like overkill.
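Indeed, for the flat tag-and-values layout a hand-rolled reader needs only a few lines, which backs up both the "lexer at best" comment and the earlier "pretty trivial" remark; this is a hypothetical Python sketch, with no parser generator involved:
SAMPLE = """Name: TheFileName
Values: 5 3 1 6 1 3
Other Values: 5 3 1 5 1"""

def parse(text):
    """Split each 'Tag: payload' line; numeric payloads become lists of numbers."""
    result = {}
    for line in text.splitlines():
        tag, _, payload = line.partition(":")
        payload = payload.strip()
        if payload and all(ch.isdigit() or ch in " ." for ch in payload):
            result[tag.strip()] = [float(x) if "." in x else int(x)
                                   for x in payload.split()]
        else:
            result[tag.strip()] = payload
    return result

print(parse(SAMPLE))  # {'Name': 'TheFileName', 'Values': [5, 3, 1, 6, 1, 3], ...}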
