Context-free grammar conversion - normalization

Can anyone let me know if there is any software to convert Chomsky Normal Form to Backus–Naur Form and vice versa?

Well, Chomsky Normal Form and Backus-Naur Form aren't really the same kind of concept: CNF is a restriction on the shape of a grammar's rules, while BNF is a notation for writing grammars down. So I don't think so. But if you told us what you needed this software for, we might be able to help.
Now, going from what you asked, I'm assuming you want some kind of code to normalize a BNF grammar to Chomsky Normal Form. As far as I know, no such software exists, but it's possible some does, assuming the task is actually computationally feasible.
But if you can be more concrete about what you actually need, we'll be able to give you useful advice about that task.
EDIT: After digging a bit in my books, it turns out that it's entirely possible to formulate an algorithm to convert an arbitrary CFG to Chomsky Normal Form. I don't have the actual algorithm or its complexity to hand, though.

Maybe JFLAP will do what you're looking for.
I've not used it yet myself, but it was recommended to me by my Automata professor.
Looking at the "What is JFLAP?" page, it appears that it can convert "CFG -> CNF", which sounds like what you're looking to do.

As was pointed out, my original answer that BNF is already in CNF was wrong. You can convert any context-free grammar to CNF as explained here (PDF). The conversion process is also illustrated in Sipser 2nd Ed. pages 106-109 if you have access to it. I don't know if there's existing software that automates this process (but it would seem simple to write).
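The standard conversion is a handful of mechanical steps (new start symbol, epsilon-rule removal, unit-rule removal, binarization, terminal replacement). As a taste of how simple it would be to write, here is a minimal F# sketch of just the binarization step; the Symbol and Production types are invented for the example:

```fsharp
// Binarization (the BIN step of CFG -> CNF): split any production with more
// than two symbols on the right-hand side into a chain of binary productions.
type Symbol =
    | Terminal of string
    | NonTerminal of string

type Production = { Lhs: string; Rhs: Symbol list }

let mutable counter = 0
let freshNonTerminal () =
    counter <- counter + 1
    sprintf "_X%d" counter

// A -> B C D E   becomes   A -> B _X1,  _X1 -> C _X2,  _X2 -> D E
let rec binarize (p: Production) : Production list =
    match p.Rhs with
    | [] | [_] | [_; _] -> [p]                      // already short enough
    | first :: rest ->
        let fresh = freshNonTerminal ()
        { p with Rhs = [first; NonTerminal fresh] }
        :: binarize { Lhs = fresh; Rhs = rest }

let binarizeGrammar (g: Production list) = List.collect binarize g
```

A full converter applies the other steps in the same spirit; binarization only grows the grammar linearly in the total size of the rules.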

Related

Writing a parser for mixed languages

I am trying to write a parser that can analyze mixed languages and generate an AST from them. I first tried to build it from scratch on my own in Java and failed, because this is quite a hard topic for a parser beginner. Then I googled and found http://www2.cs.tum.edu/projects/cup/examples.php and JFlex.
The question now is: What is the best way to do it?
For example, I have a code file that contains several tags, JS code, and some $CMS_SET(x,y)$ code. Is the best way to solve this to define a grammar for all those things in CUP and let CUP generate a parser based on my grammar that can analyze those mixed-language files and generate an AST of them?
Thanks for all helpful answers. :)
EDIT: I need to do it in Java...
This topic is quite hard even for an expert in this area, which I consider myself to be; check my bio.
The first issue is to build individual parsers for each sublanguage. The first thing you will discover is that defining parsers for specific languages is actually hard; you can read the endless list of SO requests for "can I get a parser for X" or "how do I fix my parser for X". Mostly I think these requests end up not going anywhere: the parsing engines aren't very good, you have to twist the grammars and the parsers to make them work on real languages, there isn't any such thing as "pure HTML", the standards documents disagree, and your customer always has some twist in his code that you're not prepared for. Finally, there are the glitches related to character set encodings, variations in newline endings, and preprocessors that complicate the parsing problem. The C++ preprocessor is a lot more complex than you might think, and you have to get this right. The easiest way to defeat this problem is to find some parser generator with the languages already predefined. ANTLR has a bunch for the now-deprecated ANTLR3; there's no assurance these parsers are robust, let alone compatible for your purposes.
CUP isn't a particularly helpful parser generator; none of the LL(x) or LALR(x) parser generators are really helpful, because no real language matches the categories of things they can parse. The consequence: an endless stream of requests (on SO!) for help "resolving my shift-reduce conflict" or "eliminating right recursion". The only parser generator IMHO that has stood the test of time is a GLR parser generator (I hear good things about GLL, but that's pretty recent). We've done 40+ languages with one GLR parser generator, including production IBM COBOL, full C++14, and Java 8.
Your second problem will be building ASTs. You can hand-code the AST building process, but that gets old fast when you have to change the grammars often and/or you have many grammars, as you are effectively contemplating. This you can beat your way through with sweat. (We chose to push the problem of building ASTs into the parser so we didn't have to put any energy into it when building a grammar; to do this, your parser engine has to offer you this help, and none of the mainstream ones do.)
Now you need to compose parsers. You need to have one invoke the other as the need comes up; of course, your chosen parser isn't designed to do this, so you'll have to twist it. The first hard part is providing a parser with clues that a sublanguage is coming up in the input stream, having it hand off that parsing to the sublanguage parser, and getting it to pass a tree back to be incorporated into the parent parser's tree, presumably with some kind of marker so you can tell where the transitions between different sublanguages are in the tree. You can often do this by hacking one language's lexer, when it sees the clue, to invoke the other; but then what do you do with the tree it returns? There's no way to give that tree to the current parser and say "integrate this". You get around this by modifying the parsing machinery in arcane ways.
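To make the handoff notion concrete, here is a deliberately crude F# sketch, with every name invented for the example: a driver splits the input at markers (like the $CMS_SET(...)$ tags in the question) and delegates each segment to a per-language parser, tagging the subtrees so you can see the seams. Real composition is much messier, for exactly the reasons above:

```fsharp
// Toy AST with one case per sublanguage, plus a container for the mix.
type Ast =
    | HtmlNode of string          // placeholder subtrees; real ASTs
    | CmsNode of string           // would of course be richer
    | Mixed of Ast list

// Stand-ins for real sublanguage parsers (assumed, not provided here).
let parseHtml (s: string) = HtmlNode s
let parseCms  (s: string) = CmsNode s

// Split "...$CMS_SET(x,y)$..." into segments and dispatch on the clue.
let parseMixed (input: string) : Ast =
    input.Split([| '$' |], System.StringSplitOptions.RemoveEmptyEntries)
    |> Array.map (fun seg ->
        if seg.StartsWith "CMS_SET" then parseCms seg else parseHtml seg)
    |> Array.toList
    |> Mixed
```

This dodges every hard part (nesting, escaping, error recovery, and actually parsing the segments), which is rather the point: the dispatch is easy, the parsing isn't.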
But all of the above isn't where the problem is.
Parsing is inconvenient, but only a small part of what you need to analyze your programs in any interesting way; you need symbol tables, control and data flow analysis, maybe points-to analyses, and the engineering on these will swamp the work listed above. See my essay on "Life After Parsing" (google or via my bio) for a long discussion of what else you need.
In short, I think you are biting off an enormous task in just "parsing", and you haven't even told us what you intend to do with the result. You are welcome to start down this path, but very few people have succeeded; my team spent over 50 man-years of PhD-level engineering to get where we are, and we are hardly done.
Java won't make the solution any easier or harder; the language in which you solve all of the above is irrelevant.

Can F# be refactored into a pointfree style?

In researching a topic related to programming, I came across a point-free refactoring tool for Haskell in lambdabot and was wondering if F# can be refactored into a point-free style.
I am not advocating the use of point-free style, but see it as a means to better comprehend a function.
Note: pad answered an earlier version of this question, but I reworded this question as the answer is of value to others learning and using F# and I did not want this to be deleted because of some close votes.
Note: Just because I changed the question, don't take the answer to mean that one cannot code in a point-free style using F#. It can be done in many cases, but there are restrictions you have to follow.
Short answer
No.
Long answer
There are a few things in F# that make such a tool impractical. (1) Due to .NET interop, F# code often has side effects, and automatic code transformation becomes really difficult when side effects come into play. That's not the case with Haskell: equational reasoning is much easier there, and you can rewrite left-hand sides by right-hand sides without altering their evaluation. (2) Point-free programming in F# is limited by the value restriction. I'm not sure you can do code transformation aggressively without hitting this issue.
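A minimal illustration of point (2), using nothing beyond core F#: an eta-reduced generic definition is a value rather than a syntactic function, so the compiler refuses to generalize it and rejects it with a value restriction error:

```fsharp
// Point-free: a partial application is a value, not a syntactic function,
// so this generic definition is rejected.
// let heads = List.map List.head       // error FS0030: Value restriction

// Pointful (eta-expanded) version: a syntactic function, compiles fine.
let heads xss = List.map List.head xss
```

So a mechanical F#-to-point-free rewriter would keep producing definitions the compiler rejects.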
I think it's more practical to assume that F# code is pure and that the value restriction doesn't apply in particular cases, so that we can give users some hints. Users can then apply suggestions selectively, after checking that they are actually correct. That's closer to the HLint approach than to the tool you referred to. FSharpLint has added some linting rules in this direction.

Software to identify patterns in text files

I work on some software that parses large text files and inserts data into a database. Every time we get a new client, we have to write new parsing code for their text files.
I'm looking for some software to help simplify analyzing the text files. It would be nice to have some software that could identify patterns in the file.
I'm also open to any general purpose parsing libraries (.NET) that may simplify the job. Or any other relevant software.
Thanks.
More Specific
I open a text file with some magic software that shows me repeating patterns that it has identified. Really I'm just looking for any tools that developers have used to help them parse files. If something has helped you do this, please tell me about it.
Well, likely not exactly what you are looking for, but clone detection might be the right kind of idea.
There are a variety of such detectors. Some work only on raw lines of text, and that might apply directly to your case (a toy sketch of the idea appears at the end of this answer).
Some work only on the words ("tokens") that make up the text, for some definition of "token". You'd have to define what you mean by tokens for such tools.
But you seem to want something that discovers the structure of the text and then looks for repeating blocks with some parametric variation. I think this is really hard to do unless you know roughly what that structure is in advance.
Our CloneDR does this for programming language source code, where the "known structure" is that of the programming language itself, as described specifically by the BNF grammar rules.
You probably don't want Java-biased duplicate detection on semi-structured text. But if you do know something about the structure of the documents, you could write that down as a grammar, and our CloneDR tool would then pick it up.
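As a toy illustration of the raw-line flavor mentioned at the start of this answer (this is the naive idea only, not how CloneDR works), here is an F# sketch that reports every window of consecutive lines occurring more than once in a file:

```fsharp
open System.IO

// Find every block of `window` consecutive lines that appears at least
// twice, returning the block text and the line indices where it starts.
let findRepeatedBlocks (window: int) (path: string) =
    let lines = File.ReadAllLines path
    [ for i in 0 .. lines.Length - window ->
        String.concat "\n" lines.[i .. i + window - 1], i ]
    |> List.groupBy fst
    |> List.filter (fun (_, hits) -> hits.Length > 1)
    |> List.map (fun (block, hits) -> block, hits |> List.map snd)

// findRepeatedBlocks 3 "client-data.txt"
// |> List.iter (fun (block, starts) ->
//     printfn "repeats at lines %A:\n%s\n" starts block)
```

Exact-match windows like this miss parametric variation entirely, which is the gap the structure-aware tools are trying to close.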

Tool/Application to calculate first and follow sets

I am currently working on a parser, and it seems that I have made a few mistakes during the follow set calculation. So I was wondering if someone knows a good tool to calculate follow and first sets, so I could skip/re-check this error-prone part of the parser construction.
Take a look at http://hackingoff.com/compilers/predict-first-follow-set
It's an awesome tool to compute first and follow sets for a grammar. Also, you can check your answer with this visualization tool:
http://smlweb.cpsc.ucalgary.ca/start.html
I found my mistake by comparing my first/follow sets with the ones generated by this web app.
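If you want to sanity-check such tools (or your own sets), the core of FIRST is just a small fixed-point iteration. Here is a sketch in F# for epsilon-free grammars, with invented types; FOLLOW is a similar iteration over the same productions:

```fsharp
type Sym =
    | T of string                          // terminal
    | N of string                          // nonterminal

type Grammar = (string * Sym list) list    // productions: (lhs, rhs)

// FIRST for an epsilon-free grammar: grow the sets until nothing changes.
// Assumes every nonterminal appears at least once as a left-hand side.
let firstSets (g: Grammar) : Map<string, Set<string>> =
    let init =
        g |> List.map fst |> List.distinct
          |> List.map (fun nt -> nt, (Set.empty: Set<string>))
          |> Map.ofList
    let step sets =
        g |> List.fold (fun (acc: Map<string, Set<string>>) (lhs, rhs) ->
                match rhs with
                | T t :: _ -> acc.Add(lhs, acc.[lhs].Add t)
                | N n :: _ -> acc.Add(lhs, Set.union acc.[lhs] acc.[n])
                | []       -> acc) sets
    let rec fix sets =
        let next = step sets
        if next = sets then sets else fix next
    fix init

// Example:  E -> T + E | T     T -> x
// firstSets [ "E", [N "T"; T "+"; N "E"]; "E", [N "T"]; "T", [T "x"] ]
//   =>  map [("E", set ["x"]); ("T", set ["x"])]
```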
Most parser generators that I've encountered don't have an obvious means to dump this information, let alone dump it in a readable way. (I built one that does, for the reason you are suggesting, but it isn't available by itself, and I doubt you want the rest of the baggage.)
If your parser definition doesn't work, you mostly don't need to know these things to debug it. Staring at the rules amazingly enough helps; it also helps to build the two smallest grammar instances you can think of, one being something you expect to be accepted, and the other being a slight variant that should be rejected.
In spite of having a parser generator that will dump this information, I rarely resort to using it to debug grammars, and I've built 20-30 pretty big grammars with it.

F# parsing Abstract Syntax Trees

What is the best way to use F# to parse an AST to build an interpreter? There are plenty of F# examples for trivial syntax (basic arithmetic operations), but I can't seem to find anything for languages with much larger ranges of features.
Discriminated unions look to be incredibly useful, but how would you go about constructing one with a large number of options? Is it better to define the types (say addition, subtraction, conditionals, control flow) elsewhere and just bring them together as predefined types in the union?
Or have I missed some far more effective means of writing interpreters? Is having an eval function for each type more effective, or perhaps using monads?
Thanks in advance
"Discriminated unions look to be incredibly useful, but how would you go about constructing one with a large number of options? Is it better to define the types (say addition, subtraction, conditionals, control flow) elsewhere and just bring them together as predefined types in the union?"
I am not sure what you are asking here; even with a large number of options, DUs are still simple to define. See e.g. this blog entry for a tiny language's DU structure (as well as a more general discussion about writing tree transforms). It's fine to have a DU with many more cases, and it's common in compilers/interpreters to use such a representation.
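To make that concrete, here is a small sketch of my own (a toy language, not taken from the blog entry): a single DU covering arithmetic, conditionals, and sequencing, with one recursive eval over it. Real interpreters just keep adding cases:

```fsharp
type Expr =
    | Num of int
    | Add of Expr * Expr
    | Sub of Expr * Expr
    | If  of Expr * Expr * Expr    // condition, then-branch, else-branch
    | Seq of Expr list             // evaluate in order, keep the last value

let rec eval expr =
    match expr with
    | Num n        -> n
    | Add (a, b)   -> eval a + eval b
    | Sub (a, b)   -> eval a - eval b
    | If (c, t, e) -> if eval c <> 0 then eval t else eval e
    | Seq es       -> es |> List.map eval |> List.last   // Seq [] is an error

// eval (If (Num 1, Add (Num 2, Num 3), Num 0))   // => 5
```

Whether you grow one flat union with many cases or split it into sub-unions (one for arithmetic, one for control flow, nested inside a top-level Expr) is mostly a taste and tooling question; the single flat union is the common starting point.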
As for parsing, I prefer monadic parser combinators; check out FParsec or see this old blog entry. After using such parser combinators, I can never go back to anything like lex/yacc/ANTLR - external DSLs seem so primitive in comparison.
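For a taste of the combinator style, here is a minimal FParsec sketch. FParsec is a real library, but this two-token grammar is invented for the example and reuses the Expr type sketched above:

```fsharp
open FParsec

// "2 + 3"  =>  Add (Num 2, Num 3)
let number  = (pint32 .>> spaces) |>> Num
let addExpr =
    pipe3 number (pchar '+' .>> spaces) number
          (fun a _ b -> Add (a, b))

// run addExpr "2 + 3"   // Success: Add (Num 2, Num 3)
```

The grammar is ordinary F# values, so it composes, refactors, and type-checks like the rest of your program, which is exactly the appeal over an external DSL.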
(EDIT: The 'tiny arithmetic examples' you have found are probably pretty much representative of what larger solutions look like as well. The 'toy' examples usually show off the right architecture.)
You should get a copy of Robert Pickering's "Beginning F#".
Chapter 13, "Parsing Text", contains an example with FsLex and FsYacc, as suggested by Noldorin.
Other than that, in chapter 12 of the same book, the author explains how to build a complete, if simple, compiler for an arithmetic language he proposes. Very enlightening. The most important part is what you are looking for: the AST parser.
Good luck.
I second Brian's suggestion to take a look at FParsec. If you're interested in doing things the old-school way with FsLex and FsYacc, one place to look for how to parse a non-trivial language is the F# source itself. See the source\fsharp\FSharp.Compiler directory in the distribution.
You may be interested in checking out the Lexing and Parsing section of the F# WikiBook. The F# PowerPack library contains the FsLex and FsYacc tools, which assist greatly with this. The WikiBook guide is a good way to get started.
Beyond that, you will need to think about how you actually want to execute the code from the AST form, which is common to the design of both compilers and interpreters. This is generally considered the easier part, however, and there are lots of general resources on compilers/interpreters out there that should provide information on it.
I haven't done an interpreter myself. Hope the following helps. :)
Here's a compiler course taught at Yale using ML, which you might find useful. The lecture notes are very concise (short) and informative. You can follow the first few lecture notes and the assignments. Since you know F#, you won't have any problem reading ML programs.
Btw, the professor was a student of A. Appel, one of the creators of the SML/NJ implementation. So from these notes you also get the most natural way to write a compiler/interpreter in an ML-family language.
This is an excellent example of a complete Small Basic implementation in F# with FParsec. It even includes an IL compiler. The whole code is very accessible and is accompanied by a series of blog posts from the author at http://trelford.com/blog/
