In my application I am using Stanford CoreNLP for parsing English text into a graph data structure (Universal Dependencies).
After some modifications of the graph I need to generate natural language output, for which I am using SimpleNLG: https://github.com/simplenlg/simplenlg
However, SimpleNLG works with a phrase structure grammar.
Therefore, in order to use SimpleNLG for natural language generation, I need to convert from Universal Dependencies into a phrase structure grammar.
What is the easiest way of achieving this?
So far I have only come across this article on this topic:
http://delivery.acm.org/10.1145/1080000/1072147/p14-xia.pdf?ip=86.52.161.138&id=1072147&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&CFID=642131329&CFTOKEN=21335001&acm=1468166339_844b802736ce07dab89064efb7f8ede9
I am hoping that someone might have some more practical code examples to share on this issue.
Phrase-structure trees contain more information than dependency trees and therefore you cannot deterministically convert dependency trees to phrase-structure trees.
But if you are using CoreNLP to parse the sentences, take a look at the parse annotator. Unlike the dependency parser, this parser also outputs phrase-structure trees, so you can use this annotator to directly parse your sentences to phrase-structure trees.
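For example, a minimal pipeline along the following lines (a sketch; it assumes the standard English models jar is on the classpath, and the sentence text is just a placeholder) yields one constituency tree per sentence:

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreeCoreAnnotations;
import edu.stanford.nlp.util.CoreMap;
import java.util.Properties;

public class ConstituencyParseExample {
    public static void main(String[] args) {
        // The "parse" annotator produces phrase-structure (constituency) trees,
        // unlike "depparse", which produces Universal Dependencies graphs.
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,parse");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        Annotation document = new Annotation("The quick brown fox jumps over the lazy dog.");
        pipeline.annotate(document);

        for (CoreMap sentence : document.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree tree = sentence.get(TreeCoreAnnotations.TreeAnnotation.class);
            tree.pennPrint(); // prints the constituency tree in Penn Treebank format
        }
    }
}

Mapping the resulting constituents onto SimpleNLG phrase elements is still a manual step, but at least you start from a phrase-structure tree rather than from a dependency graph.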
Related
In the process of learning about compilers I wrote a simple tokenizer and parser (recursive descent). The parser constructs an abstract syntax tree. Now I am moving on to semantic analysis, but I have a few questions about how to construct the semantic analyser. Should I analyse the code semantically by making recursive calls through the generated abstract syntax tree, or should I construct another tree (using a visitor pattern, for example) for the purpose of semantic analysis? I found a document online which says that I should analyse the code semantically during parsing, but that does not comply with the single responsibility principle and makes the whole parser more error-prone. Or should I make semantic analysis part of the intermediate representation generator? Maybe I am missing something; I would be grateful if someone could clarify this for me.
You are learning. Keep it simple; build a tree and run the semantic analyzer over the tree when parsing is completed.
If you decide (someday) to build a fast compiler, you might consider implementing some of that semantic analysis as you parse. This makes building both the parser and the semantic analyzer harder because they are now interacting (tangled is a better word; read about why most C++ parsers are implemented with the so-called "lexer hack" if you want to get ill). You'll also find that sometimes the information you need isn't available yet ("is the target of that goto defined so far?"), so you usually can't do a complete job as the parse runs, or you may have to delay some semantic processing until later in the parse, and that's tricky to arrange. I don't recommend adding this kind of complexity early in your compiler education.
Start simple and focus on learning what semantic analysis is about.
You can optimize later when it is clear what you have to optimize and why you should do it.
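To make "build a tree, then analyse it" concrete, here is a minimal sketch (the node types, the DeclarationChecker visitor, and the example program are all hypothetical) of a visitor-based semantic pass that runs only after parsing has produced a complete AST:

import java.util.*;

// The parser's only job is to build Node objects; the semantic
// analyser is a separate visitor over the completed tree.
interface Node { void accept(Visitor v); }

class VarDecl implements Node {
    final String name;
    VarDecl(String name) { this.name = name; }
    public void accept(Visitor v) { v.visit(this); }
}

class VarUse implements Node {
    final String name;
    VarUse(String name) { this.name = name; }
    public void accept(Visitor v) { v.visit(this); }
}

class Block implements Node {
    final List<Node> statements;
    Block(List<Node> statements) { this.statements = statements; }
    public void accept(Visitor v) { v.visit(this); }
}

interface Visitor {
    void visit(VarDecl decl);
    void visit(VarUse use);
    void visit(Block block);
}

// Example semantic check: every variable must be declared before use.
class DeclarationChecker implements Visitor {
    private final Set<String> declared = new HashSet<>();
    final List<String> errors = new ArrayList<>();

    public void visit(VarDecl decl) { declared.add(decl.name); }
    public void visit(VarUse use) {
        if (!declared.contains(use.name))
            errors.add("Use of undeclared variable: " + use.name);
    }
    public void visit(Block block) {
        for (Node statement : block.statements) statement.accept(this);
    }
}

public class SemanticAnalysisSketch {
    public static void main(String[] args) {
        // AST for:  var x;  y = ...;   (y is never declared)
        Node program = new Block(List.of(new VarDecl("x"), new VarUse("y")));
        DeclarationChecker checker = new DeclarationChecker();
        program.accept(checker);
        checker.errors.forEach(System.out::println);
    }
}

Each new semantic check becomes another visitor, so the parser itself never needs to change.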
I am trying to parse some biodiversity literature, and I have my own NER tool that I developed to identify species names. I need to plug this into the parser pipeline somehow to enhance the dependency parsing, but I am not sure how to go about it and haven't been able to find anything that indicates how to approach it.
My tagger is just a simple dictionary lookup that runs in Python 3.6.
Any ideas?
I want to analyze sentences using a context-free grammar for NLP tasks.
I want to know which grammar parsers, such as the Stanford Parser, MaltParser, etc., would work best.
What are the advantages and disadvantages of constituency parsing and dependency parsing in those parsers?
Do they provide libraries for programming languages such as Java, PHP, etc.?
I recommend the Stanford Parser. It comes with a powerful Java library. You should check this site for more information about the Stanford Parser: http://stanfordnlp.github.io/CoreNLP/. You can also run some demos at http://nlp.stanford.edu:8080/corenlp/
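As a starting point, here is a sketch in the style of the classic ParserDemo (it assumes the englishPCFG model that ships with the Stanford models jar, and the sentence is just an example); it produces a constituency tree and then derives typed dependencies from the same parse, which is handy when comparing the two styles of analysis:

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.CoreLabelTokenFactory;
import edu.stanford.nlp.process.PTBTokenizer;
import edu.stanford.nlp.process.Tokenizer;
import edu.stanford.nlp.process.TokenizerFactory;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import java.io.StringReader;
import java.util.List;

public class StanfordParserDemo {
    public static void main(String[] args) {
        // Load the English PCFG grammar (bundled with the models jar)
        LexicalizedParser lp = LexicalizedParser.loadModel(
                "edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");

        // Tokenize an example sentence
        TokenizerFactory<CoreLabel> tokenizerFactory =
                PTBTokenizer.factory(new CoreLabelTokenFactory(), "");
        Tokenizer<CoreLabel> tokenizer =
                tokenizerFactory.getTokenizer(new StringReader("The cat sat on the mat."));
        List<CoreLabel> tokens = tokenizer.tokenize();

        // Constituency (phrase-structure) parse
        Tree parse = lp.apply(tokens);
        parse.pennPrint();

        // Typed dependencies derived from the same tree
        TreebankLanguagePack tlp = lp.treebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        System.out.println(gs.typedDependenciesCCprocessed());
    }
}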
I'm writing a paper where I analyse different available tools for natural language parsing. I found out that the two main strategies for parsing are top-down and bottom-up.
I wonder which strategy is used in Stanford Parser?
I know that they use a probabilistic approach there, but isn't it still based on either a bottom-up or a top-down strategy?
As far as I know, it is a CYK parser (see here, Section 1), i.e. a bottom-up parser.
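Just to illustrate what "bottom-up" means here (this toy recognizer has nothing to do with the probabilistic grammar the Stanford parser actually uses), a minimal CYK sketch over a hypothetical three-rule grammar in Chomsky normal form fills the chart for all short spans first and then combines them into longer spans:

import java.util.*;

public class CykSketch {
    public static void main(String[] args) {
        System.out.println(recognize(new String[]{"she", "eats", "fish"})); // true
    }

    // Toy grammar: S -> NP VP, VP -> V NP, NP -> "she" | "fish", V -> "eats"
    static boolean recognize(String[] words) {
        int n = words.length;
        // chart[i][len] holds the nonterminals that derive words[i .. i+len-1]
        @SuppressWarnings("unchecked")
        Set<String>[][] chart = new Set[n][n + 1];
        for (int i = 0; i < n; i++)
            for (int len = 0; len <= n; len++)
                chart[i][len] = new HashSet<>();

        // Bottom-up base case: length-1 spans via the lexical rules
        for (int i = 0; i < n; i++) {
            if (words[i].equals("she") || words[i].equals("fish")) chart[i][1].add("NP");
            if (words[i].equals("eats")) chart[i][1].add("V");
        }

        // Combine adjacent shorter spans into longer ones with the binary rules
        for (int len = 2; len <= n; len++) {
            for (int i = 0; i + len <= n; i++) {
                for (int split = 1; split < len; split++) {
                    Set<String> left = chart[i][split];
                    Set<String> right = chart[i + split][len - split];
                    if (left.contains("NP") && right.contains("VP")) chart[i][len].add("S");
                    if (left.contains("V") && right.contains("NP")) chart[i][len].add("VP");
                }
            }
        }
        return chart[0][n].contains("S"); // does S derive the whole sentence?
    }
}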
I'm looking to write a recursive descent parser by hand and I'm looking for good resources on how to structure it, algorithms, etc.
There is a good tutorial on CodeProject under "Compiler Patterns". These days you can even just Google "compiler patterns".
http://www.codeproject.com/Articles/286121/Compiler-Patterns
The article covers most aspects of building a simple compiler (the back-end, the BNF, and the patterns used to implement the various BNF rules), but is not very heavy on theory, or even on why a recursive descent compiler works to convert language input into code.
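As a flavour of that structure, here is a minimal hand-written recursive-descent sketch for a small, hypothetical expression grammar; each grammar rule becomes one method, and each method consumes tokens and calls the methods for the rules it references:

// Grammar (hypothetical, for illustration):
//   expr   -> term (('+' | '-') term)*
//   term   -> factor (('*' | '/') factor)*
//   factor -> NUMBER | '(' expr ')'
public class ExprParser {
    private final String input;
    private int pos;

    public ExprParser(String input) { this.input = input.replaceAll("\\s+", ""); }

    public double parse() {
        double value = expr();
        if (pos != input.length())
            throw new IllegalArgumentException("Unexpected character at " + pos);
        return value;
    }

    private double expr() {
        double value = term();
        while (peek() == '+' || peek() == '-') {
            char op = next();
            double rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    private double term() {
        double value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = next();
            double rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    private double factor() {
        if (peek() == '(') {
            next();                 // consume '('
            double value = expr();
            expect(')');
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek()) || peek() == '.') pos++;
        if (start == pos) throw new IllegalArgumentException("Number expected at " + pos);
        return Double.parseDouble(input.substring(start, pos));
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }
    private char next() { return input.charAt(pos++); }
    private void expect(char c) {
        if (next() != c) throw new IllegalArgumentException("Expected '" + c + "'");
    }

    public static void main(String[] args) {
        System.out.println(new ExprParser("2 * (3 + 4)").parse()); // 14.0
    }
}

A real parser would build AST nodes instead of evaluating on the fly, but the shape of the methods stays the same.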
I can suggest "Crafting a Compiler" by Charles N. Fischer and Richard J. LeBlanc.
Edit: this is the updated edition: http://www.amazon.com/Crafting-Compiler-Charles-N-Fischer/dp/0136067050/ref=sr_1_2?ie=UTF8&s=books&qid=1258514561&sr=8-2