Is there any support for transforming a Rascal AST into the Famix meta-model (from Moose technology)?
Rascal uses M3 meta-models that can, in principle, be easily converted to Famix (but you would have to write that mapping yourself).
There is M3 support for several languages (and the list is growing), so whether facts can be extracted from your source also depends on the language you are interested in.
I am planning to implement a meta language on top of Xtext. In other words, I am using an Xtext grammar to define my own meta language. This meta language can then be used to define a language (using the syntax that I defined), and a user can then create models in the defined language.
Hence, I would like to use Xtext/Xtend as a generator for parser generators, which would let me add as many meta levels as I like. My understanding is that Xtext itself is defined using Xtext, so this should be possible?
The problem is that I don't know how to approach this, as I am not an expert in Xtext or in parser generator frameworks in general. Any solutions, approaches, or hints are welcome.
Update (more details and motivation)
Xtext can be used to generate anything, so I could write a generator based on Xtext that generates a parser. I would specify my meta language's grammar and use Xtext to generate a parser for it, giving me access to an AST that represents a model written in my meta language. From there on, however, I would be on my own to do whatever I want with the AST, e.g. generate a parser (because the AST represents the grammar of a user-defined language). But since Xtext already has the ability to generate parsers, I was thinking of reusing that feature instead of implementing my own parser generator on top of the AST of a grammar.
My motivation is the wish to define my own DSL grammar language (as a replacement for Xtext), while still being able to use the infrastructure provided by the Xtext project.
I came to the following solution:
A grammar written in my grammar language is parsed by Xtext. The resulting AST is then transformed into an AST of the Xtext grammar language, which can be fed to the existing parser generator.
In general, given some grammar language l1, a model written in this language is parsed and the resulting AST is transformed into an AST of the grammar language l2 that was used to specify l1. This step is repeated until we arrive at an AST representing a model of the Xtext grammar language itself, which is then used to generate the new parser.
Naturally, any information added by the definition of a new grammar language is lost in each transformation step. The infrastructure developed around a grammar language is therefore responsible for providing some mechanism that makes this information available to the languages defined on top of it.
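To make one lowering step concrete, here is a hypothetical sketch in Python. The node classes are invented stand-ins, not the real Xtext EMF metamodel; a real implementation would target Xtext's own grammar model instead:

from dataclasses import dataclass

@dataclass
class MyRule:       # a rule of the user-defined grammar language l1
    name: str
    body: str
    doc: str        # extra information that l1 carries beyond Xtext

@dataclass
class XtextRule:    # stand-in for a rule of the Xtext grammar language
    name: str
    body: str

def lower(rule: MyRule) -> XtextRule:
    # The doc field has no counterpart in the target language, so it is
    # dropped here; this is exactly the information loss described above,
    # which the surrounding infrastructure must preserve separately.
    return XtextRule(rule.name, rule.body)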
For a different approach, see:
WWW.XTRAN-LLC.com/xtran.html#parse-gen
In a nutshell, I got tired of creating parsers for XTRAN, our expert system whose rules language manipulates computer languages, data, and text, so I created a parsing engine that directly executes EBNF at parse time (as opposed to generating parsing code, as lex/yacc and ANTLR do). Since XTRAN must also render code represented in its Internal Representation / AST as source text (after it has been manipulated), I created a corresponding rendering engine that executes a (much simpler) form of EBNF at render time.
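For readers who want the flavor of this approach, here is a minimal Python sketch of "executing EBNF at parse time" (an illustration of the idea, not XTRAN's actual engine): the grammar is a data structure walked by a recursive interpreter, instead of being compiled into parser code.

def parse(rule, grammar, text, pos=0):
    """Try to match rule at pos; return the new position, or None on failure."""
    kind = rule[0]
    if kind == "lit":      # ("lit", "abc"): literal text
        return pos + len(rule[1]) if text.startswith(rule[1], pos) else None
    if kind == "ref":      # ("ref", "name"): reference to a named rule
        return parse(grammar[rule[1]], grammar, text, pos)
    if kind == "seq":      # ("seq", r1, r2, ...): match each part in order
        for r in rule[1:]:
            pos = parse(r, grammar, text, pos)
            if pos is None:
                return None
        return pos
    if kind == "alt":      # ("alt", r1, r2, ...): first alternative that matches
        for r in rule[1:]:
            p = parse(r, grammar, text, pos)
            if p is not None:
                return p
        return None
    if kind == "rep":      # ("rep", r): zero or more repetitions
        while True:
            p = parse(rule[1], grammar, text, pos)
            if p is None:
                return pos
            pos = p

grammar = {"start": ("seq", ("lit", "a"), ("rep", ("lit", "b")))}   # a b*
print(parse(grammar["start"], grammar, "abbb"))    # 4: the whole input matched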
If I want to train the Stanford Neural Network Dependency Parser for another language, I need a "TreebankLanguagePack" (TLP), but the information about this TLP is very limited:
particularities of your treebank and the language it contains
I have my "treebank" in another language, following the same format as the PTB, and my data is in CoNLL format. The dependencies follow the Universal Dependencies (UD) scheme. Do I need this TLP?
As of the current CoreNLP release, the TreebankLanguagePack is used within the dependency parser only to 1) determine the input text encoding and 2) determine which tokens count as punctuation [1].
Your best bet for a quick solution, then, is probably to stick with the UD English TreebankLanguagePack. You should do this by specifying the property language as "UniversalEnglish" (whether you're accessing the dependency parser via code or command line). If you're using the dependency parser via the CoreNLP main entry point, this property key should be depparse.language.
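For example, from the CoreNLP command line (the jar and file names here are placeholders; adjust them to your install):

java -cp stanford-corenlp.jar edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,depparse -depparse.language UniversalEnglish -file input.txt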
Technical details
Two very subtle details follow. You probably don't need to worry about these if you're just trying to hack something together at first, but it's probably good to mention them so that you can avoid apocalyptic / head-smashing bugs in the future.
Evaluation and punctuation: If you do choose to stick with UniversalEnglish, be aware that there is a hack in the evaluation code that overrides the punctuation set for English parsing in particular. Any changes you make to punctuation in PennTreebankLanguagePack (the TLP used for the UniversalEnglish language) will be ignored! If you need to get around this, it should be enough to copy and paste the PennTreebankLanguagePack into your own codebase and name it something different.
Potential memory leak: When building parse results to be returned to the user, the dependency parser draws from a pool of cached GrammaticalRelation objects. This cache does not live-update. This means that if you have relations which aren't formally defined in the language you specified via the language property, they will lead to the instantiation of a new object whenever those relations show up in parser predictions. (This can be a big deal memory-wise if you happen to store the parse objects somewhere.)
[1]: Punctuation is excluded during evaluation. This is a standard "cheat" used throughout the dependency parsing literature.
I have precise and validated descriptions of the behaviors of many X86 instructions in terms amenable to encoding in QF_ABV and solving directly with the standard solver (using no special solving strategies). I wrote an SMT-LIB script whose interface matches my ultimate goal perfectly:
X86State, a record sort describing x86 machine state (registers and flags as bitvectors, and memory as an array).
X86Instr, a record sort describing x86 instructions (enumerated mnemonics, operands as an ML-like discriminated union describing registers, memory expressions, etc.)
A function x86-translate taking an X86State and an X86Instr, and returning a new X86State. It decodes the X86Instr and produces a new X86State in terms of the symbolic effects of the given X86Instr on the input X86State.
It's great for prototyping: the user can write x86 easily and directly. After simplifying a formula built using the library, all functions and extraneous data types are eliminated, leaving a QF_ABV expression. I hoped that users could simply (set-logic QF_ABV) and #include my script (alas, neither the SMT-LIB standard nor Z3 support #include).
Unfortunately, by defining functions and types, the script pulls in theories such as uninterpreted functions, and thus requires a logic richer than QF_ABV (the custom types rule out even QF_AUFBV). My experience with SMT solvers is that the weakest acceptable logic should be specified for the best solving time. It is also unclear whether I can reuse my SMT-LIB script in a programmatic context (e.g. OCaml, Python, C) as I would like. Finally, the script is a bit verbose given the lack of higher-order functions, and my lack of access to par leads to code duplication.
Thus, despite having accomplished my technical goals, I think that SMT-LIB might be the wrong approach. Is there a more natural avenue for interacting with Z3 to implement my x86 instruction description / QF_ABV translation scheme? Is the SMT-LIB script re-usable at all in these avenues? For example, you can build "custom OCaml top-levels", i.e. interpreters with scripts "burned into them". Something like that could be nice. Or do I have to re-implement the functionality in another language, in a program that interacts with Z3 via a theory extension (C DLL)? What's the best option here?
Well, I don't think that people write .smt2 files by hand. These are usually generated automatically by some program.
I find the Z3 Python interface quite nice, so I guess you could give it a try. But you can always write a simple .smt2 dumper from any language.
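For illustration, here is a minimal z3py sketch of this style of encoding. It uses an invented two-instruction toy ISA, not your actual X86State/X86Instr sorts; a state is just a Python dict of bitvector expressions, so the "translation" happens in Python and only a plain QF_ABV formula ever reaches the solver:

from z3 import BitVec, BitVecVal, Solver

def initial_state():
    # Registers as symbolic 32-bit bitvectors; memory omitted for brevity.
    return {r: BitVec(r, 32) for r in ("eax", "ebx")}

def operand(state, src):
    # src is either a register name or an immediate integer.
    return state[src] if src in state else BitVecVal(src, 32)

def translate(state, instr):
    # instr is a (mnemonic, dst, src) tuple; returns a new state dict.
    op, dst, src = instr
    new = dict(state)
    if op == "mov":
        new[dst] = operand(state, src)
    elif op == "add":
        new[dst] = state[dst] + operand(state, src)
    return new

s0 = initial_state()
s1 = translate(s0, ("mov", "eax", 5))
s2 = translate(s1, ("add", "eax", "ebx"))

solver = Solver()
solver.add(s2["eax"] == 12)   # constrain the symbolic result
print(solver.check())         # sat
print(solver.model())         # e.g. [ebx = 7]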
BTW, do you plan to release the specification you wrote for X86? I would be really interested!
How do I create a dependency graph (parse tree) for arbitrary sentences? Is there any predefined grammar for parsing English sentences using NLTK?
Example:
I want to make a parse tree for the sentence
“A large company needs a sustainable business model.”
which should look like this.
Please suggest how this can be done.
This question is a near-duplicate of 3125926. But I'll elaborate just a little on the answer given there.
I don't have personal experience with dependency parsing under NLTK, but according to the accepted answer, the integration with MaltParser is documented at http://nltk.googlecode.com/svn/trunk/doc/api/nltk.parse.malt.MaltParser-class.html
If for some reason MaltParser doesn't suit your needs, you might also take a look at MSTParser and the Stanford Parser. I think those three options are the best-known, and I expect one (or all) of them will work for you.
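For example, a quick Python sketch of the MaltParser route (the constructor arguments have changed across NLTK versions, and the paths and model name below are placeholders; see the documentation linked above for your release):

from nltk.parse.malt import MaltParser

# Placeholders: a MaltParser installation directory and a pre-trained model.
mp = MaltParser("/opt/maltparser-1.9.2", "engmalt.linear-1.7.mco")
tokens = "A large company needs a sustainable business model .".split()
graph = mp.parse_one(tokens)   # an NLTK DependencyGraph
print(graph.tree())            # render the dependencies as a tree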
Note that the Stanford Parser includes routines to convert from constituency trees and between several of the standard dependency representations, so if you need a specific format, you might look at the format-conversion arguments to the edu.stanford.nlp.trees.EnglishGrammaticalStructure class.
e.g., to convert from constituency trees to basic dependencies:
java -cp stanford-parser.jar edu.stanford.nlp.trees.EnglishGrammaticalStructure -treeFile <input trees> -basic
In researching the type inference differences between F# and OCaml, I found that discussions tended to focus on nominative vs. structural type systems. Then I found Distinctive traits of functional programming languages, which lists typing and type inference as separate traits.
Since the traits article says OCaml and F# both use Damas-Milner type inference, which I thought was a standard algorithm, i.e. an algorithm that does not allow for variations, how do the two traits relate? Is it that Damas-Milner is the basis upon which both type inference systems are built, but that each modifies it based on the typing?
Also, I searched the F# source code for the words Damas, Milner, and Hindley and found none; a search for the word inference did turn up the type inference code.
If so, are there any papers that discuss the details of each language's type inference algorithm, or do I have to look at the source code for OCaml and F#?
EDIT
Here is a page that highlights some differences related to type inference between OCaml and F#.
Concerning your DM question, you are right. For both F# and OCaml, the DM algorithm is just a pattern; the type checkers are extended to support custom features. In OCaml these features include objects with row types, polymorphic variants, and first-class modules. In F# they include .NET type system interop (classes, interfaces, structs, subtyping, method overloads) and units of measure. I think F# type inference is also skewed in a left-to-right fashion to allow more efficient interactive checking, which is why some code surprisingly needs annotations.
As far as type checking and inference go, OCaml is more expressive and intuitive than F#. SML would be closer than either of them to vanilla HM, but SML too has a few extensions for operator polymorphism and record support.
I believe that when they talk about structural typing in OCaml, they are probably referring to the object system (the "O" part of "OCaml"). The non-object parts of OCaml form a pretty standard ML type system; it's the object system that is unusual.
The object system in OCaml is very different from the .NET class-based object system in F#. In OCaml, you can create objects directly without using a class; classes are basically a convenience for creating objects. And once created (whether directly from a literal or via a class), an object has no notion of its class.
Look at what happens when you write a function that takes an object and calls a particular method on it:
# let foo x = x#bar;;
val foo : < bar : 'a; .. > -> 'a = <fun>
The argument type is inferred to be an open object type that has a method named bar, so foo can accept any object with a method of that name and type.
That's what it means when they say the object system is structurally typed. The only thing that matters about an object is its set of methods, which determines where it can be used. Compatibility is based purely on the "structure" of its methods, not on any notion of "class".
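For example, applying foo to an object created directly with a literal (continuing the same toplevel session):

# foo (object method bar = 42 end);;
- : int = 42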
Since the trait article says OCaml and F# both use Damas-Milner type inference which I thought was a standard algorithm, i.e. an algorithm that does not allow for variations, how do the two traits relate?
The Damas-Milner algorithm (also known as Algorithm W) can be extended, and indeed all practically-relevant implementations of it, including both OCaml's and F#'s, have added many extensions.
Is it that Damas-Milner is the basis upon which both type inference systems are built but that they each modify Damas-Milner based on the typing?
Exactly, yes. In particular, OCaml has a great many experimental extensions to its Damas-Milner core, including polymorphic variants, objects, and first-class modules. F# is simpler but also has some extensions that OCaml does not have, most notably overloading (primarily of operators).
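For instance, OCaml's polymorphic variants yield inferred types that plain Damas-Milner cannot express (toplevel session):

# let f = function `A -> 1 | `B -> 2;;
val f : [< `A | `B ] -> int = <fun>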
I don't believe there are summary papers describing the whole type system of either OCaml or F#. Indeed, I do not know of a paper that describes today's F# type system. For OCaml, there are many different papers, each covering different aspects. I would start with Jacques Garrigue's publications and then follow the references therein.