In Agda there's a module Data.Nat.Properties. It contains a lot of useful facts, which are hidden inside records such as isCommutativeSemiring. How can I extract, for example, associativity of * and use it?
Open the modules in question. For example:
open import Algebra
open import Data.Nat.Properties
open CommutativeSemiring commutativeSemiring
-- now you can use *-assoc, *-comm, etc.
If you want to browse the contents of a module, try the C-c C-o key combination, since the recursive opening and re-exporting of algebraic structures makes it hard to see what's available.
Related
I'm studying how dependent pattern matching works in Agda.
If I could see the elaborated core terms (https://github.com/agda/agda/blob/master/src/full/Agda/Syntax/Internal.hs#L202) for arbitrary source code in an .agda file,
it would be really helpful for me.
However, the Agda CLI does not seem to offer any option for this. Is there one?
There are three options you could try, depending on how much detail you want, though none of them is perfect:
If all you want is to see what implicit arguments Agda has inserted, you can enable the flags --show-implicit and --show-irrelevant, create a new hole with the term you want to inspect by adding _ = {! yourTerm !} at the bottom of the file, reload the file with C-c C-l, and then press C-u C-c C-m with the cursor inside the hole. [Writing this out made me realize there ought to be a simpler way to do this.]
If you want to inspect and possibly manipulate the full AST of an Agda term, you can do so using the reflection API (https://agda.readthedocs.io/en/v2.6.2.1/language/reflection.html). In particular, you can get the reflected syntax of an arbitrary Agda term by using the quoteTerm primitive.
Finally, if you need more information you can look in the source code of Agda itself and enable the debug flags for printing the information you want. Note that there is no guarantee that this debug information will be useful or even readable, as it is intended for use by the developers. With that being said, you could for example print the case tree generated from a definition by pattern matching by adding {-# OPTIONS -v tc.cc:12 #-} at the top of your file. In Emacs, this debug information will end up in a separate buffer titled *Agda debug* (which you'll have to open manually after loading the .agda file).
I am new to Haskell, and I have been trying to write a JSON parser using Parsec as an exercise. This has mostly been going well: I am able to parse lists and objects with relatively little code that is also readable (great!). However, for JSON I also need to parse primitives like:
Integers (possibly signed)
Floats (possibly using scientific notation such as "3.4e-8")
Strings with e.g. escaped quotes
I was hoping to find ready-to-use parsers for things like these as part of Parsec. The closest I've found is the Text.Parsec.Token module (which defines integer and friends), but those parsers require a "language definition" that seems far beyond what I should need to parse something as simple as JSON; it appears to be designed for programming languages.
So my questions are:
Are the functions in Parsec.Token the right way to go here? If so, how do I make a suitable language definition?
Are "primitive" parsers for integers etc. defined somewhere else, maybe in another package?
Am I supposed to write these kinds of low-level parsers myself? I can see myself reusing them frequently... (obscure scientific data formats etc.)
I have noticed that a question on this site says Megaparsec has these primitives included [1], but I suppose those cannot be used with Parsec.
Related questions:
How do I get Parsec to let me call `read` :: Int?
How to parse an Integer with parsec
Are the functions in Parsec.Token the right way to go here?
Yes, they are. If you don't care about the minutiae specified by a language definition (i.e. you don't plan to use the parsers which depend on them, such as identifier or reserved), just use emptyDef as a default:
import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)
lexer = P.makeTokenParser emptyDef
integer = P.integer lexer
As you noted, this feels unnecessarily clunky for your use case. It is worth mentioning that megaparsec (cf. Alec's suggestion) provides a corresponding integer parser without the ceremony. (The flip side is that megaparsec doesn't try to bake in support for e.g. reserved words, but that isn't difficult to implement in the cases where you actually need it.)
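To make that concrete, here is a rough sketch (not a drop-in JSON implementation) of how the token parsers could cover the primitives from the question. Note that the Token module follows Haskell's lexical rules rather than JSON's (stringLiteral accepts Haskell escapes, naturalOrFloat is unsigned and expects a leading digit), and jNumber/jString are just illustrative names:

import Text.Parsec
import qualified Text.Parsec.Token as P
import Text.Parsec.Language (emptyDef)

lexer :: P.TokenParser ()
lexer = P.makeTokenParser emptyDef

-- Number with an optional leading minus; naturalOrFloat itself is unsigned
-- and returns Either Integer Double.
jNumber :: Parsec String () Double
jNumber = do
  sign <- option 1 ((-1) <$ char '-')
  n    <- P.naturalOrFloat lexer
  pure (sign * either fromInteger id n)

-- String literal with escaped quotes (Haskell-style escapes).
jString :: Parsec String () String
jString = P.stringLiteral lexer

main :: IO ()
main = do
  print (parse jNumber "" "-3.4e-8")             -- Right (-3.4e-8)
  print (parse jString "" "\"say \\\"hi\\\"\"")  -- Right "say \"hi\""

If you need to match the JSON grammar exactly (leading-zero rules, \u escapes, and so on), you will probably end up writing small hand-rolled parsers with char, digit and many after all, which is a perfectly normal thing to do with Parsec.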
If I want to train the Stanford Neural Network Dependency Parser for another language, there is a need for a "treebankLanguagePack" (TLP), but the information about this TLP is very limited:
particularities of your treebank and the language it contains
My "treebank" for the other language follows the same format as the PTB, my data is in CoNLL format, and the dependency format follows Universal Dependencies (UD). Do I need this TLP?
As of the current CoreNLP release, the TreebankLanguagePack is used within the dependency parser only to 1) determine the input text encoding and 2) determine which tokens count as punctuation [1].
Your best bet for a quick solution, then, is probably to stick with the UD English TreebankLanguagePack. You should do this by specifying the property language as "UniversalEnglish" (whether you're accessing the dependency parser via code or command line). If you're using the dependency parser via the CoreNLP main entry point, this property key should be depparse.language.
Technical details
Two very subtle details follow. You probably don't need to worry about these if you're just trying to hack something together at first, but they're worth mentioning so that you can avoid apocalyptic / head-smashing bugs in the future.
Evaluation and punctuation: If you do choose to stick with UniversalEnglish, be aware that there is a hack in the evaluation code that overrides the punctuation set for English parsing in particular. Any changes you make to punctuation in PennTreebankLanguagePack (the TLP used for the UniversalEnglish language) will be ignored! If you need to get around this, it should be enough to copy and paste the PennTreebankLanguagePack into your own codebase and name it something different.
Potential memory leak: When building parse results to be returned to the user, the dependency parser draws from a pool of cached GrammaticalRelation objects. This cache does not live-update. This means that if you have relations which aren't formally defined in the language you specified via the language property, they will lead to the instantiation of a new object whenever those relations show up in parser predictions. (This can be a big deal memory-wise if you happen to store the parse objects somewhere.)
[1]: Punctuation is excluded during evaluation. This is a standard "cheat" used throughout the dependency parsing literature.
As part of a program which dynamically loads user inputted strings as Haskell source code, I want to do some pre-processing on the user's input before compiling it.
One of the things I would like to be able to do is to search the source for particular function occurrences and add an extra argument to them. So, for example, I might want all occurrences of:
addThreeNumbers 3 5
To become:
addThreeNumbers 3 5 10
What is the best way of accomplishing such behavior? Is it complicated enough to warrant manipulating some sort of abstract syntax tree with functions in the GHC API / Template Haskell? Or is this something simple that can be accomplished with some sort of Haskell pre-processing / parsing library? If so, what libraries and resources would you recommend?
GHC 7.6's package-qualified imports, ghc-pkg hide, and GHC's -package option allow you to seamlessly add a layer between the importing file and the imported file.
Example:
Create a package with your own Data.Char, with a standard .cabal file, and cabal install it.
{-# LANGUAGE PackageImports #-}
module Data.Char (
    toUpper
  , Char
  , String
  -- ... export everything else from base's Data.Char here; because of the
  -- limitations of the current export facility you cannot write
  -- module Data.Char hiding (toUpper)
  ) where

import "base" Data.Char hiding (toUpper)
import qualified "base" Data.Char as OldChar

toUpper :: Char -> IO Char
toUpper c = do
  print "Oh Yeahhhhhhhhh"
  return $ OldChar.toUpper c
Hide the base package with ghc-pkg hide base. This hides many modules, so you need to wrap all of the ones you still need.
> ghci -XNoImplicitPrelude  -- we need this flag because the Prelude is in
                            -- base and I did not make a wrapped Prelude
ghci> import Data.Char
ghci> toUpper 'c' -- The wrapped function
"Oh Yeahhhhhhhhh"
'C'
ghci> isSpace ' ' -- The unwrapped normal Data.Char function
True
Now you can use Template Haskell to wrap your functions and call any IO action you need to get external information. Users do not even need to change any of their function calls or module imports, as they would with some variation of adding 'internal' to the names. A rough sketch of such a wrapper generator follows below.
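As an illustration of the Template Haskell idea (wrapIO is a hypothetical helper, not something the libraries provide, and it has to live in its own module because of the staging restriction), a wrapper generator might look roughly like this:

{-# LANGUAGE TemplateHaskell #-}
module WrapTH (wrapIO) where

import Language.Haskell.TH

-- $(wrapIO "toUpper" 'OldChar.toUpper) generates roughly:
--   toUpper x = do { print "Oh Yeahhhhhhhhh"; return (OldChar.toUpper x) }
wrapIO :: String -> Name -> Q [Dec]
wrapIO wrapperName original = do
  x    <- newName "x"
  body <- [| do print "Oh Yeahhhhhhhhh"
                return ($(varE original) $(varE x)) |]
  pure [FunD (mkName wrapperName) [Clause [VarP x] (NormalB body) []]]

Whether this is worth it over writing the handful of wrappers by hand depends on how many functions you need to intercept.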
Being able to wrap module interfaces seamlessly also means you can change the implementation of an imported module without touching the package/module code or the existing code base you are working with; you only have to make a middle layer.
Edit response to question:
Sure you can; the GHC API lets you do all of that, but it is considerably more complex, fewer examples than I would like are floating around, and I seem to see more people having a hard time with it than success stories.
For evaluation of code, hint is suggested.
plugins is suggested for dynamic loading of modules.
haskell-src-exts is suggested for parsing and changing code. It is what stylish-haskell uses to make small modifications to code; it reportedly covers most (all?) of Haskell 2010 and many, but not all, GHC extensions, and is probably your best bet if you do not like the first solution I provided (a rough sketch using it follows below).
The GHC API is the only one fully compatible with everything GHC accepts, as far as I know, but it is considerably more complex, less well documented, and more likely to change from GHC version to GHC version; at least there is no promise it will stay the same, in my limited experience. I suggested putting a module in the middle because it seemed the quickest to get working with good test coverage, took the least amount of new knowledge, and fulfilled the requirements I picked out of your question.
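To give a flavor of the haskell-src-exts route for the addThreeNumbers example, here is a rough sketch (assuming a recent haskell-src-exts plus the syb package; spine, addArg and rewrite are just illustrative names):

module AddArg (rewrite) where

import Data.Generics (everywhere, mkT)
import Language.Haskell.Exts

-- Walk down an application spine: return the head and the argument count.
spine :: Exp SrcSpanInfo -> (Exp SrcSpanInfo, Int)
spine (App _ f _) = let (h, n) = spine f in (h, n + 1)
spine e           = (e, 0)

-- Append the literal 10 to two-argument calls of addThreeNumbers.
addArg :: Exp SrcSpanInfo -> Exp SrcSpanInfo
addArg e = case spine e of
  (Var _ (UnQual _ (Ident _ "addThreeNumbers")), 2) ->
    let l = ann e in App l e (Lit l (Int l 10 "10"))
  _ -> e

-- Parse a whole module, rewrite every matching call, and print it back out.
rewrite :: String -> Either String String
rewrite src = case parseModule src of
  ParseOk m         -> Right (prettyPrint (everywhere (mkT addArg) m))
  ParseFailed loc e -> Left (show loc ++ ": " ++ e)

One caveat: prettyPrint throws away the original layout and comments, so if you need to preserve the user's formatting exactly you would have to splice edits back in using the source spans instead.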
I'm writing a program where I need to parse a JavaScript source file, extract some facts, and insert/replace portions of the code. A simplified description of the sorts of things I'd need to do is, given this code:
foo(['a', 'b', 'c']);
Extract 'a', 'b', and 'c' and rewrite the code as:
foo('bar', [0, 1, 2]);
I am using ANTLR for my parsing needs, producing C# 3 code. Somebody else had already contributed a JavaScript grammar. The parsing of the source code is working.
The problem I'm encountering is figuring out how to actually properly analyze and modify the source file. Each approach that I try to take in actually solving the problem leads me to a dead end. I can't help but think that I'm not using the tool as it's intended or am just too much of a novice when it comes to dealing with ASTs.
My first approach was to parse using a TokenRewriteStream and implement the EnterRule_* partial methods for the rules I'm interested in. While this seems to make modifying the token stream pretty easy, there is not enough contextual information for my analysis. It seems that all I have access to is a flat stream of tokens, which doesn't tell me enough about the entire structure of code. For example, to detect whether the foo function is being called, simply looking at the first token wouldn't work because that would also falsely match:
a.b.foo();
To allow me to do more sophisticated code analysis, my second approach was to modify the grammar with rewrite rules to produce more of a tree. Now, the first sample code block produces this:
Program
  CallExpression
    Identifier('foo')
    ArgumentList
      ArrayLiteral
        StringLiteral('a')
        StringLiteral('b')
        StringLiteral('c')
This is working great for analyzing the code. However, now I am unable to easily rewrite the code. Sure, I could modify the tree structure to represent the code I want, but I can't use this to output source code. I had hoped that the token associated with each node would at least give me enough information to know where in the original text I would need to make the modifications, but all I get are token indexes or line/column numbers. To use the line and column numbers, I would have to make an awkward second pass through the source code.
I suspect I'm missing something in understanding how to properly use ANTLR to do what I need. Is there a more proper way for me to solve this problem?
What you are trying to do is called program transformation, that is, the automated generation of one program from another. What you are doing "wrong" is assuming a parser is all you need, and then discovering that it isn't and that you have to fill in the gap.
Tools that do this well have parsers (to build ASTs), means to modify the ASTs (both procedural and pattern-directed), and prettyprinters which convert the (modified) AST back into legal source code. You seem to be struggling with the fact that ANTLR doesn't come with prettyprinters; that's not part of its philosophy; ANTLR is a (fine) parser generator. Other answers have suggested using ANTLR's "string templates", which are not by themselves prettyprinters, but can be used to implement one, at the price of implementing one. This is harder to do than it looks; see my SO answer on compiling an AST back to source code.
The real issue here is the widely made but false assumption that "if I have a parser, I'm well on my way to building complex program analysis and transformation tools." See my essay on Life After Parsing for a long discussion of this; basically, you need a lot more tooling than "just" a parser to do this, unless you want to rebuild a significant fraction of the infrastructure yourself instead of getting on with your task. Other useful features of practical program transformation systems typically include source-to-source transformations, which considerably simplify the problem of finding and replacing complex patterns in trees.
For instance, if you had source-to-source transformation capabilities (as in our tool, the DMS Software Reengineering Toolkit), you'd be able to write parts of your example code changes using these DMS transforms:
domain ECMAScript.

tag replace; -- says this is a special kind of temporary tree

rule barize(function_name:IDENTIFIER,list:expression_list,b:body):
  expression->expression
  = " \function_name ( '[' \list ']' ) "
  -> "\function_name( \firstarg\(\function_name\), \replace\(\list\))";

rule replace_unit_list(s:character_literal):
  expression_list -> expression_list
  replace(s) -> compute_index_for(s);

rule replace_long_list(s:character_list, list:expression_list):
  expression_list -> expression_list
  "\replace\(\s\,\list)-> "compute_index_for\(\s\),\list";
with rule-external "meta" procedures "first_arg" (which knows how to compute "bar" given the identifier "foo" [I'm guessing you want to do this]) and "compute_index_for", which, given a string literal, knows what integer to replace it with.
Individual rewrite rules have parameter lists "(....)" in which slots representing subtrees are named, a left-hand side acting as a pattern to match, and a right-hand side acting as the replacement, both usually quoted in metaquotes ", which separate rewrite-rule-language text from target-language (e.g. JavaScript) text. There are lots of meta-escapes "\" found inside the metaquotes which indicate a special rewrite-rule-language item. Typically these are parameter names, which stand for whatever kind of tree the parameter represents, or an external meta-procedure call (such as first_arg; you'll note that its argument list ( , ) is metaquoted!), or finally a "tag" such as "replace", which is a peculiar kind of tree that represents future intent to do more transformations.
This particular set of rules works by replacing a candidate function call by the barized version, with the additional intent "replace" to transform the list. The other two transformations realize the intent by transforming "replace" away by processing elements of the list one at a time, and pushing the replace further down the list until it finally falls off the end and the replacement is done. (This is the transformational equivalent of a loop).
Your specific example may vary somewhat since you really weren't precise about the details.
Having applied these rules to modify the parsed tree, DMS can then trivially prettyprint the result (the default behavior in some configurations is "parse to AST, apply rules until exhaustion, prettyprint AST" because this is handy).
You can see a complete process of "define language", "define rewrite rules", "apply rules and prettyprint" at (High School) Algebra as a DMS domain.
Other program transformation systems include TXL and Stratego. We imagine DMS as the industrial strength version of these, in which we have built all that infrastructure including many standard language parsers and prettyprinters.
So it turns out that I can actually use a rewriting tree grammar and insert/replace tokens using a TokenRewriteStream. Plus, it's actually really easy to do. My code resembles the following:
var charStream = new ANTLRInputStream(stream);
var lexer = new JavaScriptLexer(charStream);
var tokenStream = new TokenRewriteStream(lexer);
var parser = new JavaScriptParser(tokenStream);
var program = parser.program().Tree as Program;
var dependencies = new List<IModule>();
var functionCall = (
    from callExpression in program.Children.OfType<CallExpression>()
    where callExpression.Children[0].Text == "foo"
    select callExpression
).Single();
var argList = functionCall.Children[1] as ArgumentList;
var array = argList.Children[0] as ArrayLiteral;

tokenStream.InsertAfter(argList.Token.TokenIndex, "'bar', ");

for (var i = 0; i < array.Children.Count(); i++)
{
    tokenStream.Replace(
        (array.Children[i] as StringLiteral).Token.TokenIndex,
        i.ToString());
}
var rewrittenCode = tokenStream.ToString();
Have you looked at the StringTemplate library? It is by the same person who wrote ANTLR and they are intended to work together. It sounds like it would do what you're looking for, i.e. output matched grammar rules as formatted text.
Here is an article on translation via ANTLR.