Error "subscript not supported on set[Declaration]" for list comprehension in Rascal - rascal

I do not understand why I get the error I'm currently getting in Rascal.
|cwd:///loader.rsc|(391,1,<19,33>,<19,34>): subscript not supported on set[Declaration] at |cwd:///loader.rsc|(391,1,<19,33>,<19,34>)
Advice: |http://tutor.rascal-mpl.org/Errors/Static/UnsupportedOperation/UnsupportedOperation.html|
I get this on the following list comprehension:
{asts[astIndexes[i]] | int i <- [0 .. size(astIndexes)]}
If needed, this is the entire file:
module loader
import IO;
import Set;
import List;
import lang::java::m3::Core;
import lang::java::m3::AST;
import String;
set[Declaration] asts = {};
void getAsts(list[loc] partialScanList){
    asts = {};
    for (loc m <- partialScanList)
        asts += createAstFromFile(m, true);
}
void scanMetric(void (set[Declaration]) metricFunction, list[int] astIndexes){
    metricFunction({asts[astIndexes[i]] | int i <- [0 .. size(astIndexes)]});
    println(0);
}

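The subscript operator is not defined on plain sets. It is defined on maps and relations (and on ordered values such as lists). For example, given rel[int,int] x = {<1,2>}, x[1] yields {2}, and given map[int,int] y = (1:2), y[1] yields 2. A set has no element order, so asts[astIndexes[i]] has no meaning for a set[Declaration]; if you really need positional indexing, keep the ASTs in a list[Declaration] instead.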
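As a rough illustration of the lookup semantics described above, here is a small Python sketch (Python rather than Rascal, and only an analogy). The helper name subscript and the placeholder strings standing in for ASTs are hypothetical; the sketch assumes that subscripting a binary relation returns the set of right-hand values paired with the key, as in the {<1,2>} example.

```python
def subscript(relation, key):
    """Mimic rel[key]: collect every right-hand value paired with key."""
    return {value for (k, value) in relation if k == key}


# Relation analogue: a set of pairs, looked up by the first column.
x = {(1, 2), (1, 3), (4, 5)}
print(subscript(x, 1))

# Map analogue: a dictionary returns the single value stored under the key.
y = {1: 2}
print(y[1])

# Positional indexing needs an ordered container, so keep the items in a list.
# The strings below are hypothetical stand-ins for Declaration values.
asts = ["astA", "astB", "astC"]
ast_indexes = [0, 2]
selected = {asts[i] for i in ast_indexes}
print(selected)
```

Python sets, like Rascal sets, reject asts[0] when asts is a set, which is why the sketch switches to a list before indexing.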
A side note: this code looks like it is computing lookup indexes for AST nodes, but Rascal already has pretty efficient hashes for all ADT constructor trees, and those are used for lookups in relations and maps. Since these hash codes are also integers and their distribution is pretty uniform, it is very hard to increase performance by introducing your own indexing scheme on top of this. Most likely it would degrade performance rather than improve it.
So if you need a lookup per AST node, you could use a rel[Declaration, SomethingElse]. People often also use loc values as references to AST nodes, since they are supposed to be pretty unique. That helps if you can't keep all ASTs in memory all the time.

Related

What is the algorithm used in sort method in dart?

I found out that dart language has a built-in sort method in List class and I would like to know what is the algorithm they used in this method and what is the Big O notation of it?
If we take a look inside the SDK we can find the following implementation of the sort method on List:
void sort([int compare(E a, E b)]) {
  Sort.sort(this, compare ?? _compareAny);
}
https://github.com/dart-lang/sdk/blob/b86c6e0ce93e635e3434935e31fac402bb094705/sdk/lib/collection/list.dart#L340-L342
This just forwards the sorting to the following internal helper class:
https://github.com/dart-lang/sdk/blob/a75ffc89566a1353fb1a0f0c30eb805cc2e8d34c/sdk/lib/internal/sort.dart
That class has the following comment about the sorting algorithm:
/**
* Dual-Pivot Quicksort algorithm.
*
* This class implements the dual-pivot quicksort algorithm as presented in
* Vladimir Yaroslavskiy's paper.
*
* Some improvements have been copied from Android's implementation.
*/
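This is the same algorithm that Java uses for sorting arrays of primitives (at least since Java 7):
http://www.docjar.com/html/api/java/util/DualPivotQuicksort.java.html
The class comment there says that the running time is typically O(n log(n)), although like any quicksort the worst case is still quadratic: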
* This class implements the Dual-Pivot Quicksort algorithm by
* Vladimir Yaroslavskiy, Jon Bentley, and Josh Bloch. The algorithm
* offers O(n log(n)) performance on many data sets that cause other
* quicksorts to degrade to quadratic performance, and is typically
* faster than traditional (one-pivot) Quicksort implementations.
For more details you can read the paper by the designer of the Dual-Pivot Quicksort algorithm:
https://web.archive.org/web/20151002230717/http://iaroslavski.narod.ru/quicksort/DualPivotQuicksort.pdf
Also note that Dart has the following constant:
// When a list has less then [:_INSERTION_SORT_THRESHOLD:] elements it will
// be sorted by an insertion sort.
static const int _INSERTION_SORT_THRESHOLD = 32;
...
static void _doSort<E>(
    List<E> a, int left, int right, int compare(E a, E b)) {
  if ((right - left) <= _INSERTION_SORT_THRESHOLD) {
    _insertionSort(a, left, right, compare);
  } else {
    _dualPivotQuicksort(a, left, right, compare);
  }
}
So for small ranges (a span of at most 32 between left and right) Dart uses the traditional insertion sort algorithm, which has a worst-case big O of O(n^2). But since the input is very small, it is probably faster than the Dual-Pivot Quicksort algorithm.
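The dispatch described above can be sketched in Python. This is a simplified illustration, not Dart's implementation: the function names are hypothetical, the comparison is plain < and > rather than a compare callback, and the partition step takes the first and last elements as pivots, whereas the SDK code is more elaborate. Only the threshold value 32 and the overall shape (insertion sort for small ranges, dual-pivot quicksort otherwise) come from the source quoted above.

```python
INSERTION_SORT_THRESHOLD = 32  # value taken from the Dart constant quoted above


def insertion_sort(a, left, right):
    """Sort a[left..right] (inclusive bounds) in place."""
    for i in range(left + 1, right + 1):
        item = a[i]
        j = i - 1
        while j >= left and a[j] > item:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = item


def dual_pivot_quicksort(a, left, right):
    """Simplified dual-pivot partition: first and last elements are the pivots."""
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
    p, q = a[left], a[right]
    lt, gt, i = left + 1, right - 1, left + 1
    while i <= gt:
        if a[i] < p:            # belongs to the left part (< p)
            a[i], a[lt] = a[lt], a[i]
            lt += 1
            i += 1
        elif a[i] > q:          # belongs to the right part (> q)
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:                   # stays in the middle part (p <= x <= q)
            i += 1
    lt -= 1
    gt += 1
    a[left], a[lt] = a[lt], a[left]      # move pivot p to its final position
    a[right], a[gt] = a[gt], a[right]    # move pivot q to its final position
    do_sort(a, left, lt - 1)
    do_sort(a, lt + 1, gt - 1)
    do_sort(a, gt + 1, right)


def do_sort(a, left, right):
    """Mirror of the _doSort dispatch: small ranges go to insertion sort."""
    if right - left <= INSERTION_SORT_THRESHOLD:
        insertion_sort(a, left, right)
    else:
        dual_pivot_quicksort(a, left, right)


def hybrid_sort(a):
    do_sort(a, 0, len(a) - 1)
    return a
```

Because every recursive call goes back through do_sort, the small partitions produced by the quicksort are also finished off by insertion sort.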
At https://dartpad.dartlang.org/, try the following code. I can't answer your question about what the implementation is under the covers, but you can bet it's O(n log n).
I'm using an answer because I can't supply code in a comment easily.
void main() {
  Map<String, int> map = {'a': 1, 'b': 2};
  List<String> list = ['banana', 'apple', 'age', 'bob'];
  list.sort((String a, String b) => a.compareTo(b));
  print(list);
}
BTW list.sort(); would give the same results, since that custom comparator is the same as the default.

Erlang flatten function time complexity

I need help with the following:
flatten ([]) -> [];
flatten([H|T]) -> H ++ flatten(T).
The input list contains other lists of different lengths.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]).
What is the time complexity of this function?
And why?
I worked it out as O(n), where n is the number of elements in the input list.
For example:
flatten([[1,2,3],[4,7],[9,9,9,9,9,9]]) n=3
flatten([[1,2,3],[4,7],[9,9,9,9,9,9],[3,2,4],[1,4,6]]) n=5
Thanks for help.
First of all, I'm not sure your code will work, at least not in the way the standard library works. You could compare your function with lists:flatten/1 and maybe improve on your implementation. Try lists such as [a, [b, c]] and [[a], [b, [c]], [d]] as input and verify that you get back what you expect.
Regarding complexity, it is a little tricky because of the ++ operator and the functional (immutable) nature of the language. All lists in Erlang are linked lists (not arrays like in C++), and you cannot just add something to the end of one without modifying it: its last cell pointed to the end of the list, and now you would like it to link to something else. Since the language has no mutation, you have to make a copy of the whole list on the left of the ++ operator, which increases the cost of this operator.
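You could say that the cost of A ++ B is length(A): the left operand is copied and the right operand is shared. In your function every step evaluates H ++ flatten(T), so each sublist H is copied at most once, and the total work is length(FirstElement) + length(SecondElement) + ... over all sublists. That is O(n * k), where n is the number of sublists and k is their average length, so it is linear in the total number of elements but not O(n) in the number of sublists alone. The expensive shape is the opposite one, Acc ++ H with a growing accumulator on the left: there the cost is length(FirstElement) + (length(FirstElement) + length(SecondElement)) + ... up to (but not including) the last, which simplifies to roughly k * n * (n - 1) / 2, or O(n^2 * k).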
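To check the cost model, here is a small Python simulation (Python rather than Erlang, so only the counting carries over). It charges one unit per element of the left operand of each append, which is the assumption stated above for ++. The function names flatten_right and flatten_left are hypothetical: the first mirrors H ++ flatten(T), the second mirrors an accumulator-style Acc ++ H.

```python
def flatten_right(lists):
    """Model of flatten([H|T]) -> H ++ flatten(T). Returns (result, cost)."""
    if not lists:
        return [], 0
    rest, cost = flatten_right(lists[1:])
    head = lists[0]
    # H ++ Rest copies only H, so charge len(H).
    return head + rest, cost + len(head)


def flatten_left(lists):
    """Model of an accumulator version, Acc ++ H. Returns (result, cost)."""
    acc, cost = [], 0
    for head in lists:
        # Acc ++ H copies the whole accumulator built so far.
        cost += len(acc)
        acc = acc + head
    return acc, cost


example = [[1, 2, 3], [4, 7], [9, 9, 9, 9, 9, 9]]
print(flatten_right(example))

# 100 sublists of 10 elements each: compare the two shapes.
big = [[0] * 10 for _ in range(100)]
print(flatten_right(big)[1], flatten_left(big)[1])
```

Under this model the right-nested version costs the total number of elements, while the accumulator version grows quadratically with the number of sublists.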
If you are new to this it might seem a little odd, but with some practice you can get the hang of it. I would recommend going through a few resources:
Good explanation of lists and how they are created
Documentation on list handling with DO and DO NOT parts
Short description of ++ operator myths and best practices
Chapter about recursion and tail-recursion with examples using ++ operator

Stanford parser - count of tags

I have been using the Stanford Parser for CFG analysis. I can get the output displayed as a tree, but what I really want is a count of tags.
So I can get out, for example (taken from another query on Stack Overflow):
(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) (VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))
But what I really want is a count of the tags output in a CSV file:
PRP - 1
JJ - 1
Is this possible with the Stanford parser, particularly as I want to process several text files, or should I use a different program?
Yes, this is easily possible.
You will need:
import java.util.HashMap;
import edu.stanford.nlp.trees.Tree;
I assume from the other question you have an existing Tree object already.
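I suspect you only want counts for the part-of-speech tags (PRP, NN, RB... in your example), which sit on the preterminal nodes directly above the words, but you could do it for every node in general.
Then iterate over all nodes and count only the preterminals:
Tree tree = ...
// getNodeNumber is 1-based, so the last valid index is tree.size()
for (int i = 1; i <= tree.size(); i++) {
    Tree node = tree.getNodeNumber(i);
    // a preterminal is a tag node whose only child is a word (a leaf)
    if (node.isPreTerminal()) {
        // count node.value() here
    }
}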
The counting is done using a HashMap; you will find many examples of this on Stack Overflow.
Basically, start with a HashMap that uses the tag as the key and the tag count as the value.
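The counting step can be sketched in Python, working on the bracketed text output rather than on the Stanford Tree API. Everything here is an assumption for illustration: the names count_tags and to_csv are hypothetical, and the regular expression simply treats any "(TAG word)" pair with no nested brackets as a tagged word.

```python
import csv
import io
import re
from collections import Counter

# A preterminal looks like "(TAG word)" with no nested parentheses inside.
PRETERMINAL = re.compile(r"\(([^\s()]+)\s+[^\s()]+\)")


def count_tags(parse):
    """Count part-of-speech tags in one bracketed parse string."""
    return Counter(PRETERMINAL.findall(parse))


def to_csv(counts):
    """Render the counts as CSV text with a header row, sorted by tag."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["tag", "count"])
    for tag, count in sorted(counts.items()):
        writer.writerow([tag, count])
    return buffer.getvalue()


parse = ("(ROOT (S (NP (PRP$ My) (NN dog)) (ADVP (RB also)) "
         "(VP (VBZ likes) (NP (JJ eating) (NN sausage))) (. .)))")
counts = count_tags(parse)
print(to_csv(counts))
```

To process several files, the Counter objects returned by count_tags can be added together with + before calling to_csv.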
edit: sorry, corrected a negation mistake in the code.
The previous answer, while being correct, iterates over all nodes in the parse tree. While there is no readily available method that returns the POS tag counts, you can directly get leaf nodes using methods in the edu.stanford.nlp.trees.Trees class as follows:
(I am using Guava's Function for a little extra elegance in the code, but a simple for loop will work just as well.)
Tree tree = sentence.get(TreeAnnotation.class); // parse tree of the sentence
List<CoreLabel> labels = Trees.taggedLeafLabels(tree); // returns the labels of the leaves in a Tree, augmented with POS tags.
List<String> tags = Lists.transform(labels, getPOSTag);
for (String tag : new HashSet<>(tags))
    System.out.println(tag + " - " + Collections.frequency(tags, tag)); // one line per distinct tag
where
Function<CoreLabel, String> getPOSTag = new Function<CoreLabel, String>() {
public String apply(CoreLabel core_label) { return core_label.get(PartOfSpeechAnnotation.class); }
};

Not allowed to use cons on list; "does not match any of the declared (overloaded) signature patterns"

According to this page:
http://tutor.rascal-mpl.org/Rascalopedia/List/List.html
This is how you use cons on lists:
cons(1,[2,3]); //should return [1,2,3]
Trying this in the Rascal console:
import List;
cons(1,[2,3]);
Gives me this error:
|stdin:///|(1,13,<1,1>,<1,14>): The called signature: cons(int, list[int]),
does not match any of the declared (overloaded) signature patterns:
Symbol = cons(Symbol,str,list[Symbol])
Production = cons(Symbol,list[Symbol],set[Attr])
Symbol = cons(Symbol,str,list[Symbol])
Production = cons(Symbol,list[Symbol],set[Attr])
Is there some name collision with the standard imported functions and datatypes?
Good question. First, a straightforward answer: cons does not exist as a library function for constructing lists. The cons that the error message lists is part of the reified types API and means something different.
We write this:
rascal>[1, *[2,3,4]]
list[int]: [1,2,3,4]
rascal>1 + [2,3,4]
list[int]: [1,2,3,4]
rascal>[1] + [2,3,4]
list[int]: [1,2,3,4]
rascal>list[int] myList(int i) = [0..i];
list[int] (int): list[int] myList(int);
rascal>[*myList(10), *myList(20)]
list[int]: [0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19]
Now an explanation for the confusion. The Rascalopedia course is on abstract concepts and terms typically good to know when working with Rascal. This may be too basic for you. I would focus on the Rascal language and libraries course for more concrete examples and information relevant to programming in Rascal. See http://tutor.rascal-mpl.org/Rascal/Expressions/Values/List/List.html for the list literals and http://tutor.rascal-mpl.org/Rascal/Libraries/Prelude/List/List.html for library functions on lists.
Some other list handiness, used for deconstructing as opposed to constructing:
rascal>[1,2,3,4][1..]
list[int]: [2,3,4]
rascal>[1,2,3,4][1..2]
list[int]: [2]
rascal>[1,2,3,4][..-1]
list[int]: [1,2,3]
rascal>if ([*x, *y] := [1,2,3,4], size(x) == size(y)) println(<x,y>);
<[1,2],[3,4]>
ok
rascal>for ([*x, *y] := [1,2,3,4]) println(<x,y>);
<[],[1,2,3,4]>
<[1],[2,3,4]>
<[1,2],[3,4]>
<[1,2,3],[4]>
<[1,2,3,4],[]>
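For readers more familiar with Python, the slices and the [*x, *y] pattern above can be mimicked as follows. This is only an analogy under the assumption that the Rascal session output shown above is what the constructs produce; the helper name splits is hypothetical.

```python
xs = [1, 2, 3, 4]

# Slice analogues of [1..], [1..2] and [..-1].
print(xs[1:], xs[1:2], xs[:-1])


def splits(items):
    """Enumerate every way to cut a list into a prefix and a suffix."""
    return [(items[:i], items[i:]) for i in range(len(items) + 1)]


# Analogue of: for ([*x, *y] := [1,2,3,4]) println(<x,y>);
for x, y in splits(xs):
    print(x, y)

# Analogue of the if with the extra condition size(x) == size(y).
balanced = [(x, y) for x, y in splits(xs) if len(x) == len(y)]
print(balanced)
```

The pattern match behaves like a generator of candidate splits, and the extra condition filters them, which is why the if version reports only the balanced split.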

How to declare an immutable graph with circular references?

I want to declare a graph of all states where the edges represent contiguous states. I think what I am trying to do might be called "tying the knot" (not sure about that though). It's not working like I expected, and I have a couple of questions.
First, I want a State type that has a string name and a list of contiguous states. But this declaration gives compiler error "...immediate cyclic reference...":
type State = string * (State list)
This way works:
type State(name:string, contigs: (State list)) =
    let name = name
    let contigs = contigs
But it's really not a requirement to name the members. A tuple is fine. How can I make that terse syntax work?
Second, the following code attempts to declare what should be three graphs of contiguous states (HI and AK are graphs consisting of a single node, all the remaining states constitute the last graph), followed by a list of all nodes. (For brevity I've only actually declared a handful of states here):
let rec hi = State("hi", [])
and mo = State("mo", [il ia])
and il = State("il", [mo])
and ia = State("ia", [mo])
and states = [hi,mo,il,ia]
This gives a variety of errors though including "mo will eventually be evaluated as part of it's own definition" and "expression was expected to have type 'a->'b but here has type State". I thought the 'rec' and 'and' keywords would allow this to work. Can I define this self referencing graph? If so, how?
The problem is your data structure, plus the invalid list element delimiters (they should be semicolons). This works: (see edit)
type State =
| State of string * State list
let rec hi = State("hi", [])
and mo = State("mo", [il; ia])
and il = State("il", [mo])
and ia = State("ia", [mo])
let states = [hi; mo; il; ia]
Recursive references will be materialized as thunks (lazy values). So you could, with a bit more typing, do the same thing yourself with mutable Lazy values. That is just FYI; what you have is idiomatic.
EDIT
Intellisense didn't have a problem with it, but the compiler says
Recursive values cannot appear directly as a construction of the type 'List`1' within a recursive binding. This feature has been removed from the F# language. Consider using a record instead.
You can fix this by using seq instead of list.
type State =
| State of string * State seq
let rec hi = State("hi", [])
and mo = State("mo", seq { yield il; yield ia })
and il = State("il", seq { yield mo })
and ia = State("ia", seq { yield mo })
let states = [hi; mo; il; ia]
Although what Daniel says is correct I would contest the assertion that it is "idiomatic" because that does not produce a very useful data structure for representing graphs in the general case. Specifically, it only permits the addition of new vertices and edges from them but not adding or removing edges between existing vertices. In particular, this basically means your graph must be statically defined as a constant in your source code so you cannot load such a graph from disk easily.
The idiomatic purely functional representation of a graph is to replace dereferences with dictionary lookups. For example, represent the graph as a Map from vertices to Sets of vertices to which there are edges:
> let g =
Map["hi", set[]; "mo", set["il"; "ia"]; "il", set["mo"]; "ia", set["mo"]];;
val g : Map<string,Set<string>> =
map
[("hi", set []); ("ia", set ["mo"]); ("il", set ["mo"]);
("mo", set ["ia"; "il"])]
For example, you can lookup the vertices directly reachable via edges from mo like this:
> g.["mo"];;
val it : Set<string> = set ["ia"; "il"]
This is easier to debug than the mutable representation but it has significant disadvantages:
Lookup in a purely functional dictionary like Map is at least 200× slower than dereferencing a pointer for traversing graphs (according to a quick test here).
The garbage collector no longer reclaims unreachable subgraphs for you. The imperative solution is to use a weak dictionary but there are no known purely functional weak dictionaries.
So this is only feasible if performance and leaks will not be a problem. This is most commonly the case when your graphs are small or static.
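The dictionary-based representation can be sketched in Python as well (Python rather than F#, with frozenset standing in for an immutable Set). The helper names add_edge and reachable are hypothetical; add_edge returns a new dictionary instead of mutating the old one, to stay close to the purely functional style discussed above.

```python
def add_edge(graph, src, dst):
    """Return a new graph with an edge src -> dst; the input is left untouched."""
    updated = dict(graph)
    updated[src] = graph.get(src, frozenset()) | {dst}
    updated.setdefault(dst, frozenset())
    return updated


def reachable(graph, start):
    """Collect every vertex reachable from start, including start itself."""
    seen = set()
    stack = [start]
    while stack:
        vertex = stack.pop()
        if vertex in seen:
            continue
        seen.add(vertex)
        stack.extend(graph.get(vertex, frozenset()))
    return seen


g = {
    "hi": frozenset(),
    "mo": frozenset({"il", "ia"}),
    "il": frozenset({"mo"}),
    "ia": frozenset({"mo"}),
}

print(g["mo"])             # vertices directly reachable from mo
print(reachable(g, "il"))  # the whole connected group containing il

g2 = add_edge(g, "hi", "mo")  # edges between existing vertices can be added
```

Unlike the knot-tied version, this graph can be built incrementally, for example while reading edges from a file, because edges refer to vertices by key rather than by direct reference.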
