Hashtable in OCaml where key is a tuple - parsing

I am trying to construct an LL(1) parse table in OCaml. I'd like the key to be a (nonterminal, input_symbol) tuple. Is this possible?
I know you can do a stack of tuples:
let (k : (string*string) Stack.t) = Stack.create ();;
Thank you in advance!!

The key of a hash table in OCaml can have any type that can be compared for equality and can be hashed to an integer. The "vanilla" interface uses the built-in polymorphic comparison to compare for equality, and a built-in polymorphic hash function.
The built-in polymorphic comparison function fails for function types and for cyclic values.
There is also a functorial interface that lets you define your own equality and hash functions. So you can even have a hash table with keys that contain functions if you do a little extra work (assuming you don't expect to compare the functions for equality).
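For the functorial interface, here is a minimal sketch keyed by the question's (nonterminal, input_symbol) pair (the production string is made up for illustration):
module PairKey = struct
  type t = string * string        (* (nonterminal, input_symbol) *)
  let equal = (=)                 (* structural equality suffices for string pairs *)
  let hash = Hashtbl.hash         (* the built-in polymorphic hash *)
end

module PairTbl = Hashtbl.Make (PairKey)

let table : string PairTbl.t = PairTbl.create 64
let () = PairTbl.add table ("E", "int") "E -> T E'"
let () = print_endline (PairTbl.find table ("E", "int"))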
It is not difficult to make a hash table with tuples as keys:
# let my_table = Hashtbl.create 64;;
val my_table : ('_weak1, '_weak2) Hashtbl.t = <abstr>
# Hashtbl.add my_table (1, 2) "one two";;
- : unit = ()
# Hashtbl.add my_table (3, 4) "three four";;
- : unit = ()
# Hashtbl.find my_table (1, 2);;
- : string = "one two"
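Applied to the question, the same type-annotation style as the Stack example pins down the tuple key (assuming string symbols and string-valued productions):
let (parse_table : (string * string, string) Hashtbl.t) = Hashtbl.create 64;;
Hashtbl.add parse_table ("E", "int") "E -> T E'";;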

Related

Are there use cases for single-case variants in OCaml?

I've been reading F# articles, and they use single-case variants to create distinct, incompatible types. However, in OCaml I can use private module types or abstract types to create distinct types. Is it common in OCaml to use single-case variants like in F# or Haskell?
Another specialized use case for a single-constructor variant is to erase some type information with a GADT (and an existential quantification).
For instance, in
type showable = Show: 'a * ('a -> string) -> showable
let show (Show (x,f)) = f x
let showables = [ Show (0,string_of_int); Show("string", Fun.id) ]
The constructor Show pairs an element of a given type with a printing function, then forgets the concrete type of the element. This makes it possible to have a list of showable elements, even if each element has a different concrete type.
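To make that concrete, the heterogeneous list above can be consumed uniformly through show:
let () = List.iter (fun s -> print_endline (show s)) showables
(* prints:
   0
   string *)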
For what it's worth it seems to me this wasn't particularly common in OCaml in the past.
I've been reluctant to do this myself because it has always cost something: the representation of type t = T of int was always bigger than just the representation of an int.
However, recently (for a few years now) it has been possible to declare types as unboxed, which removes this obstacle:
type t = T of int [@@unboxed]
As a result I've personally been using single-constructor types much more frequently recently. There are many advantages. For me the main one is that I can have a distinct type that's independent of whether its representation happens to be the same as another type's.
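For instance, a sketch of that payoff (the meters/feet types here are hypothetical):
type meters = Meters of float [@@unboxed]
type feet = Feet of float [@@unboxed]

(* Both share float's runtime representation, yet the compiler
   rejects any attempt to mix them up. *)
let to_feet (Meters m) = Feet (m *. 3.28084)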
You can of course use modules to get this effect, as you say. But that is a fairly heavy solution.
(All of this is just my opinion naturally.)
Yet another case for single-constructor types (although it does not quite match your initial question of creating distinct types): fancy records. (By contrast with other answers, this is more a syntactic convenience than a fundamental feature.)
Indeed, using a relatively recent feature (introduced with OCaml 4.03, in 2016) which allows writing constructor arguments with a record syntax (including mutable fields!), you can prefix regular records with a constructor name, Coq-style.
type t = MakeT of {
  mutable x : int ;
  mutable y : string ;
}
let some_t = MakeT { x = 4 ; y = "tea" }
(* val some_t : t = MakeT {x = 4; y = "tea"} *)
It does not change anything at runtime (just like Constr (a,b) has the same representation as (a,b), provided Constr is the only constructor of its type). The constructor makes the code a bit more explicit to the human eye, and it also provides the type information required to disambiguate field names, thus avoiding the need for type annotations. It is similar in function to the usual module trick, but more systematic.
Patterns work just the same:
let (MakeT { x ; y }) = some_t
(* val x : int = 4 *)
(* val y : string = "tea" *)
You can also access the “contained” record (at no runtime cost) and read and modify its fields. This contained record, however, is not a first-class value: you cannot store it, pass it to a function, or return it.
let (MakeT fields) = some_t in fields.x (* returns 4 *)
let (MakeT fields) = some_t in fields.x <- 42
(* some_t is now MakeT {x = 42; y = "tea"} *)
let (MakeT fields) = some_t in fields
(*                             ^^^^^^
Error: This form is not allowed as the type of the inlined record could escape. *)
Another use case of single-constructor (polymorphic) variants is documenting something to the caller of a function. For instance, perhaps there's a caveat with the value that your function returns:
val create : unit -> [ `Must_call_close of t ]
Using a variant forces the caller of your function to pattern-match on this variant in their code:
let (`Must_call_close t) = create () in (* ... *)
This makes it more likely that they'll pay attention to the message in the variant, as opposed to documentation in an .mli file that could get missed.
For this use case, polymorphic variants are a bit easier to work with as you don't need to define an intermediate type for the variant.
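Putting it together, a minimal sketch (the t record and the close function are hypothetical):
type t = { mutable closed : bool }

let create () = `Must_call_close { closed = false }
let close t = t.closed <- true

let () =
  let (`Must_call_close conn) = create () in
  (* ... use conn ... *)
  close conn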

Spark: join key-tuple pairs into key-list value

I have many RDDs (let's say 4) of this kind: (K, (v1, v2, ..., vN)), and I have to join them, so I simply run
r1.join(r2).join(r3).join(r4)
The result will be something like (K, (((v1, ..., vN), (v1, ..., vN)), (v1, ..., vN))) and so on. Basically, I get a nested structure of tuples, one level per join operation.
I was wondering if there exists a way to tell Spark to output as result of the join a union of the values of each RDD. In other words, I would like to get something like:
K, [ v1,v2,..., vN,v1,v2,..., vN,v1,v2,..., v1,v2,...,vN ]
You could do a multi-join, or you could save yourself from the nested syntax and apply a version of cogroup instead. However, since cogroup() only allows you to group up to 4 RDDs, you can kind of monkey-patch it to group more. Below is an example of a multiCogroup() function:
def multiCogroup[K : ClassTag, V : ClassTag](numPartitions: Int, inputRDDs: RDD[(K, V)]*): RDD[(K, Seq[V])] = {
  val cg = new CoGroupedRDD[K](inputRDDs.toSeq, new HashPartitioner(numPartitions))
  cg.mapValues { iterables =>
    iterables.foldLeft(Seq[V]())(_ ++ _.asInstanceOf[Iterable[V]].toSeq)
  }
}
Running this on an example, you can see the following:
import org.apache.spark.rdd._
import org.apache.spark.HashPartitioner
import scala.reflect.ClassTag
val rdd1 = sc.parallelize(Seq(("a", 1),("b", 2),("c", 3),("d", 4)))
val rdd2 = sc.parallelize(Seq(("a", 4),("b", 3),("c", 2),("d", 1)))
val rdd3 = sc.parallelize(Seq(("c", 0),("d", 0),("e", 0)))
val rdd4 = sc.parallelize(Seq(("a", 5),("b", 5),("e", 5)))
val rdd5 = sc.parallelize(Seq(("b", -1),("c", -1),("d", -1)))
val combined = multiCogroup[String, Int](2, rdd1, rdd2, rdd3, rdd4, rdd5)
combined.foreach(println)
// (d,List(4, 1, 0, -1))
// (b,List(2, 3, 5, -1))
// (e,List(0, 5))
// (a,List(1, 4, 5))
// (c,List(3, 2, 0, -1))
A few things to note:
If your input RDD value types are not uniform, you could widen the output type V to a common supertype (e.g. Int and Long to AnyVal, or String and Int to Any). This might not be advisable, though, as it could cause ambiguity issues later in your program. In general, I think the best use case is when all input value types are the same.
I've defined the function to use a HashPartitioner, with the number of partitions given by the parameter numPartitions. It may make sense to pass in your own Partitioner by replacing the numPartitions argument; you can then hand the partitioner directly to CoGroupedRDD[K](), similarly to what is done in the implementation of cogroup.
I would apply some caution when using this method on large RDDs. Joins themselves can be tricky depending on the size of the input data and the distribution of the key set, and expanding this to grouping multiple RDDs in a single cogroup can run into the same memory issues more quickly.
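For just a handful of RDDs, hand-flattening the nested join is also an option (a sketch reusing rdd1, rdd2, and rdd4 from above; note that join is an inner join, so keys missing from any input are dropped, unlike with cogroup):
val flattened = rdd1.join(rdd2).join(rdd4)
  .mapValues { case ((v1, v2), v3) => List(v1, v2, v3) }
// (a,List(1, 4, 5))
// (b,List(2, 3, 5))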

lua - table.concat with string keys

I'm having a problem with Lua's table.concat, and suspect it is just my ignorance, but I cannot find a verbose answer to why I'm getting this behavior.
> t1 = {"foo", "bar", "nod"}
> t2 = {["foo"]="one", ["bar"]="two", ["nod"]="yes"}
> table.concat(t1)
foobarnod
> table.concat(t2)
The table.concat run on t2 provides no results. I suspect this is because the keys are strings instead of integers (index values), but I'm not sure why that matters.
I'm looking for A) why table.concat doesn't accept string keys, and/or B) a workaround that would allow me to concatenate a variable number of table values in a handful of lines, without specifying the key names.
Because that's what table.concat is documented as doing.
Given an array where all elements are strings or numbers, returns table[i]..sep..table[i+1] ··· sep..table[j]. The default value for sep is the empty string, the default for i is 1, and the default for j is the length of the table. If i is greater than j, returns the empty string.
Non-array tables have no defined order so table.concat wouldn't be all that helpful then anyway.
You can write your own, inefficient, table concat function easily enough.
function pconcat(tab)
  local ctab, n = {}, 1
  for _, v in pairs(tab) do
    ctab[n] = v
    n = n + 1
  end
  return table.concat(ctab)
end
You could also use next manually and construct the string yourself (though that's probably less efficient than the above version).
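For reference, a sketch of that next-based loop (element order over a non-array table is still unspecified):
function nconcat(tab)
  local s = ""
  local k, v = next(tab)
  while k ~= nil do
    s = s .. tostring(v)
    k, v = next(tab, k)
  end
  return s
end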

Extending Query Expressions

Are there any documents or examples out there on how one can extend/add new keywords to query expressions? Is this even possible?
For example, I'd like to add a lead/lag operator.
In addition to the query builder for the Rx Framework mentioned by @pad, there is also a talk by Wonseok Chae from the F# team about Computation Expressions that includes query expressions. I'm not sure if the meeting was recorded, but there are very detailed slides with a cool example of query syntax for generating .NET IL code.
The source code of the standard F# query builder is probably the best resource for finding out what types of operations are supported and how to annotate them with attributes.
The key attributes that you'll probably need are demonstrated by the where clause:
[<CustomOperation("where",MaintainsVariableSpace=true,AllowIntoPattern=true)>]
member Where :
: source:QuerySource<'T,'Q> *
[<ProjectionParameter>] predicate:('T -> bool) -> QuerySource<'T,'Q>
The CustomOperation attribute defines the name of the operation. The (quite important) parameter MaintainsVariableSpace allows you to say that the operation returns the same type of values as it takes as the input. In that case, the variables defined earlier are still available after the operation. For example:
query { for p in db.Products do
        let name = p.ProductName
        where (p.UnitPrice.Value > 100.0M)
        select name }
Here, the variables p and name are still accessible after where because where only filters the input, but it does not transform the values in the list.
Finally, the ProjectionParameter attribute allows you to say that p.UnitPrice.Value > 100.0M should actually be turned into a function that takes the context (available variables) and evaluates this expression. If you do not specify this attribute, then the operation just gets the value of the argument, as in:
query { for p in .. do
        take 10 }
Here, the argument 10 is just a simple expression that cannot use values in p.
Pretty cool feature for the language. I just implemented reverse as a query operation over QuerySource. It's a simple example, but a demonstration nonetheless.
module QueryExtensions

type ExtendedQueryBuilder() =
    inherit Linq.QueryBuilder()

    /// Defines an operation 'reverse' that reverses the sequence
    [<CustomOperation("reverse", MaintainsVariableSpace = true)>]
    member __.Reverse (source : Linq.QuerySource<'T, System.Collections.IEnumerable>) =
        let reversed = source.Source |> List.ofSeq |> List.rev
        new Linq.QuerySource<'T, System.Collections.IEnumerable>(reversed)
let query = ExtendedQueryBuilder()
And here it is in use:
let a = [1 .. 100]

let specialReverse =
    query {
        for i in a do
        select i
        reverse
    }

How to extract data from F# list

Following up my previous question, I'm slowly getting the hang of FParsec (though I do find it particularly hard to grok).
My next newbie F# question is, how do I extract data from the list the parser creates?
For example, I loaded the sample code from the previous question into a module called Parser.fs, and added a very simple unit test in a separate module (with the appropriate references). I'm using xUnit:
open Xunit
[<Fact>]
let Parse_1_ShouldReturnListContaining1 () =
    let interim = Parser.parse("1")
    Assert.False(List.isEmpty(interim))
    let head = interim.Head // I realise that I have only one item in the list this time
    Assert.Equal("1", ???)
Interactively, when I execute parse "1" the response is:
val it : Element list = [Number "1"]
and by tweaking the list of valid operators, I can run parse "1+1" to get:
val it : Element list = [Number "1"; Operator "+"; Number "1"]
What do I need to put in place of my ??? in the snippet above? And how do I check that it is a Number, rather than an Operator, etc.?
F# types (including lists) implement structural equality. This means that if you compare two lists containing F# types using =, it will return true when the lists have the same length and contain elements with the same properties.
Assuming that the Element type is a discriminated union defined in F# (and not an object type), you should be able to write just (here for the "1+1" input, with the expected value first, as xUnit expects):
Assert.Equal([Number "1"; Operator "+"; Number "1"], interim)
If you wanted to implement the equality yourself, you could use pattern matching:
let expected = [Number "1"]
match interim, expected with
| [Number a], [Number b] when a = b -> true
| _ -> false
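To answer the ??? in the question directly, match on the head of the list and extract the payload of the Number case (a sketch assuming the same open Xunit context as above; n is the string carried by the constructor):
[<Fact>]
let Parse_1_ShouldReturnNumberOne () =
    let interim = Parser.parse "1"
    match interim with
    | [ Number n ] -> Assert.Equal("1", n)   // also fails if the head is an Operator
    | other -> failwithf "unexpected parse result: %A" other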
