Count unique in a Deedle Series - f#

I want an overview of a Series in my dataframe, something like pandas' unique value counts. I don't know if there's a built-in function for that.
So far I've written a function that only returns the number of distinct values. It does the job; my question is only whether a built-in function exists.
    let unique (s: Deedle.Series<'a,'a>) =
        s.Values
        |> Seq.distinct
        |> Seq.length
I want a result like:
    [("value1",5);("value2",8)]

You can use the groupInto function - this lets you group values of the series, so you can group the data using the actual value as the key and then aggregate each group into a single value by counting the total number of items in the group:
    let unique s =
        s |> Series.groupInto (fun _ v -> v) (fun _ g -> Stats.count g)

    Series.ofValues [ 1;2;1;2;3 ] |> unique
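To make the shape of the requested result concrete, here is a small sketch of the same group-then-count idea in plain Python. It does not use Deedle, and `count_unique` is a hypothetical helper name chosen for illustration only.

```python
from collections import Counter

def count_unique(values):
    """Return (value, count) pairs in order of first appearance."""
    # Counter is a dict subclass, so it keeps insertion order (Python 3.7+).
    return list(Counter(values).items())

print(count_unique([1, 2, 1, 2, 3]))  # [(1, 2), (2, 2), (3, 1)]
```

Each distinct value becomes a key and the size of its group becomes the value, which is the same grouping that the key selector and the aggregation function express above.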

Type-checked way of modeling parent-child relationships?

I have a type, Item. An Item may have a parent Item, or one or more child Items, or neither (i.e., there doesn't have to be a relationship, and relationships are at most one level deep).
Transformations on Items sometimes affect the parent, and sometimes the children. So I need to be able to traverse both ways.
A naive representation is to have a ParentID: ID option and ChildIDs: ID list fields on Item. But that allows illegal states.
I could instead have a Relationships field on Item of type Relationship = ParentID of ID | ChildIDs of ID * ID list | None, which is better (the ChildIDs tuple to ensure there's at least one child in that case).
But is there a way to have the compiler ensure the consistency of the two-way relationship?
I was thinking I could do away with linkages on IDs altogether, and introduce a discriminated union at the very top: type Item = Item of Item | Compound of Item * Item * Item list. (Again, the Compound tuple represents a parent, at least one child, maybe more children.)
The downside is now every transformation on Item needs to check cases and handle Compounds. (Or maybe this is good since it forces me to think about knock-on effects of every transformation?)
The advantage is that all the Items that a function may need to touch are always available in one, consistent bundle.
Is that "idiomatic"? Any other approaches to consider?
Would something like this work? The Item is a record that contains the payload (ID and data attributes).
The discriminated union defines your four scenarios.
    type Item = {ID: int; Name: string} // whatever is required

    type ItemNode =
        | StandaloneItem of Item
        | ItemWithParent of Item * Item                        // item, parent
        | ItemWithChildren of Item * Item list                 // item, children
        | ItemWithParentAndChildren of Item * Item * Item list // item, parent, children

    // Placeholder for the real per-item processing
    let processItem (item: Item) = 42
Then you could implement your processing of the nodes similar to this:
    let processNode (node: ItemNode) =
        match node with
        | StandaloneItem it -> it |> processItem
        | ItemWithParent (it, parent) -> [it; parent] |> List.map processItem |> List.sum
        | ItemWithChildren (it, children) ->
            (it |> processItem) + (children |> List.map processItem |> List.sum)
        | ItemWithParentAndChildren (it, parent, children) ->
            (it |> processItem)
            + (parent |> processItem)
            + (children |> List.map processItem |> List.sum)
In this way you do not have to add any conditional logic (on top of match) to your processing - you deal with the tuples with known content.
You can also use records instead of tuples, which makes each case even more transparent.

Esper statement to check, if A is followed by B without any other As in between

I have two events: A and B. Every time A occurs, B has to occur afterwards without any As in between.
Does anybody have an idea how to implement this? I thought about something like
    pattern[every A -> A until B]
But this statement is true even if A is followed by B without any other As in between. It should only be true in case of AAB or AAAAB and so on.
Thank you for your help.
One possible solution is the pattern
    A -> (A and not B)
With this pattern the query is only true when the rule is violated. If the rule is fulfilled, I don't get any indication.
Is there a better solution?
Match-recognize pattern matching has immediately-followed-by-semantics. You could do something like this:
    create schema A();
    create schema B();

    select * from pattern[every a=A or every b=B]
    match_recognize (
        measures p1 as a, p2 as b
        pattern (p1 p2)
        define
            p1 as typeof(p1.a) = 'A',
            p2 as typeof(p2.b) = 'B'
    )
Or you could use an approach with insert-into.
    insert into CombinedStream select id, 'A' as type from A;
    insert into CombinedStream select id, 'B' as type from B;

    select * from CombinedStream
    match_recognize (
        measures a as a, b as b
        pattern (a b)
        define
            a as a.type = 'A',
            b as b.type = 'B'
    )
If you want to go with the EPL pattern language, that can also work. EPL patterns always add to and remove from filter indexes, which can be less performant depending on how many incoming events are matched versus unmatched/discarded (i.e. per-event analysis versus needle-in-a-haystack).
    every A -> (B and not A) // read: every A followed by B and not A
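To reason about which event sequences satisfy or violate the rule, the intended semantics can be simulated outside Esper. The following is a plain-Python sketch, not EPL; `scan` is a hypothetical helper, and it only approximates the behaviour discussed above (events are reduced to their type names).

```python
def scan(events):
    """Split a stream of event type names into matches and violations.

    matches:    (i, j) pairs where the A at index i is followed by the B at
                index j with no other A in between.
    violations: indices of an A that arrived while an earlier A was still
                waiting for its B.
    """
    matches, violations = [], []
    pending = None  # index of the most recent A still waiting for a B
    for i, kind in enumerate(events):
        if kind == "A":
            if pending is not None:
                violations.append(i)  # a second A before any B
            pending = i
        elif kind == "B" and pending is not None:
            matches.append((pending, i))
            pending = None
    return matches, violations

print(scan(["A", "B", "A", "A", "B"]))  # ([(0, 1), (3, 4)], [3])
```

The matches correspond to what a pattern in the spirit of "every A followed by B and not A" would report, and the violations to the AAB case from the question.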

neo4j: List of ints from plain string representation

Context
I would like to read a CSV file into my database and create nodes and relationships. For the Order nodes to be created, one of the fields is a packed list of Products (relational key), i.e. it looks like "[123,456,789]", where the numbers are the product ids.
Reading the data into the db, I have no problem creating nodes for the Orders and the Products; in a second pass I now want to create the edges by unwinding the list of products in each Order and linking it to the corresponding products.
Ideally I could convert the string containing the list into a list of ints when creating the Order nodes, so that a simple loop over these values, matching the Product nodes, would do the trick (this would also be better for storage efficiency).
Problem
However, I cannot figure out how to convert the string into a list of ints. All my attempts at writing a Cypher query for this failed miserably. I will post some of them below, starting from the string l:
    WITH '[123,456,789]' as l
    WITH split(replace(replace(l,'[',''),']',''),',') as s
    UNWIND s as ss
    COLLECT(toInteger(ss) ) as k
    return k

    WITH '[123,456,789]' as l
    WITH split(replace(replace(l,'[',''),']',''),',') as s, [] as k
    FOREACH(ss IN s| SET k = k + toInteger(ss) )
    return k
Both statements fail.
EDIT
I have found a partial solution, but I am not quite satisfied with it: it works for my task at hand, but does not solve the more general problem of this list conversion.
I found out that one can create an empty list as a property of a node, which can then be updated successively:
    CREATE (o:Order {k: []})
    WITH o, '[123,456]' as l
    WITH o, split(replace(replace(l,'[',''),']',''),',') as s
    FOREACH(ss IN s | SET o.k= o.k + toInteger(ss) )
    RETURN o.k
Strangely, this only works on properties of nodes, not on bound variables (see above).
Since the input string looks like a valid JSON array, you can simply use the apoc.convert.fromJsonList function from the APOC library:
    WITH "[123,456,789]" AS l
    RETURN apoc.convert.fromJsonList(l)
You can use substring() to trim the brackets at the start and the end.
This approach gives you a list of the numbers, still as strings:
    WITH '[123,456,789]' as nums
    WITH substring(nums, 1, size(nums)-2) as nums
    WITH split(nums, ',') as numList
    RETURN numList
You can of course perform all these operations at once, then UNWIND the resulting list, convert the elements to ints, and match them to products:
    WITH '[123,456,789]' as nums
    UNWIND split(substring(nums, 1, size(nums)-2), ',') as numString
    WITH toInteger(numString) as num
    MATCH (p:Product {id:num})
    ...
EDIT
If you just want to convert this to a list of integers, you can use a list comprehension once you have your list of strings:
    WITH '[123,456,789]' as nums
    WITH split(substring(nums, 1, size(nums)-2), ',') as numStrings
    WITH [str in numStrings | toInteger(str)] as nums
    ...
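The same three steps (trim the brackets, split on commas, convert each piece) can be checked outside the database. This is a plain-Python sketch for verifying the logic, not Cypher; `parse_id_list` is a hypothetical helper name.

```python
import json

def parse_id_list(text):
    """Parse a string like "[123,456,789]" into a list of ints."""
    inner = text.strip()[1:-1]  # drop the surrounding brackets
    if not inner.strip():
        return []  # "[]" has no elements to convert
    return [int(part) for part in inner.split(",")]

print(parse_id_list("[123,456,789]"))  # [123, 456, 789]
# Because the string is also valid JSON, a JSON parser gives the same list.
print(json.loads("[123,456,789]"))     # [123, 456, 789]
```

The empty-list check is worth noting: splitting an empty string yields one empty piece, which cannot be converted to an int, so an order without products needs separate handling in this sketch.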

Check if a sequence exists in a collect in neo4j

Can somebody please tell me how to check whether a sequence is present in a collection in Cypher / Neo4j?
For example, once collect() has collected the elements during traversal, can we check whether the sequence [Element1, Element2, Element3] is present in the result?
Depending on whether you allow gaps, you could either find the indexes of e1..e3 and check that they are ascending (with gaps), using apoc.coll.indexOf,
or you could extract 3-element sublists and compare them:
    WITH [1,2,3,4,5] as coll, [2,3,4] as seq
    WHERE any(idx IN range(0,size(coll)-size(seq)) WHERE coll[idx..idx+size(seq)] = seq)
    RETURN coll, seq
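Both variants (without gaps and with gaps) are easy to state outside Cypher, which can help when checking the index arithmetic. This is a plain-Python sketch; the function names are hypothetical.

```python
def contains_sequence(coll, seq):
    """True if seq occurs in coll as a contiguous run (no gaps)."""
    n = len(seq)
    # Slide a window of len(seq) over coll, like coll[idx..idx+size(seq)].
    return any(coll[i:i + n] == seq for i in range(len(coll) - n + 1))

def contains_with_gaps(coll, seq):
    """True if the elements of seq occur in coll in order, gaps allowed."""
    it = iter(coll)
    # Each `in` check consumes the iterator up to the match, so order matters.
    return all(x in it for x in seq)

print(contains_sequence([1, 2, 3, 4, 5], [2, 3, 4]))  # True
print(contains_sequence([1, 2, 3, 4, 5], [2, 4]))     # False
print(contains_with_gaps([1, 2, 3, 4, 5], [2, 4]))    # True
```

Note that the window loop stops at the last index where a full-length slice still fits, which is the same bound as the range in the query.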

Filter Sequence of Record Types by Property

I have an ordered sequence of the following type:
    type Comparison<'a when 'a :> IKey> = {Id: string; src: 'a; dest: 'a}
Where more than one record shares the same Id, I'd like to take only the latest record in the sequence for each Id (the sequence has been generated from ordered query results), while also keeping records whose Id is not shared with other records.
Is there a method in F# to generate a new sequence in this way?
How about:
    items
    |> Seq.groupBy (fun x -> x.Id)
    |> Seq.map (snd >> Seq.last)
This groups the items into a sequence of tuples, where the first item is the ID and the second is a sequence of elements with that Id.
Then the map applies Seq.last to the second elements of those tuples.
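The group-then-take-last idea can be illustrated with a small plain-Python sketch (not F#). `latest_per_id` and the sample records are hypothetical and only meant to show which record survives for each Id.

```python
def latest_per_id(items, key):
    """Keep only the last item seen for each key, in first-seen key order."""
    latest = {}
    for item in items:
        # Later items overwrite earlier ones that share the same key;
        # the dict keeps the position of the key's first appearance.
        latest[key(item)] = item
    return list(latest.values())

records = [
    {"Id": "a", "v": 1},
    {"Id": "b", "v": 2},
    {"Id": "a", "v": 3},
]
print(latest_per_id(records, key=lambda r: r["Id"]))
# [{'Id': 'a', 'v': 3}, {'Id': 'b', 'v': 2}]
```

Records with a unique Id pass through unchanged, because the last element of a one-element group is that element.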
