Arrays of arrays or relationships in Neo4j

The question came up after reading the blog entry "Natural Language Analytics made simple and visual with Neo4j" by Michael Hunger.
When a word is used by more than one sentence (or more than once in the same sentence), it will have two or more [NEXT] relationships. To know the correct path for each sentence, we need to store the segment id and the position id [sid,idx].
Storing one instance is clear: it creates an array with two values. But how do we add two or more arrays? As far as I know, Neo4j only accepts basic data types.
Instead of using this solution, would it make sense to store one [NEXT] relationship for each sentence path? Of course, this would generate a very large number of relationships.
Thanks

NOTE: In the referenced article, there is a typo on the last line of the query in the "I also want to sentence number and word position" section. That is, r.pos = r.pos = [sid,idx] should be: r.pos = r.pos + [sid,idx].
When you use the + operator on 2 collections, you end up with a single collection that merges the contents of the 2 original collections. So, if r.pos starts out as [1, 2], then r.pos + [3, 4] will produce: [1, 2, 3, 4].
Therefore, the article does not have an "array of arrays" problem: every [sid,idx] pair is appended to one flat collection.
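To make the flat layout concrete outside Cypher, here is a minimal Ruby sketch. Ruby arrays stand in for the r.pos collection, the helper names add_position and position_pairs are made up for illustration, and the values are arbitrary. Since every entry contributes exactly two elements, the pairs can be read back two at a time.

```ruby
# Ruby arrays stand in for the r.pos collection; values are illustrative only.

# Append one [sid, idx] entry the way r.pos + [sid, idx] does: flat concatenation.
def add_position(pos, sid, idx)
  pos + [sid, idx]
end

# Recover the [sid, idx] pairs by reading the flat collection two elements at a time.
def position_pairs(pos)
  pos.each_slice(2).to_a
end

pos = add_position([1, 2], 3, 4)
p pos                  # the flat collection: [1, 2, 3, 4]
p position_pairs(pos)  # the pairs: [[1, 2], [3, 4]]
```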

Related

Auto increment id Neo4j to retrieve elements in insert order

Recently, I have been experimenting with Neo4j. I like the idea, but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
And to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
Note that these elements are not returned in insertion order, even though I order them by the id function.
In fact, the ids of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto-incremented id, but its mechanism seems a little strange. How do I return items in insertion order? I thought about creating a timestamp attribute for each node, but I don't think it's the best choice.
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequences as you need, just by creating Sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
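As a rough illustration of what the query above does, here is an in-memory Ruby sketch. It is not Neo4j code: the SequenceStore class and its next_value method are hypothetical names, a Hash stands in for the Sequence nodes, and concurrency (which a real database has to handle) is ignored.

```ruby
# In-memory stand-in for the Sequence nodes: scope name => last value used.
class SequenceStore
  def initialize
    @values = {}
  end

  # Mirrors MERGE + COALESCE(n.Value, 0) + 1: the scope is created on first use,
  # then the counter is incremented and the new value returned.
  def next_value(scope)
    @values[scope] = @values.fetch(scope, 0) + 1
  end
end

store = SequenceStore.new
people = %w[Marc John Paul].map do |name|
  { name: name, unique_id: store.next_value("Person") }
end

# Ordering by the stored sequence number reproduces the insertion order,
# regardless of what internal id the database assigned.
p people.shuffle.sort_by { |person| person[:unique_id] }.map { |person| person[:name] }
```

Each scope counts independently, so a "Product" sequence would start at 1 even after several "Person" values have been handed out.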
The Neo4j id is maintained internally, and your business code should not depend on it.
Generally it is auto-incremented, but after a delete operation Neo4j may reuse the deleted id, according to the reuse policy of the Neo4j server.

table size difference. are both examples identical?

tNum={[2]=true , [3]=true,[4]=true, [5]=true ,[6]=true }
#tNum-->0
tNum={}
tNum[2]=true
tNum[3]=true
tNum[4]=true
tNum[5]=true
tNum[6]=true
#tNum-->6
why such a difference in size?
are both examples identical?
Your two tables are semantically identical, but using # on them is ambiguous. Both 0 and 6 are correct lengths. Here's an abridged version of the docs:
The length operator applied on a table returns a border in that table. A border in a table t is any natural number that satisfies the following condition:
(border == 0 or t[border] ~= nil) and t[border + 1] == nil
A table with exactly one border is called a sequence.
When t is not a sequence, #t can return any of its borders. (The exact one depends on details of the internal representation of the table, which in turn can depend on how the table was populated and the memory addresses of its non-numeric keys.)
This is an example of undefined behavior (UB). (That may not be the right word, because the behavior is partially defined. UB in Lua can't launch nuclear weapons, as it can in C.) Undefined behavior is important, because it gives the devs the freedom to choose the fastest possible algorithm without worrying about what happens when a user violates their assumptions.
To find a length, Lua makes, at most, log n guesses instead of looking at every element to find an unambiguous length. For large arrays, this speeds things up a lot.
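To see why both 0 and 6 are valid answers, the border condition quoted above can be checked by brute force. The following is a minimal Ruby sketch that models a Lua table as a Hash with positive integer keys (a missing key plays the role of nil). The borders helper is a made-up name, and this only models the definition; it is not how Lua actually computes #t.

```ruby
# Model a Lua table as a Hash with positive integer keys; a missing key acts as nil.
# Returns every n in 0..max_key that satisfies the border condition:
#   (n == 0 or t[n] ~= nil) and t[n + 1] == nil
def borders(table)
  max_key = table.keys.max || 0
  (0..max_key).select do |n|
    (n.zero? || !table[n].nil?) && table[n + 1].nil?
  end
end

t_num = { 2 => true, 3 => true, 4 => true, 5 => true, 6 => true }
p borders(t_num)      # => [0, 6]   two borders, so #tNum may be either

sequence = { 1 => false, 2 => true, 3 => true, 4 => true, 5 => true, 6 => true }
p borders(sequence)   # => [6]      exactly one border, so the length is unambiguous
```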
The issue is that when you define a table as starting at index [2], the length operator breaks because it assumes that tables start at index [1].
The following code works as intended:
tNum = {[1]=false, [2]=true, [3]=true, [4]=true, [5]=true, [6]=true}
#tNum => 6
The odd behaviour is caused because when you initialize an array with tNum={} it initializes by assigning every index to nil, and the first index is [1] (It doesn't actually initialize every value to nil, but it's easier to explain that way).
Conversely, when you initialize an array with tNum={[2]=true} you are explicitly telling the array that tNum[1] does not exist and the array begins at index 2. The length calculation breaks when you do this.
For a more thorough explanation, see this section of the lua wiki near the bottom where it explains:
For those that really want their arrays starting at 0, it is not difficult to write the following:
days = {[0]="Sunday", "Monday", "Tuesday", "Wednesday",
"Thursday", "Friday", "Saturday"}
Now, the first value, "Sunday", is at index 0. That zero does not affect the other fields, but "Monday" naturally goes to index 1, because it is the first list value in the constructor; the other values follow it. Despite this facility, I do not recommend the use of arrays starting at 0 in Lua. Remember that most functions assume that arrays start at index 1, and therefore will not handle such arrays correctly.
The Length operator assumes your array will begin at index [1], and since it does not, it doesn't work correctly.
I hope this was helpful, good luck with your code!

Rails – assign an order number to each record

So I am importing passages from a book into my application. I am giving all the passages in a given book the class Passage. i.e. Passage.all
I do have many books so I also have a class Book. Therefore, when I am finding all the passages from one book I call:
Passage.where(book_id: self.book_id)
When I use the where method, does it preserve the "natural order" that Passage.all would generally return? If not, I could change the code to:
Passage.where(book_id: self.book_id).order("created_at ASC")
Anyway, I then proceed to write this code:
a = Passage.where(book_id: self.book_id)
b = a.index(self)+1
self.passage_number = b
[first line: returns all the passages in the book]
[second line: returns their number in the array + 1 to account for the 0 starting value thing (pardon the colloquialism)]
[third line: assigns that index value to the passage number]
Ultimately, I am trying to compute passage numbers, without having to hard code them.
SO WHAT'S MY ISSUE? Right now I am getting three passage #3's, and two passage #4's. My last passage is this:
Passage.last.passage_number = 217
Passage.where(book_id: 5).count = 241
It is skipping numbers and incorrectly indexing, so I think I need to code a better method! What's a better way to index an array in this context?
There is no such thing as "natural order": without an order clause Passage.all may return things in any order the database wants (which could depend on things like location of items on disk, query plan etc).
The first and last methods are special in that they order by id if your relation does not already have an order applied to it.
If you need things in a specific order then add an order clause.
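Here is a minimal plain-Ruby sketch of numbering passages from an explicit order. The Passage struct is only a stand-in for the model, number_passages is a hypothetical helper, and using id as a tie-breaker for equal created_at values is my assumption. In Rails the same ordering would come from an order clause on the relation, for example order(:created_at, :id).

```ruby
# Plain-Ruby stand-in for the Passage model; only the fields used here.
Passage = Struct.new(:id, :book_id, :created_at, :passage_number)

# Number the passages of one book starting from 1, using an explicit,
# deterministic order: created_at first, id as a tie-breaker.
def number_passages(passages, book_id)
  in_book = passages.select { |passage| passage.book_id == book_id }
  ordered = in_book.sort_by { |passage| [passage.created_at, passage.id] }
  ordered.each.with_index(1) { |passage, n| passage.passage_number = n }
  ordered
end

passages = [
  Passage.new(12, 5, Time.at(300)),
  Passage.new(10, 5, Time.at(100)),
  Passage.new(11, 5, Time.at(100)),  # same timestamp as id 10; id breaks the tie
  Passage.new(13, 6, Time.at(50))    # different book; left untouched
]

number_passages(passages, 5).each do |passage|
  puts "passage id #{passage.id} -> number #{passage.passage_number}"
end
```

Numbering the whole book in one pass also avoids calling index on an unordered result for each record, which is where the duplicate and skipped numbers can come from.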

Suppress delimiters in Ruby's String#split

I'm importing data from old spreadsheets into a database using rails.
I have one column that contains a list on each row. The lists are sometimes formatted as
first, second
and other times like this
third and fourth
So I wanted to split up this string into an array, delimiting either with a comma or with the word "and". I tried
my_string.split /\s?(\,|and)\s?/
Unfortunately, as the docs say:
If pattern contains groups, the respective matches will be returned in the array as well.
Which means that I get back an array that looks like
[
[0] "first"
[1] ", "
[2] "second"
]
Obviously only the zeroth and second elements are useful to me. What do you recommend as the neatest way of achieving what I'm trying to do?
You can instruct the regexp to not capture the group using ?:.
my_string.split(/\s?(?:\,|and)\s?/)
# => ["first", "second"]
As a side note, regarding "into a database using rails":
Please note this has nothing to do with Rails; String#split is plain Ruby.
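One thing to watch with the pattern above: "and" is also matched inside longer words. The sketch below compares the answer's pattern with a stricter variant that requires "and" to be a whole word. The constant names and the split_list helper are my own, and the stricter pattern is a suggestion rather than part of the original answer.

```ruby
# The pattern from the answer: non-capturing group, optional single whitespace.
ANSWER_PATTERN = /\s?(?:\,|and)\s?/

# Hypothetical stricter variant: "and" must be a whole word (\b), any amount of whitespace.
WHOLE_WORD_PATTERN = /\s*(?:,|\band\b)\s*/

# Split a list cell on commas or the word "and".
def split_list(text, pattern = WHOLE_WORD_PATTERN)
  text.split(pattern)
end

p split_list("first, second")      # => ["first", "second"]
p split_list("third and fourth")   # => ["third", "fourth"]
p split_list("sand, candy")        # => ["sand", "candy"]

# With the original pattern, the "and" inside "sand" and "candy" acts as a delimiter.
p split_list("sand, candy", ANSWER_PATTERN)
```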

How to sort a list of 1million records by the first letter of the title

I have a table with 1 million+ records that contain names. I would like to be able to sort the list by the first letter in the name.
.. ABCDEFGHIJKLMNOPQRSTUVWXYZ
What is the most efficient way to set up the db table to allow searching by the first character of the table.name field?
The best idea right now is to add an extra field which stores the first character of the name as an observer, index that field and then sort by that field. Problem is it's no longer necessarily alphabetical.
Any suggestions?
You said in a comment:
so lets ignore the first letter part. How can I all records that start with A? All A's no B...z ? Thanks – AnApprentice Feb 21 at 15:30
I assume you meant "How can I RETURN all records..."
This is the answer:
select * from t
where substr(name, 1, 1) = 'A'
I agree with the questions above as to why you would want to do this -- a regular index on the whole field is functionally equivalent. PostgreSQL (with some new ones in v. 9) has some rather powerful indexing capabilities for special cases which you might want to read about here http://www.postgresql.org/docs/9.1/interactive/sql-createindex.html
