Order by numbers last - neo4j

I have a neo4j database which has nearly 500k CK_ITEM nodes defined as follows:
CK_ITEM: {
  id (String),
  name (String),
  description (String)
}
Suppose we have this sample data:
+--------+----+-----------------+
| name | id | description |
+--------+----+-----------------+
| Mark | 1 | A lot of things |
| Gerald | 9 | Coff2e |
| Carl | 2 | 1 mango |
| James | 3 | 5 lemons |
| Edward | 4 | Coffee |
+--------+----+-----------------+
I need to order the data by description ASC. This is my query:
MATCH (n:CK_ITEM)
RETURN n
ORDER BY n.description ASC
This results in:
+--------+----+-----------------+
| name | id | description |
+--------+----+-----------------+
| Carl | 2 | 1 mango | <-- '1' < '5'
| James | 3 | 5 lemons | <-- '5' < 'A'
| Mark | 1 | A lot of things | <-- 'A' < 'C'
| Gerald | 9 | Coff2e | <-- '2' < 'e'
| Edward | 4 | Coffee |
+--------+----+-----------------+
Now, the customer has asked me to keep the results in ascending order, but with numbers sorted last.
Basically he wants the results to be:
+--------+----+-----------------+
| name | id | description |
+--------+----+-----------------+
| Mark | 1 | A lot of things |
| Edward | 4 | Coffee |
| Gerald | 9 | Coff2e | <-- Coff2e after Coffee
| Carl | 2 | 1 mango | <-- 1 and 5 ASC after everything
| James | 3 | 5 lemons |
+--------+----+-----------------+
Translated to a pseudo-query, it would be something like this:
MATCH CK_ITEM ORDER BY letters(description) ASC numbers(description) ASC
Is it possible to have this kind of sorting (letters first ascending, numbers last ascending) in a single query? How?

The following is a Cypher query that will perform a sort where digits come last (at every character position).
NOTE: THIS APPROACH IS NOT EFFICIENT, but it is presented as an example of how to do this in Cypher if you absolutely need to.
The query splits every description value into single-character strings and tests each character to see whether it is a digit. It then builds a new string, character by character, replacing every digit with a corresponding UTF-16 character in the hex range FFF6 to FFFF (these are the highest possible UTF-16 code points, and your raw data is unlikely to already be using them), and uses that new string as the sort key.
WITH {`0`:'\uFFF6', `1`:'\uFFF7', `2`:'\uFFF8', `3`:'\uFFF9', `4`:'\uFFFA', `5`:'\uFFFB', `6`:'\uFFFC', `7`:'\uFFFD', `8`:'\uFFFE', `9`:'\uFFFF'} AS big
MATCH (n:CK_ITEM)
WITH n, SPLIT(n.description, '') AS chars, big
RETURN n
ORDER BY
  REDUCE(s = '', i IN RANGE(0, LENGTH(chars)-1) |
    CASE WHEN '9' >= chars[i] >= '0'
      THEN s + big[chars[i]]
      ELSE s + chars[i]
    END)
Against the sample data this yields A lot of things, Coffee, Coff2e, 1 mango, 5 lemons, which matches the desired output.

You can order this way:
MATCH (n:CK_ITEM)
RETURN n
ORDER BY
  substring(n.description, 0, 1) IN ['0','1','2','3','4','5','6','7','8','9'], n.description
The first sort key is a boolean that is false for descriptions starting with anything other than a digit and true otherwise, and false sorts before true, so digit-led descriptions go last; the second key then orders each group by description. Note that this only inspects the first character, so it will not place Coff2e after Coffee.

You can use a regular expression and UNION:
MATCH (n:CK_ITEM) WHERE NOT n.description =~ '[0-9].*'
RETURN n
ORDER BY
n.description ASC
UNION
MATCH (n:CK_ITEM) WHERE n.description =~ '[0-9].*'
RETURN n
ORDER BY
n.description ASC

Related

Influx: doing math the same fields in different groups

I have InfluxDB measurement currently set up with following "schema":
+----+-------------+-----------+
| ts | cost(field) | type(tag) |
+----+-------------+-----------+
| 1 | 10 | 'a' |
| 1 | 20 | 'b' |
| 2 | 12 | 'a' |
| 2 | 18 | 'b' |
| 2 | 22 | 'c' |
+----+-------------+-----------+
I am trying to write a query that will group my table by timestamp and get the delta between the field values of two different tags. If I want the delta between tag 'a' and tag 'b', it should give me the following result (please note that I ignore tag 'c'):
+----+-----------+------------+
| ts | type(tag) | delta_cost |
+----+-----------+------------+
| 1 | 'a' | 10 |
| 2 | 'b' | 6 |
+----+-----------+------------+
Is it something Influx can do or am I using the wrong tool?
Just managed to answer my own question. While one of the obvious ways would be to perform a self-join, Influx does not support joins anymore. We can, however, use nested selects in the following format:
SELECT MEAN(cost_a) - MEAN(cost_b) AS delta_cost
FROM
  (SELECT cost AS cost_a FROM tablename WHERE "type" = 'a'),
  (SELECT cost AS cost_b FROM tablename WHERE "type" = 'b')
GROUP BY time(60s)
Since I am getting my data every 60 seconds anyway, and I have a guarantee of just one point per tag per 60 seconds, I can use GROUP BY and take the MEAN without any problems.

Query completed with an empty output

https://docs.google.com/spreadsheets/d/1033hNIUutMjjdwiZZ40u59Q8DvxBXYr7pcWyRRHAdXk
That's a link to the file where it is not working! If you open it, go to the sheet named "My Query Stinks".
The sheet called deposits has data like this in columns A (date), B (description), and C (amount):
+---+-----------+-----------------+---------+
| | A | B | C |
+---+-----------+-----------------+---------+
| 1 | 6/29/2016 | 1000000044 | 480 |
| 2 | 6/24/2016 | 1000000045 | 359.61 |
| 3 | 8/8/2016 | 201631212301237 | 11.11 |
+---+-----------+-----------------+---------+
The sheet "My Query Stinks" has data in columns A (check number), B (failing query) and C (amount):
+---+-----------------+------+--------+
| | A | B | C |
+---+-----------------+------+--------+
| 1 | 1000000044 | #N/A | 480 |
| 2 | 1000000045 | #N/A | 359.61 |
| 3 | 201631212301237 | #N/A | 11.11 |
+---+-----------------+------+--------+
In Column B on My Query Stinks, I want to enter a query. Here's what I'm trying:
=query(Deposits!A:C,"select A where A =" & A2)
For some reason, it returns "#N/A Error Query completed with an empty output." I want it to find that 1000000044 (the check number in column A) matches 1000000044 over on Deposits and return the date.
Try
=query(Deposits!A:C,"select A where B ='" &A2&"'")
Explanation
Values like 1000000044 in Column B of the Deposit sheet and Column A of My Query Stinks sheets are set as text (string) values, so they should be enclosed on single quotes (apostrophes) otherwise QUERY think this values are numbers or variable names.
Try this:
=query(Deposits!A:C,"select A where B = '"&A2&"' LIMIT 1")
You'll need LIMIT 1 as you have multiple deposits for the same value in your second column.
Another solution for this problem could be to replace '=' with 'contains':
=query(Deposits!A:C,"select A where B contains '" &A2&"'")
Simple, but this error cost me half a morning.

find rows where there is another row with an opposite value in the table

I'm trying to find an efficient way to solve this problem:
I need to find all rows in a table where there is another row with the opposite column value.
For example, I have transactions with columns id and amount:
| id | amount |
|----|--------|
| 1 | 1 |
| 2 | -1 |
| 3 | 2 |
| 4 | -2 |
| 5 | 3 |
| 6 | 4 |
| 7 | 5 |
| 8 | 6 |
The query should return only the first 4 rows:
| id | amount |
|----|--------|
| 1 | 1 |
| 2 | -1 |
| 3 | 2 |
| 4 | -2 |
My current solution is terribly inefficient, as I am going through thousands of transactions:
transactions.find_each do |transaction|
  unless transactions.where("amount = #{transaction.amount * -1}").count > 0
    transactions = transactions.where.not(amount: transaction.amount).order("amount DESC")
  end
end
transactions
Are there any built in Rails or Postgresql functions that could help with this?
Use the following query:
SELECT DISTINCT t1.*
FROM transactions t1
INNER JOIN transactions t2 ON t1.amount = t2.amount * -1;
SELECT * FROM the_table t
WHERE EXISTS (
  SELECT * FROM the_table x
  WHERE x.amount = -1 * t.amount
  -- AND x.amount > t.amount
);
Consider storing an indexed absolute-value column, then querying for the positive value. Postgres has an absolute value function, but I think the beauty of ActiveRecord is that Arel abstracts away the SQL; DB-specific SQL can be a pain if you switch databases later.
There is a function called ABS which returns a value irrespective of its sign. In my example, data is the table name:
SELECT id, amount FROM data
WHERE ABS(amount) IN (SELECT ABS(amount) FROM data GROUP BY ABS(amount) HAVING COUNT(DISTINCT amount) > 1);

Create on NOT MATCH command for Neo4j's CQL?

I have a non-unique node (:Neighborhood) that uniquely appears [:IN] a (:City) node. I would like to create a new neighborhood node and establish its relationship ONLY if that neighborhood does not already exist in that city. There can be multiple neighborhoods with the same name, but each neighborhood must appear uniquely in its city.
Following the advice from Gil's answer here: Return node if relationship is not present, how can I do something like:
MATCH a WHERE NOT (a:Neighborhood {name : line.Neighborhood})-[r:IN]->(c:City {name : line.City})
ON MATCH SET (a)-[r]-(c)
So then it would only create a new neighborhood node if it doesn't already exist in the city.
UPDATE: I upgraded and profiled it and still can't take advantage of any optimizations...
PROFILE LOAD CSV WITH HEADERS FROM "file://THEFILE" as line
WITH line LIMIT 0
MATCH (c:City { name : line.City})
MERGE (n:Neighborhood {name : toInt(line.Neighborhood)})-[:IN]->(c)
;
+--------------+------+--------+---------------------------+------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+--------------+------+--------+---------------------------+------------------------------+
| EmptyResult | 0 | 0 | | |
| UpdateGraph | 5 | 16 | anon[340], b, neighborhood, line | MergePattern |
| SchemaIndex | 5 | 10 | b, line | line.City; :City(name) |
| ColumnFilter | 5 | 0 | line | keep columns line |
| Filter | 5 | 0 | anon[216], line | anon[216] |
| Extract | 5 | 0 | anon[216], line | anon[216] |
| Slice | 5 | 0 | line | { AUTOINT0} |
| LoadCSV | 5 | 0 | line | |
+--------------+------+--------+---------------------------+------------------------------+
I think you could simply use MERGE for this:
MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
If you haven't already imported all of the cities, you can create those with MERGE as well:
MERGE (c:City {name: line.City})
MERGE (c)<-[:IN]-(a:Neighborhood {name: line.Neighborhood})
But beware of the Eager operator:
http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
In short: You should run your LOAD CSV (I assume that's what you're doing here) twice, once to load the cities and once to load the neighborhoods.
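A minimal sketch of that two-pass load, reusing the placeholder file URL and the City/Neighborhood columns from the query above (adjust both to your actual CSV); treat it as an illustration rather than a drop-in import script:
// Pass 1: create (or match) each city on its own
LOAD CSV WITH HEADERS FROM "file://THEFILE" AS line
MERGE (:City {name: line.City});
// Pass 2: create each neighborhood and attach it to its city
LOAD CSV WITH HEADERS FROM "file://THEFILE" AS line
MATCH (c:City {name: line.City})
MERGE (c)<-[:IN]-(:Neighborhood {name: line.Neighborhood});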

Cypher / Should I use the WITH clause to pass values to next MATCH?

Using Neo4j 2.1.x, let's suppose this query, which returns user 123's friends who bought a Car:
MATCH (u1:User { id: "123" })-[:KNOWS]-(friend)
MATCH (friend)-[:BUYS]->(c:Car)
RETURN friend
In this article, it is written regarding the WITH clause:
So, how does it work? Well, with is basically just a stream, as lazy
as it can be (as lazy as return can be), passing results on to the
next query.
So it seems I should transform the query like this:
MATCH (u1:User("123"))-[:KNOWS]-(friend)
WITH friend
MATCH (friend)-[:BUYS]->(c:Car)
RETURN friend
Should I? Or does the current version of Cypher already handle MATCH chaining while passing values through them?
The more precise the starting point you give at the beginning of your query, the more efficient it will be.
Your first match is not very precise; it will use the traversal matcher to expand all possible relationships.
Take the following neo4j console example: http://console.neo4j.org/r/jsx71g
Here is your first query as it looks in the example:
MATCH (n:User { login: 'nash99' })-[:KNOWS]->(friend)
RETURN count(*)
You can see the number of db hits in its execution plan:
Execution Plan
ColumnFilter
|
+EagerAggregation
|
+Filter
|
+TraversalMatcher
+------------------+------+--------+-------------+-----------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+------------------+------+--------+-------------+-----------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(*) |
| EagerAggregation | 1 | 0 | | |
| Filter | 8 | 320 | | Property(n,login(2)) == { AUTOSTRING0} |
| TraversalMatcher | 160 | 201 | | friend, UNNAMED32, friend |
+------------------+------+--------+-------------+-----------------------------------------+
Total database accesses: 521
If you use a more precise starting point, you're the king of the road from that point on. Look at the execution plan below and see the difference in db hits.
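For illustration, a sketch of such a query (assuming the same console data set; it anchors on the filtered User node first, like the final query further down, and is not necessarily the exact statement that produced this plan):
MATCH (n:User { login: 'nash99' })
WITH n
MATCH (n)-[:KNOWS]->(friend)
RETURN count(*)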
Execution Plan
ColumnFilter
|
+EagerAggregation
|
+SimplePatternMatcher
|
+Filter
|
+NodeByLabel
+----------------------+------+--------+------------------------+-----------------------------------------+
| Operator | Rows | DbHits | Identifiers | Other |
+----------------------+------+--------+------------------------+-----------------------------------------+
| ColumnFilter | 1 | 0 | | keep columns count(*) |
| EagerAggregation | 1 | 0 | | |
| SimplePatternMatcher | 8 | 0 | n, friend, UNNAMED51 | |
| Filter | 1 | 40 | | Property(n,login(2)) == { AUTOSTRING0} |
| NodeByLabel | 20 | 21 | n, n | :User |
+----------------------+------+--------+------------------------+-----------------------------------------+
Total database accesses: 61
So to finish, I would write your query like this:
MATCH (n:User { login: 'nash99' })
WITH n
MATCH (n)-[:KNOWS]->(friend)-[:BUYS]->(c:Car)
RETURN friend
You can also specify that a friend cannot be the same node as the user:
MATCH (n:User { login: 'nash99' })
WITH n
MATCH (n)-[:KNOWS]->(friend)-[:BUYS]->(c:Car)
WHERE NOT friend.id = n.id
RETURN friend
Note that in terms of db hits there is no difference between the above query and the following:
MATCH (n:User { login: 'nash99' })
WITH n
MATCH (n)-[:KNOWS]->(friend)
WITH friend
MATCH (friend)-[:BUYS]->(c:Car)
RETURN friend
I recommend that you use the neo4j console to look at the result details showing you the information above.
If you need to quickly prototype a graph for testing, you can use Graphgen, export the graph as Cypher statements, and load those statements in the neo4j console.
Here is the link to the Graphgen generation I used for the console: http://graphgen.neoxygen.io/?graph=29l9XJ0HxJ2pyQ
Chris
