I've searched a lot but can't find any documentation or an example for GROUP BY queries.
For now, iterating over an RLMResults in a loop can solve my problem, but is there a more elegant way to do it?
It was discussed in the Java group and also applies to the iOS implementation.
Nov 2014
GROUP BY is only interesting in combination with some kind of aggregate function. We have the most common ones directly on RealmResults, like sum(), average(), max() and min(). Are these what you are looking for, or do you have something else in mind?
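For illustration, a minimal sketch of those aggregates in the Realm Java binding (the Person model class with a numeric age field is an assumption made up for this example, not something from the question):

// Assumes io.realm.Realm / io.realm.RealmResults and a RealmObject
// subclass `Person` with a numeric `age` field.
Realm realm = Realm.getDefaultInstance();
RealmResults<Person> people = realm.where(Person.class).findAll();

Number totalAge   = people.sum("age");     // sum of the field over the matched objects
double averageAge = people.average("age"); // arithmetic mean
Number oldest     = people.max("age");     // maximum value of the field
Number youngest   = people.min("age");     // minimum value of the field

realm.close();

On iOS, RLMResults exposes the analogous sumOfProperty:, averageOfProperty:, maxOfProperty: and minOfProperty: methods.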
We actually implemented DISTINCT but found some bugs that need to be fixed before it can be released. So hopefully really soon :)
The distinct operation is not yet available, but there is an easy workaround for it.
Oct 2015
DISTINCT has made progress in core, but isn't fully implemented yet and hasn't been exposed in the Java binding or any other. We're working on that and will make sure to report back here once we have something to test out.
Regarding examples, the official docs have one:
The aggregate expressions @count, @min, @max, @sum and @avg are
supported on RLMArray properties, e.g.
[Company objectsWhere:@"employees.@count > 5"] to find all companies with more
than five employees.
I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to do two-way operations, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had was to use the DOI of a study as a unique identifier stored in a relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a given study.
So, since it is impossible to make a link between a study and a relationship, I had the idea of making two links, one at each end of the relationship. The obvious con is that this does not carry information about the relationship itself, like "support" or "contradict".
So, I came to the conclusion that I need a node for the hypothesis:
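Roughly, in Cypher, that design would look something like the sketch below (the Statement, Concept and Study labels, the relationship types and the example DOIs are purely illustrative names, not an established schema):

// The statement becomes its own node, linked to the concepts it relates
// and to the studies that support or contradict it.
CREATE (f:Concept {name: "Fibromyalgia"})
CREATE (m:Concept {name: "Microglial hyperactivity"})
CREATE (s:Statement {text: "A possible cause of Fibromyalgia is Microglial hyperactivity"})
CREATE (s)-[:EFFECT]->(f), (s)-[:CAUSE]->(m)
CREATE (st:Study {doi: "10.1000/example-1"})-[:SUPPORTS]->(s)
CREATE (st2:Study {doi: "10.1000/example-2"})-[:CONTRADICTS]->(s);

// Find all statements supported by a given study
MATCH (:Study {doi: "10.1000/example-1"})-[:SUPPORTS]->(s:Statement) RETURN s;

// Find all studies supporting a given statement
MATCH (st:Study)-[:SUPPORTS]->(:Statement {text: "A possible cause of Fibromyalgia is Microglial hyperactivity"}) RETURN st;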
However, it overloads the graph, and we are no longer in the classical node-[relationship]->node design that makes property graphs so easy to understand.
Using RDF, it is possible to add properties to relationships using subgraphs; however, that takes us into semantic graphs and quad stores, which are more complex tools.
So I'm wondering: is there a "correct" design pattern in Neo4j for this kind of need that I may not have thought of?
Thanks
Based on your requirements, I think putting support_study as a property of the edge will do the job:
We could then run the following queries:
Find all statements supported by a study
MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement
Given the statement "foo" is caused by "bar":
MATCH (v:disease{name: "foo"})-[e:has_cause]->(v1:symptom{name: "bar"})
RETURN DISTINCT e.support_study;
Note that this is mostly based on NebulaGraph, where:
It speaks Cypher DQL (together with nGQL)
It supports properties on edges
It uses a 4-tuple (src, dst, edge_type, rank) rather than a key to distinguish an edge, where rank is a unique design that enables multiple has_cause edge instances between one disease -> symptom pair; you could put a hash of the DOI or another number in the rank field (or omit it, of course, and it will default to 0)
It's distributed and open source (Apache 2.0)
Note:
In NebulaGraph, indexes should be created on has_cause(support_study) and disease(name); ref: https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But I think the same applies to Neo4j, too :)
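In Neo4j the equivalent would be schema indexes on the same label/property and relationship-type/property combinations. A rough sketch, assuming Neo4j 4.3 or later (relationship property indexes are not available before that, and the index names are made up):

CREATE INDEX disease_name IF NOT EXISTS FOR (d:disease) ON (d.name);
CREATE INDEX has_cause_support_study IF NOT EXISTS FOR ()-[e:has_cause]-() ON (e.support_study);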
I've been running a production website on Azure SQL for 4 years.
With the help of the 'Top Slow Request' query from alexsorokoletov on GitHub, I have found 1 super slow query according to Azure query stats.
The one on top is the one that uses a lot of CPU.
When looking at the LINQ query and the execution plans / live stats, I can't find the bottleneck yet.
And the live stats
The join from results to project is not direct; there is a projectsession table in between, not visible in the query but perhaps added under the hood by Entity Framework.
Might I be affected by parameter sniffing? Can I reset a plan hash? Maybe the optimized query plan was created back in 2014, and now the result table has about 4 million rows and the plan is far from optimal?
If I run this query in Management Studio it's very fast!
Is it just the stats that are wrong?
Regards
Vincent - The Netherlands.
I would suggest you try adding OPTION (HASH JOIN) at the end of the query, if possible. Once you start getting into large arity, a loops join is not particularly efficient. That would prove out whether there is a more efficient plan (likely yes).
Without seeing more of the details (your screenshots are helpful but cut off whether auto-param or forced parameterization has kicked in and auto-parameterized your query), it is hard to confirm/deny this explicitly. You can read more about parameter sniffing in a blog post I wrote a bit longer ago than I care to admit ;) :
https://blogs.msdn.microsoft.com/queryoptteam/2006/03/31/i-smell-a-parameter/
Ultimately, if you update stats, run DBCC FREEPROCCACHE, or otherwise cause this plan to recompile, your odds of getting a faster plan in the cache are higher if this particular query + parameter values are executed often enough to be sniffed during plan compilation. Your other option is to add an OPTIMIZE FOR UNKNOWN hint, which disables sniffing and directs the optimizer to use an average value for the frequency of any filters over parameter values. This will likely encourage more hash or merge joins instead of loops joins, since the cardinality estimates of the operators in the tree will likely increase.
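For concreteness, a rough T-SQL sketch of those options (the table, column and parameter names are invented for illustration and will not match your schema):

-- 1) Test whether a hash join plan beats the current loops join plan
DECLARE @projectId int = 42;
SELECT r.Id, r.Score
FROM dbo.Results AS r
JOIN dbo.ProjectSessions AS ps ON ps.Id = r.ProjectSessionId
JOIN dbo.Projects AS p ON p.Id = ps.ProjectId
WHERE p.Id = @projectId
OPTION (HASH JOIN);
-- 2) Or disable sniffing for this statement by appending OPTION (OPTIMIZE FOR UNKNOWN) instead.

-- 3) Refresh statistics and clear the plan cache so the query recompiles
UPDATE STATISTICS dbo.Results;
DBCC FREEPROCCACHE;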
First of all, I bet there is an answer to this question somewhere in the docs, but since the 'Manual: Labels and Indexes' link here gives me a 404 error, I'm going to ask you anyway.
Is it possible to create an index on some label and specify it as an automatic one (just like the legacy indexes I'm currently using, but for labels)?
If someone from the Neo4j team is reading this post, please let me know if I'm looking for the documentation in the right place, 'cause I can't find anything more or less informative on labels and indexes (except a couple of posts on Michael Hunger's blog and maybe some presentations, which is obviously not enough).
This is a more technical one: is it possible to find an item in the index by a regex? Suppose I have a node with property 'n' -> '/a/b/c', and another node with 'n' -> '/a/*/c'. Can I somehow match them?
I don't work for Neo4j but I'll answer anyway.
All label indexing is automatic. Once you've created the index it maintains itself, possibly with minimal delay.
The manual for the last stable release can always be found here. The chapter on indexing for the embedded Java API is here.
You cannot use regexp with label indices yet. It's said to be on the agenda, along with index support for array lookups, i.e. what in Cypher would be
MATCH (a:MyLabel) WHERE a.value IN ['val1', 'val2']
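For reference, creating a schema index on a label looks like this in Cypher (2.x-era syntax; MyLabel and value are placeholder names matching the example above):

CREATE INDEX ON :MyLabel(value);

// Once it exists, equality lookups like this use it automatically:
MATCH (a:MyLabel) WHERE a.value = 'val1' RETURN a;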
I have been trying to find a query builder for Neo4j's query language Cypher, ideally one with a fluent API. I did not find much, and decided to invest some time in building one myself.
The result so far is a fluent API query builder for the Cypher 1.9 spec.
I wanted to use StackOverflow to kick off a discussion and see what the thoughts are, before I release the code.
Here is a demo query that you would want to send off to Neo4j using Cypher.
Show me all people who John knows who know software engineers at Google (Google company code assumed to be 12345).
The relationship strength between John and the people who connect him to Google employees should be at least 3 (assuming a range from 1-5).
Return all of John's connections and the people they know at Google, including the relationships between those people.
Sort the results by name of John's connections in ascending order and then by relationship strength in descending order.
Using Fluent-Cypher:
Cypher
.on(Node.named("john").with(Index.named("PERSON_NAMES").match(Key.named("name").is("John"))))
.on(Node.named("google").with(Id.is(12345)))
.match(Connection.named("rel1").andType("KNOWS").between("john").and("middle"))
.match(Connection.named("rel2").andType("KNOWS").between("middle").and("googleEmployee"))
.match(Connection.withType("WORKS_AT").from("googleEmployee").to("google"))
.where(Are.allOfTheseTrue(Column.named("rel1.STRENGTH").isGreaterThanOrEqualTo(3)
.and(Column.named("googleEmployee.TITLE").isEqualTo("Software Engineer"))))
.returns(Columns.named("rel1", "middle", "rel2", "googleEmployee"))
.orderBy(Asc.column("middle.NAME"), Desc.column("rel1.STRENGTH"))
which yields the following query:
START john=node:PERSON_NAMES(name='John'),google=node(12345) MATCH john-[rel1:KNOWS]-middle,middle-[rel2:KNOWS]-googleEmployee,googleEmployee-[:WORKS_AT]->google WHERE ((rel1.STRENGTH >= '3' AND googleEmployee.TITLE = 'Software Engineer')) RETURN rel1,middle,rel2,googleEmployee ORDER BY middle.NAME ASC,rel1.STRENGTH DESC
I agree that you should build this with an eye towards Cypher 2.0. As of 2.0, it's very important that WHERE clauses are matched up with the correct START, (OPTIONAL) MATCH, and WITH clauses, making the design of a fluent API a bit more challenging.
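For comparison, a rough sketch of roughly the same query in Cypher 2.0 style, where labels replace the legacy index lookup and WHERE attaches to the MATCH (the Person label and name property are assumptions, not output of the builder):

MATCH (john:Person {name: 'John'})-[rel1:KNOWS]-(middle),
      (middle)-[rel2:KNOWS]-(googleEmployee),
      (googleEmployee)-[:WORKS_AT]->(google)
WHERE id(google) = 12345
  AND rel1.STRENGTH >= 3
  AND googleEmployee.TITLE = 'Software Engineer'
RETURN rel1, middle, rel2, googleEmployee
ORDER BY middle.NAME ASC, rel1.STRENGTH DESC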
I like your first example where you just use the text to describe the query. The second option, to tell you the truth, doesn't look so much easier to me than constructing the Cypher query itself. The language is quite easy to use and is well documented. Adding another layer of abstraction would only increase complexity. However, if you find a way of translating this natural language request into a Cypher request, that'd be cool :)
Also, why not start working directly with Cypher 2.0?
Finally, check this out: http://github.com/noduslabs/infranodus. I'm working on a similar problem, but for adding nodes into the database, not querying them. I chose to use #hashtags to make it easier for people to understand how their queries should be structured (as we already use them). So in your case it could become something like
#show-all #people who #John :knows who :know #software-engineers :at #Google.
#relationship-strength between #John and the #people who are #linked to #Google #software-engineers should be at least #3
#return #all of #John's #connections and the #people they :know at #Google, including the #relationships-between those #people.
#sort the #results #by-name of #John's #connections in #ascending order and then by #relationship-strength in #descending order.
(let's say the #hashtags refer to nodes, while the :verbs, like :knows and :at, refer to actions on them)
If you could pull something like this off, I think that'd be a much better and more useful simplification of the already easy-to-use Cypher.
I am trying to use Cypher's LIMIT clause in Neo4jClient, but it does not seem to be supported (?).
I thought I saw some test cases / other documents refer to it, but I can't seem to find it in my IntelliSense.
Is there a workaround to limit results to only the top 5? I have 5000+ results and the query is painfully slow.
I am using Neo4jClient version 1.0.0.572. Many thanks!
The tests that you pointed to clearly indicate that it exists, and works, and is supported, and tested.
Without having posted your query, I can only guess at the problem.
The Limit method only shows up after you've called Return, because it's defined on ICypherFluentQuery<TResults>, not ICypherFluentQuery.
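A minimal sketch of what that looks like (graphClient is an already-connected GraphClient, and the Person class and label are placeholders for this example):

// Limit only becomes available on the typed query that Return gives back,
// and it emits "LIMIT 5" in the generated Cypher.
var results = graphClient.Cypher
    .Match("(p:Person)")
    .Return(p => p.As<Person>())
    .Limit(5)
    .Results;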