Does DSE Search support aggregate queries involving CQL solr_query? - datastax-enterprise

I am trying to use DSE Search for some basic reporting and need aggregate functions such as sum and count.
Can I run aggregate queries through CQL solr_query? If so, in which version of DSE?

Aggregate queries are not supported in solr_query, only simple faceting. Please use the Solr HTTP JSON API for analytics functionality from Solr with DSE 5.1 and later.
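
As a rough sketch of what the Solr HTTP JSON route could look like, the Python below only builds the URL and request body for a JSON facet request with aggregate functions; it does not send anything. The host, port 8983, the keyspace and table names (shop, orders) and the amount field are assumptions for illustration, and which functions are accepted depends on the Solr version bundled with your DSE release.

```python
import json
from urllib.parse import quote


def build_facet_request(host, keyspace, table, aggregates, query="*:*"):
    """Return (url, body) for a Solr JSON request that asks only for aggregates.

    `aggregates` maps a result label to a Solr aggregation expression,
    for example {"total_amount": "sum(amount)"}.
    """
    # DSE Search names each Solr core "<keyspace>.<table>".
    # Port 8983 is an assumption; adjust it to your deployment.
    url = f"http://{host}:8983/solr/{quote(keyspace)}.{quote(table)}/select"
    body = {
        "query": query,
        "limit": 0,  # return no documents, only the facet results
        "facet": dict(aggregates),
    }
    return url, json.dumps(body, sort_keys=True)


# Hypothetical keyspace, table and field names.
url, body = build_facet_request(
    "localhost", "shop", "orders",
    {"total_amount": "sum(amount)", "avg_amount": "avg(amount)"},
)
print(url)
print(body)
```

The body would be POSTed to the URL with a Content-Type of application/json, using whatever HTTP client and authentication your cluster requires.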

Related

Which steps/gremlin queries from Tinkerpop 3 are not supported in Cosmos db graph

I am currently evaluating Neo4j against Cosmos DB graph.
Since our present system lives in Cosmos DB, we started building the graph there.
Recently, however, we learned that certain TinkerPop 3 queries are not supported in Cosmos DB graph, such as regex, filter and other lambda operations.
Is there a list of supported/unsupported operations anywhere, so that we are better placed to choose between the two databases without compromising on the features we wish to build?
The full list of Gremlin steps supported by CosmosDB can be found here. It is worth clarifying that TinkerPop does not support regex natively. You can only do that through a lambda expression with the filter() step:
gremlin> g = TinkerFactory.createModern().traversal()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], standard]
gremlin> p = Pattern.compile("(marko|j.*h)")
==>(marko|j.*h)
gremlin> g.V().values('name').filter{p.matcher(it.get()).matches()}
==>marko
==>josh
While this works, there is no graph that I am aware of that will optimize that particular traversal (i.e. it will not be index based). In fact, no graph will optimize any lambda - it is arbitrary code that the graph will simply execute. You will need to look for graphs that natively support regex or some form of full-text search. The only ones that I know of that have such support in Gremlin itself would be JanusGraph and DSE Graph. Other graphs do have such support natively, but it isn't necessarily exposed in a way that it could be used in Gremlin directly.
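
To make the cost of the lambda concrete, here is a small plain-Python sketch (not Gremlin, and not how any particular graph implements it) of what such a filter amounts to: every value is pulled out and tested one by one, so no index can help. The list of names is the vertex names of the "modern" toy graph from the console session above.

```python
import re

# Vertex names from TinkerPop's "modern" toy graph.
NAMES = ["marko", "vadas", "lop", "josh", "ripple", "peter"]


def filter_by_regex(values, pattern):
    """Full scan: test every value against the pattern, like a lambda filter."""
    compiled = re.compile(pattern)
    # fullmatch mirrors Java's Matcher.matches(): the whole string must match.
    return [value for value in values if compiled.fullmatch(value)]


print(filter_by_regex(NAMES, "(marko|j.*h)"))  # ['marko', 'josh']
```

The work grows linearly with the number of values, which is why a graph with native regex or full-text support is preferable on large data sets.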
TinkerPop is adding native support for text predicates now that more graphs seem to be supporting this feature and the pattern for doing so is relatively consistent. We should see that in TinkerPop 3.4.0 when that is released.

Neo4J nodes aggregation

Is there a concept of (advanced) nodes aggregation in Neo4J?
I mean something like this for elasticsearch.
The built-in Neo4j aggregate functions can be found here.
But I expect you want custom aggregation functions. You'll have to write those functions in Java. I think the documentation is pretty clear, and it can be found here.

How to get node by name in neo4j 1.9.3 by using Java API?

I'm using neo4j version 1.9.3. There will be more than 10 Billion nodes in database.
As of now, I am using indexing (got the idea from here) to get nodes by a specific property (e.g. the name of the user). But indexing seems to make writes to the database slower.
How can I use the Java API in Neo4j 1.9.3 to query nodes by a specific property?
With indexes, as you said. You could also create in-graph structures (e.g. trees or lists) to access your nodes in a structured fashion, see:
http://neo4j.com/docs/stable/cypher-cookbook-path-tree.html
And as in any other database, indexes add a cost to write performance.
Why don't you upgrade to a more recent version, such as 2.3.1?
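
The write cost of an index mentioned above can be illustrated with a toy in-memory store. This is only a sketch in plain Python, not the Neo4j API; the NodeStore class and its method names are made up for the example.

```python
from collections import defaultdict


class NodeStore:
    """Toy node store with a secondary index on the 'name' property."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.name_index = defaultdict(set)  # name -> set of node ids
        self.index_writes = 0               # extra writes caused by the index

    def add_node(self, node_id, name):
        self.nodes[node_id] = {"name": name}
        # Every insert pays for one additional index update.
        self.name_index[name].add(node_id)
        self.index_writes += 1

    def find_by_scan(self, name):
        # Without an index: look at every node.
        return sorted(nid for nid, props in self.nodes.items()
                      if props["name"] == name)

    def find_by_index(self, name):
        # With an index: a single dictionary lookup.
        return sorted(self.name_index.get(name, ()))


store = NodeStore()
for node_id, name in enumerate(["alice", "bob", "alice"], start=1):
    store.add_node(node_id, name)
print(store.find_by_index("alice"), store.index_writes)
```

Reads by name become a lookup instead of a scan, at the price of one more write per insert. With billions of nodes that trade is usually worth it.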

Neo4j - Cypher vs Gremlin query language

I'm starting to develop with Neo4j using the REST API.
I saw that there are two options for performing complex queries - Cypher (Neo4j's query language) and Gremlin (the general purpose graph query/traversal language).
Here's what I want to know: is there any query or operation that can be done with Gremlin but not with Cypher, or vice versa?
Cypher seems much more clear to me than Gremlin, and in general it seems that the guys in Neo4j are going with Cypher.
But - if Cypher is limited compared to Gremlin - I would really like to know that in advance.
For general querying, Cypher is enough and is probably faster. The advantage of Gremlin over Cypher is when you get into high level traversing. In Gremlin, you can better define the exact traversal pattern (or your own algorithms) whereas in Cypher the engine tries to find the best traversing solution itself.
I personally use Cypher because of its simplicity and, to date, I have not had any situations where I had to use Gremlin (except when working with Gremlin GraphML import/export functions). I expect, however, that even if I needed to use Gremlin, I would do so for a specific query that I would find on the net and never come back to again.
You can always learn Cypher really fast (in days) and then continue with the (longer-run) general Gremlin.
We have to traverse thousands of nodes in our queries. Cypher was slow. The Neo4j team told us that implementing our algorithm directly against the Java API would be 100-200 times faster. We did so and easily got a factor of 60 out of it. As of now we do not have a single Cypher query in our system, due to lack of confidence. Easy Cypher queries are easy to write in Java, and complex queries won't perform. The problem is that when you have multiple conditions in your query, there is no way in Cypher to specify the order in which the traversals are performed. So your Cypher query may go wild into the graph in the wrong direction first.
I have not done much with Gremlin, but I could imagine you get much more execution control with Gremlin.
The Neo4j team's efforts on Cypher have been really impressive, and it's come a long way. The Neo team typically pushes people toward it, and as Cypher matures, Gremlin will probably get less attention. Cypher is a good long-term choice.
That said, Gremlin is a Groovy DSL. Using it through its Neo4j REST endpoint allows full, unfettered access to the underlying Neo4j Java API. It (and other script plugins in the same category) cannot be matched in terms of low-level power. Plus, you can run Cypher from within the Gremlin plugin.
Either way, there's a sane upgrade path where you learn both. I'd go with the one that gets you up and running faster. In my projects, I typically use Gremlin and then call Cypher (from within Gremlin or not) when I need tabular results or expressive pattern matching; both are a pain in the Gremlin DSL.
I initially started using Gremlin. However, at the time, the REST interface was a little unstable, so I switched to Cypher. It has much better support for Neo4j. However, there are some types of queries that are simply not possible with Cypher, or where Cypher can't quite optimize the way you can with Gremlin.
Gremlin is built on Groovy, so you can actually use it as a generic way to get Neo4j to execute 'Java' code and perform various tasks on the server, without having to take the HTTP hit from the REST interface. Among other things, Gremlin will let you modify data.
However, when all I want is to query data, I go with Cypher as it is more readable and easier to maintain. Gremlin is the fallback when a limitation is reached.
Gremlin queries can be generated programmatically.
(See http://docs.sqlalchemy.org/en/rel_0_7/core/tutorial.html#intro-to-generative-selects to know what I mean.)
This seems to be a bit more tricky with Cypher.
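
To show what "generated programmatically" means here, the following plain-Python sketch builds a pipeline of steps from ordinary data. It does not use Gremlin or any TinkerPop library; has, values and build_traversal are hypothetical helpers that only imitate the step-chaining style on a list of dictionaries.

```python
from functools import reduce


def has(key, value):
    """Step that keeps only vertices whose property `key` equals `value`."""
    return lambda vertices: (v for v in vertices if v.get(key) == value)


def values(key):
    """Step that projects each vertex onto one of its properties."""
    return lambda vertices: (v[key] for v in vertices if key in v)


def build_traversal(filters, project):
    """Generate a pipeline from a dict of filters plus a final projection."""
    steps = [has(key, value) for key, value in filters.items()]
    steps.append(values(project))
    # Feed the vertex stream through each generated step in turn.
    return lambda vertices: list(
        reduce(lambda stream, step: step(stream), steps, iter(vertices)))


vertices = [
    {"label": "person", "name": "marko", "age": 29},
    {"label": "person", "name": "josh", "age": 32},
    {"label": "software", "name": "lop"},
]
run = build_traversal({"label": "person", "age": 32}, "name")
print(run(vertices))  # ['josh']
```

Because each step is just a function call, the set of filters can come from user input or configuration. Doing the same with a textual query language means assembling strings.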
Cypher only works for simple queries. When you start incorporating complex business logic into your graph traversals it becomes prohibitively slow or stops working altogether.
Neo4J clearly knows that Cypher isn't cutting it, because they also provide the APOC procedures which include an alternate path expander (apoc.path.expand, apoc.path.subgraphAll, etc).
Gremlin is harder to learn but it's more powerful than Cypher and APOC. You can implement any logic you can think of in Gremlin.
I really wish Neo4J shipped with a toggleable Gremlin server (from reading around, this used to be the case). You can get Gremlin running against a live Neo4J instance, but it involves jumping through a lot of hoops. My hope is that since Neo4J's competitors are allowing Gremlin as an option, Neo4J will follow suit.
Cypher is a declarative query language for graph databases. The term declarative is important because it denotes a different way of programming from paradigms such as imperative programming.
In a declarative query language like Cypher or SQL, we tell the underlying engine what data we want to fetch; we do not specify how it should be fetched.
In Cypher, a user defines a subgraph of interest in the MATCH clause. The underlying engine then runs a pattern-matching algorithm to find occurrences of that subgraph in the graph database.
Gremlin has both declarative and imperative features. It is a graph traversal language in which a user gives explicit instructions as to how the graph is to be navigated.
The difference between the languages in this case is that in Cypher we can use a Kleene star operator to find paths between any two given nodes in a graph database. In Gremlin, however, we have to define all such paths explicitly, although we can use the repeat operator to find multiple occurrences of such explicit paths. Doing iterations over explicit structures, on the other hand, is not possible in Cypher.
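
For readers who have not seen the two styles side by side: a Cypher variable-length pattern such as (a)-[:KNOWS*]->(b) only states that some path must exist, while a Gremlin repeat(out('knows')).until(...) spells out the step to repeat. The plain-Python sketch below shows what such an explicit "repeat until" traversal does under the hood. It is a breadth-first search over a made-up adjacency dictionary, not code from either engine.

```python
from collections import deque


def find_path(graph, start, goal):
    """Repeat 'follow outgoing edges' until `goal` is reached.

    `graph` maps a node to the list of nodes it points to.
    Returns one shortest path as a list of nodes, or None if unreachable.
    """
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:  # avoid looping on cycles
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Hypothetical 'knows' edges.
knows = {"a": ["b"], "b": ["c", "a"], "c": []}
print(find_path(knows, "a", "c"))  # ['a', 'b', 'c']
```

In the declarative style the engine decides how to carry out this loop. In the imperative style you write it yourself and therefore control it.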
If you use Gremlin, you can migrate to different graph databases.
Since most graph databases support Gremlin traversals, it is a good idea to choose Gremlin.
Long answer short: use Cypher for queries and Gremlin for traversals. You will see the difference in response times yourself.

cql limitations

I may want to use erlacassa to communicate between Cassandra and Erlang. It is a CQL client. So I was wondering, what are the limitations of CQL (Cassandra Query Language) compared to Cassandra accessed through Thrift?
For example I have found over the internet that:
CQL has some current limitations and does not support operations such
as GROUP BY, ORDER BY
This partly depends on the version of Cassandra that you are using. For example, CQL did not support composite columns until CQL 3.0 (which is available in Cassandra 1.1 but not turned on by default). But for the most part all major features are available both in the thrift API and in CQL.
As for GROUP BY, it is not supported by either CQL or the Thrift API. ORDER BY is in CQL 3.0, but it is only used to specify a reversed ordering (which is the same limitation you would have through Thrift). It sounds like the article you found was comparing Cassandra to a traditional SQL database.
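
The reason ordering is so restricted is that rows within a partition are already stored sorted, so a read can only follow that order or walk it backwards. Here is a toy sketch of that idea in plain Python. The Partition class is invented for the example and says nothing about Cassandra's actual storage code.

```python
import bisect


class Partition:
    """Toy partition that keeps its rows sorted by clustering key on write."""

    def __init__(self):
        self._rows = []  # list of (clustering_key, value), always sorted

    def insert(self, clustering_key, value):
        # The sort order is fixed at write time, not at query time.
        bisect.insort(self._rows, (clustering_key, value))

    def read(self, reverse=False):
        # The only choices a reader has: stored order or its reverse.
        return list(reversed(self._rows)) if reverse else list(self._rows)


partition = Partition()
for key, value in [(3, "c"), (1, "a"), (2, "b")]:
    partition.insert(key, value)
print(partition.read())
print(partition.read(reverse=True))
```

Sorting by any other column would need a different table layout, not a different query.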
Aside from syntax differences, the biggest difference is that CQL pretends to be SQL whereas the Thrift APIs make no such pretension. Developers will see the SQL and make relational assumptions that simply don't apply to Cassandra. For example, there is discussion here advocating for using ORDER BY. In the world of Cassandra, it is far better to denormalize the data into a materialized view for every way that you wish to access it than to change the query.
Don't get me wrong. I see a lot of value in replacing the thrift interface with a DSL such as http://glennengstrand.info/nosql/cassandra/cql but I believe that the familiarity with SQL as a way to access relational data will lead developers into using Cassandra in ways that will simply not scale.
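
A minimal sketch of the denormalization advice above, in plain Python rather than CQL: the same event is written once per access path, so each read is a direct lookup with no sorting or grouping at query time. EventStore and its field names are hypothetical.

```python
from collections import defaultdict


class EventStore:
    """Toy store that keeps one 'table' per way the data is read."""

    def __init__(self):
        self.by_user = defaultdict(list)  # access path 1: events for a user
        self.by_day = defaultdict(list)   # access path 2: events for a day
        self.writes = 0

    def record(self, user, day, action):
        event = {"user": user, "day": day, "action": action}
        # One logical write fans out to every view.
        self.by_user[user].append(event)
        self.by_day[day].append(event)
        self.writes += 2

    def events_for_user(self, user):
        return list(self.by_user.get(user, []))

    def events_for_day(self, day):
        return list(self.by_day.get(day, []))


store = EventStore()
store.record("ann", "2024-01-01", "login")
store.record("bob", "2024-01-01", "login")
store.record("ann", "2024-01-02", "logout")
print([e["action"] for e in store.events_for_user("ann")])
print([e["user"] for e in store.events_for_day("2024-01-01")])
```

Writes get more expensive and storage is duplicated, but every query reads exactly one pre-arranged view, which is the trade Cassandra is designed around.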
