How do I filter the results of an aggregation in IBM Watson Discovery Service?

In Watson Discovery Service I am already aggregating my documents to get the top concepts, using the following query:
term(enriched_metadata_text.concepts.text,count:10)
However, my source material is already concentrated around a few central concepts, in this case Ice Hockey. What I want to do is filter out those concepts with a query that looks something like this:
term(enriched_metadata_text.concepts.text,count:10).filter(enriched_metadata_text.concepts.text:!("National Hockey League"|"Ice hockey"))
This however does not work. I can get it to work if I filter first:
filter(enriched_metadata_text.concepts.text:!("National Hockey League"|"Ice hockey")).term(enriched_metadata_text.concepts.text,count:10)
The issue with this, however, is that it filters out the documents with the concept "Ice Hockey" and then aggregates what remains. I want to get the list of concepts and then filter THAT down, without losing any documents.
Thanks in advance for your help.

I believe you should be able to accomplish this with the nested aggregation.
nested scopes the aggregation to the subdocuments (which is what the concepts are).
So my suggestion would be to run the following query:
nested(enriched_metadata_text.concepts).filter(enriched_metadata_text.concepts.text:!("National Hockey League"|"Ice hockey")).term(enriched_metadata_text.concepts.text,count:10)
Please let me know if this works!
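To see why the position of the filter matters, here is a small in-memory sketch in plain Python. It does not call the Discovery API; the sample documents and both function names are made up for illustration. The first function mimics filtering documents before aggregating, the second mimics filtering at the concept level while keeping every document.

```python
from collections import Counter

# Made-up sample documents; each one carries a list of concept strings.
docs = [
    {"id": 1, "concepts": ["Ice hockey", "Stanley Cup"]},
    {"id": 2, "concepts": ["National Hockey League", "Stanley Cup", "Goaltender"]},
    {"id": 3, "concepts": ["Goaltender"]},
]
excluded = {"National Hockey League", "Ice hockey"}


def top_concepts_filter_documents_first(docs, excluded, count=10):
    """Drop every document that mentions an excluded concept, then count."""
    kept = [d for d in docs if not excluded.intersection(d["concepts"])]
    counts = Counter(c for d in kept for c in d["concepts"])
    return counts.most_common(count)


def top_concepts_filter_concepts(docs, excluded, count=10):
    """Keep every document, but skip the excluded concepts while counting."""
    counts = Counter(c for d in docs for c in d["concepts"] if c not in excluded)
    return counts.most_common(count)


doc_level = dict(top_concepts_filter_documents_first(docs, excluded))
concept_level = dict(top_concepts_filter_concepts(docs, excluded))
print(doc_level)
print(concept_level)
```

In this toy data the document-level filter discards documents 1 and 2 entirely, so "Stanley Cup" disappears from the result, while the concept-level filter still counts it.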

Related

Are multiple vertex labels in Gremlin/Janusgraph possible, or is an alternative solution better?

I am working on an import runner for a new graph database.
It needs to work with:
Amazon Neptune - Gremlin implementation, has great infrastructure support in production, but a pain to work with locally, and does not support Cypher. No visualization tool provided.
Janusgraph - easy to work with locally as a Gremlin implementation, but requires heavy investment to support in production, hence using Amazon Neptune. No visualization tool provided.
Neo4j - Excellent visualization tool, Cypher language feels very familiar, even works with Gremlin clients, but requires heavy investment to support in production, and there appears to be no visualization tool that is anywhere nearly as good as the one found in Neo4j that works with Gremlin implementations.
So I am creating the graph where the Entities (Nodes/Vertices) have multiple Types (Labels), some being orthogonal to each other, as well as multi-dimensional.
For example, an Entity representing an order made online would be labeled as Order, Online, Spend, Transaction.
            | Spend     Chargeback
----------------------------------------
Transaction | Purchase  Refund
Line        | Sale      Return
Zooming into the Spend column.
         | Online      Instore
----------------------------------------
Purchase | Order       InstorePurchase
Sale     | OnlineSale  InstoreSale
In Neo4j and its Cypher query language, this proves to be very powerful for creating Relationships/Edges across multiple types without explicitly knowing what transaction_id values are in the graph:
MATCH (a:Transaction), (b:Line)
WHERE a.transaction_id = b.transaction_id
MERGE (a)<-[edge:TRANSACTED_IN]-(b)
RETURN count(edge);
The problem is that Gremlin/TinkerPop does not natively support multiple Labels for its Vertices.
Server implementations like AWS Neptune support this using a delimiter, e.g. Order::Online::Spend::Transaction, and the Gremlin client does support it for a Neo4j server, but I haven't been able to find an example where this works for JanusGraph.
Ultimately, I need to be able to run a Gremlin query equivalent to the Cypher one above:
g
  .V().hasLabel("Line").as("b")
  .V().hasLabel("Transaction").as("a")
  .where("b", eq("a")).by("transaction_id")
  .addE("TRANSACTED_IN").from("b").to("a")
So there are multiple questions here:
1. Is there a way to make JanusGraph accept multiple vertex labels?
2. If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?
3. In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?
Is there a way to make JanusGraph accept multiple vertex labels?
No, there is not a way to have multiple vertex labels in JanusGraph.
If not possible, or this is not the best approach, should there be an additional vertex property containing a list of labels?
In the case of option 2, should the label name be the high-level label (Transaction) or the low-level label (Order)?
I'll answer these two together. Based on what you have described, I would create a single label, probably named Transaction, with properties such as Location (Online or InStore) and Type (Purchase, Refund, Return, Chargeback, etc.). From your description you are really talking about a single entity, a Transaction; the other items you are using as labels (Online/InStore, Spend/Refund) are just additional metadata about how that Transaction occurred. This approach allows simple filtering on one or more of these attributes, which achieves anything that could be done with the multiple labels you are using in Neo4j.
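As a rough illustration of the single-label approach, here is an in-memory sketch in plain Python rather than a JanusGraph client. The property names location and type follow the suggestion above; the sample vertices and the find helper are hypothetical. In Gremlin the same filter would be a hasLabel step followed by has steps on those properties.

```python
from typing import Any, Dict, List

Vertex = Dict[str, Any]

# Hypothetical vertices: one label, with the former labels stored as properties.
vertices: List[Vertex] = [
    {"id": 1, "label": "Transaction", "location": "Online", "type": "Purchase"},
    {"id": 2, "label": "Transaction", "location": "InStore", "type": "Purchase"},
    {"id": 3, "label": "Transaction", "location": "Online", "type": "Refund"},
]


def find(vertices: List[Vertex], label: str, **properties: Any) -> List[Vertex]:
    """Return vertices with the given label whose properties all match."""
    return [
        v
        for v in vertices
        if v["label"] == label
        and all(v.get(key) == value for key, value in properties.items())
    ]


# What the question calls an "Order": an online purchase.
orders = find(vertices, "Transaction", location="Online", type="Purchase")
# Everything that happened online, regardless of type.
online = find(vertices, "Transaction", location="Online")
print([v["id"] for v in orders], [v["id"] for v in online])
```

Filtering on one property or on several at once covers the combinations that the multi-label scheme expressed with separate labels.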

Can graph database query "nodes that a given node has no relationship with"?

I am working on a dating app where users can "like" or "dislike" other users and get matched.
As you can imagine the most important query of the app would be:
Give me a stack of nearby user profiles that I have NOT liked/disliked before.
I tried to build this with a document database (Firestore) and concluded it's simply not suitable for this kind of application, so I landed in the graph database world, which is new and fascinating to me.
I understand that a graph database by nature retrieves data by traversing relationships and treats relationships as first-class citizens. My question is: what if the nodes I am trying to get are those with no relationship from the given node? What would the query look like? Can anyone provide an example query?
Edit:
- added nearby criteria to the query statement
This is definitely possible. Here is an example query:
MATCH (me:Profile {name: "Chris"})
MATCH (other:Profile) WHERE NOT (other)-[:LIKES]->(me)
RETURN other
As stated in the comments on your original question, this might not scale well on a large dataset. That said, it is uncommon to match on only one criterion. For example, the list of candidate profiles can be narrowed by:
geolocation
profiles at depth 2 (who likes me, who else do those people like, and do those people like me?)
shared interests
age group
skin color
...
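The "no relationship" lookup is essentially an anti-join. Here is a minimal sketch in plain Python, using the direction asked for in the question (profiles that I have not yet liked or disliked). The profile names, the edge sets, and the unseen_profiles function are all invented for illustration, and the nearby criterion is left out.

```python
# Made-up data: each edge is a (from_profile, to_profile) pair.
profiles = {"Chris", "Dana", "Eli", "Fay", "Gus"}
likes = {("Chris", "Dana"), ("Fay", "Chris")}
dislikes = {("Chris", "Eli")}


def unseen_profiles(me, profiles, likes, dislikes):
    """Profiles that `me` has neither liked nor disliked, excluding `me`."""
    rated = {target for source, target in likes | dislikes if source == me}
    return sorted(profiles - rated - {me})


stack = unseen_profiles("Chris", profiles, likes, dislikes)
print(stack)
```

Fay stays in the stack even though Fay likes Chris, because only edges that start at Chris count as "already rated".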

How to implement fuzzy search

I'm using the Neo4j 3 REST API. I have nodes named customer with properties such as name, and I need to search customers by name. For example, the input "joan" should return results for the name "john". How can I implement fuzzy search to get these results?
Thanks in advance
First off, I want to make sure you know that if you're using Neo4j 3.x, 3.x is currently in beta and isn't considered stable yet.
You have two options to implement a fuzzy search in Neo4j. You can use the legacy indexes to implement Lucene-based indexing. That should provide anything that Lucene can do, though you'd probably need to do a bit more work. You can also implement your own unmanaged extension, which will allow you to use Lucene a bit more directly.
Perhaps the easier alternative is to use Elasticsearch with Neo4j and have Elasticsearch do your full-text indexing. You might take a look at the Neo4j and ElasticSearch page on neo4j.com. There they provide a link to a GitHub repository for a Neo4j plugin which automagically updates ElasticSearch with data from Neo4j and which provides an endpoint for querying your graph fuzzily. There is also a video tutorial on how to do this.
Try the Soundex search described at https://neo4j.com/developer/kb/how-to-perform-a-soundex-search/, which will work in this case. With an ordinary search, the input Joan will not return John; only a shorter input such as jo will return both. To get the result you expect, you will have to use the Soundex search.
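To show why Soundex matches this particular pair, here is a minimal sketch of the classic American Soundex encoding in plain Python. It is an illustration of the algorithm only, not Neo4j's implementation, and the function name is an assumption.

```python
def soundex(name: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    groups = (
        ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
        ("l", "4"), ("mn", "5"), ("r", "6"),
    )
    codes = {ch: digit for letters, digit in groups for ch in letters}

    letters_only = "".join(ch for ch in name.lower() if ch.isalpha())
    if not letters_only:
        return ""

    digits = []
    previous = codes.get(letters_only[0], "")
    for ch in letters_only[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = codes.get(ch, "")  # vowels map to "" and reset `previous`
        if code and code != previous:
            digits.append(code)
        previous = code

    # Pad with zeros and truncate to the usual four characters.
    return (letters_only[0].upper() + "".join(digits) + "000")[:4]


print(soundex("Joan"), soundex("John"))
```

Both names reduce to the same code because only the initial J and the consonant n contribute, so a lookup keyed on the Soundex code treats them as a match.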
Stepping back a little, what is the problem you are trying to solve with fuzzy matching?
My experience has been that misspellings and typos are far less common than you might think, and humans prefer exact matches whenever possible. If there is no exact match (often just missing a space between words), that's a good time to use a spellchecker, and that's where the fuzzy matching should kick in.
In addition, your example would match "joan" to "john", but some synonyms like "joanie" would be more useful. If you have a big corpus of content to work with, you may be able to extract some relationships, using fuzzy & machine learning to identify "joanne" and "joni" as possible synonyms and then submit that to a human curator. "Jon" looks like a related name but it's not, while "jo" and even "nonie" may or may not be nicknames in these groupings.
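The "exact match first, fuzzy only as a fallback" idea can be sketched with Python's standard difflib module. The customer names, the search_names function, and the 0.6 cutoff are illustrative assumptions, not a recommendation for any particular dataset.

```python
import difflib

# Made-up customer names.
names = ["john", "joanne", "mary", "jon"]


def search_names(query, names, cutoff=0.6):
    """Return exact matches if any exist; otherwise fall back to close matches."""
    query = query.lower()
    exact = [name for name in names if name == query]
    if exact:
        return exact
    # Fuzzy fallback: similarity-ranked candidates at or above the cutoff.
    return difflib.get_close_matches(query, names, n=5, cutoff=cutoff)


print(search_names("john", names))
print(search_names("joan", names))
```

An exact hit short-circuits the fuzzy step, so users who type a name correctly never see near-misses mixed into their results.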

Neo4j Cypher Query Builder

I have been trying to come across a query builder for Neo4j's query language Cypher, ideally using a fluent API. I did not find much, and decided to invest some time on building one myself.
The result so far is a fluent API query builder for the Cypher 1.9 spec.
I wanted to use StackOverflow to kick off a discussion and see what the thoughts are, before I release the code.
Here is a demo query that you would want to send off to Neo4j using Cypher.
Show me all people who John knows who know software engineers at Google (Google company code assumed to be 12345).
The relationship strength between John and the people who connect him to Google employees should be at least 3 (assuming a range from 1-5).
Return all of John's connections and the people they know at Google, including the relationships between those people.
Sort the results by name of John's connections in ascending order and then by relationship strength in descending order.
Using Fluent-Cypher:
Cypher
    .on(Node.named("john").with(Index.named("PERSON_NAMES").match(Key.named("name").is("John"))))
    .on(Node.named("google").with(Id.is(12345)))
    .match(Connection.named("rel1").andType("KNOWS").between("john").and("middle"))
    .match(Connection.named("rel2").andType("KNOWS").between("middle").and("googleEmployee"))
    .match(Connection.withType("WORKS_AT").from("googleEmployee").to("google"))
    .where(Are.allOfTheseTrue(Column.named("rel1.STRENGTH").isGreaterThanOrEqualTo(3)
        .and(Column.named("googleEmployee.TITLE").isEqualTo("Software Engineer"))))
    .returns(Columns.named("rel1", "middle", "rel2", "googleEmployee"))
    .orderBy(Asc.column("middle.NAME"), Desc.column("rel1.STRENGTH"))
which yields the following query:
START john=node:PERSON_NAMES(name='John'),google=node(12345) MATCH john-[rel1:KNOWS]-middle,middle-[rel2:KNOWS]-googleEmployee,googleEmployee-[:WORKS_AT]->google WHERE ((rel1.STRENGTH >= '3' AND googleEmployee.TITLE = 'Software Engineer')) RETURN rel1,middle,rel2,googleEmployee ORDER BY middle.NAME ASC,rel1.STRENGTH DESC
I agree that you should build this with an eye towards Cypher 2.0. As of 2.0, it's very important that WHERE clauses are matched up with the correct START, (OPTIONAL) MATCH, and WITH clauses making the design of a fluent API a bit more challenging.
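One way a fluent API could keep each WHERE tied to its clause is to store conditions on the most recent MATCH rather than in one global list. The sketch below is a hypothetical Python builder, not the Fluent-Cypher API from the question; the class name, method names, and the Person label are assumptions.

```python
class Query:
    """Hypothetical fluent builder that attaches WHERE to the preceding MATCH."""

    def __init__(self):
        self._clauses = []  # each entry: [keyword, pattern, [conditions]]
        self._returns = []

    def match(self, pattern):
        self._clauses.append(["MATCH", pattern, []])
        return self

    def optional_match(self, pattern):
        self._clauses.append(["OPTIONAL MATCH", pattern, []])
        return self

    def where(self, condition):
        if not self._clauses:
            raise ValueError("where() must follow match() or optional_match()")
        self._clauses[-1][2].append(condition)  # bind to the latest clause
        return self

    def returns(self, *items):
        self._returns.extend(items)
        return self

    def build(self):
        if not self._returns:
            raise ValueError("returns() was never called")
        parts = []
        for keyword, pattern, conditions in self._clauses:
            parts.append(f"{keyword} {pattern}")
            if conditions:
                parts.append("WHERE " + " AND ".join(conditions))
        parts.append("RETURN " + ", ".join(self._returns))
        return " ".join(parts)


query = (
    Query()
    .match("(john:Person {name: 'John'})-[rel1:KNOWS]-(middle)")
    .where("rel1.STRENGTH >= 3")
    .match("(middle)-[rel2:KNOWS]-(googleEmployee)")
    .where("googleEmployee.TITLE = 'Software Engineer'")
    .returns("rel1", "middle", "rel2", "googleEmployee")
    .build()
)
print(query)
```

Because where() always binds to the latest clause, the order of the chained calls determines which MATCH each condition filters.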
I like your first example where you just use the text to describe the query. The second option, to tell you the truth, doesn't look so much easier to me than constructing the Cypher query itself. The language is quite easy to use and is well documented. Adding another layer of abstraction would only increase complexity. However, if you find a way of translating this natural language request into a Cypher request, that'd be cool :)
Also, why not start working directly with Cypher 2.0?
Finally, check this out: http://github.com/noduslabs/infranodus – I'm working on a similar problem, but for adding nodes into the database, not querying them. I chose to use #hashtags to make it easier for people to understand how their queries should be structured (as we already use them). So in your case it could become something like
#show-all #people who #John :knows who :know #software-engineers :at #Google.
#relationship-strength between #John and the #people who are #linked to #Google #software-engineers should be at least #3
#return #all of #John's #connections and the #people they :know at #Google, including the #relationships-between those #people.
#sort the #results #by-name of #John's #connections in #ascending order and then by #relationship-strength in #descending order.
(let's say the #hashtags refer to nodes and the colon-prefixed words such as :at refer to actions on them)
If you could pull something like this off, I think that'd be a much better and more useful simplification of the already easy-to-use Cypher.

Order Solr results by degrees of friendship

I am currently using Solr 1.4 (soon to upgrade to 3.3). The friendship table is pretty standard:
id | follower_id | user_id
I would like to perform a regular keyword Solr search and order the results by degrees of separation as well as by the standard score. Within the result set, any of my immediate friends who match the keyword would show up first. Next would be friends of my friends, and then friends at the 3rd degree of separation. All other results would come after.
I am pretty sure Solr doesn't offer any 'pre-baked' way of doing this therefore I would likely have to do a join on MySQL to properly order the results. Curious if anyone has done this before and/or has some insights.
It's simply not possible in Solr. However, if you aren't too restricted and could use another platform for this, consider Neo4j?
Connections and degrees of separation are exactly where Neo4j steps in.
http://neo4j.org/
One way might be to create fields like degree_1, degree_2 etc. and store the list of friends at degree x in the field degree_x. Then you could fire multiple queries - the first restricting the results to those who have you in degree_1, the second restricting the results to those who have you in degree_2 and so on.
It is a bit complicated, but the only solution I could think of using Solr.
I haven't represented a graph in Solr before, but at a high level this is what you could do. First, represent people as nodes and the social network as a graph in the database. Implement a transitive closure function in SQL to let you walk the graph. Then index the result into Solr with the social network information stored in payloads, for example.
I was able to achieve this by performing multiple queries, using the "with" scope to restrict each one to the ids of colleagues, 2nd-degree colleagues, or 3rd-degree colleagues. The ids themselves are selected with MySQL.
@search_1 = perform_search(1, options)
@search_2 = perform_search(2, options)

# Presumably inside the search block of perform_search(degree, options):
if degree == 1
  with(:id).any_of(options[:colleague_ids])
elsif degree == 2
  with(:id).any_of(options[:second_degree_colleagues])
end
It's kind of a dirty solution, as I have to perform multiple Solr queries, but until I can use dynamic field sorting options (Solr 3.3, not currently supported by Sunspot) I really don't know any other way to achieve this.
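The ordering discussed in this thread can also be sketched outside Solr: compute each user's degree of separation with a breadth-first search over the friendship data, then sort the keyword hits by degree first and score second. This is a plain Python illustration with made-up ids and scores; the adjacency mapping stands in for the friendship table, and both function names are assumptions.

```python
from collections import deque


def degrees_of_separation(friends, start, max_degree=3):
    """Breadth-first search returning {user_id: degree} up to max_degree."""
    degree = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if degree[user] == max_degree:
            continue  # do not expand beyond the last degree we care about
        for other in friends.get(user, ()):
            if other not in degree:
                degree[other] = degree[user] + 1
                queue.append(other)
    return degree


def order_results(results, degree, max_degree=3):
    """Sort (user_id, score) hits by degree, then by descending score."""
    def sort_key(hit):
        user_id, score = hit
        return (degree.get(user_id, max_degree + 1), -score)
    return sorted(results, key=sort_key)


# Made-up friendship data: user id -> ids of that user's friends.
friends = {1: [2, 3], 2: [4], 4: [5], 5: [6]}
# Made-up keyword hits as (user_id, relevance score).
hits = [(6, 9.0), (4, 1.0), (2, 0.5), (3, 0.7), (5, 2.0)]

degree = degrees_of_separation(friends, start=1)
ordered = order_results(hits, degree)
print(ordered)
```

User 6 has the highest score but sits beyond the third degree, so it lands after all friends, which matches the ordering requested in the question.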
