DataStax Enterprise spatial search

Problem Statement: I am currently trying to configure spatial search in DataStax Enterprise 4.0. I have lat and long as two separate columns in a Cassandra column family. How can I combine lat and long into one comma-separated Solr spatial search field so that I can take advantage of the geofilt and geodist functions? Do I need another column in Cassandra that would store "lat,long"?

Use the Solr LatLonType, which maps to a Cassandra UTF8Type.
http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/srch/srchSolrType.html
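In practice this generally means adding one more text column that stores the coordinates as a single "lat,long" string; the Solr schema then declares that column as LatLonType so geofilt and geodist can use it. A minimal sketch (the keyspace, table, and field names below are invented for illustration):

CREATE TABLE geo.restaurants (
  id uuid PRIMARY KEY,
  name text,
  latlon text   -- e.g. '45.17614,-93.87341'; LatLonType maps to this UTF8/text column
);

Corresponding schema.xml entries:

<fieldType name="tdouble" class="solr.TrieDoubleType" precisionStep="8"/>
<fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
<field name="latlon" type="location" indexed="true" stored="true"/>
<dynamicField name="*_coordinate" type="tdouble" indexed="true" stored="false"/>

Example geofilt query against that field (5 km radius around a point):

http://localhost:8983/solr/geo.restaurants/select?q=*:*&fq={!geofilt sfield=latlon pt=45.15,-93.85 d=5}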

Related

Joining three tables in Solr

I am trying to join three Cassandra tables using Solr. According to the DataStax documentation:
DataStax Enterprise 4.0.2 and later supports the open source (OS) Solr query-time
join through a custom implementation. You can join Solr documents,
including those having different Solr cores, under these conditions:
Solr cores need to have the same keyspace and same Cassandra partition key.
Both Cassandra tables that support the Solr cores to be joined have to be either Thrift- or CQL-compatible. You cannot have one
that is Thrift-compatible and one that is CQL-compatible.
The type of the unique key (Cassandra key validator of the partition key) is the same.
The order of table partition keys and schema unique keys is the same.
I could join two tables as per the documentation. I was wondering whether I can join three tables together while satisfying the conditions. I couldn't find any documentation on joining more than two tables. I desperately need to join three tables. Is it really possible, or should I drop the idea right now?
What I needed was a recursive join. In the same documentation, I found this example:
Use a nested join query to recursively join the songs and lyrics documents with the videos document, and to select the song that mentions love and also won a video award.
http://localhost:8983/solr/internet.songs/select/?q=
{!join+fromIndex=internet.lyrics}words:love AND _query_:
{!join+fromIndex=internet.videos}award:true&indent=true&wt=json
Output is:
"response":{"numFound":1,"start":0,"docs":[
{
"song":"a3e64f8f-bd44-4f28-b8d9-6938726e34d4",
"title":"Dangerous",
"artist":"Big Data"}]
}}

Neo4j modelling - sorting nodes ordered by distance

I am using Neo4j with PHP.
In my project, I have restaurant nodes. Each node has latitude, longitude and taxonomy properties.
I need to return the restaurant nodes matching the user's given taxonomy, with results ordered by distance from the user's location (that is, the nearest restaurant first).
What is the easiest solution?
I have worked with MongoDB and Elasticsearch, where this is very easy to achieve using special indexing, but I could not find a straightforward way in Neo4j.
There are a couple of solutions:
Using the Neo4j Spatial plugin: https://github.com/neo4j-contrib/spatial
Computing the distance yourself with haversin in Cypher (see the sketch after this list): http://neo4j.com/docs/stable/query-functions-mathematical.html#functions-spherical-distance-using-the-haversin-function
In the 3.0 milestone releases there should be basic Cypher functions for point and distance: https://github.com/neo4j/neo4j/pull/5397/files (I haven't tested it, though)
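For the haversin option, a minimal sketch (assuming Restaurant nodes store latitude and longitude in degrees and the user's position is passed in as parameters; the label, property, and parameter names are illustrative):

MATCH (r:Restaurant)
WHERE r.taxonomy = {taxonomy}
WITH r, 2 * 6371 * asin(sqrt(haversin(radians(r.latitude - {lat}))
     + cos(radians(r.latitude)) * cos(radians({lat}))
     * haversin(radians(r.longitude - {lon})))) AS distanceKm
RETURN r, distanceKm
ORDER BY distanceKm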
Besides the aforementioned Neo4j Spatial, in Neo4j 3.0 there is also a built-in distance() function.
See this GraphGist:
http://jexp.github.io/graphgist/idx?dropbox-14493611%2Fcypher_spatial.adoc
So if you find and match your restaurants in some way, you can order them by distance:
MATCH (a:Location), (b:Restaurant)
WHERE ... filtering ...
RETURN b
ORDER BY distance(point(a),point(b))
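Filled in, a sketch of that query might look like this (assuming Neo4j 3.0+, Restaurant nodes with latitude/longitude properties, and the user's coordinates passed as parameters; all names are illustrative):

MATCH (r:Restaurant)
WHERE r.taxonomy = {taxonomy}
RETURN r
ORDER BY distance(point({latitude: {lat}, longitude: {lon}}),
                  point({latitude: r.latitude, longitude: r.longitude}))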
Neo4j Spatial features distance queries (among lots of other things) and also takes care of ordering.

How do I do a general search across string properties in my nodes?

Working with Neo4j in a Rails app.
I have nodes with several string properties containing long strings of user-generated content. For example, in my nodes of type "Book", I might have properties "review" and "summary", which would contain long-form string values.
I was trying to design queries that returned nodes matching those properties against general-language search terms provided by a user in a search box. As my query got increasingly complicated, it occurred to me that I was trying to solve natural language search.
I looked into some of the popular search gems in Rails, but they all seem to depend on ActiveRecord. What search solutions exist for Neo4j.rb?
There are a few ways that you could go about this!
As FrobberOfBits said, Neo4j has what are called "legacy indexes", which use Lucene in the background to provide indexing of generic things. Neo4j also supports the newer schema indexes. Unfortunately, those are based on exact matches (though I'm pretty sure that will change somewhat in Neo4j 2.3.x).
Neo4j does support pattern matching on strings via the =~ operator, but those queries aren't indexed, so the performance depends on the size of your database.
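For example, an unindexed, case-insensitive substring match via regex (the label and property names are just illustrative):

MATCH (b:Book)
WHERE b.review =~ '(?i).*dystopian.*' OR b.summary =~ '(?i).*dystopian.*'
RETURN b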
We often recommend a gem called searchkick, which lets you define indexes for Elasticsearch in your models. Then you can just call a Model.search method to do your searches; it will first query Elasticsearch to get the node IDs and then load those nodes via Neo4j.rb. You can use that via the neo4j-searchkick gem: https://github.com/neo4jrb/neo4j-searchkick
Lastly, if you're doing NLP and trying to extract important words from your text, you could create a Tag/Word label and create relationships from your nodes to these NLP-extracted nodes so that you can search based on those nodes in the future. You could even build recommendations from one text node to another based on the number/type of common tag nodes.
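A rough sketch of that last idea (the labels, property names, and parameters are made up; the keyword extraction itself would happen in your application code):

// link a book to a keyword extracted from its text
MATCH (b:Book {uuid: {book_uuid}})
MERGE (t:Tag {name: {keyword}})
MERGE (b)-[:HAS_TAG]->(t)

// later: rank books by how many of the searched tags they share
MATCH (t:Tag)<-[:HAS_TAG]-(b:Book)
WHERE t.name IN {search_terms}
RETURN b, count(t) AS matches
ORDER BY matches DESC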
I don't know if anything specific exists for Neo4j.rb and ActiveRecord. What I can say is that generally this stuff is handled through the use of legacy indexes that are implemented by Lucene.
The premise is that you create a Lucene-managed index on certain properties, and that then gives you access to the Lucene query language via Cypher to get data from those indexes. Relative to Neo4j.rb, it doesn't look any different from running Cypher queries, like this:
START item=node:node_auto_index("(title:'foo bar' AND body:baz*) OR title:'bat'")
RETURN item
Note that Lucene indexes and that query language can only be used in a START clause, not a MATCH clause. Refer to the Lucene query syntax documentation to discover more about what you can do with it (fuzzy matching, wildcards, etc.), which is quite a bit more extensive than what a regex would give you.
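For reference, the node auto-index used in that START query has to be enabled first. A minimal sketch for Neo4j 2.x in conf/neo4j.properties (the listed property names are just examples):

node_auto_indexing=true
node_keys_indexable=title,body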

Splitting the workload between DSE Search and DSE Analytics Spark

I have two types of use cases: search and analytics. I also have two distinct ways to categorize my primary-key candidate fields.
#1: Partition keys by high-cardinality fields, where the number of distinct values is between 100,000 and 10,000,000, for example:
Customer_id
Employee_id
IP_address
MAC_address
A query by a row key here typically returns a handful of results. Secondary indexes and facets are practical, because they are on low-cardinality fields; see #2 below.
#2: Partition keys by low-cardinality fields, where the number of unique values is less than 100, for example:
event_type - like "purchase" or "authenticated_OK"
platform - like 5 types of OS or 50 types of applications
metric_type - like CPU_utilization
protocol - like http or ftp
SNMP MIB name
country code, like us, ca, uk
state, like de, ny
A typical query by a row key returns millions of results, maybe for further analytics.
Secondary indexes are less practical here, because they are often on high-cardinality fields of the kind #1 above.
My question:
Is modeling the data as in #1 above more suitable for DSE Search, and
is modeling as in #2 above more suitable for DSE Analytics?
Thanks
The first use case, if properly data modeled and on an appropriately sized cluster, will be fine querying Cassandra without any additional indexing (no secondary indexes or need for Solr, a.k.a. DSE Search).
The second use case is harder to judge with the information provided; however, it does sound like a proper data model and an appropriately sized Cassandra cluster, plus secondary indexes on the low-cardinality fields, may be a good fit. It's unclear exactly what your access patterns are, though.
I suggest you read this, which provides some great info on secondary indexes and Solr with Cassandra: When to use Cassandra vs. Solr in DSE?
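To make the contrast concrete, a rough CQL sketch (keyspace, table, and column names are invented): in case #1 the high-cardinality field is the partition key, so a key lookup touches one small partition; keying the same data by a low-cardinality field as in case #2 would put millions of rows behind each key, which is the shape better served by DSE Analytics scans or a search index.

-- Case #1: high-cardinality partition key; point lookups return a handful of rows
CREATE TABLE metrics.events_by_customer (
  customer_id text,
  event_time  timeuuid,
  event_type  text,
  payload     text,
  PRIMARY KEY (customer_id, event_time)
);

-- Optional secondary index on a low-cardinality column, as suggested above
CREATE INDEX ON metrics.events_by_customer (event_type);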

Neo4j schema indexes for fuzzy search

Right now I'm considering the possibility of adding fuzzy search over my Neo4j database to my application.
The main criteria are fuzzy search and performance.
What is the best way to achieve these goals with the latest version of Neo4j Community Edition?
Fuzzy search is a tricky thing. Even in plain Lucene (where you can do fuzzy search with Lucene query strings) it is not recommended, because it is quite expensive.
You can use that query syntax in Neo4j too when you have indexed your data with a manual index.
The solution most people suggest is to go with auto-suggestion instead: match on the first few characters, present the options in an auto-complete box, and then search using the user-selected strings.
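A minimal auto-suggest sketch along those lines (assuming Neo4j 2.3+, where STARTS WITH can be served from a schema index; the label, property, and parameter names are illustrative):

CREATE INDEX ON :Product(name);

MATCH (p:Product)
WHERE p.name STARTS WITH {prefix}
RETURN p.name
ORDER BY p.name
LIMIT 10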
