Mongoid Return Specific Object from Array in Document - ruby-on-rails

This seems as though it should be simple but I have been struggling with this for a while with no luck.
Let's assume I have a simple document that looks like the following:
{
data: [
{
name: "Minnesota",
},
{
name: "Mississippi",
},
...
]
}
If I run the following query in my Mongo Shell, everything works as I would expect:
db.collection.find({}, {data: {$elemMatch: {name: "Michigan"}}})
Returns:
{ "_id" : ObjectId("5e9ba60998d1ff88be83fffe"), "data" : [ { "name" : "Michigan" } ] }
However, using Mongoid, attempting to run a similar query returns every object inside the data array. Here is one of the many queries I've tried:
Model.where({data: {"$elemMatch": {name: "Michigan"}}}).first
As I mentioned above, that little query returns everything inside the data array, not the specific object I'm trying to pull out of the document.
Any help would be appreciated. I'm trying to avoid returning the results and post-processing them with Ruby. I'd love to handle this at the DB level.
Thank you.

There was a very similar question earlier for a different driver. Apparently the ruby driver behaves differently than the shell.
Try running your find as the equivalent database command:
session.command({'find' => 'my_collection', 'filter' => {}, 'projection' => {data: {'$elemMatch' => {name: "Michigan"}}}})

In Mongoid, projections are expressed with the only method.
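To make the distinction concrete, here is a minimal pure-Ruby sketch (no database and no Mongoid involved) of what $elemMatch does in each position. Used as a filter, as in the where query from the question, it selects whole documents. Used as a projection, as in the shell query, it trims the array to the first matching element. The helpers filter_elem_match and project_elem_match are hypothetical names that only emulate this behaviour on in-memory hashes.

```ruby
# Pure-Ruby emulation of $elemMatch semantics. These helpers are
# illustrative only and are not part of Mongoid or the Ruby driver.
sample_docs = [
  { "_id" => 1, "data" => [{ "name" => "Minnesota" }, { "name" => "Michigan" }] },
  { "_id" => 2, "data" => [{ "name" => "Mississippi" }] }
]

def element_matches?(element, criteria)
  criteria.all? { |key, value| element[key] == value }
end

# As a filter: keep every document whose array has a matching element.
# The returned documents still carry the full array.
def filter_elem_match(docs, field, criteria)
  docs.select { |doc| doc[field].any? { |el| element_matches?(el, criteria) } }
end

# As a projection: keep _id plus only the first matching element.
# Documents without a match come back without the field.
def project_elem_match(docs, field, criteria)
  docs.map do |doc|
    first = doc[field].find { |el| element_matches?(el, criteria) }
    first ? { "_id" => doc["_id"], field => [first] } : { "_id" => doc["_id"] }
  end
end

filtered  = filter_elem_match(sample_docs, "data", { "name" => "Michigan" })
projected = project_elem_match(sample_docs, "data", { "name" => "Michigan" })
```

In this sketch, filtered still contains both array elements of the first document, while projected contains only the Michigan element, which mirrors the difference the question describes.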

Related

Conditional queries for ElasticSearch 5.x (elasticsearch-rails/elasticsearch-model)

New to ElasticSearch and I was wondering if there is a way to construct conditional queries/filters. I am working with Rails, so I suppose it has to be on that particular level, as I couldn't find anything that points to conditional queries at ES-Level and I am pretty sure it was silly just to assume!
So here is the (working) query I have:
search_definition = {
query: {
bool: {
must: [
{
more_like_this: {
fields: tag_types,
docs: [
{
_index: self.class.index_name,
_type: self.class.document_type,
_id: id
}
],
min_term_freq: 1
}
}
],
should: [
range: {
age: {
gte: min_age,
lte: max_age,
boost: 4.0
}
}
],
filter: {
bool: {
must: [
{
term: {
active: true
}
},
{
geo_distance: {
distance: xdistance,
unit: "km",
location: {
lat: xlat,
lon: xlng
},
boost: 5.0
}
}
]
}
}
}
},
size: how_many
}
And it works perfectly fine. Now let's assume I'd like to apply additional filters. In this particular example, I need to verify that the users on the other end are, in fact, looking for a person of the searching user's gender. This is held in two separate boolean attributes in the database (male/female). I thought it would be simple enough to prepare two similar filters; however, there are a few more conditional filters that feed into the queries, and I would eventually end up with more than ten pre-prepared filters. There must be a more elegant way! Thank you!
Are you familiar with elasticsearch search templates?
Using search templates you can have conditional and dynamic queries. For example, you can have a list of fields and values for a terms filter and pass it to the search template as a parameter.
As suggested by Mohammad, in the end I pursued a solution using ES search templates, which made my life a lot easier. The problem with JBuilder, ElasticSearch-DSL and other solutions is that they appear not to work with the latest ES, and consequently I am not sure where I would end up should there ever be any changes to the gems or the version of ES. So cutting out the middle man and taking full control with templates, which are in fact super easy to create, made a lot of sense to me. The versions I set up with JBuilder and ES-DSL never worked correctly, as their output was random at best.
Search Templates -> More Information
JBuilder -> More Information
ElasticSearch-DSL -> More Information
There are other solutions that I haven't tried, but with search templates, I didn't see any need for that.
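For comparison, when the query is assembled in Ruby anyway, the conditional part can also be handled by building the filter array step by step before it goes into the search definition. This is a plain-Ruby sketch, not a search template. The field names seeking_male and seeking_female, and the helper names build_filters and search_definition, are hypothetical stand-ins for the two boolean attributes mentioned in the question.

```ruby
# Sketch: assemble filter clauses conditionally, then build the query.
# seeking_male / seeking_female are hypothetical field names.
def build_filters(searcher_gender: nil, max_distance_km: nil, lat: nil, lon: nil)
  filters = [{ term: { active: true } }]

  # Only add the gender clause when the searcher's gender is known.
  case searcher_gender
  when :male   then filters << { term: { seeking_male: true } }
  when :female then filters << { term: { seeking_female: true } }
  end

  # Only add the distance clause when all geo parameters are present.
  if max_distance_km && lat && lon
    filters << {
      geo_distance: {
        distance: "#{max_distance_km}km",
        location: { lat: lat, lon: lon }
      }
    }
  end

  filters
end

def search_definition(filters, size: 10)
  {
    query: { bool: { filter: filters } },
    size: size
  }
end

definition = search_definition(build_filters(searcher_gender: :female), size: 5)
```

Each new condition becomes one more optional clause instead of one more pre-prepared query, which is the same idea a search template expresses on the Elasticsearch side.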

How do I effectively use CouchDB with normalized data?

It has taken me quite a long (calendar) time to get my head around CouchDB and map/reduce and how I can utilize it for various use cases. One challenge I've set myself is understanding how to use it effectively for normalized data. Sources all over the internet simply stop with "don't use it for normalized data." I do not like the lack of analysis on how to use it effectively with normalized data!
Some of the better resources I've found are below:
CouchDB: Single document vs "joining" documents together
http://www.cmlenz.net/archives/2007/10/couchdb-joins
In both cases, the authors do a great job at explaining how to do a "join" when it is necessary to join documents when there is denormalized commonality across them. If, however, I need to join more than two normalized "tables" the view collation tricks leveraged to query just one row of data together do not work. That is, it seems you need some sort of data about all elements in the join to exist in all documents that would participate in the join, and thus, your data is not normalized!
Consider the following simple Q&A example (question/answer/answer comment):
{ id: "Q1", type: "question", question: "How do I...?" }
{ id: "A1", type: "answer", answer: "Simple... You just..." }
{ id: "C1", type: "answer-comment", comment: "Great... But what about...?" }
{ id: "C2", type: "answer-comment", comment: "Great... But what about...?" }
{ id: "QA1", type: "question-answer-relationship", q_id:"Q1", a_id:"A1" }
{ id: "AC1", type: "answer-comment-relationship", a_id:"A1", c_id:"C1" }
{ id: "AC2", type: "answer-comment-relationship", a_id:"A1", c_id:"C2" }
{ id: "Q2", type: "question", question: "What is the fastest...?" }
{ id: "A2", type: "answer", answer: "Do it this way..." }
{ id: "C3", type: "answer-comment", comment: "Works great! Thanks!" }
{ id: "QA2", type: "question-answer-relationship", q_id:"Q2", a_id:"A2" }
{ id: "AC3", type: "answer-comment-relationship", a_id:"A2", c_id:"C3" }
I want to get one question, its answer, and all of its answer's comments, and no other records from the database, with only one query.
With the data set above, at a high level, you'd need to have views for each record type, ask for a particular question with an id in mind, then in another view, use the question id to look up relationships specified by the question-answer-relationship type, then in another view look up the answer by the id obtained by the question-answer-relationship type, and so on and so forth, aggregating the "row" over a series of requests.
Another option might be to create some sort of application that performs the process above to cache denormalized documents in the desired format, and that automatically reacts to the normalized data being updated. This feels awkward and like a reimplementation of something that already exists/should exist.
After all of this background, the ultimate question is: Is there a better way to do this so the database, rather than the application, does the work?
Thanks in advance for anyone sharing their experience!
The document model you have is what I would use with a traditional relational database, since you can perform joins more naturally with those ids.
For a document database, however, this introduces complexity, since 'joining' documents with MapReduce isn't the same thing.
In the Q&A scenario you presented, I would model it as follows:
{
id: "Q1",
type: "question",
question: "How do I...?",
answers: [
{
answer: "Simple... You just...",
comments: [
{ comment: "Great... But what about...?" },
{ comment: "Great... But what about...?" }
]
},
{
answer: "Do it this way...",
comments: [
{ comment: "Works great! Thanks!" },
{ comment: "Nope, it doesn't work" }
]
}
]
}
This can solve a lot of issues when reading from the database, but it makes writes more complex. For example, when adding a new comment to an answer, you will need to:
Get the document out from CouchDB.
Loop through the answers, find the correct one, and push the comment into its comments array.
Save document back to CouchDB.
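The three steps above can be sketched in Ruby as follows. This is a minimal illustration that uses an in-memory hash as a stand-in for the document. Fetching it from and saving it back to CouchDB are deliberately left out, and add_comment is a hypothetical helper name.

```ruby
# In-memory stand-in for the document fetched from CouchDB (step 1).
question = {
  "id" => "Q1",
  "type" => "question",
  "question" => "How do I...?",
  "answers" => [
    { "answer" => "Simple... You just...", "comments" => [] },
    { "answer" => "Do it this way...", "comments" => [] }
  ]
}

# Step 2: find the right answer and push the comment into its array.
# Matching on the answer text is an assumption made for this sketch;
# a real document would more likely carry an id per answer.
def add_comment(doc, answer_text, comment_text)
  answer = doc["answers"].find { |a| a["answer"] == answer_text }
  raise ArgumentError, "answer not found" unless answer

  (answer["comments"] ||= []) << { "comment" => comment_text }
  doc
end

add_comment(question, "Do it this way...", "Works great! Thanks!")
# Step 3 would write the whole modified document back to CouchDB.
```

Because the whole document is rewritten, two clients commenting at the same time would have to deal with CouchDB's revision conflicts, which is part of the added write complexity mentioned above.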
I'd only consider splitting the answers into separate documents if there are a lot of them (e.g. one question yielding 1000 answers); otherwise it's easier to just package them in a single document. But even in that case, try putting the relationship info inside the document, e.g.
{
id: "Q1",
type: "question",
question: "How do I...?"
}
{
id: "A1",
type: "answer",
answer: "Simple... You just...",
question_id: "Q1"
}
{
id: "C1",
type: "comment",
comment: "Works great! Thanks!",
answer_id: "A1"
}
This can make your write operations easier, but you will need to create a view to join the documents so that it returns all of them in one request.
And always keep in mind that the result returned from a view is not necessarily a flat structure of rows, as it is in a SQL query.
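As a rough illustration of such a joining view, the following Ruby sketch emulates the idea in memory: a map step emits a compound key per document, and sorting by key places each answer directly before its comments. It is not CouchDB code (a real view would be a JavaScript map function), and the key layout [answer_id, 0] / [answer_id, 1] is an assumption chosen for this example.

```ruby
# Documents shaped like the split-document model above.
docs = [
  { "id" => "Q1", "type" => "question", "question" => "How do I...?" },
  { "id" => "C1", "type" => "comment", "comment" => "Works great! Thanks!", "answer_id" => "A1" },
  { "id" => "A2", "type" => "answer", "answer" => "Do it this way...", "question_id" => "Q2" },
  { "id" => "A1", "type" => "answer", "answer" => "Simple... You just...", "question_id" => "Q1" }
]

# Emulates a map function: returns the [key, doc] pairs it would emit.
def map_doc(doc)
  case doc["type"]
  when "answer"  then [[[doc["id"], 0], doc]]         # answer sorts first
  when "comment" then [[[doc["answer_id"], 1], doc]]  # then its comments
  else []
  end
end

# CouchDB keeps view rows ordered by key; sort_by emulates that here.
rows = docs.flat_map { |doc| map_doc(doc) }.sort_by { |key, _doc| key }

# Roughly what a key-range query for answer "A1" would return.
a1_rows = rows.select { |key, _doc| key.first == "A1" }
a1_ids  = a1_rows.map { |_key, doc| doc["id"] }
```

One range query over the sorted keys then returns the answer followed by its comments, which is the "join" in a single request.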

What are the parameters for the search method of the Tire gem?

I need to run a search using Tire with my query specifically defined as a parameter, but I'm unsure how to proceed.
search = {
query: {
function_score: {
query: { match_all: {} },
# filters is an array previously built
functions: filters,
score_mode: "total"
}
}
}
Program.tire.search(load: true, size: 50, search)
I'm receiving the following error: /Users/app/models/program_match.rb:122: syntax error, unexpected ')', expecting tASSOC which makes me believe I'm simply missing a key word before I call search.
Any help would be greatly appreciated!
You probably just need to do:
Program.tire.search({load: true, size: 50}.merge(search))
EDIT
Actually, looking at the source for search (https://github.com/karmi/retire/blob/master/lib/tire/model/search.rb), it looks like you need to do:
Program.tire.search(search, {load: true, size: 50})
search expects two params (query, options) or one param (for options) and a block (for the query). Ruby gets confused because you have started a hash (load: true ...) and then just put a new hash (your search hash), which it sees as a hash key (with no value).
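The parsing behaviour can be reproduced without Tire. The sketch below uses a hypothetical stand-in method, fake_search, with the same argument shape as described above (a query first, then an options hash). It is not Tire's implementation.

```ruby
# Hypothetical stand-in with a (query, options) signature.
# It only echoes its arguments so the parsing can be inspected.
def fake_search(query = nil, options = {})
  { query: query, options: options }
end

search = { query: { match_all: {} } }

# Query first, options last: Ruby collects the trailing key/value
# pairs into a single options hash.
result = fake_search(search, load: true, size: 50)

# Writing fake_search(load: true, size: 50, search) does not parse:
# after `size: 50,` Ruby expects another key/value pair, hence the
# "expecting tASSOC" syntax error from the question.

# Merging everything into one hash is valid Ruby, but it then arrives
# as a single argument rather than as query plus options.
merged = { load: true, size: 50 }.merge(search)
```

Here result[:query] is the search hash and result[:options] holds load and size, which is the argument order the corrected call relies on.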
Also, if you are just starting out with Tire, I would suggest checking out elasticsearch-rails, which, according to the author, is replacing Tire.
I recently converted a Tire project to elasticsearch-rails, and have found that it can do everything Tire does, although it doesn't provide the query DSL (it seems like you're not using that anyway, so no loss).
EDIT 2
You can do a simple match_all query like:
Program.tire.search(load: true, size: 50) { query { all } }
You can get something similar by doing:
Program.tire.search('*', load: true, size: 50)
As I noted in a comment below, a query as the first param for search will always be wrapped in a query_string query.
Probably the best way to do what you asked initially is to do:
Tire.search(Video.tire.index_name, query: {
function_score: {
query: { match_all: {} },
functions: filters,
score_mode: "total"
}
}).results
I just tested a similar function_score query on a local project and confirmed that it produces the expected query.
EDIT 3
I've never used the load option before, but it looks like you can do:
Tire.search(Video.tire.index_name, payload: {
query: {
function_score: {
query: { match_all: {} },
functions: filters,
score_mode: "total"
}
}
}, load: true).results
Note that you have to wrap the query as the value for payload.

Elastic Search: how to see the indexed data

I had a problem with ElasticSearch and Rails, where some data was not indexed properly because of attr_protected. Where does Elastic Search store the indexed data? It would be useful to check if the actual indexed data is wrong.
Checking the mapping with Tire.index('models').mapping does not help, the field is listed.
Probably the easiest way to explore your ElasticSearch cluster is to use elasticsearch-head.
You can install it by doing:
cd elasticsearch/
./bin/plugin -install mobz/elasticsearch-head
Then (assuming ElasticSearch is already running on your local machine), open a browser window to:
http://localhost:9200/_plugin/head/
Alternatively, you can just use curl from the command line, eg:
Check the mapping for an index:
curl -XGET 'http://127.0.0.1:9200/my_index/_mapping?pretty=1'
Get some sample docs:
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1'
See the actual terms stored in a particular field (ie how that field has been analyzed):
curl -XGET 'http://127.0.0.1:9200/my_index/_search?pretty=1' -d '
{
"facets" : {
"my_terms" : {
"terms" : {
"size" : 50,
"field" : "foo"
}
}
}
}'
More available here: http://www.elasticsearch.org/guide
UPDATE : Sense plugin in Marvel
By far the easiest way of writing curl-style commands for Elasticsearch is the Sense plugin in Marvel.
It comes with source highlighting, pretty indenting and autocomplete.
Note: Sense was originally a standalone chrome plugin but is now part of the Marvel project.
Absolutely the easiest way to see your indexed data is to view it in your browser. No downloads or installation needed.
I'm going to assume your elasticsearch host is http://127.0.0.1:9200.
Step 1
Navigate to http://127.0.0.1:9200/_cat/indices?v to list your indices. You'll see something like this:
Step 2
Try accessing the desired index:
http://127.0.0.1:9200/products_development_20160517164519304
The output will look something like this:
Notice the aliases, meaning we can as well access the index at:
http://127.0.0.1:9200/products_development
Step 3
Navigate to http://127.0.0.1:9200/products_development/_search?pretty to see your data:
ElasticSearch data browser
Search, charts, one-click setup....
Aggregation Solution
This solves the problem by grouping the data. DrTech's answer used facets for this, but according to the Elasticsearch 1.0 reference, facets are deprecated:
Warning
Facets are deprecated and will be removed in a future release. You are encouraged to
migrate to aggregations instead.
Facets are replaced by aggregations, which are introduced in an accessible manner in the Elasticsearch Guide; the guide loads an example into Sense.
Short Solution
The solution is the same, except that aggregations use aggs instead of facets, and a size of 0 sets the limit to the maximum integer. The example code requires the Marvel plugin.
# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
"aggs" : {
"indexed_occupier_names" : { <= Whatever you want this to be
"terms" : {
"field" : "first_name", <= Name of the field you want to aggregate
"size" : 0
}
}
}
}
Full Solution
Here is the Sense code to test it out - example of a houses index, with an occupier type, and a field first_name:
DELETE /houses
# Index example docs
POST /houses/occupier/_bulk
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "john" }
{ "index": {}}
{ "first_name": "mark" }
# Basic aggregation
GET /houses/occupier/_search?search_type=count
{
"aggs" : {
"indexed_occupier_names" : {
"terms" : {
"field" : "first_name",
"size" : 0
}
}
}
}
Response
Response showing the relevant aggregation code. With two keys in the index, John and Mark.
....
"aggregations": {
"indexed_occupier_names": {
"buckets": [
{
"key": "john",
"doc_count": 2 <= 2 documents matching
},
{
"key": "mark",
"doc_count": 1 <= 1 document matching
}
]
}
}
....
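If the response is consumed from Ruby rather than read in Sense, the buckets can be reduced to a simple term-to-count hash. The sketch below parses a sample body shaped like the response above using only the standard library. The helper name term_counts is hypothetical, and no HTTP request is made.

```ruby
require "json"

# Sample body shaped like the aggregation response shown above,
# trimmed to the relevant part.
body = <<~RESPONSE
  {
    "aggregations": {
      "indexed_occupier_names": {
        "buckets": [
          { "key": "john", "doc_count": 2 },
          { "key": "mark", "doc_count": 1 }
        ]
      }
    }
  }
RESPONSE

# Hypothetical helper: map each bucket's key to its document count.
# Returns an empty hash when the named aggregation is absent.
def term_counts(response_body, agg_name)
  buckets = JSON.parse(response_body).dig("aggregations", agg_name, "buckets") || []
  buckets.to_h { |bucket| [bucket["key"], bucket["doc_count"]] }
end

counts = term_counts(body, "indexed_occupier_names")
```

The resulting hash lists each indexed term once with the number of documents containing it, which is usually enough to spot a field that was not indexed as expected.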
A tool that helps me a lot to debug ElasticSearch is ElasticHQ. Basically, it is an HTML file with some JavaScript. No need to install it anywhere, let alone in ES itself: just download it, unzip it and open the HTML file with a browser.
Not sure it is the best tool for ES heavy users. Yet, it is really practical to whoever is in a hurry to see the entries.
Kibana is also a good solution. It is a data visualization platform for Elastic. If installed, it runs by default on port 5601.
Among the many things it provides, it has "Dev Tools", where you can do your debugging.
For example you can check your available indexes here using the command
GET /_cat/indices
If you are using Google Chrome, you can simply use the extension named Sense; it is also available as a tool if you use Marvel.
https://chrome.google.com/webstore/detail/sense-beta/lhjgkmllcaadmopgmanpapmpjgmfcfig
Following @JanKlimo's example, all you have to do in a terminal is:
To see all the indices:
$ curl -XGET 'http://127.0.0.1:9200/_cat/indices?v'
To see the content of the index products_development_20160517164519304:
$ curl -XGET 'http://127.0.0.1:9200/products_development_20160517164519304/_search?pretty=1'

Storing graph-like structure in Couch DB or do include_docs yourself

I am trying to store a network layout in CouchDB, but my solution produces a rather randomized graph.
I store each node as a document:
{_id ,
nodeName,
group}
and store links in the traditional way:
{_id, source_id, target_id, value}
Following multiple tutorials on handling joins and multiple relationships in CouchDB, I created this view:
function(doc) {
if(doc.type == 'connection') {
if (doc.source_id)
emit("source", {'_id': doc.source_id});
if(doc.target_id)
emit("target", {'_id': doc.target_id});
}
}
This should emit a sequence of source and target ids. I then pass it to a list function with include_docs=true, which assumes that source and target come in pairs and stitches everything back into a structure like this:
{
"nodes":[
{"nodeName":"Name 1","group":"1"},
{"nodeName":"Name 2","group":"1"},
],
"links": [
{"source":7,"target":0,"value":1},
{"source":7,"target":5,"value":1}
]
}
Although my list function produces proper JSON, the view returns all the rows for source docs first and then all the rows for target docs.
So far I don't have any idea how to make this work properly. I am happy to fetch additional values by document _id in the list function, but so far I haven't found any good examples.
Alternative ways of achieving the same goal are welcome. _id values are standard for CouchDB so far.
Update: while writing this question I came up with a different view, which solved my immediate problem, but I would still like to see other options.
updated map:
function(doc) {
if(doc.type == 'connection') {
if (doc.source_id)
emit([doc._id,0,"source"], {'_id': doc.source_id});
if(doc.target_id)
emit([doc._id,1,"target"], {'_id': doc.target_id});
}
}
Your updated map function makes more sense. However, you don't need 0 and 1 in your key, since you already have "source" and "target".
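To illustrate why the shorter key is enough, here is a Ruby sketch that emulates the stitching step on rows keyed by [connection_id, "source"] and [connection_id, "target"]. Since "source" sorts before "target", each connection's two rows arrive as an adjacent pair. The row data and the stitch helper are made up for this example, and a real list function would be written in JavaScript.

```ruby
# Rows as a view queried with include_docs=true might return them,
# already sorted by key. "doc" is the linked node document.
rows = [
  { "key" => ["c1", "source"], "doc" => { "_id" => "n1", "nodeName" => "Name 1", "group" => "1" } },
  { "key" => ["c1", "target"], "doc" => { "_id" => "n2", "nodeName" => "Name 2", "group" => "1" } },
  { "key" => ["c2", "source"], "doc" => { "_id" => "n2", "nodeName" => "Name 2", "group" => "1" } },
  { "key" => ["c2", "target"], "doc" => { "_id" => "n3", "nodeName" => "Name 3", "group" => "2" } }
]

# Builds the nodes/links structure; links refer to nodes by position.
def stitch(rows)
  nodes = []
  index_of = {} # node _id => position in nodes
  links = []

  # Rows come in (source, target) pairs, one pair per connection.
  rows.each_slice(2) do |pair_rows|
    source_index, target_index = pair_rows.map do |row|
      doc = row["doc"]
      # Register each node once and reuse its position afterwards.
      index_of[doc["_id"]] ||= begin
        nodes << { "nodeName" => doc["nodeName"], "group" => doc["group"] }
        nodes.size - 1
      end
    end
    links << { "source" => source_index, "target" => target_index }
  end

  { "nodes" => nodes, "links" => links }
end

graph = stitch(rows)
```

Note that the link's value field lives on the connection document, so it is not available from the included node docs in this sketch. It would have to be carried in the emitted value alongside the _id.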
