I'm trying to translate an aggregation from the MongoDB shell to Ruby code that uses Mongoid as the ODM.
I have some documents like this (very simplified example):
{
  "name": "Foo",
  "tags": ["tag1", "tag2", "tagN"]
},
{
  "name": "Bar",
  "tags": ["tagA", "tag2"]
},
...
Now I'd like to get all documents with the name field and the total number of tags for each.
In the MongoDB shell I can achieve it using the aggregation framework like this:
db.documents.aggregate(
  {$project: {name: 1, tags_count: {$size: "$tags"}}}
)
And it will return:
[{"name": "Foo", "tags_count": 3},
{"name": "Bar", "tags_count": 2}]
Now for the frustrating part: I'm trying to implement the same query inside a Rails app using Mongoid as the ODM.
The code looks like this (in the Rails console):
Document.collection.aggregate(
  [
    { '$project': { name: 1, tags_count: { '$size': '$tags' } } }
  ]
).to_a
And it returns the following error:
Mongo::Error::OperationFailure: The argument to $size must be an Array, but was of type: EOO (17124)
My question is: how can I make Mongoid understand that $tags refers to the correct field? Or what am I missing in the code?
Thanks
It looks like some of your data does not consistently have an array in the tags field. For this you can use $ifNull to substitute an empty array where none is found, so that $size returns 0:
Document.collection.aggregate(
  [
    { '$project': { name: 1, tags_count: { '$size': { '$ifNull': ['$tags', []] } } } }
  ]
).to_a
Alternatively, you could simply skip documents where the field is not present at all using $exists:
Document.collection.aggregate(
  [
    { '$match': { 'tags': { '$exists': true } } },
    { '$project': { name: 1, tags_count: { '$size': '$tags' } } }
  ]
).to_a
But of course that will filter those documents from the selection, which may or may not be the desired effect.
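For convenience, here is a minimal sketch of wrapping the null-safe pipeline in a class method on the Mongoid model (the field definitions and method name are illustrative):

class Document
  include Mongoid::Document

  field :name, type: String
  field :tags, type: Array, default: [] # the default avoids missing arrays on new documents

  # Returns plain BSON documents, e.g. { "_id" => ..., "name" => "Foo", "tags_count" => 3 }
  def self.with_tag_counts
    collection.aggregate([
      { '$project' => { name: 1, tags_count: { '$size' => { '$ifNull' => ['$tags', []] } } } }
    ]).to_a
  end
end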
I am using Sync to parse some JSON into Core Data.
My "Creature" entity has a parent/children relationship (a to-one parent and a to-many children), and the JSON has a format similar to this:
[
  {
    "id": 1,
    "name": "Mad king",
    "parent": null,
    "children": [5]
  },
  {
    "id": 2,
    "name": "Drogon",
    "parent": 5,
    "children": []
  },
  {
    "id": 3,
    "name": "Rhaegal",
    "parent": 5,
    "children": []
  },
  {
    "id": 4,
    "name": "Viserion",
    "parent": 5,
    "children": []
  },
  {
    "id": 5,
    "name": "Daenerys",
    "parent": 1,
    "children": [2, 3, 4]
  }
]
The Mad king has one child, Daenerys, who has 3 children (Drogon, Rhaegal, and Viserion).
Now, I know that Sync supports this sort of setup (where the JSON contains only the ids of parents/children instead of whole objects), and I suspect I have to parse the file twice: once just to create all the objects, and a second time to create the relationships among them. For the second pass to work, I need to rename children to children_ids and parent to parent_id (as described in their README).
However, I can't understand exactly how I would do that. Is it possible to ignore the parent/children keys during the first pass and then take them into account (using the modified keys) during the second?
Or could someone maybe propose a better solution that would (ideally) require just one pass?
According to the documentation:
For example, in the one-to-many example, you have a user that has many notes. If you already have synced all the notes then your JSON would only need the notes_ids; this can be an array of strings or integers. As a side-note, only do this if you are 100% sure that all the required items (notes) have been synced, otherwise these relationships will get ignored and an error will be logged.
So you can, in theory, just blindly perform a full sync to create all the models (letting it fail on the relationships), and then sync again immediately after to establish the relationships.
If you want to avoid the errors, you might want to write some helper functions that create two sets of JSON for these models: one to define the objects, and a second to define the relationships, as sketched below. Either way, you'd need to do two passes.
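As a minimal sketch of that split, in Ruby for brevity since it is pure JSON manipulation (the file name is illustrative; the same transformation applies in Swift before handing the arrays to Sync):

require 'json'

creatures = JSON.parse(File.read('creatures.json'))

# Pass 1: drop the relationship keys entirely, so every object is created
# without Sync trying to resolve parents/children that may not exist yet.
first_pass = creatures.map do |c|
  c.reject { |k, _| %w[parent children].include?(k) }
end

# Pass 2: rename the keys to the _id/_ids form described in Sync's README,
# so they resolve against the objects created in pass 1.
second_pass = creatures.map do |c|
  c.merge('parent_id' => c['parent'], 'children_ids' => c['children'])
   .reject { |k, _| %w[parent children].include?(k) }
end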
I have a Rails model, for example:
Book {id:1, name: "name1"}
Book {id:2, name: "name2"}
I use DataTables, and the best format for its Ajax data is an array of row value arrays, without the column names.
How can I convert Book.all to an array of values without the column names?
The expected result is like this:
{
  "data": [
    [
      "Tiger Nixon",
      "System Architect",
      "Edinburgh",
      "5421",
      "2011/04/25",
      "$320,800"
    ],
    [
      "Garrett Winters",
      "Accountant",
      "Tokyo",
      "8422",
      "2011/07/25",
      "$170,750"
    ],
    [
      "Ashton Cox",
      "Junior Technical Author",
      "San Francisco",
      "1562",
      "2009/01/12",
      "$86,000"
    ]
  ]
}
Use the pluck method like below; it returns an array of value arrays without the column names.
Book.pluck(:id, :name)
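As a minimal sketch (the controller action below is illustrative), wrap the plucked rows in the {"data": [...]} envelope that DataTables expects:

# app/controllers/books_controller.rb
def index
  # Each row becomes an array of strings, matching the expected format above.
  render json: { data: Book.pluck(:id, :name).map { |row| row.map(&:to_s) } }
end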
To implement DataTables in Rails, use the jquery-datatables-rails gem.
Get the data from the controller as a Rails object and populate the table in the view. Finally, initialize DataTables on that table like this:
$('#yourTable').DataTable({
  // ajax: ...,
  // autoWidth: false,
  // pagingType: 'full_numbers',
  // processing: true,
  // serverSide: true,
  // Optional, if you want full pagination controls.
  // Check dataTables documentation to learn more about available options.
  // http://datatables.net/reference/option/pagingType
});
There is no need to convert it to a String or JSON.
I am building an app that gives users the ability to construct their own graphs. I have been using parameters for all queries and creates. But now I want to give users the ability to create a node that they can also label anything they want (respecting Neo4j's restrictions on empty-string labels). How would I parameterize this type of transaction?
I tried this:
.CREATE("(a:{dynamicLabel})").WithParams(new {dynamicLabel = dlabel})...
But this yields a syntax error from Neo4j. I am tempted to concatenate, but am worried that this may expose an injection risk in my application.
I am also tempted to build my own class that reads the intended string and rejects any Neo4j syntax, but this would limit my users a bit and I would rather not.
There is an open Neo4j issue, 4334, which is a feature request for adding the ability to parameterize labels during CREATE. So, this is not yet possible.
That issue contains a comment that suggests generating CREATE statements with hardcoded labels, which will work (as sketched below). It is, unfortunately, not as performant as using parameters (should they ever be supported in this case).
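As a minimal sketch of that approach in Ruby (the method name and whitelist pattern are illustrative): validate the label before interpolating it, and keep everything else parameterized, which limits the injection risk mentioned in the question:

# Labels cannot be parameterized, so validate the label string, then
# interpolate only the label; the properties still go through parameters.
def create_node_cypher(label, props)
  raise ArgumentError, "invalid label: #{label}" unless label.match?(/\A[A-Za-z][A-Za-z0-9_]*\z/)
  ["CREATE (n:`#{label}`) SET n = $props RETURN n", { props: props }]
end

cypher, params = create_node_cypher('Computer', { name: 'host-1' })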
I searched like hell and finally found it out.
You can do it like this:
// create or update nodes with dynamic label from import data
WITH "file:///query.json" AS url
CALL apoc.load.json(url) YIELD value AS u
UNWIND u.cis AS ci
CALL apoc.merge.node([ci.label], {Id: ci.Id}, {}, {}) YIELD node
RETURN node;
The JSON looks like this:
{
  "cis": [
    { "label": "Computer", "Id": "1" },
    { "label": "Service", "Id": "2" },
    { "label": "Person", "Id": "3" }
  ],
  "relations": [
    { "end1Id": "1", "Id": "4", "end2Id": "2", "label": "USES" },
    { "end1Id": "3", "Id": "5", "end2Id": "1", "label": "MANAGED_BY" }
  ]
}
If you are using a Java client, then you can do it like this.
// graphDb is a GraphDatabaseService instance; node creation must happen
// inside a transaction.
Node node = graphDb.createNode();
Label label = new Label() {
    @Override
    public String name() {
        return dynamicLabelVal;
    }
};
node.addLabel(label);
You can then keep a label cache to avoid creating a new Label object for every node.
I have a Rails application with term => definition pairs stored in nodes on Neo4j that I want my users to search using Elasticsearch. Through usage we've found that they far more commonly want to match a term's name than its description. But I'm having trouble finding the feature that ranks results matching a certain field above results matching other fields.
[
  {
    "id": 1,
    "data": {
      "name": "Foo",
      "description": "Something super awesome."
    }
  },
  {
    "id": 2,
    "data": {
      "name": "Bar",
      "description": "Something that depends on Foo"
    }
  }
]
When I search for "Foo", because both terms contain the word Foo in either name or description, my app returns both in alphabetical order, and since Bar sorts before Foo, Bar appears first. This gets very tiring when my users search for a common term that is used in many other terms.
How do I return results from the name field first followed by the secondary results in the description?
I have a feeling this has more to do with Neo4j than Elasticsearch.
It's possible by adding term and field frequency settings to your type mapping; see http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scoring-theory.html
"name": {
  "type": "string",
  "store": true,
  "norms": {
    "enabled": false
  },
  "index_options": "docs"
}
Let me know if you have any queries.
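A common alternative to mapping changes is a query-time boost on the name field. As a minimal sketch with the elasticsearch-ruby client, assuming the documents are indexed with data.name and data.description as in the example (the index name and the ^3 boost factor are illustrative):

require 'elasticsearch'

client = Elasticsearch::Client.new # assumes a reachable Elasticsearch node

# multi_match across both fields; matches on data.name score 3x higher than
# matches on data.description, so the term "Foo" ranks above "Bar".
client.search(
  index: 'terms',
  body: {
    query: {
      multi_match: {
        query: 'Foo',
        fields: ['data.name^3', 'data.description']
      }
    }
  }
)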
I have a Solr index of about 5 million documents at 8GB using Solr 4.7.0. I require grouping in Solr, but find it to be too slow. Here is the group configuration:
group=on
group.facet=on
group.field=workId
group.ngroups=on
The machine has ample memory at 24GB and 4GB is allocated to Solr itself. Queries are generally taking about 1200ms compared to 90ms when grouping is turned off.
I ran across a plugin called CollapsingQParserPlugin, which uses a filter query to remove all but one document of each group.
fq={!collapse field=workId}
It's designed for indexes that have a large number of unique groups; I have about 3.8 million. This approach is much, much faster, at about 120ms. It's a beautiful solution for me except for one thing: because it filters out the other members of the group, only facets from the representative document are counted. For instance, if I have the following three documents:
"docs": [
  { "id": "1", "workId": "abc", "type": "book" },
  { "id": "2", "workId": "abc", "type": "ebook" },
  { "id": "3", "workId": "abc", "type": "ebook" }
]
Once collapsed, only the top one shows up in the results. Because the other two get filtered out, the facet counts look like
"type": ["book":1]
instead of
"type": ["book":1, "ebook":1]
Is there a way to get group.facet counts using the collapse filter query?
According to Yonik Seeley, the correct group facet counts can be gathered using the JSON Facet API. His comments can be found at:
https://issues.apache.org/jira/browse/SOLR-7036?focusedCommentId=15601789&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15601789
I tested out his method and it works great. I still use the CollapsingQParserPlugin to collapse the results, but I exclude the filter when counting up the facets, like so:
fq={!tag=workId}{!collapse field=workId}
json.facet={
  type: {
    type: terms,
    field: type,
    facet: {
      workCount: "unique(workId)"
    },
    domain: {
      excludeTags: [workId]
    }
  }
}
And the result:
{
  "facets": {
    "count": 3,
    "type": {
      "buckets": [
        {
          "val": "ebook",
          "count": 2,
          "workCount": 1
        },
        {
          "val": "book",
          "count": 1,
          "workCount": 1
        }
      ]
    }
  }
}
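For reference, a minimal sketch of issuing that request from Ruby with the rsolr gem (the core URL is illustrative):

require 'rsolr'

solr = RSolr.connect(url: 'http://localhost:8983/solr/works')

# The collapse filter is tagged so excludeTags can remove it from the facet
# domain; the facet counts are then computed over the uncollapsed set.
response = solr.get('select', params: {
  q: '*:*',
  fq: '{!tag=workId}{!collapse field=workId}',
  'json.facet' => '{type:{type:terms,field:type,facet:{workCount:"unique(workId)"},domain:{excludeTags:[workId]}}}'
})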
I was unable to find a way to do this with Solr or plugin configuration, so I developed a workaround to effectively create group facet counts while still using the CollapsingQParserPlugin.
I do this by duplicating the fields I'll be faceting on and making sure all facet values for the entire group are present in each document, like so:
"docs": [
  {
    "id": "1",
    "workId": "abc",
    "type": "book",
    "facetType": ["book", "ebook"]
  },
  {
    "id": "2",
    "workId": "abc",
    "type": "ebook",
    "facetType": ["book", "ebook"]
  },
  {
    "id": "3",
    "workId": "abc",
    "type": "ebook",
    "facetType": ["book", "ebook"]
  }
]
When I ask Solr to generate facet counts, I use the new field:
facet.field=facetType
This ensures that all facet values are accounted for and that the counts represent groups. But when I use a filter query, I revert to using the original field:
fq=type:book
This way the correct document is chosen to represent the group.
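A minimal sketch of that pre-insertion step (docs here is assumed to be the in-memory array of document hashes in your indexing pipeline):

# For each workId group, copy the union of the group's type values into
# every member's facetType before sending the documents to Solr.
docs.group_by { |d| d['workId'] }.each_value do |group|
  facet_types = group.map { |d| d['type'] }.uniq
  group.each { |d| d['facetType'] = facet_types }
end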
I know this is a dirty, complex way to make it work, but it does work, and that's what I needed. It also requires the ability to process your documents before insertion into Solr, which calls for some development. If anyone has a simpler solution, I would still love to hear it.