I want to write some Ruby to iterate through the documents in a MongoDB collection.
My data has the schema:
"_id" : ObjectId("560ff830eeb4db07875b59b9"),
"userId" : NumberInt(1),
"movieId" : NumberInt(50),
"rating" : 4.0,
"timestamp" : NumberInt(1329753504)
First, I want to count how many documents in the whole collection have userId = 1 and, if there are fewer than 5, discard them all.
I'm really unsure how to tackle this, so any advice would be great.
You would need to count the number of documents which have userId = 1 through the count() method. Thus from the shell (command-line), you can do the following:
var count = db.collection.find({ "userId": 1 }).count();
if (count < 5) db.collection.remove({ "userId": 1 });
You'll then have to do something similar with Ruby, but it should be pretty straightforward. Refer to the documentation on the Ruby driver for this:
Get a count of matching documents in the collection.
Remove documents from the collection
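As a rough Ruby equivalent of the shell commands above, here is a minimal sketch. It assumes a 2.x Ruby driver, where Mongo::Collection responds to count_documents and delete_many; the helper name discard_if_rare, the connection string, and the ratings collection name are made up for illustration.

```ruby
# Deletes every document for `user_id` when fewer than `minimum` exist.
# `collection` only needs to respond to count_documents and delete_many,
# which Mongo::Collection does in the 2.x Ruby driver (assumption).
# Returns the number of matching documents that was found.
def discard_if_rare(collection, user_id, minimum = 5)
  filter = { "userId" => user_id }
  count = collection.count_documents(filter)
  collection.delete_many(filter) if count < minimum
  count
end

# Hypothetical usage (connection string and collection name are assumptions):
#   require "mongo"
#   client = Mongo::Client.new("mongodb://127.0.0.1:27017/test")
#   discard_if_rare(client[:ratings], 1)
```

To cover every user rather than just userId = 1, the same helper could be called once per distinct userId value.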
I want to query my data such that I get the objects whose timeStamps are in the range of my desired timeStamps.
My JSON data is:
"-KHbFGKIyefgWKEmlkY1" : {
"createdAt" : 1463094941,
"customer" : "user1",
"professional" : "professional1",
"service" : "service2",
"startsAt" : 1470070000,
"status" : "WaitingForApproval"
}
My code to make the query:
let minimumDesiredTimeStamp = 1470000000
let maximumDesiredTimeStamp = 1480000000
ref.queryOrderedByChild("startsAt")
.queryStartingAtValue(minimumDesiredTimeStamp)
.queryEndingAtValue(maximumDesiredTimeStamp)
.observeSingleEventOfType(.Value, withBlock: { (snapshotOfChildrenWhichStartsAtTimeStampRange) in
})
The problem is that the query never enters the closure.
If this isn't possible with the current Firebase, can you suggest a way to achieve it?
NOTE: For now, I've worked around the problem by combining only queryOrderedByChild and queryStartingAtValue, and then discarding the objects whose start time exceeds my range with the Swift array filter method.
NOTE2: When I tried the code again with hard-coded values, I realised that the query really works. So it seems I misidentified the problem, and something else is preventing the query from entering the closure.
So remember: querying child values with a combination of queryOrderedByChild, queryStartingAtValue, and queryEndingAtValue works!
I've read a lot of posts about finding the highest-valued objects in arrays using max and max_by, but my situation is another level deeper, and I can't find any references on how to do it.
I have an experimental Rails app in which I am attempting to convert a legacy .NET/SQL application. The (simplified) model looks like Overlay -> Calibration <- Parameter. In a single data set, I will have, say, 20K Calibrations, but about 3,000-4,000 of these are versioned duplicates by Parameter name, and I need only the highest-versioned Parameter by each name. Further complicating matters is that the version lives on the Overlay. (I know this seems crazy, but this models our reality.)
In pure SQL, we add the following to a query to create a virtual table:
n = ROW_NUMBER() OVER (PARTITION BY Parameters.Designation ORDER BY Overlays.Version DESC)
And then select the entries where n = 1.
I can order the array like this:
ordered_calibrations = mainline_calibrations.sort do |e, f|
[f.parameter.Designation, f.overlay.Version] <=> [e.parameter.Designation, e.overlay.Version] || 1
end
I get this kind of result:
C_SCR_trc_NH3SensCln_SCRT1_Thd 160
C_SCR_trc_NH3SensCln_SCRT1_Thd 87
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 310
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 160
C_SCR_trc_NH3Sen_DewPtHiThd_Tbl 87
So I'm wondering if there is a way, using Ruby's Enumerable built-in methods, to loop over the sorted array, and only return the highest-versioned elements per name. HUGE bonus points if I could feed an integer to this method's block, and only return the highest-versioned elements UP TO that version number ("160" would return just the second and fourth entries, above).
The alternative to this is that I could somehow implement the ROW_NUMBER() OVER in ActiveRecord, but that seems much more difficult to try. And, of course, I could write code to deal with this, but I'm quite certain it would be orders of magnitude slower than figuring out the right Enumerable function, if it exists.
(Also, to be clear, it's trivial to do .find_by_sql() and create the same result set as in the legacy application -- it's even fast -- but I'm trying to drag all the related objects along for the ride, which you really can't do with that method.)
I'm not convinced that doing this in the database isn't a better option, but since I'm unfamiliar with SQL Server I'll give you a Ruby answer.
I'm assuming that when you say "Parameter name" you're talking about the Parameters.Designation column, since that's the one in your examples.
One straightforward way you can do this is with Enumerable#slice_when, which is available in Ruby 2.2+. slice_when is good when you want to slice an array "between" values that are different in some way. For example:
[ { id: 1, name: "foo" }, { id: 2, name: "foo" }, { id: 3, name: "bar" } ]
.slice_when {|a,b| a[:name] != b[:name] }
# => [ [ { id: 1, name: "foo" }, { id: 2, name: "foo" } ],
# [ { id: 3, name: "bar" } ]
# ]
You've already sorted your collection, so to slice it you just need to do this:
calibrations_by_designation = ordered_calibrations.slice_when do |a, b|
a.parameter.Designation != b.parameter.Designation
end
Now calibrations_by_designation is an array of arrays, each of which is sorted from greatest Overlay.Version to least. The final step, then, is to get the first element in each of those arrays:
highest_version_calibrations = calibrations_by_designation.map(&:first)
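If sorting first is not required, the same result can be sketched with group_by and max_by instead of slice_when (a different technique from the one above, shown only as an alternative). The Struct definitions below are stand-ins for the ActiveRecord models, and the method name highest_versions and its optional max_version cap are assumptions made for illustration; the cap is one way to approach the "up to version N" part of the question.

```ruby
# Stand-ins for the ActiveRecord models (assumption: the real models expose
# the same readers, i.e. calibration.parameter.Designation and
# calibration.overlay.Version).
Overlay     = Struct.new(:Version)
Parameter   = Struct.new(:Designation)
Calibration = Struct.new(:parameter, :overlay)

# Returns one calibration per Designation: the one with the highest
# Overlay version that does not exceed max_version.
def highest_versions(calibrations, max_version = Float::INFINITY)
  calibrations
    .select   { |c| c.overlay.Version <= max_version }
    .group_by { |c| c.parameter.Designation }
    .map      { |_name, group| group.max_by { |c| c.overlay.Version } }
end

# Sample data taken from the output shown in the question.
SAMPLE_CALIBRATIONS = [
  ["C_SCR_trc_NH3SensCln_SCRT1_Thd", 160],
  ["C_SCR_trc_NH3SensCln_SCRT1_Thd", 87],
  ["C_SCR_trc_NH3Sen_DewPtHiThd_Tbl", 310],
  ["C_SCR_trc_NH3Sen_DewPtHiThd_Tbl", 160],
  ["C_SCR_trc_NH3Sen_DewPtHiThd_Tbl", 87]
].map { |name, version| Calibration.new(Parameter.new(name), Overlay.new(version)) }

# Small helper to make results easy to read: [Designation, Version] pairs.
def summarize(calibrations)
  calibrations.map { |c| [c.parameter.Designation, c.overlay.Version] }
end

summarize(highest_versions(SAMPLE_CALIBRATIONS))
# => [["C_SCR_trc_NH3SensCln_SCRT1_Thd", 160], ["C_SCR_trc_NH3Sen_DewPtHiThd_Tbl", 310]]
summarize(highest_versions(SAMPLE_CALIBRATIONS, 160))
# => [["C_SCR_trc_NH3SensCln_SCRT1_Thd", 160], ["C_SCR_trc_NH3Sen_DewPtHiThd_Tbl", 160]]
```

Because group_by does not depend on the input order, this version works on unsorted calibrations as well.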
Using Usage.group(:song_id), I can get all the usages of any particular song in my app. Using Usage.group(:song_id).count, I can get a hash like {song_id => usage_count ...}.
How do I produce a count of that count, though? I.e., something like this:
[
used_once: number_of_songs_used_once,
used_twice: number_of_songs_used_twice,
used_thrice: number_of_songs_used_thrice,
etc.
]
(Okay, so really I would expect output to look something like {1=>14, 2=>6, 3=>2, 4=>1}).
You can use inject on the hash's values.
song_values = Usage.group(:song_id).count.values
times_used = song_values.inject({}) do |used, count|
if used[count].nil?
used[count] = 1
else
used[count] += 1
end
used
end
You could use a ternary operator if you want the if/else to be one line
used[count].nil? ? used[count] = 1 : used[count] += 1
This simply loops over the per-song usage counts and builds a hash in which each key is a usage count and each value is the number of songs that were used that many times.
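The same counting can be sketched more compactly with a hash whose default value is 0, which removes the nil check. The method name count_of_counts and the sample hash are hypothetical; in the app, the input would be the result of Usage.group(:song_id).count.

```ruby
# Takes a hash of { song_id => usage_count } and returns a hash of
# { usage_count => number_of_songs_with_that_count }.
def count_of_counts(usage_counts)
  usage_counts.values.each_with_object(Hash.new(0)) do |count, totals|
    totals[count] += 1 # missing keys start at 0 thanks to Hash.new(0)
  end
end

# Hypothetical sample: songs 101 and 102 used once, 103 twice, 104 three times.
sample_usage = { 101 => 1, 102 => 1, 103 => 2, 104 => 3 }
times_used = count_of_counts(sample_usage)
```

Here times_used maps 1 to 2, 2 to 1, and 3 to 1. On Ruby 2.7 or later, usage_counts.values.tally should give the same result in one call.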
My Rails app uses Elasticsearch (with Searchkick) and works just fine with the _geo_distance ordering function. However, I need a more complex ordering that combines location with an attempt to promote an exact string match on the business name.
For example, if I make a query and there are 10 ascending distance returned results, but the #5 result is also an exact string match on the business name in the record, I would like to promote/elevate that to the #1 position (basically overriding the distance sorting for that record).
There are two ways I can see to try to solve this issue, but I am running into issues with both.
The first would be to do this in the initial query, so that Elasticsearch handles the work.
The second would be to do some type of post-process re-sort on the results returned by Elasticsearch, looking for an exact match and re-ordering if needed.
The issue with the first method is that the built in scoring mechanisms seem to shift completely to distance when invoking _geo_distance, leaving me to wonder how to mix custom scoring functions with location.
And the issue with the second method is that the search results returned are a custom type of SearchKick object that does not seem to like normal array or hash sorting mechanisms for a post-process.
Is there a way to do something pre- or post- query to promote a document in the results in this manner?
Thanks.
In fact, there are many ways to "control" the scoring. If you already know before indexing that some document should get a high score/boost, you can assign the boost to that document at index time; see the reference here.
If you cannot determine the boost before indexing, you can boost at query time instead. There are many options for boosting in a query, and which one applies depends on the kind of query you use.
For query string query:
You can boost some fields, such as "fields" : ["content", "name.*^5"], or boost part of the query string, such as quick^2 fox (this might work for you: just add an extra boost on the name).
For others:
You can give boost for term query, such as boosting the "ivan" case:
"term" : {"name" : {"value" : "ivan","boost" : 10.0}}
You can also wrap it in a bool query and boost the desired case. For example, find all 'ivan' and boost 'ji' on the name field:
{ "query" : { "bool" : { "must": [{"match":{"name":"ivan"}}],
"should" : [ { "term" : { "name": { "value" : "ji", "boost" : 10 }}}]}}}
Besides the term query, many other queries support boost, such as the prefix query and the match query. Use whichever fits your situation. Here are some official examples: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_boosting_query_clauses.html
Boosting may not be an easy way to control the score, because boosted scores are normalized. If you need more direct control, you can use the function_score query to specify the score directly.
In short, you can wrap your query in a bool query and add a boost for the name match, as follows:
{ "query" : {
"bool" : {
"must": [
{"filtered" : {
"filter" : {
"geo_distance" : {
"distance" : "2000km",
"loc" : {
"lat" : 10,
"lon" : 10
}
}
}
}}],
"should" : [ { "term" : { "name": { "value" : "ivan", "boost" : 10 }}}]}},
"sort" : [
"_score",
{
"_geo_distance" : {
"loc" : [10, 10],
"order" : "asc",
"unit" : "km",
"mode" : "min",
"distance_type" : "sloppy_arc"
}
}
]
}
For more detail, see my gist: https://gist.github.com/hxuanji/e5acd9a5174ea10c08b8. There I boost the "ivan" name, and in the result the "ivan" document comes first, ahead of the (10,10) document.
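For the post-processing route raised in the question, here is a minimal Ruby sketch. It assumes the search results can be turned into a plain array with to_a and that each result responds to name; both are assumptions about the result objects, and the Business struct and the promote_exact_matches method are hypothetical stand-ins.

```ruby
# Stand-in for a search result (assumption: real results respond to `name`).
Business = Struct.new(:name, :distance_km)

# Moves results whose name exactly matches the query (ignoring case and
# surrounding whitespace) to the front. Array#partition keeps the relative
# order inside each part, so the distance ordering is otherwise preserved.
def promote_exact_matches(results, query)
  wanted = query.to_s.strip.downcase
  exact, rest = results.to_a.partition do |result|
    result.name.to_s.strip.downcase == wanted
  end
  exact + rest
end

# Hypothetical usage with results already sorted by ascending distance:
#   sorted = [Business.new("Alpha Cafe", 1.0), Business.new("Ivan", 5.0)]
#   promote_exact_matches(sorted, "ivan").map(&:name)
```

Note that this only reorders the results that were already fetched, so an exact match that falls outside the fetched page would not be promoted; doing the boost in the query avoids that limitation.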
Does the SC.Gridview support grouping? If so, can someone give me some pointers how to get started?
I'm trying to build gridview of tiles separated into logical groups. My underlying model is similar to the following:
TestApp.personModel.FIXTURES = [
{
"name" : "Bob",
"group" : "group1"
},
{
"name" : "Alice",
"group" : "group1"
},
{
"name" : "Tom",
"group" : "group2"
}
];
So, for example, I'd like Bob and Alice tiles to be in 1 group and Tom to be a separate group.
I don't want to use the SC.ListView because each item is going to be arbitrarily complex (i.e., not just a name).
Thanks in advance.
As long as you create a controller that holds a list of groups (an SC.ArrayController), where each group holds the list of personModel objects related to it (fire off a query that groups your results by group), the SC.GridView can display each group just as you would like. I would recommend having an "ItemView" that defines how each item in the grid is rendered. This ItemView is linked to the GridView via the exampleView property.
Have a look at the following code from the EurekaJ application to see how the GridView is used to display a list of charts.
https://github.com/joachimhs/EurekaJ/blob/master/EurekaJ.View/apps/EurekaJView/views/chart/chart_grid.js