I've got a collection of complex documents.
{
  <some fields>,
  meta_info: {
    company_name: String,
    invalid: Boolean,
    mobile_layout: String,
    <more fields>
  },
  <lots more fields>
}
I ask Rails to find me all those documents where meta_info.invalid is true/false/nil using
finished_acts.where('meta_info.invalid' => true/false/nil)
Now there is ONE document where the field does not exist. I ask...
finished_acts.find('meta_info.invalid' => {'$exists' => false})
=> nil
which is simply untrue (plus it also yields nil if I ask {'$exists' => true}). Just as wrong is
finished_acts.where('meta_info.invalid' => {'$exists' => false}).count
=> 0
How can I find this document? I've spent days with a collection of statistical data that was always off by one count compared to the numbers reported by the database itself, and this nonexistent field was the reason.
I am using MongoDB v3.4.17 and Mongoid 6.1.0.
EDIT: I've since learned that I used the .find command incorrectly; it is only intended to be used with the _id field and does not accept JSON/hashes.
My problem obviously shows a bug in the implementation of the Active Record adaptation of Mongoid, and I am slowly converting my code to always use aggregations. When doing so, I get the correct number of documents. Of course, the structure returned by aggregations is more complex to handle since it is only hashes and arrays, but if that's the trade-off for getting correct results I am happy with it.
Don't rely on the ActiveRecord adaptation in Mongoid; use the aggregation pipeline. This will (to the best of my knowledge) always return correct results since it simply pushes the argument hash into mongo.
The aggregation pipeline seems unintuitive at first, but once you get to know it, it is a very powerful tool for making complex queries. The Rails syntax is as follows:
MyClassName.collection.aggregate( <array> )
where the <array> contains hashes, each of which is a command applied to the result of the preceding one. The documentation can be found at https://docs.mongodb.com/manual/aggregation/.
To convert the commands shown there to Ruby, you generally only need to surround the keys and values with quotes.
To give an example: The following is the mongo syntax:
db.orders.aggregate([
  { $match: { status: "A" } },
  { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
This takes all documents from the orders collection and selects all those where the status field matches "A" (the word $match left of the colon is a mongo command). Those then get grouped ($group) by the cust_id field and the sum ($sum) of the contents of the amount field is computed.
If you want to convert this to Ruby, you change it into
Orders.collection.aggregate([
  { '$match': { 'status': 'A' } },
  { '$group': { '_id': '$cust_id', 'total': { '$sum': '$amount' } } }
])
This worked for me, and what's even better is that it takes significantly less time than using Orders.where(...) and doing the computation in Ruby.
The trade-off is that you don't get Ruby model objects back, only hashes and arrays.
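Applied to the original problem above, a minimal sketch for counting the documents whose meta_info.invalid field is missing might look like this (FinishedAct is a stand-in for the actual model class, and the $count stage requires MongoDB 3.4+):
result = FinishedAct.collection.aggregate([
  { '$match': { 'meta_info.invalid': { '$exists': false } } },
  { '$count': 'missing' }
])
result.first # => { "missing" => 1 } if exactly one document lacks the field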
Related
I have a web page where a user can search through documents in a mongoDB collection.
I get the user's input through @q = params[:search].to_s
I then run a Mongoid query:
@story = Story.any_of( { :Tags => /#{@q}/i }, { :Name => /#{@q}/i }, { :Genre => /#{@q}/i } )
This works fine if the user looks for something like 'humor' 'romantic comedy' or 'mystery'. But if looking for 'romance fiction', nothing comes up. Basically I'd like to add 'and' 'or' functionality to my search so that it will find documents in the database that are related to all strings that a user types into the input field.
How can this be done while still maintaining the substring search capabilities I currently have? Thanks in advance for help!
UPDATE:
Per Eugene's comment below...
I tried converting to case insensitive with @q.map! { |x| x = "/#{x}/i" }. It does save it properly as ["/romantic/i", "/comedy/i"]. But the query Story.any_of({:Tags.in => @q}, {:Story.in => @q}) finds nothing.
When I change the array to ["Romantic", "Comedy"], then it does.
How can I properly make it case insensitive?
Final:
Removing the quotes worked.
However, there is now no way to use an .and() search to find a book that has both words across all these fields.
To create an OR statement, you can convert the string into an array of strings, then convert the array of strings into an array of regexes and use the '$in' option. So first, pick a delimiter - perhaps a comma or a space, or you can set up a custom one like ||. Let's say you do comma-separated. When the user enters:
romantic, comedy
you split that into ['romantic', 'comedy'], then convert that to [/romantic/i, /comedy/i] then do
@story = Story.any_of( { :Tags.in => [/romantic/i, /comedy/i]}....
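Putting the whole OR flow together, a minimal sketch (assuming comma-separated input as described above; Tags and Name are the fields from the question):
q = params[:search].to_s                            # e.g. "romantic, comedy"
terms = q.split(',').map(&:strip)                   # => ["romantic", "comedy"]
regexes = terms.map { |t| /#{Regexp.escape(t)}/i }  # real Regexp objects, not strings
@story = Story.any_of({ :Tags.in => regexes }, { :Name.in => regexes })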
To create an AND query, it can get a little more complicated. There is an elemMatch function you could use.
I don't think you could do {:Tags => /romantic/i, :Tags => /comedy/i }
So my best thought would be to do sequential queries, even though there would be a performance hit; if your DB isn't that big, it shouldn't be a big issue. So if you want Romantic AND Comedy you can do
query 1: find all documents that match /romantic/i
query 2: take the results of query 1 and keep only those that also match /comedy/i
And so on by iterating through your array of selectors.
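A rough sketch of that sequential idea, reusing the regexes array from the OR example above (the first query narrows things down on the DB side; the remaining filtering happens in Ruby, which is where the performance hit comes from):
results = Story.any_of({ :Tags => regexes.first }, { :Name => regexes.first }).to_a
regexes.drop(1).each do |re|
  results = results.select do |story|
    Array(story[:Tags]).grep(re).any? || story[:Name].to_s =~ re
  end
end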
I am writing specs for a method that 'scores' a string of text according to a fairly complex set of rules having to do with a large set of various combinations of keywords.
My test set is 50,000 strings. my_method_being_tested("some test string") produces a score with 3 elements: [boolean, boolean, integer].
I have a tally of 50,000 inputs & expected outputs, something like this
test_set = [ {"test string one" => [true, false, 0] } , { "test string 2" => [false, false, 10] } , ... ]
What is the best way to store/manage a 50,000 element test set when using RSpec, so I can loop through the array something like:
test_set.each do | a_set |
my_method_being_tested(a_set.key).should == a_set.value
end
There is no underlying ActiveRecord model for the method in my app, so I cannot simply store a fixture and load it into an ActiveRecord table (unless perhaps it makes sense somehow to create an ActiveRecord-less model of some kind and load a fixture into that?).
StackOverflow isn't really set up for answering "what is best" questions, but your basic approach is fine and you just need to make sure that your data structure and access mechanisms match up. Given the code snippets you showed, I would suggest reading up on Ruby hashes and structs.
At a meta level you'd have:
test_set = my_test_setup
test_set.each do |pair|
expect(my_method_being_tested(my_key_accessor(pair))).to match_array(my_value_accessor(pair))
end
If you want to keep test_set as is, then you can change your test loop to be:
test_set.each do |pair|
expect(my_method_being_tested(pair.keys.first)).to match_array(pair.values.first)
end
If you want to keep your test loop as is, you can change your test_set to be:
TestPair = Struct.new(:key, :value)
test_set = [
  TestPair.new("test string one", [true, false, 0]),
  TestPair.new("test string 2", [false, false, 10]),
  ...
]
Note that I'm using the new "expect" syntax rather than the deprecated "should" syntax, but that's a separate issue.
UPDATE: As for storing key/value pairs in a file, there are myriad options as well. YAML, as you note in your comment, is fine, and you can combine it with DBM, letting you do something like:
require 'yaml/dbm'

# YAML::DBM inherits open from DBM; values round-trip through YAML transparently
YAML::DBM.open('your_yaml_file') do |db|
  db.each do |key, value|
    expect(my_method_being_tested(key)).to match_array(value)
  end
end
That, of course, assumes that you've stored your key/value pairs in the YAML+DBM file in the first place, which gets you back to creating some Ruby to represent the key/value pairs.
A set of 50k items isn't that big to keep in memory, but if you're really concerned about reading and testing each pair incrementally, you can always read a line at a time from a file. That still raises the question of what to store in each line (e.g. JSON) and how to format it in the first place.
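For example, a minimal line-at-a-time sketch, assuming a hypothetical test_set.jsonl file that holds one {"input": ..., "expected": [...]} object per line and a loop run inside an RSpec example:
require 'json'

File.foreach('test_set.jsonl') do |line|
  pair = JSON.parse(line)  # one test case per line keeps memory use flat
  expect(my_method_being_tested(pair['input'])).to match_array(pair['expected'])
end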
For various reasons, I'm creating an app that takes a SQL query string as a URL parameter and passes it off to Postgres (similar to the CartoDB SQL API and CFPB's Qu). Rails then renders a JSON response of the results that come from Postgres.
Snippet from my controller:
@table = ActiveRecord::Base.connection.execute(@query)
render json: @table
This works fine. But when I use Postgres JSON functions (row_to_json, json_agg), it renders the nested JSON property as a string. For example, the following query:
query?q=SELECT max(municipal) AS series, json_agg(row_to_json((SELECT r FROM (SELECT sch_yr,grade_1 AS value ) r WHERE grade_1 IS NOT NULL))ORDER BY sch_yr ASC) AS values FROM ed_enroll WHERE grade_1 IS NOT NULL GROUP BY municipal
returns:
{
series: "Abington",
values: "[{"sch_yr":"2005-06","value":180}, {"sch_yr":"2005-06","value":180}, {"sch_yr":"2006-07","value":198}, {"sch_yr":"2006-07","value":198}, {"sch_yr":"2007-08","value":158}, {"sch_yr":"2007-08","value":158}, {"sch_yr":"2008-09","value":167}, {"sch_yr":"2008-09","value":167}, {"sch_yr":"2009-10","value":170}, {"sch_yr":"2009-10","value":170}, {"sch_yr":"2010-11","value":153}, {"sch_yr":"2010-11","value":153}, {"sch_yr":"2011-12","value":167}, {"sch_yr":"2011-12","value":167}]"
},
{
series: "Acton",
values: "[{"sch_yr":"2005-06","value":353}, {"sch_yr":"2005-06","value":353}, {"sch_yr":"2006-07","value":316}, {"sch_yr":"2006-07","value":316}, {"sch_yr":"2007-08","value":323}, {"sch_yr":"2007-08","value":323}, {"sch_yr":"2008-09","value":327}, {"sch_yr":"2008-09","value":327}, {"sch_yr":"2009-10","value":336}, {"sch_yr":"2009-10","value":336}, {"sch_yr":"2010-11","value":351}, {"sch_yr":"2010-11","value":351}, {"sch_yr":"2011-12","value":341}, {"sch_yr":"2011-12","value":341}]"
}
So, it only partially renders the JSON, running into problems when I have nested JSON arrays created with the Postgres functions in the query.
I'm not sure where to start with this problem. Any ideas? I am sure this is a problem with Rails.
ActiveRecord::Base.connection.execute doesn't know how to unpack database types into Ruby types so everything – numbers, booleans, JSON, everything – you get back from it will be a string. If you want sensible JSON to come out of your controller, you'll have to convert the data in #table to Ruby types by hand and then convert the Ruby-ified data to JSON in the usual fashion.
Your @table will actually be a PG::Result instance, and those have methods such as ftype (get a column type) and fmod (get a type modifier for a column) that can help you figure out what sort of data is in each column. You'd probably ask the PG::Result for the type and modifier of each column and then hand those to the format_type PostgreSQL function to get intelligible type strings; then you'd map those type strings to conversion methods and use that mapping to unpack the strings you get back. If you dig around inside the ActiveRecord source, you'll see AR doing similar things. The AR source code is not for the faint of heart, but this is par for the course when you step outside the narrow confines of how AR thinks you should interact with databases.
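A rough sketch of that introspection, assuming @table is the PG::Result from above (format_type is a real PostgreSQL function; the loop itself is only illustrative):
conn = ActiveRecord::Base.connection.raw_connection
@table.nfields.times do |i|
  oid = @table.ftype(i)  # type OID of column i
  mod = @table.fmod(i)   # type modifier of column i
  type = conn.exec_params('SELECT format_type($1, $2)', [oid, mod]).getvalue(0, 0)
  puts "#{@table.fname(i)}: #{type}"  # e.g. "values: json"
end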
You might want to rethink your "sling hunks of SQL around" approach. You'll probably have an easier time of things (and be able to whitelist what the queries do) if you can figure out a way to build the SQL yourself.
The PG::Result class (the type of @table) utilizes TypeMaps to cast result values to Ruby objects. For your example, you could use PG::TypeMapByColumn as follows:
@table = ActiveRecord::Base.connection.execute(@query)
@table.type_map = PG::TypeMapByColumn.new [nil, PG::TextDecoder::JSON.new]
render json: @table
A more generic approach would be to use the PG::TypeMapByOid TypeMap class. This requires you to provide OIDs for each PG attribute type. A list of these can be found in pg_type.dat.
tm = PG::TypeMapByOid.new
tm.add_coder PG::TextDecoder::Integer.new oid: 23
tm.add_coder PG::TextDecoder::Boolean.new oid: 16
tm.add_coder PG::TextDecoder::JSON.new oid: 114
@table.type_map = tm
I have a Rails model that has a field array_field, which is a serialized text array. I want the combination of this array value and the value of another_field to be unique.
Should be straightforward, no?
class Foo < ActiveRecord::Base
  validates_uniqueness_of :array_field, scope: [:another_field]
  serialize :array_field, Array
end
This doesn't work. However, if I switch them around in the validations,
validates_uniqueness_of :another_field, scope: [:array_field] works as expected.
Can someone explain why this is the case? Is this expected behavior?
The Postgres error for the former setup when array_field's value is nil or [] is this:
PG::SyntaxError: ERROR: syntax error at or near ")"
LINE 1: ...other_field" = 103 AND "foo"."array_field" = ) LIMIT 1
When array_field is [[1, 2], [3, 4, 5]] (a sample multiarray I was using), it's:
PG::UndefinedFunction: ERROR: operator does not exist: text = integer
LINE 1: ...other_field" = 103 AND "foo"."array_field" = 1, 2, 3, 4, 5) LIMIT 1
It seems that Rails doesn't know how to translate the serialized object for this query. Am I missing something or is this a bug?
Edit: This is occurring in Rails 4.0.2.
Second Edit:
Clarification: I understand why this is happening (Rails has custom logic for list queries), and I'm using both a custom validator to manually perform the serialization before validating and a custom serializer to avoid problems with comparisons of YAML strings (as detailed in my other question here).
At this point I'm mostly just wondering why validates_uniqueness_of treats the primary field differently from the scope fields, and am hoping someone can shed some light.
I can't explain why the validations work one way around, but not the other.
But I think basically your problems are due to the fact that serialize only specifies that an attribute is to be serialized using YAML on save and deserialized upon load.
In other words: the only thing you say by doing serialize :array_field, Array is that
when saving a Foo, serialize its array_field attribute using YAML first,
when loading a Foo from the DB, make sure that the value of the array_field attribute is an Array after deserialization, otherwise raise an exception.
It does not affect how queries are constructed. Instead, Rails' usual rules for queries are used, so an array is converted into a comma-separated list of values. That makes sense, for example, when constructing an IN query, and it is exactly why your query fails: the DB field is a string, but you're trying to compare it to a list.
I haven't used native PostgreSQL array columns with Rails 4, but my guess is that these issues would be solved if you used those instead of a serialization-type solution. You also get the added benefit of being able to search within the contents of arrays at the DB level.
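A minimal sketch of that idea with illustrative names (Rails 4's PostgreSQL adapter supports array: true on columns; whether the uniqueness validation then behaves is the guess stated above):
# in a migration
add_column :foos, :array_field, :text, array: true, default: []

# in the model: no serialize call needed, the adapter casts text[] to a Ruby Array
class Foo < ActiveRecord::Base
  validates_uniqueness_of :array_field, scope: [:another_field]
end

# DB-level search inside the array contents
Foo.where("'some_value' = ANY (array_field)")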
I have the following JSON.
Suppose my selection is mobile; then these fields will be generated:
{"Style":"convertible","Year":"2010","Color":"green"}
{"Style":"convertible","Year":"2010","Color":"red"}
And if my selection is bike, then these fields will be generated:
{"model":"2012","mileage":"20kmph","Color":"red"}
How do I achieve the above result?
Edit-1
I have a form in which some of the fields will be auto-generated based on category selection. I have converted the auto-generated fields to JSON and stored them in the database as a single column.
I don't know how to explain it any better; check out my screenshots for a better understanding.
I'm assuming (for some crazy reason) that you will be using Ruby to do this.
But first, your expected output is wrong because you can't have a hash with duplicate keys:
{"Color": "green", "Color": "red"}
...is impossible. Same goes for the "Year" keys. Think of keys within a hash as Highlanders. THERE CAN ONLY BE ONE (of the same name). Therefore, your actual expected output would be:
{"Style":"convertible", "Year":"2012", "Color":"red", "name":"test"}
Or whatever. Anyway...
Step 1: Convert JSON to a Ruby Hash
require 'json'

converted = JSON.parse '[{"Style":"convertible","Year":"2010","Color":"green"},
                         {"Style":"convertible","Year":"2010","Color":"red"},
                         {"name":"test","Year":"2012","Color":"red"}]'
Step 2: Merge them
merged = {}
converted.each { |c| merged.merge! c }
Now the merged variable should look like the above actual expected output.
The only problem left is deciding which duplicate keys override which other duplicate keys. What matters here is the order in which you merge the hashes: the ones merged last override any existing duplicate keys/values. Hope that helps.
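A quick illustration of that ordering rule:
require 'json'

a = JSON.parse '{"Color":"green"}'
b = JSON.parse '{"Color":"red"}'
a.merge(b) # => {"Color"=>"red"}   (the hash merged last wins)
b.merge(a) # => {"Color"=>"green"}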