Searching serialized data, using active record - ruby-on-rails

I'm trying to run a simple query against a serialized column. How do you do this?
serialize :mycode, Array
1.9.3p125 :026 > MyModel.find(104).mycode
MyModel Load (0.6ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`id` = 104 LIMIT 1
=> [43565, 43402]
1.9.3p125 :027 > MyModel.find_all_by_mycode("[43402]")
MyModel Load (0.7ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = '[43402]'
=> []
1.9.3p125 :028 > MyModel.find_all_by_mycode(43402)
MyModel Load (1.2ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` = 43402
=> []
1.9.3p125 :029 > MyModel.find_all_by_mycode([43565, 43402])
MyModel Load (1.1ms) SELECT `mymodels`.* FROM `mymodels` WHERE `mymodels`.`mycode` IN (43565, 43402)
=> []

One trick that avoids slowing down your application is to compare the column against the serialized value, which means using .to_yaml.
This only finds exact matches:
MyModel.where("mycode = ?", [43565, 43402].to_yaml)
#=> [#<MyModel id:...]
Tested only for MySQL.
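To make the exact-match behaviour concrete, here is a minimal sketch in plain Ruby (no database involved) of the text the default YAML coder would produce for the array from the question. The helper name `stored_text` is made up for illustration.

```ruby
require "yaml"

# Text that the default YAML coder would store for a given array.
# (stored_text is a hypothetical helper, not part of Rails.)
def stored_text(array)
  array.to_yaml
end

# Prints "---\n- 43565\n- 43402\n"
puts stored_text([43565, 43402]).inspect

# The same elements in a different order serialize to different text,
# so an equality match on the column would miss that row.
puts stored_text([43565, 43402]) == stored_text([43402, 43565])  # prints false
```

Because the comparison is against the whole string, the array passed to the query has to contain the same elements in the same order as the stored one.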

Basically, you can't. The downside of #serialize is that you're bypassing your database's native abstractions. You're pretty much limited to loading and saving the data.
That said, one very good way to slow your application to a crawl could be:
MyModel.all.select { |m| m.mycode.include? 43402 }
Moral of the story: don't use #serialize for any data you need to query on.

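With the default YAML coder, a serialized array is stored in the database as text with one element per line, e.g.
[1, 2, 3, 4]
is stored as
---\n- 1\n- 2\n- 3\n- 4\n
hence the query for a single element would be
MyModel.where("mycode like ?", "% 2\n%")
Note the space between % and 2: without it, values that merely end in 2, such as 12, would also match.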
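The effect of that leading space can be checked in plain Ruby. In this sketch a substring test stands in for SQL LIKE with surrounding % wildcards. The constant and helper names are illustrative assumptions, and the column is assumed to use the default YAML coder.

```ruby
require "yaml"

# Substring check standing in for SQL `LIKE '%...%'` (illustration only).
def like_match?(stored, fragment)
  stored.include?(fragment)
end

WITH_TWO    = [1, 2, 3, 4].to_yaml  # "---\n- 1\n- 2\n- 3\n- 4\n"
WITHOUT_TWO = [12, 3].to_yaml       # "---\n- 12\n- 3\n"

puts like_match?(WITH_TWO, " 2\n")     # true: the element 2 is present
puts like_match?(WITHOUT_TWO, " 2\n")  # false: in "12" the 2 has no space before it
puts like_match?(WITHOUT_TWO, "2\n")   # true: a false positive once the space is dropped
```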

Noodl's answer is right in general, but it is not the whole story.
It really depends on the database and ORM adapter you are using. For instance, PostgreSQL can now store and search hashes and JSON (check out hstore), and I remember reading that the ActiveRecord adapter for PostgreSQL now handles it properly. And if you are using Mongoid or something like it, then you are using unstructured data (i.e. JSON) at the database level everywhere.
However, if you are using a database that can't really handle hashes, like the MySQL / ActiveRecord combination, then the only reason to use a serialized field is for data that you write in some background process and display on demand. The only two uses I have found in my experience are reports (like a stat field on a Product model, where I need to store some averages and medians for a product) and user options (like a preferred template color, which I really don't need to query on). User information such as a mailing-list subscription, by contrast, needs to be searchable for email blasts.
PostgreSQL hstore ActiveRecord Example:
MyModel.where("mycode @> 'KEY=>\"#{VALUE}\"'")
UPDATE
As of 2017 both MariaDB and MySQL support JSON field types.

You can query the serialized column with a SQL LIKE statement.
MyModel.where("mycode LIKE ?", "%43402%")
This is quicker than using include?, but you cannot use an array as the parameter, and it also matches any longer number that contains 43402.

Good news! If you're using PostgreSQL with hstore (which is super easy with Rails 4), you can now totally search serialized data. This is a handy guide, and here's the syntax documentation from PG.
In my case I have a dictionary stored as a hash in an hstore column called amenities. To check for a couple of queried amenities that have a value of 1 in the hash, I just do
House.where("amenities @> 'wifi => 1' AND amenities @> 'pool => 1'")
Hooray for improvements!

There's a blog post from 2009 from FriendFeed that describes how to use serialized data within MySQL.
What you can do is create tables that function as indexes for any data that you want to search.
Create a model that contains the searchable values/fields
In your example, the models would look something like this:
class MyModel < ApplicationRecord
  # id, name, other fields...
  serialize :mycode, Array
end
class Item < ApplicationRecord
  # id, value...
  belongs_to :my_model
end
Creating an "index" table for searchable fields
When you save MyModel, you can do something like this to create the index:
Item.where(my_model: self).destroy_all
mycode.each do |mycode_item|
  Item.create(my_model: self, value: mycode_item)
end
Querying and Searching
Then when you want to query and search just do:
Item.where(value: [43565, 43402]).all.map(&:my_model)
Item.where(value: 43402).all.map(&:my_model)
You can add a method to MyModel to make that simpler:
def self.find_by_mycode(value_or_values)
  Item.where(value: value_or_values).all.map(&:my_model)
end
MyModel.find_by_mycode([43565, 43402])
MyModel.find_by_mycode(43402)
To speed things up, you will want to create a SQL index for that table.
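The idea behind the index table can be sketched without a database. The class below is a hypothetical in-memory stand-in for the items table (the name `CodeIndex` and its methods are not from the blog post or from Rails): it maps each value to the ids of the records that contain it, and rebuilds a record's entries whenever that record is saved.

```ruby
# In-memory stand-in for the `items` table: value => ids of owning records.
class CodeIndex
  def initialize
    @ids_by_value = Hash.new { |hash, key| hash[key] = [] }
  end

  # Mirrors the save hook: drop the record's old entries, then re-add them.
  def reindex(record_id, codes)
    @ids_by_value.each_value { |ids| ids.delete(record_id) }
    codes.each { |code| @ids_by_value[code] << record_id }
  end

  # Mirrors Item.where(value: ...): accepts one value or a list of values.
  def lookup(value_or_values)
    Array(value_or_values).flat_map { |v| @ids_by_value.fetch(v, []) }.uniq
  end
end

index = CodeIndex.new
index.reindex(104, [43565, 43402])
index.reindex(105, [43402])
p index.lookup(43402)           # [104, 105]
p index.lookup([43565, 99999])  # [104]
```

In the real table, the lookup is what a SQL index on the value column would speed up.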

Using the following comments in this post
https://stackoverflow.com/a/14555151/936494
https://stackoverflow.com/a/15287674/936494
I was successfully able to query a serialized Hash in my model
class Model < ApplicationRecord
serialize :column_name, Hash
end
When column_name holds a Hash like
{ my_data: [ { data_type: 'MyType', data_id: 113 } ] }
we can query it in following manner
Model.where("column_name = ?", hash.to_yaml)
That generates a SQL query like
Model Load (0.3ms) SELECT "models".* FROM "models" WHERE (column_name = '---
:my_data:
- :data_type: MyType
  :data_id: 113
')
If anybody is interested in executing the generated query in a SQL terminal, it should work, but care should be taken that the value is in the exact format stored in the DB. Another easy way, which I found in the question "PostgreSQL newline character", is to use an escape string constant containing \n for the newline characters:
select * from table_name where column_name = E'---\n:my_data:\n- :data_type: MyType\n  :data_id: 113\n'
The most important part of the query above is the E prefix, which makes PostgreSQL interpret \n as a newline.
Note: The database on which I executed above is PostgreSQL.
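One caveat with matching on hash.to_yaml is that the YAML text follows the hash's key insertion order. A small plain-Ruby sketch (the constant names are illustrative) shows two hashes that Ruby treats as equal but that serialize differently, so only one of them would match a given stored row.

```ruby
require "yaml"

# Two hashes with equal contents but different key insertion order.
FIRST  = { data_type: "MyType", data_id: 113 }
SECOND = { data_id: 113, data_type: "MyType" }

puts FIRST == SECOND                  # true: Ruby compares contents only
puts FIRST.to_yaml == SECOND.to_yaml  # false: YAML keeps insertion order
```

The hash used in the query therefore needs to be built with its keys in the same order as when the record was saved.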

To search a serialized list, you need to prefix and postfix each value with unique characters.
Example:
Rather than storing something like
2345,12345,1234567
which would cause issues if you tried to search for 2345 (it also occurs inside 12345 and 1234567), you store something like <2345>,<12345>,<1234567> and search for <2345> instead (the search query gets transformed). Of course, the choice of prefix/postfix characters depends on the valid data that will be stored. You might instead use something like ||| if you expect < and potentially | to appear in the data. Of course, that increases the space the field uses and could cause performance issues.
Using a trigram index or something similar would avoid potential performance issues.
You can serialize it like data.map { |d| "<#{d}>" }.join(',') and deserialize it via data.gsub('<', '').gsub('>', '').split(','). A serializer class would do the job quite well to load/extract the data.
The way you do this is by setting the database field to text and using Rails' serialize model method with a custom lib class. The lib class needs to implement two methods:
def self.dump(obj) # (returns string to be saved to database)
def self.load(text) # (returns object)
Example with duration. Extracted from the article so link rot wouldn't get it, please visit the article for more information. The example uses a single value, but it's fairly straightforward to serialize a list of values and deserialize the list using the methods mentioned above.
class Duration
  # Used for `serialize` method in ActiveRecord
  class << self
    def load(duration)
      self.new(duration || 0)
    end
    def dump(obj)
      unless obj.is_a?(self)
        raise ::ActiveRecord::SerializationTypeMismatch,
          "Attribute was supposed to be a #{self}, but was a #{obj.class}. -- #{obj.inspect}"
      end
      obj.length
    end
  end
  attr_accessor :minutes, :seconds
  def initialize(duration)
    @minutes = duration / 60
    @seconds = duration % 60
  end
  def length
    (minutes.to_i * 60) + seconds.to_i
  end
end
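Applying the same load/dump interface to the delimited list described earlier in this answer might look like the sketch below. The class name `DelimitedList` and the `search_fragment` helper are hypothetical. How the class is passed to serialize depends on the Rails version, so that part is left out.

```ruby
# Hypothetical coder for a delimited list stored in a text column.
class DelimitedList
  # Array -> "<12345>,<1234567>" for storage.
  def self.dump(values)
    Array(values).map { |v| "<#{v}>" }.join(",")
  end

  # Stored text -> array of strings; nil or empty text becomes [].
  # Note: values come back as strings, not integers.
  def self.load(text)
    text.to_s.scan(/<([^<>]*)>/).flatten
  end

  # Fragment to search for, e.g. inside a LIKE pattern.
  def self.search_fragment(value)
    "<#{value}>"
  end
end

stored = DelimitedList.dump([12345, 1234567])
puts stored                                                 # <12345>,<1234567>
puts stored.include?("2345")                                # true: naive search gives a false match
puts stored.include?(DelimitedList.search_fragment(2345))   # false: delimited search does not
p DelimitedList.load(stored)                                # ["12345", "1234567"]
```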

If you have a serialized JSON column and want to run a LIKE query on it, do it like this:
YourModel.where("hashcolumn like ?", "%#{search}%")
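If the search term comes from user input, any % or _ in it acts as a LIKE wildcard. A minimal sketch of escaping those characters first is shown below. The helper names are made up, and backslash is assumed to be the LIKE escape character, which is the default in MySQL and PostgreSQL but not everywhere. Recent Rails versions also provide sanitize_sql_like for this purpose.

```ruby
# Hypothetical helper: escape LIKE wildcards, assuming "\" is the escape character.
def escape_like(term)
  term.to_s.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

# Build a "contains" pattern to pass as the bind parameter.
def contains_pattern(term)
  "%#{escape_like(term)}%"
end

puts contains_pattern("50%_off")  # %50\%\_off%
```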

Related

Rails ActiveRecord 4: correct way to write a greater than

I know that
Object.where('key > ?', value)
works.
But if the query happens to have several tables involved, with multiple key columns, it might break as the query produced is:
SELECT "tablename".* FROM "tablename" WHERE "tablename"."user_id" = $1 AND (key > 0) [["user_id", 29]]
A solution would be
Object.where('tablename.key > ?', value)
But ain't there an arel way to write this instead? My app has (enforced) weird table names, I'd rather not write them there and that they get added dynamically by active record.
Thanks
I'd personally still try to stay with AR on that one, and do something with a range and a hash query:
Object.where(tablename: { key: value..Float::INFINITY}) # If value is a number
Object.where(tablename: { key: value..DateTime::Infinity.new}) # If value is a DateTime
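One thing to keep in mind with the range form: a Ruby range includes its lower bound, so as far as I know ActiveRecord translates it into a >= comparison rather than a strict >. The plain-Ruby sketch below only demonstrates the range semantics; the helper name `at_least` is illustrative.

```ruby
# The range used in the hash condition, for a numeric lower bound.
def at_least(value)
  value..Float::INFINITY
end

range = at_least(10)
puts range.cover?(10)  # true: the lower bound itself is included
puts range.cover?(11)  # true
puts range.cover?(9)   # false
```

For integer columns, a strict greater-than could be approximated by starting the range at value + 1.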
It's a bit verbose, but you can use arel to do this. For example
Object.where(Object.arel_table[:key].gt(123))
will select objects where key > 123.
If I was doing this, I would probably define some helper methods, perhaps something along the lines of
class Foo < ActiveRecord::Base
  def self.column(name)
    Foo.arel_table[name]
  end
  #now you can do
  def self.some_method
    Foo.where(column(:key).gt(123))
  end
end
If you're querying from one object (one table) you can drop the table name in the where clause.
Object.where('key > ?', value)
Unfortunately that's the best way there is to do it.

ActiveRecord query array intersection?

I'm trying to figure out the count of certain types of articles. I have a very inefficient query:
Article.where(status: 'Finished').select { |x| (x.tags & Article::EXPERT_TAGS).any? }.size
In my quest to be a better programmer, I'm wondering how to make this a faster query. tags is an array of strings in Article, and Article::EXPERT_TAGS is another array of strings. I want to find the intersection of the arrays, and get the resulting record count.
EDIT: Article::EXPERT_TAGS and article.tags are defined as Mongo arrays. These arrays hold strings, and I believe they are serialized strings. For example: Article.first.tags = ["Guest Writer", "News Article", "Press Release"]. Unfortunately this is not set up properly as a separate table of Tags.
2nd EDIT: I'm using MongoDB, so actually it is using a MongoWrapper like MongoMapper or mongoid, not ActiveRecord. This is an error on my part, sorry! Because of this error, it screws up the analysis of this question. Thanks PinnyM for pointing out the error!
Since you are using MongoDB, you could also consider a MongoDB-specific solution (aggregation framework) for the array intersection, so that you could get the database to do all the work before fetching the final result.
See this SO thread How to check if an array field is a part of another array in MongoDB?
Assuming that the entire tags list is stored in a single database field and that you want to keep it that way, I don't see much scope of improvement, since you need to get all the data into Ruby for processing.
However, there is one problem with your database query
Article.where(status: 'Finished')
# This translates into the following query
SELECT * FROM articles WHERE status = 'Finished'
Essentially, you are fetching all the columns whereas you only need the tags column for your process. So, you can use pluck like this:
Article.where(status: 'Finished').pluck(:tags)
# This translates into the following query
SELECT tags FROM articles WHERE status = 'Finished'
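Once only the tags have been fetched, the counting step can be sketched in plain Ruby. The data below is made up for illustration: `EXPERT_TAGS` stands in for Article::EXPERT_TAGS and `plucked` for the result of pluck(:tags).

```ruby
# Hypothetical stand-in for Article::EXPERT_TAGS.
EXPERT_TAGS = ["Guest Writer", "Press Release"].freeze

# tag_lists is what pluck(:tags) would return: one array of tags per article.
def expert_article_count(tag_lists, expert_tags = EXPERT_TAGS)
  tag_lists.count { |tags| (Array(tags) & expert_tags).any? }
end

plucked = [
  ["Guest Writer", "News Article"],
  ["News Article"],
  [],
  ["Press Release"]
]
puts expert_article_count(plucked)  # 2
```

Note the .any? call: an empty intersection is still a truthy array in Ruby, so the intersection alone would not work as a filter.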
I answered a question regarding general intersection like queries in ActiveRecord here.
Extracted below:
The following is a general approach I use for constructing intersection like queries in ActiveRecord:
class Service < ActiveRecord::Base
  belongs_to :person
  def self.with_types(*types)
    where(service_type: types)
  end
end
class City < ActiveRecord::Base
  has_and_belongs_to_many :services
  has_many :people, inverse_of: :city
end
class Person < ActiveRecord::Base
  belongs_to :city, inverse_of: :people
  def self.with_cities(cities)
    where(city_id: cities)
  end
  # intersection like query
  def self.with_all_service_types(*types)
    types.map { |t|
      joins(:services).merge(Service.with_types t).select(:id)
    }.reduce(scoped) { |scope, subquery|
      scope.where(id: subquery)
    }
  end
end
Person.with_all_service_types(1, 2)
Person.with_all_service_types(1, 2).with_cities(City.where(name: 'Gold Coast'))
It will generate SQL of the form:
SELECT "people".*
FROM "people"
WHERE "people"."id" in (SELECT "people"."id" FROM ...)
AND "people"."id" in (SELECT ...)
AND ...
You can create as many subqueries as required with the above approach based on any conditions/joins etc so long as each subquery returns the id of a matching person in its result set.
Each subquery result set will be AND'ed together thus restricting the matching set to the intersection of all of the subqueries.

Postgres ORDER BY values in IN list using Rails Active Record

I receive a list of UserIds (about 1000 at a time) sorted by 'Income'. I have User records in "my system's database" but the 'Income' column is not there. I want to retrieve the Users from "my system's database" in the sorted order as received in the list. I tried doing the following using Active Record, expecting that the records would be retrieved in the same order as in the sorted list, but it does not work.
//PSEUDO CODE
User.all(:conditions => {:id => [SORTED LIST]})
I found an answer to a similar question at the link below, but am not sure how to implement the suggested solution using Active Record.
ORDER BY the IN value list
Is there any other way to do it?
Please guide.
Shardul.
The answer you linked to provides exactly what you need; you just need to code it in Ruby in a flexible manner.
Something like this:
class User
  def self.find_as_sorted(ids)
    values = []
    ids.each_with_index do |id, index|
      values << "(#{id}, #{index + 1})"
    end
    relation = self.joins("JOIN (VALUES #{values.join(",")}) as x (id, ordering) ON #{table_name}.id = x.id")
    relation = relation.order('x.ordering')
    relation
  end
end
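For a list of about 1000 ids, a different and simpler technique is to load the records in any order and put them into the requested order in Ruby, rather than in SQL. The sketch below shows only the reordering step. `UserRow` and `in_given_order` are illustrative names, with the struct standing in for loaded User records.

```ruby
# Stand-in for a loaded User row (only the id matters here).
UserRow = Struct.new(:id, :name)

# Return records in the order given by ids; ids with no matching record are skipped.
def in_given_order(records, ids)
  by_id = records.each_with_object({}) { |record, map| map[record.id] = record }
  ids.map { |id| by_id[id] }.compact
end

unordered = [UserRow.new(1, "a"), UserRow.new(2, "b"), UserRow.new(3, "c")]
p in_given_order(unordered, [3, 1, 2]).map(&:id)  # [3, 1, 2]
```

This returns an array rather than a relation, so it cannot be chained with further scopes the way find_as_sorted can.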
In fact, you could easily put that in a module and mix it into any ActiveRecord classes that need it; since it uses table_name and self, it's not tied to any specific class name.
MySQL users can do this via the FIELD function, but Postgres lacks it. However, this question has workarounds: Simulating MySQL's ORDER BY FIELD() in Postgresql

Texticle and ActsAsTaggableOn

I'm trying to implement search over tags as part of a Texticle search. Since texticle doesn't search over multiple tables from the same model, I ended up creating a new model called PostSearch, following Texticle's suggestion about System-Wide Searching
class PostSearch < ActiveRecord::Base
  # We want to reference various models
  belongs_to :searchable, :polymorphic => true
  # Wish we could eliminate n + 1 query problems,
  # but we can't include polymorphic models when
  # using scopes to search in Rails 3
  # default_scope :include => :searchable
  # Search.new('query') to search for 'query'
  # across searchable models
  def self.new(query)
    debugger
    query = query.to_s
    return [] if query.empty?
    self.search(query).map!(&:searchable)
    #self.search(query) <-- this works, not sure why I shouldn't use it.
  end
  # Search records are never modified
  def readonly?; true; end
  # Our view doesn't have primary keys, so we need
  # to be explicit about how to tell different search
  # results apart; without this, we can't use :include
  # to avoid n + 1 query problems
  def hash
    id.hash
  end
  def eql?(result)
    id == result.id
  end
end
In my Postgres DB I created a view like this:
CREATE VIEW post_searches AS
SELECT posts.id, posts.name, string_agg(tags.name, ', ') AS tags
FROM posts
LEFT JOIN taggings ON taggings.taggable_id = posts.id
LEFT JOIN tags ON taggings.tag_id = tags.id
GROUP BY posts.id;
This allows me to get posts like this:
SELECT * FROM post_searches
id | name  | tags
1  | Intro | introduction, funny, nice
So it seems like that should all be fine. Unfortunately calling
PostSearch.new("funny") returns [nil] (NOT []). Looking through the Texticle source code, it seems like this line in the PostSearch.new
self.search(query).map!(&:searchable)
maps the fields using some sort of searchable_columns method and does it ?incorrectly? and results in a nil.
On a different note, the tags field doesn't get searched in the texticle SQL query unless I cast it from a text type to a varchar type.
So, in summary:
Why does the object get mapped to nil when it is found?
AND
Why does texticle ignore my tags field unless it is varchar?
Texticle maps objects to nil instead of nothing so that you can check for nil? - it's a safeguard against erroring out checking against non-existent items. It might be worth asking tenderlove himself as to exactly why he did it that way.
I'm not completely positive as to why Texticle ignores non-varchars, but it looks like it's a performance safeguard so that Postgres does not do full table scans (under the section Creating Indexes for Super Speed):
You will need to add an index for every text/string column you query against, or else Postgresql will revert to a full table scan instead of using the indexes.

Rails 3 - Expression-based Attribute in Model

How do I define a model attribute as an expression of another attribute?
Example:
class Home < ActiveRecord::Base
  attr_accessible :address, :phone_number
end
Now I want to be able to return an attribute like :area_code, which would be an sql expression like "substr(phone_number, 1,3)".
I also want to be able to use the expression / attribute in a group by query for a report.
This seems to perform the query, but does not return an object with named attributes, so how do I use it in a view?
Rails Console:
@ac = Home.group("substr(phone_number, 1,3)").count
=> #<OrderedHash {"307"=>3, "515"=>1}>
I also expected this to work, but not sure what kind of object it is returning:
@test = Home.select("substr(phone_number, 1,3) as area_code, count(*) as c").group("substr(phone_number, 1,3)")
=> [#<Home>, #<Home>]
To expand on the last example. Here it is with Active Record logging turned on:
>Home.select("substr(phone_number, 1,3) as area_code, count(*) as c").group("substr(phone_number, 1,3)")
Output:
Home Load (0.3ms) SELECT substr(phone_number, 1,3) as area_code, count(*) as c FROM "homes" GROUP BY substr(phone_number, 1,3)
=> [#<Home>, #<Home>]
So it is executing the query I want, but giving me an unexpected data object. Shouldn't I get something like this?
[ #<area_code: "307", c: 3>, #<area_code: "515", c: 1> ]
You cannot access substr(...) because it is not an attribute of the initialized record object.
See: http://guides.rubyonrails.org/active_record_querying.html "selecting specific fields"
You can work around it this way:
@test = Home.select("substr(phone_number, 1,3) as phone_number").group(:phone_number)
... but some might find it a bit hackish. Moreover, when you use select, the records will be read-only, so be careful.
If you need the count, just add .count at the end of the chain, but you will get a hash as you already had. But isn't that all you need? What is your purpose?
You can also use an area_code column that is filled using callbacks on create and update, so you can index this column; your query will run fast on read, though inserts will be slower.
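The value such a callback would store can be derived in plain Ruby. This is only a sketch: the helper names are hypothetical, the sample phone numbers are made up, and in a model the first helper would presumably be called from a callback that assigns the area_code column.

```ruby
# Ruby equivalent of SQL substr(phone_number, 1, 3).
def area_code_for(phone_number)
  phone_number.to_s[0, 3]
end

# Count numbers per area code, similar to the GROUP BY ... count query.
def area_code_counts(phone_numbers)
  phone_numbers.each_with_object(Hash.new(0)) do |number, counts|
    counts[area_code_for(number)] += 1
  end
end

# Prints a hash with one count per area code.
p area_code_counts(%w[3075550101 3075550102 5155550103])
```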
