Rails - given an array of Users - how to get an output of just emails? - ruby-on-rails

I have the following:
@users = User.all
User has several fields including email.
What I would like to be able to do is get a list of all the @users' emails.
I tried:
@users.email.all but that errors with undefined method
Ideas? Thanks

(by popular demand, posting as a real answer)
What I don't like about fl00r's solution is that it instantiates a new User object per record in the DB, which just doesn't scale. It's great for a table with just 10 emails in it, but once you start getting into the thousands you're going to run into problems, mostly with the memory consumption of Ruby.
One can get around this little problem by using connection.select_values on a model, and a little bit of ARel goodness:
User.connection.select_values(User.select("email").to_sql)
This will give you the email addresses as plain strings straight from the database. There's no faffing about with user objects, and it will scale better than a straight User.select("email") query, though I wouldn't call it the best-scaling option. There are probably better ways to do this that I am not aware of yet.
The point is: a String object will use way less memory than a User object, so you can have more of them. It's also a quicker query and doesn't go the long way about it (running the query, then mapping the values). Oh, and map would take longer too.
If you're using Rails 2.3...
Then you'll have to construct the SQL manually, I'm sorry to say.
User.connection.select_values("SELECT email FROM users")
Just provides another example of the helpers that Rails 3 provides.

I still find the connection.select_values to be a valid way to go about this, but I recently found a default AR method that's built into Rails that will do this for you: pluck.
In your example, all that you would need to do is run:
User.pluck(:email)
The select_values approach can be faster on extremely large datasets, but that's because it doesn't typecast the returned values. E.g., boolean values will be returned how they are stored in the database (as 1's and 0's) and not as true | false.
The pluck method works with ARel, so you can daisy chain things:
User.order('created_at desc').limit(5).pluck(:email)

User.select(:email).map(&:email)

Just use:
User.select("email")

While I visit SO frequently, I only registered today. Unfortunately that means that I don't have enough of a reputation to leave comments on other people's answers.
Piggybacking on Ryan's answer above, you can extend ActiveRecord::Base to create a method that will allow you to use this throughout your code in a cleaner way.
Create a file in config/initializers (e.g., config/initializers/active_record.rb):
class ActiveRecord::Base
  # Return the first selected column of the current scope as a plain array of values.
  def self.selected_to_array
    connection.select_values(self.scoped)
  end
end
You can then chain this method at the end of your ARel declarations:
User.select('email').selected_to_array
User.select('email').where('id > ?', 5).limit(4).selected_to_array

Use this to get an array of all the e-mails:
@users.collect { |user| user.email }
# => ["test@example.com", "test2@example.com", ...]
Or a shorthand version:
@users.collect(&:email)
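If you want to try both forms outside a Rails app, here is a minimal plain-Ruby sketch. The User struct and the sample addresses are stand-ins made up for illustration; in a real app the array would hold the records loaded by User.all.

```ruby
# Stand-in for the ActiveRecord model (hypothetical; only the email reader matters here).
User = Struct.new(:name, :email)

users = [
  User.new("Ann", "ann@example.com"),
  User.new("Bob", "bob@example.com")
]

# Block form: call #email on each element and collect the results.
emails_from_block = users.collect { |user| user.email }

# Shorthand: &:email turns the symbol into a block that calls #email.
emails_from_symbol = users.collect(&:email)

p emails_from_block
p emails_from_symbol
```

Both calls build the same array. Note that either way every element of the array has to exist as a full object first, which is what the other answers are worried about for large tables.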

You should avoid using User.all.map(&:email) as it will create a lot of ActiveRecord objects which consume large amounts of memory, a good chunk of which will not be collected by Ruby's garbage collector. It's also CPU intensive.
If you only want to collect a few attributes from your database without paying for it in memory usage and CPU cycles, consider using Valium.
https://github.com/ernie/valium
Here's an example for getting all the emails from all the users in your database.
User.all[:email]
Or only for users that subscribed or whatever.
User.where(:subscribed => true)[:email].each do |email|
  puts "Do something with #{email}"
end
Using User.all.map(&:email) is considered bad practice for the reasons mentioned above.

Related

Independent ActiveRecord query inside ActiveRecord::Relation context

There is some ruby on rails code
class User < ActiveRecord::Base
  def self.all_users_count
    User.all
  end
end
User.all_users_count
returns, for example, 100
User.limit(5).all_users_count
Now it returns 5 because of the ActiveRecord::Relation context, even though I wrote User.all with the class name instead of a plain all
(.to_sql shows that the query always contains the limit, the where id condition, or whatever else was chained in other cases)
So, how can I make context-independent AR queries inside model methods, such as User.all and others?
Thank you!
P.S. Or maybe my code has an error or something like that, and User.all inside any method and context should in fact always return the correct row count for this model's table
This is very weird and unexpected (unfortunately I can't confirm that, because my computer crashed, and have no rails projects at hand).
I would expect
User.all
to create a new scope (or as you call it - context)
Try working around this with
User.unscoped.all
Edit:
I tried it out on my project and on a clean Rails repo, and the results are consistent.
And after thinking a bit - this is maybe not even an issue - I think your approach could be faulty.
In what scenario would you chain User.limit(2).all_users_count? I can't think of any. Either you need the count of all users, and you call User.all_users_count (or just User.count)
... or you need something else and you call User.limit(2).where(...) - there's no point in calling all_users_count in that chain, is there?
And, when you think of it, it makes sense. Imagine you had some different method like count_retired, what would you expect from such call:
User.limit(2).count_retired ?
The number of retired users not bigger than 2, or the number of all retired users in the system? I would expect the former.
So I think there are two possibilities here:
either you implemented it wrong and should do it in a different way (as described above in the edit section)
or you have some more complex issue, but you boiled your examples down to a point where they don't make much sense anymore (please follow up with another question if you please, and please, ping me in the comment with a link if you do, because it sounds interesting)

How to add attribute/property to each record/object in an array? Rails

I'm not sure if this is just something Rails lacks, or if I am searching for all the wrong things here on Stack Overflow, but I cannot find out how to add an attribute to each record in an array.
Here is an example of what I'm trying to do:
@news_stories.each do |individual_news_story|
  @user_for_record = User.where(:id => individual_news_story[:user_id]).pluck('name', 'profile_image_url');
  individual_news_story.attributes(:author_name) = @user_for_record[0][0]
  individual_news_story.attributes(:author_avatar) = @user_for_record[0][1]
end
Any ideas?
If the NewsStory model (or whatever its name is) has a belongs_to relationship to User, then you don't have to do any of this. You can access the attributes of the associated User directly:
@news_stories.each do |news_story|
  news_story.user.name # gives you the name of the associated user
  news_story.user.profile_image_url # same for the avatar
end
To avoid an N+1 query, you can preload the associated user record for every news story at once by using includes in the NewsStory query:
NewsStory.includes(:user)... # rest of the query
If you do this, you won't need the @user_for_record query. Rails will do the heavy lifting for you, and you could even see a performance improvement, thanks to not issuing a separate pluck query for every single news story in the collection.
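To make the N+1 point concrete, here is a rough plain-Ruby sketch that counts lookups. Nothing in it is Rails API: FakeUserTable, find, where_ids and the two helper methods are hypothetical stand-ins, and the "queries" are just method calls on an in-memory array. It only illustrates the idea behind preloading, which is to fetch all the needed users once and index them by id.

```ruby
# Plain-Ruby stand-ins (hypothetical): no database is involved.
User = Struct.new(:id, :name)
NewsStory = Struct.new(:title, :user_id)

# Pretend data store that counts how many "queries" it receives.
class FakeUserTable
  attr_reader :query_count

  def initialize(users)
    @users = users
    @query_count = 0
  end

  # One lookup per call, like a separate query per news story.
  def find(id)
    @query_count += 1
    @users.find { |user| user.id == id }
  end

  # One lookup for a whole batch of ids, like a single preloading query.
  def where_ids(ids)
    @query_count += 1
    @users.select { |user| ids.include?(user.id) }
  end
end

# N+1 style: one lookup for every story.
def author_names_one_by_one(stories, table)
  stories.map { |story| table.find(story.user_id).name }
end

# Preloading style: fetch all needed users once, then index them by id.
def author_names_preloaded(stories, table)
  ids = stories.map(&:user_id).uniq
  users_by_id = table.where_ids(ids).to_h { |user| [user.id, user] }
  stories.map { |story| users_by_id.fetch(story.user_id).name }
end

users = [User.new(1, "Ann"), User.new(2, "Bob")]
stories = [NewsStory.new("A", 1), NewsStory.new("B", 2), NewsStory.new("C", 1)]

slow = FakeUserTable.new(users)
author_names_one_by_one(stories, slow)
puts "one by one: #{slow.query_count} lookups"

fast = FakeUserTable.new(users)
author_names_preloaded(stories, fast)
puts "preloaded:  #{fast.query_count} lookup"
```

The first count grows with the number of stories, while the second stays at one no matter how many stories there are.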
If you need to have those extra attributes there regardless:
You can select them as extra attributes in your NewsStory query:
NewsStory.
  includes(:user).
  joins(:user).
  select([
    NewsStory.arel_table[Arel.star],
    User.arel_table[:name].as("author_name"),
    User.arel_table[:profile_image_url].as("author_avatar"),
  ]).
  where(...) # rest of the query
It looks like you're trying to cache the name and avatar of the user on the NewsStory model, in which case, what you want is this:
@news_stories.each do |individual_news_story|
  user_for_record = User.find(individual_news_story.user_id)
  individual_news_story.author_name = user_for_record.name
  individual_news_story.author_avatar = user_for_record.profile_image_url
end
A couple of notes.
I've used find instead of where. find returns a single record identified by its primary key (id); where returns a collection of records. There are definitely more efficient ways to do this (eager loading, for one), but since you're just starting out, I think it's more important to learn the basics before you dig into the advanced stuff to make things more performant.
I've gotten rid of the pluck call because, here again, you're just learning. pluck is a performance optimization that is useful when you're working with large amounts of data, and if that's what you're doing, then ActiveRecord has a batch API you should look into.
I've changed @user_for_record to user_for_record. The @ denotes an instance variable in Ruby. Instance variables are shared and accessible from any instance method in an instance of a class. In this case, all you need is a local variable.
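If the difference between the two kinds of variable is new to you, this small plain-Ruby sketch may help. The StoryDecorator class and its method names are made up for the example and have nothing to do with your models.

```ruby
# Hypothetical class used only to contrast instance and local variables.
class StoryDecorator
  def initialize
    @shared = "set in initialize" # instance variable: lives on the object
  end

  def assign
    scratch = "only inside assign" # local variable: gone when the method returns
    @shared = "updated in assign"
    scratch
  end

  # Any instance method can read the instance variable.
  def shared
    @shared
  end

  # Locals defined in other methods are not in scope here.
  def scratch_visible?
    binding.local_variable_defined?(:scratch)
  end
end

decorator = StoryDecorator.new
decorator.assign
puts decorator.shared
puts decorator.scratch_visible?
```

The instance variable keeps the value written by assign, while the local variable from assign cannot be seen from any other method. Inside a simple loop like the one above, a local is enough.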

Mongoid identity_map and memory usage, memory leaks

When I execute this query:
Mymodel.all.each do |model|
  # ..do something
end
It uses a lot of memory, the amount of memory used keeps increasing, and in the end it crashes. I found out that to fix it I need to disable identity_map, but when I add identity_map_enabled: false to my mongoid.yml file, I get this error:
Invalid configuration option: identity_map_enabled.
Summary:
A invalid configuration option was provided in your mongoid.yml, or a typo is potentially present. The valid configuration options are: :include_root_in_json, :include_type_for_serialization, :preload_models, :raise_not_found_error, :scope_overwrite_exception, :duplicate_fields_exception, :use_activesupport_time_zone, :use_utc.
Resolution:
Remove the invalid option or fix the typo. If you were expecting the option to be there, please consult the following page with repect to Mongoid's configuration:
I am using Rails 4 and Mongoid 4, Mymodel.all.count => 3202400
How can I fix it, or does someone know another way to reduce the amount of memory used while running a .all.each query?
Thank you very much for the help!!!!
I started with something just like yours, looping through millions of records while the memory just kept increasing.
Original code:
@portal.listings.each do |listing|
  listing.do_something
end
I've gone through many forum answers and tried them out.
1st attempt: I tried a combination of WeakRef and GC.start, but no luck.
2nd attempt: I added listing = nil to the first attempt, and it still failed.
Successful attempt:
@start_date = 10.years.ago
@end_date = 1.day.ago
while @start_date < @end_date
  @portal.listings.where(created_at: @start_date..@start_date.next_month).each do |listing|
    listing.do_something
  end
  @start_date = @start_date.next_month
end
Conclusion
The memory allocated for the records is never released while a query is still being iterated. Therefore, working through a small number of records per query does the job, and memory stays in good condition because it is released after each query finishes.
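The windowing logic in the successful attempt can be pulled out and tried on its own. This is only a sketch in plain Ruby using the standard date library; each_month_window is a name I made up, and it uses half-open windows (the upper bound is excluded) so that a record stamped exactly on a boundary would be handled once rather than twice.

```ruby
require "date"

# Yield consecutive [window_start, window_end) pairs, one month at a time.
# The last window is cut short so it never runs past end_date.
def each_month_window(start_date, end_date)
  return enum_for(:each_month_window, start_date, end_date) unless block_given?

  cursor = start_date
  while cursor < end_date
    upper = [cursor.next_month, end_date].min
    yield cursor, upper
    cursor = upper
  end
end

each_month_window(Date.new(2020, 1, 1), Date.new(2020, 3, 15)) do |from, to|
  # This is where the per-window query would go, with a created_at
  # condition covering from...to (exclusive upper bound).
  puts "#{from} up to #{to}"
end
```

Each pass through the block only has to hold one month of records, which is the whole point of the approach.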
Your problem isn't the identity map. I don't think Mongoid 4 even has an identity map built in, hence the configuration error when you try to turn it off. Your problem is that you're using all. When you do this:
Mymodel.all.each
Mongoid will attempt to instantiate every single document in the db.mymodels collection as a Mymodel instance before it starts iterating. You say that you have about 3.2 million documents in the collection; that means that Mongoid will try to create 3.2 million model instances before it tries to iterate. Presumably you don't have enough memory to handle that many objects.
Your Mymodel.all.count works fine because that just sends a simple count call into the database and returns a number; it won't instantiate any models at all.
The solution is to not use all (and preferably forget that it exists). Depending on what "do something" does, you could:
Page through all the models so that you're only working with a reasonable number of them at a time.
Push the logic into the database using mapReduce or the aggregation framework.
Whenever you're working with real data (i.e. something other than a trivially small database), you should push as much work as possible into the database because databases are built to manage and manipulate big piles of data.
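As a rough illustration of the paging option, here is a plain-Ruby sketch. each_in_pages and fetch_page are hypothetical names: fetch_page stands in for whatever skip/limit style query you would run against the collection, and an array plays the part of the database so the example runs on its own.

```ruby
# Walk a large collection one page at a time.
# fetch_page is any callable taking (offset, limit) and returning an array;
# it is a hypothetical stand-in for a skip/limit query.
def each_in_pages(fetch_page, page_size)
  return enum_for(:each_in_pages, fetch_page, page_size) unless block_given?

  offset = 0
  loop do
    page = fetch_page.call(offset, page_size)
    break if page.empty?

    page.each { |record| yield record }
    offset += page.size
  end
end

# Array-backed stand-in for the database.
records = (1..10).to_a
fetch_page = ->(offset, limit) { records[offset, limit] || [] }

each_in_pages(fetch_page, 4) { |record| puts record }
```

Only one page of records needs to be alive at any moment, so the page size puts a ceiling on how many model instances exist at once.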

How much memory is consumed if I create a ruby object?

I want to know how much memory is consumed if I create a ruby object. Does Ruby have any method to tell?
Is there any difference for memory consumption in the following?
users = User.where("created_at > ?", 2.months.ago) # select all fields
users = User.select(:user_name).where("created_at > ?", 2.months.ago) # just select one field
You could use ruby-prof, a wonderful ruby profiler that will tell you everything your code is doing, including memory allocation. The usage is really simple:
require 'ruby-prof'
# Profile the code
result = RubyProf.profile do
  ...
  [code to profile]
  ...
end
# Print a flat profile to text
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT)
It can output results as text, text graph, html graph, call stack and more. In the readme there is also a section about profiling rails applications. The installation is immediate, so give it a try:
gem install ruby-prof
Firstly, there's no easy way to measure how much memory an object consumes. If it's a Rails app, you can use this Unix script to check. There's also a great blog post that may help you with this issue.
As for your second question: the second query is probably going to consume less memory, since ActiveRecord isn't processing all the fields to build an AR object. Ultimately, it's better to use .pluck for your second query.
users = User.where("created_at > ?", 2.months.ago).pluck(:user_name)
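For a rough per-object figure, Ruby's standard library ships the objspace extension, which adds ObjectSpace.memsize_of. The sketch below uses a made-up UserRecord struct in place of a real model. Treat the numbers as hints only: they depend on the Ruby version, and memsize_of counts the object itself, not the objects it references, so a real ActiveRecord instance will weigh a good deal more than a single call suggests.

```ruby
require "objspace"

# Hypothetical stand-in for a model with several columns.
UserRecord = Struct.new(:id, :user_name, :email, :created_at, :updated_at)

record = UserRecord.new(1, "ann", "ann@example.com", Time.now, Time.now)
name   = "ann"

# memsize_of reports the bytes held by the object itself, not by the
# objects it references, so these are rough lower bounds.
puts "record: #{ObjectSpace.memsize_of(record)} bytes"
puts "string: #{ObjectSpace.memsize_of(name)} bytes"
```

For whole-program numbers, a profiler such as ruby-prof is still the better tool.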

Concurrency and Mongoid

I'm currently trying my hand at developing a simple web-based game using Rails and Mongoid. I've run into some concurrency issues that I'm not sure how to solve.
The issue is that I'm not sure how to atomically perform a check and take an action based on it in Mongoid.
Here is a sample of the relevant parts of the controller code to give you an idea of what I'm trying to do:
battle = current_user.battle
battle.submitted = true
battle.save
if Battle.where(opponent: current_user._id, submitted: true, resolving: false).any?
battle.update_attribute(:resolving, true)
#Resolve turn
A battle is between two users, but I only want one of the threads to run the '#Resolve turn' code. Unless I'm completely off, both threads could check the condition one after another before either of them sets resolving to true, so both end up running the '#Resolve turn' code.
I would much appreciate any ideas on how to solve this issue.
I am however getting an increasing feeling that doing user synchronization in this way is fairly impractical and that there's a better way altogether. So suggestions for other techniques that could accomplish the same thing would be greatly appreciated!
Sounds like you want MongoDB's findAndModify command, which allows you to atomically retrieve and update a document.
Unfortunately mongoid doesn't appear to expose this part of the mongo api, so it looks like you'll have to drop down to the driver level for this one bit:
battle = Battle.collection.find_and_modify(query: {opponent: current_user._id, ...},
                                           update: {'$set' => {resolving: true}})
By default, the returned object does not include the modification made, but you can turn this on if you want (pass {:new => true}).
The value returned is a raw hash. If my memory is correct, you can do Battle.instantiate(doc) to get a Battle object back.
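To see why the check and the update have to happen as one step, here is a small in-memory sketch in plain Ruby. It is not Mongoid or driver code: BattleRecord and claim_resolution are hypothetical, and a Mutex plays the role that the database plays in findAndModify, making "is resolving still false?" and "set it to true" indivisible.

```ruby
# In-memory stand-in for a battle document (hypothetical; no MongoDB involved).
class BattleRecord
  def initialize
    @resolving = false
    @lock = Mutex.new
  end

  # Atomically flip resolving from false to true.
  # Returns true only for the single caller that made the change.
  def claim_resolution
    @lock.synchronize do
      if @resolving
        false
      else
        @resolving = true
        true
      end
    end
  end
end

battle = BattleRecord.new

# Several "requests" race to resolve the turn; only one should win.
results = 8.times.map { Thread.new { battle.claim_resolution } }.map(&:value)

puts "winners: #{results.count(true)}"
```

Whichever request gets true back is the one that should run the turn resolution; every other request sees false and does nothing.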

Resources