Ruby: partially retrieve a large number of records and iterate over them - ruby-on-rails

I'm a newbie in Ruby, but I have a lot of experience in other programming languages. I need to iterate over a large number of records (from a DB or any other persistent storage). The storage engine allows me to retrieve records partially, by ranges. In PHP I usually write a custom iterator that loads a range of records, iterates over them, and when needed loads the next part of the records and forgets the previous part: a trade-off between script memory usage and the number of requests to the storage. Something like this (copied from comments here):
class Database_Result_Iterator implements Iterator {
    ...
    private $_db_resource = null;
    private $_loaded = false;
    private $_valid = false;

    // Free the current result set; the next valid() call reloads it.
    function rewind() {
        if ($this->_db_resource) {
            mysql_free_result($this->_db_resource);
            $this->_db_resource = null;
        }
        $this->_loaded = false;
        $this->_valid = false;
    }

    // Load lazily: only query the storage when data is first needed.
    function valid() {
        if (!$this->_loaded) {
            $this->load();
        }
        return $this->_valid;
    }

    private function load() {
        $this->_db_resource = mysql_query(...);
        $this->_loaded = true;
        $this->next(); // Sets _valid
    }
}
How does such an approach translate to Ruby? For example, I have a class Voter and a method get_votes that returns all votes belonging to the current voter object. Is it possible to return not an array with all votes, but a collection of votes that can be iterated over? How should I implement it?
UPDATE
Please don't consider ActiveRecord and RDBMSs as the only possible storage. What about Redis as storage, with commands like LRANGE? I'm interested in a common code pattern for solving this kind of problem in Ruby.
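A common Ruby answer is to wrap the paging logic in a class that includes Enumerable and defines each, so callers get each, map, first, lazy and friends for free while only one batch is held in memory at a time. The sketch below assumes a hypothetical fetch block that takes (offset, limit) and returns an Array; BatchedCollection is an illustrative name, not a library class. For Redis, that block might wrap redis-rb's lrange(key, start, stop); LRANGE's stop index is inclusive, hence the -1 in the commented usage.

```ruby
# Sketch: a lazily paged collection. Only the current batch is referenced;
# the previous one can be garbage-collected once the next is fetched.
class BatchedCollection
  include Enumerable

  # fetch_page: hypothetical callable (offset, limit) -> Array of records.
  def initialize(batch_size: 100, &fetch_page)
    raise ArgumentError, "fetch block required" unless fetch_page
    @batch_size = batch_size
    @fetch_page = fetch_page
  end

  def each
    # Without a block, return an Enumerator (useful for .lazy or .next).
    return enum_for(:each) unless block_given?

    offset = 0
    loop do
      batch = @fetch_page.call(offset, @batch_size) || []
      batch.each { |record| yield record }
      # A short (or empty) page means the storage is exhausted.
      break if batch.size < @batch_size
      offset += @batch_size
    end
    self
  end
end

# Hypothetical usage for the Voter example (redis is an assumed client):
# class Voter
#   def get_votes
#     BatchedCollection.new(batch_size: 500) do |offset, limit|
#       redis.lrange("votes:#{id}", offset, offset + limit - 1)
#     end
#   end
# end
```

Because Enumerable#first and friends stop iterating early, voter.get_votes.first(5) only fetches the first page. If the last page happens to be exactly batch_size long, one extra (empty) request is made; that is the usual cost of this offset-based approach.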

From the guides on Ruby on Rails:
User.all.each do |user|
  NewsLetter.weekly_deliver(user)
end
is very inefficient. You probably want to do most of the filtering in the database, to start with. ActiveRecord offers a method called find_each for this:
User.find_each(:batch_size => 5000) do |user|
  NewsLetter.weekly_deliver(user)
end
The :batch_size option fetches the data in slices instead of loading the entire result set at once, which is extremely helpful in most cases.
But you probably don't want to operate on all records in the first place:
User.with_newsletter.each do |user|
  NewsLetter.weekly_deliver(user)
end
where with_newsletter is a so-called scope.

I really don't see the point of this question.
AR is an API for querying an RDBMS, and that's how you do it in AR.
If you want to use Redis, you'll have to either write it yourself at the driver level or find an abstraction similar to AR for Redis... I think DataMapper had a Redis adapter.
If there is a universal way to do this for any data store, it is likely in DataMapper; but the basic pattern to follow when creating your own is to look at how AR implements find_each/find_in_batches and do the same for your store of choice.
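As a rough illustration of the pattern behind find_in_batches: ActiveRecord orders by primary key and, for each new batch, asks for records whose id is greater than the last one it saw, instead of paging with a growing OFFSET. The sketch below applies that keyset idea to an arbitrary store. It assumes a hypothetical store object with a records_after(last_id, limit) method (here an in-memory stand-in) and positive integer ids; none of these names come from ActiveRecord.

```ruby
Record = Struct.new(:id, :name)

# Hypothetical store: returns up to `limit` records with id > last_id,
# ordered by id. A real store would answer this from an index.
class InMemoryStore
  def initialize(records)
    @records = records.sort_by(&:id)
  end

  def records_after(last_id, limit)
    @records.select { |r| r.id > last_id }.first(limit)
  end
end

# Yields arrays of at most batch_size records, seeking by the last id seen.
def find_in_batches(store, batch_size = 1000)
  last_id = 0 # assumes ids are positive integers
  loop do
    batch = store.records_after(last_id, batch_size)
    break if batch.empty?
    yield batch
    break if batch.size < batch_size
    last_id = batch.last.id
  end
end

# Flattens the batches into record-at-a-time iteration.
def find_each(store, batch_size = 1000)
  find_in_batches(store, batch_size) { |batch| batch.each { |record| yield record } }
end
```

Seeking by id keeps each request cheap on an indexed key, and rows inserted before the current position don't cause the duplicates or skips that offset paging can produce.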

It sounds like you want to use find_each (http://apidock.com/rails/ActiveRecord/Batches/ClassMethods/find_each). This lets you iterate through a large dataset by loading a small batch of records, iterating over them, then loading another batch, and so on.
User.find_each do |user|
  user.do_some_stuff
end
will iterate through all users (in batches of 1,000 by default) without loading a bajillion of them into memory at once.

Related

Rails - Returning all records for which a method returns true

I have a method:
class Role
  def currently_active
    klass = roleable_type.constantize
    actor = Person.find(role_actor_id)
    parent = klass.find(roleable_id)
    return true if parent.current_membership?
    actor.current_membership?
  end
end
I would like to return all instances of Role for which this method returns true; however, I can't iterate through them with all.each, as this takes around 20 seconds. I'm trying to use where statements, but they rely on an attribute of the model rather than a method:
Role.where(currently_active: true)
This obviously throws an error, as there is no attribute called currently_active. How can I perform this query in the most efficient way possible, ideally using Active Record relations rather than arrays?
Thanks in advance
That isn't possible with where; in your case you have to iterate. I think the best solution is to add a Boolean column to your table, so you can filter in the query, which will be much faster.
Having seen your edited method, it seems it's not slow because of the loop; it's slow because of Person.find and klass.find. You are issuing at least two queries per role, which adds up to a lot of database reads. You'd better use associations with some kind of eager loading; it will be much faster.
Another workaround is to use ActiveModelSerializers: in the serializer you can include attributes on the object conditionally, and then apply your logic to skip the objects that have a certain flag or attribute.
See the Active Model Serializers documentation here:
Conditional attributes in Active Model Serializers
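The eager-loading suggestion above can be illustrated without Rails. This sketch counts queries against a hypothetical table object: looking up each role's person individually costs one query per role (the N+1 pattern), while collecting the ids first and fetching them in one WHERE id IN (...) style query, which is roughly what ActiveRecord's preload/includes does, costs one. FakeTable, find_many and the two helper methods are illustrative names, not Rails APIs.

```ruby
# Hypothetical table that counts how many "queries" it has served.
class FakeTable
  attr_reader :query_count

  def initialize(rows)
    @rows = rows.to_h { |row| [row[:id], row] }
    @query_count = 0
  end

  # One query per call, like Person.find(id).
  def find(id)
    @query_count += 1
    @rows.fetch(id)
  end

  # One query for many ids, like SELECT ... WHERE id IN (...).
  def find_many(ids)
    @query_count += 1
    @rows.values_at(*ids.uniq).compact
  end
end

Role = Struct.new(:id, :person_id)

# N+1: one lookup per role.
def people_for_roles_naive(roles, people)
  roles.map { |role| people.find(role.person_id) }
end

# Preloaded: one batched lookup, then an in-memory join by id.
def people_for_roles_preloaded(roles, people)
  by_id = people.find_many(roles.map(&:person_id)).to_h { |p| [p[:id], p] }
  roles.map { |role| by_id.fetch(role.person_id) }
end
```

In Rails, declaring the associations and using includes/preload performs this batching for you.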
Wherever possible, when you're after efficiency and speed, push the logic down to SQL through ActiveRecord and avoid iterating through objects in Ruby to apply the method. I understand this is an old question, but many might still get the wrong idea.
There isn't enough information about the current_membership? methods on the associations, but here's an example based on some guesswork:
roleables = roleable_type.tableize        # guessed table name, e.g. "Club" -> "clubs"
roleables_sym = roleables.to_sym          # guessed association name
base = Role.joins(roleables_sym, :person) # #or requires the same joins on both sides
base.where("? BETWEEN #{roleables}.membership_start_date AND #{roleables}.membership_end_date", DateTime.current)
    .or(base.where("? BETWEEN people.membership_start_date AND people.membership_end_date", DateTime.current))
So you might have to re-implement the method you've written in the model in SQL to improve efficiency and speed.
Try the select method: https://www.rubyguides.com/2019/04/ruby-select-method/
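Role.all.select { |r| r.currently_active }
The above can be shortened to Role.all.select(&:currently_active). Note that this still loads every Role and runs the method's queries on each one, so it returns an array and is no faster than all.each.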

Grails get domain properties

I'm trying to improve the performance of my app and wonder whether there is a difference between accessing a domain property value with instance.name and with instance.getName().
If there is, which one is better in terms of performance?
Example
class User {
    String name
}
User user = User.get(100);
//is it better this way
user.name
//or this way
user.getName()
Thank you
It doesn't matter for the usage you've provided, because user.name calls user.getName() behind the scenes, so it's the same. If you want to access the field directly, you have to use the .@ operator, like this: user.@name. See more here
But I don't think this is the way to speed up your app.
You will very likely find much easier ways to improve the performance of your code. Here are some ideas on where to start:
A) Number of queries. Try to avoid the N+1 problem. For example, if a user hasMany [events: Event], code like user.events.each { access event.anyPropertyExceptId } will dispatch new queries for each event.
B) Efficiency of queries. By default, Grails creates indexes for all GORM associations / other nested domains. However, for anything you use to search, filter, etc., you need to add the index "manually", for example:
static mapping = {
    anyDomainProperty index: 'customIndexName'
}
C) Only query for the data you are interested in. For example, replace:
User.all.each { user ->
    println user.events.size()
}
with
Event.withCriteria {
    projections {
        groupProperty('user')   // one row per user...
        countDistinct('id')     // ...with that user's number of events
    }
}
D) If you really need to speed up your Groovy code, and your problem is a single slow request rather than general CPU usage, take a look at http://gpars.codehaus.org and http://grails.org/doc/2.3.8/guide/async.html and try to parallelize the work.
I doubt any performance issues in your application are related to how you access the properties of your domain classes. If you profile/measure your application, I'm sure you will see that this is the case.

Avoiding subqueries in HQL using Grails

I have two objects, a room type and a reservation. Simplified, they are:
class Room {
    String description
    int quantity
}
class Reservation {
    String who
    Room room
}
I want to query for all rooms along with the number of rooms available for each type. In SQL this does what I want:
select id, quantity, occupied, quantity - coalesce(occupied, 0) as available
from room
left join (select room_id, count(room_id) as occupied
           from reservation
           group by room_id) r
on id = r.room_id;
I'm not getting anywhere trying to work out how to do this with HQL.
I'd appreciate any pointers since it seems like I'm missing something fairly fundamental in either HQL or GORM.
The problem here is that you're trying to represent fields that are not in your domain classes, like available and occupied. Getting HQL/GORM to do this can be a little frustrating, but it's not impossible. I think you have a couple of options here...
1.) Build your domain classes so that they're easier to use. Maybe your Room needs to know about its Reservations via a mapping table; or perhaps write what you want the code to look like first, and then adjust the design.
For example, maybe you want your code to look like this...
RoomReservation.queryAllByRoomAndDateBetween(room, arrivalDate, departureDate);
Then you would implement it like this...
class RoomReservation {
    ...
    // static, so it can be called as RoomReservation.queryAllByRoomAndDateBetween(...)
    static List queryAllByRoomAndDateBetween(def room, Date arrivalDate, Date departureDate) {
        return RoomReservation.withCriteria {
            eq('room', room)
            and {
                between('departureDate', arrivalDate, departureDate)
            }
        }
    }
}
2.) My second thought is... it's okay to use the database for what it's good at. Sometimes using SQL in your code is simply the most effective way to do something. Just do it in moderation, and keep it centralized and unit tested. I don't suggest this approach here, because your query isn't that complex, but it is an option. I use stored procedures for things like 'dashboard views' that query millions of objects for summary data.
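import groovy.sql.Sql

class Room {
    ...
    def dataSource  // assumes the dataSource bean is injected here

    def queryReservations() {
        def sql = new Sql(dataSource)
        // Sql.call(String, List) returns an update count, not rows;
        // use a row-returning method if the procedure yields a result set.
        return sql.call("{call GetReservations(?)}", [this.id]) //<-- stored procedure.
    }
}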
I'm not sure how you can describe a left join with a subquery in HQL. In any case, you can easily execute raw SQL in Grails too, if HQL is not expressive enough.
In your service, inject the dataSource and create a groovy.sql.Sql instance:
def dataSource
[...]
def sql = new Sql(dataSource)
sql.eachRow("....") { row ->
    [...]
}
I know it's very annoying when people try to patronize you into their way of thinking when you ask a question, instead of answering it or just staying quiet, but in my opinion this query is complex enough that I would create a concept for this number in my data structure: perhaps an Availability table associated with the Room, which would keep count not only of the quantity but also of the occupied value.
That way you don't have to compute it every time you need it.
Just my $.02; ignore it if it annoys you.

Grails. Store read-only collection in variable

UPDATE. Check these benchmarks to test it for yourself.
Should I store the collection of objects in a variable of some service like this:
class ConfigService {
    private def countries = Country.findAllBySomeCondition()
    public def countries() {
        return countries
    }
}
or use:
class ConfigService {
    public def countries() {
        return Country.findAllBySomeCondition()
    }
}
The collection will be used often, and only for reading.
It depends. In your first example, the value is cached, which may be a little more efficient, but if more Countries are added at some point, they may not show up in your service call. However, if you call .countries() often with your second example, you may incur some performance hits.
The best option would probably be to get some benchmarks as to how long the queries take, and decide whether it's best to try to cache the value yourself, or to ensure that it's always up-to-date. My suggestion would be to stick with the second example, as Hibernate handles some caching for you already, and chances are the list is not big enough to significantly hinder your app.
@GrailsGuy's answer is spot on, and I gave him a +1. To provide you with some other options, you could:
1) If your list of Countries doesn't change, put the countries in an enum and avoid the DB altogether.
2) If you give the user the ability to add/remove/edit Countries, cache the list as in example 1, but force a reload of the list whenever the user adds/removes/edits a country.
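class ConfigService {
    private def countries

    // Load on first use, then serve the cached list.
    public def countries() {
        if (countries == null) {
            countries = Country.findAllBySomeCondition()
        }
        return countries
    }

    // Drop the cache so the next countries() call reloads it.
    public void resetCountries() {
        countries = null
    }
}

class CountryService {
    def configService

    def addCountry() {
        //Do add country stuff
        configService.resetCountries()
    }
}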
This way you can cache the countries until they get updated. Like @GrailsGuy said, though, Hibernate will do this to some extent for you.
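The lazy-load-then-invalidate pattern from the ConfigService above, sketched in Ruby for illustration. CachedList, get and invalidate! are hypothetical names, and the Mutex is an added assumption: it keeps two threads from loading at the same time in a long-lived, shared service.

```ruby
# Lazily loaded value with explicit invalidation: the loader runs on the
# first get, and again only after invalidate! has cleared the cache.
class CachedList
  def initialize(&loader)
    raise ArgumentError, "loader block required" unless loader
    @loader = loader
    @value = nil
    @mutex = Mutex.new
  end

  # Returns the cached value, loading it first if needed.
  # Note: a loader returning nil or false would be re-run on every call.
  def get
    @mutex.synchronize { @value ||= @loader.call }
  end

  # Clears the cache, e.g. after a country is added, removed or edited.
  def invalidate!
    @mutex.synchronize { @value = nil }
  end
end

# Hypothetical usage mirroring ConfigService/CountryService:
# COUNTRIES = CachedList.new { load_countries_somehow }
# COUNTRIES.get          # loads once, then serves the cached list
# COUNTRIES.invalidate!  # call from the add/remove/edit code path
```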

Repository Interface - Available Functions & Filtering Output

I've got a repository using LINQ for modelling the data, with a whole bunch of functions for getting data out. A very common use is populating drop-down lists, and these lists vary. If we're creating something, we usually have a drop-down list with all entries of a certain type, which means I need a function that filters by entity type. We also have pages for filtering data, where the drop-down lists only contain entries that are currently in use, so I need a filter that returns only used entries. This means there are six different queries to get the same type of data out.
The problem with defining a function for each of these is that there'd be at least six functions for every type of output, all in one repository. It gets very large, very quickly. Here's roughly what I was planning to do:
public IEnumerable<Supplier> ListSuppliers(bool areInUse, bool includeAllOption, int contractTypeID)
{
    if (areInUse && includeAllOption)
    {
    }
    else if (areInUse)
    {
    }
    else if (includeAllOption)
    {
    }
}
"areInUse" isn't very natural English, but I'm not brilliant with naming. As you can see, the logic resides in my data access layer (repository), which isn't ideal. I could define separate functions, but as I said, their number grows quite quickly.
Could anyone recommend a good solution?
NOTE: I use LINQ for entities only; I don't use it to query. Please don't ask, it's a constraint on the system that wasn't specified by me. If I had the choice I'd use LINQ, but unfortunately I don't.
Have your method take an Expression<Func<Supplier,bool>>, which can be used in a Where clause, so that you can pass in any type of filter you would like to construct. You can use a PredicateBuilder to construct arbitrarily complex predicates based on boolean operations. PredicateBuilder produces expression trees, so the parameter must be an Expression rather than a plain Func; a plain Func would also make Where run in memory instead of in the database.
using System;
using System.Linq.Expressions;

public IEnumerable<Supplier> ListSuppliers(Expression<Func<Supplier, bool>> filter)
{
    // Queryable.Where keeps the expression tree, so the filter is translated to SQL.
    return this.DataContext.Suppliers.Where(filter);
}

var filter = PredicateBuilder.False<Supplier>();
filter = filter.Or(s => s.IsInUse).Or(s => s.ContractTypeID == 3);
var suppliers = repository.ListSuppliers(filter);
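For comparison, the same "start from false and OR conditions in" composition, sketched in Ruby with plain lambdas (Supplier, or_pred, and_pred and MATCH_NOTHING are hypothetical names). These lambdas run in memory, so they correspond to the Func case, not to expression trees that a LINQ provider can translate into SQL.

```ruby
Supplier = Struct.new(:name, :in_use, :contract_type_id)

# Combine two predicates (anything responding to #call) into a new lambda.
def or_pred(a, b)
  ->(x) { a.call(x) || b.call(x) }
end

def and_pred(a, b)
  ->(x) { a.call(x) && b.call(x) }
end

# Starting point for OR-chains: matches nothing, like PredicateBuilder.False.
MATCH_NOTHING = ->(_) { false }

# Usage: widen the filter one condition at a time, then apply it.
# filter = or_pred(or_pred(MATCH_NOTHING, ->(s) { s.in_use }),
#                  ->(s) { s.contract_type_id == 3 })
# suppliers.select(&filter)
```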
You can implement
IEnumerable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will retrieve all suppliers from the database, which are then filtered in memory using LINQ.
Assuming you are using LINQ to SQL you can also implement
IQueryable<Supplier> GetAllSuppliers() { ... }
and then use LINQ on the returned collection. This will only retrieve the necessary suppliers from the database when the collection is enumerated. This is very powerful, although there are some limits on the LINQ you can use. However, the biggest problem is that you are able to drill right through your data-access layer and into the database using LINQ.
A query like
var query = from supplier in repository.GetAllSuppliers()
            where supplier.Name.StartsWith("Foo") select supplier;
will map into SQL similar to this when it is enumerated
SELECT ... WHERE Name LIKE 'Foo%'
