Sanitizing arrays in select_all SQL queries in Rails 4 - ruby-on-rails

I need to pull aggregated columns from my database in a way that's unique to this page, crosses several models, and is expensive to aggregate in memory on a page that already has performance concerns. As a result, I want to make a hand-written SQL query to the database and return a bunch of simple objects (like hashes, or structs) which I can then dole out to the existing objects that consume this information. ActiveRecord::Base.connection.select_all is perfect for what I want.
My SQL string takes an array of ids:
WHERE entry.task_id IN (#{#project.tasks.id})
but this doesn't work because it returns a wrapped array, and connection.quote returns a bulleted list for some reason. What would be the best way to get the information I want? Should I manually strip off the [ and ]? Is there a handy, accessible function like sanitize_sql_array that'll do the trick? Should I never be calling select_all as part of normal operation?
This appears to have been easier in the past, but most of the methods I would use have been protected (including sanitize_sql_array). Rails really wants me to use ActiveRecord query methods, it seems.

Join and make your array a string, then pass it to the SQL:
ids_string = ids.join(', ')

Related

How can I disable lazy loading of active record queries?

I want to query some objects from the database using a WHERE clause similar to the following:
#monuments = Monument.where("... lots of SQL ...").limit(6)
Later on, in my view I use methods like #monuments.first, then I loop through #monuments, then I display #monuments.count.
When I look at the Rails console, I see that Rails queries the database multiple times, first with a limit of 1 (for #monuments.first), then with a limit of 6 (for looping through all of them), and finally it issues a count() query.
How can I tell ActiveRecord to only execute the query once? Just executing the query once with a limit of 6 should be enough to get all the data I need. Since the query is slow (80ms), repeating it costs a lot of time.
In your situation you'll want to trigger the query before you your call to first because while first is a method on Array, it's also a “finder method” on ActiveRecord objects that'll fetch the first record.
You can prompt this with any method that requires data to work with. I prefer using to_a since it's clear that we'll be dealing with an array after:
#moments = Moment.where(foo: true).to_a
# SQL Query Executed
#moments.first #=> (Array#first) <Moment #foo=true>
#moments.count #=> (Array#count) 42
In this case, you can also use first(6) in place of limit(6), which will also trigger the query. It may be less obvious to another developer on your team that this is intentional, however.
AFAIK, #monuments.first should not hit the db, I confirmed it on my console, maybe you have multiple instance with same variable or you are doing something else(which you haven't shared here), share the exact code and query and we might debug.
Since, ActiveRecord Collections acts as array, you can use array analogies to avoid querying the db.
Regarding first you can do,
#monuments[0]
Regarding the count, yes, it is a different query which hits the db, to avoid it you can use length as..
#monuments.length

Is there a lazy select_all option in Rails?

I've got a complicated query that I need to run, and it can potentially yield a large result set. I need to iterate linearly through this result set in order to crunch some numbers.
I'm executing the query like so:
ActiveRecord::Base.connection.select_all(query)
find_in_batches Won't work for my use case, as it's critical that I get the records in a custom order. Also, my query returns some fields that aren't part of any models, so I need to get the records as hashes.
The problem is, select_all is not lazy (from what I can tell). It loads all of the records into memory. Does Rails have a way to lazily get the results for a custom SQL query? .lazy doesn't seem applicable here, as I need custom ordering of the results.
This is possible in other languages (C#, Haskell, JavaScript), so it seems like it would be possible in Ruby.
Not sure but maybe you're asking for eager_load or preload.
http://blog.arkency.com/2013/12/rails4-preloading/
Hope this can help you.
You can try find_each or find_in_batches ActiveRecord methods.
Both query database in configurable-sized batches.
The difference it that find_each yields objects one-by-one to block (they are lazy initialized).
find_in_batches yields whole batch group.
If you can't use above methods due to custom sorting, what you can do is query the database using limit and offset. This way you will deal with data in portions. Memory consumption will decrease, but number of queries will increase.
Other solution may be to let database engine perform arithmetic operations, that you need and return calculated result.

Ruby's .where vs. detect

I'm looking for a method that is faster and uses less server processing. In my application, I can use both .where and .detect:
Where:
User.where(id: 1)
# User Load (0.5ms)
Detect:
User.all.detect{ |u| u.id == 1 }
# User Load (0.7ms). Sometimes increases more than .where
I understand that .detect returns the first item in the list for which the block returns TRUE but how does it compares with .where if I have thousands of Users?
Edited for clarity.
.where is used in this example because I may not query for the id alone. What if I have a table column called "name"?
In this example
User.find(1) # or
User.find_by(id: 1)
will be the fastest solutions. Because both queries tell the database to return exactly one record with a matching id. As soon as the database finds a matching record, it doesn't look further but returns that one record immediately.
Whereas
User.where(id: 1)
would return an array of objects matching the condition. That means: After a matching record was found the database would continue looking for other records to match the query and therefore always scan the whole database table. In this case – since id is very likely a column with unique values – it would return an array with only one instance.
In opposite to
User.all.detect { |u| u.id == 1 }
that would load all users from the database. This will result in loading thousands of users into memory, building ActiveRecord instances, iterating over that array and then throwing away all records that do not match the condition. This will be very slow compared to just loading matching records from the database.
Database management systems are optimized to run selection queries and you can improve their ability to do so by designing a useful schema and adding appropriate indexes. Every record loaded from the database will need to be translated into an instance of ActiveRecord and will consume memory - both operations are not for free. Therefore the rule of thumb should be: Whenever possible run queries directly in the database instead of in Ruby.
NB One should use ActiveRecord#find in this particular case, please refer to the answer by #spickermann instead.
User.where is executed on DB level, returning one record.
User.all.detect will return all the records to the application, and only then iterate through on ruby level.
That said, one must use where. The former is resistant to an amount of records, there might be billions and the execution time / memory consumption would be nearly the same (O(1).) The latter might even fail on billions of records.
Here's a general guide:
Use .find(id) whenever you are looking for a unique record. You can use something like .find_by_email(email) or .find_by_name(name) or similar (these finders methods are automatically generated) when searching non-ID fields, as long as there is only one record with that particular value.
Use .where(...).limit(1) if your query is too complex for a .find_by query or you need to use ordering but you are still certain that you only want one record to be returned.
Use .where(...) when retrieving multiple records.
Use .detect only if you cannot avoid it. Typical use cases for .detect are on non-ActiveRecord enumerables, or when you have a set of records but are unable to write the matching condition in SQL (e.g. if it involves a complex function). As .detect is the slowest, make sure that before calling .detect you have used SQL to narrow down the query as much as possible. Ditto for .any? and other enumerable methods. Just because they are available for ActiveRecord objects doesn't mean that they are a good idea to use ;)

why is Model.all different to Model.where('true') in rails 3

I have a query, which works fine:
ModelName.where('true')
I can chain this with other AR calls such as where, order etc. However when I use:
ModelName.all
I receive the "same" response but can't chain a where or order to it as it's an array rather than a AR collection.
Whereas I have no pragmatic problem using the first method it seems a bit ugly/unnecessary. Is there a cleaner way of doing this maybe a .to_active_record_collection or something?
There is an easy solution. Instead of using
ModelName.where('true')
Use:
ModelName.scoped
As you said:
ModelName.where('true').class #=> ActiveRecord::Relation
ModelName.all.class #=> Array
So you can make as many lazy loading as long as you don't use all, first or last which trigger the query.
It's important to catch these differences when you consider caching.
Still I can't understand what kind of situation could lead you to something like:
ModelName.all.where(foobar)
... Unless you need the whole bunch of assets for one purpose and get it loaded from the database and need a subset of it to other purposes. For this kind of situation, you'd need to use ruby's Array filtering methods.
Sidenote:
ModelName.all
should never be used, it's an anti-pattern since you don' control how many items you'll retrieve. And hopefully:
ModelName.limit(20).class #=> ActiveRecord::Relation
As you said, the latter returns an array of elements, while the former is an ActiveRecord::Relation. You can order and filter array using Ruby methods. For example, to sort by id you can call sort_by(&:id). To filter elements you can call select or reject. For ActiveRecord::Relation you can chain where or order to it, as you said.
The difference is where the sorting and processing goes. For Array, it is done by the application; for Relation - by the database. The latter is usually faster, when there is more records. It is also more memory efficient.

Linq-to-SQL query - Need to filter by IDs returned by Full-Text Search sql functions - Hitting limit for Contains

My objective:
I have built a working controller action in MVC which takes user input for various filter criteria and, using PredicateBuilder (part of LinqKit - sorry, I'm not allowed enough links yet) builds the appropriate LINQ query to return rows from a "master" table in SQL with a couple hundred thousand records. My implementation of the predicates is totally inelegant, as I'm new to a lot of this, and under a very tight deadline, but it did make life easier. The page operates perfectly as-is.
To this, I need to add a Full-Text search filter. Understanding the way LINQ translates Contains to LIKE(%%), using the advice in Simon Blog: LINQ-to-SQL - Enabling Full-Text Searching, I've already prepared Table Functions in SQL to run Freetext queries on the relevant columns. I have 4 functions, to match the query against 4 separate tables.
My approach:
At the moment, I'm building the predicates (I'll spare you) for the initial IQueryable data object, running a LINQ command to return them, like so:
var MyData = DB.Master_Items.Where(outer);
Then, I'm attempting to further filter MyData on the Keys returned by my full-text search functions:
var FTS_Matches_Subtable_1 = (from tbl in DB.Subtable_1
join fts in DB.udf_Subtable_1_FTSearch(KeywordTerms)
on tbl.ID equals fts.ID
select tbl.ForeignKey);
... I have 4 of those sets of matches which I've tried to use to filter my original dataset in several ways with no success. For instance:
MyNewData = MyData.Where(d => FTS_Matches_Subtable_1.Contains(d.Key) ||
FTS_Matches_Subtable_2.Contains(d.Key) ||
FTS_Matches_Subtable_3.Contains(d.Key) ||
FTS_Matches_Subtable_4.Contains(d.Key));
I just get the error: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
I get that it's because I'm trying to pass a relatively large set of data into the Contains function and LINQ is converting each record into a separate parameter, exceeding the limit.
I just don't know how to get around it.
I found another post linq expression to return property value which seemed SO promising. I tried ifwdev's solution (2nd highest ranked answer): using LinqKit to build an extension that will break up the queries into manageable chunks. But I can't figure out how to implement it. Out of my depth right now maybe?
Is there another approach that I'm missing? Some simpler way to accomplish this that I've overlooked?
Sorry for the long post. But thank you for any help you can provide!
This is a perfect time to go back to raw ado.net.
Twisting things around just to use linq to sql is probably just as time consuming if you wrote the query and hydration by hand.

Resources