I have three tables in my database: values, keys, and sources. Both keys and sources have many values. sources has a datetime column called file_date. For each key, I have to select all values that have a source that is between a specific date range (usually within two years). Not all dates within that range have a source, and not all sources have a value. I need to create an array that contains all values from within that range.
I at first tried to simply query the entire sources array at once, like so:
Value.where(key: 1, source: sources_array)
However, since several entries in that array are nil, it only returned records for the dates that actually have a value.
So now, I've created an array containing all of the sources in that date range. Days that don't have a source simply have nil. Then, for each key, I iterate through the sources array, returning the value that matches that key and source. That is obviously not ideal, and it takes about 7 seconds for the page to load.
Here is that code:
sources.map do |source|
  Value.where(source: source, key: key)
end
Any ideas?
You could also generate a single SQL query if you only need to retrieve the data and getting back arrays will suffice.
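conn = ActiveRecord::Base.connection
# Quote every value instead of interpolating it raw.
# Assumptions: source_id and key_id are the column names, and key is a Key record.
output = sources.compact.map do |source|
  "SELECT * FROM #{conn.quote_table_name('values')} " \
    "WHERE source_id = #{conn.quote(source.id)} AND key_id = #{conn.quote(key.id)}"
end
result = conn.execute(output.join(" UNION ")).to_a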
The big issue with your .map is that it runs several queries, instead of one query with all the results you want. With each additional query, you add the overhead of sending the request to the database and receiving the response back. What you need to do is generate the query so that it returns all the results you may be looking for, and then deal with any grouping or sorting on the application side.
You could try wrapping your queries in a single transaction:
ActiveRecord::Base.transaction do
  sources.map do |source|
    Value.where(source: source, key: key)
  end
end
I'll need to refactor this like crazy, but I have something that has cut it down to 3 seconds:
source_ids = sources.map do |source|
  if source
    source.id
  else
    nil
  end
end
values = Value.where(source: sources, key: key)
value_sources = values.pluck(:source_id)
value_decimals = values.pluck(:value_decimal)
source_ids.map do |id|
  index = value_sources.index(id)
  if index
    value_decimals[index]
  else
    nil
  end
end
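The index lookup at the end of that snippet could probably be replaced with a hash, which avoids scanning an array for every source. This is only a sketch in plain Ruby so it runs without a database; the two structs and the sample data are hypothetical stand-ins for the Source and Value records:

```ruby
# Hypothetical stand-ins for the ActiveRecord records in the question.
source_record = Struct.new(:id)
value_record  = Struct.new(:source_id, :value_decimal)

# One slot per day in the range; nil marks a day without a source.
sources = [source_record.new(1), nil, source_record.new(3), source_record.new(4)]

# Pretend this is the result of the single Value.where(...) query.
values = [value_record.new(1, 10.5), value_record.new(4, 7.25)]

# Build the lookup once: source_id => value_decimal.
value_by_source_id = values.each_with_object({}) do |value, lookup|
  lookup[value.source_id] = value.value_decimal
end

# Walk the sources in order; a missing source or a missing value becomes nil.
aligned = sources.map do |source|
  source && value_by_source_id[source.id]
end

p aligned
```

The result keeps one entry per day, so it still lines up with the date range.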
Given an array of part ids containing duplicates, how can I find the corresponding records in my Part model, including the duplicates?
An example array of part ids would be ["1B", "4", "3421", "4"]. If we assume I have a record corresponding to each of those values I would like to see 4 records returned in total, not 3. If possible, I was hoping to be able to make additional SQL operations on whatever is returned.
Here's what I'm currently using which doesn't include the duplicates:
@parts = Part.where(:part_id => params[:ids])
To give a little background, I'm trying to upload an XML file containing a list of parts used in some item. My application is meant to parse the XML file and compare the parts listed within against my Parts database so that I can see how much the part weighs. These items will sometimes contain duplicates of various parts so that's what I'm trying to account for here.
The only way I can think of doing it is using map...
@parts = params[:ids].map { |id| Part.find_by_part_id(id) }
It's hard to tell exactly what you are doing. Are you looking up the weight from the XML or from your data?
parts_xml = some_method_that_loads_xml
part_ids_from_xml = parts_xml.... # pull out the ids
parts = Part.where("id IN (?)", part_ids_from_xml)
Now you have two arrays (the XML data and your 'matching' database records), and you can use select or detect to do in-memory lookups by part_id:
part_ids_from_xml.each do |part_id|
  weight = parts.detect { |item| item.id == part_id }.weight
  puts "#{part_id} weighs #{weight}"
end
see http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-detect
and http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-select
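If the list of ids is long, the in-memory lookup could use a hash rather than detect, and mapping over the original ids keeps the duplicates. A rough sketch in plain Ruby; the struct and the sample weights are made up for illustration and stand in for Part records:

```ruby
# Hypothetical stand-in for the Part model (part_id and weight are assumed attributes).
part = Struct.new(:part_id, :weight)

# Pretend this is what the single query returned: one row per distinct id.
parts = [part.new("1B", 2.0), part.new("4", 0.5), part.new("3421", 12.0)]

# Ids as they appear in the XML, duplicates included.
part_ids_from_xml = ["1B", "4", "3421", "4"]

# Index the records once so each lookup is a hash access, not a scan.
parts_by_id = parts.each_with_object({}) do |record, lookup|
  lookup[record.part_id] = record
end

# Map over the original ids so a repeated id yields its record again;
# compact drops ids that have no matching record.
matched = part_ids_from_xml.map { |id| parts_by_id[id] }.compact

total_weight = matched.sum(&:weight)
puts "#{matched.size} parts, total weight #{total_weight}"
```

Note that the result is a plain array, so further SQL operations would have to happen on the original query, before this step.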
I am fetching some data from the database as below in my ruby file:
@main1 = $connection.execute("SELECT * FROM builds
WHERE platform_type LIKE 'TOTAL';")
@main2 = $connection.execute("SELECT * FROM builds
WHERE platform_type NOT LIKE 'TOTAL';")
After doing this I am performing hashing and a bunch of other stuff on these results. To be clear, this does not return an array as such, but it returns some mysql2 type object. So I just convert the results to 2 arrays to be safe:
@arr1 = @main1.to_a
@arr2 = @main2.to_a
Is there any way to avoid executing 2 different queries and get the same 2 arrays by executing just one query? I basically want to split the results into 2 arrays, the first one having platform_type = TOTAL and everything else in the other one.
Also without getting into why you're doing what you're doing, I would use Enumerable#partition as such:
rows = $connection.execute('SELECT * FROM builds')
like_total, not_like_total = rows.partition { |row|
row['platform_type'] =~ /TOTAL/
}
Note that, IIRC, SQL LIKE 'TOTAL' isn't the same as Ruby's "string" =~ /TOTAL/ (which is more like LIKE '%TOTAL%' in SQL—am not sure what you need).
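To make that difference concrete, here is a small sketch with hand-written rows (hypothetical data, shaped like the hashes a MySQL client returns). An exact comparison is the closer match to LIKE 'TOTAL', although depending on the collation LIKE may ignore case, which == does not:

```ruby
# Hypothetical rows standing in for the result of SELECT * FROM builds.
rows = [
  { "id" => 1, "platform_type" => "TOTAL" },
  { "id" => 2, "platform_type" => "linux" },
  { "id" => 3, "platform_type" => "SUBTOTAL" }
]

# Exact comparison: closest to LIKE 'TOTAL' (no wildcards).
exact_total, exact_rest = rows.partition { |row| row["platform_type"] == "TOTAL" }

# Regex match: behaves like LIKE '%TOTAL%', so "SUBTOTAL" is included too.
regex_total, regex_rest = rows.partition { |row| row["platform_type"] =~ /TOTAL/ }

p exact_total.map { |row| row["id"] }
p regex_total.map { |row| row["id"] }
```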
To answer your question without getting into why you're doing it like that:
Return them all in one query, with the extra criteria, then you can group them however you want with group_by:
all_results = $connection.execute("SELECT *, platform_type LIKE 'TOTAL' as is_like_total FROM builds")
This will give each of your results an 'is_like_total' "column" that you can group_by on.
http://ruby-doc.org/core-2.0/Enumerable.html#method-i-group_by
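Roughly, the grouping step might look like the sketch below. The rows are hypothetical, and it assumes the driver hands back 1 or 0 for the LIKE expression, which is what MySQL returns for a boolean expression:

```ruby
# Hypothetical rows standing in for the single query's result.
all_results = [
  { "id" => 1, "platform_type" => "TOTAL", "is_like_total" => 1 },
  { "id" => 2, "platform_type" => "linux", "is_like_total" => 0 },
  { "id" => 3, "platform_type" => "mac",   "is_like_total" => 0 }
]

# group_by returns a hash keyed by the block's return value (true or false here).
grouped = all_results.group_by { |row| row["is_like_total"] == 1 }

# fetch with a default avoids nil when one side has no rows.
like_total     = grouped.fetch(true, [])
not_like_total = grouped.fetch(false, [])

p like_total.size
p not_like_total.size
```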
So I am pulling my hair over this issue / gotcha. Basically I used find_by_sql to fetch data from my database. I did this because the query has lots of columns and table joins and I think using ActiveRecord and associations will slow it down.
I managed to pull the data and now I want to modify the returned values. I did this by looping through the result, for example:
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
  project['mycolumn'] = project['mycolumn'].split('_').first
end
What I found out is that project['mycolumn'] was not changed at all.
So my question:
Does find_by_sql return an array of hashes?
Is it possible to modify the value of one of the attributes of hash as stated above?
Here is the code : http://pastie.org/4213454 . If you can have a look at summarize_roles2() that's where the action is taking place.
Thank you. I'm using Rails 2.1.1 and Ruby 1.8. I can't really upgrade because of legacy code.
Just change the method above to print each project as you access its values; the output shows clearly what kind of object it is and which attributes it has.
The results will be returned as an array with columns requested encapsulated as attributes of the model you call this method from. If you call Product.find_by_sql then the results will be returned in a Product object with the attributes you specified in the SQL query.
If you call a complicated SQL query which spans multiple tables the columns specified by the SELECT will be attributes of the model, whether or not they are columns of the corresponding table.
Post.find_by_sql "SELECT p.title, c.author FROM posts p, comments c WHERE p.id = c.post_id"
> [#<Post:0x36bff9c @attributes={"title"=>"Ruby Meetup", "first_name"=>"Quentin"}>, ...]
Source: http://api.rubyonrails.org/v2.3.8/
Have you tried this?
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
  project['mycolumn'] = project['mycolumn'].split('_').first
  project.save
end
I am developing a Rails app. I would like to use an array to hold 2,000,000 records and then insert them into the database, like this:
large_data = Get_data_Method() #get 2,000,000 raw data
all_values = Array.new
large_data.each{ |data|
all_values << data[1] #e.g. data[1] has the format "(2,'john','2002-09-12')"
}
sql="INSERT INTO cars (id,name,date) VALUES "+all_values.join(',')
ActiveRecord::Base.connection.execute(sql)
When I run the code, it takes a very long time at the large_data.each{...} step. It has been running for an hour already and still has not finished that part.
Is it because 2,000,000 elements are too many for a Ruby array to hold? Or can a Ruby array hold that many elements, and is it reasonable to wait this long?
I would like to use a bulk insert to speed up loading this much data into the MySQL database, which means a single INSERT INTO statement; that is why I wrote the code above. If this is a bad design, can you recommend a better way?
Some notes:
Don't use the pattern "empty array + each + push", use Enumerable#map.
all_values = large_data.map { |data| data[1] }
Is it possible to write get_data so that it returns items lazily? If the answer is yes, check enumerators and use them to do batched inserts into the database instead of putting all objects in at once. Something like this:
def get_data
  Enumerator.new do |yielder|
    yielder.yield some_item
    yielder.yield another_item
    # yield all items.
  end
end
get_data.each_slice(1000) do |data|
  # insert those 1000 elements into the database
end
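To show the batching end to end, here is a sketch that only builds the statements and never touches a database. The generated rows are hypothetical, and the tuples are interpolated as-is the way the question does it, so real data would need escaping first:

```ruby
# Hypothetical data source: yields pre-formatted value tuples one at a time,
# so the full 2,000,000-element array never has to exist in memory.
def get_data
  Enumerator.new do |yielder|
    1.upto(2_500) do |i|
      yielder << "(#{i},'car#{i}','2002-09-12')"
    end
  end
end

batch_sizes = []
first_sql = nil

get_data.each_slice(1000) do |batch|
  sql = "INSERT INTO cars (id,name,date) VALUES " + batch.join(",")
  # In the app this is where the statement would be executed, e.g.
  # ActiveRecord::Base.connection.execute(sql)
  first_sql ||= sql
  batch_sizes << batch.size
end

p batch_sizes
```

Each statement stays at 1000 rows or fewer, which also keeps it well under typical packet size limits.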
That said, there are projects for doing efficient bulk insertions; check ar-extensions and activerecord-import for Rails >= 3.
An array of 2m items is never going to be the easiest thing to manage. Have you taken a look at MongoDB? It is a database which can be accessed just like an array, and it could be the answer to your issues.
An easy fix would be to split your inserts into blocks of 1000, that would make the whole process more manageable.
I'm trying to limit the number of times I do a mysql query, as this could end up being 2k+ queries just to accomplish a fairly small result.
I'm going through a CSV file, and I need to check that the format of the content in the csv matches the format the db expects, and sometimes I try to accomplish some basic clean-up (for example, I have one field that is a string, but is sometimes in the csv as jb2003-343, and I need to strip out the -343).
The first thing I do is get from the database the list of fields, by name, that I need to retrieve from the csv. Then I get the index of those columns in the csv, and then I go through each line in the csv and get each of the indexed columns:
get_fields = BaseField.find(:all, :conditions => ['group IN (?)', params[:group_ids]])
csv = CSV.read(csv.path)
first_line = csv.first # CSV.read already returns each row as an array of cells
# declared outside the block so they survive from one row to the next
col_indexes = []
csv_data = []
csv.each_with_index do |row, index|
  if index == 0
    get_fields.each do |col|
      col_indexes << row.index(col.name)
    end
  else
    csv_row = []
    col_indexes.each do |col|
      # possibly check the value here against another mysql query but that's ugly
      csv_row << row[col]
    end
    csv_data << csv_row
  end
end
The problem is that when I'm adding the content of the csv_data for output, I no longer have any connection to the original get_fields query. Therefore, I can't seem to say 'does this match the type of data expected from the db'.
I could work my way back through the same process that got me down to that level, and make another query like this
get_cleanup = BaseField.find_by_csv_col_name(first_line[col])
if get_cleanup.format==row[col].is_a
csv_row << row[col]
else
# do some data clean-up
end
but as I mentioned, that could mean the get_cleanup is run 2000+ times.
Instead of doing this, is there a way to search within the original get_fields result for the name, and then get the associated field?
I tried searching for 'search rails object', but kept getting back results about building search, not searching within an already existing object.
I know I can do array.search, but don't see anything in the object api about search.
Note: The code above may not be perfect, because I'm not running it yet, just wrote that off the top of my head, but hopefully it gives you the idea of what I'm going for.
When you populate your col_indexes array, rather than storing a single value, you can store a hash which includes index and the datatype.
get_fields.each do |col|
  col_info = {:row_index => row.index(col.name), :name => col.name, :format => col.format}
  col_indexes << col_info
end
You can then access all your data in the for loop.
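Put together, the idea might look something like the sketch below. It is plain Ruby with hypothetical data: the struct stands in for BaseField records, the "string"/"integer" format values are an assumption about what the format column holds, and the clean-up rule is just the jb2003-343 example from the question:

```ruby
# Hypothetical stand-in for BaseField records loaded by the one query.
base_field = Struct.new(:name, :format)
get_fields = [base_field.new("serial", "string"), base_field.new("year", "integer")]

# Hypothetical CSV contents: a header row and one data row.
header = ["year", "owner", "serial"]
rows   = [["2003", "jb", "jb2003-343"]]

# One entry per field: where it lives in the CSV plus its expected format.
col_indexes = get_fields.map do |col|
  { :row_index => header.index(col.name), :name => col.name, :format => col.format }
end

csv_data = rows.map do |row|
  col_indexes.map do |col_info|
    value = row[col_info[:row_index]]
    # Example clean-up only: strip a trailing "-343" style suffix from string fields.
    col_info[:format] == "string" ? value.sub(/-\d+\z/, "") : value
  end
end

# Searching the already-loaded records by name needs no further query.
serial_field = get_fields.detect { |field| field.name == "serial" }

p csv_data
p serial_field.format
```

Because the format travels with the column index, the check happens in memory and the 2000+ extra queries go away.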