How to include duplicate records in query? - ruby-on-rails

Given an array of part ids containing duplicates, how can I find the corresponding records in my Part model, including the duplicates?
An example array of part ids would be ["1B", "4", "3421", "4"]. If we assume I have a record corresponding to each of those values I would like to see 4 records returned in total, not 3. If possible, I was hoping to be able to make additional SQL operations on whatever is returned.
Here's what I'm currently using which doesn't include the duplicates:
@parts = Part.where(:part_id => params[:ids])
To give a little background, I'm trying to upload an XML file containing a list of parts used in some item. My application is meant to parse the XML file and compare the parts listed within against my Parts database so that I can see how much the part weighs. These items will sometimes contain duplicates of various parts so that's what I'm trying to account for here.

The only way I can think of doing it is using map...
@parts = params[:ids].map { |id| Part.find_by_id(id) }

It's hard to tell exactly what you are doing: are you looking up the weight from the XML or from your data?
parts_xml = some_method_that_loads_xml
part_ids_from_xml = parts_xml.... # pull out the ids
parts = Part.where("id IN (?)", part_ids_from_xml)
Now you have two arrays (the ids from the XML and the matching database records), and you can use select or detect to do in-memory lookups by part id:
part_ids_from_xml.each do |part_id|
  # detect returns the first record whose id matches, so duplicate ids are looked up again
  weight = parts.detect { |item| item.id == part_id }.weight
  puts "#{part_id} weighs #{weight}"
end
see http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-detect
and http://ruby-doc.org/core-2.0.0/Enumerable.html#method-i-select
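A minimal sketch of the in-memory lookup described above, using plain Ruby objects in place of ActiveRecord records. The `Part` struct, its `part_id` and `weight` fields, and the sample data are assumptions for illustration; in the real application `parts` would come from the single query. Indexing the records by id once avoids a linear `detect` per lookup, and mapping over the original id list is what preserves the duplicates.

```ruby
# Stand-in for the ActiveRecord model (hypothetical fields).
Part = Struct.new(:part_id, :weight)

# Pretend this is the result of one query: duplicates collapse to 3 records.
parts = [
  Part.new("1B", 2.5),
  Part.new("4", 0.75),
  Part.new("3421", 10.0)
]

requested_ids = ["1B", "4", "3421", "4"]

# Build a lookup table once: part_id => record.
parts_by_id = {}
parts.each { |part| parts_by_id[part.part_id] = part }

# Map over the original list so every occurrence yields a record;
# compact drops ids that have no matching record.
parts_with_duplicates = requested_ids.map { |id| parts_by_id[id] }.compact

parts_with_duplicates.each do |part|
  puts "#{part.part_id} weighs #{part.weight}"
end

total_weight = parts_with_duplicates.inject(0) { |sum, part| sum + part.weight }
puts "Total weight: #{total_weight}"
```

The result is a plain array rather than a query object, so any further SQL filtering would have to be applied to the query before this step.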

Related

Sort a resource based on the number of associated resources of other type

I have a Movie model that has many comments,
I simply want to sort the movies, using SQL inside Active Record, by the number of associated comments per movie.
How can we achieve this in the most efficient way?
I want to do this on the fly, without a counter cache column.
You can do something like this:
@top_ten_movies = Comment.includes(:movie).group(:movie_id).count(:id).sort_by{|k, v| v}.reverse.first(10)
includes(:movie) = meant to prevent N+1 queries
group(:movie_id) = groups the comments by movie, so count returns a hash of movie id to comment count
sort_by{|k,v| v} = turns that hash into an array of [movie_id, count] pairs sorted by count; after reverse it looks like [[3,15],[0,10],[2,7],...]
In the first pair, [3,15], the movie with id 3 has 15 comments.
@top_ten_movies[0] is the pair for the movie with the most comments.
sort_by sorts in ascending order by default; reverse makes the order descending.
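The sorting step can be tried on its own with a plain hash. This is only a sketch: `comment_counts` is a made-up stand-in for what the grouped count would return, and negating the count is an alternative to `sort_by` followed by `reverse`.

```ruby
# Hypothetical result of the grouped count:
# keys are movie ids, values are comment counts.
comment_counts = { 3 => 15, 0 => 10, 2 => 7, 5 => 12 }

# Sort descending by count and keep the top entries.
top_movies = comment_counts.sort_by { |_movie_id, count| -count }.first(3)

top_movies.each do |movie_id, count|
  puts "movie #{movie_id} has #{count} comments"
end
```

Because the counts are grouped from the comments table, movies without any comments never appear in the hash.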

RoR - inline query in array transform (collect)

I'm building a summary of data based on multiple entities. To keep things simple, take a list of categories and the number of items present in each category, returned as JSON, e.g.
{"report":["Fruit",35]}
@array = []
@active_rec = Category.all
@array = @active_rec.collect{ |u| [u.name, ?how to insert AR query result? ] }
How can I include, along with the name, a value that is the result of another query? In other words, is it possible to perform a query inline on the current row?
Thanks!
Made some assumptions about your data model:
Fruit.joins(:category).group('categories.id').select('categories.name, COUNT(fruits.id)')
Or (depending on how you want to handle the case of duplicate category names):
Fruit.joins(:category).group('categories.name').count('fruits.id')
Note the output will be in a different format depending on which of these you choose.
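As a rough sketch of how a grouped count could be combined with the category list in memory, instead of running one query per row inside `collect`: the `Category` struct and the `item_counts` hash (keyed by category id) are assumptions standing in for the real model and for the result of a single grouped query.

```ruby
require 'json'

# Stand-in for the Category model (hypothetical fields).
Category = Struct.new(:id, :name)

categories = [
  Category.new(1, "Fruit"),
  Category.new(2, "Vegetable"),
  Category.new(3, "Dairy")
]

# Hypothetical result of one grouped count, keyed by category id.
# Categories with no items are simply missing from the hash.
item_counts = { 1 => 35, 2 => 12 }

# Pair each name with its count, defaulting to 0 when there are no items.
report = categories.map do |category|
  [category.name, item_counts.fetch(category.id, 0)]
end

puts({ "report" => report }.to_json)
```

Keying the counts by id rather than by name keeps categories with duplicate names separate.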

Modifying the returned value of find_by_sql

So I am pulling my hair out over this issue / gotcha. Basically, I used find_by_sql to fetch data from my database. I did this because the query has lots of columns and table joins, and I think using ActiveRecord and associations would slow it down.
I managed to pull the data, and now I want to modify the returned values. I did this by looping through the result, for example:
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
end
What I found out is that project['mycolumn'] was not changed at all.
So my question:
Does find_by_sql return an array of hashes?
Is it possible to modify the value of one of the attributes as shown above?
Here is the code: http://pastie.org/4213454 . If you can, have a look at summarize_roles2(); that's where the action is taking place.
Thank you. I'm using Rails 2.1.1 and Ruby 1.8. I can't really upgrade because of legacy code.
Print the value of project inside the loop and you can clearly inspect the object and its attributes.
The results will be returned as an array with columns requested encapsulated as attributes of the model you call this method from. If you call Product.find_by_sql then the results will be returned in a Product object with the attributes you specified in the SQL query.
If you call a complicated SQL query which spans multiple tables the columns specified by the SELECT will be attributes of the model, whether or not they are columns of the corresponding table.
Post.find_by_sql "SELECT p.title, c.author FROM posts p, comments c WHERE p.id = c.post_id"
> [#<Post:0x36bff9c @attributes={"title"=>"Ruby Meetup", "first_name"=>"Quentin"}>, ...]
Source: http://api.rubyonrails.org/v2.3.8/
Have you tried
a = Project.find_by_sql("SELECT mycolumn, mycolumn2 FROM my_table").each do |project|
project['mycolumn'] = project['mycolumn'].split('_').first
project.save
end

searching within an already retrieved mysql result

I'm trying to limit the number of times I do a mysql query, as this could end up being 2k+ queries just to accomplish a fairly small result.
I'm going through a CSV file, and I need to check that the format of the content in the csv matches the format the db expects, and sometimes I try to accomplish some basic clean-up (for example, I have one field that is a string, but is sometimes in the csv as jb2003-343, and I need to strip out the -343).
The first thing I do is get from the database the list of fields by name that I need to retrieve from the csv, then I get the index of those columns in the csv, then I go through each line in the csv and get each of the indexed columns
get_fields = BaseField.find(:all, :conditions => ['group IN (?)', params[:group_ids]])
rows = CSV.read(csv.path)
first_line = rows.first
# declared outside the block so they persist from one row to the next
col_indexes = []
csv_data = []
rows.each_with_index do |row, i|
  if i == 0
    # header row: remember where each wanted column sits
    get_fields.each do |col|
      col_indexes << row.index(col.name)
    end
  else
    csv_row = []
    col_indexes.each do |col|
      # possibly check the value here against another mysql query but that's ugly
      csv_row << row[col]
    end
    csv_data << csv_row
  end
end
The problem is that when I'm adding the content of the csv_data for output, I no longer have any connection to the original get_fields query. Therefore, I can't seem to say 'does this match the type of data expected from the db'.
I could work my way back through the same process that got me down to that level, and make another query like this
get_cleanup = BaseField.find_by_csv_col_name(first_line[col])
if get_cleanup.format==row[col].is_a
csv_row << row[col]
else
# do some data clean-up
end
but as I mentioned, that could mean the get_cleanup is run 2000+ times.
Instead of doing this, is there a way to search within the original get_fields result for the name, and then get the associated field?
I tried searching for 'search rails object', but kept getting back results about building search, not searching within an already existing object.
I know I can do array.search, but don't see anything in the object api about search.
Note: The code above may not be perfect, because I'm not running it yet, just wrote that off the top of my head, but hopefully it gives you the idea of what I'm going for.
When you populate your col_indexes array, rather than storing a single value, you can store a hash which includes index and the datatype.
get_fields.each do |col|
col_info = {:row_index => row.index(col.name), :name => col.name, :format => col.format}
col_indexes << col_info
end
You can then access all your data in the for loop
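A self-contained sketch of that idea, with the database replaced by plain structs. The `BaseField` struct, the `"string"` and `"integer"` format values, and the sample CSV are all assumptions for illustration; the point is that each column's expected format travels with its index, so no further query is needed per cell.

```ruby
require 'csv'

# Stand-in for BaseField records (hypothetical name/format values).
BaseField = Struct.new(:name, :format)
get_fields = [BaseField.new("code", "string"), BaseField.new("qty", "integer")]

rows = CSV.parse("id,code,qty\n1,jb2003-343,7\n2,jb2004,x\n")
header = rows.first

# One hash per wanted column: where it sits in the CSV and what the db expects.
col_indexes = get_fields.map do |col|
  { :row_index => header.index(col.name), :name => col.name, :format => col.format }
end

csv_data = rows[1..-1].map do |row|
  col_indexes.map do |col_info|
    value = row[col_info[:row_index]]
    if col_info[:format] == "integer"
      # nil marks a value that still needs manual clean-up
      value.match(/\A\d+\z/) ? value.to_i : nil
    else
      # strip a trailing "-343" style suffix from string fields
      value.sub(/-\d+\z/, "")
    end
  end
end

p csv_data
```

The database is consulted once for the field list, and every later check uses only the `col_indexes` array.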

Combining table, web service data in Grails

I'm trying to figure out the best approach to display combined tables based on matching logic and input search criteria.
Here is the situation:
We have a table of customers stored locally. The fields of interest are ssn, first name, last name and date of birth.
We also have a web service which provides the same information. Some of the customers from the web service are the same as the local file, some different.
SSN is not required in either.
I need to combine this data to be viewed on a Grails display.
The criteria for combination are 1) match on SSN. 2) For any remaining records, exact match on first name, last name and date of birth.
There's no need at this point for soundex or approximate logic.
It looks like what I should do is extract all the records from both inputs into a single collection, somehow making it a set on SSN. Then remove the blank ssn.
This will handle the SSN matching (once I figure out how to make that a set).
Then, I need to go back to the original two input sources (cached in a collection to prevent a re-read) and remove any records that exist in the SSN set derived previously.
Then, create another set based on first name, last name and date of birth - again if I can figure out how to make a set.
Then combine the two derived collections into a single collection. The collection should be sorted for display purposes.
Does this make sense? I think the search criteria will limit the number of records pulled in, so I can do this in memory.
Essentially, I'm looking for some ideas on how the Grails code would look for achieving the above logic (assuming this is a good approach). The local customer table is a domain object, while what I'm getting from the WS is an array list of objects.
Also, I'm not entirely clear on how the maxresults, firstResult, and order used for the display would be affected. I think I need to read in all the records which match the search criteria first, do the combining, and display from the derived collection.
The traditional Java way of doing this would be to copy both the local and remote objects into TreeSet containers with a custom comparator, first for SSN, second for name/birthdate.
This might look something like:
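import org.codehaus.groovy.runtime.ClosureComparator
def localCustomers = Customer.list()
def remoteCustomers = RemoteService.get()
// first pass: keep one record per SSN
// (records with a blank ssn all compare as equal here, so set those aside first)
TreeSet ssnFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.ssn <=> c2.ssn}))
ssnFilter.addAll(localCustomers)
ssnFilter.addAll(remoteCustomers)
// second pass: keep one record per first name + last name + date of birth
TreeSet nameDobFilter = new TreeSet(new ClosureComparator({c1, c2 -> c1.firstName + c1.lastName + c1.dob <=> c2.firstName + c2.lastName + c2.dob}))
nameDobFilter.addAll(ssnFilter)
def filteredCustomers = nameDobFilter as List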
At this point, filteredCustomers has all the records, except those that are duplicates by your two criteria.
Another approach is to filter the lists by sorting and then doing a fold (inject in Groovy), combining adjacent elements if they match. This way, you have an opportunity to combine the data from both sources.
For example:
def combineByNameAndDob(customers) {
    customers.sort() {
        c1, c2 -> (c1.firstName + c1.lastName + c1.dob) <=>
                  (c2.firstName + c2.lastName + c2.dob)
    }.inject([]) { cs, c ->
        if (cs && c.equalsByNameAndDob(cs[-1])) {
            cs[-1].combine(c) //combine the attributes of both records
            cs
        } else {
            cs << c
        }
    }
}
