Inserting new records into a database table in my case - ruby-on-rails

I am developing a Rails 2.3 application that provides a search service for project information stored in a database.
There is an existing projects table in the database, like the following:
To satisfy a customer requirement, new data must be inserted into this table at midnight every day.
These new records let the Rails application search projects by a single word, in addition to searching by the full name.
For example, a search for the word "portal" should find both the "Car rental portal" and "Position track portal" records. That means the app's database needs a record for each single word in project_name.
So my plan is to generate those new records by splitting the value in the project_name column (of the projects table above) into single words, and then use each word as a new record's project_name while keeping the other columns of the record unchanged.
For example, in the table above, the first record has the project_name "Car rental portal". I am going to split this string into 3 words and construct the following three new records to be inserted into the table:
To achieve this, I wrote a rake task that gets all records from the original projects table. For each record, it splits the project_name value into words, constructs new records from those words, and inserts them into the table. My rake task looks like the code below:
conn = ActiveRecord::Base.connection
all_records = conn.execute("select * from projects;")
all_records.each do |record|
  user_id = record[0]
  project_name = record[1]
  department = record[2]
  other = record[3]
  words = project_name.split
  words.each do |word|
    # quote every value so string columns are escaped correctly
    sql = "insert into projects values (#{conn.quote(user_id)}, #{conn.quote(word)}, #{conn.quote(department)}, #{conn.quote(other)});"
    conn.execute(sql)
  end
end
The rake task works: it creates the expected new records and inserts them into the projects table. BUT the problem is that it takes 36 hours to complete!
That is understandable, since the original table is very large. Splitting each string into words and creating a new record per word is like creating a table 3 times larger (assuming each project_name has 3 words).
My question:
Could some Rails experts suggest a more efficient way to do the record insertion described above?
Or is there another way to enable single-word search in my case, without storing each single word in the database as I designed?

If you are doing this only for searching, why not use Sunspot? It supports full-text search.
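A full-text engine removes the need for extra rows because it keeps an inverted index. This index maps each word to the records that contain it, so the projects table itself stays unchanged. The toy in-memory sketch below shows the idea in plain Ruby. It is not Sunspot's API, and the WordIndex class is hypothetical.

```ruby
# Toy inverted index: maps each lowercase word to the ids of the records
# whose text contains it. Solr (which Sunspot wraps) maintains a far more
# elaborate structure of this kind, so no duplicated rows are needed.
class WordIndex
  def initialize
    @postings = Hash.new { |hash, word| hash[word] = [] }
  end

  # Record every distinct word of +text+ against +id+.
  def add(id, text)
    text.downcase.split.uniq.each { |word| @postings[word] << id }
  end

  # Ids of records containing +word+ (case-insensitive); [] if none.
  # fetch is used so that a search never creates an empty entry.
  def search(word)
    @postings.fetch(word.downcase, [])
  end
end
```

For example, after `add(1, "Car rental portal")` and `add(2, "Position track portal")`, a search for `"portal"` returns both ids. The original rows are stored only once.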
Splitting the project name sounds like a really bad idea to me.
But if you want it to take less time, I'd encourage you to split this single task into several rake tasks that each do the same work for a different subset of projects.
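Running several rake tasks, each on a different subset of projects, needs some way to divide the rows. The minimal sketch below computes contiguous id ranges, one per task. It assumes the projects table has an integer primary key to partition on; the table layout shown in the question does not confirm this.

```ruby
# Split min_id..max_id into at most +parts+ contiguous ranges of roughly
# equal size; each range could be handed to a separate rake task.
def id_ranges(min_id, max_id, parts)
  total = max_id - min_id + 1
  size = (total + parts - 1) / parts # ceiling division
  (min_id..max_id).step(size).map do |start|
    start..[start + size - 1, max_id].min
  end
end
```

For example, `id_ranges(1, 10, 3)` gives `[1..4, 5..8, 9..10]`. Each range could then be passed to a hypothetical task declared as `task :split_names, [:from, :to] => :environment`. That task would select only rows with `id BETWEEN from AND to`.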

For faster importing, use activerecord-import. It batches many rows into a single INSERT statement, which can speed up your execution by a couple of orders of magnitude.
# requires the activerecord-import gem to be loaded
class TempModel < ActiveRecord::Base; set_table_name "projects"; end

# column order must match the order of values in each row below
columns = [:user_id, :project_name, :department, :other]
values = all_records.inject([]) do |values_arr, record|
  user_id, project_name, department, other = record
  # one new row per word of project_name, other columns unchanged
  project_name.split.each do |name|
    values_arr << [user_id, name, department, other]
  end
  values_arr
end
TempModel.import columns, values, :validate => false
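On a very large table, building one giant `values` array may use a lot of memory. The sketch below expands rows lazily and hands them out in fixed-size batches, and each batch could be imported separately. It only keeps the expanded array from being built all at once. Depending on the database adapter, the source result set may still be fully loaded. The method name and batch size are illustrative.

```ruby
# Lazily expand [user_id, project_name, department, other] rows into one
# row per word of project_name (other columns unchanged), then group the
# expanded rows into arrays of at most +batch_size+ rows.
def word_row_batches(rows, batch_size)
  word_rows = Enumerator.new do |out|
    rows.each do |user_id, project_name, department, other|
      project_name.to_s.split.each do |word|
        out << [user_id, word, department, other]
      end
    end
  end
  # Returns an Enumerator; nothing is expanded until it is iterated.
  word_rows.each_slice(batch_size)
end

# Hypothetical usage with TempModel and columns from the answer:
#   word_row_batches(all_records, 10_000).each do |batch|
#     TempModel.import columns, batch, :validate => false
#   end
```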

Related

Ruby on Rails, creating a many-to-many association after an activerecord-import

I am creating a few records programmatically based on a user's input and building an array of records to import.
When I check the database, I can see the relationship has been created if they are new records.
If one of the records already exists in the database, I see an entry like the following in the association table. I can also see that the new records have been created in their respective table, so they exist, but their IDs are not being filled in in the association table:
user_id: 1
keyword_id: null
But if I run the code a second time, it adds the relationship correctly.
This is my code:
records_to_add = []
words.each do |word|
  keyword = Keyword.find_or_initialize_by(
    word: word,
    device: device,
  )
  records_to_add.push(keyword)
end
keywords_added = Keyword.import records_to_add, on_duplicate_key_ignore: true, validate: true
user.keywords << records_to_add
I think there is something wrong with this part of the code:
user.keywords << records_to_add
It isn't creating the relationship correctly if one of the records already exists...
You are calling 'find_or_initialize_by' in your words loop, and then importing those records, which creates a new row in your Keyword table for all the new records.
So far, so good.
Then your script takes the first list (persisted and new records) and attempts to associate them to the user. At this point, it creates associations for existing Keyword records, but tries to create new Keyword records again for the ones that it just created in the import and associate those. These probably fail a unique validation at that point, and are not associated nor persisted.
That leaves you with just the unassociated but newly created records.

ruby on rails find.all where and append results to different model

OK, so I have a table of data called "Projects" with an ID and a name. Let's say it's in the world of zillions of objects, and I'm not allowed to add any relationships, columns, indexes or anything like that to it. I want to work with a specific set of these projects that have "TI" as the seventh and eighth characters of the name. I want to store them in another model called TI Projects, which also has an ID and a name column, so I can create relationships, add columns, etc.
So what I want to do is find all Projects where the 7th and 8th characters of the name are TI, and then insert them into the TI Projects model, carrying the name over from Projects to TI Projects. I'm having a hard time with this, obviously. I'm doing:
@tiprojects = Projects.all(:conditions => { :name => %%%%%%TI% } )
@tiprojects.create
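One way to express the condition is with a SQL LIKE pattern. In LIKE, `%` matches any run of characters, so it cannot pin a position. `_` matches exactly one character, so six underscores followed by `TI` match names whose 7th and 8th characters are "TI". The sketch below assumes hypothetical `Project` and `TiProject` models with a `name` column. The ActiveRecord call is shown only in a comment and uses the Rails 2.3 `:conditions` style from the question.

```ruby
# LIKE pattern: each "_" matches exactly one character, so the first six
# characters are skipped and "TI" must be the 7th and 8th; "%" matches the rest.
TI_PATTERN = "______TI%"

# Pure-Ruby equivalent of the same test, handy for checking names.
def ti_project_name?(name)
  name.to_s[6, 2] == "TI"
end

# Hypothetical usage (model names assumed):
#   Project.all(:conditions => ["name LIKE ?", TI_PATTERN]).each do |project|
#     TiProject.create(:name => project.name)
#   end
```

A pattern that does not start with a fixed prefix generally cannot use an ordinary index. Because indexes are off the table here, the query will scan the whole projects table.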

Constructing a 1-many relationship with custom string foreign keys in PGSQL ActiveRecord

I have the following tables (Showing only the relevant fields):
lots
  history_id
histories
  initial_date
  updated_date
  r_doc_date
  l_doc_date
  datasheet_finalized_date
users
  username
So I am rebuilding an existing application that dealt with a rather large amount of bureaucracy and needs to keep track of five separate dates (as shown in the histories table). My problem is that I don't know how best to model this in ActiveRecord. Historically, it has been done by representing the histories table like so:
histories
  initial_date
  updated_date
  r_doc_date
  l_doc_date
  datasheet_finalized_date
  username
Here only one of the five date fields could ever be filled at a time, which in my opinion is a terrible way to model this.
So basically I want to build a unique queryable connection between every date in the histories table and its specific relevant user. Is it possible to use every timestamp in the histories table as a foreign key to query the specific user?
I think that there's a simpler approach to what you're trying to accomplish. It sounds like you want to be able to query each lot and find the 'relevant user' (I am guessing that this refers to the user who did whatever action is necessary to update the specific column on the histories table). To do this I would first create a join table between users and histories, called user_histories:
user_histories
  user_id
  history_id
I would create a row on this table any time a lot's history is updated and one of the relevant dates changes. But that now brings up the issue of being able to differentiate which specific date-type the user actually changed (since there are five). Instead of using each one as a foreign key (since they wouldn't necessarily be unique) I would recommend creating a 'history_code' on the user_histories table to represent each one of the history date-types (much like how a polymorphic_type is used). Resulting in the user_histories table looking like this:
user_histories
  user_id
  history_id
  history_code
And an example record looking like this:
sample = UserHistory.new(
  user_id: 1,
  history_id: 1,
  history_code: "Initial"
)
Allowing you to query the specific user who changed a record in the histories table with the following:
history.user_histories.select { |uhist| uhist.history_code == "Initial" }
I would recommend building these longer queries out into model methods, allowing for cleaner, reusable queries down the line, for example:
# app/models/history.rb
def initial_user
  self.user_histories.select { |uhist| uhist.history_code == "Initial" }
end
This should give you the results you want, and it gets around the whole issue of the dates not being suitable for foreign keys, since you can't guarantee their uniqueness.
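The plain-Ruby sketch below makes the history_code idea concrete without a database. A Struct stands in for the user_histories table. The codes other than "Initial" and the HISTORY_CODES mapping to date columns are hypothetical. In ActiveRecord, the same filter would normally be pushed into the query (for example with `where`) rather than done with `select` in Ruby.

```ruby
# In-memory stand-in for the user_histories join table: one row per
# (user, history, date type), with history_code naming the date type.
UserHistory = Struct.new(:user_id, :history_id, :history_code)

# Hypothetical codes, one per date column on histories.
HISTORY_CODES = {
  "Initial"   => :initial_date,
  "Updated"   => :updated_date,
  "RDoc"      => :r_doc_date,
  "LDoc"      => :l_doc_date,
  "Finalized" => :datasheet_finalized_date
}.freeze

# User ids recorded for +history_id+ under +code+; rejects unknown codes.
def user_ids_for(user_histories, history_id, code)
  raise ArgumentError, "unknown history_code #{code}" unless HISTORY_CODES.key?(code)

  user_histories
    .select { |uh| uh.history_id == history_id && uh.history_code == code }
    .map(&:user_id)
end
```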

In Ruby/Rails, how can I create one column in a table based on the value of two other columns in the same table?

I am trying to create a Reddit-type app where the order of a list depends on a combination of the number of upvotes a link has and its creation date. My plan is to create a new column in my "Links" table that combines "created_date" and "upvotes" into a "Rank Value", and then sort the list by the "Rank Value".
Is this the right approach? If so, how do I create this table column using ActiveRecord?
If there is a meta attribute that is used purely for display purposes, creating a method that will generate it on the fly would be appropriate.
If you want to use it for sorting your objects as well, it's better to store it in a column. Hopefully, it doesn't depend on things like the current time, and only on its other attributes:
class Link < ActiveRecord::Base
  before_save :calculate_rank

  # rank depends only on the record's own attributes
  def calculate_rank
    self.rank = upvotes + clicks * 5
  end
end
Unfortunately, for your use case you specifically said your column depends on the creation date, probably in terms of "how fresh is it": a moving target.
You can solve this two ways: by constantly increasing the rank values for newer links indefinitely, or by putting items into time buckets and updating them periodically (degrading their scores when the day or week ends, perhaps).
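The first option can be expressed as a score that grows with creation time instead of decaying with age. Older links never need updating, but newer ones start out higher. This is a minimal sketch: the logarithmic vote weighting and the EPOCH and DECAY_SECONDS constants are assumptions to tune, not something the answer specifies.

```ruby
# Hypothetical constants: EPOCH is an arbitrary fixed reference time, and
# every DECAY_SECONDS of newer creation time is worth as much as a
# tenfold increase in upvotes.
EPOCH = Time.utc(2015, 1, 1)
DECAY_SECONDS = 45_000.0

# Rank that depends only on stored attributes, so it can be computed once
# (e.g. in a before_save callback) and kept in a column for sorting.
def rank_value(upvotes, created_at)
  vote_score = Math.log10([upvotes, 1].max) # 1 vote => 0, 10 => 1, 100 => 2
  freshness = (created_at - EPOCH) / DECAY_SECONDS
  vote_score + freshness
end
```

In a model callback, `created_at` may still be nil on a record that has not been saved yet. Passing `created_at || Time.now` is one way to handle that.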
You can create a method in your model, such as a "rank_value" class method that sorts by your criteria, and just call Model.rank_value.

How to group by multiple attributes on children, and then count?

In a Rails 3.2 app I have a User model that has many Awards.
The Award class has :type, :level and :image attributes.
On a User's show page I want to show their Awards, but with some criteria. User.awards should be grouped by both type and level, and for each type-level combination I want to display its image, and a count of the awards.
I'm struggling to construct the queries and views to achieve this (and to explain this clearly).
How can I group on two attributes of a child record, and then display both a count and attribute (i.e. image) of those children?
It took me some time to figure this out because of the complicated mix of active record objects, arrays and grouped arrays.
Anyway, in case this is useful for anyone else:
Given a User has many Awards, and Award has attributes :type, :level, :image.
for award in @user.awards.group_by { |award| [award.type, award.level] }.sort_by { |award| [award[0][0], award[0][1]] }
  puts "#{award[0][0].capitalize} - Level #{award[0][1]}" # e.g. Award_Name - Level 1
  puts award[1].first.image # outputs the value of award.image, i.e. the image url
  puts award[1].count # counts the number of grouped awards
end
A bit fiddly! Maybe there are ways to optimize this code?
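On the closing question, the same grouping reads more easily if the [type, level] key is destructured in the block parameters. The sketch below uses a Struct in place of Award records so it runs without a database; the sample data is hypothetical. With real records, `@user.awards` would be passed in instead.

```ruby
# Stand-in for Award records (no database needed).
Award = Struct.new(:type, :level, :image)

# Group awards by [type, level] and return one summary hash per group,
# sorted by type then level, with the group's first image and its count.
def award_summary(awards)
  groups = awards.group_by { |award| [award.type, award.level] }
  groups.sort_by { |(type, level), _| [type, level] }.map do |(type, level), group|
    {
      :label => "#{type.capitalize} - Level #{level}",
      :image => group.first.image,
      :count => group.size
    }
  end
end
```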
Depending on the database you're using, you can build a custom SQL query using GROUP BY on type and level:
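SELECT awards.type, awards.level, MIN(awards.image) AS image, COUNT(*) AS awards_count FROM awards WHERE awards.user_id = ? GROUP BY awards.type, awards.level
(MIN(image) simply picks one image per group, which is enough if every award of a given type and level shares the same image. Databases differ in how strictly they enforce GROUP BY rules. PostgreSQL rejects selected columns that are neither aggregated nor listed in the GROUP BY, while MySQL has historically allowed them, so check the documentation of the database you're using.)
To write it in Rails, read the documentation: http://guides.rubyonrails.org/active_record_querying.html#group
The count comes from COUNT(*) in the same query. If you group in Ruby instead, use the size method on each grouped Array of ActiveRecord objects.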
