Rails Migration Prepend Column? - ruby-on-rails

I'm trying to figure out how to run a migration that prepends a string to the beginning of a column. In this specific case I have a column called url that currently stores everything after the domain (e.g. /test.html). However I now want to prepend a single string http://google.com to the beginning of the url. In the case of this example, the resulting string value of url for this entry would be http://google.com/test.html.
How can I accomplish this with a migration?

I'm not sure this really qualifies as something you should put into a migration; generally, migrations change the structure of your database, rather than change the format of the data inside of it.
The easiest and quickest way to do this would not be to futz around in your database at all, and instead just make the url method of that model return something like "http://google.com#{read_attribute(:url)}". If you really want to change the data in your database, I'd make a rake task to do it, something like:
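namespace :data do
  # :environment loads the Rails app so the model is available
  task :add_domain => :environment do
    # find_each loads records in batches; an ActiveRecord class has no .each
    Model.find_each do |model|
      model.url = "http://google.com#{model.url}" if model.url !~ /google\.com/
      model.save if model.changed?
    end
  end
end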
If this must be a migration for you, then your migration's up would look very similar to the internals of that rake task. (Or it would call that rake task directly.)
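The guard in that task is what makes it safe to run twice. As a minimal sketch outside Rails (the helper name prepend_domain and the constant are hypothetical, not from the answer), the same check can be anchored to the start of the string instead of matching anywhere in it:

```ruby
DOMAIN = "http://google.com"

# Hypothetical helper: returns the url with the domain in front,
# leaving values that already start with the domain untouched.
def prepend_domain(url, domain = DOMAIN)
  return url if url.to_s.start_with?(domain)
  "#{domain}#{url}"
end

puts prepend_domain("/test.html")
puts prepend_domain("http://google.com/test.html")
```

Both calls print http://google.com/test.html, so applying the helper repeatedly never stacks the prefix.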

You could use a migration or a rake task to do this.
If you want to run it as a migration,
def up
  # '||' concatenates strings in Postgres. Use the function provided by your database.
  execute("update TABLE set url = 'http://google.com' || url")
end
def down
  # This is a little tricky. I would advise leaving this empty.
end
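If a reversible migration is wanted anyway, the down step has to remove the prefix only from values that carry it. A minimal Ruby sketch of that per-value logic follows; the helper name strip_domain is hypothetical, and a real down would more likely do this in SQL with the string functions of your database:

```ruby
URL_PREFIX = "http://google.com"

# Hypothetical helper: undo the prepend for a single value.
# String#delete_prefix (Ruby 2.5+) returns the string unchanged
# when it does not start with the prefix.
def strip_domain(url, prefix = URL_PREFIX)
  url.delete_prefix(prefix)
end

puts strip_domain("http://google.com/test.html")
puts strip_domain("/other.html")
```

Values that already started with the domain before the migration ran cannot be told apart afterwards, which is one reason leaving down empty is a defensible choice.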

Related

How to import CSV into database on Rails whilst skipping callbacks

I am currently trying to import over 40 CSVs exported from sqlite3 into an Oracle db, but I have issues importing some of the CSVs into the corresponding tables.
The code line with:
class_name.create!(row.to_hash)
produces errors on some classes because the callbacks are also triggered when the .create!() method is called:
def import_csv_into_db
  Dir.foreach(Rails.root.join('db', 'csv_export')) do |filename|
    next if filename == '.' || filename == '..' || filename == 'extract_db_into_csv.sh' || filename == 'import_csv.rb'
    filename_renamed = File.basename(filename, File.extname(filename)).chomp('s').titleize.gsub(/\s+/, "")
    CSV.foreach(Rails.root.join('db', 'csv_export', filename), headers: true) do |row|
      class_name = Object.const_get(filename_renamed)
      puts class_name
      class_name.create!(row.to_hash)
      puts "Insert on table #{class_name}s complete with: #{row.to_hash}"
    end
  end
end
The issue at hand is that my CSV import function is in seeds.rb, so whenever I run bundle exec rake db:seed the CSVs are imported.
How exactly can I avoid the callbacks being triggered when class_name.create!(row.to_hash) is triggered within the function in the seeds.rb ?
In my customer.rb I have callbacks such as:
after_create :add_default_user or after_create :add_build_config
I'd like to manipulate my function within the seeds.rb to skip the callbacks when the function tries importing a CSV file like customers.csv (which would logically call Customer.create!(row.to_hash)).
There are lower level methods which will not run callbacks. For example, instead of create! you can call insert!. Instead of destroy you can call delete.
Side note: use insert_all! to bulk insert multiple rows at once. Your import will be much faster, and it skips validations as well as callbacks. Though I would recommend the more flexible activerecord-import gem instead.
However, skipping callbacks might cause problems if they are necessary for the integrity of the data. If you delete instead of destroy, associated data may not be deleted, or you may get errors because of referential integrity. Be sure to add ON DELETE actions to your foreign keys to avoid this. Then the database itself will take care of it.
Consider whether your db:seeds is doing too much. If importing this CSV is a hindrance to seeding the database, consider if it should be a separate rake task instead.
Consider whether your callbacks can be rewritten to be idempotent, that is to be able to run multiple times. For example, after_create :add_default_user should recognize there already is a default user and not try to re-add it.
Finally, consider whether callbacks which are run every time a model is created are the correct place to do this work.
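As a rough sketch of the bulk approach mentioned above, the CSV rows can be turned into plain hashes and handed over in slices. The helper names, the sample columns and the batch size are all made up for illustration; the insert itself is left as a comment because it needs the Rails app:

```ruby
require "csv"

# Turn CSV text into an array of plain hashes, one per row.
def rows_from_csv(text)
  CSV.parse(text, headers: true).map(&:to_h)
end

# Yield the rows in slices so each slice can go to one bulk insert.
def each_batch(rows, size)
  rows.each_slice(size) { |batch| yield batch }
end

sample = "code,name\nACME,Acme Ltd\nBETA,Beta GmbH\nGAMA,Gamma SA\n"
rows = rows_from_csv(sample)

each_batch(rows, 2) do |batch|
  # In the Rails app this is where Customer.insert_all!(batch) would go
  # (Rails 6+); here the batch is only printed.
  puts batch.inspect
end
```

Because no model callbacks run on this path, anything that after_create used to set up (a default user, a build config) has to be created some other way.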

How to make rake task fall over if data is equal to something

I'm currently looking for a way to make my whole rake task finish if the column in the database starts with www.google.co.uk/
I've tried doing this
if ('row[14] LIKE ?', "http://www.google.co.uk/")
else
everything else in the rake task
Also, if the link is
http://www.google.co.uk/hgfadasdyfyasgfyregyfhsagtyfgtfae/fsafsasfs/sfaf/asfa/dd
Would this return true?
Thanks
Sam
Using the "LIKE" clause requires that you have wildcards in the test string, in the form of a "%" sign. Eg
where(['row[14] LIKE ?', "http://www.google.co.uk%"])
In your case it's hard to advise exactly what the code should be, since there's no context or any other information about what you are actually doing. You seem to be trying to use an sql query fragment inside an "if" test which doesn't really make sense: you would do an if test on an object.
If you want to do a database search then use the syntax above. If you have already loaded the record into a Rails object then you could test the string with a regex, eg
if url =~ /^http\:\/\/www\.google\.co\.uk/
BTW, this is the sort of thing that is super easy to google for, and it's better to do that before coming to stack overflow with a very simple question like this.
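For a value that is already loaded into Ruby, String#start_with? is a plain alternative to the regex. A small sketch, with a hypothetical method name, that also answers the question about the longer link:

```ruby
GOOGLE_UK = "http://www.google.co.uk/"

# True when the value begins with the prefix, whatever follows it.
def google_uk?(url)
  url.to_s.start_with?(GOOGLE_UK)
end

long_link = "http://www.google.co.uk/hgfadasdyfyasgfyregyfhsagtyfgtfae/fsafsasfs/sfaf/asfa/dd"

puts google_uk?(long_link)
puts google_uk?("http://example.com/?q=http://www.google.co.uk/")
```

The first call prints true and the second prints false, since only the start of the string is checked.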

Seeds file - adding id column to existing data

My seeds file populated the countries table with a list of countries. But now it needs to be changed to hard-code the id (instead of rails generating the id column for me).
I added the id column and values as per below:
zmb: {id: 103,code: 'ZMB', name: Country.human_attribute_name(:zambia, default: 'Error!'), display_order: nil, create_user: user, update_user: user, eff_date: Time.now, exp_date: default_exp_date},
skn: {id: 104,code: 'SKN', name: Country.human_attribute_name(:st_kitts_and_nevis, default: 'Error!'), display_order: nil, create_user: user, update_user: user, eff_date: Time.now, exp_date: default_exp_date}
countries.each { |key, value| countries_for_later[key] = Country.find_or_initialize_by(id: value[:id]); countries_for_later[key].assign_attributes(value); countries_for_later[key].save!; }
The above is just a snippet. I have added an id: for every country.
But when I run db:seed I get the following error:
ActiveRecord::RecordInvalid: Validation failed: Code has already been taken
I am new to rails so I'm not sure what is causing this - is it because the ID column already exists in the database?
What I think is happening is you have existing data in your database ... let's say
[{id:1 , code: 'ABC'},
{id:2 , code: 'DEF'}]
Now you run your seed file which has {id: 3, code: 'DEF'} for example.
Because you are using find_or_initialize_by with id, no existing record is found for id 3, so a new record is initialized with a code that is already taken, and the uniqueness validation fails.
I reckon you should just clear your data, but you can try doing find_or_initialize_by using code instead of id. That way you won't ever have the problem of trying to create a duplicate country code.
Country.find_or_initialize_by(code: value[:code])
I think you might run into problems with your ids, but you will have to test that. It's generally bad practice to do what you are doing. Whether the ids change or not should be irrelevant. Your seed file should reference the objects that are being created, not ids.
Also make sure you aren't using any default_scopes ... this would affect how find_or_initialize_by works.
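To make the collision concrete, here is an in-memory stand-in for the countries table. It uses plain hashes rather than ActiveRecord, and the data is the made-up example from this answer:

```ruby
# In-memory stand-in for the countries table (not ActiveRecord).
existing = [
  { id: 1, code: "ABC" },
  { id: 2, code: "DEF" }
]
seed = { id: 3, code: "DEF" }

# Lookup by id finds nothing, so a new row would be built,
# and its code collides with the row that has id 2.
by_id = existing.find { |row| row[:id] == seed[:id] }
code_taken = existing.any? { |row| row[:code] == seed[:code] }
puts "by id: #{by_id.inspect}, code already taken: #{code_taken}"

# Lookup by code finds the existing row, so it would be updated instead.
by_code = existing.find { |row| row[:code] == seed[:code] }
puts "by code: #{by_code.inspect}"
```

The lookup by code returns the existing row, which is why keying the seed on code avoids the validation error.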
The error is about Code: Code has already been taken. You have a validation which says Code should be unique. You can delete all Countries and load the seeds again.
Run this in the rails console:
Country.delete_all
Then re-run the seed:
rake db:seed
Yes, it is due to a duplicate entry. In that case run ModelName.delete_all in your rails console and then run rake db:seed again from the project directory. Hope this works.
ActiveRecord::RecordInvalid: Validation failed: Code has already been taken
is the default error message for the uniqueness validator for :code.
Running rake db:reset will definitely clear and reseed your database. Not sure about the hardcoded ids though.
Check this : Overriding id on create in ActiveRecord
You will have to skip validations with
save(validate: false)
or, on Rails 3.x, bypass mass-assignment protection with
Country.create(attributes_for_country, without_protection: true)
I haven't tested this though, so be careful with your validators.
Add the line
countries_for_later[key].id = value[:id]
The problem is that you can't set :id => value[:id] through Country.new because id is a special attribute and is automatically protected from mass-assignment.
So it will be:
countries.each { |key, value|
  countries_for_later[key] = Country.find_or_initialize_by(id: value[:id])
  countries_for_later[key].assign_attributes(value)
  countries_for_later[key].id = value[:id] if countries_for_later[key].new_record?
  # save(false) is the Rails 2 spelling; later versions take validate: false
  countries_for_later[key].save(validate: false)
}
The ids data that you are using in your seeds file: does that have any meaning outside of Rails? Eg
zmb: {id: 103,code: 'ZMB',
Is this some external data for Zambia, where 103 is its ID in some internationally recognised table of country codes? (In my countries database, Zambia's "numcode" value is 894.) If it is, then you should rename it to something else, and let Rails decide what the id field should be.
Generally, mucking about with the value of ID in rails is going to be a pain in the ass for you. I'd recommend not doing it. If you need to do tests on data, then use some other unique field (like 'code') to test whether associations etc have been set up, or whatever you want to do, and let Rails worry about what value to use for ID.

How to write Rake task to import data to Rails app?

Goal: Using a CRON task (or other scheduled event) to update database with nightly export of data from an existing system.
All data is created/updated/deleted in an existing system. The website does not directly integrate with this system, so the rails app simply needs to reflect the updates that appear in the data export.
I have a .txt file of ~5,000 products that looks like this:
"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
"A134":"another product":"attr 1":"attr 2":"Foobar World":"2447"
...
All values are strings enclosed in double quotes (") that are separated by colons (:)
Fields are:
id: unique id; alphanumeric
name: product name; any character
attribute columns: strings; any character (e.g., size, weight, color, dimension)
vendor_name: string; any character
vendor_id: unique vendor id; numeric
Vendor information is not normalized in the current system.
What are best practices here? Is it okay to delete the products and vendors tables and rewrite with the new data on every cycle? Or is it better to only add new rows and update existing ones?
Notes:
This data will be used to generate Orders that will persist through nightly database imports. OrderItems will need to be connected to the product ids that are specified in the data file, so we can't rely on an auto-incrementing primary key to be the same for each import; the unique alphanumeric id will need to be used to join products to order_items.
Ideally, I'd like the importer to normalize the Vendor data
I cannot use vanilla SQL statements, so I imagine I'll need to write a rake task in order to use Product.create(...) and Vendor.create(...) style syntax.
This will be implemented on EngineYard
I wouldn't delete the products and vendors tables on every cycle. Is this a rails app? If so there are some really nice ActiveRecord helpers that would come in handy for you.
If you have a Product active record model, you can do:
p = Product.find_or_initialize_by_identifier(<id you get from file>)
p.name = <name from file>
p.size = <size from file>
etc...
p.save!
The find_or_initialize call will look up the product in the database by the id you specify, and if it can't find it, it will initialize a new one. The really handy thing about doing it this way is that ActiveRecord will only save to the database if any of the data has changed, and it will automatically update any timestamp fields you have in the table (updated_at) accordingly. One more thing: since you would be looking up records by the identifier (id from the file), I would make sure to add an index on that field in the database.
To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.
Inside data.rake, it would look something like this:
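namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    # File.foreach closes the file when the block finishes
    File.foreach(<file to import>) do |line|
      # Drop the newline and the surrounding double quotes from each field
      attrs = line.chomp.split(":").map { |field| field.delete('"') }
      p = Product.find_or_initialize_by_identifier(attrs[0])
      p.name = attrs[1]
      etc...
      p.save!
    end
  end
end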
Then to call the rake task, use "rake data:import" from the command line.
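Since the question says fields may contain any character, a plain split on ":" can break on a colon inside a value. A minimal sketch using Ruby's CSV library with ":" as the column separator handles the quoting; the field names :attr_1 and :attr_2 and the helper name are placeholders, not taken from the question:

```ruby
require "csv"

# Field names follow the question; :attr_1 and :attr_2 are placeholder
# names for the attribute columns.
FIELDS = %i[id name attr_1 attr_2 vendor_name vendor_id].freeze

# Parse one line of the export into a hash keyed by field name.
def parse_product_line(line)
  values = CSV.parse_line(line, col_sep: ":")
  FIELDS.zip(values).to_h
end

line = '"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"'
product = parse_product_line(line)
puts product[:id]
puts product[:vendor_name]
```

The resulting hash can be fed to the find-or-initialize step, and the vendor_name and vendor_id keys are what a separate vendors table would be built from.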
Since Products don't really change that often, the best way I would see is to update only the records that change.
Get all the deltas
Mass update using a single SQL statement
If you have your normalization code in the models, you could use Product.create and Vendor.create; otherwise it would just be overkill. Also, look into inserting multiple records in a single SQL transaction; it's much faster.
Create an importer rake task that is cronned
Parse the file line by line using FasterCSV or via vanilla ruby like:
file.each do |line|
  products_array = line.split(":")
end
Split each line on the ":" and push it into a hash
Use a find_or_initialize to populate your db such as:
Product.find_or_initialize_by_name_and_vendor_id("foo", 111)
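The "get all the deltas" step can be sketched in plain Ruby by comparing what is stored with what arrived, both keyed by the product's alphanumeric id. The method name and the sample data below are made up for illustration:

```ruby
# Compare the rows already stored with the rows in tonight's file.
# Both arguments are hashes keyed by the product's alphanumeric id.
def deltas(stored, incoming)
  {
    create: incoming.keys - stored.keys,
    delete: stored.keys - incoming.keys,
    update: (incoming.keys & stored.keys).reject { |id| stored[id] == incoming[id] }
  }
end

stored   = { "1234" => { name: "product name" }, "A134" => { name: "another product" } }
incoming = { "1234" => { name: "product name v2" }, "B777" => { name: "new product" } }

changes = deltas(stored, incoming)
puts changes.inspect
```

Here B777 would be created, 1234 updated and A134 flagged for deletion. Since orders persist across imports, a flagged product may be better marked inactive than removed.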

Writing an ActiveRecord plugin for Rails

I'm writing my first rails plugin and could use a little help. In a very simplified way, I'd like to allow the developer to specify a value which I can count through a rake task. I'm thinking of something like this...
class User < ActiveRecord::Base
monitor "Users", count
monitor "Active Users", count("activated_at != NULL")
end
1. I guess monitor needs to be a class method of ActiveRecord::Base but how/where do I specify it in my plugin?
2. The argument to the monitor function shouldn't be the value but a block of code to execute. I'm not quite sure of the best way to specify this and keep the syntax simple. Perhaps it'll have to be monitor "Active Users", {count "activated_at != NULL"}?
3. I'd prefer if the developer didn't have to specify User.count, just count, i.e. it would pick up the Class automatically (and the blocks will be called on the class not the instance). If this isn't possible, I guess there's no reason to put the monitor statements into the model (see #5).
4. The actual counting of these values (i.e., execution of the blocks) will be done by a rake task offline. What should the monitor function do to make these blocks available to the rake task? Store them in a class variable?
5. Perhaps the monitor statements don't need to be specified in the model at all. Maybe it clutters it up so I'd welcome any alternative places to put them.
I'm just sketching out my ideas at the moment and trying to figure out what is/isn't possible in Ruby. Any help appreciated.
Update: I'll try to be clearer on the plugin's purpose. I want the developer to be able to define metrics which should be monitored by the rake task. The rake task will iterate over those metrics and write the values to a file (I've simplified this a bit). The rake task will be very simple, something like rake monitors:update (i.e., no params required)
You are probably putting the definition of the rake tasks in the wrong place. The model should only contain logic that is valid for any of its consumers, and not concern itself with specific applications like rake.
A better approach may be to define some named scopes in your models, and specify the actions you wish to be available in your rake tasks. The named scopes can be reused easily in other areas of your application. A model may look like this (note that this is a Rails feature -- no work required on your part):
class User < ActiveRecord::Base
  # SQL compares against NULL with IS NOT NULL; "!= NULL" never matches
  named_scope :active_users, :conditions => "activated_at IS NOT NULL"
end
And then you would create a very simple DSL that can be used within rake files (e.g. in lib/tasks/count.rake). Something that will allow you to do this, for example:
require "your-plugin"
namespace :count do
  # Make your plugin rewrite this internally to User.count
  YourPlugin::CountTask.new :users
  # Make your plugin rewrite this to User.active_users.count
  YourPlugin::CountTask.new :users, :active_users
  # Perhaps allow usage of blocks as well?
  YourPlugin::CountTask.new :users, :complicated do
    User.count(complex_conditions)
  end
end
This should then provide the user with tasks named count:users, count:users:active_users and count:users:complicated.
Try looking at the code for named_scope
What's the design for the rake task looking like?
rake monitor:user:active_users ?
OT:
activated_at IS NOT NULL is the SQL that you want
Come to think of it, why not forget defining monitor and just use named_scopes, where instead of returning a select * you do a select count(*)?
Something like this should do what you want:
module Monitored
  @@monitors = []
  def self.monitor(name, method)
    @@monitors.push [name, method]
  end
  def self.run_monitor(name)
    send @@monitors.select{|m| m[0] == name}[0][1]
  end
end
Untested, but you get the idea, I hope.
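One way to get the block syntax asked about in the question is to store each block together with the class that declared it, and run it later with instance_exec so that a bare count resolves against that class. This is a sketch under assumptions: the registry constant and run_all are hypothetical names, and the User class here is a plain-Ruby stand-in rather than an ActiveRecord model:

```ruby
# Registry plus a class-level `monitor` macro. Blocks are stored, not run,
# until a task asks for them.
module Monitored
  REGISTRY = []

  # Becomes a class method on anything that extends this module.
  def monitor(name, &block)
    REGISTRY << [name, self, block]
  end

  # Run every stored block with its declaring class as self.
  def self.run_all
    REGISTRY.map { |name, klass, block| [name, klass.instance_exec(&block)] }
  end
end

# Stand-in for a model; a Rails plugin would extend ActiveRecord::Base instead.
class User
  extend Monitored

  def self.count
    2
  end

  monitor("Users") { count }
end

Monitored.run_all.each { |name, value| puts "#{name} = #{value}" }
```

A rake task would only need to load the models and call Monitored.run_all; nothing is counted at class-definition time.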
Thanks for all your help, however I went with a different approach (extracted below).
Instead of specifying the attributes in the models, I used an approach seen in the whenever gem. I placed a ruby file "dashboard.rb" in my config directory:
dashboard "Users", User.count
dashboard "Activated Users", User.count('activated_at')
My lib consists of two functions:
def self.dashboard(name, attribute)
  puts "** dailydashboard: #{name} = #{attribute.to_s}"
end
def self.update(file)
  eval File.read(file)
end
Basically, my rake task calls update, which loads dashboard.rb and evaluates it and repeatedly calls the dashboard function, which outputs this:
** dailydashboard: Users = 2
** dailydashboard: Activated Users = 1
Sorry for going around the houses a little bit. For background/offline things this seems like a very simple approach and does what I need. Thanks for your help though!
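A self-contained sketch of the same config-file idea, using instance_eval on a string in place of reading config/dashboard.rb, and literal numbers in place of the User.count calls. The Dashboard class and load_source are hypothetical names, not the poster's actual lib:

```ruby
# Collects name/value pairs declared in a small config DSL.
class Dashboard
  attr_reader :entries

  def initialize
    @entries = []
  end

  # Called once per line of the config.
  def dashboard(name, value)
    @entries << [name, value]
  end

  # Evaluate DSL source with this object as self, so `dashboard` resolves here.
  def load_source(source)
    instance_eval(source)
    self
  end
end

config = <<~RUBY
  dashboard "Users", 2
  dashboard "Activated Users", 1
RUBY

Dashboard.new.load_source(config).entries.each do |name, value|
  puts "** dailydashboard: #{name} = #{value}"
end
```

As in the original approach, each value is computed the moment its line is evaluated, so the file must only be loaded once the models are available.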

Resources