PostgreSQL nextval generating existing values - ruby-on-rails

I had to migrate from a MySQL-based Ruby on Rails app to PostgreSQL. No problems but one so far, and I don't know how to solve it.
The migration of data brought ids along with it, and PostgreSQL is now having problems with existing ids: it's not clear to me where it gets the value it uses as the base for nextval. It certainly isn't the highest value in the column, although you might think that would be a good idea. In any case, it's now colliding with existing id values. The id column, created from a standard RoR migration, is defined as
not null default nextval('geopoints_id_seq'::regclass)
Is there some place where the value it uses as a base can be adjusted? This problem could now arise in any of 20 or so tables. I could use
select max(id) from <table_name>
but that seems to make the idea of an autoincrement column pointless.
How is this best handled?

There is a reset_pk_sequence! method on the Postgres adapter. You can call it and it will set the sequence to max(id) + 1, which is probably what you want.
In some projects I get data ETL'ed in often enough to warrant a rake task to do this for all models, or for a specified model. Here's the task - include it in some Rakefile or in its own file under lib/tasks:
desc "Reset all sequences. Run after data imports"
task :reset_sequences, :model_class, :needs => :environment do |t, args|
  if args[:model_class]
    classes = Array(eval(args[:model_class]))
  else
    puts "using all defined active_record models"
    classes = []
    Dir.glob(RAILS_ROOT + '/app/models/**/*.rb').each { |file| require file }
    Object.subclasses_of(ActiveRecord::Base).select { |c| c.base_class == c }.sort_by(&:name).each do |klass|
      classes << klass
    end
  end
  classes.each do |klass|
    next if klass == CGI::Session::ActiveRecordStore::Session && ActionController::Base.session_store.to_s !~ /ActiveRecordStore/
    puts "resetting sequence on #{klass.table_name}"
    ActiveRecord::Base.connection.reset_pk_sequence!(klass.table_name)
  end
end
Now you can run this either for all models (defined under RAILS_ROOT/app/models) using rake reset_sequences, or for a specific model by passing in a class name.

The Rails 3 version looks like this:
namespace :db do
  desc "Reset all sequences. Run after data imports"
  task :reset_sequences, :model_class, :needs => :environment do |t, args|
    if args[:model_class]
      classes = Array(eval(args[:model_class]))
    else
      puts "using all defined active_record models"
      classes = []
      Dir.glob(RAILS_ROOT + '/app/models/**/*.rb').each { |file| require file }
      ActiveRecord::Base.subclasses.select { |c| c.base_class == c }.sort_by(&:name).each do |klass|
        classes << klass
      end
    end
    classes.each do |klass|
      puts "resetting sequence on #{klass.table_name}"
      ActiveRecord::Base.connection.reset_pk_sequence!(klass.table_name)
    end
  end
end
https://gist.github.com/909032

With that definition, the column will get the next value from the geopoints_id_seq sequence.
That sequence only tracks the values it has handed out; it knows nothing about ids you inserted directly. If you're migrating data, you have to create or update that sequence so its starting point is larger than the current max id in your table.
You should be able to set its new value with e.g.
ALTER SEQUENCE geopoints_id_seq RESTART WITH 1692;
where 1692 is one more than whatever select max(id) from geopoints; yields.

PG uses sequences.
Set the sequence's current value to the highest value in your table (nextval will then return one higher); 999999999 below is a placeholder for your actual max id:
SELECT setval('geopoints_id_seq', 999999999, true);
Also see these
http://www.postgresql.org/docs/8.4/interactive/datatype-numeric.html#DATATYPE-SERIAL
http://www.postgresql.org/docs/8.4/interactive/functions-sequence.html

Use setval() to set the starting value for the sequence.

Related

Rails: Code optimization/restructuring requested

I have the following code snippet that works perfectly and as intended:
# Prepares the object design categories and connects them via bit mapping with the objects.design_category_flag
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table in one transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for object design category
    ObjectDesignCategory.where('0 = (@rownum := 0)').update_all('bit_flag_position = 1 << (@rownum := 1 + @rownum)')
    # Resets the object design category flag
    Object.update_all(design_category_flag: 0)
    # Sets the new object design category bit flag
    object_group_relation = Object.joins(:object_design_categories)
                                  .select('BIT_OR(bit_flag_position) AS flag, objects.id AS object_id')
                                  .group(:id)
    join_str = "JOIN (#{object_group_relation.to_sql}) sub ON sub.object_id = objects.id"
    Object.joins(join_str).update_all('design_category_flag = sub.flag')
  end
end
But in my opinion it is quite difficult to read. So I tried to rewrite this code without raw SQL. What I created was this:
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table via a transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for the object color group
    ObjectColorGroup.find_each.with_index do |group, index|
      group.update(bit_flag_position: 1 << index)
    end
    # Resets the object color group flag
    Object.update_all(color_group_flag: 0)
    # Sets the new object color group bit flag
    Object.find_each do |object|
      object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
    end
  end
end
This also works fine, but when I run a benchmark for about 2000+ records, the second option is about a factor of 65 slower than the first. So my question is:
Does anyone have an idea how to redesign this code so that it doesn't require raw SQL and is still fast?
I can see three sources of slowdown:
1. The N+1 query problem
2. Instantiating ActiveRecord objects
3. The number of calls to the DB
This code has the N+1 problem, and I think it may be the major cause of the slowdown:
Object.find_each do |object|
  object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
end
Change to
Object.includes(:object_color_groups).find_each do |object|
  ...
end
You can also use the Object.update class method on this code (see below).
I don't think you can get around #2 without using raw SQL, but you will need many objects (10K or 100K or more) to see a big difference.
To limit the calls to the DB, you can use the Object.update class method to update many records at once:
ObjectColorGroup.find_each.with_index do |group, index|
  group.update(bit_flag_position: 1 << index)
end
to
color_groups = ObjectColorGroup.find_each.with_index.map do |group, index|
  [group.id, { bit_flag_position: 1 << index }]
end.to_h
ObjectColorGroup.update(color_groups.keys, color_groups.values)
The following is a single query, so no need to change.
Object.update_all(color_group_flag: 0)
Reference:
ActiveRecord#update class method API
ActiveRecord#update class method blog post
Rails Eager Loading

Check if a value exists in a database and return the value if it does Rails

I am running Rails here with a User model that contains an :email_address column
I have an array - emails_to_check[email1,email2,email3] that i want to check if they exist in the User database.
I only want to return the values that DO exist in the database
Here's a simple one-liner for you. There may be more performant ways, but this is maybe the most straightforward and idiomatic Rails.
emails_to_check = ['email1', 'email2', 'email3']
User.where(email_address: emails_to_check).pluck(:email_address)
Here is the resulting SQL query:
SELECT `users`.`email_address` FROM `users` WHERE `users`.`email_address` IN ('email1', 'email2', 'email3');
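For intuition, the IN-query above is just a set intersection. A plain-Ruby sketch with a hypothetical in-memory stand-in for the users table:

```ruby
# Hypothetical addresses standing in for the users table
existing_emails = ['a@example.com', 'b@example.com', 'c@example.com']
emails_to_check = ['b@example.com', 'z@example.com']

# Array#& returns the elements present in both arrays,
# mirroring what WHERE ... IN + pluck returns from the DB
found = existing_emails & emails_to_check
p found  # ["b@example.com"]
```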
So I solved this using a rake task:
task :find_users_in_array, [:emails_to_find] => :environment do |task, args|
  emails = args[:emails_to_find].split
  emails.each do |email|
    if User.find_by(email: email)
      puts email
    end
  end
end
I can pass in a list using rake "find_users_in_array[email1 email2 email3]"

Running a 1 time task to enter values into database in Ruby

I have now added a new column to a table in my database. I want to add some values to some rows in this new column. I know the logic, but I don't know how to write a one-time task to do this in Ruby on Rails. Can anyone help me? I just need some idea.
data = Model.where(#your_condition)
If the value is the same for all rows:
data.update_all(:new_column => "new value")
If the value is different per row:
data.each do |d|
  d.update_attributes(:new_column => "some value")
end
You can create a rake task for this, and run it once.
Create a file lib/tasks/my_namespace.rake
namespace :my_namespace do
  desc "My Task description"
  task :my_task => :environment do
    # Code to make db change goes here
  end
end
You can invoke the task from the command line in project root folder like
rake my_namespace:my_task RAILS_ENV=production

convert string to model in Rails rake task

I'm trying to run a quick rake task on all my Rails models but haven't been able to call them because this piece of code tells me that I can't call the method columns on a string.
I tried classify instead of camelize and it hasn't worked either; I tried inserting a class_eval in there as well, but that doesn't seem to work here / I don't know too much about it.
task :collect_models_and_field_names => :environment do
  models = Dir.glob("#{models_path}/*").map do |m|
    m.capitalize.camelize.columns.each { |n| puts n.name }
  end
end
I do know that this worked so I would have manual access to the model if I needed, but I don't really want to do that...
Model.columns.each { |c| puts c.name }
Try
Kernel.const_get(m.classify).columns
classify just changes the string to look like a class name, i.e. capitalized, in CamelCase, and singular.
After using classify to make the string look like a class/model name, you need to use constantize, which actually takes the string and converts it into a class.
See:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-constantize
You can use something like this:
models = Dir[Rails.root.join("app", "models", "*.rb")].map do |m|
  model = File.basename(m, ".rb").classify.constantize
  model.columns.each { |n| puts n.name }
end
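Outside Rails, the two steps can be approximated in plain Ruby; this sketch skips the singularization that classify also performs, and UserProfile is a hypothetical stand-in for a model class:

```ruby
# Stand-in for a real model class
class UserProfile; end

name = "user_profile"
# classify-like step: snake_case -> CamelCase (ignoring pluralization)
class_name = name.split('_').map(&:capitalize).join
# constantize-like step: look the constant up by name
klass = Object.const_get(class_name)

p class_name  # "UserProfile"
p klass       # UserProfile
```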

Ruby on Rails - Import Data from a CSV file

I would like to import data from a CSV file into an existing database table. I do not want to save the CSV file, just take the data from it and put it into the existing table. I am using Ruby 1.9.2 and Rails 3.
This is my table:
create_table "mouldings", :force => true do |t|
  t.string "suppliers_code"
  t.datetime "created_at"
  t.datetime "updated_at"
  t.string "name"
  t.integer "supplier_id"
  t.decimal "length", :precision => 3, :scale => 2
  t.decimal "cost", :precision => 4, :scale => 2
  t.integer "width"
  t.integer "depth"
end
Can you give me some code to show me the best way to do this, thanks.
require 'csv'

csv_text = File.read('...')
csv = CSV.parse(csv_text, :headers => true)
csv.each do |row|
  Moulding.create!(row.to_hash)
end
A simpler version of yfeldblum's answer that also works well with large files:
require 'csv'

CSV.foreach(filename, headers: true) do |row|
  Moulding.create!(row.to_hash)
end
No need for with_indifferent_access or symbolize_keys, and no need to read the file into a string first.
It doesn't keep the whole file in memory at once; it reads line by line and creates a Moulding per line.
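The line-by-line behavior can be seen with the standard library alone. This sketch writes a small hypothetical CSV to a temp file and streams it back:

```ruby
require 'csv'
require 'tempfile'

# Hypothetical CSV covering two of the mouldings columns
file = Tempfile.new(['mouldings', '.csv'])
file.write("suppliers_code,name\nS1,Oak\nS2,Pine\n")
file.close

rows = []
CSV.foreach(file.path, headers: true) do |row|
  rows << row.to_h  # each row arrives as one header-keyed hash
end

p rows.first  # {"suppliers_code"=>"S1", "name"=>"Oak"}
```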
The smarter_csv gem was created specifically for this use case: reading data from a CSV file and quickly creating database entries.
require 'smarter_csv'

options = {}
SmarterCSV.process('input_file.csv', options) do |chunk|
  chunk.each do |data_hash|
    Moulding.create!(data_hash)
  end
end
You can use the option chunk_size to read N CSV rows at a time, and then use Resque in the inner loop to generate jobs which will create the new records rather than creating them right away; this way you can spread the load of generating entries across multiple workers.
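The chunking idea itself is plain Enumerable behavior, sketched here without the gem:

```ruby
# Seven hypothetical parsed rows, chunked three at a time (~ chunk_size: 3)
rows = (1..7).map { |i| { id: i } }

chunks = rows.each_slice(3).to_a
chunks.each { |chunk| p chunk.map { |h| h[:id] } }
# [1, 2, 3]
# [4, 5, 6]
# [7]
```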
See also:
https://github.com/tilo/smarter_csv
You might try Upsert:
require 'upsert' # add this to your Gemfile
require 'csv'

u = Upsert.new Moulding.connection, Moulding.table_name
CSV.foreach(file, headers: true) do |row|
  selector = { name: row['name'] } # this treats "name" as the primary key and prevents the creation of duplicates by name
  setter = row.to_hash
  u.row selector, setter
end
If this is what you want, you might also consider getting rid of the auto-increment primary key from the table and setting the primary key to name. Alternatively, if there is some combination of attributes that forms a primary key, use that as the selector. An index is not necessary; it will just make this faster.
This can help. It has code examples too:
http://csv-mapper.rubyforge.org/
Or for a rake task for doing the same:
http://erikonrails.snowedin.net/?p=212
It is better to wrap the database-related work inside a transaction block. The code snippet below is a full process of seeding a set of languages into a Language model.
require 'csv'

namespace :lan do
  desc 'Seed initial languages data with language & code'
  task init_data: :environment do
    puts '>>> Initializing Languages Data Table'
    ActiveRecord::Base.transaction do
      csv_path = File.expand_path('languages.csv', File.dirname(__FILE__))
      csv_str = File.read(csv_path)
      csv = CSV.new(csv_str).to_a
      csv.each do |lan_set|
        lan_code = lan_set[0]
        lan_str = lan_set[1]
        Language.create!(language: lan_str, code: lan_code)
        print '.'
      end
    end
    puts ''
    puts '>>> Languages Database Table Initialization Completed'
  end
end
The snippet below is part of the languages.csv file:
aa,Afar
ab,Abkhazian
af,Afrikaans
ak,Akan
am,Amharic
ar,Arabic
as,Assamese
ay,Aymara
az,Azerbaijani
ba,Bashkir
...
A better way is to include it in a rake task. Create an import.rake file inside /lib/tasks/ and put this code in that file.
desc "Imports a CSV file into an ActiveRecord table"
task :csv_model_import, [:filename, :model] => [:environment] do |task, args|
  lines = File.new(args[:filename], "r:ISO-8859-1").readlines
  header = lines.shift.strip
  keys = header.split(',')
  lines.each do |line|
    values = line.strip.split(',')
    attributes = Hash[keys.zip(values)]
    Module.const_get(args[:model]).create(attributes)
  end
end
After that, run this command in your terminal: rake csv_model_import[file.csv,Name_of_the_Model]
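The header-zipping step in the task above can be seen in isolation (hypothetical row values):

```ruby
keys   = "suppliers_code,name,width".split(',')
values = "S1,Oak,25".split(',')

# Hash[pairs] builds the attributes hash passed to create;
# values stay strings here, and ActiveRecord casts them on save
attributes = Hash[keys.zip(values)]
p attributes  # {"suppliers_code"=>"S1", "name"=>"Oak", "width"=>"25"}
```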
I know it's an old question, but it's still in the first 10 links on Google.
It is not very efficient to save rows one by one, because that causes a database call per row in the loop; you're better off avoiding that, especially when you need to insert huge amounts of data.
It's better (and significantly faster) to use a batch insert.
INSERT INTO `mouldings` (suppliers_code, name, cost)
VALUES
  ('s1', 'supplier1', 1.111),
  ('s2', 'supplier2', 2.222);
You can build such a query manually and then do Model.connection.execute(RAW SQL STRING) (not recommended),
or use the gem activerecord-import (first released on 11 Aug 2010); in that case just put the data in an array of rows and call Model.import rows.
Refer to the gem docs for details.
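For illustration only, here is how such a multi-row VALUES clause could be assembled in plain Ruby. The naive string quoting below is an assumption for the sketch; real code should use the adapter's quoting or activerecord-import rather than interpolation:

```ruby
# Hypothetical rows to insert; strings are quoted naively for illustration only
rows = [
  ['s1', 'supplier1', 1.111],
  ['s2', 'supplier2', 2.222]
]

values = rows.map do |row|
  rendered = row.map { |v| v.is_a?(String) ? "'#{v}'" : v.to_s }
  "(#{rendered.join(', ')})"
end.join(",\n")

sql = "INSERT INTO mouldings (suppliers_code, name, cost)\nVALUES\n#{values}"
puts sql
```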
Use this gem:
https://rubygems.org/gems/active_record_importer
class Moulding < ActiveRecord::Base
  acts_as_importable
end
Then you can use:
Moulding.import!(file: File.open(PATH_TO_FILE))
Just be sure that your headers match the column names of your table.
The following module can be extended on any model and it will import the data according to the column headers defined in the CSV.
Note:
This is a great internal tool; for customer-facing use I would recommend adding safeguards and sanitization.
The column names in the CSV must match the DB schema exactly or it won't work.
It can be further improved by using the table name to get the headers instead of defining them in the file.
Create a file named "csv_importer.rb" in your models/concerns folder
module CsvImporter
  extend ActiveSupport::Concern
  require 'csv'

  def convert_csv_to_book_attributes(csv_path)
    csv_rows = CSV.open(csv_path).each.to_a.compact
    columns = csv_rows[0].map(&:strip).map(&:to_sym)
    csv_rows.shift
    return columns, csv_rows
  end

  def import_by_csv(csv_path)
    columns, attributes_array = convert_csv_to_book_attributes(csv_path)
    message = ""
    begin
      self.import columns, attributes_array, validate: false
      message = "Import Successful."
    rescue => e
      message = e.message
    end
    return message
  end
end
Add extend CsvImporter to whichever model you would like to extend this functionality to.
In your controller you can have an action like the following to utilize this functionality:
def import_file
  model_name = params[:table_name].singularize.camelize.constantize
  csv = params[:file].path
  @message = model_name.import_by_csv(csv)
end
It's better to use CSV::Table and String#encode(universal_newline: true). It converts CRLF and CR to LF.
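String#encode's newline option can be checked in isolation:

```ruby
mixed = "a,b\r\nc,d\re,f\n"  # CRLF, bare CR, and LF line endings mixed

# universal_newline: true converts both CRLF and CR to LF
normalized = mixed.encode(mixed.encoding, universal_newline: true)
p normalized  # "a,b\nc,d\ne,f\n"
```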
If you want to use SmarterCSV:
all_data = SmarterCSV.process(
  params[:file].tempfile,
  {
    :col_sep => "\t",
    :row_sep => "\n"
  }
)
This represents tab-delimited data in each row ("\t"), with rows separated by newlines ("\n").
