Rails 4.2: Rake Task to Import CSV Issue - ruby-on-rails

I have a rake task that should import a .csv file and save the data to the database. It runs, but when I check the database nothing has been saved, and I see no errors, which leaves me with no direction.
I'm using Rails 4.2 with PostgreSQL for the db.
Here is my task:
namespace :import_users do
  desc "imports user data from a csv file"
  task :data => :environment do
    require 'csv'
    CSV.foreach('/Users/RAILS/Extra Files/2015 User Report.csv') do |row|
      plan = row[0]
      contact_ident = row[1]
      prefer_fax = row[2]
      pin = row[3]
      name = row[4] #account
      first_name = row[5]
      last_name = row[6]
      email = row[7]
      phone = row[8]
      p_fax = (prefer_fax == "Yes")
      p = Plan.where("name ~ ?","#{plan}( )")
      user = User.create( contact_ident: contact_ident,
                          prefer_fax: p_fax,
                          first_name: first_name,
                          last_name: last_name,
                          email: email
                        )
      account = Account.where("pin ~ ?", "#{pin}").first_or_create do |account|
        account.name = name
        account.phone = phone
      end
      user.plans << p
      user.accounts << account
    end
  end
end

Try replacing
User.create
with
User.create!
so that a record that fails to save raises an exception (ActiveRecord::RecordInvalid for a failed validation) instead of failing silently.
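The same "fail loudly" idea can be tried outside Rails. Below is a minimal sketch, not the asker's actual task: row_to_user_attributes is a hypothetical helper, the sample rows are invented, and the rule that an email is required is an assumption standing in for whatever validations the User model really has.

```ruby
require 'csv'

# Hypothetical helper: maps one positional CSV row (column order taken from
# the question) to a hash of User attributes. It raises on bad input instead
# of quietly returning it, which mirrors what User.create! does.
def row_to_user_attributes(row, line_number)
  attrs = {
    contact_ident: row[1],
    prefer_fax: row[2] == 'Yes', # plain boolean, no ternary needed
    first_name: row[5],
    last_name: row[6],
    email: row[7]
  }
  # Assumed rule for illustration only: an email must be present.
  if attrs[:email].to_s.strip.empty?
    raise ArgumentError, "line #{line_number}: email is missing"
  end
  attrs
end

# Invented sample data in the question's column order.
sample = <<~CSV
  Gold,C-1,Yes,1234,Acme,Ann,Lee,ann@example.com,555-0100
  Silver,C-2,No,5678,Globex,Bob,Ray,,555-0101
CSV

imported = []
errors = []
CSV.parse(sample).each_with_index do |row, index|
  begin
    imported << row_to_user_attributes(row, index + 1)
  rescue ArgumentError => e
    errors << e.message # report the failure instead of swallowing it
  end
end

puts "imported: #{imported.size}, errors: #{errors.inspect}"
```

In the rake task, each attribute hash would go to User.create!, and rescuing ActiveRecord::RecordInvalid per row would give the same kind of error list.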

Related

Scraping data in rails using thread

I am scraping a website to load its data into my database in Rails. The script fetches 32,000 records without any issue, but I want it to run faster, so I added threads to my rake task. Now the rake task fetches some of the data and then aborts.
I am not sure what to do next, and I would be grateful for any help. Here is the rake task code for the scraping.
task scratch_to_database: :environment do
time2 = Time.now
puts "Current Time : " + time2.inspect
client = Mechanize.new
giftcard_types=Giftcard.card_types
find_all_merchant=Merchant.all.pluck(:id, :name).to_h
#first index page of the merchant
index_page = client.get('https://www.twitter.com//')
document_page_index = Nokogiri::HTML::Document.parse(index_page.body)
#set all merchant is deleted true
# set_merchant_as_deleted = Merchant.update_all(is_deleted: true) if Merchant.exists?
# set_giftcard_as_deleted = Giftcard.update_all(is_deleted: true) if Giftcard.exists?
update_all_merchant_record = []
update_all_giftcard_record = []
threads = []
#Merchant inner page pagination loop
page_no_merchant = document_page_index.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
1.upto(page_no_merchant) do |page_number|
threads << Thread.new do
client.get("https://www.twitter.com/buy-gift-cards?page=#{page_number}") do |page|
document = Nokogiri::HTML::Document.parse(page.body)
#Generate the name of the merchant and image of the merchant loop
document.css('.product-source').each do |item|
merchant_name= item.children.css('.name').text.gsub("Gift Cards", "")
href = item.css('a').first.attr('href')
image_url=item.children.css('.img img').attr('data-src').text.strip
#image url to parse the url of the image
image_url=URI.parse(image_url)
#saving the record of the merchant
# @merchant=Merchant.create(name: merchant_name , image_url:image_url)
if find_all_merchant.has_value?(merchant_name)
puts "this if"
merchant_id=find_all_merchant.key(merchant_name)
puts merchant_id
else
@merchant= Merchant.create(name: merchant_name , image_url:image_url)
update_all_merchant_record << @merchant.id
merchant_id=@merchant.id
end
# @merchant.update_attribute(:is_deleted, false)
#set all giftcard is deleted true
# set_giftcard_as_deleted = Giftcard.where(merchant_id: @merchant.id).update_all(is_deleted: true) if Giftcard.where(merchant_id: @merchant.id).exists?
#first page of the giftcard details page
first_page = client.get("https://www.twitter.com#{href}")
document_page = Nokogiri::HTML::Document.parse(first_page.body)
page_no = document_page.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
hrefextra =document_page.css('.dropdown-menu li a').last.attr('href')
#generate the giftcard details loop with the pagination
# update_all_record = []
find_all_giftcard=Giftcard.where(merchant_id:merchant_id).pluck(:row_id)
puts merchant_name
# puts find_all_giftcard.inspect
card_page = client.get("https://www.twitter.com#{hrefextra}")
document_page = Nokogiri::HTML::Document.parse(card_page.body)
#table details to generate the details of the giftcard with price ,per_off and final value of the giftcard
document_page.xpath('//table/tbody/tr[@class="toggle-details"]').collect do |row|
type1=[]
row_id = row.attr("id").to_i
row.at("td[2] ul").children.each do |typeli|
type = typeli.text.strip if typeli.text.strip.length != 0
type1 << type if typeli.text.strip.length != 0
end
value = row.at('td[3]').text.strip
value = value.to_s.tr('$', '').to_f
per_discount = row.at('td[4]').text.strip
per_discount = per_discount.to_s.tr('%', '').to_f
final_price = row.at('td[5] strong').text.strip
final_price = final_price.to_s.tr('$', '').to_f
type1.each do |type|
if find_all_giftcard.include?(row_id)
update_all_giftcard_record<<row_id
puts "exists"
else
puts "new"
@giftcard= Giftcard.create(card_type: giftcard_types.values_at(type.to_sym)[0], card_value:value, per_off:per_discount, card_price: final_price, merchant_id: merchant_id , row_id: row_id )
update_all_giftcard_record << @giftcard.row_id
end
end
#saving the record of the giftcard
# @giftcard=Giftcard.create(card_type:1, card_value:value, per_off:per_discount, card_price: final_price, merchant_id: @merchant.id , gift_card_type: type1)
end
# Giftcard.where(:id =>update_all_record).update_all(:is_deleted => false)
#delete all giftcard which is not present
# giftcard_deleted = Giftcard.where(:is_deleted => true,:merchant_id => @merchant.id).destroy_all if Giftcard.where(merchant_id: @merchant.id).exists?
time2 = Time.now
puts "Current Time : " + time2.inspect
end
end
end
end
threads.each(&:join)
puts "-------"
puts threads
# merchant_deleted = Merchant.where(:is_deleted => true).destroy_all if Merchant.exists?
merchant_deleted = Merchant.where('id NOT IN (?)',update_all_merchant_record).destroy_all if Merchant.exists?
giftcard_deleted = Giftcard.where('row_id NOT IN (?)',update_all_giftcard_record).destroy_all if Giftcard.exists?
end
end
Error I am receiving:
ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use
Each thread requires a separate connection to your database. You need to increase the connection pool size that your application can use (the pool setting in your database.yml file).
Your database must also be able to handle that many incoming connections. If you are using MySQL, you can check the limit by running select @@MAX_CONNECTIONS in your console.
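Another way to stay within the pool is to cap the number of threads instead of starting one per page. The following is a minimal sketch of a fixed-size worker pool in plain Ruby. WORKER_COUNT, the page range, and the doubling used as "work" are placeholders I made up; none of them come from the question.

```ruby
require 'thread' # provides Queue; already loaded on modern Rubies

WORKER_COUNT = 5             # assumption: no larger than `pool:` in database.yml
page_numbers = (1..20).to_a  # stand-in for 1.upto(page_no_merchant)

jobs = Queue.new
page_numbers.each { |n| jobs << n }
results = Queue.new          # Queue is safe to share between threads

workers = Array.new(WORKER_COUNT) do
  Thread.new do
    loop do
      page_number = begin
        jobs.pop(true)       # non-blocking pop raises ThreadError when empty
      rescue ThreadError
        break                # no work left, so this worker finishes
      end
      # Real task: fetch and parse the page here. Wrapping the database work in
      # ActiveRecord::Base.connection_pool.with_connection { ... } hands the
      # connection back to the pool as soon as the block ends.
      results << page_number * 2 # placeholder for the scraped result
    end
  end
end
workers.each(&:join)

processed = []
processed << results.pop until results.empty?
puts "processed #{processed.size} pages with #{workers.size} threads"
```

With this shape, at most WORKER_COUNT connections are checked out at once, no matter how many pages there are.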

Request returns nothing

I have managed to connect my web service to the database, but now whenever I make a request it returns nothing. The database has a couple of rows, but the web service returns zero.
get '/all_users/' do
  conn = TinyTds::Client.new(username: 'nicole', password: 'pass', dataserver: 'Nikki-PC\Mydatabase', database: 'Thedatabase')
  recordsArray = "{\"clientList\":["
  clientArray = Array.new
  sql = 'select * from dbo.ServerUsers'
  records = conn.execute(sql) do |record|
    client = AndroidtableClientsSearch.new(record[0], record[1], record[2], record[3], record[4])
    clientArray << client.to_s
  end
  recordsArray << clientArray.join(',')
  recordsArray << "]}"
  recordsArray
end
I'm pretty sure I am doing the execute wrong, but this is the first time I am using tiny_tds and I am very confused.
Thank you for your help.
[EDIT]
This is AndroidtableClientsSearch:
class AndroidtableClientsSearch
  def initialize(username, password, phone_number, profile_state, clasa)
    @username = username
    @password = password
    @phone_number = phone_number
    @profile_state = profile_state
    @clasa = clasa
  end
  def to_s
    { :username => "#{@username}", :password => "#{@password}", :phone_number => "#{@phone_number}", :profile_state => "#{@profile_state}", :clasa =>"#{@clasa}"}.to_json
  end
end
[UPDATE]
I have modified the code as suggested and it returns a result, but it does not return the data from the database.
This is a result:
{"recordsArray":["{\"username\":\"\",\"password\":\"\",\"phone_number\":\"\",\"profile_state\":\"\",\"clasa\":\"\"}"]}
conn.execute(sql) does not use a block; it simply returns a result. Ruby silently ignores a block passed to a method that never yields, so the block is never executed. You can verify this by putting puts 'I am here' inside it: nothing is printed.
The solution is to iterate over the result:
get '/all_users/' do
  conn = TinyTds::Client.new(...)
  sql = 'select * from dbo.ServerUsers'
  #                         ⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓⇓ iterate!!!
  records = conn.execute(sql).each_with_object([]) do |record, memo|
    client = AndroidtableClientsSearch.new(*5.times.map { |i| record[i] })
    memo << client.to_s
  end
  require 'json'
  JSON.dump(clientList: records)
end
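The empty strings in the [UPDATE] output are consistent with how TinyTds returns rows: by default each row is a Hash keyed by column name, so record[0] looks up the key 0 and gets nil. Here is a minimal sketch of the difference, using a plain array of hashes as a stand-in for the result set. The column names and values are assumptions, since the real table schema is not shown.

```ruby
require 'json'

# Stand-in for what conn.execute(sql).each yields by default: one Hash per row.
rows = [
  { 'username' => 'ann', 'password' => 'secret', 'phone_number' => '555-0100',
    'profile_state' => 'active', 'clasa' => 'A' }
]

by_index = rows.map { |row| row[0] }          # Hash lookup with the key 0
by_name  = rows.map { |row| row['username'] } # lookup by column name

# Encoding an array of hashes once avoids the doubly escaped JSON strings
# that appear when each row is turned into a JSON string first.
payload = JSON.generate(clientList: rows)

puts by_index.inspect
puts by_name.inspect
puts payload
```

With a real connection, TinyTds also documents an :as => :array option for each, which yields arrays so that positional access works.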

Rake task to update specific products in db

Hi all, I've got a database full of products, and I need to schedule a rake task to run every week.
I'm not exactly sure how to code the rake task. For example, I have a product with id=1 name="testname" description="description" sku="198" price=12.99
I need to upload a CSV with name="testname" and update the price to 13.99 while keeping the id intact.
This is what I've tried so far, but it's not complete. If anyone could help, that would be great.
require 'csv'
desc "Updates Products inside an ActiveRecord table"
task :update_prods, [:filename] => :environment do
  products = Spree::Product.all
  CSV.foreach('update_prods.csv', :headers => true) do |row|
    Spree::Product.update!(row.to_hash)
  end
end
Here is a gist showing how we imported products from Shopify to Spree, which could give you some ideas on how to go about this: https://gist.github.com/dgross881/b4f1ac96bafa2e29be7f
def update_products
  puts 'Updating Products...'
  require 'csv'
  products_csv = File.read(Rails.root.join('lib/assets/products_list.csv'))
  products = CSV.parse(products_csv, headers: true)
  products.each_with_index do |row, index|
    Rails.logger.info { "[#{index + 1}..#{products.length}] Updating product: #{row['title']}" }
    # find already raises ActiveRecord::RecordNotFound when the id is missing
    product = Spree::Product.find(row['id'])
    # update_attributes! raises on failure instead of returning false
    product.update_attributes!(name: row['title'], description: row['description'],
                               meta_title: row['seo_title'], meta_description: row['seo_description'],
                               meta_keywords: "#{row['handle']}, #{row['title']}, the Squirrelz",
                               available_on: Time.zone.now, price: row['price'],
                               shipping_category: Spree::ShippingCategory.find_by!(name: 'Shipping'))
    product.tag_list = row['tags']
    product.slug = row['handle']
    product.save!
  end
  Rails.logger.info { "Finished Updating Products" }
end
def update_variants
  puts 'updating Variants...'
  require 'csv'
  products_variants_csv = File.read(Rails.root.join('lib/assets/variants_list.csv'))
  products_variants = CSV.parse(products_variants_csv, headers: true)
  products_variants.each_with_index do |row, index|
    puts "[#{index + 1}..#{products_variants.length}] Adding Variant (#{row['sku']} to Product: #{Spree::Product.find_by!(slug: row['handle']).name})"
    variant = Spree::Variant.find_by!(sku: row['sku'])
    variant.update_attributes!(sku: row['sku'], stock_items_count: row['qty'], cost_price: row['price'], weight: row['weight'])
    unless row['option1'].blank?
      variant.option_values << Spree::OptionValue.find_by!(name: row['option1'])
    end
    unless row['option2'].blank?
      variant.option_values << Spree::OptionValue.find_by!(name: row['option2'])
    end
    variant.save!
  end
  puts 'Updated Variants'
end
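For the narrower case in the question (match on name, change only the price, leave the id alone), the core of the task is a lookup by name followed by an update. Here is a minimal sketch in plain Ruby. The Product struct and the sample rows are stand-ins I invented for Spree::Product and the uploaded file, so treat the Rails calls in the comments as the assumed equivalents rather than tested code.

```ruby
require 'csv'

# Stand-in for a database record; in the rake task this would be Spree::Product.
Product = Struct.new(:id, :name, :price)

products = [Product.new(1, 'testname', 12.99), Product.new(2, 'other', 5.00)]

# Invented sample of the weekly upload.
csv_data = <<~CSV
  name,price
  testname,13.99
  missing,1.00
CSV

updated = []
not_found = []
CSV.parse(csv_data, headers: true).each do |row|
  # Assumed Rails equivalent: Spree::Product.find_by(name: row['name'])
  product = products.find { |candidate| candidate.name == row['name'] }
  if product
    product.price = Float(row['price']) # assumed equivalent: product.update!(price: ...)
    updated << product.id
  else
    not_found << row['name']            # report unknown names instead of skipping silently
  end
end

puts "updated ids: #{updated.inspect}, not found: #{not_found.inspect}"
```

Scheduling is separate from the task itself; a weekly cron entry that runs the rake task is one common option.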

Importing from CSV in Rails. How to do per batches, not all at once?

def self.import file, organization
  counter = 0
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true) do |row|
    name = (row["First Name"].to_s + " " + row["Last Name"].to_s).titleize
    customer = Customer.create(
      name: name,
      phone: row["Main Phone"],
      email: row["Main Email"],
      address: row["Address"],
      repair_shop: repair_shop
    )
    puts "#{name} - #{customer.errors.full_messages.join(',')}" if customer.errors.any?
    counter += 1 if customer.persisted?
  end
  message = "Imported #{counter} users."
end
This is the code I have so far. I'm importing files with 10,000 rows, so it overwhelms my production server in processing.
How could I do this in batches?
Taken from https://satishonrails.wordpress.com/2007/07/18/how-to-import-csv-file-in-rails/
Simply add a periodic explicit garbage collection:
def self.import file, organization
  counter = 0
  CSV.foreach(file.path, encoding: 'windows-1251:utf-8', headers: true).with_index do |row, i|
    name = (row["First Name"].to_s + " " + row["Last Name"].to_s).titleize
    customer = Customer.create(
      name: name,
      phone: row["Main Phone"],
      email: row["Main Email"],
      address: row["Address"],
      repair_shop: repair_shop
    )
    puts "#{name} - #{customer.errors.full_messages.join(',')}" if customer.errors.any?
    counter += 1 if customer.persisted?
    GC.start if i % 100 == 0 # forcing garbage collection
  end
  message = "Imported #{counter} users."
end
This way memory is reclaimed regularly during the import, which makes it much less likely that your server runs out of memory. I have checked it in practice, and it worked.
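If the rows should also be handled in groups, as the question asks, one option (my own suggestion, not taken from the linked post) is to slice the row enumerator with each_slice. Here is a minimal sketch that runs without Rails. BATCH_SIZE, import_in_batches, and the generated sample file are all made up for illustration.

```ruby
require 'csv'
require 'tempfile'

BATCH_SIZE = 3 # assumption: tune to what the server handles comfortably

# Reads the file lazily and hands the rows over in groups of batch_size.
def import_in_batches(path, batch_size)
  imported = 0
  batches = 0
  CSV.foreach(path, headers: true).each_slice(batch_size) do |rows|
    attrs = rows.map do |row|
      { name: "#{row['First Name']} #{row['Last Name']}".strip,
        email: row['Main Email'] }
    end
    # In the Rails model each batch would be saved here, for example inside
    # Customer.transaction { attrs.each { |a| Customer.create(a) } }
    imported += attrs.size
    batches += 1
  end
  [imported, batches]
end

# Build a small throwaway CSV so the sketch can run on its own.
file = Tempfile.new(['customers', '.csv'])
file.write("First Name,Last Name,Main Email\n")
7.times { |i| file.write("First#{i},Last#{i},user#{i}@example.com\n") }
file.close

imported, batches = import_in_batches(file.path, BATCH_SIZE)
puts "Imported #{imported} rows in #{batches} batches"
file.unlink
```

Called without a block, CSV.foreach returns an enumerator, so only one batch of rows is held in memory at a time.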

How to drop a particular column in csv using ruby 1.9.2

I am trying to load a CSV file into a database. The CSV contains some unwanted data that I don't want to store. I need to skip particular columns from the CSV and store the rest of the data in my database. How can I remove particular columns from the CSV programmatically before I push the data to the DB? I am using Ruby 1.9.2.
Kindly help me out.
def csv_import
  #@parsed_file = csv.open(params[:dump][:file])
  puts "before CSV Reader"
  file = params["dump"]["file"]
  directory = "#{Rails.root.to_s}/public/dump"
  # create the file path
  path = File.join(directory, "#{file.original_filename}")
  # write the file
  File.open(path, "wb") { |f| f.write(file.read) }
  @parsed_file=CSV.open(path, "r")
  n=0 # counts saved rows across the whole file
  @parsed_file.drop(1).each do |row|
    c=ModelName.new
    c.invoiceno=row[2]
    c.invoice_date=row[3]
    c.orderrefno = row[4]
    c.skucode = row[7]
    c.quantiy = row[8]
    c.amount = row[9]
    c.trackno=row[11]
    c.dispatched = "No"
    c.mailsenttoc = "No"
    c.mailsenttobluedart = "No"
    if c.save
      n=n+1
      GC.start if n%50==0
    end
  end
end
As you can see, I have skipped a couple of columns, such as 1, 5, 6, and 10.
Not sure if this helps, but you could also use remote_table:
require 'remote_table'
def csv_import
  # [...]
  RemoteTable.new("file://#{path}", :format => :csv, :headers => false).each do |row|
    c = ModelName.new
    c.invoiceno = row[2]
    c.invoice_date = row[3]
    c.orderrefno = row[4]
    c.skucode = row[7]
    c.quantiy = row[8]
    c.amount = row[9]
    c.trackno = row[11]
    c.dispatched = "No"
    c.mailsenttoc = "No"
    c.mailsenttobluedart = "No"
    if c.save
      # [...]
    end
  end
  # [...]
end
You can use activewarehouse-etl to do this.
https://github.com/activewarehouse/activewarehouse-etl
It will allow you to specify the columns that you want to pull in from the csv file and then will bulk upload it to your database.
You can also use it to clean up and verify the data that you are putting in as well as set default values.
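If an extra gem is more than you need, the standard CSV library is enough to keep only the wanted columns. Here is a minimal sketch, written for a current Ruby (the <<~ heredoc needs 2.3 or later, so it would need adjusting for 1.9.2). WANTED_COLUMNS reuses the positions and attribute names from the question, while pick_columns and the sample data are invented for illustration.

```ruby
require 'csv'

# Column positions to keep, taken from the question's code. Every other
# column is simply never read.
WANTED_COLUMNS = {
  invoiceno: 2, invoice_date: 3, orderrefno: 4,
  skucode: 7, quantiy: 8, amount: 9, trackno: 11
}.freeze

# Hypothetical helper: builds an attribute hash from the wanted positions.
def pick_columns(row)
  WANTED_COLUMNS.each_with_object({}) do |(attribute, index), attrs|
    attrs[attribute] = row[index]
  end
end

# Invented sample: a header line plus one data row with 12 columns.
sample = <<~CSV
  c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11
  x,skip,INV-1,2012-01-05,ORD-9,skip,skip,SKU-7,3,49.50,skip,TRK-1
CSV

# drop(1) skips the header line, as in the question.
records = CSV.parse(sample).drop(1).map { |row| pick_columns(row) }
puts records.inspect
```

Each resulting hash could then be passed to ModelName.new, with the fixed "No" defaults merged in before saving.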
