Please help me solve this error.
I am getting it while loading records from text files into a database using Ruby scripts.
The script works fine with a small number of records, but fails when there are a large number of them.
CSV.foreach(fileName) do |line|
  completePath = line[0]
  num_of_bps = line[1]
  completePath = cluster_path + '/' + completePath
  inode = FileOrFolder.find_by_fullpath(completePath, :select => "id")
  metric_instance = MetricInstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
  add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
end

def self.add_entry(metaid, num_of_bps, num_of_bp_tests)
  entry = Bp.new
  entry.metric_instance_id = metaid
  entry.num_of_bps = num_of_bps
  entry.num_of_bp_tests = num_of_bp_tests
  entry.save
  return entry
end
Try something like this:
File.open(fileName) do |csv|
  csv.each_line do |line|
    CSV.parse(line) do |values|
      # do your per-row manipulation here
    end
  end
end
This approach is slower, but it should keep you from running out of memory.
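If the slowdown matters, one common mitigation is to wrap the per-row inserts in periodic transactions, since each save otherwise commits individually. A minimal sketch under the same structure as above; the batch size of 1000 is an arbitrary choice, and the per-row processing is whatever you already do:

require 'csv'

BATCH_SIZE = 1000

File.open(fileName) do |csv|
  # each_line without a block returns an Enumerator, so we can group
  # lines into slices and commit once per slice instead of once per row
  csv.each_line.each_slice(BATCH_SIZE) do |lines|
    ActiveRecord::Base.transaction do
      lines.each do |line|
        CSV.parse(line) do |values|
          # same per-row processing as above
        end
      end
    end
  end
end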
Would it be possible to access ActiveStorageBlob or ActiveStorageAttachment as if it were a native model?
E.g. I want to do ActiveStorageBlob.first to access the first record of this model/table,
or ActiveStorageAttachment.all.as_json to generate JSON-formatted output.
The background idea is to find a way to dump the content of these ActiveStorage-related tables as JSON-formatted files, change something in those files, and load them back.
---- Extending this text after getting the correct answer ----
Thank you very much Sarah Marie.
And do you happen to know how to load the JSON data back into these tables?
I have tried this:
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
  ActiveStorage::Blob.create(j)
end
But that's not working:
ActiveModel::UnknownAttributeError (unknown attribute 'attachable_sgid' for ActiveStorage::Blob.)
---- For the original question ----
ActiveStorage::Blob.first
ActiveStorage::Attachment.all.as_json
---- For second extended question ----
ActiveStorage::Blob.create_before_direct_upload!(
  filename: j[:filename],
  content_type: j[:content_type],
  byte_size: j[:byte_size],
  checksum: j[:checksum]
)
# or
ActiveStorage::Blob.create_before_direct_upload!(**j.symbolize_keys)
Reference: https://github.com/rails/rails/blob/5f3ff60084ab5d5921ca3499814e4697f8350ee7/activestorage/app/controllers/active_storage/direct_uploads_controller.rb#L8-L9
https://github.com/rails/rails/blob/098fd7f9b3d5c6f540911bc0c17207d6b48d5bb3/activestorage/app/models/active_storage/blob.rb#L113-L120
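One caveat worth noting (an assumption based on the linked blob.rb: create_before_direct_upload! accepts only specific keyword arguments): a hash coming straight from as_json still carries extra keys such as id, created_at, and attachable_sgid, so the splat form would need those sliced away first. Roughly:

# Keep only the keywords create_before_direct_upload! accepts; the extra
# keys produced by as_json (id, created_at, attachable_sgid, ...) would
# otherwise raise an ArgumentError when splatted.
args = j.symbolize_keys.slice(:filename, :content_type, :byte_size, :checksum)
ActiveStorage::Blob.create_before_direct_upload!(**args)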
Now I have a complete solution for dumping and loading the ActiveStorage tables as JSON files.
...dump it
active_storage_blobs_file = "active_storage_blob.json"
active_storage_attachments_file = "active_storage_attachment.json"

puts("...dump active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
dump_file = File.open(dump_file_path, "w")
dump_file.write(JSON.pretty_generate(ActiveStorage::Blob.all.as_json))
dump_file.close

puts("...dump active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
dump_file = File.open(dump_file_path, "w")
dump_file.write(JSON.pretty_generate(ActiveStorage::Attachment.all.as_json))
dump_file.close
...load it back
puts("...load active_storage_blob")
dump_file_path = File.join(Rails.root, "backup", active_storage_blobs_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
j = j.except("attachable_sgid")
result = ActiveStorage::Blob.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
puts("...load active_storage_attachment")
dump_file_path = File.join(Rails.root, "backup", active_storage_attachments_file)
abort("File does not exist (" + dump_file_path + ") > abort <") unless File.exist?(dump_file_path)
load_json = JSON.parse(File.read(dump_file_path))
load_json.each do |j|
result = ActiveStorage::Attachment.create(j)
if (not result.errors.empty?)
puts(result.errors.full_messages.to_s)
puts(j.inspect)
exit(1)
end
end
I am scraping data from a website into my database in Rails. Fetching all 32,000 records with this script works without any issue, but I want to fetch the data faster, so I added threads to my rake task. Now, while the task runs, some of the data is fetched and then the task aborts.
I am not sure what to do about it; any help would be greatly appreciated. Here is my rake task code for the scraping.
task scratch_to_database: :environment do
  time2 = Time.now
  puts "Current Time : " + time2.inspect
  client = Mechanize.new
  giftcard_types = Giftcard.card_types
  find_all_merchant = Merchant.all.pluck(:id, :name).to_h
  # first index page of the merchant
  index_page = client.get('https://www.twitter.com//')
  document_page_index = Nokogiri::HTML::Document.parse(index_page.body)
  # set all merchants as deleted
  # set_merchant_as_deleted = Merchant.update_all(is_deleted: true) if Merchant.exists?
  # set_giftcard_as_deleted = Giftcard.update_all(is_deleted: true) if Giftcard.exists?
  update_all_merchant_record = []
  update_all_giftcard_record = []
  threads = []
  # merchant inner page pagination loop
  page_no_merchant = document_page_index.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
  1.upto(page_no_merchant) do |page_number|
    threads << Thread.new do
      client.get("https://www.twitter.com/buy-gift-cards?page=#{page_number}") do |page|
        document = Nokogiri::HTML::Document.parse(page.body)
        # generate the name and image of each merchant
        document.css('.product-source').each do |item|
          merchant_name = item.children.css('.name').text.gsub("Gift Cards", "")
          href = item.css('a').first.attr('href')
          image_url = item.children.css('.img img').attr('data-src').text.strip
          # parse the URL of the image
          image_url = URI.parse(image_url)
          # saving the merchant record
          # @merchant = Merchant.create(name: merchant_name, image_url: image_url)
          if find_all_merchant.has_value?(merchant_name)
            puts "this if"
            merchant_id = find_all_merchant.key(merchant_name)
            puts merchant_id
          else
            @merchant = Merchant.create(name: merchant_name, image_url: image_url)
            update_all_merchant_record << @merchant.id
            merchant_id = @merchant.id
          end
          # @merchant.update_attribute(:is_deleted, false)
          # set all giftcards as deleted
          # set_giftcard_as_deleted = Giftcard.where(merchant_id: @merchant.id).update_all(is_deleted: true) if Giftcard.where(merchant_id: @merchant.id).exists?
          # first page of the giftcard details page
          first_page = client.get("https://www.twitter.com#{href}")
          document_page = Nokogiri::HTML::Document.parse(first_page.body)
          page_no = document_page.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
          hrefextra = document_page.css('.dropdown-menu li a').last.attr('href')
          # generate the giftcard details loop with the pagination
          # update_all_record = []
          find_all_giftcard = Giftcard.where(merchant_id: merchant_id).pluck(:row_id)
          puts merchant_name
          # puts find_all_giftcard.inspect
          card_page = client.get("https://www.twitter.com#{hrefextra}")
          document_page = Nokogiri::HTML::Document.parse(card_page.body)
          # table rows with each giftcard's price, percent off and final value
          document_page.xpath('//table/tbody/tr[@class="toggle-details"]').collect do |row|
            type1 = []
            row_id = row.attr("id").to_i
            row.at("td[2] ul").children.each do |typeli|
              type = typeli.text.strip if typeli.text.strip.length != 0
              type1 << type if typeli.text.strip.length != 0
            end
            value = row.at('td[3]').text.strip
            value = value.to_s.tr('$', '').to_f
            per_discount = row.at('td[4]').text.strip
            per_discount = per_discount.to_s.tr('%', '').to_f
            final_price = row.at('td[5] strong').text.strip
            final_price = final_price.to_s.tr('$', '').to_f
            type1.each do |type|
              if find_all_giftcard.include?(row_id)
                update_all_giftcard_record << row_id
                puts "exists"
              else
                puts "new"
                @giftcard = Giftcard.create(card_type: giftcard_types.values_at(type.to_sym)[0], card_value: value, per_off: per_discount, card_price: final_price, merchant_id: merchant_id, row_id: row_id)
                update_all_giftcard_record << @giftcard.row_id
              end
            end
            # saving the giftcard record
            # @giftcard = Giftcard.create(card_type: 1, card_value: value, per_off: per_discount, card_price: final_price, merchant_id: @merchant.id, gift_card_type: type1)
          end
          # Giftcard.where(:id => update_all_record).update_all(:is_deleted => false)
          # delete every giftcard which is no longer present
          # giftcard_deleted = Giftcard.where(:is_deleted => true, :merchant_id => @merchant.id).destroy_all if Giftcard.where(merchant_id: @merchant.id).exists?
          time2 = Time.now
          puts "Current Time : " + time2.inspect
        end
      end
    end
  end
  threads.each(&:join)
  puts "-------"
  puts threads
  # merchant_deleted = Merchant.where(:is_deleted => true).destroy_all if Merchant.exists?
  merchant_deleted = Merchant.where('id NOT IN (?)', update_all_merchant_record).destroy_all if Merchant.exists?
  giftcard_deleted = Giftcard.where('row_id NOT IN (?)', update_all_giftcard_record).destroy_all if Giftcard.exists?
end
Error I am receiving:
ActiveRecord::ConnectionTimeoutError: could not obtain a connection from the pool within 5.000 seconds (waited 5.001 seconds); all pooled connections were in use
Each thread requires a separate connection to your database. You need to increase the connection pool size that your application can use, in your database.yml file.
But your database must also be capable of handling the incoming connections. If you are using MySQL, you can check the limit by running SELECT @@MAX_CONNECTIONS in your console.
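As a sketch of how that fits the task above (assuming the standard ActiveRecord setup): raise pool: in database.yml to at least the number of threads you spawn, and have each thread check its connection out explicitly so it goes back to the pool as soon as the work is done.

threads << Thread.new do
  # Check a connection out of the pool for the duration of the block;
  # it is returned automatically when the block exits, so finished
  # threads do not keep holding connections.
  ActiveRecord::Base.connection_pool.with_connection do
    # ... the scraping and model calls from the task go here ...
  end
end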
I am a full-stack Ruby developer. I am scraping data from a website, and I can fetch it successfully. The problem is that the next time I fetch, I only want to fetch the new data; I don't want to overwrite all the data already in the database.
I just want to add the records that were added recently, but I have not been able to find a solution that does this with minimum queries and minimum code.
Here is the code I am using for scraping:
client = Mechanize.new
index_page = client.get('https://www.google.com/')
document_page_index = Nokogiri::HTML::Document.parse(index_page.body)
page_no_merchant = document_page_index.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
1.upto(page_no_merchant) do |page_number|
  client.get("https://www.google.com/buy-gift-cards?page=#{page_number}") do |page|
    document = Nokogiri::HTML::Document.parse(page.body)
    document.css('.product-source').each do |item|
      merchant_name = item.children.css('.name').text.gsub("Gift Cards", "")
      puts merchant_name
      href = item.css('a').first.attr('href')
      puts href
      image_url = item.children.css('.img img').attr('data-src').text.strip
      puts image_url
      image_url = URI.parse(image_url)
      @merchant = Merchant.create!(name: merchant_name, image_url: image_url)
      first_page = client.get("https://www.google.com#{href}")
      document_page = Nokogiri::HTML::Document.parse(first_page.body)
      page_no = document_page.css('.pagination.pagination-centered ul li:nth-last-child(2) a').text.to_i
      1.upto(page_no) do |page_number_giftcard|
        type1 = []
        card_page = client.get("https://www.google.com#{href}?page=#{page_number_giftcard}")
        document_page = Nokogiri::HTML::Document.parse(card_page.body)
        document_page.xpath('//table/tbody/tr[@class="toggle-details"]').collect do |row|
          row.at("td[2] ul").children.each do |typeli|
            type = typeli.text.strip if typeli.text.strip.length != 0
            type1 << type if typeli.text.strip.length != 0
          end
          value = row.at('td[3]').text.strip
          value = value.to_s.tr('$', '').to_f
          puts value
          per_discount = row.at('td[4]').text.strip
          per_discount = per_discount.to_s.tr('%', '').to_f
          puts per_discount
          final_price = row.at('td[5] strong').text.strip
          final_price = final_price.to_s.tr('$', '').to_f
          puts final_price
          puts '******************************'
          @giftcard = Giftcard.create(card_type: 1, card_value: value, per_off: per_discount, card_price: final_price, merchant_id: @merchant.id)
        end
        # @giftcard.update_attribute()
      end
    end
  end
end
Thank you in advance.
Basically, you are saving all the data every time by doing this:
@merchant = Merchant.create!(name: merchant_name, image_url: image_url)
You can try something like find_or_create_by instead:
@merchant = Merchant.find_or_create_by(name: merchant_name, image_url: image_url)
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/first_or_create
http://apidock.com/rails/v4.0.2/ActiveRecord/Relation/find_or_create_by
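If a merchant is uniquely identified by its name alone (an assumption; adjust the lookup to whatever your real uniqueness rule is), the block form is worth considering, so a changed image_url is not treated as a brand-new merchant:

# Look the merchant up by name only; the block runs only when a new
# record is created, so image_url is set once on create and existing
# rows are left untouched.
@merchant = Merchant.find_or_create_by(name: merchant_name) do |merchant|
  merchant.image_url = image_url
end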
I have a Ruby controller:
def new
  counter = 1
  fileW = File.new("query_output.txt", "w")
  file = File.new("query_data.txt", "r")
  while (line = file.gets)
    puts "#{counter}: #{line}"
    query = "select name,highway from planet_osm_line where name ilike '" + line + "'"
    @output = PlanetOsmLine.connection.execute(query)
    @output.each do |output|
      fileW.write(output['highway'] + "\n")
    end
    counter = counter + 1
  end
  file.close
  query = ""
  @output = PlanetOsmLine.connection.execute(query)
end
In this I am reading from a file that looks like:
%12th%main%
%100 feet%
%12th%main%
%12th%main%
%12th%main%
%100 feet%
In the Ruby console I can see all the queries getting executed, but in query_output.txt I only get the output of the last query. What am I doing wrong here?
You use file mode w, which re-creates the output file every time (so you write into an empty file). Instead, open your file as follows:
fileW = File.new("query_output.txt", "a")
a stands for append: it will open or create the file and append at the end.
For more info concerning file modes, see: http://pubs.opengroup.org/onlinepubs/009695399/functions/fopen.html
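As a side note (not the cause of the problem above), the block form of File.open is a common alternative because it closes the file handle for you, even when an exception is raised mid-write:

# Open in append mode; the handle is closed automatically when the
# block exits, flushing anything still buffered.
File.open("query_output.txt", "a") do |fileW|
  fileW.write(output['highway'] + "\n")
end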
CSV parsing of the file was very slow, so I tried loading the file directly into a temp table in the database and doing the computation from there, as below.
Earlier it was like this, and it took 13 minutes to add the entries:
CSV.foreach(fileName) do |line|
  completePath = line[0]
  num_of_bps = line[1]
  completePath = cluster_path + '/' + completePath
  inode = FileOrFolder.find_by_fullpath(completePath, :select => "id")
  metric_instance = MetricInstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
  add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
end

def self.add_entry(metaid, num_of_bps, num_of_bp_tests)
  entry = Bp.new
  entry.metric_instance_id = metaid
  entry.num_of_bps = num_of_bps
  entry.num_of_bp_tests = num_of_bp_tests
  entry.save
  return entry
end
Now I changed the method to this, and it takes 52 minutes :(
@bps = TempTable.all
@bps.each do |bp|
  completePath = bp.first_column
  num_of_bps = bp.second_column
  num_of_bps3 = bp.third_column
  completePath = cluster_path + '/' + completePath
  inode = FileOrFolder.find_by_fullpath(completePath, :select => "id")
  num_of_bp_tests = 0
  unless inode.nil?
    if num_of_bps != '0'
      num_of_bp_tests = 1
    end
    metric_instance = MetricInstance.find(:first, :conditions => ["file_or_folder_id = ? AND dataset_id = ?", inode.id, dataset_id])
    add_entry(metric_instance.id, num_of_bps, num_of_bp_tests)
  end
end
Please help me optimize this code, or let me know if you think CSV.foreach is faster than reading from the database!
When you load CSV into the database, you:
load N CSV lines
insert N records into the DB
select and instantiate N ActiveRecord models
iterate over them
When you work with the raw CSV, you only:
load N CSV lines
iterate over them
Of course it's faster.
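If the temp-table route has to stay, most of that overhead can be avoided by skipping model instantiation and reading plain rows. A minimal sketch, assuming the column names first_column/second_column/third_column from the question and a table named temp_tables (adjust to the real table name):

# select_rows returns arrays of column values instead of ActiveRecord
# objects, so no model is instantiated for any of the N rows.
rows = ActiveRecord::Base.connection.select_rows(
  "SELECT first_column, second_column, third_column FROM temp_tables"
)
rows.each do |complete_path, num_of_bps, num_of_bps3|
  # ... same per-row processing as in the question ...
end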