My Ruby application uses about 97% CPU and eventually gets killed. The program reads files from a folder; if a file name already exists in the database, it skips that file and checks the next one. The application usually gets killed while running this procedure.
COMMAND %CPU
ruby 96.5
Even if I insert almost all the files and launch the application again (because it was killed), the system tends to kill it even sooner. How can I decrease the %CPU?
task :process_data, [:data_directory] => :environment do |_task, args|
# add data to a database
saver = CsvToSqlSaver.new
saver.fill_files_names
Dir.foreach(args.data_directory) do |filename|
# if not present in records already we read it
Base.logger.info "> Found #{filename}."
next if saver.files_names.to_s.include?(filename) ||
!filename.include?('csv')
Base.logger.info "> Reading #{filename}."
begin
saver.generate_db_rows_from_csv_file(args.data_directory, filename)
# handle Malformed .csv exception
rescue CSV::ArgumentError, CSV::MalformedCSVError => e
Base.logger.info e.message
next
end # we continue csv file loop?
unless integrator.insert_data_to_database
Base.logger.info '> No new data saved.'
end
end
end
This is fill_files_names:
def fill_files_names
  @files_names = []
  files_names = MyFilesTable.select(:filename).distinct
  files_names.each do |row|
    @files_names.push(row[:filename])
  end
end
This is Base:
class Base
  class << self
    attr_accessor :logger
  end
  @logger ||= Logger.new(STDERR)
end
This is generate_db_rows_from_csv_file
def generate_db_rows_from_csv_file(directory, filename)
@incoming_data = []
CSV.foreach("#{directory}/#{filename}",
headers: true, quote_char: "\x00") do |csv_record|
# if invalid record, go further
next if record_invalid?(csv_record)
generate_row_in_the_database(csv_record, filename)
end
end
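One likely hotspot in the task above is the check `saver.files_names.to_s.include?(filename)`: it rebuilds one large string for every file and then does a substring search, which can also match the wrong names (for example `a.csv` inside `data.csv`). A minimal sketch of an alternative, assuming the known names fit in memory; the names `known` and `candidates` are made up for illustration:

```ruby
require 'set'

# Hypothetical data: names already stored in the database,
# and names found in the data directory.
known = Set.new(['a.csv', 'b.csv'])
candidates = ['a.csv', 'c.csv', 'notes.txt', '.', '..']

# Set#include? is a hash lookup, so each check is cheap and exact.
to_process = candidates.select do |name|
  File.extname(name) == '.csv' && !known.include?(name)
end

puts to_process.inspect
```

Whether this is the actual cause of the process being killed is not something the question confirms; a kill can also come from memory pressure, so it is worth checking memory use as well.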
For a variety of reasons I am migrating my uploads from ActiveStorage (AS) to CarrierWave (CW).
I am writing a rake task and have the logic sorted out, but I am stumped on how to feed the AS blob into the CW file.
I am trying something like this:
@files.each.with_index(1) do |a, index|
  if a.attachment.attached?
    a.attachment.download do |file|
      a.file = file
    end
    a.save!
  end
end
This is based on these two links:
https://edgeguides.rubyonrails.org/active_storage_overview.html#downloading-files
message.video.open do |file|
system '/path/to/virus/scanner', file.path
# ...
end
and
https://github.com/carrierwaveuploader/carrierwave#activerecord
# like this
File.open('somewhere') do |f|
u.avatar = f
end
I tested this locally and the files are not mounted via the uploader. My question(s) would be:
am I missing something obvious here?
is my approach wrong and needs a new one?
Bonus Karma Question:
I can't seem to see a clear path to set the CW filename when I do this?
Here is my final rake task (based on the accepted answer); I am open to tweaks. It does the job for me:
namespace :carrierwave do
  desc "Import the old AS files into CW"
  task import: :environment do
    @files = Attachment.all
    puts "#{@files.count} files to be processed"
    puts "+" * 50
    @files.each.with_index(1) do |a, index|
      if a.attachment.attached?
        puts "Attachment #{index}: Key: #{a.attachment.blob.key} ID: #{a.id} Filename: #{a.attachment.blob.filename}"
        class FileIO < StringIO
          def initialize(stream, filename)
            super(stream)
            @original_filename = filename
          end
          attr_reader :original_filename
        end
        a.attachment.download do |file|
          a.file = FileIO.new(file, a.attachment.blob.filename.to_s)
        end
        a.save!
        puts "-" * 50
      end
    end
  end
  desc "Purge the old AS files"
  task purge: :environment do
    @files = Attachment.all
    puts "#{@files.count} files to be processed"
    puts "+" * 50
    @files.each.with_index(1) do |a, index|
      if a.attachment.attached?
        puts "Attachment #{index}: Key: #{a.attachment.blob.key} ID: #{a.id} Filename: #{a.attachment.blob.filename}"
        a.attachment.purge
        puts "-" * 50
        @count = index
      end
    end
    puts "#{@count} files purged"
  end
end
In my case I am doing this in steps: I have branched my master with this rake task and the associated MVC updates. If my site were in true production, I would probably run the import rake task first, confirm that all went well, and only THEN purge the old AS files.
The file object you get from the attachment.download block is a string. More precisely, the response from .download is the file, "streamed and yielded in chunks" (see documentation). I validated this by calling file.class to make sure the class is what I expected.
So, to solve your issue, you need to provide an object on which .read can be called. Commonly that is done using the Ruby StringIO class.
However, since CarrierWave also expects a filename, you can use a helper class that inherits from StringIO (from the blog post linked above):
class FileIO < StringIO
  def initialize(stream, filename)
    super(stream)
    @original_filename = filename
  end
  attr_reader :original_filename
end
And then you can replace a.file = file with a.file = FileIO.new(file, 'new_filename')
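Because the download is yielded in chunks, a file larger than one chunk would reach the block several times. A minimal sketch of collecting the chunks before wrapping them, with no Rails involved: `each_chunk` is a made-up stand-in for the chunked download, and `NamedStringIO` is just the helper class above under a different name.

```ruby
require 'stringio'

# Same idea as the FileIO helper: a StringIO that also reports a filename.
class NamedStringIO < StringIO
  attr_reader :original_filename

  def initialize(data, filename)
    super(data)
    @original_filename = filename
  end
end

# Hypothetical stand-in for a download that yields its content in chunks.
def each_chunk
  yield "hello, "
  yield "world"
end

# Collect every chunk first, then wrap the complete content once.
buffer = +""
each_chunk { |chunk| buffer << chunk }
io = NamedStringIO.new(buffer, "greeting.txt")
```

The resulting object responds to both `read` and `original_filename`, which is what the answer above says the uploader needs. Buffering the whole file in memory is an assumption that only suits reasonably small attachments.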
I'm trying to create a Rails locale file from a CSV. The file is created and the CSV is correctly parsed, but the file is not filled. I don't have errors so I don't know what is wrong...
This is my code:
# frozen_string_literal: true
class FillLanguages
require 'csv'
def self.get
result = []
file = File.new('config/locales/languages.yml', 'w')
CSV.foreach('lib/csv/BCP-47_french.csv', headers: false, col_sep: ';') do |row|
result.push(row[0])
hash = {}
key = row[0]
hash[key] = row[1]
file.puts(hash.to_yaml)
end
result
end
end
Rails.logger.debug(hash) returns
{"af-ZA"=>"Africain (Afrique du Sud)"}
{"ar-AE"=>"Arabe (U.A.E.)"}
{"ar-BH"=>"Arabe (Bahreïn)"}
{"ar-DZ"=>"Arabe (Algérie)"}
{"ar-EG"=>"Arabe (Egypte)"}
{"ar-IQ"=>"Arabe (Irak)"}
...
as expected.
Rails.logger.debug(hash.to_yaml) returns
---
af-ZA: Africain (Afrique du Sud)
---
ar-AE: Arabe (U.A.E.)
---
ar-BH: Arabe (Bahreïn)
---
ar-DZ: Arabe (Algérie)
---
ar-EG: Arabe (Egypte)
---
ar-IQ: Arabe (Irak)
...
But the file is still empty.
My CSV looks like:
https://i.gyazo.com/f3fa5ba8b1bfdd014018da5b46fa7ec0.png
Even if I try to puts a string like 'hello world' just after the line where I'm creating the file, it doesn't work...
You forgot to close the file.
You can either close it explicitly (best practice is to do so in an ensure block) or use File.open with a block, which closes the file for you when the block exits.
UPDATE:
IO#close → nil
Closes ios and flushes any pending writes to the operating system. The stream is unavailable for any further data operations; an IOError is raised if such an attempt is made. I/O streams are automatically closed when they are claimed by the garbage collector.
https://ruby-doc.org/core-2.5.0/IO.html#method-i-close
So your changes are not flushed to disk from the IO buffers. You could also call IO#flush explicitly, but it is better to close the files you open.
# explicit close
class FillLanguages
require 'csv'
def self.get
result = []
file = File.new('config/locales/languages.yml', 'w')
CSV.foreach('lib/csv/BCP-47_french.csv', headers: false, col_sep: ';') do |row|
result.push(row[0])
hash = {}
key = row[0]
hash[key] = row[1]
file.puts(hash.to_yaml)
end
result
ensure
file&.close # file is nil if File.new itself raised
end
end
--
# block version
class FillLanguages
require 'csv'
def self.get
result = []
File.open('config/locales/languages.yml', 'w') do |file|
CSV.foreach('lib/csv/BCP-47_french.csv', headers: false, col_sep: ';') do |row|
result.push(row[0])
hash = {}
key = row[0]
hash[key] = row[1]
file.puts(hash.to_yaml)
end
end
result
end
end
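Both versions above still write one YAML document per row (the repeated `---` lines in the logged output). A minimal sketch of building a single hash and writing it once, assuming the rows arrive as `[code, label]` pairs the way CSV.foreach yields them; the top-level `fr` and `languages` keys and the temporary path are assumptions for illustration, not taken from the question:

```ruby
require 'yaml'
require 'tmpdir'

# Hypothetical rows, shaped like what CSV.foreach would yield.
rows = [
  ['af-ZA', 'Africain (Afrique du Sud)'],
  ['ar-AE', 'Arabe (U.A.E.)']
]

# Turn the pairs into one hash instead of one hash per row.
translations = rows.to_h

path = File.join(Dir.mktmpdir, 'languages.yml')

# The block form closes (and therefore flushes) the file on exit.
File.open(path, 'w') do |file|
  file.write({ 'fr' => { 'languages' => translations } }.to_yaml)
end

loaded = YAML.load_file(path)
puts loaded['fr']['languages'].keys.inspect
```

Writing once also means the file contains a single `---` header, which is the shape a locale file is normally loaded as.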
I'm having a strange issue where when I check the File.size of a particular file in Rails console, it returns the correct size. However when I run the same code in a rake task, it returns 0. Here is the code in question (I've tidied it up a bit to help with readability):
def sum_close
daily_closed_tickets = Fst.sum_retrieve_closed_tickets
daily_closed_tickets.each do |ticket|
CSV.open("FILE_NAME_HERE", "w+", {force_quotes: false}) do |csv|
if (FileCopyReceipt.exists?(path: "#{ticket.attributes['TroubleTicketNumber']}_sum.txt"))
csv << ["GENERATE CSV WITH ATTRIBUTES HERE"]
files = Dir.glob("/var/www/html/harmonize/public/close/CLOSED_#{ticket.attributes['TroubleTicketNumber']}_sum.txt")
files.each do |f|
Rails.logger.info "File size (should return non-0): #{File.size(f)}" #returns 0, but not in Rails Console
Rails.logger.info "File size true or false, should be true: #{File.size(f) != 0}" #returns false, should return true
Rails.logger.info "Rails Environment: #{Rails.env}" #returns production
if(!FileCopyReceipt.exists?(path: f) && (File.size(f) != 0))
Rails.logger.info("SUM CLOSE, GOOD => FileUtils.cp_r occurred and FileCopyReceipt object created")
else
Rails.logger.info("SUM CLOSE, WARNING: => no data transfer occurred")
end
end
else
Rails.logger.info("SUM CLOSE => DID NOT make it into initial if ClosedDate.present? if block")
end
end
end
end
close_tickets.rake
task :close_tickets => :environment do
tickets = FstController.new
tickets.sum_close
tickets.dais_close
end
It is beyond me why this File.size comes back as 0 when this is run as a rake task. I thought it might be an environment issue, but that does not seem to be the case.
Any insight on the matter is appreciated.
The CSV.open block, and everything being wrapped inside it, was causing the issue. So I made the CSV generation its own snippet instead of wrapping everything in it.
daily_closed_tickets.each do |ticket|
CSV.open("generate csv here.txt", "w+") do |csv|
#enter ticket.attributes here for the csv
end
#continue on with the rest of the code and File.size() works properly
end
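One plausible reason the wrapping mattered, assuming the file being measured was written through a handle that was still open at that point: opening in a write mode truncates the file, and small writes can sit in Ruby's buffer until the handle is flushed or closed. A minimal sketch with plain File (CSV.open opens a File underneath); the path is a temporary one made up for the demonstration:

```ruby
require 'tmpdir'

path = File.join(Dir.mktmpdir, 'ticket.txt')
size_inside = nil

File.open(path, 'w') do |f|
  f.write("closed ticket data\n")
  # The write may still be buffered here, so the size on disk
  # can be smaller than what was written (often 0).
  size_inside = File.size(path)
end

# After the block the file is closed, so everything has been flushed.
size_after = File.size(path)
puts "inside block: #{size_inside}, after block: #{size_after}"
```

This does not explain why the console and the rake task behaved differently; it only shows why measuring after the block closes is the safer order.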
I am trying to process some very large tab-separated files. The process is:
Dir["#{@data_path}*.tsv"].each do |file|
  begin
    CSV.foreach(file, :col_sep => "\t") do |row|
      # assign columns to model and save
    end
    @log.info("Loaded all files into MySQL database illu.datafeeds")
  rescue Exception => e
    @log.warn("Unable to process the data feed: #{file} because #{e.message}")
    next
  end
end
However, when I execute this I get the following error:
Unable to process the file: /Users/XXXXX_2013-06-12.tsv because Illegal quoting in line 153.
The files are too big for me to go in and fix the error rows. I would like the process to continue the loop and process the file even if there are error rows.
Any suggestions?
Thanks.
Just rescue the error for the row that causes it, for example with rescue nil.
You can also log it with a logger.
Before the loop:
error_log ||= Logger.new("#{Rails.root}/log/my.log")
Inside the loop, instead of a bare rescue nil, use:
rescue error_log.info(row.to_s)
If the error is raised before the file starts to parse (before the .foreach call), you can open it as a raw file and parse it as CSV later, inside the loop (as mentioned here).
...or just rescue the whole file-parsing call:
CSV.foreach(file, :col_sep => "\t") do |row|
  ...
end rescue error_log.info("Could not finish parsing #{file}") # row is not in scope out here
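The "open it as a raw file and parse it later" idea can be sketched as follows: read the file line by line and parse each line on its own, so a malformed line is recorded and skipped instead of aborting the whole file. This assumes no quoted field spans several lines; the method name `parse_tsv_leniently` and the sample data are made up for illustration.

```ruby
require 'csv'
require 'stringio'

# Parses tab-separated input one line at a time.
# Returns [good_rows, bad_lines]; bad_lines holds [line_number, message].
def parse_tsv_leniently(io)
  good_rows = []
  bad_lines = []
  io.each_line.with_index(1) do |line, line_number|
    begin
      row = CSV.parse_line(line, col_sep: "\t")
      good_rows << row if row
    rescue CSV::MalformedCSVError => e
      # Record the failure and keep going with the next line.
      bad_lines << [line_number, e.message]
    end
  end
  [good_rows, bad_lines]
end

# Line 2 has a stray character after a closing quote, which is malformed.
sample = StringIO.new("a\tb\n1\t\"x\"y\n2\tz\n")
good_rows, bad_lines = parse_tsv_leniently(sample)
puts "parsed #{good_rows.size} rows, skipped #{bad_lines.size}"
```

For a real file, `File.open(file) { |f| parse_tsv_leniently(f) }` would take the place of the StringIO. If the data never uses quoting on purpose, CSV's `liberal_parsing: true` option or a different `quote_char` may be a simpler route, but whether either suits these particular feeds is untested here.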
Wow, what a vague question, I know. I have a file called enc_file in my Rails repo.
In my environments/production.rb, I have:
authentication_file = "#{Rails.root}/enc_file"
unless File.exist?(authentication_file)
puts "ERROR: File not found! (#{authentication_file})"
raise SystemExit, 1
end
my_config = YAML.load(PaymentGatewayCipher.decrypt(authentication_file)).symbolize_keys!
config.app_config.pay_pal.merge!(pay_pal_config.slice(:login, :password, :business, :business_id, :cert_id, :private_key, :signature).merge(
:return_to_merchant => false,
:server => 'whatever.paypal.com'
))
Then in my payment_gateway_cipher.rb file, I have:
require 'openssl'
# Encapsulates payment gateway encryption / decryption utility functions
class PaymentGatewayCipher
class << self
def encrypt(file, options = {})
cipher = create_cipher
cipher.encrypt(cipher_key)
data = cipher.update(File.read(file))
data << cipher.final
if to_file = options[:to]
# Write it out to a different file
File.open(to_file, 'wb') do |f|
f << data
end
end
data
end
# Decrypts the given file
def decrypt(file)
cipher = create_cipher
cipher.decrypt(cipher_key)
encrypted_data = File.open(file, 'rb') {|io| io.read}
data = cipher.update(encrypted_data)
data << cipher.final
end
# Generates the cipher to be used for encryption/decryption
def create_cipher
OpenSSL::Cipher::Cipher.new('aes-256-cbc')
end
# Loads the cipher key used for the symmetric algorithm
def cipher_key
File.open(File.join(Rails.root, 'config/mystuff/live/cipher.key'), 'rb') {|io| io.read}
end
end
end
How would I decrypt the enc_file to see its contents outside of Rails? I want to view the contents, modify them, and re-save the file if possible.
Thoughts?
You have the decrypt function right there, so presumably by outputting the result of that function? It is a class method, so call it on PaymentGatewayCipher:
puts PaymentGatewayCipher.decrypt("path/to/enc_file")
Or write the same result to a file, which you can then view outside of Ruby:
File.open("decrypted_file", "w") do |f|
  f.write PaymentGatewayCipher.decrypt("path/to/enc_file")
end