How to speed up loading of Marshal objects in Ruby/Rails

I have a mongoid model/class in my Rails application. It looks like this:
class Operation
  include Mongoid::Document
  include Mongoid::Timestamps
  extend Mongoid::MarshallableField

  marshallable_field :message

  def load_message
    message
  end
end
message contains an array of several thousand elements, so it has been converted into a byte stream with Marshal.
I need to be able to load message fast, but currently it takes approx. 1.4 seconds, e.g. with the load_message method shown above.
How could I speed things up?
For your reference, here is my configuration:
## app/lib/mongoid/marshallable_field.rb
module Mongoid
  module MarshallableField
    def marshallable_field(field_name, params = {})
      set_method_name = "#{field_name}=".to_sym
      get_method_name = field_name.to_sym
      attr_name = "__#{field_name}_marshallable_path".to_sym

      send :define_method, set_method_name do |obj|
        if Rails.env == "development" || Rails.env == "test"
          path = File.expand_path(Rails.public_path + "/../file_storage/#{Time.now.to_i}-#{id}.class_dump")
        elsif Rails.env == "production"
          path = "/home/ri/prod/current/file_storage/#{Time.now.to_i}-#{id}.class_dump"
        end
        # binary mode: Marshal output is a byte stream, not text
        File.open(path, "wb") do |f|
          Marshal.dump(obj, f)
        end
        update_attribute(attr_name, path)
        path
      end

      send :define_method, get_method_name do
        if self[attr_name] != nil
          file = File.open(self[attr_name], "rb")
          begin
            Marshal.load(file)
          rescue ArgumentError => e
            Rails.logger.error "Error unmarshalling a field #{attr_name}: #{e}"
            nil
          ensure
            file.close
          end
        else
          Rails.logger.error "self[attr_name] is nil"
          nil
        end
      end
    end
  end
end
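The thread records no answer, but two common levers for this kind of load time can be sketched outside Mongoid (the payload and file name below are made up for illustration): read the dump in one binary read instead of streaming it through a File object, and memoize the result so the deserialization cost is paid once per process.

```ruby
require 'tempfile'

# Build a dump comparable to a field holding several thousand elements.
payload = Array.new(5_000) { |i| { id: i, value: "item-#{i}" } }
dump = Tempfile.new(['message', '.class_dump'])
File.binwrite(dump.path, Marshal.dump(payload))

# 1. One-shot binary read: hand Marshal.load a String, not an IO stream.
loaded = Marshal.load(File.binread(dump.path))

# 2. Memoize: repeated calls reuse the already-deserialized object.
cache = nil
load_message = -> { cache ||= Marshal.load(File.binread(dump.path)) }

first  = load_message.call
second = load_message.call
# first and second are the same object, so the second call costs nothing.
```

Whether memoization is acceptable depends on whether the dumped field can change underneath the process; if it can, the cache needs invalidating when the path attribute is rewritten.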

Related

Do a diff between csv column and ActiveRecord object

I have a simple csv (a list of emails) that I want to upload to my rails backend API which looks like this:
abd@gmail.com,cool@hotmail.com
What I want is to upload that file, check in the user table whether there are matching rows (in terms of the email address), and then return a newly downloadable csv with 2 columns: the email and whether or not the email was matched to an existing user (boolean true/false).
I'd like to stream the output since the file can be very large. This is what I have so far:
controller
def import_csv
  send_data FileIngestion.process_csv(
    params[:file]
  ), filename: 'processed_emails.csv', type: 'text/csv'
end
file_ingestion.rb
require 'csv'

class FileIngestion
  def self.process_csv(file)
    emails = []
    CSV.foreach(file.path, headers: true) do |row|
      emails << row[0]
    end
    users = User.where("email IN (?)", emails)
  end
end
Thanks!
Why not just pluck all the emails from the Users and do something like this. This example keeps it simple but you get the idea. If we can assume your input file is just a string of emails with comma separated values then this should work:
emails = File.read('emails.csv').split(',')

def process_csv(emails)
  user_emails = User.where.not(email: [nil, '']).pluck(:email)
  CSV.open('emails_processed.csv', 'w') do |csv|
    csv << ['email', 'present']
    emails.each do |email|
      csv << [email, user_emails.include?(email) ? 'true' : 'false']
    end
  end
end

process_csv(emails)
UPDATED to match your code design:
def import_csv
  send_data FileIngestion.process_csv(params[:file]),
            filename: 'processed_emails.csv', type: 'text/csv'
end

require 'csv'

class FileIngestion
  def self.process_csv(file)
    emails = File.read(file.path).split(',')
    user_emails = User.where.not(email: [nil, '']).pluck(:email)
    CSV.open('emails_processed.csv', 'w') do |csv|
      csv << ['email', 'present']
      emails.each do |email|
        csv << [email, user_emails.include?(email) ? 'true' : 'false']
      end
    end
    File.read('emails_processed.csv')
  end
end
Basically what you want to do is collect the incoming CSV data into batches - use each batch to query the database and write a diff to a tempfile.
You would then stream the tempfile to the client.
require 'csv'
require 'tempfile'
class FileIngestion
BATCH_SIZE = 1000
def self.process_csv(file)
csv_tempfile = CSV.new(Tempfile.new('foo'))
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) do |batch|
emails = batch.flatten
users = User.where(email: emails).pluck(:email)
emails.each do |e|
csv_tempfile << [e, users.include?(e)]
end
end
csv_tempfile
end
end
CSV.read(file, headers: false).lazy.drop(1).each_slice(BATCH_SIZE) uses a lazy enumerator to access the CSV file in batches. .drop(1) gets rid of the header row.
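The batching idiom can be seen in isolation on a plain array (the addresses below are placeholders):

```ruby
rows = [['email'], ['a@example.com'], ['b@example.com'], ['c@example.com']]

batches = []
# drop(1) skips the header row; each_slice(2) then yields batches of two.
rows.lazy.drop(1).each_slice(2) { |batch| batches << batch.flatten }
# batches => [["a@example.com", "b@example.com"], ["c@example.com"]]
```

With a block, each_slice consumes the lazy source incrementally, so each batch can be queried against the database before the next one is assembled.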
OK, so this is what I came up with: a solution that simply prevents users from uploading a file that has more than 10,000 data points. It might not be the best solution (I prefer @Max's), but in any case I wanted to share what I did:
def emails_exist
  raise 'Missing file parameter' if !params[:file]
  csv_path = params[:file].tempfile.path
  send_data csv_of_emails_matching_users(csv_path), filename: 'emails.csv', type: 'text/csv'
end

private

def csv_of_emails_matching_users(input_csv_path)
  total = 0
  CSV.generate(headers: true) do |result|
    result << %w{email exists}
    emails = []
    CSV.foreach(input_csv_path) do |row|
      total += 1
      if total > 10001
        raise 'User Validation limited to 10000 emails'
      end
      emails.push(row[0])
      if emails.count > 99
        append_to_csv_info_for_emails(result, emails)
      end
    end
    if emails.count > 0
      append_to_csv_info_for_emails(result, emails)
    end
  end
end

def append_to_csv_info_for_emails(csv, emails)
  user_emails = User.where(email: emails).pluck(:email).to_set
  emails.each do |email|
    csv << [email, user_emails.include?(email)]
  end
  emails.clear
end

Globalize rails: Save all translations by checking I18n.available_locales

Is it possible to save all translations by looking at I18n.available_locales (or maybe some other Globalize config file) when the main record is created?
I'm using Globalize in combination with Active Admin, and I created a custom page only for the translations, but I would like the person who needs to translate to know which fields are yet to be translated.
This is what I'm doing now (in the base model), even though I'm not proud of it. It seems twisted for no reason; I did try way simpler solutions which appeared at first to be valid, but they turned out not to work.
after_save :add_empty_translations

def add_empty_translations
  # if the class is translatable
  if self.class.translates?
    # get available locales
    locales = I18n.available_locales.map { |l| l.to_s }
    # get foreign key for translated table
    foreign_key = "#{self.class.to_s.underscore}_id"
    # get translated columns
    translated_columns = self.class::Translation.column_names.select do |col|
      !['id', 'created_at', 'updated_at', 'locale', foreign_key].include?(col)
    end
    # save current locale
    current_locale = I18n.locale
    # for each available locale, check if any column was defined by the user
    locales.each do |l|
      I18n.locale = l
      add_translation = translated_columns.all? { |col| self[col].nil? }
      if add_translation
        payload = {}
        payload[foreign_key] = self.id
        payload['locale'] = l
        self.class::Translation.create(payload)
      end
    end
    # restore locale
    I18n.locale = current_locale
  end
end
Is there a way to do it with globalize?
Since the above solution wasn't working all the time, I ended up patching the gem itself as follows:
Globalize::ActiveRecord::Adapter.module_eval do
  def save_translations!
    # START PATCH
    translated_columns = record.class::Translation.column_names.select do |col|
      !['id', 'created_at', 'updated_at', 'locale', "#{record.class.to_s.underscore}_id"].include?(col)
    end
    payload = {}
    translated_columns.each do |column|
      payload[column] = ""
    end
    I18n.available_locales.each do |l|
      add_translation = true
      translated_columns.each { |column| add_translation &&= stash[l][column].nil? }
      if record.translations_by_locale[l].nil? && add_translation
        stash[l] = payload
      end
    end
    # END PATCH
    stash.each do |locale, attrs|
      next if attrs.empty?
      translation = record.translations_by_locale[locale] ||
                    record.translations.build(locale: locale.to_s)
      attrs.each do |name, value|
        value = value.val if value.is_a?(Arel::Nodes::Casted)
        translation[name] = value
      end
    end
    reset
  end
end
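Stripped of ActiveRecord and Globalize, the core of add_empty_translations is a set difference over locales. A plain-Ruby sketch of that idea, with made-up data:

```ruby
available_locales = %w[en fr de]

# Simulated translation rows, keyed the way Globalize stores them.
translations = [{ 'locale' => 'en', 'title' => 'Hello' }]

existing = translations.map { |t| t['locale'] }
missing  = available_locales - existing

# Create an empty row per missing locale so translators can see the gaps.
missing.each do |locale|
  translations << { 'locale' => locale, 'title' => nil }
end
```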

Rails console - reload! third party services in modules

My app is connected to some third-party APIs.
I have several API connector module-singletons that are initialized only once at application start (initialized means the client is instantiated once with the credentials retrieved from secrets).
When I reload! the application in my console, I am losing those services and I have to exit and restart the console from scratch.
Basically all my connectors include a ServiceConnector module like this one
module ServiceConnector
  extend ActiveSupport::Concern

  included do
    @activated = false
    @activation_attempt = false
    @client = nil

    attr_reader :client, :activated

    def self.client
      @client ||= service_client
    end

    def self.service_name
      name.gsub('Connector', '')
    end

    def self.activate
      @activation_attempt = true
      if credentials_present?
        @client = service_client
        @activated = true
      end
    end
  end
end
Here is an example of a service implementation
module MyConnector
  include ServiceConnector

  @app_id = nil
  @api_key = nil

  def self.set_credentials(id, key)
    @app_id = id
    @api_key = key
  end

  def self.credentials_present?
    @app_id.present? and @api_key.present?
  end

  def self.service_client
    ::SomeAPI::Client.new(
      app_id: @app_id,
      api_key: @api_key
    )
  end
end
I use this pattern because it lets me reuse those services outside Rails (e.g. Capistrano, workers without Rails, etc.). In Rails I would load the services this way:
# config/initializers/my_service.rb
if my_service_should_be_activated?
  my_service.set_credentials(
    Rails.application.secrets.my_service_app_id,
    Rails.application.secrets.my_service_app_key
  )
  my_service.activate
end
I guess that executing reload! clears all my instance variables, including @client, @app_id and @api_key.
Is it possible to add code to be executed after a reload!? In my case I would need to re-run the initializer. Or is there a way to make sure the instance variables of my services are not cleared by a reload!?
So I have come up with a solution involving two initializers
First, a 000_initializer that will report which secrets were loaded successfully
module SecretChecker
  module_function

  # Return true if all secrets are present
  def secrets?(secret_list, under:)
    secret_root = Rails.application.secrets
    if under
      if under.is_a?(Array)
        secret_root = secret_root.public_send(under.shift)&.dig(*under.map(&:to_s))
      else
        secret_root = secret_root.public_send(under)
      end
      secret_list.map do |secret|
        secret_root&.dig(secret.to_s).present?
      end
    else
      secret_list.map do |secret|
        secret_root&.public_send(secret.to_s).present?
      end
    end.reduce(:&)
  end

  def check_secrets(theme, secret_list, under: nil)
    return if secrets?(secret_list, under: under)
    message = "WARNING - Missing secrets for #{theme} - #{yield}"
    puts message and Rails.logger.warn(message)
  end
end

SecretChecker.check_secrets('Slack', %i[martine], under: [:slack, :webhooks]) do
  'Slack Notifications will not work'
end

SecretChecker.check_secrets('MongoDB', %i[user password], under: :mongodb) do
  'No Database Connection if auth is activated'
end
Then, a module to reload the services with ActiveSupport::Reloader (an example featuring Slack)
# config/initializers/0_service_activation.rb
module ServiceActivation
  def self.with_reload
    ActiveSupport::Reloader.to_prepare do
      yield
    end
  end

  module Slack
    def self.service
      ::SlackConnector
    end

    def self.should_be_activated?
      Rails.env.production? ||
        Rails.env.staging? ||
        (Rails.env.development? && ENV['ENABLE_SLACK'] == 'true')
    end

    def self.activate
      slack = service
      slack.webhook = Rails.application.secrets.slack&.dig('webhooks', 'my_webhook')
      ENV['SLACK_INTERCEPT_CHANNEL'].try do |channel|
        slack.intercept_channel = channel if channel.present?
      end
      slack.activate
      slack
    end
  end
end

[
  ...,
  ServiceActivation::Slack
].each do |activator|
  ServiceActivation.with_reload do
    activator.activate if activator.should_be_activated?
    activator.service.status_report
  end
end
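The pattern above hinges on to_prepare callbacks being replayed on every reload. A framework-free sketch of that contract (MiniReloader is a made-up stand-in for ActiveSupport::Reloader, not its real implementation):

```ruby
# Blocks registered with to_prepare are replayed on every reload,
# which is why services registered this way re-activate themselves.
class MiniReloader
  def initialize
    @callbacks = []
  end

  def to_prepare(&block)
    @callbacks << block
  end

  def reload!
    @callbacks.each(&:call)
  end
end

activations = []
reloader = MiniReloader.new
reloader.to_prepare { activations << :service_activated }

reloader.reload! # initial boot
reloader.reload! # console reload!
# activations now holds one entry per (re)load.
```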

Fluentd record with source filename parts

I'm using fluentd on a server to export logs.
My configuration uses something like this to capture several log files:
<source>
  type tail
  path /my/path/to/file/*/*.log
</source>
The different files are tracked properly; however, there is one more feature I need:
the two wildcard parts of the path should be added to the record as well (let's call them directory and filename).
If the in_tail plugin added the filename to the record, I could write a formatter to split and edit it.
Am I missing anything, or is rewriting in_tail to my heart's wishes the best way to go?
So, yes. Extending in_tail is the way to go.
I've written a new plugin that inherits from NewTailInput and uses a slightly different parse_singleline and parse_multilines to add the path to the record.
Much better than expected.
Update 6/3/2020:
I've dug up the code; this was the least Ruby I could muster to solve the problem.
Customize convert_line_to_event_with_path_names for your needs to add custom data to the records.
module Fluent
  class DirParsingTailInput < NewTailInput
    Plugin.register_input('dir_parsing_tail', self)

    def initialize
      super
    end

    def receive_lines(lines, tail_watcher)
      es = @receive_handler.call(lines, tail_watcher)
      unless es.empty?
        tag = if @tag_prefix || @tag_suffix
                @tag_prefix + tail_watcher.tag + @tag_suffix
              else
                @tag
              end
        begin
          router.emit_stream(tag, es)
        rescue
          # ignore errors. Engine shows logs and backtraces.
        end
      end
    end

    def convert_line_to_event_with_path_names(line, es, path)
      begin
        directory = File.basename(File.dirname(path))
        filename = File.basename(path, ".*")
        line.chomp! # remove \n
        @parser.parse(line) { |time, record|
          if time && record
            if directory != "logs"
              record["parent"] = directory
              record["child"] = filename
            else
              record["parent"] = filename
            end
            es.add(time, record)
          else
            log.warn "pattern not match: #{line.inspect}"
          end
        }
      rescue => e
        log.warn line.dump, :error => e.to_s
        log.debug_backtrace(e.backtrace)
      end
    end

    def parse_singleline(lines, tail_watcher)
      es = MultiEventStream.new
      lines.each { |line|
        convert_line_to_event_with_path_names(line, es, tail_watcher.path)
      }
      es
    end

    def parse_multilines(lines, tail_watcher)
      lb = tail_watcher.line_buffer
      es = MultiEventStream.new
      if @parser.has_firstline?
        lines.each { |line|
          if @parser.firstline?(line)
            if lb
              convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
            end
            lb = line
          else
            if lb.nil?
              log.warn "got incomplete line before first line from #{tail_watcher.path}: #{line.inspect}"
            else
              lb << line
            end
          end
        }
      else
        lb ||= ''
        lines.each do |line|
          lb << line
          @parser.parse(lb) { |time, record|
            if time && record
              convert_line_to_event_with_path_names(lb, es, tail_watcher.path)
              lb = ''
            end
          }
        end
      end
      tail_watcher.line_buffer = lb
      es
    end
  end
end
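The path-splitting at the core of convert_line_to_event_with_path_names can be checked on its own; the sample path below is invented to match the /my/path/to/file/*/*.log layout from the question:

```ruby
path = '/my/path/to/file/service-a/app.log'

# The first wildcard: name of the directory containing the log file.
directory = File.basename(File.dirname(path))
# The second wildcard: file name with the extension stripped.
filename = File.basename(path, '.*')

# directory => "service-a", filename => "app"
```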

Monkey patching ActiveResource::Errors

I've come across an issue with ActiveResource that has been resolved and was trying to monkey patch it into my application without much luck.
I've added a file in config/initializers/ containing the following:
class ActiveResource::Errors < ActiveModel::Errors
  # https://github.com/rails/rails/commit/b09b2a8401c18d1efff21b3919ac280470a6eb8b
  def from_hash(messages, save_cache = false)
    clear unless save_cache
    messages.each do |(key, errors)|
      errors.each do |error|
        if @base.attributes.keys.include?(key)
          add key, error
        elsif key == 'base'
          self[:base] << error
        else
          # reporting an error on an attribute not in attributes
          # format and add them to base
          self[:base] << "#{key.humanize} #{error}"
        end
      end
    end
  end

  # Grabs errors from a json response.
  def from_json(json, save_cache = false)
    decoded = ActiveSupport::JSON.decode(json) || {} rescue {}
    if decoded.kind_of?(Hash) && (decoded.has_key?('errors') || decoded.empty?)
      errors = decoded['errors'] || {}
      if errors.kind_of?(Array)
        # 3.2.1-style with array of strings
        ActiveSupport::Deprecation.warn('Returning errors as an array of strings is deprecated.')
        from_array errors, save_cache
      else
        # 3.2.2+ style
        from_hash errors, save_cache
      end
    else
      # <3.2-style respond_with - lacks 'errors' key
      ActiveSupport::Deprecation.warn('Returning errors as a hash without a root "errors" key is deprecated.')
      from_hash decoded, save_cache
    end
  end
end
But it still seems to be calling activeresource-3.2.2/lib/active_resource/validations.rb:31:in 'from_json'. Any help on how to properly monkey patch this would be very much appreciated.
Thanks!
It turns out that the problem was Rails lazy loading ActiveResource after my file was loaded in the config, overriding it with the original definitions. The fix is simply requiring the needed files before defining the patched code.
My revised code:
require 'active_resource/base'
require 'active_resource/validations'

module ActiveResource
  class Errors
    # https://github.com/rails/rails/commit/b09b2a8401c18d1efff21b3919ac280470a6eb8b
    def from_hash(messages, save_cache = false)
      clear unless save_cache
      messages.each do |(key, errors)|
        errors.each do |error|
          if @base.attributes.keys.include?(key)
            add key, error
          elsif key == 'base'
            self[:base] << error
          else
            # reporting an error on an attribute not in attributes
            # format and add them to base
            self[:base] << "#{key.humanize} #{error}"
          end
        end
      end
    end

    # Grabs errors from a json response.
    def from_json(json, save_cache = false)
      decoded = ActiveSupport::JSON.decode(json) || {} rescue {}
      if decoded.kind_of?(Hash) && (decoded.has_key?('errors') || decoded.empty?)
        errors = decoded['errors'] || {}
        if errors.kind_of?(Array)
          # 3.2.1-style with array of strings
          ActiveSupport::Deprecation.warn('Returning errors as an array of strings is deprecated.')
          from_array errors, save_cache
        else
          # 3.2.2+ style
          from_hash errors, save_cache
        end
      else
        # <3.2-style respond_with - lacks 'errors' key
        ActiveSupport::Deprecation.warn('Returning errors as a hash without a root "errors" key is deprecated.')
        from_hash decoded, save_cache
      end
    end
  end
end