Saving an ActiveRecord non-transactionally

My application accepts file uploads, with some metadata being stored in the DB, and the file itself on the file system. I am trying to make the metadata visible in the application before the file upload and post-processing are finished, but because saves are transactional, I have had no success. I have tried the callbacks and calling create_or_update() instead of save(), all to no avail. Is there a way to do this without re-writing the guts of ActiveRecord::Base? I've even attempted naming the method make() instead of save(), but perplexingly that had no effect.
The code below "works" fine, but the database is not modified until everything else is finished.
def save(upload)
  uploadFile = upload['datafile']
  originalName = uploadFile.original_filename
  self.fileType = File.extname(originalName)
  create_or_update()

  # write the file
  File.open(self.filePath, "wb") { |f| f.write(uploadFile.read) }

  begin
    musicFile = TagLib::File.new(self.filePath())
    self.id3Title  = musicFile.title
    self.id3Artist = musicFile.artist
    self.id3Length = musicFile.length
  rescue TagLib::BadFile => exc
    logger.error("Failed to id track: \n #{exc}")
  end

  if self.fileType == '.mp3'
    convertToOGG()
  end

  create_or_update()
end
Any ideas would be quite welcome, thanks.

Have you considered processing the file upload as a background task? Save the metadata as normal and then perform the upload and post-processing using Delayed Job or similar. This Railscast has the details.
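For illustration, a minimal sketch of that approach, assuming the delayed_job gem and a hypothetical Track model with a post_process! method; the metadata row is saved and visible right away, and the slow work runs later in a worker process:

class Track < ActiveRecord::Base
  def post_process!
    # TagLib tagging and the OGG conversion would go here,
    # long after the metadata row is already committed
  end
end

# in the controller, after the metadata-only save:
track = Track.create!(fileType: File.extname(upload['datafile'].original_filename))
track.delay.post_process!   # delayed_job queues the call; a worker runs it later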

You're getting the meta-data from the file, right? So is the problem that the conversion to OGG is taking too long, and you want the data to appear before the conversion?
If so, John above has the right idea -- you're going to need to accept the file upload, and schedule a conversion to occur sometime in the future.
The main reason is that your Rails thread will process the OGG conversion and can't respond to any other web requests until it's complete. Blast!
Some servers compensate for this by running multiple Rails threads, but I recommend a background queue (use BJ if you host yourself, or Heroku's background jobs if you host there).

Related

Rails - running threads after method has exited

When the client changes his profile picture it hits the update method, which responds with update.js.erb. This is a fast and straightforward process. However, behind the scenes on the server, a bunch of files (10 of them) are generated from the profile picture and these need to be uploaded to an Amazon bucket from the server. This is a lengthy process and I don't want to make the client wait until it is finished. Moreover, the file uploads often fail with a RequestTimeoutException because they take longer than 15 seconds.
All this raises many questions:
How do you perform the 10 file generations/uploads after the update method has exited? Threads are killed after the main method has finished.
How do you catch an exception inside a thread? The following code does not catch the timeout exceptions.
threads = []

threads << Thread.new {
  begin
    # upload file 1 ....
  rescue Rack::Timeout::RequestTimeoutException => e
    # try to upload again ....
  else
  ensure
  end
}

threads << Thread.new {
  begin
    # upload file 2 ....
  rescue Rack::Timeout::RequestTimeoutException => e
    # try to upload again ....
  else
  ensure
  end
}

threads.each { |thr| thr.join }
What's the best way to try to upload a file again if it timed out?
What is the best solution to this problem?
You need to use the delayed_job or whenever gem for background tasks, but I would suggest Sidekiq.
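For example, a minimal Sidekiq sketch (the worker class name and the User lookup are hypothetical); the update action enqueues the job and returns immediately, and Sidekiq retries the job if an upload raises, e.g. on a timeout:

class ProfilePictureWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5   # re-run the job automatically if it fails

  def perform(user_id)
    user = User.find(user_id)
    # generate the 10 derivative files and upload them to the Amazon bucket here
  end
end

# in the update action, right before rendering update.js.erb:
ProfilePictureWorker.perform_async(current_user.id)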
I also faced the same problem in a project and came across a solution using AWS Lambda. If you are using Rails, you can use the carrierwave gem or the Rails 5 Active Storage module to upload the image to S3; if not, use the AWS SDK for Ruby to upload files to S3. You can bind S3 events to a Lambda function so that whenever a file is created or modified, the function is invoked and your work is done. In the Lambda function you can write the logic to create the files and upload them back to S3. Lambda code can be written in Ruby, Node, or Python.
This strategy may help you.

Net::SFTP Errors

I have been trying to download a file using Net::SFTP and I keep getting an error.
The file is partially downloaded, and is only 2.1 MB, so it's not a huge file. I removed the loop over the files and even tried just downloading the one file and got the same error:
yml = YAML.load_file Rails.root.join('config', 'ftp.yml')

Net::SFTP.start(yml["url"], yml["username"], password: yml["password"]) do |sftp|
  sftp.dir.glob(File.join('users', 'import'), '*.csv').each do |f|
    sftp.download!(File.join('users', 'import', f.name),
                   Rails.root.join('processing_files', 'download_files', f.name),
                   read_size: 1024)
  end
end
NoMethodError: undefined method `close' for #<Pathname:0x007fc8fdb50ea0>
from /[my_working_ap_dir]/gems/net-sftp-2.1.2/lib/net/sftp/operations/download.rb:331:in `on_read'
I have prayed to Google all I can and am not getting anywhere with it.
Rails.root returns a Pathname object, but it looks like the sftp code doesn't check to see whether it got a Pathname or a File handle, it just runs with it. When it runs into entry.sink.close it crashes because Pathnames don't implement close.
Pathnames are great for manipulating paths to files and directories, but they're not substitutes for file handles. You could probably tack to_s onto it, which would return a String.
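For example, converting the Pathname to a String at the call site should be enough; this is just the download! call from the question with to_s tacked onto the local path:

sftp.download!(File.join('users', 'import', f.name),
               Rails.root.join('processing_files', 'download_files', f.name).to_s,
               read_size: 1024)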
Here's a summary of the download call from the documentation that hints that the expected parameters should be a String:
To download a single file from the remote server, simply specify both the
remote and local paths:
downloader = sftp.download("/path/to/remote.txt", "/path/to/local.txt")
I suspect that if I dig into the code it will check to see whether the parameters are strings and, if not, assume that they are IO handles.
See ri Net::SFTP::Operations::Download for more info.
Here's an excerpt from the current download! code, and you can see how the problem occurred:
def download!(remote, local=nil, options={}, &block)
  require 'stringio' unless defined?(StringIO)
  destination = local || StringIO.new
  result = download(remote, destination, options, &block).wait
  local ? result : destination.string
end
local was passed in as a Pathname. The code checks to see whether something was passed in, but not what it is; if nothing is passed in, it substitutes a StringIO, which supplies the IO-like behaviour needed for in-memory caching.
Apparently you can't use Rails.root.join, which was what was causing the problem. It's odd, though, because it would still download part of the file.
Changed:
sftp.download!(File.join('users', 'import', f.name), Rails.root.join('processing_files', 'download_files', f.name))
To:
sftp.download!(File.join('users', 'import', f.name), File.join('processing_files', 'download_files', f.name))
The remote argument can be a Pathname object, while the local argument, when set, should be a String or else an object that responds to the #write method.
Below is the working code
local_stringified_path = Rails.root.join('processing_files', f.name).to_s
sftp.download!(Pathname.new('/users/import'), local_stringified_path)
For the curious, read on to understand this behaviour.
The issue NoMethodError: undefined method `close' for #<Pathname:0x007fc8fdb50ea0> happens exactly here, in the #on_read method; below is the code snippet of the statements concerned.
if response.eof?
  update_progress(:close, entry)
  entry.sink.close # ERRORED OUT LINE.. ideally when eof, the file IO handle is supposed to be closed
WHAT IS entry.sink?
We already know that the #download! method takes two args, as below:
sftp.download!(remote, local)
The given args remote and local are converted to an Entry object here:
[Entry.new(remote, local, recursive?)]
and Entry is nothing but a Struct here:
Entry = Struct.new(:remote, :local, :directory, :size, :handle, :offset, :sink)
Okay, then what is the sink attribute? We will jump to that right away.
Once the remote file in question is opened for reading, the #on_open method updates this sink attribute with a File handle here.
Find the snippet below,
entry.sink = entry.local.respond_to?(:write) ? entry.local : ::File.open(entry.local, "wb")
This actually happens only when the given local object doesn't implement its own #write method. In our scenario, Pathname objects do respond to #write, so the Pathname itself becomes the sink, and the later call to #close on it fails.
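A quick console check (on Ruby 2.1+, where Pathname#write exists) illustrates the behaviour described above:

require 'pathname'
p = Pathname.new('214010463.xml')
p.respond_to?(:write)  # => true, so download! keeps the Pathname itself as the sink
p.respond_to?(:close)  # => false, hence the NoMethodError on entry.sink.close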
Below are some snippets of the console output I inspected between multiple download chunk calls while debugging this, showing entry and entry.sink holding the objects discussed above.
Here I chose my remote to be a Pathname object and my local to be a String path, which yields a proper value for entry.sink and thereby downloads successfully:
0> entry
=> #<struct Net::SFTP::Operations::Download::Entry remote=#<Pathname:214010463.xml>, local="214010463.xml", directory=nil, size=nil, handle="1", offset=32000, sink=#<File:214010463.xml>>
0> entry.sink
=> #<File:214010463.xml>

cherrypy serve multiple requests / per connection

I have this code (on-the-fly compression and streaming):
@cherrypy.expose
def backup(self):
    path = '/var/www/httpdocs'
    zip_filename = "backup" + t.strftime("%d_%m_%Y_") + ".zip"
    cherrypy.response.headers['Content-Type'] = 'application/zip'
    cherrypy.response.headers['Content-Disposition'] = 'attachment; filename="%s"' % (zip_filename,)
    # https://github.com/gourneau/SpiderOak-zipstream/blob/3463c5ccb5d4a53fc5b2bdff849f25bae9ead761/zipstream.py
    return ZipStream(path)

backup._cp_config = {'response.stream': True}
The problem I'm facing is that while I'm downloading the file I can't browse any other page or send any other request until the download is done.
I think the problem is that CherryPy can't serve more than one request at a time per user.
Any suggestions?
When you say "per user", do you mean that another request could come in for a different "session" and it would be allowed to continue?
In that case, your issue is almost certainly due to session locking in CherryPy. You can read more about it in the session code. Since sessions are unlocked late by default, the session is not available for use by other threads (connections) while the backup is still being processed.
Try setting tools.sessions.locking = 'explicit' in the _cp_config for that handler. Since you’re not writing anything to the session, it’s probably safe not to lock at all.
Good luck. Hope that helps.
Also, from the FAQ:
"CherryPy certainly can handle multiple connections. It’s usually your browser that is the culprit. Firefox, for example, will only open two connections at a time to the same host (and if one of those is for the favicon.ico, then you’re down to one). Try increasing the number of concurrent connections your browser makes, or test your site with a tool that isn’t a browser, like siege, Apache’s ab, or even curl."

echoprint-codegen runs indefinitely with delayed_job

I'm attempting to run echoprint-codegen in a background process for analysing audio files as they're uploaded to a web service.
The desired functionality exists with a simple system call to the tmp file that gets uploaded via paperclip:
result = `echoprint-codegen #{path} 0 20` # works!
Unfortunately, this is not the case when the delayed workers fire off a new job; the echoprint-codegen process appears to hang indefinitely.
Per the echoprint README, I've double checked that ffmpeg is also within the path (Paperclip.options[:command_path] is pointing to the correct path).
I've also attempted to encapsulate the echoprint-codegen command line in a Paperclip.run() call, but that also results in a hanging process.
Any pointers?
I have obtained desired functionality by placing the echoprint-codegen system call in a Ruby Thread:
thread = Thread.new { Thread.current[:result] = `echoprint-codegen #{path} 0 20` }
thread.join
result = thread[:result]
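For completeness, a minimal sketch of this workaround inside a custom Delayed Job job class (AnalyzeAudioJob and its path argument are hypothetical names):

class AnalyzeAudioJob < Struct.new(:path)
  def perform
    thread = Thread.new { Thread.current[:result] = `echoprint-codegen #{path} 0 20` }
    thread.join
    result = thread[:result]
    # parse the echoprint-codegen output from result here
  end
end

# enqueue it after the upload finishes:
Delayed::Job.enqueue AnalyzeAudioJob.new(file.path)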

Does Ruby's 'open_uri' reliably close sockets after read or on fail?

I have been using open_uri to pull down an ftp path as a data source for some time, but suddenly found that I'm getting nearly continual "530 Sorry, the maximum number of allowed clients (95) are already connected."
I am not sure whether my code is faulty or whether it is someone else accessing the server, and unfortunately there's seemingly no way for me to know for sure who's at fault.
Essentially I am reading FTP URI's with:
def self.read_uri(uri)
  begin
    uri = open(uri).read
    uri == "Error" ? nil : uri
  rescue OpenURI::HTTPError
    nil
  end
end
I'm guessing that I need to add some additional error handling code in here...
I want to be sure that I take every precaution to close down all connections so that my connections are not the problem in question; however, I thought that open_uri + read would take this precaution, versus using the Net::FTP methods.
The bottom line is that I've got to be 100% sure these connections are being closed and that I don't somehow have a bunch of open connections lying around.
Can someone please advise as to correctly using read_uri to pull in ftp with a guarantee that it's closing the connection? Or should I shift the logic over to Net::FTP which could yield more control over the situation if open_uri is not robust enough?
If I do need to use the Net::FTP methods instead, is there a read method that I should be familiar with vs pulling it down to a tmp location and then reading it (as I'd much prefer to keep it in a buffer vs the fs if possible)?
I suspect you are not closing the handles. OpenURI's docs start with this comment:
It is possible to open http/https/ftp URL as usual like opening a file:
open("http://www.ruby-lang.org/") {|f|
f.each_line {|line| p line}
}
I looked at the source and the open_uri method does close the stream if you pass a block, so, tweaking the above example to fit your code:
uri = ''
open("http://www.ruby-lang.org/") {|f|
  uri = f.read
}
Should get you close to what you want.
Here's one way to handle exceptions:
# The list of URLs to pass in to check if one times out or is refused.
urls = %w[
  http://www.ruby-lang.org/
  http://www2.ruby-lang.org/
]

# the method
def self.read_uri(urls)
  content = ''
  open(urls.shift) { |f| content = f.read }
  content == "Error" ? nil : content
rescue OpenURI::HTTPError
  retry if urls.any?
  nil
end
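If you do decide to shift to Net::FTP, here's a minimal sketch that keeps the file in a string rather than on the filesystem (the host and remote path are placeholders); Net::FTP.open closes the connection when the block returns, even if an exception is raised:

require 'net/ftp'

def self.read_ftp_path(host, remote_path)
  Net::FTP.open(host) do |ftp|
    ftp.login
    buffer = ''
    ftp.retrbinary("RETR #{remote_path}", 4096) { |chunk| buffer << chunk }
    buffer == "Error" ? nil : buffer
  end
rescue Net::FTPError
  nil
end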
Try using a block:
data = open(uri){|f| f.read}
