I just started using resque to do some processing on some very large files in the background, and I'm having trouble figuring out how to pass a file to a resque worker. I use rails to handle the file upload, and rails creates an ActionDispatch::Http::UploadedFile object for each file uploaded from the form.
How do I send this file to a resque worker? I tried sending a custom hash containing just the pathname of the temporary file and the original filename, but I can no longer reopen the temporary file in the resque worker (I get a normal Errno::ENOENT - No such file or directory) because rails seems to delete that temporary file after the request ends.
ActionDispatch::Http::UploadedFile isn't accessible once the request finishes. You need to write the file somewhere persistent (or use S3 as temporary storage), then pass resque the path to the file that you wrote.
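A minimal sketch of that approach, assuming a hypothetical `persist_upload` helper and a destination directory of your choosing (neither is part of Rails or Resque): copy the upload's tempfile to a stable location while the request is still running, and enqueue only the resulting path.

```ruby
require 'fileutils'
require 'securerandom'
require 'tmpdir'

# Hypothetical helper: copies an uploaded tempfile to a directory that outlives
# the request and returns the new path. `dest_dir` is an assumption; in
# production it must be a location the Resque workers can also read.
def persist_upload(tempfile_path, original_filename, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  # Prefix with a random token so uploads with the same name do not collide.
  safe_name = "#{SecureRandom.hex(8)}-#{File.basename(original_filename)}"
  dest_path = File.join(dest_dir, safe_name)
  FileUtils.cp(tempfile_path, dest_path)
  dest_path
end

# In a controller this might look like the following (the upload directory is illustrative):
#   upload = params[:file]
#   path = persist_upload(upload.tempfile.path, upload.original_filename,
#                         Rails.root.join('tmp', 'uploads').to_s)
#   Resque.enqueue(QueueWorker, path)
```

Because only a short string travels through the queue, this stays cheap even for very large files. The worker should delete the copied file once it has finished processing it.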
I just spent two days trying to do this and finally figured it out. You need to Base64 encode the file so that it can be serialized into JSON. Then you need to decode it in the worker and create a new ActionDispatch::Http::UploadedFile.
Here's how to encode and pass to resque:
# You only need to encode the actual file; everything else in the
# ActionDispatch::Http::UploadedFile object is just a string or a hash of strings.
file = params[:file] # Your ActionDispatch::Http::UploadedFile object
file.tempfile.binmode
file.tempfile = Base64.encode64(file.tempfile.read)
Resque.enqueue(QueueWorker, params)
And here's how to decode it and convert it back to an object within your worker:
class QueueWorker
  @queue = :main_queue

  def self.perform(params)
    file = params['file']
    tempfile = Tempfile.new('file')
    tempfile.binmode
    tempfile.write(Base64.decode64(file['tempfile']))
    tempfile.rewind # so that later reads start at the beginning of the file
    # Now that the file is decoded you need to build a new
    # ActionDispatch::Http::UploadedFile with the decoded tempfile and the other
    # attributes you passed in.
    file = ActionDispatch::Http::UploadedFile.new(
      tempfile: tempfile,
      filename: file['original_filename'],
      type: file['content_type'],
      head: file['headers']
    )
    # This object is now equivalent to the one in your controller in params[:file]
  end
end
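To see why the Base64 step matters, here is a small standalone round trip (a sketch using only the standard library, with made-up sample bytes): raw binary data is not valid JSON string content, but its Base64 form is plain ASCII and survives serialization unchanged.

```ruby
require 'base64'
require 'json'

# Arbitrary binary data, including a null byte and bytes that are invalid UTF-8.
binary = "\x89PNG\x00\xFF\xFE".b

# Base64 output is plain ASCII, so it can be embedded in a JSON payload.
payload = JSON.generate('tempfile' => Base64.encode64(binary))

# What the worker side would do: parse the JSON and decode back to bytes.
decoded = Base64.decode64(JSON.parse(payload)['tempfile'])

puts decoded == binary
```

Keep in mind that Base64 inflates the payload by roughly a third and the whole file ends up in the queue's storage, so for very large files passing a path is usually the lighter option.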
Related
I am using fine-uploader.js and fine-uploader.css for uploading my files using web2py framework.
The callback in the controller is
def upload_callback():
    if 'qqfile' in request.vars:
        filename = request.vars.qqfile
        newfilename = db.doc.filename.store(request.body, filename)
        db.doc.insert(filename=newfilename, uploaded_by=auth.user.id)
    return response.json({'success': 'true'})
The Model
uploadfolder = os.path.join(request.folder, 'uploads')
db.define_table('doc',
    Field('name', requires=IS_NOT_EMPTY()),
    Field('filename', 'upload', autodelete=True, uploadfolder=uploadfolder),
    Field('uploaded_by', db.auth_user)
)
When I upload file 'test01.xls', web2py stores it in file "doc.filename.bfbf907529358f82.7830302729.txt"
I do not understand why the extension xls is being changed to txt. I have also tried uploading a jpg file, and web2py changes the extension of that file to txt as well. Can somebody help me?
As request.vars.qqfile is not the filename but a cgi.FieldStorage object, you cannot use it as the filename but must instead extract the filename from it:
filename = request.vars.qqfile.filename
Alternatively, you can simply pass the FieldStorage object directly to the .insert method, and web2py will automatically handle extracting the filename and saving the file:
def upload_callback():
    if 'qqfile' in request.vars:
        db.doc.insert(filename=request.vars.qqfile, uploaded_by=auth.user.id)
    return response.json({'success': 'true'})
I'd like to create a file object from an image located at a specific url. I'm downloading the file with Net Http:
img = Net::HTTP.get_response(URI.parse('https://prium-solutions.com/wp-content/uploads/2016/11/rails-1.png'))
file = File.read(img.body)
However, I get ArgumentError: string contains null byte when trying to read the file and store it in the file variable.
How can I do this without having to store it locally?
Since File deals with reading from storage, it's really not applicable here. The read method is expecting you to hand it a location to read from, and you're passing in binary data.
If you have a situation where you need to interface with a library that expects an object that is streaming, you can wrap the string body in a StringIO object:
file = StringIO.new(img.body)
# you can now call file.read, file.seek, file.rewind, etc.
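A self-contained sketch of the same idea, using a made-up byte string in place of the real response body: the null bytes that make File.read fail (it treats its argument as a path) are no problem for StringIO.

```ruby
require 'stringio'

# Stand-in for img.body: binary data containing null bytes, which is what
# triggers "string contains null byte" when the data is passed to File.read.
body = "\x89PNG\r\n\x1A\n\x00\x00\x00\rIHDR".b

file = StringIO.new(body)
header = file.read(8)   # read the first 8 bytes
file.rewind             # go back to the start
everything = file.read  # read the whole body
```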
I get image files sent from an Android app to my Rails API. I decode the images using this:
StringIO.new(Base64.decode64(image[1]))
The issue is that it takes too much time; on Heroku it takes even longer.
Is there another way to do this that's faster and more efficient?
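You can also use this to decode the Base64 data into a file:
# Decodes a Base64 data URI (e.g. "data:image/png;base64,...") into a Tempfile.
# Requires the mime-types gem, plus 'base64' and 'tempfile' from the standard library.
def parse_image_data(base64_file)
  header, string = base64_file.split(',', 2)
  # Pull the content type (e.g. "image/png") out of the data URI header
  content_type = header[/\Adata:(.*?);base64\z/, 1]
  ext = MIME::Types[content_type].first.preferred_extension
  tempfile = Tempfile.new([Time.now.to_i.to_s, ".#{ext}"])
  tempfile.binmode
  tempfile.write Base64.decode64(string)
  tempfile.rewind
  tempfile
end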
I have an image in an S3 bucket and a URL to access it.
I want to read the image from S3, create a thumbnail icon, and push the thumbnail back to S3.
If the image is local, I can read it and convert it to a StringIO. After that I can push the StringIO to S3 to create the thumbnail image:
item = File.read(url)
data_io = StringIO.new(item)
s3_connection.interface.put(data_io, ...)
How can I open a remote file and process it?
File.open(remote_url) returns No such file or directory.
With OpenURI I can read the file, but I couldn't convert it to a StringIO:
response = open(remote_url) # Tempfile
data_io = StringIO.new(response)
# can't convert Tempfile into String
What am I missing?
The StringIO initialize method expects a string as the only parameter. The object you are giving it is a Tempfile. Try this:
data_io = StringIO.new(response.read)
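The following sketch reproduces the situation without a network call, using a locally created Tempfile as a stand-in for the OpenURI response (the byte content is made up). Note that on Ruby 3.0 and later, opening a URL requires URI.open from open-uri rather than the bare open shown in the question.

```ruby
require 'stringio'
require 'tempfile'

# Stand-in for the Tempfile that OpenURI hands back for larger responses.
response = Tempfile.new('remote')
response.binmode
response.write("fake image bytes\x00\x01")
response.rewind

# StringIO.new(response) would raise TypeError because a Tempfile is not a
# String; reading the Tempfile first gives StringIO the String it expects.
data_io = StringIO.new(response.read)
```

If the downstream library only needs an object that responds to read, the Tempfile itself may already be enough and the StringIO wrapper can be skipped.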
With the docsplit gem I can extract the text from a PDF or any other file type. For example, with the line:
Docsplit.extract_pages('doc.pdf')
I can have the text content of a PDF file.
I'm currently using Rails, and the PDF is sent through a request and lives in memory. Looking in the API and in the source code I couldn't find a way to extract the text from memory, only from a file.
Is there a way to get the text of this PDF avoiding the creation of a temporary file?
I'm using attachment_fu if it matters.
Use a temporary directory:
require 'docsplit'
def pdf_to_text(pdf_filename)
  Docsplit.extract_text([pdf_filename], ocr: false, output: Dir.tmpdir)
  txt_file = File.basename(pdf_filename, File.extname(pdf_filename)) + '.txt'
  txt_filename = File.join(Dir.tmpdir, txt_file)
  extracted_text = File.read(txt_filename)
  File.delete(txt_filename)
  extracted_text
end
pdf_to_text('doc.pdf')
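Since the PDF in the question lives in memory, it still has to reach disk before a path-based API can use it. One way to keep that short-lived is sketched below with a hypothetical `with_temp_path` helper (not part of Docsplit or Rails): it writes the bytes to a temporary file, yields the path, and removes the file as soon as the block returns.

```ruby
require 'tempfile'

# Hypothetical helper: writes in-memory bytes to a temporary file, yields its
# path to the block, and deletes the file afterwards. Returns the block's result.
def with_temp_path(bytes, extension)
  Tempfile.create(['upload', extension]) do |file|
    file.binmode
    file.write(bytes)
    file.flush # make sure the bytes are on disk before another library reads the path
    yield file.path
  end
end

# Combined with the pdf_to_text method above, where pdf_bytes is an assumed
# variable holding the uploaded PDF's content:
#   text = with_temp_path(pdf_bytes, '.pdf') { |path| pdf_to_text(path) }
```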
If you have the content in a string, use StringIO to create a File-like object that IO can read. In StringIO, it doesn't matter if the content is true text, or binary, it's all the same.
Look at either of:
new(string=""[, mode])
Creates a new StringIO instance from the given string and mode.
open(string=""[, mode]) {|strio| ...}
Equivalent to ::new except that when it is called with a block, it yields with the new instance and closes it, and returns the result which returned from the block.
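A short sketch contrasting the two forms, with made-up content that mixes text and a null byte:

```ruby
require 'stringio'

content = "binary\x00or text".b

# ::new returns the instance; the caller decides when to close it.
io = StringIO.new(content)
first = io.read(6)

# ::open with a block yields the instance, closes it afterwards,
# and returns the block's value.
length = StringIO.open(content) { |strio| strio.read.bytesize }
```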