File object from URL - ruby-on-rails

I'd like to create a file object from an image located at a specific url. I'm downloading the file with Net Http:
img = Net::HTTP.get_response(URI.parse('https://prium-solutions.com/wp-content/uploads/2016/11/rails-1.png'))
file = File.read(img.body)
However, I get ArgumentError: string contains null byte when trying to read the file and store in into the file variable.
How can I do this without having to store it locally ?

Since File deals with reading from storage, it's really not applicable here. The read method is expecting you to hand it a location to read from, and you're passing in binary data.
If you have a situation where you need to interface with a library that expects an object that is streaming, you can wrap the string body in a StringIO object:
file = StringIO.new(img)
# you can now call file.read, file.seek, file.rewind, etc.

Related

How to change new File method in Groovy?

How do I replace the new File method with a secure one? Is it possible to create a python script and connect it?
Part of the code where I have a problem:
def template Name = new File(file: "${template}").normalize.name.replace(".html", "").replace(".yaml", "")
But when I run my pipeline, I get the error
java.lang.SecurityException: Unable to find constructor: new java.io .File java.util.LinkedHashMap
This method is prohibited and is blacklisted. How do I replace it and with what?
If you're reading the contents of the file, you can replace that "new File" with "readFile".
See https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#readfile-read-file-from-workspace
readFile: Read file from workspace
Reads a file from a relative path (with root in current directory, usually > workspace) and returns its content as a plain string.
file : String
Relative (/-separated) path to file within a workspace to read.
encoding : String (optional)
The encoding to use when reading the file. If left blank, the platform default encoding will be used. Binary files can be read into a Base64-encoded string by specifying "Base64" as the encoding.

How to get a bitmap image in ruby?

The google vision API requires a bitmap sent as an argument. I am trying to convert a png from a URL to a bitmap to pass to the google api:
require "google/cloud/vision"
PROJECT_ID = Rails.application.secrets["project_id"]
KEY_FILE = "#{Rails.root}/#{Rails.application.secrets["key_file"]}"
google_vision = Google::Cloud::Vision.new project: PROJECT_ID, keyfile: KEY_FILE
img = open("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png").read
image = google_vision.image img
ArgumentError: string contains null byte
This is the source code processing of the gem:
def self.from_source source, vision = nil
if source.respond_to?(:read) && source.respond_to?(:rewind)
return from_io(source, vision)
end
# Convert Storage::File objects to the URL
source = source.to_gs_url if source.respond_to? :to_gs_url
# Everything should be a string from now on
source = String source
# Create an Image from a HTTP/HTTPS URL or Google Storage URL.
return from_url(source, vision) if url? source
# Create an image from a file on the filesystem
if File.file? source
unless File.readable? source
fail ArgumentError, "Cannot read #{source}"
end
return from_io(File.open(source, "rb"), vision)
end
fail ArgumentError, "Unable to convert #{source} to an Image"
end
https://github.com/GoogleCloudPlatform/google-cloud-ruby
Why is it telling me string contains null byte? How can I get a bitmap in ruby?
According to the documentation (which, to be fair, is not exactly easy to find without digging into the source code), Google::Cloud::Vision#image doesn't want the raw image bytes, it wants a path or URL of some sort:
Use Vision::Project#image to create images for the Cloud Vision service.
You can provide a file path:
[...]
Or any publicly-accessible image HTTP/HTTPS URL:
[...]
Or, you can initialize the image with a Google Cloud Storage URI:
So you'd want to say something like:
image = google_vision.image "https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png"
instead of reading the image data yourself.
Instead of using write you want to use IO.copy_stream as it streams the download straight to the file system instead of reading the whole file into memory and then writing it:
require 'open-uri'
require 'tempfile'
uri = URI("https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png")
tmp_img = Tempfile.new(uri.path.split('/').last)
IO.copy_stream(open(uri), tmp_img)
Note that you don't need to set the 'r:BINARY' flag as the bytes are just streamed without actually reading the file.
You can then use the file by:
require "google/cloud/vision"
# Use fetch as it raises an error if the key is not present
PROJECT_ID = Rails.application.secrets.fetch("project_id")
# Rails.root is a Pathname object so use `.join` to construct paths
KEY_FILE = Rails.root.join(Rails.application.secrets.fetch("key_file"))
google_vision = Google::Cloud::Vision.new(
project: PROJECT_ID,
keyfile: KEY_FILE
)
image = google_vision.image(File.absolute_path(tmp_img))
When you are done you clean up by calling tmp_img.unlink.
Remember to read things in binary format:
open("https://www.google.com/..._272x92dp.png",'r:BINARY').read
If you forget this it might try and open it as UTF-8 textual data which would cause lots of problems.

Extract text from document in memory using docsplit

With the docsplit gem I can extract the text from a PDF or any other file type. For example, with the line:
Docsplit.extract_pages('doc.pdf')
I can have the text content of a PDF file.
I'm currently using Rails, and the PDF is sent through a request and lives in memory. Looking in the API and in the source code I couldn't find a way to extract the text from memory, only from a file.
Is there a way to get the text of this PDF avoiding the creation of a temporary file?
I'm using attachment_fu if it matters.
Use a temporary directory:
require 'docsplit'
def pdf_to_text(pdf_filename)
Docsplit.extract_text([pdf_filename], ocr: false, output: Dir.tmpdir)
txt_file = File.basename(pdf_filename, File.extname(pdf_filename)) + '.txt'
txt_filename = Dir.tmpdir + '/' + txt_file
extracted_text = File.read(txt_filename)
File.delete(txt_filename)
extracted_text
end
pdf_to_text('doc.pdf')
If you have the content in a string, use StringIO to create a File-like object that IO can read. In StringIO, it doesn't matter if the content is true text, or binary, it's all the same.
Look at either of:
new(string=""[, mode])
Creates new StringIO instance from with string and mode.
open(string=""[, mode]) {|strio| ...}
Equivalent to ::new except that when it is called with a block, it yields with the new instance and closes it, and returns the result which returned from the block.

downloading and storing files from given url to given path in lua

I'm new with lua but working on an application that works on specific files with given path. Now, I want to work on files that I download. Is there any lua libraries or line of codes that I can use for downloading and storing it on my computer ?
You can use the LuaSocket library and its http.request function to download using HTTP from an URL.
The function has two flavors:
Simple call: http.request('http://stackoverflow.com')
Advanced call: http.request { url = 'http://stackoverflow.com', ... }
The simple call returns 4 values - the entire content of the URL in a string, HTTP response code, headers and response line. You can then save the content to a file using the io library.
The advanced call allows you to set several parameters like HTTP method and headers. An important parameter is sink. It represents a LTN12-style sink. For storing to file, you can use sink.file:
local file = ltn12.sink.file(io.open('stackoverflow', 'w'))
http.request {
url = 'http://stackoverflow.com',
sink = file,
}

Sending uploaded file to resque worker to be processed

I just started using resque to do some processing on some very large files in the background, and I'm having trouble figuring out how to pass a file to a resque worker. I use rails to handle the file upload, and rails creates an ActionDispatch::Http::UploadedFile object for each file uploaded from the form.
How do send this file to a resque worker? I tried sending a custom hash of just the pathname of the temporary file and original filename, but I can't reopen the temporary file in the resque worker anymore (just a normal Errno::ENOENT - No such file or directory) because rails seems to delete that temporary file after the request ends.
Http::UploadedFileisn't accessible once the request finishes. You need to write the file somewhere (or use s3 as temp storage). Pass resque the path to the file that you wrote.
I just spent two days trying to do this and finally figured it out. You need to Base64 encode the file so that it can be serialized into json. Then you need to decode it in the worker and create a new
ActionDispatch::Http::UploadedFile
Here's how to encode and pass to resque:
// You only need to encode the actual file, everything else in the
// ActionDispatch::Http::UploadedFile object is just string or a hash of strings
file = params[:file] // Your ActionDispatch::Http::UploadedFile object
file.tempfile.binmode
file.tempfile = Base64.encode64(file.tempfile.read)
Resque.enqueue(QueueWorker, params)
And Here's how to decode and convert back to an object within your worker
class QueueWorker
#queue = :main_queue
def self.perform(params)
file = params['file']
tempfile = Tempfile.new('file')
tempfile.binmode
tempfile.write(Base64.decode64(file['tempfile']))
// Now that the file is decoded you need to build a new
// ActionDispatch::Http::UploadedFile with the decoded tempfile and the other
// attritubes you passed in.
file = ActionDispatch::Http::UploadedFile.new(tempfile: tempfile, filename: file['original_filename'], type: file['content_type'], head: file['headers'])
// This object is now the same as the one in your controller in params[:file]
end
end

Resources