How can I download an image from a website using Rails? - ruby-on-rails

I'm using Selenium-Webdriver, OpenUri and Nokogiri to scrape a website. I want to download a particular image from said website to my Ubuntu computer. I tried a few different methods but each of them gives a different error message.
Here's my base code, which opens the website and gets the image url (everything after this I ran in my pry console):
require 'open-uri'
require 'selenium-webdriver'
require 'nokogiri'
require 'uri'
url = "https://www.google.com/"
browser = Selenium::WebDriver.for :chrome
document = open(url).read
parsed_content = Nokogiri::HTML(document)
image = "https://www.google.com" + parsed_content.css('#hplogo').attr('src').value
binding.pry
1) Here's the first thing I tried to download the image:
download = open(image)
IO.copy_stream(download, '~/image.png')
For this, I got the following error:
Errno::ENOENT: No such file or directory @ rb_sysopen - ~/image.png from (pry):44:in 'initialize'
As per this question, I tried adding a directory in the code:
FileUtils.mkdir_p(image) unless File.exist?(image)
But I got the same error.
2) Next I tried this:
open('image.png', 'wb') do |file|
file << open(image).read
end
and this returns
#<File:image.png (closed)>
but the file isn't anywhere on my computer and I can't figure out what that message means.
3) Next I tried
IO.copy_stream(open(image), 'image.png')
which simply returned this:
5482
but again, I have no idea what that means and the file isn't anywhere.
4) Finally I tried
read_image = open(image).read
File.open(image, 'image.png') do |file|
file.puts read_image
end
which outputs
ArgumentError: invalid access mode image.png
from (pry):53:in 'initialize'
What am I doing wrong? Was I close with any of my approaches?

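The second argument to File.open is the mode the file is opened in, not the file name. Pass the file name first and the mode second:
read_image = open(image).read
# 'wb' opens the file for writing in binary mode, which suits image data.
File.open('image.png', 'wb') do |file|
  file.write read_image
end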
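Your third variant works as written.
5482 is the number of bytes copied, that is, the length of the file. The file 'image.png' is written to the current working directory, which is normally the directory you started your .rb file or pry session from.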
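The first attempt in the question fails for a different reason: Ruby's file methods do not expand "~", so '~/image.png' is looked up as a literal directory named "~". A minimal sketch of one way around that, assuming a hypothetical helper called save_stream and a StringIO standing in for the downloaded image (neither is part of the original code), could look like this:

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper (not from the original post): copies everything from
# `io` to `path` and returns the absolute path written plus the byte count.
def save_stream(io, path)
  absolute = File.expand_path(path) # expands "~" and relative paths
  bytes = IO.copy_stream(io, absolute)
  [absolute, bytes]
end

# Stand-in for the download; in the question this would be open(image).
fake_download = StringIO.new('not really a PNG')

# A temporary directory keeps the example from touching real files.
Dir.mktmpdir do |dir|
  path, bytes = save_stream(fake_download, File.join(dir, 'image.png'))
  puts "wrote #{bytes} bytes to #{path}"
end
```

Calling save_stream(open(image), '~/image.png') should then write into the home directory, and printing Dir.pwd shows where a bare 'image.png' would end up.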

Related

Rails FTP OPEN CSV

I have the following code to connect my Rails app to my FTP server. This works great. However, I want to use open-uri to open the CSV file so I can parse it. Any ideas how to do this? I think it's an easy thing to do but I'm missing something.
require 'net/ftp'
ftp = Net::FTP.new
ftp.connect("xxx.xxx.xx.xxx",21)
ftp.login("xxxxx","xxxx")
ftp.chdir("/")
ftp.passive = true
puts ftp.list("TEST.csv")
You'll need to use #gettextfile.
A) Download the file to a local temporary file and read its content
# Creating a tmp file can be done differently as well.
# The local file name may also be omitted, in which case `gettextfile`
# will create a file in the current directory.
require 'tmpdir'
content = nil
Dir::Tmpname.create(['TEST', '.csv']) do |file_name|
  ftp.gettextfile('TEST.csv', file_name)
  content = File.read(file_name)
end
B) Pass a block to gettextfile and get the content one line at a time
content = ''
ftp.gettextfile('TEST.csv') do |line|
  # Each line is yielded without its line terminator, so add it back.
  content << line << "\n"
end
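Once the file's text is in a string, parsing it does not need open-uri at all. As a rough sketch, assuming the CSV has a header row (the column names and rows below are made up for illustration, not taken from the original TEST.csv), the standard csv library could be used like this:

```ruby
require 'csv'

# Stand-in for the text fetched with gettextfile; these columns are invented.
content = "id,name\n1,Alice\n2,Bob\n"

# headers: true treats the first line as column names.
rows = CSV.parse(content, headers: true)
rows.each do |row|
  puts "#{row['id']}: #{row['name']}"
end
```

With headers: true each row can be indexed by column name; without it, CSV.parse returns plain arrays of strings.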

Heroku: Unpacking a Gzip file through a rake task fails

I'm using Rails 5.2 with ruby 2.5.1 and am deploying my app to Heroku.
I ran into problems when I tried running my rake task. The task calls an API that responds with a *.gz file, saves it, unzips it, uses the retrieved JSON to populate the database, and finally deletes the *.gz file. The task runs smoothly in development but fails when called in production. The last line printed to the console is 'Unzipping the file...', so my guess is that the issue originates in the zlib library.
companies_list.rake
require 'json'
require 'open-uri'
require 'zlib'
require 'openssl'
require 'action_view'
include ActionView::Helpers::DateHelper
desc 'Updates Company table'
task update_db: :environment do
start = Time.now
zip_file_url = 'https://example.com/api/download'
TEMP_FILE_NAME = 'companies.gz'
puts 'Creating folders...'
tempdir = Dir.mktmpdir
file_path = "#{tempdir}/#{TEMP_FILE_NAME}"
puts 'Downloading the file...'
open(file_path, 'wb') do |file|
open(zip_file_url, { ssl_verify_mode: OpenSSL::SSL::VERIFY_NONE }) do |uri|
file.write(uri.read)
end
end
puts 'Download complete.'
puts 'Unzipping the file...'
gz = Zlib::GzipReader.new(open(file_path))
output = gz.read
@companies_array = JSON.parse(output)
puts 'Unzipping complete.'
(...)
end
Has anyone else run into similar issues and knows how to get it to work?
Your code snippet does not show that you ever close your GzipReader. It is usually best to wrap IO objects in blocks so they are closed reliably. Also, Zlib::GzipReader.new expects an already opened IO object rather than a path, so let GzipReader open the file itself by passing file_path to Zlib::GzipReader.open:
Zlib::GzipReader.open(file_path) do |gz|
  output = gz.read
  @companies_array = JSON.parse(output)
end
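The block form can be tried in isolation. The following is a small self-contained sketch: read_gzipped_json is a hypothetical helper name, and the sample data is invented so the example can create its own .gz file instead of calling the API.

```ruby
require 'zlib'
require 'json'
require 'tmpdir'

# Hypothetical helper: reads a gzip-compressed JSON file and parses it.
# GzipReader.open closes the file when the block finishes.
def read_gzipped_json(file_path)
  Zlib::GzipReader.open(file_path) do |gz|
    JSON.parse(gz.read)
  end
end

Dir.mktmpdir do |dir|
  file_path = File.join(dir, 'companies.gz')

  # Create a small sample archive so the example runs without the API.
  Zlib::GzipWriter.open(file_path) do |gz|
    gz.write(JSON.generate([{ 'name' => 'Acme' }]))
  end

  companies = read_gzipped_json(file_path)
  puts companies.first['name']
end
```

Note that gz.read still loads the whole decompressed payload into memory, so this addresses the unclosed reader but not memory limits.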
The issue was linked to memory limit rather than Gzip unpacking (that's why the problem only occurred in production).
The solution was to use Json::Streamer so that the whole file is not loaded into memory at once.
This is the crucial part: (goes after the code posted in the question)
puts 'Updating the Company table...'
streamer = Json::Streamer.parser(file_io: file, chunk_size: 1024) # customize your chunk_size
streamer.get(nesting_level: 1) do |company|
# do your stuff with the API data here...
end
end

Opening a remote image on S3 in controller (ApplicationMailer) - No such file or directory @ rb_sysopen

I am trying to open a profile picture uploaded to S3 through Paperclip, in order to add it as an inline attachment to an email.
However, I get this error:
No such file or directory @ rb_sysopen
Here is my bit of code in question :
attachments.inline['profilepic'] = File.read(profilepic)
profilepic is an absolute URL (starting with //mybucket.S3-eu-west..... ) to the image on S3 (when pasted into the browser's address bar, it shows the image perfectly).
I also tried the following with open-uri, but got the same error:
require 'open-uri'
attachments.inline['profilepic'] = open(profilepic)
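As you did, first require open-uri:
require 'open-uri'
Then add a scheme to the URL. profilepic is protocol-relative (it starts with //), and open-uri only handles strings that begin with a scheme such as http: or https:; anything else is treated as a local file path, which is why you get the error. Read the response body and attach it:
uri = URI("http:" + profilepic.to_s)
attachments["profilepic"] = open(uri).read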
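The scheme fix can be isolated into a small function. This is only a sketch: absolute_url is a hypothetical helper, the host name is a placeholder, and defaulting to https rather than the http used above is an assumption about what the bucket supports.

```ruby
require 'uri'

# Hypothetical helper: turns a protocol-relative URL ("//host/path") into an
# absolute URI so that open-uri or Net::HTTP can fetch it. URLs that already
# carry a scheme are passed through unchanged.
def absolute_url(url, scheme: 'https')
  url = url.to_s
  url.start_with?('//') ? URI("#{scheme}:#{url}") : URI(url)
end

# The host below is a placeholder, not the real bucket.
uri = absolute_url('//bucket.example.com/avatars/1.png')
puts uri # https://bucket.example.com/avatars/1.png
```

The resulting URI object can be passed to open from open-uri in the same way as the uri variable in the answer.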

Read a file from github

I want to read a file from a GitHub repository in my Ruby script. Say I want to read the Gemfile from my repo on GitHub, whose URL would be something like "http://www.github.com/myrepo/blob/master/Gemfile".
I tried using File.readlink("http://www.github.com/myrepo/blob/master/Gemfile"), but this gives me an error saying "'readlink': No such file or directory @ rb_readlink".
How do I read a file using the github URL?
Fetch the raw content of the file from GitHub rather than the HTML page, for example:
require 'net/http'
uri = "https://raw.githubusercontent.com/username/myrepo/master/Gemfile"
uri = URI(uri)
file = Net::HTTP.get(uri)
With the below code, I was able to read the content of the file.
require 'open-uri'
raw_url = "https://raw.githubusercontent.com/username/myrepo/master/Gemfile"
open(raw_url) {|f|
f.each_line {|line| p line}
}
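Both answers rely on the raw.githubusercontent.com form of the URL rather than the "blob" page URL. As an illustrative sketch, a hypothetical helper named raw_github_url (not part of either answer) could derive one from the other, assuming the usual github.com/user/repo/blob/branch/path layout:

```ruby
require 'uri'

# Hypothetical helper: maps a github.com "blob" page URL to the matching
# raw.githubusercontent.com URL. Returns nil if the URL has another shape.
def raw_github_url(blob_url)
  uri = URI(blob_url)
  return nil unless uri.host.to_s.sub(/\Awww\./, '') == 'github.com'

  # Path segments look like: user / repo / "blob" / branch / file path...
  user, repo, kind, *rest = uri.path.split('/').reject(&:empty?)
  return nil unless kind == 'blob' && rest.length >= 2

  "https://raw.githubusercontent.com/#{[user, repo, *rest].join('/')}"
end

puts raw_github_url('https://github.com/username/myrepo/blob/master/Gemfile')
# https://raw.githubusercontent.com/username/myrepo/master/Gemfile
```

The helper only rewrites the string; fetching the result still goes through Net::HTTP or open-uri as shown above.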

Why does using OpenURI to download a file result in a partial file?

I'm trying to use OpenURI to download a file from S3, and then save it locally so I can send the file as an attachment with ActionMailer.
Something strange is going on. The images being downloaded and attached are corrupt: the bottom parts of the images are missing.
Here's the code:
require 'open-uri'
open("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}", "wb") do |file|
source_url = a.authenticated_url()
io = open(URI.parse(source_url).to_s)
file << io.read
attachments[a.attachment_file_name] = File.read("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}")
end
a is the attachment from ActionMailer.
What can I try next?
It looks like you're trying to read the file before it's been closed, which could leave part of the file buffer unwritten.
I'd do it like this:
require 'open-uri'
source_url = a.authenticated_url()
attachment_file = "#{Rails.root.to_s}/tmp/#{a.attachment_file_name}"
open(attachment_file, "wb") do |file|
file.print open(source_url, &:read)
end
attachments[a.attachment_file_name] = File.read(attachment_file)
It looks like source_url = a.authenticated_url() will be a string, so parsing the string into a URI then doing to_s on it will be redundant unless URI is doing some normalizing, which I don't think it does.
Based on my sysadmin experience: A side task is cleaning up the downloaded/spooled files. They could be deleted immediately after being attached, or you could have a cron job that runs daily, deleting all spooled files over one day old.
An additional concern for this is there is no error handling in case the URL can't be read, causing the attachment to fail. Using a temp spool file you could check for the existence of the file. Even better, you should probably be prepared to handle an exception if the server returns a 400 or 500 error.
To avoid using a temporary spool file try this untested code:
require 'open-uri'
source_url = a.authenticated_url()
attachments[a.attachment_file_name] = open(source_url, &:read)
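The error handling mentioned above could be sketched roughly as follows. fetch_attachment and its opener parameter are hypothetical names introduced for illustration, and the list of rescued exceptions is an assumption about which failures matter. URI.open is used because, from Ruby 3.0 on, Kernel#open no longer opens URLs.

```ruby
require 'open-uri'
require 'stringio'

# Hypothetical helper: returns the downloaded bytes, or nil when the server
# answers with an HTTP error status (4xx/5xx) or the connection fails.
# `opener` exists only so the download can be swapped out in tests.
def fetch_attachment(source_url, opener: URI.method(:open))
  opener.call(source_url, &:read)
rescue OpenURI::HTTPError, SocketError, Errno::ECONNREFUSED => e
  warn "Could not download #{source_url}: #{e.message}"
  nil
end

# Possible use inside the mailer (not executed here):
#   data = fetch_attachment(a.authenticated_url)
#   attachments[a.attachment_file_name] = data if data
```

Returning nil keeps the mailer code simple; depending on the application it may be preferable to re-raise so the failure is not silently skipped.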
