Read a file from github - ruby-on-rails

I want to read a file from a GitHub repository in my Ruby script. Say I want to read the Gemfile from my repo on GitHub, whose URL looks like "http://www.github.com/myrepo/blob/master/Gemfile".
I tried File.readlink("http://www.github.com/myrepo/blob/master/Gemfile"), but it fails with the error "'readlink': No such file or directory @ rb_readlink".
How do I read a file using the GitHub URL?

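Fetch the raw content of the file from raw.githubusercontent.com rather than the HTML "blob" page:
require 'net/http'
# Raw URLs have the form https://raw.githubusercontent.com/<user>/<repo>/<branch>/<path>
uri = URI("https://raw.githubusercontent.com/username/myrepo/master/Gemfile")
file = Net::HTTP.get(uri) # returns the response body as a String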
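If you only have the regular "blob" URL from the browser, it can be rewritten into the raw form first. The following is a minimal sketch, assuming the usual https://github.com/<user>/<repo>/blob/<branch>/<path> layout; raw_github_url is a hypothetical helper name, not part of any library.

```ruby
require 'uri'

# Hypothetical helper: turn a GitHub "blob" page URL into the matching
# raw.githubusercontent.com URL. Assumes the path looks like
# /<user>/<repo>/blob/<branch>/<path>; returns nil otherwise.
def raw_github_url(blob_url)
  uri = URI.parse(blob_url)
  return nil unless uri.host && uri.host.sub(/\Awww\./, '') == 'github.com'

  user, repo, marker, *rest = uri.path.split('/').reject(&:empty?)
  return nil unless marker == 'blob' && !rest.empty?

  "https://raw.githubusercontent.com/#{user}/#{repo}/#{rest.join('/')}"
end
```

The returned string can then be wrapped in URI(...) and passed to Net::HTTP.get.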

With the code below, I was able to read the content of the file.
require 'open-uri'
raw_url = "https://raw.githubusercontent.com/username/myrepo/master/Gemfile"
# On Ruby 3.0 and later, use URI.open; Kernel#open no longer accepts URLs.
URI.open(raw_url) do |f|
  f.each_line { |line| p line }
end

Related

How to fix Errno::ENOENT: No such file or directory @ rb_sysopen - https://jobs.lever.co/stackadapt

I am trying to scrape a website using this tutorial:
https://towardsdatascience.com/job-board-scraping-with-rails-872c432ed2c8
Error: https://i.stack.imgur.com/XZ3T9.jpg
Do you have the line
require 'open-uri'
before doc = Nokogiri::HTML(open(URL))?
open-uri extends Kernel#open, which normally opens only local files, so that it can also open HTTP URLs. Your error suggests that open-uri was not loaded. Note that on Ruby 3.0 and later, open-uri no longer patches Kernel#open, so URI.open has to be called explicitly.
doc = Nokogiri::HTML(URI.open(link))
I added the URI. prefix to the open call, and that fixed it.
This post helped me.

Downloading all files from URL with extension filter

I want to download all files from an FTP or HTTP URL, filtered by extension.
For example, I have a URL that contains many MKV files, and I want to filter by extension so that I download all the MKV files, or all the JPG files.
I can use open-uri for this, but it only downloads and saves one file at a time:
require 'open-uri'
download = URI.open('https://img.webmd.com/dtmcms/live/webmd/consumer_assets/site_images/article_thumbnails/video/caring_for_your_kitten_video/650x350_caring_for_your_kitten_video.jpg')
# Keep the .jpg extension so the local file name matches the downloaded format.
IO.copy_stream(download, '650x350_caring_for_your_kitten_video.jpg')
That's a limitation on the server side. If the server doesn't allow directory listing (most HTTP servers don't), there is not a lot you can do.
So to answer your question: open-uri will not let you list files.
As for FTP, you can list all the files in a directory. I'm not sure whether you can pass in wildcards; if not, use the select method to filter for the filenames you want.
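For the FTP case, listing and filtering might look like the sketch below. It assumes the server answers NLST with plain file names; download_by_extension and its parameters are hypothetical names, while nlst and getbinaryfile are Net::FTP methods. The host and credentials in the usage comment are placeholders.

```ruby
# Hypothetical helper: download every file in the FTP session's current
# directory whose name ends with the given extension (case-insensitive).
# `ftp` is expected to be a connected, logged-in Net::FTP object.
def download_by_extension(ftp, extension, dest_dir = '.')
  wanted = ftp.nlst.select { |name| File.extname(name).casecmp?(extension) }
  wanted.each do |name|
    ftp.getbinaryfile(name, File.join(dest_dir, File.basename(name)))
  end
  wanted # the list of remote names that were fetched
end

# Usage (placeholders, not real servers):
#   require 'net/ftp'
#   ftp = Net::FTP.new
#   ftp.connect('ftp.example.com', 21)
#   ftp.login('user', 'secret')
#   ftp.chdir('/videos')
#   download_by_extension(ftp, '.mkv', 'downloads')
```

Filtering on the client with select avoids relying on server-side wildcard support, which varies between FTP servers.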
I found a solution: collect all the links and write them to a text file, then use a system command to hand that text file to a download manager. For example, to get the .mkv links from a page:
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get("URL")
page.search('a').each do |link|
  href = link['href']
  next if href.nil? # skip anchors without an href attribute
  # Resolve relative links against the current page URL.
  uri = mechanize.resolve(href).to_s
  if uri.include?(".mkv")
    File.open("url.txt", "a") do |file|
      file.puts uri
    end
  end
end
puts "The file has been created!"
# Optionally hand the list to a download manager:
#exec 'idman /d URL'
#puts "Done!"
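Note that uri.include?(".mkv") also matches URLs that merely contain ".mkv" somewhere, for instance in a query string. A stricter check compares the extension of the URL path. The sketch below uses only the standard library (Ruby 2.7 or later for filter_map); links_with_extension is a hypothetical helper, and the URLs in the usage comment are made-up examples.

```ruby
require 'uri'

# Hypothetical helper: resolve each href against base_url and keep only
# those whose path ends with the given extension (case-insensitive).
def links_with_extension(base_url, hrefs, extension)
  hrefs.filter_map do |href|
    begin
      uri = URI.join(base_url, href)
    rescue URI::Error
      next # skip hrefs that are not valid URIs
    end
    uri.to_s if uri.path && File.extname(uri.path).casecmp?(extension)
  end
end

# Usage:
#   links_with_extension("http://example.com/videos/", ["ep1.mkv", "cover.jpg"], ".mkv")
```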

Rails FTP OPEN CSV

I have the following code to connect my rails app to my FTP. This works great. However, I want to use open-uri to open the csv file so I can parse it. Any ideas how to do this? I think it's an easy thing to do but I'm missing something.
require 'net/ftp'
ftp = Net::FTP.new
ftp.connect("xxx.xxx.xx.xxx",21)
ftp.login("xxxxx","xxxx")
ftp.chdir("/")
ftp.passive = true
puts ftp.list("TEST.csv")
You'll need to use #gettextfile.
A) Download the file to a local temporary file and read its content
require 'tmpdir'
# Creating a tmp file can be done differently as well.
# The local file name may also be omitted, in which case `gettextfile`
# will create a file in the current directory.
content = nil # declared outside the block so it is still available afterwards
Dir::Tmpname.create(['TEST', '.csv']) do |file_name|
  ftp.gettextfile('TEST.csv', file_name)
  content = File.read(file_name)
end
B) Pass a block to gettextfile and get the content one line at a time
content = ''
# Passing nil as the local file name skips writing a local copy.
ftp.gettextfile('TEST.csv', nil) do |line|
  content << line << "\n" # lines are yielded without their line terminator
end
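Once the file content is in a string, no open-uri call is needed; the standard csv library can parse it directly. A minimal sketch, assuming the file has a header row (the column names and sample data below are invented for illustration):

```ruby
require 'csv'

# Stand-in for the string built from gettextfile; the data is made up.
content = "name,quantity\nwidget,3\ngadget,5\n"

# headers: true treats the first line as column names, so each row
# can be accessed by header.
rows = CSV.parse(content, headers: true)
rows.each do |row|
  puts "#{row['name']}: #{row['quantity']}"
end
```

All values come back as strings unless converters are requested, so numeric columns need an explicit conversion such as to_i.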

How can I download an image from a website using Rails?

I'm using Selenium-Webdriver, OpenUri and Nokogiri to scrape a website. I want to download a particular image from said website to my Ubuntu computer. I tried a few different methods but each of them gives a different error message.
Here's my base code, which opens the website and gets the image url (everything after this I ran in my pry console):
require 'open-uri'
require 'selenium-webdriver'
require 'nokogiri'
require 'uri'
url = "https://www.google.com/"
browser = Selenium::WebDriver.for :chrome
content = open(url).read
parsed_content = Nokogiri::HTML(content)
image = "https://www.google.com" + parsed_content.css('#hplogo').attr('src').value
binding.pry
1) Here's the first thing I tried to download the image:
download = open(image)
IO.copy_stream(download, '~/image.png')
For this, I got the following error:
Errno::ENOENT: No such file or directory @ rb_sysopen - ~/image.png from (pry):44:in 'initialize'
As per this question, I tried adding a directory in the code:
FileUtils.mkdir_p(image) unless File.exist?(image)
But I got the same error.
2) Next I tried this:
open('image.png', 'wb') do |file|
file << open(image).read
end
and this returns
#<File:image.png (closed)>
but the file isn't anywhere on my computer and I can't figure out what that message means.
3) Next I tried
IO.copy_stream(open(image), 'image.png')
which simply returned this:
5482
but again, I have no idea what that means and the file isn't anywhere.
4) Finally I tried
read_image = open(image).read
File.open(image, 'image.png') do |file|
file.puts read_image
end
which outputs
ArgumentError: invalid access mode image.png
from (pry):53:in 'initialize'
What am I doing wrong? Was I close with any of my approaches?
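The second argument to File.open is the access mode, not the file name, which is why your fourth attempt raises ArgumentError. Pass the file name first and the mode second:
read_image = URI.open(image).read # use URI.open on Ruby 3.0 and later
File.open('image.png', 'wb') do |file| # 'wb' writes the image bytes in binary mode
  file.write read_image
end
Your third variant works.
5482 is the return value of IO.copy_stream, the number of bytes copied, so it is the size of the file. image.png is written to the current working directory, which is normally the directory you ran your .rb file (or started pry) from.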
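The first attempt fails for a different reason: Ruby does not expand ~ in file names (that is done by the shell), so '~/image.png' is looked up relative to a directory literally named ~. File.expand_path resolves it to the home directory. A minimal sketch; save_bytes is a hypothetical helper name:

```ruby
# Hypothetical helper: write binary data to a path that may start with "~".
# Returns the absolute path that was written.
def save_bytes(data, path)
  full_path = File.expand_path(path) # resolves a leading "~" to the home directory
  File.binwrite(full_path, data)     # binary mode, so image bytes are written unchanged
  full_path
end

# Usage with the variables from the question (needs require 'open-uri'):
#   save_bytes(URI.open(image).read, '~/image.png')
```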

Why does using OpenURI to download a file result in a partial file?

I'm trying to use OpenURI to download a file from S3, and then save it locally so I can send the file as an attachment with ActionMailer.
Something strange is going on. The downloaded and attached images are corrupt: the bottom parts of the images are missing.
Here's the code:
require 'open-uri'
open("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}", "wb") do |file|
source_url = a.authenticated_url()
io = open(URI.parse(source_url).to_s)
file << io.read
attachments[a.attachment_file_name] = File.read("#{Rails.root.to_s}/tmp/#{a.attachment_file_name}")
end
a is the attachment from ActionMailer.
What can I try next?
It looks like you're trying to read the file before it's been closed, which could leave part of the file buffer unwritten.
I'd do it like this:
require 'open-uri'
source_url = a.authenticated_url()
attachment_file = "#{Rails.root.to_s}/tmp/#{a.attachment_file_name}"
# The block form closes (and flushes) the file before it is read back below.
File.open(attachment_file, "wb") do |file|
  file.write URI.open(source_url, &:read) # use URI.open on Ruby 3.0 and later
end
attachments[a.attachment_file_name] = File.read(attachment_file)
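The buffering effect can be observed without any network access. In this sketch (the file name and payload are arbitrary), the file is measured once while the write block is still open and once after it has closed. How many bytes are visible in the first measurement depends on Ruby's internal buffering, so only the second one is reliable:

```ruby
require 'tmpdir'

# Write `data` to `path` and report the on-disk size seen
# (1) inside the still-open block and (2) after the block has closed the file.
def sizes_during_and_after_write(path, data)
  size_inside = nil
  File.open(path, 'wb') do |file|
    file.write(data)
    size_inside = File.size(path) # may be smaller: data can still sit in the write buffer
  end
  [size_inside, File.size(path)]  # after close, everything has been flushed
end

Dir.mktmpdir do |dir|
  inside, after = sizes_during_and_after_write(File.join(dir, 'demo.bin'), 'x' * 10_000)
  puts "inside block: #{inside} bytes, after close: #{after} bytes"
end
```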
It looks like source_url = a.authenticated_url() returns a string, so parsing it into a URI and then calling to_s on it is redundant unless URI does some normalizing, which I don't think it does.
Based on my sysadmin experience: a side task is cleaning up the downloaded/spooled files. They could be deleted immediately after being attached, or you could have a cron job that runs daily and deletes all spooled files more than one day old.
Another concern is that there is no error handling in case the URL can't be read, which would cause the attachment to fail. With a temp spool file you could check for the existence of the file. Even better, you should be prepared to handle an exception if the server returns a 400 or 500 error.
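A hedged sketch of that error handling: open-uri raises OpenURI::HTTPError for 4xx and 5xx responses, so the download can be wrapped in a rescue. fetch_or_nil is a hypothetical helper name, and returning nil on failure is just one possible policy.

```ruby
require 'open-uri'
require 'timeout'

# Hypothetical helper: return the response body as a string,
# or nil if the server answers with an HTTP error or cannot be reached.
def fetch_or_nil(url)
  URI.open(url, &:read)
rescue OpenURI::HTTPError => e
  warn "Download failed: #{e.message}" # the message carries the HTTP status line
  nil
rescue SocketError, SystemCallError, Timeout::Error => e
  warn "Could not reach server: #{e.class}"
  nil
end

# Usage with the names from the answer above:
#   data = fetch_or_nil(a.authenticated_url)
#   attachments[a.attachment_file_name] = data if data
```

Returning nil keeps the mailer code simple; re-raising or retrying would be equally valid choices depending on how important the attachment is.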
To avoid using a temporary spool file, try this untested code:
require 'open-uri'
source_url = a.authenticated_url()
attachments[a.attachment_file_name] = URI.open(source_url, &:read) # URI.open is required on Ruby 3.0 and later

Resources