Read a GB2312-encoded page using Ruby - ruby-on-rails

I am trying to parse a GB2312-encoded page (http://news.qq.com/a/20140824/015032.htm).
I have not reached the parsing step yet; simply opening and reading the page raises an error.
This is my code:
require 'open-uri'
open("http://news.qq.com/a/20140824/015032.htm").read
And this is the error:
Encoding::InvalidByteSequenceError: "\x8B" on GB2312
I am using Ruby 2.0.0p247
Any solution?

I don't know exactly why this happens when calling .read, but you can work around it if you are using Nokogiri. Just pass the file object directly to Nokogiri without calling .read:
require 'open-uri'
require 'nokogiri'
file = open("http://news.qq.com/a/20140824/015032.htm")
document = Nokogiri(file)

I cannot reproduce the error on 2.0.0p247. This:
require 'open-uri'
open("http://news.qq.com/a/20140824/015032.htm").read
works fine.
However, this:
require 'open-uri'
open("http://news.qq.com/a/20140824/015032.htm").read.encode('utf-8')
raises the error:
Encoding::InvalidByteSequenceError: "\x8B" on GB2312
Are you trying to do some encoding conversion?

You can try this:
document = Nokogiri::HTML(open("http://news.qq.com/a/20140824/015032.htm"), nil, "GB18030")
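A likely reason the GB18030 suggestion helps: GB18030 is a superset of GB2312, and the byte 0x8B from the error message lies outside GB2312's lead-byte range but inside GB18030's. If you want a UTF-8 string rather than a Nokogiri document, the same idea can be sketched as below. The helper name gb_to_utf8 and the sample bytes are assumptions for illustration, not part of the original answers, and I have not checked that every byte on that page is valid GB18030, so the sketch replaces anything it cannot convert with "?".

```ruby
# Hypothetical helper (not from the original answers): treat bytes that were
# labelled GB2312 as GB18030, then transcode them to UTF-8.
# Bytes that are invalid or have no UTF-8 mapping become "?".
def gb_to_utf8(bytes)
  bytes.dup
       .force_encoding(Encoding::GB18030)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '?')
end

# Sample input: the two characters U+4E2D U+6587 encoded as GB2312 bytes.
# With open-uri, the input would be the string returned by .read.
sample = "\xD6\xD0\xCE\xC4".b
puts gb_to_utf8(sample)
```

Relabelling the bytes with force_encoding before calling encode is the important step: encode alone converts from whatever encoding the string is currently tagged with.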

Related

How to fix Errno::ENOENT: No such file or directory @ rb_sysopen - https://jobs.lever.co/stackadapt

I am trying to scrape a website using this tutorial:
https://towardsdatascience.com/job-board-scraping-with-rails-872c432ed2c8
Error: https://i.stack.imgur.com/XZ3T9.jpg
Do you have the line
require 'open-uri'
before doc = Nokogiri::HTML(open(URL))?
open-uri extends Kernel#open, which normally reads only local files, so that it can also open HTTP URLs. Your error suggests that open-uri was not loaded.
doc = Nokogiri::HTML(URI.open(link))
Adding URI. in front of open fixed it for me.
This post helped me
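Some background on why URI. matters: since Ruby 3.0, requiring open-uri no longer makes Kernel#open accept URLs, so open(url) looks for a local file with that name and raises Errno::ENOENT. A minimal sketch of the URI.open form follows; the helper name fetch_page is an assumption for illustration, and the Nokogiri call is left as a comment because it needs the nokogiri gem and a network connection.

```ruby
require 'open-uri'

# Hypothetical helper: read the whole body through URI.open rather than
# Kernel#open. The block form closes the underlying IO when it finishes.
def fetch_page(url)
  URI.open(url) { |io| io.read }
end

# Usage (performs a network request, so it is commented out here):
# html = fetch_page('https://jobs.lever.co/stackadapt')
# doc  = Nokogiri::HTML(html)
```

URI.open still falls back to the regular file open for strings that are not URLs, so the same helper also works on local paths.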

uninitialized constant Spreadsheet::Link - roo gem - xlsx file

When I open a file in xlsx format, using empact/roo gem, this line of code:
data = Roo::Spreadsheet.open("/Users/asd/Desktop/in_xlsx.xlsx", extensions: :xlsx)
or this line
data = Roo::Excelx.new("/Users/asd/Desktop/in_xlsx.xlsx")
works perfectly (at least I think so).
data is now a Roo::Excelx object with the columns and rows filled correctly.
But whenever I try to use a method like data.first_row or data.cell(1,1), I get this
NameError: uninitialized constant Spreadsheet::Link
from /Users/asd/.rvm/gems/ruby-2.0.0-p353@ch/gems/roo-1.13.2/lib/roo/excelx.rb:379:in `set_cell_values'
Additional info:
MacOS 10.9.1
Rails 4.0.2
Ruby 2.0.0-p353
Roo (1.13.2)
Any help is really appreciated!
Try this:
require 'rubygems'
require 'roo'
For more information, see http://roo.rubyforge.org/

NameError in Excel: uninitialized constant in 'roo'

I am trying to read an Excel file in Ruby on Rails.
I wrote the following code to read cell contents from the Excel sheet.
def test
  require 'rubygems'
  require 'iconv'
  require 'roo'
  s = Excel.new("C:/Sites/hmmsapp/Book1.xls")
  s.default_sheet = s.sheets.first
  1.upto(4) do |line|
    roll = s.cell(line,'A')
    puts "#{roll} -------------"
  end
end
But running it always gives me this error:
NameError in HostelController#test
uninitialized constant HostelController::Excel
I have also required iconv, as suggested for this problem, but the error is unchanged.
How can I get rid of this error and read the Excel file properly?
Try Roo::Excel.new, since the class lives in the Roo namespace.
Or use Roo::Spreadsheet.open, which picks the reader from the file extension.

Load URLs from an Excel file and open them using a Ruby script

I am very new to Ruby and have only basic knowledge of it.
I have a list of pages (URLs) saved in an Excel file. I want to read those URLs and load each page in a browser. Below is the basic script I wrote for page loading.
require 'rubygems'
require 'watir'
ie = Watir::IE.new
ie.goto("http://google.com")
ie.goto("http://www.softwaretestinghelp.com/")
ie.goto("http://www.onestoptesting.com/manual-testing/")
ie.goto("https://facebook.com")
Please help to get it done.
Thanks and Regards,
Vinayaka M N
You can do it with the Nokogiri gem and an XML or JSON file.
For example:
require 'watir-webdriver'
require 'nokogiri'
browser = Watir::Browser.new :firefox
doc = Nokogiri::XML(File.open('your_file.xml'))
url = doc.at('//url').text
browser.goto(url)
Your XML file must contain an element like this:
<url>http://google.com</url>
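If you have several URLs, a simpler route than XML may be to export the Excel column to a plain text file with one URL per line. Note that this is a different technique from the XML approach in the answer: it uses only Ruby's standard library. The helper name load_urls and the file name urls.txt are assumptions for illustration.

```ruby
# Hypothetical helper: read one URL per line from a text file,
# trimming surrounding whitespace and skipping blank lines.
def load_urls(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Usage with a Watir browser object like the one created in the answer
# (commented out because it needs the watir gem and a browser driver):
# load_urls('urls.txt').each { |url| browser.goto(url) }
```

Keeping the file reading in its own method means the browser loop stays a one-liner, and the list can be checked without launching a browser.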

Ruby file_get_contents equivalent

I need to use this in my rails program so I can get the image contents and then base64 it. I know how to base64 it but I just don't know how I would get the image. Anyone know how?
Edited to retrieve from external URL:
PHP:
$image = file_get_contents("http://www.example.com/file.png");
Ruby:
require 'net/http'
image = Net::HTTP.get_response(URI.parse("http://www.example.com/file.png")).body
For http/https/ftp you can use the OpenURI module:
require "open-uri"
image = open("http://www.example.com/file.png").read
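Since the goal is to Base64-encode the image, the fetched bytes can be passed straight to an encoder. This is a minimal sketch; the helper name encode_image is an assumption for illustration, and it uses Array#pack so that nothing beyond core Ruby is needed.

```ruby
# Hypothetical helper: Base64-encode raw bytes without inserting line breaks.
# [bytes].pack('m0') gives the same output as Base64.strict_encode64(bytes).
def encode_image(bytes)
  [bytes].pack('m0')
end

puts encode_image('hello')  # prints "aGVsbG8="

# Usage with the bytes fetched above (network request, so commented out):
# encoded = encode_image(image)
```

The 'm0' directive matters when the result goes into a data URI or a JSON field: plain 'm' inserts a newline every 60 encoded characters.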
