ROR/Hpricot: parsing a site and searching/comparing strings with regex - ruby-on-rails

I just started with Ruby On Rails, and want to create a simple web site crawler which:
Goes through all the Sherdog fighters' profiles.
Gets the Referees' names.
Compares names with the old ones (both during the site parsing and from the file).
Prints and saves all the unique names to the file.
An example URL is: http://www.sherdog.com/fighter/Fedor-Emelianenko-1500
I am searching for tag entries like <span class="sub_line">Dan Miragliotta</span>. Unfortunately, in addition to the referee names I need, the same class is also used for:
The date.
"N/A" when the referee name is not known.
I need to discard all results containing the string "N/A", as well as any string which contains numbers. I managed to do the first part but couldn't figure out how to do the second. I tried searching, thinking and experimenting but, after experimenting and rewriting, I managed to break the whole program and don't know how to (properly) fix it:
require 'rubygems'
require 'hpricot'
require 'simplecrawler'
# Set up a new crawler
sc = SimpleCrawler::Crawler.new("http://www.sherdog.com/fighter/Fedor-Emelianenko-1500")
sc.maxcount = 1
sc.include_patterns = [".*/fighter/.*$", ".*/events/.*$", ".*/organizations/.*$", ".*/stats/fightfinder\?association/.*$"]
# The crawler yields a Document object for each visited page.
sc.crawl { |document|
  # Parse page title with Hpricot and print it
  hdoc = Hpricot(document.data)
  (hdoc/"td/span[@class='sub_line']").each do |span|
    if span.inner_html == 'N/A' || Regexp.new(".*/\d\.*$").match(span.inner_html)
      # puts "Test"
    else
      puts span.inner_html
      # File.open("File_name.txt", 'a') {|f| f.puts(hdoc.span.inner_html) }
    end
  end
}
I would also appreciate help with ideas on the rest of the program: How do I properly read the current names from the file, if the program is run multiple times, and how do I make the comparisons for the unique names?
Edit:
After some proposed improvements, here is what I got:
require 'rubygems'
require 'simplecrawler'
require 'nokogiri'
#require 'open-uri'
sc = SimpleCrawler::Crawler.new("http://www.sherdog.com/fighter/Fedor-Emelianenko-1500")
sc.maxcount = 1
sc.crawl { |document|
  doc = Nokogiri::HTML(document.data)
  names = doc.css('td:nth-child(4) .sub-line').map(&:content).uniq.reject { |c| c == 'N/A' }
  puts names
}
Unfortunately, the code still doesn't work - it returns a blank.
If, instead of doc = Nokogiri::HTML(document.data), I write doc = Nokogiri::HTML(open(document.data)), then it gives me the whole page, but parsing still doesn't work.

hpricot isn't maintained anymore. How about using nokogiri instead?
names = document.css('td:nth-child(4) .sub-line').map(&:content).uniq.reject { |c| c == 'N/A' }
=> ["Yuji Shimada", "Herb Dean", "Dan Miragliotta", "John McCarthy"]
A breakdown of the different parts:
document.css('td:nth-child(4) .sub-line')
This returns an array of HTML elements with the class name sub-line that are in the fourth table column.
.map(&:content)
For each element in the previous array, return element.content (the inner html). This is equivalent to map { |element| element.content }.
.uniq
Remove duplicate values from the array.
.reject { |c| c == 'N/A' }
Remove elements whose value is "N/A".
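To see the last two steps in isolation, here they are applied to a plain array standing in for the mapped contents (sample values, not taken from the page):
contents = ["Herb Dean", "N/A", "Herb Dean", "Dan Miragliotta"]
contents.uniq.reject { |c| c == 'N/A' }
#=> ["Herb Dean", "Dan Miragliotta"]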

You would use array math (-) to compare them:
# get referees from the current page
current_referees = doc.search('td[4] .sub_line').map(&:inner_text).uniq - ['N/A']
# read old referees from the file
old_referees = File.read('old_referees.txt').split("\n")
# use Array#- to compare them
new_referees = current_referees - old_referees
# write the new file
File.open('new_referees.txt','w'){|f| f << new_referees * "\n"}

This will return all the names, ignoring dates and "N/A":
puts doc.css('td span.sub_line').map(&:content).reject{ |s| s['/'] }.uniq
It results in:
Yuji Shimada
Herb Dean
Dan Miragliotta
John McCarthy
Adding these to a file and removing duplicates is left as an exercise for you, but I'd use some magical combination of File.readlines, sort and uniq followed by a bit of File.open to write the results.
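A sketch of one such combination, assuming one name per line in a referees.txt file (the filename is only an example, not part of the answer above):
names    = doc.css('td span.sub_line').map(&:content).reject { |s| s['/'] }.uniq
existing = File.exist?('referees.txt') ? File.readlines('referees.txt').map(&:chomp) : []
File.open('referees.txt', 'w') { |f| f.puts((existing + names).sort.uniq) }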

Here is the final answer:
require 'rubygems'
require 'simplecrawler'
require 'nokogiri'
require 'open-uri'
# Mute log messages
module SimpleCrawler
  class Crawler
    def log(message)
    end
  end
end
n = 0 # Counts how many pages/profiles have been processed
sc = SimpleCrawler::Crawler.new("http://www.sherdog.com/fighter/Fedor-Emelianenko-1500")
sc.maxcount = 150000
sc.include_patterns = [".*/fighter/.*$", ".*/events/.*$", ".*/organizations/.*$", ".*/stats/fightfinder\?association/.*$"]
old_referees = File.read('referees.txt').split("\n")
sc.crawl { |document|
  doc = Nokogiri::HTML(document.data)
  current_referees = doc.search('td[4] .sub_line').map(&:text).uniq - ['N/A']
  new_referees = current_referees - old_referees
  n += 1
  # If new referees were found, print statistics
  if !new_referees.empty? then
    puts n.to_s + ". " + new_referees.length.to_s + " new : " + new_referees.to_s + "\n"
  end
  new_referees = new_referees + old_referees
  old_referees = new_referees.uniq
  old_referees.reject!(&:empty?)
  # Performance optimization. Saves only every 10th profile.
  if n % 10 == 0 then
    File.open('referees.txt','w'){|f| f << old_referees * "\n" }
  end
}
File.open('referees.txt','w'){|f| f << old_referees * "\n" }

Related

Move data into 3 Separate Hashes inside loop in ruby

It's only my second post and I'm still learning ruby.
I'm trying to figure this out based on my Java knowledge but I can't seem to get it right.
What I need to do is:
I have a function that reads a file line by line and extracts different car features from each line, for example:
def convertListings2Catalogue (fileName)
  f = File.open(fileName, "r")
  f.each_line do |line|
    km = line[/[0-9]+km/]
    t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
    trans = ....
  end
end
Now for each line I need to store the extracted features into separate
hashes that I can access later in my program.
The issues I'm facing:
1) I'm overwriting the features in the same hash
2) Can't access the hash outside my function
That's what's in my file:
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC,
Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated
Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors,
Keyless Entry},19BF723A,2018,LE
Now my function extracts the features of each car model, but I need to store these features in 3 different hashes with the same keys but different values.
listing = Hash.new(0)
listing = { kilometers: km, type: t, transmission: trans, drivetrain: dt, status: status, car_maker: car_maker }
I tried moving the data from one hash to another; I even tried storing the data in an array first and then moving it to the hash, but I still can't figure out how to create separate hashes inside a loop.
Thanks
I don't fully understand the question but I thought it was important to suggest how you might deal with a more fundamental issue: extracting the desired information from each line of the file in an effective and Ruby-like manner. Once you have that information, in the form of an array of hashes, one hash per line, you can do with it what you want. Alternatively, you could loop through the lines in the file, constructing a hash for each line and performing any desired operations before going on to the next line.
Being new to Ruby you will undoubtedly find some of the code below difficult to understand. If you persevere, however, I think you will be able to understand all of it, and in the process learn a lot about Ruby. I've made some suggestions in the last section of my answer to help you decipher the code.
Code
words_by_key = {
  type: %w| sedan coupe hatchback station suv |,
  transmission: %w| auto manual steptronic |,
  drivetrain: %w| fwd rwd awd |,
  status: %w| used new |,
  car_maker: %w| honda toyota mercedes bmw lexus |,
  model: %w| camry clk crv |
}
#=> {:type=>["sedan", "coupe", "hatchback", "station", "suv"],
# :transmission=>["auto", "manual", "steptronic"],
# :drivetrain=>["fwd", "rwd", "awd"],
# :status=>["used", "new"],
# :car_maker=>["honda", "toyota", "mercedes", "bmw", "lexus"],
# :model=>["camry", "clk", "crv"]}
WORDS_TO_KEYS = words_by_key.each_with_object({}) { |(k,v),h| v.each { |s| h[s] = k } }
#=> {"sedan"=>:type, "coupe"=>:type, "hatchback"=>:type, "station"=>:type, "suv"=>:type,
# "auto"=>:transmission, "manual"=>:transmission, "steptronic"=>:transmission,
# "fwd"=>:drivetrain, "rwd"=>:drivetrain, "awd"=>:drivetrain,
# "used"=>:status, "new"=>:status,
# "honda"=>:car_maker, "toyota"=>:car_maker, "mercedes"=>:car_maker,
# "bmw"=>:car_maker, "lexus"=>:car_maker,
# "camry"=>:model, "clk"=>:model, "crv"=>:model}
module ExtractionMethods
  def km(str)
    str[/\A\d+(?=km\z)/]
  end
  def year(str)
    str[/\A\d{4}\z/]
  end
  def stock(str)
    return nil if str.end_with?('km')
    str[/\A\d+\p{Alpha}\p{Alnum}*\z/]
  end
  def trim(str)
    str[/\A\p{Alpha}{2}\z/]
  end
  def fuel_consumption(str)
    str.to_f if str[/\A\d+(?:\.\d+)?(?=l\/100km\z)/]
  end
end
class K
  include ExtractionMethods
  def extract_hashes(fname)
    File.foreach(fname).with_object([]) do |line, arr|
      line = line.downcase
      idx_left = line.index('{')
      idx_right = line.index('}')
      if idx_left && idx_right
        g = { set_of_features: line[idx_left..idx_right] }
        line[idx_left..idx_right] = ''
        line.squeeze!(',')
      else
        g = {}
      end
      arr << line.split(',').each_with_object(g) do |word, h|
        word.strip!
        if WORDS_TO_KEYS.key?(word)
          h[WORDS_TO_KEYS[word]] = word
        else
          ExtractionMethods.instance_methods.find do |m|
            v = public_send(m, word)
            (h[m] = v) unless v.nil?
            v
          end
        end
      end
    end
  end
end
Example
data =<<BITTER_END
65101km,Sedan,Manual,2010,18131A,FWD,Used,5.5L/100km,Toyota,camry,SE,{AC, Heated Seats, Heated Mirrors, Keyless Entry}
coupe,1100km,auto,RWD, Mercedec,CLK,LX ,18FO724A,2017,{AC, Heated Seats, Heated Mirrors, Keyless Entry, Power seats},6L/100km,Used
AWD,SUV,0km,auto,new,Honda,CRV,8L/100km,{Heated Seats, Heated Mirrors, Keyless Entry},19BF723A,2018,LE
BITTER_END
FILE_NAME = 'temp'
File.write(FILE_NAME, data)
#=> 353 (characters written to file)
k = K.new
#=> #<K:0x00000001c257d348>
k.extract_hashes(FILE_NAME)
#=> [{:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry}",
# :km=>"65101", :type=>"sedan", :transmission=>"manual", :year=>"2010",
# :stock=>"18131a", :drivetrain=>"fwd", :status=>"used", :fuel_consumption=>5.5,
# :car_maker=>"toyota", :model=>"camry", :trim=>"se"},
# {:set_of_features=>"{ac, heated seats, heated mirrors, keyless entry, power seats}",
# :type=>"coupe", :km=>"1100", :transmission=>"auto", :drivetrain=>"rwd",
# :model=>"clk", :trim=>"lx", :stock=>"18fo724a", :year=>"2017",
# :fuel_consumption=>6.0, :status=>"used"},
# {:set_of_features=>"{heated seats, heated mirrors, keyless entry}",
# :drivetrain=>"awd", :type=>"suv", :km=>"0", :transmission=>"auto",
# :status=>"new", :car_maker=>"honda", :model=>"crv", :fuel_consumption=>8.0,
# :stock=>"19bf723a", :year=>"2018", :trim=>"le"}]
Explanation
Firstly, note that the HEREDOC needs to be un-indented before being executed.
You will see that the instance method K#extract_hashes uses IO#foreach to read the file line-by-line.1
The first step in processing each line of the file is to downcase it. You will then want to split the string on commas to form an array of words. There is a problem, however, in that you don't want to split on commas that are between a left and right brace ({ and }), which corresponds to the key :set_of_features. I decided to deal with that by determining the indices of the two braces, creating a hash with the single key :set_of_features, deleting that substring from the line and lastly replacing a resulting pair of adjacent commas with a single comma:
idx_left = line.index('{')
idx_right = line.index('}')
if idx_left && idx_right
  g = { set_of_features: line[idx_left..idx_right] }
  line[idx_left..idx_right] = ''
  line.squeeze!(',')
else
  g = {}
end
See String for the documentation of the String methods used here and elsewhere.
We can now convert the resulting line to an array of words by splitting on the commas. If any capitalization is desired in the output this should be done after the hashes have been constructed.
We will build on the hash { set_of_features: line[idx_left..idx_right] } just created. When complete, it will be appended to the array being returned.
Each element (word) in the array is then processed. If it is a key of the hash WORDS_TO_KEYS we set
h[WORDS_TO_KEYS[word]] = word
and are finished with that word. If not, we execute each of the instance methods m in the module ExtractionMethods until one is found for which m[word] is not nil. When that is found another key-value pair is added to the hash h:
h[m] = word
Notice that the name of each instance method in ExtractionMethods, which is a symbol (e.g., :km), is a key in the hash h. Having separate methods facilitates debugging and testing.
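Because each extractor is an ordinary method taking a string, you can exercise them in isolation, for example in irb (a quick sanity check, not part of the original answer):
include ExtractionMethods
km("65101km")                   #=> "65101"
year("2010")                    #=> "2010"
stock("18131a")                 #=> "18131a"
fuel_consumption("5.5l/100km")  #=> 5.5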
I could have written:
if (s = km(word))
  s
elsif (s = year(word))
  s
elsif (s = stock(str))
  s
elsif (s = trim(str))
  s
elsif (s = fuel_consumption(str))
  s
end
but since all these methods take the same argument, word, we can instead use Object#public_send:
a = [:km, :year, :stock, :trim, :fuel_consumption]
a.find do |m|
  v = public_send(m, word)
  (h[m] = v) unless v.nil?
  v
end
A final tweak is to put all the methods in the array a in a module ExtractionMethods and include that module in the class K. We can then replace a in the find expression above with ExtractionMethods.instance_methods. (See Module#instance_methods.)
Suppose now that the data are changed so that additional fields are added (e.g., for "colour" or "price"). Then the only modifications to the code required are changes to words_by_key and/or the addition of methods to ExtractionMethods.
Understanding the code
It may be helpful to run the code with puts statements inserted. For example,
idx_left = line.index('{')
idx_right = line.index('}')
puts "idx_left=#{idx_left}, idx_right=#{idx_right}"
Where code is chained it may be helpful to break it up with temporary variables and insert puts statements. For example, change
arr << line.split(',').each_with_object(g) do |word, h|
...
to
a = line.split(',')
puts "line.split(',')=#{a}"
enum = a.each_with_object(g)
puts "enum.to_a=#{enum.to_a}"
arr << enum do |word, h|
...
The second puts here is merely to see what elements the enumerator enum will generate and pass to the block.
Another way of doing that is to use the handy method Object#tap, which is inserted between two methods:
arr << line.split(',').tap { |a| puts "line.split(',')=#{a}"}.
each_with_object(g) do |word, h|
...
tap (great name, eh?), as used here, simply returns its receiver after displaying its value.
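For instance, this throwaway one-liner prints the intermediate array and passes it along unchanged:
[1, 2, 3].tap { |a| puts "a=#{a.inspect}" }.map { |n| n * 2 }
# prints a=[1, 2, 3] and returns [2, 4, 6]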
Lastly, I've used the method Enumerable#each_with_object in a couple of places. It may seem complex but it's actually quite simple. For example,
arr << line.split(',').each_with_object(g) do |word, h|
...
end
is effectively equivalent to:
h = g
arr << line.split(',').each do |word|
...
end
h
1 Many IO methods are typically invoked on File. This is acceptable because File.superclass #=> IO.
You could leverage the fact that your file instance is an enumerable. This allows you to use the inject method, and you can seed it with an empty hash. collector in this case is the hash that gets passed along as the iteration continues. Be sure to return the value of collector (implicitly, by having collector be the last line of the block), as the inject method will use this to feed into the next iteration. It's some pretty powerful stuff!
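As a tiny standalone illustration of inject seeded with a hash, here is a made-up word-counting example (not from the original code):
%w[a b a c].inject(Hash.new(0)) { |collector, letter| collector[letter] += 1; collector }
#=> {"a"=>2, "b"=>1, "c"=>1}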
I think this is roughly what you're going for. I used model as the key in the hash, and set_of_features as your data.
def convertListings2Catalogue (fileName)
  f = File.open(fileName, "r")
  my_hash = f.inject({}) do |collector, line|
    km = line[/[0-9]+km/]
    t = line[(Regexp.union(/sedan/i, /coupe/i, /hatchback/i, /station/i, /suv/i))]
    trans = line[(Regexp.union(/auto/i, /manual/i, /steptronic/i))]
    dt = line[(Regexp.union(/fwd/i, /rwd/i, /awd/i))]
    status = line[(Regexp.union(/used/i, /new/i))]
    car_maker = line[(Regexp.union(/honda/i, /toyota/i, /mercedes/i, /bmw/i, /lexus/i))]
    stock = line.scan(/(\d+[a-z0-9]+[a-z](?<!km\b))(?:,|$)/i).first
    year = line.scan(/(\d{4}(?<!km\b))(?:,|$)/).first
    trim = line.scan(/\b[a-zA-Z]{2}\b/).first
    fuel = line.scan(/[\d.]+L\/\d*km/).first
    set_of_features = line.scan(/\{(.*?)\}/).first
    model = line[(Regexp.union(/camry/i, /clk/i, /crv/i))]
    collector[model] = set_of_features
    collector
  end
end

How to fix slow Nokogiri parsing

I have a Rake task in my Rails app which looks into a folder for an XML file, parses it, and saves it to a database. The code works OK, but I have about 2100 files totaling 1.5GB, and processing is very slow, about 400 files in 7 hours. There are approximately 600-650 contracts in each XML file, and each contract can have 0 to n attachments. I did not paste all values, but each contract has 25 values.
To speed up the process I use ActiveRecord's Import gem, so I am building an array per file and, when the whole file is parsed, I do a mass import to Postgres. Only if a record is found is it directly updated and/or a new attachment inserted, but this is like 1 out of 100000 records. This helps a little compared with doing a new record per contract, but now I see that the slow part is the XML parsing. Can you please look at whether I am doing something wrong in my parsing?
When I tried to print the arrays I am building, the slow part was until it loaded/parsed the whole file and started printing array by array. That's why I assume the problem with speed is in the parsing, as Nokogiri loads the whole XML before it starts.
require 'nokogiri'
require 'pp'
require "activerecord-import/base"
ActiveRecord::Import.require_adapter('postgresql')
namespace :loadcrz2 do
  desc "this task loads contracts from crz xml files to DB"
  task contracts: :environment do
    actual_dir = File.dirname(__FILE__).to_s
    Dir.foreach(actual_dir+'/../../crzfiles') do |xmlfile|
      next if xmlfile == '.' or xmlfile == '..' or xmlfile == 'archive'
      page = Nokogiri::XML(open(actual_dir+"/../../crzfiles/"+xmlfile))
      puts xmlfile
      cons = page.xpath('//contracts/*')
      contractsarr = []
      @c = []
      cons.each do |contract|
        name = contract.xpath("name").text
        crzid = contract.xpath("ID").text
        procname = contract.xpath("procname").text
        conname = contract.xpath("contractorname").text
        subject = contract.xpath("subject").text
        dateeff = contract.xpath("dateefficient").text
        valuecontract = contract.xpath("value").text
        attachments = contract.xpath('attachments/*')
        attacharray = []
        attachments.each do |attachment|
          attachid = attachment.xpath("ID").text
          attachname = attachment.xpath("name").text
          doc = attachment.xpath("document").text
          size = attachment.xpath("size").text
          arr = [attachid, attachname, doc, size]
          attacharray.push arr
        end
        @con = Crzcontract.find_by_crzid(crzid)
        if @con.nil?
          @c = Crzcontract.new(:crzname => name, :crzid => crzid, :crzprocname => procname, :crzconname => conname, :crzsubject => subject, :dateeff => dateeff, :valuecontract => valuecontract)
        else
          @con.crzname = name
          @con.crzid = crzid
          @con.crzprocname = procname
          @con.crzconname = conname
          @con.crzsubject = subject
          @con.dateeff = dateeff
          @con.valuecontract = valuecontract
          @con.save!
        end
        attacharray.each do |attar|
          attachid = attar[0]
          attachname = attar[1]
          doc = attar[2]
          size = attar[3]
          @at = Crzattachment.find_by_attachid(attachid)
          if @at.nil?
            if @con.nil?
              @c.crzattachments.build(:attachid => attachid, :attachname => attachname, :doc => doc, :size => size)
            else
              @a = Crzattachment.new
              @a.attachid = attachid
              @a.attachname = attachname
              @a.doc = doc
              @a.size = size
              @a.crzcontract_id = @con.id
              @a.save!
            end
          end
        end
        if @c.present?
          contractsarr.push @c
        end
        # p @c
      end
      # p contractsarr
      puts "done"
      if contractsarr.present?
        Crzcontract.import contractsarr, recursive: true
      end
      FileUtils.mv(actual_dir+"/../../crzfiles/"+xmlfile, actual_dir+"/../../crzfiles/archive/"+xmlfile)
    end
  end
end
There are a number of problems with the code. Here are some ways to improve it:
actual_dir = File.dirname(__FILE__).to_s
Don't use to_s. dirname is already returning a string.
actual_dir+'/../../crzfiles', with and without a trailing path delimiter, is used repeatedly. Don't make Ruby rebuild the concatenated string over and over. Instead define it once, but take advantage of Ruby's ability to build the full path:
File.absolute_path('../../bar', '/path/to/foo') # => "/path/bar"
So use:
actual_dir = File.absolute_path('../../crzfiles', __FILE__)
and then refer to actual_dir only:
Dir.foreach(actual_dir)
This is unwieldy:
next if xmlfile == '.' or xmlfile == '..' or xmlfile == 'archive'
I'd do:
next if (xmlfile[0] == '.' || xmlfile == 'archive')
or even:
next if xmlfile[/^(?:\.|archive)/]
Compare these:
'.hidden'[/^(?:\.|archive)/] # => "."
'.'[/^(?:\.|archive)/] # => "."
'..'[/^(?:\.|archive)/] # => "."
'archive'[/^(?:\.|archive)/] # => "archive"
'notarchive'[/^(?:\.|archive)/] # => nil
'foo.xml'[/^(?:\.|archive)/] # => nil
The pattern will return a truthy value if it starts with '.' or is equal to 'archive'. It's not as readable but it's compact. I'd recommend the compound conditional test though.
In some places, you're concatenating xmlfile, so again let Ruby do it once:
xml_filepath = File.join(actual_dir, xmlfile)
which will honor the file path delimiter for whatever OS you're running on. Then use xml_filepath instead of concatenating the name:
xml_filepath = File.join(actual_dir, xmlfile)
page = Nokogiri::XML(open(xml_filepath))
[...]
FileUtils.mv(xml_filepath, File.join(actual_dir, "archive", xmlfile))
join is a good tool so take advantage of it. It's not just another name for concatenating strings, because it's also aware of the correct delimiter to use for the OS the code is running on.
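For example, with made-up path pieces:
File.join('crzfiles', 'archive', 'contract.xml')
#=> "crzfiles/archive/contract.xml"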
You use a lot of instances of:
xpath("some_selector").text
Don't do that. xpath, along with css and search, returns a NodeSet, and text, when used on a NodeSet, can be evil in a way that'll hurtle you down a very steep and slippery slope. Consider this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<root>
<node>
<data>foo</data>
</node>
<node>
<data>bar</data>
</node>
</root>
EOT
doc.search('//node/data').class # => Nokogiri::XML::NodeSet
doc.search('//node/data').text # => "foobar"
The concatenation of the text into 'foobar' can't be split easily and it's a problem we see here in questions too often.
Do this if you expect getting a NodeSet back because of using search, xpath or css:
doc.search('//node/data').map(&:text) # => ["foo", "bar"]
It's better to use at, at_xpath or at_css if you're after a specific node because then text will work as you'd expect.
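Using the same doc as above, for example:
doc.at('//node/data').text   #=> "foo"
doc.at_css('node data').text #=> "foo"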
See "How to avoid joining all text from Nodes when scraping" also.
There's a lot of replication that could be DRY'd. Instead of this:
name = contract.xpath("name").text
crzid = contract.xpath("ID").text
procname = contract.xpath("procname").text
You could do something like:
name, crzid, procname = [
  'name', 'ID', 'procname'
].map { |s| contract.at(s).text }

Parse Json Data with Ruby on Rails

Objective: Parse data to display all the id's in the erb file
Problem: NoMethodError in DemoController#index due to this piece of code
@x = obj[i]["id"]
When I replace the "i" in the above piece of code with a number, one id number displays, which leads me to believe that the while loop is correct. It just doesn't understand what "i" is.
What am I doing wrong?
Here is my code for my Controller and View
demo_controller.rb
require 'rubygems'
require 'json'
require 'net/http'
require 'httparty'
class DemoController < ApplicationController
  respond_to :json
  $angelURI = "https://api.angel.co/1/jobs"
  def index
    response = HTTParty.get('https://api.angel.co/1/jobs/')
    obj = JSON.parse(response.body)["jobs"]
    arraylength = obj.length
    i = 0
    while i <= arraylength do
      @x = obj[i]["id"]
      i += 1
    end
  end
end
index.html.erb
<%= @x %>
You are assigning a value to the same @x variable on each iteration of your loop; this will end with @x having the value of the last id. Is that the intended behavior?
I don't see anything weird with your array right now, but Ruby tends to favor using each over for:
obj.each do |elem|
  @x = elem["id"]
end
Update: Following zishe's good catch about the loop, using each also avoids that kind of question ("do I need to go to the i-th element or stop at the (i-1)-th?").
By combining the best of the answers we get:
@x = []
obj.each do |job|
  @x << job["id"]
end
i is a counter in the while loop; that's the basics. I think you are looping once too many; change <= to < in this:
i = 0
while i < arraylength do
  @x = obj[i]["id"]
  i += 1
end
Or better, do as Martin suggests.
So, you have an off-by-one error: your while loop runs too far (because of the <=). Simple solution: use each (so you do not have to maintain a counter yourself; why make it hard?). On top of that, I would propose adding a file in lib that will do the parsing of the page.
So, e.g. add a file called lib/jobs_parser.rb that contains something like
require 'httparty'
module JobsParser
  ANGEL_JOBS_URI = "https://api.angel.co/1/jobs"
  def all_job_ids
    all_jobs.map{|j| j["id"]}
  end
  def all_jobs
    response = HTTParty.get(ANGEL_JOBS_URI)
    jobs = JSON.parse(response.body)["jobs"]
  end
end
What I do here: the map generates an array containing just the "id" field.
I think it makes more sense, on this level to keep the complete array of jobs or ids.
Note: I drastically shortened the list of require statements; most should be auto-required via your Gemfile.
And then in your controller you can write:
class DemoController < ApplicationController
  def index
    all_job_ids = JobsParser.all_job_ids
    @x = all_job_ids.last
  end
end
and your view remains the same :)
This has the advantage that you can simply test the JobsParser, through tests, or manually in the rails console, and that your code is a bit more readable.
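For instance, a quick manual check in the rails console might look like this (a sketch; it assumes lib/jobs_parser.rb has been required or autoloaded):
include JobsParser
ids = all_job_ids   # hits the API and returns an array of job ids
ids.first(5)        # inspect a handful of them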
You have an off-by-one error in your code. Basically, you are looping over the array and then trying to access one more element than is in the array, which is then returned as nil and naturally doesn't act as a Hash.
Say your obj is an array with 3 elements, thus arraylength is three. You are now fetching 4 elements from the array, the elements with the indexes of 0, 1, 2, and 3. As you only have the 3 elements 0..2, the last one obj[3] doesn't exist.
To keep your existing code, you could change your loop to read as follows:
while i < arraylength do
#...
end
However, to just get the id of the last element in your array, it is much clearer (and much faster) to just use idiomatic ruby and write your whole algorithm as
def index
  response = HTTParty.get('https://api.angel.co/1/jobs/')
  jobs = JSON.parse(response.body)["jobs"]
  @x = jobs.last["id"]
end

creating dynamic hash with key and value in ruby on rails

I am trying to create a Hash with dynamic keys and respective values. For example, like this:
hash = {1 => 23.67, 2 => 78.44, 3 => 66.33, 12 => 44.2}
Something like this, in which 1, 2 and 12 are array indexes. I hope it is understandable. I am trying the syntax from ROR tutorials.
Like this
test = Hash.new
for i in 0..23
  if (s.duration.start.hour == array[i].hour)
    s.sgs.each do |s1|
      case s1.type.to_s
      when 'M'
        test = {i => s1.power} # here I am trying to create the hash like the given example, in which i is the for-loop value
      when 'L'
        puts "to be done"
      else
        puts "Not Found"
      end
    end
  end
end
Updated code
test = Hash.new
for i in 0..23
  if (s.duration.start.hour == array[i].hour)
    s.sgs.each do |s1|
      case s.type.to_s
      when 'M'
        puts s1.power
        test[i] = s1._power
      when 'L'
        puts "to be done"
      else
        puts "Not Found"
      end
    end
  end
end
Results on traversing:
for t in 0..array.size
  puts test[t]
end
Results:
t = 68.6 # which is the last value
and expected:
t = 33.4
t = 45.6 etc.
Sample logs
after assign {23=>#<BigDecimal:7f3a1e9a6870,'0.3E2',9(18)>}
before assign {23=>#<BigDecimal:7f3a1e9a6870,'0.2E2',9(18)>}
after assign {23=>#<BigDecimal:7f3a1e9ce550,'-0.57E2',9(18)>}
before assign {23=>#<BigDecimal:7f3a1e9ce550,'-0.57E2',9(18)>}
If there is any other optimised solution, that would be good. Thanks.
You are re-assigning test with a new hash on each iteration. You should add to it, so instead of
test ={i => s1.power}
you should do:
test[i] = s1.power
This sets the value of key i to s1.power
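A standalone illustration, using a couple of the values from the example hash above:
test = Hash.new
test[3]  = 66.33
test[12] = 44.2
test  #=> {3=>66.33, 12=>44.2}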
If you want to keep an array of all the values for a given key, I would suggest the following (more ruby-ish) solution:
hour_idx = array.find_index { |item| s.duration.start.hour == item.hour }
values = case s.type.to_s
         when 'M'
           s.sgs.map(&:_power)
         when 'L'
           puts "to be done"
         else
           puts "Not Found"
         end
test = { hour_idx => values }
What I'm doing here is:
Find the hour_idx which is relevant to the current s (I assume there is only one such item)
Create an array of all the relevant values according to s.type (if it is 'M' an array of all the _power of s.sgs, for 'L' whatever map you need, and nil otherwise)
Create the target hash using the values set in #1 and #2...
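A toy run of the same idea, with made-up hours and stand-in power values, just to show the resulting shape:
Slot  = Struct.new(:hour)
array = [Slot.new(10), Slot.new(14)]
hour_idx = array.find_index { |item| item.hour == 14 }  #=> 1
values   = [33.4, 45.6]             # stand-in for s.sgs.map(&:_power)
test     = { hour_idx => values }   #=> {1=>[33.4, 45.6]}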

What's the best way to put a small ruby app online?

I have a small ruby application I wrote that's an anagram searcher. It's for learning ruby, but I would like to put it up online for personal use. I have some experience with Rails, and many here have recommended Sinatra. I'm fine with either, but I cannot find any information on how to use a text file instead of a database.
The application is quite simple: it validates against a text file of a word list, then finds all anagrams. I have been assuming that this should be quite simple, but I'm stuck on importing that textfile into Rails (or Sinatra, if I choose that way). In the Rails project, I have placed the textfile in the lib directory.
Unfortunately, even though the path appears to be correct in Rails, I get an error:
no such file to load -- /Users/court/Sites/cvtest/lib/english.txt
(cvtest is the name of the rails project)
Here is the code. It works great by itself:
file_path = '/Users/court/Sites/anagram/dictionary/english.txt'
input_string = gets.chomp
# validate input to list
if File.foreach(file_path) {|x| break x if x.chomp == input_string}
  # break down the word
  word = input_string.split(//).sort
  # match word
  anagrams = IO.readlines(file_path).partition{ |line|
    line.strip!
    (line.size == word.size && line.split(//).sort == word)
  }[0]
  # list all words except the original
  anagrams.each{ |matched_word| puts matched_word unless matched_word == input_string }
# display error if
else
  puts "This word cannot be found in the dictionary"
end
Factor the actual functionality (finding the anagrams) into a method. Call that method from your Web app.
In Rails, you'd create a controller action that calls that method instead of ActiveRecord. In Sinatra, you'd just create a route that calls the method. Here's a Sinatra example:
get '/word/:input' do
  anagrams = find_anagrams(params[:input])
  anagrams.join(", ")
end
Then, when you access http://yourapp.com/word/pool, it will print "loop, polo".
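The answer leaves find_anagrams undefined; one way to carve it out of the question's script is something like this (a sketch; the dictionary path is the one from the question and may need adjusting):
DICTIONARY = '/Users/court/Sites/anagram/dictionary/english.txt'
def find_anagrams(input)
  word = input.split(//).sort
  IO.readlines(DICTIONARY).map(&:strip).select do |line|
    line != input && line.size == word.size && line.split(//).sort == word
  end
end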
I know the question is marked as answered, but I prefer the following, as it uses query parameters rather than path based parameters, which means you can pass the parameters in using a regular GET form submission:
require 'rubygems'
require 'sinatra'
def find_anagrams word
  # your anagram method here
end
get '/anagram' do
  @word = params['word']
  @anagrams = find_anagrams @word if @word
  haml :anagram
end
And the following haml (you could use whatever template language you prefer). This will give you an input form, and show the list of anagrams if a word has been provided and an anagram list has been generated:
%h1
  Enter a word
%form{:action => "anagram"}
  %input{:type => "text", :name => "word"}
  %input{:type => "submit"}
- if @word
  %h1
    Anagrams of
    &= @word
  - if @anagrams
    %ul
      - @anagrams.each do |word|
        %li&= word
  - else
    %p No anagrams found
With Sinatra, you can do anything. These examples don't even require Sinatra; you could roll your own Rack interface thing.
require 'rubygems'
require 'sinatra'
require 'yaml'
documents = YAML::load_file("your_data.yml")
Or:
require 'rubygems'
require 'sinatra'
content = Dir[File.join(__DIR__, "content/*.textile")].map {|path|
  content = RedCloth.new(File.read(path)).to_html
}
Etcetera.
