counting regexp matches within the method - ruby-on-rails

I have some code like the one below. The comment method is called whenever a comment occurs in the HTML. Inside it I do a regexp match, and I want to count the number of matches within the parsed comments. It's printing the following:
1
2
3
4
5
What I want is to print just 5, because that's the total number of matches. Can someone help, please?
class PlainTextExtractor < Nokogiri::XML::SAX::Document
def comment(string)
# I am defining some regexp here
m = Regexp.new(re, Regexp::IGNORECASE);
if m.match(string)
$count += 1
puts $count
end
end
end
parser = Nokogiri::HTML::SAX::Parser.new(PlainTextExtractor.new)
parser.parse_memory(html)

Just move your puts $count out of the comment method, which runs once per comment. You can put it at the end, after you call the parser.

If you are only interested in the number of matches you can do
m = Regexp.new(re, Regexp::IGNORECASE);
puts string.scan(m).length
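Note that scan counts every occurrence, whereas the if m.match(string) check in the question counts each comment at most once. A minimal sketch of the difference, using a made-up pattern and made-up comment strings (PATTERN, COMMENTS and both method names are assumptions, not part of the original code):

```ruby
# Hypothetical pattern and sample comments, for illustration only.
PATTERN = Regexp.new('todo', Regexp::IGNORECASE)
COMMENTS = ['TODO: fix', 'nothing here', 'todo one, Todo two'].freeze

# Number of comments that contain at least one match.
def matching_comments(comments, pattern)
  comments.count { |comment| pattern.match?(comment) }
end

# Total number of occurrences across all comments.
def total_occurrences(comments, pattern)
  comments.sum { |comment| comment.scan(pattern).length }
end

puts matching_comments(COMMENTS, PATTERN) # 2
puts total_occurrences(COMMENTS, PATTERN) # 3
```

Which of the two numbers is the "right" one depends on whether a comment with two hits should count once or twice.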

One way is to make your class count the matches internally in an instance variable, e.g. @count, and then use attr_reader to create a method that lets you read its value at the end. That way you don't need a global variable. Example (not tested):
class PlainTextExtractor < Nokogiri::XML::SAX::Document
  attr_reader :count

  def initialize
    super
    @count = 0 # start at zero so that += works on the first match
  end

  def comment(string)
    # I am defining some regexp here
    m = Regexp.new(re, Regexp::IGNORECASE)
    if m.match(string)
      @count += 1
    end
  end
end
pt_extractor = PlainTextExtractor.new
parser = Nokogiri::HTML::SAX::Parser.new(pt_extractor)
parser.parse_memory(html)
puts pt_extractor.count

Related

Trying to get a total - Ruby

I'm trying to get the total cost from one of my fields, called "upgrade_cost", and store it in a variable called $tuc
def totalUpgradeCost
$e = Experience.all
$tuc = 0
(e.emf_assets).each do |i|
i.upgrade_cost += $tuc
end
return $tuc
end
I'm getting the error undefined local variable or method `e'. I'm new to Ruby. Can anyone help?
I am assuming that emf_assets are associated (via has_many) with an experience. That said I think the following could work for you:
def total_upgrade_cost
  total = 0 # use more descriptive variable names
  all_experiences = Experience.all
  all_experiences.each do |experience| # iterate over each `experience`
    experience.emf_assets.each do |asset| # load `emf_assets` for each `experience`
      # add the `upgrade_cost` (which might be `nil`) to `total`
      total += asset.upgrade_cost.to_i
    end
  end
  total # no need for an explicit `return`
end
Please note that this might work for small numbers of experiences and emf_assets, but as a next step performance would benefit from some optimization: you will need to avoid the N+1 query problem, and it might make sense to do the whole calculation in your database. I think that optimization is out of the scope of this question at the moment.
What is the e in e.emf_assets? If you mean $e, you aren't allowed to drop the $. In Ruby, a $ at the start of a variable name indicates a global variable. If you aren't using $e outside of this function anyway, it would be better to call it simply e, so that it wouldn't be visible outside of the function. Regardless, you're getting an error because $e refers to a global, and e refers to a separate (undefined) local variable.
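A small sketch of that distinction in plain Ruby (the method names and the sample array are made up for illustration):

```ruby
$e = [1, 2, 3] # global variable: visible inside any method

def sum_with_global
  $e.sum
end

def sum_with_local
  e.sum # no local variable or method named `e` exists here, so Ruby raises NameError
rescue NameError
  :name_error
end

puts sum_with_global        # 6
puts sum_with_local.inspect # :name_error
```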
This is not PHP; the $ sign isn't required everywhere. You've used $ with one e and left it off the other, and that's why you get the error.
This code should work:
def totalUpgradeCost
e = Experience.all
tuc = 0
e.emf_assets.each do |i|
tuc += i.upgrade_cost
end
return tuc
end
This can be done in a shorter way:
def totalUpgradeCost
  e = Experience.all
  # the block's return value becomes the next sum, so no assignment is needed
  e.emf_assets.inject(0) { |sum, i| sum + i.upgrade_cost }
end
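The summing logic from the answers above can be tried outside Rails with plain objects. This is only a sketch: FakeAsset and FakeExperience are hypothetical stand-ins for the real models, and it assumes (as the first answer does) that each experience has many emf_assets. Since Experience.all returns a collection rather than a single record, the association is read per experience here.

```ruby
# Hypothetical stand-ins for the ActiveRecord models, so the sketch runs without Rails.
FakeAsset = Struct.new(:upgrade_cost)
FakeExperience = Struct.new(:emf_assets)

def total_upgrade_cost(experiences)
  experiences.sum do |experience|
    # to_i turns a nil upgrade_cost into 0
    experience.emf_assets.sum { |asset| asset.upgrade_cost.to_i }
  end
end

experiences = [
  FakeExperience.new([FakeAsset.new(10), FakeAsset.new(nil)]),
  FakeExperience.new([FakeAsset.new(5)])
]
puts total_upgrade_cost(experiences) # 15
```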

Parse Json Data with Ruby on Rails

Objective: Parse data to display all the id's in the erb file
Problem: NoMethodError in DemoController#index due to this piece of code
@x = obj[i]["id"]
When I replace the "i" in the above piece of code with a number, one id number displays which leads me to believe that the while loop is correct. It just doesn't understand what "i" is.
What am I doing wrong?
Here is my code for my Controller and View
demo_controller.rb
require 'rubygems'
require 'json'
require 'net/http'
require 'httparty'
class DemoController < ApplicationController
respond_to :json
$angelURI = "https://api.angel.co/1/jobs"
def index
response = HTTParty.get('https://api.angel.co/1/jobs/')
obj = JSON.parse(response.body)["jobs"]
arraylength = obj.length
i = 0
while i <= arraylength do
@x = obj[i]["id"]
i += 1
end
end
end
index.html.erb
<%= @x %>
You are assigning a value to the same @x variable on each iteration of your loop, so @x will end up holding the last id. Is that the intended behavior?
I don't see anything weird with your array right now, but Ruby tends to favor each over for:
obj.each do |elem|
  @x = elem["id"]
end
Update: Following zishe's good catch about the loop, using each also avoids that kind of question ("do I need to go up to the ith element or stop at i-1?").
By combining the best of the answers we get:
@x = []
obj.each do |job|
  @x << job["id"]
end
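The same collection step can be checked without calling the API. A minimal sketch, assuming the response body has the shape the question describes (a top-level "jobs" array of objects with an "id"); SAMPLE_BODY and job_ids are made-up names:

```ruby
require 'json'

# Hypothetical payload shaped like the response described in the question.
SAMPLE_BODY = '{"jobs": [{"id": 11, "title": "a"}, {"id": 22, "title": "b"}]}'

# Parse the body and collect every id; fetch with a default avoids a nil when "jobs" is missing.
def job_ids(body)
  JSON.parse(body).fetch('jobs', []).map { |job| job['id'] }
end

puts job_ids(SAMPLE_BODY).inspect # [11, 22]
```

In the view, the resulting array could then be iterated with each instead of printing a single value.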
i is the counter in the while loop; that's the basics. I think you are looping one time too many. Change <= to < here:
i = 0
while i < arraylength do
  @x = obj[i]["id"]
  i += 1
end
Or better do like Martin suggests.
So, you have an off-by-one error: your while loop runs too far (because of the <=). Simple solution: use each, so you do not have to maintain a counter yourself (why make it hard?). On top of that, I would propose adding a file in lib that does the parsing of the page.
So, e.g. add a file called lib/jobs_parser.rb that contains something like
require 'httparty'

module JobsParser
  ANGEL_JOBS_URI = "https://api.angel.co/1/jobs"

  # makes the methods below callable as JobsParser.all_job_ids
  module_function

  def all_job_ids
    all_jobs.map { |j| j["id"] }
  end

  def all_jobs
    response = HTTParty.get(ANGEL_JOBS_URI)
    JSON.parse(response.body)["jobs"]
  end
end
What do I do here: the map generates an array containing just the "id" field.
I think it makes more sense, on this level to keep the complete array of jobs or ids.
Note: I drastically shortened the list of require statements; most should be auto-required via your Gemfile.
And then in your controller you can write:
class DemoController < ApplicationController
def index
all_job_ids = JobsParser.all_job_ids
@x = all_job_ids.last
end
end
and your view remains the same :)
This has the advantage that you can simply test the JobsParser, through tests, or manually in the rails console, and that your code is a bit more readable.
You have an off-by-one error in your code. Basically, you are looping over the array and then trying to access one more element than the array contains. That access returns nil, which naturally doesn't act as a Hash.
Say your obj is an array with 3 elements, thus arraylength is three. You are now fetching 4 elements from the array, the elements with the indexes of 0, 1, 2, and 3. As you only have the 3 elements 0..2, the last one obj[3] doesn't exist.
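The off-by-one can be reproduced with a three-element array. This is a sketch with made-up data; SAMPLE_JOBS and last_id_with_while are hypothetical names, and the inclusive flag only exists to switch between the <= and < versions of the loop:

```ruby
SAMPLE_JOBS = [{ 'id' => 1 }, { 'id' => 2 }, { 'id' => 3 }].freeze # made-up data

def last_id_with_while(jobs, inclusive:)
  # inclusive mimics `i <= arraylength`; otherwise the loop stops at the last valid index
  limit = inclusive ? jobs.length : jobs.length - 1
  last_id = nil
  i = 0
  while i <= limit
    last_id = jobs[i]['id'] # jobs[jobs.length] is nil, and nil does not respond to []
    i += 1
  end
  last_id
end

puts last_id_with_while(SAMPLE_JOBS, inclusive: false) # 3

begin
  last_id_with_while(SAMPLE_JOBS, inclusive: true)
rescue NoMethodError => error
  puts "NoMethodError: #{error.message}"
end
```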
To keep your existing code, you could change your loop to read as follows:
while i < arraylength do
#...
end
However, to just get the id of the last element in your array, it is much clearer (and much faster) to just use idiomatic ruby and write your whole algorithm as
def index
response = HTTParty.get('https://api.angel.co/1/jobs/')
jobs = JSON.parse(response.body)["jobs"]
@x = jobs.last["id"]
end

Mongoid random document

Let's say I have a collection of users. Is there a way of using Mongoid to find n random users in the collection without returning the same user twice? For now, let's say the user collection looks like this:
class User
include Mongoid::Document
field :name
end
Simple huh?
Thanks
If you just want one document, and don't want to define a new criteria method, you could just do this:
random_model = Model.skip(rand(Model.count)).first
If you want to find a random model based on some criteria:
criteria = Model.scoped_whatever.where(conditions) # query example
random_model = criteria.skip(rand(criteria.count)).first
The best solution is going to depend on the expected size of the collection.
For tiny collections, just get all of them and .shuffle.slice!
For small sizes of n, you can get away with something like this:
result = (0..User.count-1).sort_by{rand}.slice(0, n).collect! do |i| User.skip(i).first end
For large sizes of n, I would recommend creating a "random" column to sort by. See here for details: http://cookbook.mongodb.org/patterns/random-attribute/ https://github.com/mongodb/cookbook/blob/master/content/patterns/random-attribute.txt
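The index-picking step of the small-n approach above can be tried on its own, without a database. A minimal sketch in plain Ruby: random_offsets is a made-up helper, and it uses Array#sample in place of the sort_by { rand } shuffle; each returned offset would then be passed to skip(offset).first, which still means one query per offset.

```ruby
# Pick up to n distinct offsets in 0...count.
def random_offsets(count, n, rng: Random.new)
  (0...count).to_a.sample(n, random: rng)
end

puts random_offsets(10, 3).inspect # three distinct numbers between 0 and 9
```

Because the offsets are distinct, the same document is not returned twice, provided the collection does not change between queries.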
MongoDB 3.2 comes to the rescue with the $sample aggregation stage.
EDIT: The most recent version of Mongoid has implemented $sample, so you can call YourCollection.all.sample(5)
Previous versions of mongoid
Mongoid doesn't support sample until Mongoid 6, so you have to run this aggregate query with the Mongo driver :
samples = User.collection.aggregate([ { '$sample': { size: 3 } } ])
# call samples.to_a if you want to get the objects in memory
What you can do with that
I believe the functionality should make its way to Mongoid soon, but in the meantime
module Utility
module_function
def sample(model, count)
ids = model.collection.aggregate([
{ '$sample': { size: count } }, # Sample from the collection
{ '$project': { _id: 1} } # Keep only ID fields
]).to_a.map(&:values).flatten # Some Ruby magic
model.find(ids)
end
end
Utility.sample(User, 50)
If you really want simplicity you could use this instead:
class Mongoid::Criteria
def random(n = 1)
indexes = (0..self.count-1).sort_by{rand}.slice(0,n)
if n == 1
return self.skip(indexes.first).first
else
return indexes.map{ |index| self.skip(index).first }
end
end
end
module Mongoid
module Finders
def random(n = 1)
criteria.random(n)
end
end
end
You just have to call User.random(5) and you'll get 5 random users.
It'll also work with filtering, so if you want only registered users you can do User.where(:registered => true).random(5).
This will take a while for large collections so I recommend using an alternate method where you would take a random division of the count (e.g.: 25 000 to 30 000) and randomize that range.
You can do this by following these steps:
1. Generate a random offset that still leaves room to pick the next n elements (without exceeding the limit).
2. Assume the count is 10 and n is 5.
3. To do this, check whether the given n is less than the total count.
4. If not, set the offset to 0 and go to step 8.
5. If so, subtract n from the total count, which gives you the number 5.
6. Use this to find a random number; with rand(5) it will definitely be from 0 to 4 (assume 2).
7. Use the random number 2 as the offset.
8. Now you can take 5 random users by simply passing this offset and n (5) as the limit.
9. You now get users 3 to 7.
code
>> cnt = User.count
=> 10
>> n = 5
=> 5
>> offset = 0
=> 0
>> if n<cnt
>> offset = rand(cnt-n)
>> end
=> 2
>> User.skip(offset).limit(n)
and you can put this in a method
def get_random_users(n)
offset = 0
cnt = User.count
if n < cnt
offset = rand(cnt-n)
end
User.skip(offset).limit(n)
end
and call it like
rand_users = get_random_users(5)
hope this helps
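Two properties of this approach are worth seeing in isolation: it returns a contiguous run of documents starting at a random position rather than n independently chosen ones, and rand(cnt - n) can never select the final window. The sketch below shows the same idea on a plain array; random_window is a made-up name, and using count - n + 1 as the upper bound is an adjustment that is not in the original answer.

```ruby
def random_window(items, n, rng: Random.new)
  count = items.length
  # rand(k) returns a value in 0...k, so count - n + 1 makes the last full window reachable
  offset = n < count ? rng.rand(count - n + 1) : 0
  items.drop(offset).first(n)
end

users = (1..10).to_a # stand-in for ten user records
puts random_window(users, 5).inspect # five consecutive numbers
```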
Since I want to keep a criteria, I do:
scope :random, ->{
random_field_for_ordering = fields.keys.sample
random_direction_to_order = %w(asc desc).sample
order_by([[random_field_for_ordering, random_direction_to_order]])
}
Just encountered such a problem. Tried
Model.all.sample
and it works for me
The approach from @moox is really interesting, but I doubt that monkeypatching the whole of Mongoid is a good idea here. So my approach is just to write a concern, Randomizable, that can be included in each model where you use this feature. This goes in app/models/concerns/randomizable.rb:
module Randomizable
extend ActiveSupport::Concern
module ClassMethods
def random(n = 1)
indexes = (0..count - 1).sort_by { rand }.slice(0, n)
return skip(indexes.first).first if n == 1
indexes.map { |index| skip(index).first }
end
end
end
Then your User model would look like this:
class User
include Mongoid::Document
include Randomizable
field :name
end
And the tests....
require 'spec_helper'
class RandomizableCollection
include Mongoid::Document
include Randomizable
field :name
end
describe RandomizableCollection do
before do
RandomizableCollection.create name: 'Hans Bratwurst'
RandomizableCollection.create name: 'Werner Salami'
RandomizableCollection.create name: 'Susi Wienerli'
end
it 'returns a random document' do
srand(2)
expect(RandomizableCollection.random(1).name).to eq 'Werner Salami'
end
it 'returns an array of random documents' do
srand(1)
expect(RandomizableCollection.random(2).map &:name).to eq ['Susi Wienerli', 'Hans Bratwurst']
end
end
I think it is better to focus on randomizing the returned result set so I tried:
Model.all.to_a.shuffle
Hope this helps.

In Ruby, how to write a method to display any object's instance variable names and its values

Given any object in Ruby (on Rails), how can I write a method so that it will display that object's instance variable names and its values, like this:
@x: 1
@y: 2
@link_to_point: #<Point:0x10031b298 @y=20, @x=38>
(Update: inspect will do, except that for a large object it is difficult to pick the variables out of 200 lines of output, as in Rails when you call request.inspect or self.inspect in the ActionView object.)
I also want to be able to add <br> to the end of each instance variable's value so that they print nicely on a webpage.
The difficulty now seems to be that not every instance variable has an accessor, so it can't be called with obj.send(var_name)
(the var_name has the "@" removed, so "@x" becomes "x")
Update: I suppose using recursion, it can print out a more advanced version:
#<Point:0x10031b462>
    @x: 1
    @y: 2
    @link_to_point: #<Point:0x10031b298>
        @x=38
        @y=20
I would probably write it like this:
class Object
def all_variables(root=true)
vars = {}
self.instance_variables.each do |var|
ivar = self.instance_variable_get(var)
vars[var] = [ivar, ivar.all_variables(false)]
end
root ? [self, vars] : vars
end
end
def string_variables(vars, lb="\n", indent="\t", current_indent="")
out = "#{vars[0].inspect}#{lb}"
current_indent += indent
out += vars[1].map do |var, ivar|
ivstr = string_variables(ivar, lb, indent, current_indent)
"#{current_indent}#{var}: #{ivstr}"
end.join
return out
end
def inspect_variables(obj, lb="\n", indent="\t", current_indent="")
string_variables(obj.all_variables, lb, indent, current_indent)
end
The Object#all_variables method produces an array containing (0) the given object and (1) a hash mapping instance variable names to arrays containing (0) the instance variable and (1) a hash mapping…. Thus, it gives you a nice recursive structure. The string_variables function prints out that hash nicely; inspect_variables is just a convenience wrapper. Thus, print inspect_variables(foo) gives you a newline-separated option, and print inspect_variables(foo, "<br />\n") gives you the version with HTML line breaks. If you want to specify the indent, you can do that too: print inspect_variables(foo, "\n", "|---") produces a (useless) faux-tree format instead of tab-based indenting.
There ought to be a sensible way to write an each_variable function to which you provide a callback (which wouldn't have to allocate the intermediate storage); I'll edit this answer to include it if I think of something. Edit 1: I thought of something.
Here's another way to write it, which I think is slightly nicer:
class Object
def each_variable(name=nil, depth=0, parent=nil, &block)
yield name, self, depth, parent
self.instance_variables.each do |var|
self.instance_variable_get(var).each_variable(var, depth+1, self, &block)
end
end
end
def inspect_variables(obj, nl="\n", indent="\t", sep=': ')
out = ''
obj.each_variable do |name, var, depth, _parent|
out += [indent*depth, name, name ? sep : '', var.inspect, nl].join
end
return out
end
The Object#each_variable method takes a number of optional arguments, which are not designed to be specified by the user; instead, they are used by the recursion to maintain state. The given block is passed (a) the name of the instance variable, or nil if the variable is the root of the recursion; (b) the variable; (c) the depth to which the recursion has descended; and (d), the parent of the current variable, or nil if said variable is the root of the recursion. The recursion is depth-first. The inspect_variables function uses this to build up a string. The obj argument is the object to iterate through; nl is the line separator; indent is the indentation to be applied at each level; and sep separates the name and the value.
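To see the depth-first traversal in action without reopening Object, here is a minimal sketch of the same idea as a standalone function. Point, walk_ivars and describe_ivars are hypothetical names chosen for this example, and the output format (class name only for objects that have instance variables) is a simplification, not the answer's exact format.

```ruby
class Point
  def initialize(x, y, link = nil)
    @x = x
    @y = y
    @link_to_point = link if link # only defined when a link is given
  end
end

# Yield (name, value, depth) for obj and, depth-first, for every instance variable below it.
def walk_ivars(obj, name = nil, depth = 0, &block)
  block.call(name, obj, depth)
  obj.instance_variables.each do |ivar|
    walk_ivars(obj.instance_variable_get(ivar), ivar, depth + 1, &block)
  end
end

# Build an indented listing; nl can be "<br />\n" for HTML output.
def describe_ivars(obj, nl = "\n", indent = '  ')
  lines = []
  walk_ivars(obj) do |name, value, depth|
    label = name ? "#{name}: " : ''
    shown = value.instance_variables.empty? ? value.inspect : "#<#{value.class}>"
    lines << "#{indent * depth}#{label}#{shown}"
  end
  lines.join(nl)
end

puts describe_ivars(Point.new(1, 2, Point.new(38, 20)))
```

Like the recursive versions above, this sketch does not guard against cyclic references; an object graph that points back to itself would recurse until the stack overflows.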
Edit 2: This doesn't really add anything to the answer to your question, but just to prove that we haven't lost anything in the reimplementation, here's a reimplementation of all_variables in terms of each_variable.
def all_variables(obj)
cur_depth = 0
root = [obj, {}]
tree = root
parents = []
prev = root
obj.each_variable do |name, var, depth, _parent|
next unless name
case depth <=> cur_depth
when -1 # We've gone back up
tree = parents.pop(cur_depth - depth)[0]
when +1 # We've gone down
parents << tree
tree = prev
else # We're at the same level
# Do nothing
end
cur_depth = depth
prev = tree[1][name] = [var, {}]
end
return root
end
I feel like it ought to be shorter, but that may not be possible; because we don't have the recursion now, we have to maintain the stack explicitly (in parents). But it is possible, so the each_variable method works just as well (and I think it's a little nicer).
I see... Antal must be giving the advanced version here...
the short version then probably is:
def p_each(obj)
obj.instance_variables.each do |v|
puts "#{v}: #{obj.instance_variable_get(v)}\n"
end
nil
end
or to return it as a string:
def sp_each(obj)
s = ""
obj.instance_variables.each do |v|
s += "#{v}: #{obj.instance_variable_get(v)}\n"
end
s
end
or shorter:
def sp_each(obj)
obj.instance_variables.map {|v| "#{v}: #{obj.instance_variable_get(v)}\n"}.join
end
This is a quick adaptation of a simple JSON emitter I wrote for another question:
class Object
def inspect!(indent=0)
return inspect if instance_variables.empty?
"#<#{self.class}:0x#{object_id.to_s(16)}\n#{' ' * indent+=1}#{
instance_variables.map {|var|
"#{var}: #{instance_variable_get(var).inspect!(indent)}"
}.join("\n#{' ' * indent}")
}\n#{' ' * indent-=1}>"
end
end
class Array
def inspect!(indent=0)
return '[]' if empty?
"[\n#{' ' * indent+=1}#{
map {|el| el.inspect!(indent) }.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}]"
end
end
class Hash
def inspect!(indent=0)
return '{}' if empty?
"{\n#{' ' * indent+=1}#{
map {|k, v|
"#{k.inspect!(indent)} => #{v.inspect!(indent)}"
}.join(",\n#{' ' * indent}")
}\n#{' ' * indent-=1}}"
end
end
That's all the magic, really. Now we only need some simple defaults for some types where a full-on inspect doesn't really make sense (nil, false, true, numbers, etc.):
module InspectBang
def inspect!(indent=0)
inspect
end
end
[Numeric, Symbol, NilClass, TrueClass, FalseClass, String].each do |klass|
klass.send :include, InspectBang
end
Like this?
# Get the instance variables of an object
d = Date.new
d.instance_variables.each { |i| puts "#{i}<br />" } # interpolate: the names are symbols, not strings
Ruby Documentation on instance_variables.
The concept is commonly called "introspection", (to look into oneself).

How to refactor this Ruby (controller) code?

This is the code in my reports controller. It just looks so bad; can anyone give me some suggestions on how to tidy it up?
# app\controller\reports_controller.rb
@report_lines = []
@sum_wp, @sum_projcted_wp, @sum_il, @sum_projcted_il, @sum_li, @sum_gross_profit, @sum_opportunities = [0, 0, 0, 0, 0, 0, 0]
date = @start_date
num_of_months.times do
  wp, projected_wp, invoice_line, projected_il, line_item, opp = Report.data_of_invoicing_and_delivery_report(@part_or_service, date)
  @sum_wp += wp
  @sum_projcted_wp += projected_wp
  @sum_il += invoice_line
  @sum_projcted_il += projected_il
  @sum_li += line_item
  gross_profit = invoice_line - line_item
  @sum_gross_profit += gross_profit
  @sum_opportunities += opp
  @report_lines << [date.strftime("%m/%Y"), wp, projected_wp, invoice_line, projected_il, line_item, gross_profit, opp]
  date = date.next_month
end
I'm looking to use some method like
@sum_a, @sum_b, @sum_c += [1,2,3]
My instant thought is: move the code to a model.
The objective should be "Thin Controllers", so they should not contain business logic.
Second, I like to present my report lines to my Views as OpenStruct() objects, which seems cleaner to me.
So I'd consider moving this accumulation logic into (most likely) a class method on Report and returning an array of "report line" OpenStructs and a single totals OpenStruct to pass to my View.
My controller code would become something like this:
@report_lines, @report_totals = Report.summarised_data_of_inv_and_dlvry_rpt(@part_or_service, @start_date, num_of_months)
EDIT: (A day later)
Looking at that accumulating-into-an-array thing, I came up with this:
require 'test/unit'
class Array
def add_corresponding(other)
each_index { |i| self[i] += other[i] }
end
end
class TestProblem < Test::Unit::TestCase
def test_add_corresponding
a = [1,2,3,4,5]
assert_equal [3,5,8,11,16], a.add_corresponding([2,3,5,7,11])
assert_equal [2,3,6,8,10], a.add_corresponding([-1,-2,-2,-3,-6])
end
end
Look: a test! It seems to work OK. There are no checks for differences in size between the two arrays, so there's lots of ways it could go wrong, but the concept seems sound enough. I'm considering trying something similar that would let me take an ActiveRecord resultset and accumulate it into an OpenStruct, which is what I tend to use in my reports...
Our new Array method might reduce the original code to something like this:
totals = [0, 0, 0, 0, 0, 0, 0]
date = @start_date
num_of_months.times do
  wp, projected_wp, invoice_line, projected_il, line_item, opp = Report.data_of_invoicing_and_delivery_report(@part_or_service, date)
  gross_profit = invoice_line - line_item # still needed for the report line below
  totals.add_corresponding [wp, projected_wp, invoice_line, projected_il, line_item, opp, gross_profit]
  @report_lines << [date.strftime("%m/%Y"), wp, projected_wp, invoice_line, projected_il, line_item, gross_profit, opp]
  date = date.next_month
end
@sum_wp, @sum_projcted_wp, @sum_il, @sum_projcted_il, @sum_li, @sum_opportunities, @sum_gross_profit = totals
...which if Report#data_of_invoicing_and_delivery_report could also calculate gross_profit would reduce even further to:
num_of_months.times do
  totals.add_corresponding(Report.data_of_invoicing_and_delivery_report(@part_or_service, date))
end
Completely un-tested, but that's a hell of a reduction for the addition of a one-line method to Array and performing a single extra subtraction in a model.
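If reopening Array feels too invasive, the same element-wise addition can be written as a plain function. This is a sketch with made-up figures: add_corresponding here is a standalone, non-mutating variant (not the Array method above), and column_totals is a hypothetical helper for the case where all rows are available at once.

```ruby
# Return a new array with the two arrays added position by position.
def add_corresponding(totals, row)
  raise ArgumentError, 'arrays differ in size' unless totals.size == row.size

  totals.zip(row).map { |total, value| total + value }
end

# Sum each column of a list of equally sized rows.
def column_totals(rows)
  rows.transpose.map(&:sum)
end

rows = [[1, 2, 3], [10, 20, 30]] # made-up monthly figures
puts add_corresponding([0, 0, 0], rows.first).inspect # [1, 2, 3]
puts column_totals(rows).inspect                      # [11, 22, 33]
```

The size check addresses the gap mentioned above, where arrays of different lengths would otherwise go wrong silently or with a confusing error.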
Create a summation object that contains all those fields, pass the entire array to @sum.increment_sums(Report.data_of...)
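One possible shape for such a summation object, as a rough sketch: ReportTotals, its field names and the zero constructor are assumptions made for this example; only increment_sums is named in the suggestion above. It assumes the values arrive in the same order as the fields.

```ruby
ReportTotals = Struct.new(:wp, :projected_wp, :invoice_line, :projected_il,
                          :line_item, :opportunities) do
  # Build an instance with every field set to zero.
  def self.zero
    new(*Array.new(members.size, 0))
  end

  # Add one month's values, given in the same order as the fields.
  def increment_sums(values)
    members.each_with_index { |field, i| self[field] += values[i] }
    self
  end

  # Derived from the running totals instead of being accumulated separately.
  def gross_profit
    invoice_line - line_item
  end
end

totals = ReportTotals.zero
totals.increment_sums([1, 2, 30, 4, 10, 6])
totals.increment_sums([1, 1, 10, 1, 5, 1])
puts totals.wp           # 2
puts totals.gross_profit # 25
```

The view can then read named fields such as totals.wp rather than a handful of separate instance variables.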

Resources