I've written a function within a model to scrape a site and store certain attributes within a separate model (story):
def get_content
  request = HTTParty.get("#{url}")
  doc = Nokogiri::HTML(request.body)
  doc.css("#{anchor}")["#{range}"].each do |entry|
    story = self.stories.new
    story.title = entry.text
    story.url = entry[:href]
    story.save
  end
end
This uses the url, anchor, and range attributes of a Sections variable. The range attribute is stored as an array range - i.e. 0..2 or 11..13 - however, I'm being told that it can't convert a string into a variable. I've tried storing range as an integer and as a string, but both fail.
I realise I could input the beginning and end of the range as two separate integers in my db, and put ["#{beginrange}".."#{endrange}"] but this seems a messy way of doing it.
Any other ideas? Many thanks in advance
Hmm, if you are sure that the range is always a string like '1..2' ('<Integer>..<Integer>'), you can use the eval method:
In my IRB console:
1.9.3p0 :032 > (eval "1..2").each { |l| puts l }
1
2
=> 1..2
1.9.3p0 :033 > (eval "1..2").inspect
=> "1..2"
1.9.3p0 :034 > (eval "1..2").class
=> Range
In your case:
doc.css("#{anchor}")[eval(range)].each do |entry|
#...
end
But eval is kind of dangerous. Only once you are sure that the range attribute holds nothing but a range written as a string (a model validation with a regex can enforce this) can you use eval without risk.
There are a couple of things I see wrong.
["#{beginrange}".."#{endrange}"] creates a range of strings, not a range of integers, which Array#[] needs:
beginrange = 1
endrange = 2
["#{beginrange}".."#{endrange}"]
=> ["1".."2"]
[beginrange..endrange]
=> [1..2]
But, you're storing the representation of the array range you need as a string. If I had a string representation of a range, I'd use this:
range_value = '1..2'
[Range.new(*range_value.scan(/\d+/).map(&:to_i))]
=> [1..2]
Or, if there was a chance I'd encounter an exclusive-range:
[Range.new(*range_value.scan(/\d+/).map(&:to_i), range_value['...'])]
=> [1..2]
range_value = '1...2'
[Range.new(*range_value.scan(/\d+/).map(&:to_i), range_value['...'])]
=> [1...2]
Those are all good when you can't trust your Range string representation's source, i.e., the value is coming from a form or a file someone else created. If you own the incoming value, or, for convenience, stored it as a string in a database, you can easily recreate the range using eval:
eval('1..2').class
=> Range
eval('1..2')
=> 1..2
eval('1...2')
=> 1...2
People are afraid of eval because, used unwisely, it is dangerous. That doesn't mean we should avoid it altogether; it means we should use it only when it's safe.
You could use a regex to check the format of the string, raise an exception if it's not acceptable, then continue:
raise "Invalid range value received" if (!range_value[/\A\d+\s*\.{2,3}\s*\d+\z/])
[eval(range_value)]
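If you would rather not call eval at all, the format check and the conversion can be folded into one step. This is only a minimal sketch: parse_range is a hypothetical helper name (not part of Ruby or Rails), and it assumes the same '1..2' / '1...2' string format discussed above.

```ruby
# Hypothetical helper: turn "1..2" or "1...2" into a Range without eval.
RANGE_PATTERN = /\A(\d+)\s*(\.{2,3})\s*(\d+)\z/

def parse_range(text)
  match = RANGE_PATTERN.match(text.to_s.strip)
  raise ArgumentError, "Invalid range value: #{text.inspect}" unless match

  first = match[1].to_i
  last = match[3].to_i
  # Three dots mean the end of the range is excluded.
  Range.new(first, last, match[2] == "...")
end

entries = %w[a b c d e]
p entries[parse_range("0..2")]   # => ["a", "b", "c"]
p entries[parse_range("1...3")]  # => ["b", "c"]
```

Because the digits are captured and converted explicitly, a string that contains anything else raises ArgumentError rather than being executed.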
I'm new to Ruby and I am building a web scraper. I have a variable that is assigned a value if a conditional is true.
The problem is that the value of the variable is really long and I'd like to avoid repeating myself with these long values.
I am using conditionals because the number of data that exists is not a static figure.
#Grab the top 3 comps if they exist
#comp1
if b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[1]/td[13]/span').exists?
comp1 = b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[1]/td[13]/span')
end
#comp2
if b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[2]/td[13]/span').exists?
comp2 = b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[2]/td[13]/span')
end
#comp3
if b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[3]/td[13]/span').exists?
comp3 = b.element(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[3]/td[13]/span')
end
Is there a way to decrease the length of that, such as:
if "element with really long xpath location on the webpage that we are checking to see if it is true".exists?
x = "That conditional referenced above"
end
Since you're just replacing a single number in that long xpath selector you can use a template string:
elements = (1..3).map do |x|
b.element(
xpath: '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr[%d]/td[13]/span' % x
)
end.select(&:exists?)
See Kernel#sprintf for the options which are pretty much identical to the venerable C sprintf function.
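To show the format-string mechanics on their own, here is a small sketch that runs without a browser. The shortened XPath in CELL_XPATH is a hypothetical stand-in for the full path in the question, and format is the same method as sprintf.

```ruby
# Hypothetical, shortened template; the real page needs the full path.
CELL_XPATH = "//table/tbody/tr[%d]/td[%d]/span"

# Fill in the row number (1..3) and the fixed column number.
xpaths = (1..3).map { |row| format(CELL_XPATH, row, 13) }
puts xpaths
# //table/tbody/tr[1]/td[13]/span
# //table/tbody/tr[2]/td[13]/span
# //table/tbody/tr[3]/td[13]/span

# Named references can read better when several values are filled in.
puts format("tr[%<row>d]/td[%<col>d]", row: 2, col: 13)
# tr[2]/td[13]
```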
Break up the string, either literally, or logically:
# literally
table_xpath = '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table'
if b.element(:xpath => "#{table_xpath}/tbody/tr[1]/td[13]/span").exists?
#...
end
# logically
table = b.element(xpath: '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table')
if table.element(xpath: "tbody/tr[1]/td[13]/span").exists?
#...
end
Break it up as many or as few times as you need to make the code read well.
You can write this directly in Watir, as shown below. Note that you have to use elements instead of element:
b.elements(:xpath => '/html/body/form/div[3]/div[6]/table/tbody/tr/td/div[2]/div[3]/div[3]/div/div/div[1]/table/tbody/tr')
.take(3)
.map{|tr|tr.element(xpath: "./td[13]/span")}
Still, the code above is not optimal. Once you have located the table, you can write the code below. Here I assume the table is at index 2. Watir indexes are zero-based, so td[13] in the XPath corresponds to cell(index: 12).
b.table(index: 2)
.rows
.to_enum
.take(3)
.map{|row| row.cell(index: 12).span}
I have a string param whose value is "1" or "['1','2','3','4']". Using the eval method, I get the result 1 or [1,2,3,4], but I need the result [1] or [1,2,3,4].
params[:city_id] = eval(params[:city_id])
scope :city, -> (params) { params[:city_id].present? ? where(city_id: (params[:city_id].is_a?(String) ? eval(params[:city_id]) : params[:city_id])) : all }
Here I don't want to use eval.
scope :city, -> (params) { params[:city_id].present? ? where(city_id: params[:city_id]) : all }
params[:city_id] #should be array values e.g [1] or [1,2,3,4] instead of string
Your strings look very close to JSON, so probably the safest thing you can do is parse the string as JSON. In fact:
JSON.parse("1") => 1
JSON.parse('["1","2","3","4"]') => ["1","2","3","4"]
Your array uses single quotes, though, so I would suggest:
Array(JSON.parse(string.gsub("'", '"'))).map(&:to_i)
So, replace the single quotes with doubles, parse as JSON, make sure it's wrapped in an array and convert possible strings in the array to integers.
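Wrapped up as a method, the steps above might look like the sketch below. The name parse_city_ids is hypothetical, and returning an empty array for unparseable input is an assumption about what the scope should do, not something the question specifies.

```ruby
require "json"

# Hypothetical helper: accepts "1" or "['1','2','3','4']" and returns integers.
def parse_city_ids(raw)
  # Swap single quotes for double quotes so the string is valid JSON.
  parsed = JSON.parse(raw.to_s.tr("'", '"'))
  # Array() wraps a lone value; Integer() rejects anything non-numeric.
  Array(parsed).map { |id| Integer(id) }
rescue JSON::ParserError, ArgumentError, TypeError
  # Assumption: treat malformed input as "no ids" rather than raising.
  []
end

p parse_city_ids("1")                  # => [1]
p parse_city_ids("['1','2','3','4']")  # => [1, 2, 3, 4]
p parse_city_ids("City.delete_all")    # => []
```

Unlike eval, nothing in the input is ever executed; a string that is not JSON simply fails to parse.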
I've left a comment for what would be my preferred approach: it's unusual to get your params through as you are, and the ideal approach would be to address this. Using eval is definitely a no go - there are some big security concerns to doing so (e.g. imagine someone submitting "City.delete_all" as the param).
As a solution to your immediate problem, you can do this using a regex, scanning for digits:
str = "['1','2','3','4']"
str.scan(/\d+/)
# => ["1", "2", "3", "4"]
str = '1'
str.scan(/\d+/)
# => ["1"]
# In your case:
params[:city_id].scan(/\d+/)
In very simple terms, this looks through the given string for any digits that are in there. Here's a simple Regex101 with results / an explanation: https://regex101.com/r/41yw9C/1.
Rails should take care of converting the fields in your subsequent query (where(city_id: params[:city_id])), though if you explicitly want an array of integers, you can append the following (thanks @SergioTulentsev):
params[:city_id].scan(/\d+/).map(&:to_i)
# or in a single loop, though slightly less readable:
[].tap { |result| str.scan(/\d+/) { |match| result << match.to_i } }
# => [1, 2, 3, 4]
Hope that's useful, let me know how you get on or if you have any questions.
I have the sample parameters below:
Parameters: {
"utf8"=>"✓",
"authenticity_token"=>"xxxxxxxxxx",
"post" => {
"product_attributes" => {
"name"=>"Ruby",
"product_dtls_attributes" => {
"0"=>{"price"=>"12,333.00"},
"1"=>{"price"=>"111,111.00"}
},
},
"content"=>"Some contents here."
}
}
The problem is that I cannot get the exact price value in the model.
Instead of:
price = 12,333.00
price = 111,111.00
I get:
price = 12.00
price = 11.00
And now here is what I did in my code:
before_validation(on: :create) do
puts "price = #{self.price}" # I also tried self.price.to_s, but didn't work.
end
UPDATE:
(What I am trying to do here is get the full value and strip the comma.)
before_validation(on: :create) do
puts "price = #{self.price.delete(',').to_f}" # I also tried self.price.to_s, but didn't work.
end
Note:
column price is float
The question is: how can I get the exact value of the price param?
Thanks!
Looking at the 'price' parameter you provided:
"price"=>"12,333.00"
The problem is with the comma.
For example:
irb(main):003:0> "12,333.00".to_i
=> 12
But you can fix that:
Example:
irb(main):011:0> "12,333.00".tr(",", "_").to_i
=> 12333
The key point is replacing the comma with an underscore. The reason is that 12_333 is the same integer as 12333 (the underscores are ignored). You could just remove the comma with tr(",", "") as well. In this case, you could replace tr with gsub and have the same effect.
By the way, are you aware that your validation method is not doing anything besides printing? Anyway, a before_validation method is not the right approach here because the number will already have been incorrectly converted when the code reaches this point. Instead, you can override the setter on the model:
class MyModel < ActiveRecord::Base
  def price=(new_price)
    # Strip thousands separators before the value is cast to a float.
    if new_price.is_a?(String)
      new_price = new_price.tr(",", "")
    end
    super(new_price)
  end
end
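The same idea can be tried outside Rails. The sketch below uses a plain Ruby class, with a hypothetical name PriceHolder, that stores the value in an instance variable instead of calling super; it only illustrates why the comma has to go before the conversion happens.

```ruby
# Plain-Ruby stand-in for the model; PriceHolder is a hypothetical name.
class PriceHolder
  attr_reader :price

  def price=(new_price)
    # Remove thousands separators from string input before converting.
    new_price = new_price.delete(",") if new_price.is_a?(String)
    # Float() raises on garbage instead of silently truncating like to_f.
    @price = Float(new_price)
  end
end

item = PriceHolder.new
item.price = "12,333.00"
p item.price        # => 12333.0
p "12,333.00".to_f  # => 12.0 (parsing stops at the comma)
```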
You can do it like this too:
2.1.1 :002 > "12,333.00".gsub(',', '').to_f
=> 12333.0
This removes the comma, and any decimal part is still parsed correctly:
2.1.1 :003 > "12,333.56".gsub(',', '').to_f
=> 12333.56
My solution was to handle it in the controller: iterate over the hash, then save it. That way I get the proper value and save it.
Iterate over the following hash and save:
"post" => {
"product_attributes" => {
"name"=>"Ruby",
"product_dtls_attributes" => {
"0"=>{"price"=>"12,333.00"},
"1"=>{"price"=>"111,111.00"}
},
},
"content"=>"Some contents here."
}
I can't get the full value of price in the model because of the comma separator. The comma separator, the decimal point, and the decimal places are added by a gem.
Price is float, but your data contains a non-numeric character (comma, ","). When the field is converted to a float, parsing likely stops at this character and returns just 12.
I had expected an error to be thrown, though.
I suggest you remove the comma before putting it into the database.
Have been hacking together a couple of libraries, and had an issue where a string was getting 'double escaped'.
for example:
Fixed example
> x = ['a']
=> ["a"]
> x.to_s
=> "[\"a\"]"
>
Then again to
\"\[\\\"s\\\"\]\"
This was happening while dealing with HTTP headers. I have a header which will be an array, but the HTTP library is doing its own character escaping on the array.to_s value.
The workaround I found, was to convert the array to a string myself, and then 'undo' the to_s. Like so:
formatted_value = value.to_s
if value.instance_of?(Array)
formatted_value = formatted_value.gsub(/\\/,"") #remove backslash
formatted_value = formatted_value.gsub(/"/,"") #remove double quote
formatted_value = formatted_value.gsub(/\[/,"") #remove [
formatted_value = formatted_value.gsub(/\]/,"") #remove ]
end
value = formatted_value
... There's gotta be a better way ... (without needing to monkey-patch the gems I'm using). (Yeah, this breaks if my string actually contains those characters.)
Suggestions?
** UPDATE 2 **
Okay. Still having troubles in this neighborhood, but now I think I've figured out the core issue. It's serializing my array to json after a to_s call. At least, that seems to be reproducing what I'm seeing.
['a'].to_s.to_json
I'm calling a method in a gem that is returning the results of a to_s, and then I'm calling to_json on it.
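To make that reproduction concrete, here is a small sketch comparing the two paths. It assumes Ruby 1.9 or later, where Array#to_s returns the same text as inspect, and a json library that accepts top-level strings.

```ruby
require "json"

value = ["a"]

once  = value.to_json       # the array serialized directly
twice = value.to_s.to_json  # the array stringified first, then serialized

puts once   # ["a"]
puts twice  # "[\"a\"]"

# Parsing once recovers the array; the double-encoded form needs two passes.
p JSON.parse(once)               # => ["a"]
p JSON.parse(JSON.parse(twice))  # => ["a"]
```

The second form is a JSON string that happens to contain array syntax, which is why the quotes arrive escaped.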
I've edited my answer due to your edited question:
I still can't duplicate your results!
>> x = ['a']
=> ["a"]
>> x.to_s
=> "a"
But when I change the last call to this:
>> x.inspect
=> "[\"a\"]"
So I'll assume that's what you're doing?
It's not necessarily escaping the values per se. It's storing the string like this:
%{["a"]}
or rather:
'["a"]'
In any case. This should work to un-stringify it:
>> x = ['a']
=> ["a"]
>> y = x.inspect
=> "[\"a\"]"
>> z = Array.class_eval(y)
=> ["a"]
>> x == z
=> true
I'm skeptical about the safety of class_eval, though. Be wary of user input: it can produce unintended side effects (by which I mean code injection attacks) unless you are very sure where the original data came from and what was allowed into it.
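For arrays of plain strings, one eval-free way to un-stringify is to parse the text as JSON. This is a sketch under a stated assumption: the inspect output only happens to be valid JSON for simple values such as ordinary strings and integers, so it will not round-trip symbols, nil, or arbitrary objects.

```ruby
require "json"

original = ["a", "b c"]
stringified = original.inspect  # the text ["a", "b c"]

restored = JSON.parse(stringified)
p restored == original  # => true

# Nothing is executed: input that is not valid JSON raises instead.
begin
  JSON.parse("system('ls')")
rescue JSON::ParserError => e
  puts "rejected: #{e.class}"
end
```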
My objective is to take form input, like "100 megabytes" or "1 gigabyte", and convert it to a file size in kilobytes that I can store in the database. Currently, I have this:
def quota_convert
  @regex = /([0-9]+) (.*)s/
  @sizes = %w{kilobyte megabyte gigabyte}
  m = self.quota.match(@regex)
  if @sizes.include? m[2]
    eval("self.quota = #{m[1]}.#{m[2]}")
  end
end
This works, but only if the input is plural ("gigabytes", but not "gigabyte"), and it seems insanely unsafe due to the use of eval. So, functional, but I won't sleep well tonight.
Any guidance?
EDIT: ------
All right. For some reason, the regex with (.*?) isn't working correctly on my setup, but I've worked around it with Rails stuff. Also, I've realized that bytes would work better for me.
def quota_convert
  @regex = /^([0-9]+\.?[0-9]*?) (.*)/
  @sizes = { 'kilobyte' => 1024, 'megabyte' => 1048576, 'gigabyte' => 1073741824}
  m = self.quota.match(@regex)
  if @sizes.include? m[2].singularize
    self.quota = m[1].to_f*@sizes[m[2].singularize]
  end
end
This catches "1 megabyte", "1.5 megabytes", and most other things (I hope). It then makes it the singular version regardless. Then it does the multiplication and spits out magic answers.
Is this legit?
EDIT AGAIN: See answer below. Much cleaner than my nonsense.
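You can use the Rails number_to_human_size helper (from ActionView::Helpers::NumberHelper). Note that it goes the other way, turning a byte count into a string such as "100 MB".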
def quota_convert
  @regex = /([0-9]+) (.*)s?/
  @sizes = "kilobytes megabytes gigabytes"
  m = self.quota.match(@regex)
  if @sizes.include? m[2]
    m[1].to_f.send(m[2])
  end
end
Added ? for optional plural in the regex.
Changed @sizes to a string of plurals.
Convert m[1] (the number) to a float.
Send the message m[2] directly
Why don't you simply create a hash that contains the various spellings of the multiplier as keys and the numerical values as values? No eval necessary, and no regexes either!
First of all, changing your regex to @regex = /([0-9]+) (.*?)s?$/ will fix the plural issue. The ? after the s makes the 's' optional (0 or 1 occurrences), and the ? after .* makes it match in a non-greedy manner (as few characters as possible). The $ anchor matters: without it, the non-greedy group is free to match nothing at all.
As for the size, you could have a hash like this:
@hash = { 'kilobyte' => 1, 'megabyte' => 1024, 'gigabyte' => 1024*1024}
and then your calculation is just self.quota = m[1].to_i*@hash[m[2]]
EDIT: Changed values to base 2
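Putting the hash and the anchored, non-greedy regex together, a conversion could be sketched as below. The names UNIT_BYTES, QUOTA_PATTERN and quota_to_bytes are hypothetical; the sketch stores bytes (as the asker's edit prefers) and returns nil for input it does not recognise, which is an assumption about the desired behaviour.

```ruby
# Hypothetical lookup table: bytes per unit, base 2.
UNIT_BYTES = {
  "kilobyte" => 1024,
  "megabyte" => 1024**2,
  "gigabyte" => 1024**3
}.freeze

# Number (optionally decimal), then a unit word with an optional plural "s".
QUOTA_PATTERN = /\A\s*(\d+(?:\.\d+)?)\s*([a-z]+?)s?\s*\z/i

def quota_to_bytes(text)
  match = QUOTA_PATTERN.match(text.to_s)
  return nil unless match

  multiplier = UNIT_BYTES[match[2].downcase]
  return nil unless multiplier

  (match[1].to_f * multiplier).round
end

p quota_to_bytes("100 megabytes")  # => 104857600
p quota_to_bytes("1 gigabyte")     # => 1073741824
p quota_to_bytes("1.5 megabytes")  # => 1572864
p quota_to_bytes("5 parsecs")      # => nil
```

Since the unit is only ever used as a hash key, no user input is evaluated or sent as a method name.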