Rails: Split text including dollar and euro - ruby-on-rails

I'm using Rails and Nokogiri and I'm trying to parse some website.
This is where I'm stuck:
doc.css('#example > li:nth-child(1)').each do |node|
money = node.xpath('//*ul/li/div/span').text
end
It returns something like:
$100,000£230,000$40,000$9,000€600$800,000
I want to split those items, save them to the database and finally hand them to the view.
So, in the view, I want it to appear like:
(1)$100,000
(2)£230,000
(3)$40,000
(4)$9,000
(5)€600
(6)$800,000
I tried to split those items by this code below.
money = node.xpath('//*ul/li/div/span').text.split(/[$€£]/)
but the result looks like this:
["", "100,000", "230,000", "40,000", "9,000", "600", "800,000"]
And I don't know which item is in dollars, euros, or pounds.
Is there any good way to solve this problem?

You're almost there.
Just use a positive lookahead, which splits in front of each currency symbol without consuming it :)
irb(main):005:0> "$100,000£230,000$40,000$9,000€600$800,000".split(/(?=[$£€])/)
=> ["$100,000", "£230,000", "$40,000", "$9,000", "€600", "$800,000"]
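Since the goal is to save each item and show it in a numbered list, the lookahead split can feed a small parsing step. This is only a sketch: parse_prices and the CURRENCY_CODES mapping are hypothetical names, and it assumes the text always starts with one of the three symbols.

```ruby
# Hypothetical mapping from symbol to currency code; extend as needed.
CURRENCY_CODES = { "$" => "USD", "£" => "GBP", "€" => "EUR" }.freeze

# Splits a run of prices in front of each currency symbol and returns
# one hash per price, keeping the original text for display.
def parse_prices(text)
  text.split(/(?=[$£€])/).map do |item|
    symbol = item[0]
    {
      display:  item,
      currency: CURRENCY_CODES.fetch(symbol),
      amount:   item[1..-1].delete(",").to_i
    }
  end
end

prices = parse_prices("$100,000£230,000$40,000$9,000€600$800,000")

# Numbered output in the shape asked for in the question.
prices.each_with_index do |price, index|
  puts "(#{index + 1})#{price[:display]}"
end
```

Each hash could then be passed to whatever model holds the prices; the column names are up to you.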

It needs a regular expression. This works:
"$100,000£230,000$40,000$9,000$600$800,000".scan(/([^\d][0-9,]+)/)
=> [["$100,000"],
["£230,000"],
["$40,000"],
["$9,000"],
["$600"],
["$800,000"]]
The regex contains these parts:
[^\d]: A character class matching a single non-digit. This will match the currency symbol.
[0-9,]+: Another character class, this time repeating (the +). It matches the numeric part (0-9) plus the thousands separator.
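A variation on the same idea, as a sketch: listing the currency symbols explicitly instead of using [^\d], and putting the symbol and the digits in separate capture groups, makes scan return the two parts side by side. The choice of the three symbols is an assumption taken from the question.

```ruby
text = "$100,000£230,000$40,000$9,000€600$800,000"

# No capture group: scan returns a flat array of whole matches.
flat = text.scan(/[$£€][0-9,]+/)

# Two capture groups: scan returns one [symbol, digits] pair per match.
pairs = text.scan(/([$£€])([0-9,]+)/)

pairs.each { |symbol, digits| puts "#{symbol} #{digits.delete(',')}" }
```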

Related

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __ so that all that remains is what comes after the __, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
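The two approaches behave differently when the string contains no __ at all: split('__').last returns the whole string, while match returns nil, so calling post_match on it raises NoMethodError. If that case can occur, String#partition is one way to handle it. A minimal sketch, where after_separator is a hypothetical helper name:

```ruby
# Hypothetical helper: returns the text after the first separator,
# or nil when the separator does not occur at all.
def after_separator(name, separator = "__")
  _head, found, tail = name.partition(separator)
  found.empty? ? nil : tail
end

after_separator("dhl_freight__233433")  #=> "233433"
after_separator("postal__US-320202")    #=> "US-320202"
after_separator("no_separator_here")    #=> nil
```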
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
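A quick comparison of the two patterns, written with the lookbehind group closed before the character class. The variable names loose and strict are only for illustration.

```ruby
loose  = /[^_]+\z/          # trailing run of non-underscore characters
strict = /(?<=__)[^_]+\z/   # same, but only directly after "__"

"dhl_freight__233433"[loose]   #=> "233433"
"dhl_freight__233433"[strict]  #=> "233433"
"dhl_freight_233433"[loose]    #=> "233433"
"dhl_freight_233433"[strict]   #=> nil
```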
There are many ways in Ruby to extract numbers from a string, which I assume is what you're trying to do. Here are some of them.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. If the string is "String-with-67829___numbers-09764", the output would be "6782909764".
If you want the numbers split like this, ["67829", "09764"], use:
line.split(/[^\d]/).reject { |c| c.empty? }
Hope these answers help you! Happy coding :-)
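For reference, here are the same calls run against the sample string, plus scan(/\d+/) as a shorter way to get the digit runs as separate elements. The variable names are only for illustration.

```ruby
line = "String-with-67829___numbers-09764"

# Three ways to keep only the digits, joined into one string.
deleted = line.delete("^0-9")     #=> "6782909764"
scanned = line.scan(/\d/).join    #=> "6782909764"
squeezed = line.tr("^0-9", "")    #=> "6782909764"

# Keep each run of digits as its own element.
groups = line.scan(/\d+/)                              #=> ["67829", "09764"]
split_groups = line.split(/[^\d]/).reject(&:empty?)    #=> ["67829", "09764"]
```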

How can I simplify statements like these in an =OR() statement?

isnumber(search("-tr",right(j2,3))),isnumber(search("-trus",right(j2,5))),isnumber(search(" ll",right(j2,3))),isnumber(search(" homes",right(j2,6))),isnumber(search("the ",left(j2,4))),isnumber(search(" hoa",right(j2,4))),isnumber(search("b ch",right(j2,4))),isnumber(search(" ch",right(j2,3))),isnumber(search("-trs",right(j2,4))),isnumber(search(" prop",right(j2,5))),isnumber(search(" st",right(j2,3))),isnumber(search(" av",right(j2,3))),isnumber(search(" ave",right(j2,4))),isnumber(search(" servi",right(j2,6))),isnumber(search(" maint",right(j2,6))),isnumber(search(" home",right(j2,5))),isnumber(search(" tr",right(j2,3))),isnumber(search(" assn",right(j2,5))),isnumber(search(" co",right(j2,3))),isnumber(search(" trus",right(j2,5))),isnumber(search(" trs",right(j2,4))),isnumber(search("-trs",right(j2,4))),isnumber(search(" tru",right(j2,4))),isnumber(search("jtrs",right(j2,4))),isnumber(search(" est of",right(j2,7))),isnumber(search(" trs",right(j2,4))),isnumber(value(LEFT(j2,1))),isnumber(search(" apts",right(j2,5))),isnumber(value(right(j2,3))),isnumber(search(" grp",right(j2,4))),isnumber(value(left(right(j2,4),1))),isnumber(search(" mgmt",right(j2,5))),isnumber(search(" props",right(j2,6))),isnumber(search(" tr",right(j2,3))),isnumber(search(" dev",right(j2,4))),isnumber(search(" tr",right(j2,3))),isnumber(search(" fdn",right(j2,4))),isnumber(search(" ent",right(j2,4))),isnumber(search(" PRPTS",right(j2,6))),isnumber(search(" ARPTS",right(j2,6))),isnumber(search(" univ",right(j2,5)))
So I have this giant =OR() statement containing a bunch of isnumber(search()) statements checking whether the string in a cell ends in these phrases. Its purpose is to identify company names in lists that contain both people's names and company names. I feel like there must be a more efficient way. Adding them all together in one isnumber(search()) in the format {item1|item2|item3} does not work.
Building on the answer provided here, matching the end of the string can be done by using the $ sign (which means "end of the string" in regular expressions). Matching the beginning of the string, on the other hand, is done by putting the pattern after a caret (^), which indicates the beginning of the string.
So, if you want to add both to the formula provided in the other thread:
(LP|JT/RS)$ : match LP OR JT/RS at the end of the string
^(ABC|DEF) : match ABC OR DEF at the beginning of the string
That would make the formula look something like:
=REGEXMATCH(A2, "(?i)LLC|CORPORATION|COMPANY|HOLDINGS|PARTNERS|EQUITY|(LP|JT/RS)$|^(ABC|DEF)")
REFERENCE:
REGEXMATCH()
RE2 SYNTAX
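The anchoring idea itself is not specific to Sheets. As a rough sketch in Ruby (not a Sheets formula), the pattern can be assembled from lists of endings and beginnings; the lists below are a small illustrative subset of those in the question, and company_pattern is a hypothetical name. In Ruby, $ matches at the end of any line, so \A and \z are used to anchor to the whole string.

```ruby
# Illustrative subset of the endings and beginnings listed in the question.
suffixes = [" ll", " homes", " hoa", "-trs", " mgmt"]
prefixes = ["the "]

# Regexp.union escapes each string; .source drops its flags so the
# outer /i makes the whole pattern case-insensitive.
company_pattern =
  /\A(?:#{Regexp.union(prefixes).source})|(?:#{Regexp.union(suffixes).source})\z/i

company_pattern.match?("Sunset Homes")     #=> true
company_pattern.match?("The Smith Group")  #=> true
company_pattern.match?("John Smith")       #=> false
```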

Rails strip all except numbers commas and decimal points

Hi I've been struggling with this for the last hour and am no closer. How exactly do I strip everything except numbers, commas and decimal points from a rails string? The closest I have so far is:-
rate = rate.gsub!(/[^0-9]/i, '')
This strips everything but the numbers. When I try to add commas to the expression, everything gets stripped. I got the above from somewhere else, and as far as I can gather:
^ = not
Everything to the left of the comma gets replaced by what's in the '' on the right
No idea what the /i does
I'm very new to gsub. Does anyone know of a good tutorial on building expressions?
Thanks
Try:
rate = rate.gsub(/[^0-9,\.]/, '')
Basically, ^ means "not" when it is the first character inside the character class brackets [], so you can just add the comma and the period to the list. Outside a character class, an unescaped . is a special character that means "match any character"; inside the brackets it is already literal, so the backslash is optional but harmless.
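To see how the period behaves in each position, here is a small sketch on a throwaway string; the variable names are only for illustration.

```ruby
text = "a1.b2"

any_char         = text.gsub(/./, "")     # unescaped, outside brackets: matches every character
escaped          = text.gsub(/\./, "")    # escaped, outside brackets: literal period
in_class         = text.gsub(/[.]/, "")   # inside brackets: literal period
in_class_escaped = text.gsub(/[\.]/, "")  # inside brackets, escaped: same result
```

Only the first call removes everything; the other three remove just the period and leave "a1b2".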
Also, be aware of whether you are using gsub or gsub!
gsub! has the bang, so it edits the string you call it on in place rather than returning a new one (and it returns nil if nothing was replaced).
So if using gsub! it would be:
rate.gsub!(/[^0-9,\.]/, '')
And rate would be altered.
If you do not want to alter the original variable, then you can use the version without the bang (and assign it to a different var):
cleaned_rate = rate.gsub(/[^0-9,\.]/, '')
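The difference between the two methods shows up in a short sketch. Note in particular that gsub! returns nil when it makes no substitution, so assigning its result back to the variable can silently lose the value. The sample strings are made up for illustration.

```ruby
pattern = /[^0-9,.]/

rate = "$1,234.50 per hour".dup
cleaned_rate = rate.gsub(pattern, "")      # new string; rate is untouched

mutable = "$1,234.50 per hour".dup
mutable.gsub!(pattern, "")                 # changes mutable in place

already_clean = "1,234.50".dup
result = already_clean.gsub!(pattern, "")  # nil, because nothing was replaced
```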
I'd just google for tutorials. I haven't used one. Regexes are a LOT of time and trial and error (and table-flipping).
This is a cool tool to use with a mini cheat-sheet on it for ruby that allows you to quickly edit and test your expression:
http://rubular.com/
You can just add the comma and period in the square-bracketed expression:
rate.gsub(/[^0-9,.]/, '')
You don't need the i for case-insensitivity for numbers and symbols.
There's lots of info on regular expressions, regex, etc. Maybe search for those instead of gsub.
You can use this:
rate = rate.gsub(/[^0-9.,]/, '')
Also check this out to learn more about regular expressions:
http://www.regexr.com/

Isolating/removing Characters from string using rails

I am using ruby on rails
I have
article.id = 509969989168Q000475601
I would like the output to be
article.id = 68Q000475601
Basically, I want to get rid of everything before the 68Q.
The numbers in front of the 68Q can be of various lengths.
Is there a way to remove everything up to "68Q"?
It will always be 68Q, and Q is always the only letter.
Is there a way to say "remove all characters up to the 2 digits before Q"?
I'd use:
article.id[/68Q.*/]
Which will return everything from 68Q to the end of the string.
article.id.match(/68Q.+\z/)[0]
You can do this easily with the split method:
'68Q' + article.id.split('68Q')[1]
This splits the string into an array based on the delimiter you give it, then takes the second element of that array. For what it's worth though, @theTinMan's solution is far more elegant.
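For the last part of the question (keeping everything from the two digits before the Q), the same slicing approach works with a slightly more general pattern. A sketch on a plain string standing in for article.id:

```ruby
id = "509969989168Q000475601"

from_marker     = id[/68Q.*/]        #=> "68Q000475601"
from_two_digits = id[/\d{2}Q.*/]     #=> "68Q000475601"
no_match        = "12345"[/68Q.*/]   #=> nil
```

String#[] with a regex returns nil when nothing matches, so it is worth checking for nil before using the result.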

Conditional Regular Expression testing of a CSV

I am doing some client-side validation in ASP.NET MVC and I found myself trying to do conditional validation on a set of items (i.e., if the checkbox is checked then validate, and vice versa). This was problematic, to say the least.
To get around this, I figured that I could "cheat" by having a hidden element that would contain all of the information for each set, thus the idea of a CSV string containing this information.
I already use a custom [HiddenRequired] attribute to validate if the hidden input contains a value, with success, but I thought as I will need to validate each piece of data in the csv, that a regular expression would solve this.
My regular expression work is extremely weak and after a good 2 hours I've almost given up.
This is an example of the csv string:
true,3,24,over,0.5
to explain:
true denotes if I should validate the rest. I need to conditionally switch in the regex using this
3 and 24 are integers and will only ever fall in the range 0-24.
over is a string and will either be over or under
0.5 is a decimal value, of unknown precision.
In the validation, all values should be present and at least of the correct type
Is there someone who can either provide such a regex or at least provide some hints? I'm really stuck!
Try this regex:
@"^(true,([01]?\d|2[0-4]),([01]?\d|2[0-4]),(over|under),\d+\.?\d+|false.*)$"
I'll try to explain it using comments. Feel free to ask if anything is unclear. =)
@"
^ # start of line
(
true, # literal true
([01]?\d # Either 0, 1, or nothing followed by a digit
| # or
2[0-4]), # 20 - 24
([01]?\d|2[0-4]), # again
(over|under), # over or under
\d+\.?\d+ # one or more digits, optional dot, one or more digits (so at least two digits in total)
| #... OR ...
false.* # false followed by anything
)
$ # end of line
");
I would probably use a Split(',') and validate elements of the resulting array instead of using a regex. Also you should watch out for the \, case (the comma is part of the value).
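The split-and-validate idea can be sketched as follows. This is written in Ruby rather than C# purely for illustration, valid_settings? is a hypothetical name, and it assumes that none of the values can contain a comma and that the decimal must have a leading digit.

```ruby
# Hypothetical validator for strings such as "true,3,24,over,0.5".
def valid_settings?(csv)
  parts = csv.split(",", -1)             # -1 keeps trailing empty fields
  return true if parts.first == "false"  # nothing else to check
  return false unless parts.length == 5 && parts.first == "true"

  low, high, direction, amount = parts[1..4]
  range_ok = [low, high].all? do |value|
    value.match?(/\A\d{1,2}\z/) && value.to_i.between?(0, 24)
  end

  range_ok &&
    %w[over under].include?(direction) &&
    amount.match?(/\A\d+(\.\d+)?\z/)
end

valid_settings?("true,3,24,over,0.5")   #=> true
valid_settings?("true,3,25,over,0.5")   #=> false
valid_settings?("false,,,,")            #=> true
```

Checking each field separately also makes it easier to report which value failed, which a single regex cannot do.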
