Remove certain regex from a string in Rails - ruby-on-rails

I am building a tweet-like system that includes #mentions and #hashtags. Right now, I need to take a tweet that will come to the server like this:
hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)
and save it in the database as:
hi #Bob D whats the deal with #red
I have the flow of what the code looks like in my mind but can't get it to work. Basically, I need to do the following:
Scan the string for any [#...] (an array-like structure that begins with a #)
Delete the parentheses after the array-like structure (so for [#Bob D](member:Bob D), remove everything in parentheses)
Remove the brackets surrounding a substring that begins with # (meaning, delete the [] from [#...])
I will also need to do the same for #. I'm almost certain this can be done with regular expressions and the slice! method, but I'm really having trouble coming up with the regular expressions needed and the control flow.
I think it would be something like this:
a = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
substring = a.scan <regular expression here>
substring.each do |matching_substring| # the loop should get rid of the parentheses but not the brackets
  a.slice! matching_substring
end
# Something here should get rid of the brackets
The problem with the code above is that I can't figure out the regex and it doesn't get rid of the brackets.

This regex should work:
/(\[(#.*?)\]\((.*?)\))/
You can test it on Rubular.
The ? after the * makes the match non-greedy, so each bracket/parenthesis pair is captured separately instead of one match running to the last closing parenthesis.
The code would look something like this:
a = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
substring = a.scan(/(\[(#.*?)\]\((.*?)\))/)
substring.each do |matching_substring|
  # matching_substring[0] is the whole match, e.g. [#Bob D](member:Bob D)
  # matching_substring[1] is the part in the brackets, sans brackets
  # matching_substring[2] is the part in the parentheses, sans parentheses
  a.gsub!(matching_substring[0], matching_substring[1]) # replaces [#Bob D](member:Bob D) with #Bob D
end
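If the goal is only the cleaned-up string, the scan-and-loop step can probably be collapsed into a single gsub with a backreference. This is a minimal sketch, assuming the markup always has the [#text](kind:text) shape shown in the question; the helper name strip_markup is made up for illustration.

```ruby
# Hypothetical helper name; not part of the original answer.
def strip_markup(text)
  # \[(#.*?)\]  captures the "#..." text inside the brackets as group 1
  # \(.*?\)     matches the parenthesised part, which is dropped
  text.gsub(/\[(#.*?)\]\(.*?\)/, '\1')
end

puts strip_markup("hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)")
# prints: hi #Bob D whats the deal with #red
```

Because gsub returns a new string, the original tweet is left untouched, which may be handy if both forms need to be kept.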

Consider this:
str = "hi [#Bob D](member:Bob D) whats the deal with [#red](tag:red)"
BRACKET_RE_STR = '\[
(
[##]
[^\]]+
)
\]'
PARAGRAPH_RE_STR = '\(
[^)]+
\)'
BRACKET_RE = /#{BRACKET_RE_STR}/x
PARAGRAPH_RE = /#{PARAGRAPH_RE_STR}/x
BRACKET_AND_PARAGRAPH_RE = /#{BRACKET_RE_STR}#{PARAGRAPH_RE_STR}/x
str.gsub(BRACKET_AND_PARAGRAPH_RE) { |s| s.sub(PARAGRAPH_RE, '').sub(BRACKET_RE, '\1') }
# => "hi #Bob D whats the deal with #red"
The longer or more complex a pattern is, the harder it is to maintain or update, so keep patterns as small as possible. Build complex patterns from simple ones so they are easier to debug and extend.

Related

What Lua pattern behaves like a regex negative lookahead?

my problem is I need to write a Lua code to interpret a text file and match lines with a pattern like
if line_str:match(myPattern) then do myAction(arg) end
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world". I found that in regex, what I want is called negative lookahead, and you would write it like
.*hello (?!world).*
but I'm struggling to find the Lua version of this.
Let's say I want a pattern to match lines containing "hello" in any context except one containing "hello world".
As Wiktor has correctly pointed out, the simplest way to write this would be line:find"hello" and not line:find"hello world" (you can use both find and match here, but find is probably more performant; you can also turn off pattern matching for find).
I found that in regex, what I want is called negative lookahead, and you would write it like .*hello (?!world).*
That's incorrect. If you checked against the existence of such a match, all it would tell you would be that there exists a "hello" which is not followed by a "world". The string hello hello world would match this, despite containing "hello world".
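The difference is easy to check in any engine that does support lookahead. The sketch below uses Ruby rather than Lua (Lua patterns have no lookahead), purely to exercise the regex from the question; the method names are made up for illustration.

```ruby
# The regex from the question, unchanged.
LOOKAHEAD = /.*hello (?!world).*/

# What the regex actually tests: some "hello " that is not followed by "world".
def lookahead_match?(line)
  LOOKAHEAD.match?(line)
end

# What the question asks for: contains "hello" but never "hello world".
def wanted_match?(line)
  line.include?("hello") && !line.include?("hello world")
end

["hello there", "hello world", "hello hello world"].each do |line|
  puts "#{line.inspect}: lookahead=#{lookahead_match?(line)} wanted=#{wanted_match?(line)}"
end
```

The two checks agree on the first two lines and disagree on "hello hello world", which the lookahead regex accepts even though the line contains "hello world".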
Negative lookahead is a questionable feature anyway, since it isn't trivially provided by actual regular expressions and thus may not be implemented in linear time.
If you really need it, look into LPeg; negative lookahead is implemented as pattern1 - pattern2 there.
Finally, the regex can be translated to plain Lua by counting: search for (1) the pattern without the negative part and (2) the pattern including the negative part, then check whether (1) has a match that is not also in (2):
local hello_count = 0; for _ in line:gmatch"hello" do hello_count = hello_count + 1 end
local helloworld_count = 0; for _ in line:gmatch"hello world" do helloworld_count = helloworld_count + 1 end
if hello_count > helloworld_count then
-- there is a "hello" not followed by a "world"
end

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __ so that only what comes after it remains, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
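One thing to watch with split(...).last is what happens when the delimiter is missing or appears more than once. A small sketch comparing it with String#partition, assuming only the part after the first __ is wanted; the helper name after_delimiter is hypothetical.

```ruby
# Hypothetical helper: everything after the first "__", or "" if there is none.
def after_delimiter(name)
  # partition returns [before, delimiter, after]; all but the first are "" when nothing matches
  name.partition('__').last
end

puts after_delimiter("dhl_freight__233433")  # prints: 233433
puts after_delimiter("postal__US-320202")    # prints: US-320202

# Differences from split('__').last:
p "no_delimiter".split('__').last            # the whole string comes back
p after_delimiter("no_delimiter")            # an empty string comes back
p "a__b__c".split('__').last                 # only the final piece
p after_delimiter("a__b__c")                 # everything after the first "__"
```

Which behaviour is right depends on what the supplier data can contain, so this is a choice to make rather than a fix.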
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
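A quick way to see the difference between the two patterns, with the lookbehind group written out in full as (?<=__). The constant names are made up for illustration, and the single-underscore input is an invented example rather than real supplier data.

```ruby
LAST_SEGMENT            = /[^_]+\z/          # any trailing run of non-underscores
AFTER_DOUBLE_UNDERSCORE = /(?<=__)[^_]+\z/   # same, but only directly after "__"

p "dhl_freight__233433"[LAST_SEGMENT]             # "233433"
p "dhl_freight__233433"[AFTER_DOUBLE_UNDERSCORE]  # "233433"

# Invented input with a single underscore before the id:
p "dhl_freight_233433"[LAST_SEGMENT]              # "233433"
p "dhl_freight_233433"[AFTER_DOUBLE_UNDERSCORE]   # nil
```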
There are many ways in Ruby to extract numbers from a string, assuming that is what you're trying to do. Here are some of them.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. If the string is "String-with-67829___numbers-09764", the output will be "6782909764".
If you want the numbers split up instead, like ["67829", "09764"]:
line.split(/[^\d]/).reject { |c| c.empty? }
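The split-and-reject step can probably be shortened by scanning for runs of digits directly. A minimal sketch; the helper name digit_runs is hypothetical.

```ruby
# Hypothetical helper: each maximal run of digits becomes one array element.
def digit_runs(line)
  line.scan(/\d+/)
end

line = "String-with-67829___numbers-09764"
p digit_runs(line)      # ["67829", "09764"]
p line.delete("^0-9")   # "6782909764"
```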
Hope these answers help you! Happy coding :-)

Remove quotes from string built from an array

I have user-controlled input like so (the length and number of items may change):
str = "['honda', 'toyota', 'lexus']"
I would like to convert this into an array, but I'm struggling to find the best way to do so. eval() does exactly what I need, but it is not very elegant and is dangerous in this case, since it's user-controlled input.
Another way is:
str[1..-2].split(',').collect { |car| car.strip.tr("'", '') }
=> ["honda", "toyota", "lexus"]
But this is also not very elegant. Any suggestions that are more 'Rubyish'?
You could use a regular expression:
# match (in a non-greedy way) characters up to a comma or `]`
# capture each word as a group, and don't capture `,` or `]`
str.scan(/'(.+?)'(?:,|\])/).flatten
Or JSON.parse (but accounting for the fact that single quotes are in fact technically not allowed in JSON):
JSON.parse( str.tr("'", '"') )
JSON.parse probably has a small edge over the regexp in terms of performance, but if you're expecting your users to do single quote escaping, then that tr is going to mess things up. In this case, I'd stick with the regexp.
The JSON.parse looks more correct, but here is another alternative:
str.split(/[[:punct:] ]+/).drop(1)
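Since the input comes from users, it may be worth wrapping whichever approach is chosen so that malformed input fails quietly instead of raising. A sketch under that assumption; both helper names are made up, and the JSON variant still has the single-quote escaping caveat mentioned above.

```ruby
require 'json'

# Hypothetical helper: JSON route, returning nil rather than raising on bad input.
def parse_quoted_list(str)
  JSON.parse(str.tr("'", '"'))
rescue JSON::ParserError
  nil
end

# Hypothetical helper: regex route, capturing whatever sits between each pair of single quotes.
def scan_quoted_list(str)
  str.scan(/'([^']*)'/).flatten
end

str = "['honda', 'toyota', 'lexus']"
p parse_quoted_list(str)             # ["honda", "toyota", "lexus"]
p scan_quoted_list(str)              # ["honda", "toyota", "lexus"]
p parse_quoted_list("system('rm')")  # nil
```

Neither helper evaluates the input as Ruby code, which is the main point of avoiding eval here.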

Rails: Split text including dollar and euro

I'm using Rails and Nokogiri and I'm trying to parse some website.
This is where I'm stuck:
doc.css('#example > li:nth-child(1)').each do |node|
money = node.xpath('//*ul/li/div/span').text
end
It returns something like:
$100,000£230,000$40,000$9,000€600$800,000
I want to split those items, save them to the database and finally hand them to the view.
So, in the view, I want it to appear like:
(1)$100,000
(2)£230,000
(3)$40,000
(4)$9,000
(5)€600
(6)$800,000
I tried to split those items by this code below.
money = node.xpath('//*ul/li/div/span').text.split(/[$€£]/)
but the result looks like this:
["", "100,000", "230,000", "40,000", "9,000", "600", "800,000"]
And I don't know which item is in Dollar, Euro, or Pound.
Is there any good way to solve this problem?
You're almost there, just use a positive lookahead :)
irb(main):005:0> "$100,000£230,000$40,000$9,000€600$800,000".split(/(?=[$£€])/)
=> ["$100,000", "£230,000", "$40,000", "$9,000", "€600", "$800,000"]
It needs a regular expression. This works:
"$100,000£230,000$40,000$9,000$600$800,000".scan(/([^\d][0-9,]+)/)
=> [["$100,000"],
["£230,000"],
["$40,000"],
["$9,000"],
["$600"],
["$800,000"]]
The regex contains these parts:
[^\d]: A character class matching a single non-digit. This will match the currency symbol.
[0-9,]+: Another character class, this time repeated (the +). It matches the numeric part (0-9) plus the thousands separator.
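If the currency and the amount need to go into separate database columns, two capture groups can split them in the same pass. A minimal sketch, assuming only $, £ and € appear and that commas are thousands separators; the helper name and the [symbol, integer] result shape are illustrative, not from the answers above.

```ruby
# Hypothetical helper: returns [currency_symbol, integer_amount] pairs.
def parse_amounts(text)
  text.scan(/([$£€])([\d,]+)/).map do |symbol, digits|
    # drop the thousands separators before converting to an integer
    [symbol, digits.delete(',').to_i]
  end
end

p parse_amounts("$100,000£230,000$40,000$9,000€600$800,000")
# prints: [["$", 100000], ["£", 230000], ["$", 40000], ["$", 9000], ["€", 600], ["$", 800000]]
```

The view can then number the pairs with each_with_index and format the amounts however it likes.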

Break strings into substrings based on delimiters, with empty substrings

I am using Lua to create a table within a table and am running into an issue. I also need to populate the nil values that appear, but cannot seem to get it right.
String being manipulated:
PatID = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
for word in PatID:gmatch("[^\~w]+") do table.insert(PatIDTable,word) end
local _, PatIDCount = string.gsub(PatID,"~","")
PatIDTableB = {}
for i=1, PatIDCount+1 do
PatIDTableB[i] = {}
end
for j=1, #PatIDTable do
for word in PatIDTable[j]:gmatch("[^\^]+") do
table.insert(PatIDTableB[j], word)
end
end
This currently produces this output:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='SCI'
[3]='SP'
[3]=table
[1]='N7N558300000Acc'
But I need it to produce:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]=''
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
EDIT:
I think I may have done a bad job explaining what it is I am looking for. It is not necessarily that I want the carets to be considered "NIL" or "empty", but rather, that they signify that a new string is to be started.
They are, I guess for lack of a better explanation, position identifiers.
So, for example:
L73F11341687Per^^^SCI^SP
actually translates to:
1. L73F11341687Per
2.
3.
4. SCI
5. SP
If I were to have
L73F11341687Per^12ABC^^SCI^SP
Then the positions are:
1. L73F11341687Per
2. 12ABC
3.
4. SCI
5. SP
And in turn, the table would be:
table
[1]=table
[1]='07-26-27'
[2]=table
[1]='L73F11341687Per'
[2]='12ABC'
[3]=''
[4]='SCI'
[5]='SP'
[3]=table
[1]='N7N558300000Acc'
[2]=''
Hopefully this sheds a little more light on what I'm trying to do.
Now that we've cleared up what the question is about, here's the issue.
Your gmatch pattern will return all of the matching substrings in the given string. However, your gmatch pattern uses "+". That means "one or more", which therefore cannot match an empty string. If it encounters a ^ character, it just skips it.
But, if you just tried :gmatch("[^\^]*"), which allows empty matches, the problem is that it would effectively turn every ^ character into an empty match. Which is not what you want.
What you want is to eat the ^ at the end of a substring. But, if you try :gmatch("([^\^]*)\^"), you'll find that it won't return the last string. That's because the last string doesn't end with ^, so it isn't a valid match.
The closest you can get with gmatch is this pattern: "([^\^]*)\^?". This has the downside of putting an empty string at the end. However, you can just remove that easily enough, since one will always be placed there.
local s0 = '07-26-27~L73F11341687Per^^^SCI^SP~N7N558300000Acc^'
local tt = {}
for s1 in (s0..'~'):gmatch'(.-)~' do
local t = {}
for s2 in (s1..'^'):gmatch'(.-)^' do
table.insert(t, s2)
end
table.insert(tt, t)
end
