Postgres sort alphanumeric only - ruby-on-rails

Trying to sort ascending A->Z on some podcast titles. I only want A-Z and 0-9 to sort normally; everything else should come last:
.order('title ASC')
is giving me odd results at the start and end. The majority of the results in the middle are fine:
> ["\"Success Living\" - Dr. Leigh-Davis",
"\"The Real Deal\" with Dr. Leigh-Davis",
"#WeThePeople_Live",
"Alley Oop podcast",
"Always Listening: Podcast Reviews",
... ### everything here is fine ### ...
"Your Mom's House",
"Zen Dude Fitness",
"podCast411",
"talk2Cleo"]
(first three, last two are odd.)

Replace .order('title ASC') with this longer argument:
.order("
CASE WHEN lower(title) BETWEEN 'a' AND 'zzzzz'
OR title BETWEEN '0' AND '99999'
THEN lower(title)
ELSE concat('zzzzz', lower(title))
END")
This will sort case insensitive (lower); when values start with a digit or letter they are sorted normally, and all the other values will be sorted as if they were prefixed by 'zzzzz', forcing them to the end of the sort order.
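To see what that sort key does, here is a minimal in-memory sketch of the same idea in plain Ruby. It is not the Postgres query itself: the sample titles and the `sort_key` lambda are made up for illustration, and Ruby compares strings bytewise, whereas Postgres uses the column's collation.

```ruby
# Hypothetical sample data modelled on the titles in the question.
titles = [
  "\"Success Living\" - Dr. Leigh-Davis",
  "#WeThePeople_Live",
  "Alley Oop podcast",
  "Zen Dude Fitness",
  "podCast411",
  "talk2Cleo"
]

# Mirror the CASE expression: lowercase everything, and prefix titles that
# do not start with a letter or digit with 'zzzzz' so they sort last.
sort_key = lambda do |title|
  key = title.downcase
  key.match?(/\A[a-z0-9]/) ? key : "zzzzz#{key}"
end

sorted = titles.sort_by { |title| sort_key.call(title) }
puts sorted
```

The titles starting with a quote or `#` end up after "Zen Dude Fitness", and "podCast411" and "talk2Cleo" fall into their alphabetical places.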
With Regular Expression
This solution combines the above idea with the idea of PJSCopeland (to use a regular expression). Again the strings starting with non-alphanumerical characters are sorted after those that start with alphanumerical characters:
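.order("regexp_replace(lower(title), '([^[:alnum:] ])', 'zzz\\1', 'gi')")
The \1 back-references the non-alphanumerical character that was matched, so all of them get prefixed with zzz. Inside a double-quoted Ruby string the backslash has to be doubled (\\1) so that Postgres actually receives \1.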

Disclaimer: I haven't tested this. It comes from the documentation for Postgres 9.1.
I have an inexact solution - the difference being that punctuation will simply be ignored, and your entries will end up like this:
.order("regexp_replace(title, '\\W', '', 'gi')") # ASC is optional; \\W in the Ruby string reaches Postgres as \W
=> ["Alley Oop podcast",
"Always Listening: Podcast Reviews",
"podCast411",
"\"Success Living\" - Dr. Leigh-Davis",
"talk2Cleo",
"\"The Real Deal\" with Dr. Leigh-Davis",
"Your Mom's House",
"#WeThePeople_Live",
"Zen Dude Fitness"]
regexp_replace is the Postgres equivalent of Ruby [g]sub.
\W means 'any non-word character' and will match anything other than A-Z, a-z, 0-9, and _.
If you want to ignore underscores as well, change \W to [\W_].
'' is what you replace the matches with: an empty string.
the g flag means 'all matches', and the i flag makes the pattern match case-insensitively. The i flag does not lowercase the result, so keep lower() if the sort itself must ignore case under your collation.
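Since regexp_replace is compared to gsub above, the same "ignore punctuation" key can be tried out in plain Ruby first. This is only a sketch on made-up sample titles; `downcase` is added explicitly here because Ruby's `sort_by` compares bytes and is therefore case-sensitive.

```ruby
# Hypothetical sample data taken from the titles in the question.
titles = [
  "\"Success Living\" - Dr. Leigh-Davis",
  "Alley Oop podcast",
  "#WeThePeople_Live",
  "Zen Dude Fitness",
  "podCast411"
]

# Strip every non-word character (anything but A-Z, a-z, 0-9 and _),
# then lowercase, and sort by the result.
sorted = titles.sort_by { |title| title.gsub(/\W/, "").downcase }
puts sorted
```

Punctuation no longer influences the order, so "#WeThePeople_Live" sorts under "w" and the quoted title under "s".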

Related

How to remove from string before __

I am building a Rails 5.2 app.
In this app I got outputs from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __, so that only what comes after the __ remains, for example 233433?
Perhaps some sort of RegEx.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
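One thing worth checking with the split approach is what happens when the value contains more than one __ or none at all. The sketch below compares `split` with a limit against `String#partition`; the variable names and the extra sample strings are only for illustration.

```ruby
name = "dhl_freight__233433"

# Split at the first "__" only, so a later "__" stays in the result.
after_split = name.split("__", 2).last
multi       = "a__b__c".split("__", 2).last

# partition returns [before, separator, after]; when the separator is
# missing, the last element is an empty string instead of the whole input.
after_partition = name.partition("__").last
no_delimiter    = "dhl_freight".partition("__").last

p after_split, multi, after_partition, no_delimiter
```

Whether an empty string or the unchanged input is the better result for a name without __ depends on how the supplier data should be handled.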
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: Learnt something from this question myself: you don't even have to use a RegExp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it does the conversion to regexp for you)
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
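A quick sketch of how the lookbehind version behaves, including a made-up input with only a single underscore before the last part:

```ruby
# Match the trailing run of non-underscore characters, but only when it is
# directly preceded by two underscores.
r = /(?<=__)[^_]+\z/

with_double_1 = "dhl_freight__233433"[r]
with_double_2 = "postal__US-320202"[r]
with_single   = "dhl_freight_233433"[r] # hypothetical input, single underscore

p with_double_1, with_double_2, with_single
```

`String#[]` returns nil when the regular expression does not match, which is what happens for the single-underscore input.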
There are many ruby ways to extract numbers from string. I hope you're trying to fetch numbers out of a string. Here are some of the ways to do so.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of the above, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. For a string like "String-with-67829___numbers-09764" the output would be "6782909764".
If you want the numbers split like this ["67829", "09764"]:
line.split(/[^\d]/).reject { |c| c.empty? }
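Put together on the sample string from above, the variants can be compared side by side. The last line uses `scan(/\d+/)`, which is not one of the approaches listed above but is another way to get the grouped numbers directly.

```ruby
line = "String-with-67829___numbers-09764"

# All three return the digits joined into a single string.
joined_delete = line.delete("^0-9")
joined_scan   = line.scan(/\d/).join
joined_tr     = line.tr("^0-9", "")

# Two ways to keep each run of digits as its own element.
groups_split = line.split(/[^\d]/).reject { |c| c.empty? }
groups_scan  = line.scan(/\d+/)

p joined_delete, joined_scan, joined_tr, groups_split, groups_scan
```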
Hope these answers help you! Happy coding :-)

Rails array INCLUDE with only distinct words

I'm building a profanity search function which needs to find instances of an array of profane words in a long string of text.
One could do a simple include like:
if profane_words.any? {|word| self.name.downcase.include? word}
...
end
This results in a positive match if ANY of the array of profane words are present anywhere in the text.
However, if a word like 'hell' is considered profane, this would produce a positive match against "Hell's Angels" or "Hell's Kitchen", which is undesirable.
How can the above search be modified to only produce positive results against distinct words or phrases? For example, "Hell Angels" returns positive but "Hell's Angels" returns negative.
To be clear, this means we're searching for any instance of a profane word that is not immediately preceded or followed by another character or apostrophe.
What about using a regex ?
profane_words.any? { |word| self.name.downcase.match? /#{word}(?!')/ }
Examples:
"hell's angels".match?(/hell(?!')/) # => false
"hell angel".match?(/hell(?!')/) # => true
(?!') is a negative lookahead, meaning it won't match if the word has a ' right after it. If you'd like to exclude other characters you can add them to the list with pipes, e.g. (?!'|") won't match if either ' or " follows.
See https://www.regular-expressions.info/lookaround.html for reference.
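And you could make it more performant like this (the alternation is wrapped in a non-capturing group so that the lookahead applies to every word, not just the last one):
self.name.downcase.match? /(?:#{profane_words.join('|')})(?!')/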
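A slightly stricter variant, as a sketch only: it builds the alternation with `Regexp.union` (which also escapes any special characters in the words), adds word boundaries so that "shell" or "hello" do not match, and keeps the negative lookahead for the apostrophe. The word list and the `profane` lambda are placeholders, not part of the original code.

```ruby
# Hypothetical word list for illustration.
profane_words = %w[hell damn]

# \b ... \b restricts matches to whole words; (?!') rejects a following apostrophe.
pattern = /\b(?:#{Regexp.union(profane_words).source})\b(?!')/i

profane = ->(text) { text.match?(pattern) }

p profane.call("Hell Angels")   # whole word, no apostrophe
p profane.call("Hell's Angels") # apostrophe right after the word
p profane.call("shell game")    # word is only part of a longer word
```

The `i` flag makes the `downcase` call unnecessary.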
if profane_words.any? {|word| self.name.downcase.split(' ').include? word} ... end
You should definitely use a regex containing all your profane words followed by a space or period:
> "Hell's angels".match(/(hell|shit)[ .]/i)
=> nil
> "Hell angels".match(/(hell|shit)[ .]/i)
=> #<MatchData "Hell " 1:"Hell">
> "Hell's angels shit".match(/(hell|shit)[ .]/i)
=> nil

Match a word or whitespaces in Lua

(Sorry for my broken English)
What I'm trying to do is match a word (with or without numbers and special characters) or whitespace characters (whitespaces, tabs, optional new lines) in a string in Lua.
For example:
local my_string = "foo bar"
my_string:match(regex) --> should return 'foo', ' ', 'bar'
my_string = " 123!#." -- note: three whitespaces before '123!#.'
my_string:match(regex) --> should return ' ', ' ', ' ', '123!#.'
Where regex is the Lua regular expression pattern I'm asking for.
Of course I've done some research on Google, but I couldn't find anything useful. What I've got so far is [%s%S]+ and [%s+%S+] but it doesn't seem to work.
Any solution using the standard library, e.g. string.find, string.gmatch etc. is OK.
match returns either the captures or the whole match, and your patterns do not define any captures. [%s%S]+ matches "(space or non-space) one or more times", which is basically everything. [%s+%S+] is plain wrong: the character class [ ] is a set of single-character members. It does not treat sequences of characters in any other way ("[cat]" matches "c", "a" or "t"), nor does it care about +. So [%s+%S+] probably means "a single character that is a space, a plus, or a non-space".
The first example 'foo', ' ', 'bar' could be solved by:
regex="(%S+)(%s)(%S+)"
If you want a variable number of captures you are going to need the gmatch iterator:
local capt={}
for q,w,e in my_string:gmatch("(%s*)(%S+)(%s*)") do
  if q and #q>0 then
    table.insert(capt,q)
  end
  table.insert(capt,w)
  if e and #e>0 then
    table.insert(capt,e)
  end
end
This will not however detect the leading spaces or discern between a single space and several, you'll need to add those checks to the match result processing.
Lua standard patterns are simplistic. If you need more intricate matching, you might want to have a look at the Lua LPeg library.

Ruby .scan method returns empty using regex

So given a string like this "\"turkey AND ham\" NOT \"roast beef\"" I need to get an array with the inner strings like so: ["turkey AND ham", "roast beef"] and eliminate OR's, AND's and NOT's that may or may not be there.
With the help of Rubular I came up with this regex /\\["']([^"']*)\\["']/
which returns the following 2 groups:
Match 1
1. turkey AND ham
Match 2
1. roast beef
However, when I use it with .scan I keep getting an empty array.
I looked at a few other SO posts, but cannot figure out where I am going wrong.
Here is the result from my rails console:
=> q = "\"turkey and ham\" OR \"roast beef\""
=> q.scan(/\\["']([^"']*)\\["']/)
=> []
Expectation:
["turkey AND ham", "roast beef"]
I shall also mention I suck at regex.
When the regex used with scan contains a capture group (@davidhu2000's approach), one generally can use lookarounds (see footnote 1) instead. It's just a matter of personal preference. To allow for double-quoted strings that contain either single- or (escaped) double-quoted strings, you could use the following regex.
r = /
(?<=") # match a double quote in a positive lookbehind
[^"]+ # match one or more characters that are not double-quotes
(?=") # match a double quote in a positive lookahead
| # or
(?<=') # match a single quote in a positive lookbehind
[^']+ # match one or more characters that are not single-quotes
(?=') # match a single quote in a positive lookahead
/x # free-spacing regex definition mode
"\"turkey AND ham\" NOT 'roast beef'".scan(r)
#=> ["turkey AND ham", "roast beef"]
As '"turkey AND ham" NOT "roast beef"' #=> "\"turkey AND ham\" NOT \"roast beef\"" (i.e., how the single-quoted string is saved), we need not be concerned about that being an additional case to deal with.
1 For any in the audience who still consider regular expressions to be black magic, there are four kinds of lookarounds (positive and negative lookbehinds and lookaheads) as elaborated in the doc for Regexp. Sometimes they are regarded as "zero-width" matches as they are not part of the matched text.
Your regex is trying to match \, which won't match anything in the string, since the \ only existed to escape the double quote and is not part of the string.
So if you remove \\ in your regex
res = q.scan(/["']([^"']*)["']/)
This will return a 2d array
res = [["turkey and ham"], ["roast beef"]]
Each inner array is all the matching groups from the regex, so if you have two capture groups in your regex, you will see two items in the inner array.
If you want a simple array, you can run flatten method on the array.
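As a short sketch of the two steps described above, using the string from the question:

```ruby
q = "\"turkey and ham\" OR \"roast beef\""

# With a capture group, scan returns one inner array per match,
# holding that match's captures.
nested = q.scan(/["']([^"']*)["']/)

# flatten turns the array of one-element arrays into a plain array.
flat = nested.flatten

p nested
p flat
```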

Lua: How to start match after a character

I'm trying to make a search feature that allows you to split a search into two when you insert a | character and search after what you typed.
So far I have understood how to keep the main command by capturing before the space.
An example being that if I type :ban user, a box below would still say :ban, but right when I type in a |, it starts the search over again.
:ba
:ba
:ban user|:at
:at
:ban user|:attention members|:kic
:kic
This code:
text=":ban user|:at"
text=text:match("(%S+)%s+(.+)")
print(text)
would still return ban.
I'm trying to match what comes after the final | character.
Then you can use
text=":ban user|:at"
new_text=text:match("^.*%|(.*)")
if new_text == nil then new_text = text end
print(new_text)
Explanation:
.* - matches any 0+ characters, as many as possible (in a "greedy" way, since the whole string is grabbed and then backtracking occurs to find...)
%| - the last literal |
(.*) - match and capture any 0+ characters (up to the end of the string).
To avoid special cases, make sure that the string always has |:
function test(s)
  s="|"..s
  print(s:match("^.*|(.*)$"))
end
test":ba"
test":ban user|:at"
test":ban user|:attention members|:kic"
