I have a list of names in my DB and I need to sort them alphabetically. However, I need to show the Greek names first, and then the Latin ones. For example, I have:
[Jale, Βήτα, Άλφα, Ben]
and I need to order it like this:
[Άλφα, Βήτα, Ben, Jale]
Any suggestions would be much appreciated :)
I like to solve these problems by playing around in irb. Here's one way you could go about finding this solution. First, we'll define our test array:
>> names = %w{Jale Βήτα Άλφα Ben}
=> ["Jale", "Βήτα", "Άλφα", "Ben"]
To solve this, let's first transform the array into 2-tuples that contain a flag indicating whether the name is Greek or not, followed by the name itself. We want the flag to be sortable, so we'll match against a regex for Latin-only characters and coerce the result to a string (a match at index 0 becomes "0", and nil becomes "").
>> names.map{|name| [(name =~ /^\w+$/).to_s, name]}
=> [["0", "Jale"], ["", "Βήτα"], ["", "Άλφα"], ["0", "Ben"]]
Then we'll sort the 2-tuples:
>> names.map{|name| [(name =~ /^\w+$/).to_s, name]}.sort
=> [["", "Άλφα"], ["", "Βήτα"], ["0", "Ben"], ["0", "Jale"]]
We now have a sort order with the Greek names first, then the Latin names. We can shorten this into our solution:
>> names.sort_by{|name| [(name =~ /^\w+$/).to_s, name]}
=> ["Άλφα", "Βήτα", "Ben", "Jale"]
I gave one solution above. Another approach is to partition the names into Greek and Latin, sort within those groups, then flatten the two arrays into one:
>> names.partition{|name| name !~ /^\w+$/}.map(&:sort).flatten
=> ["Άλφα", "Βήτα", "Ben", "Jale"]
This might be a little more elegant and understandable than my other solution, but it is less flexible. Note that name !~ /^\w+$/ returns true if the name contains non-Latin characters, i.e., is Greek.
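The "is it Greek?" test can also be made explicit rather than inferred from a failed \w match. The following is a minimal sketch, assuming Ruby 2.4+ (for match?) and that a name counts as Greek if it contains at least one character from the Greek script; the 0/1 flag is just an illustrative sort key. Note that the ordering within each group is by code point, not a locale-aware collation, so accented letters may not land exactly where a Greek dictionary would put them.

```ruby
names = %w[Jale Βήτα Άλφα Ben]

# Sort key: [0, name] for names containing Greek-script characters,
# [1, name] for everything else, so the Greek group comes first.
sorted = names.sort_by { |name| [name.match?(/\p{Greek}/) ? 0 : 1, name] }
# => ["Άλφα", "Βήτα", "Ben", "Jale"]
```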
Given names like below:
richard should be Richard
RICHARD should be Richard
richAnne should be RichAnne
I'd rather this be done in Excel, but my other option is using Rails.
Try this (it upper-cases only the first character and leaves the rest untouched, so RICHARD stays RICHARD):
new_string = string.slice(0,1).capitalize + string.slice(1..-1)
This works for your examples:
words = ['richard', 'RICHARD', 'richAnne']
=> ["richard", "RICHARD", "richAnne"]
words.map{|w| w.titleize.gsub(' ','')}
=> ["Richard", "Richard", "RichAnne"]
In Rails you can do:
>> 'richard'.titleize.split.join
=> "Richard"
>> 'RICHARD'.titleize.split.join
=> "Richard"
>> 'richAnne'.titleize.split.join
=> "RichAnne"
The .split.join is not necessary for the first two cases, but it is for the last.
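If ActiveSupport is not available, the same three cases can be handled in plain Ruby. This is a minimal sketch, assuming the rule is "an all-caps name gets conventional capitalization, anything else only gets its first letter upper-cased"; fix_name is a hypothetical helper name.

```ruby
# Hypothetical helper: normalizes the capitalization of a single name.
def fix_name(name)
  # All upper case (e.g. "RICHARD"): capitalize first letter, downcase the rest.
  return name.capitalize if name == name.upcase
  # Otherwise keep inner capitals (e.g. "richAnne") and upper-case only the first letter.
  name[0].upcase + name[1..-1]
end

fixed = %w[richard RICHARD richAnne].map { |n| fix_name(n) }
# => ["Richard", "Richard", "RichAnne"]
```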
Just for comparison, in Excel you could try this:
=IF(SUM((CODE(MID(A1&REPT(" ",10),COLUMN(A:J),1))>=65)*(CODE(MID(A1&REPT(" ",10),COLUMN(A:J),1))<=90))=LEN(A1),PROPER(A1),UPPER(LEFT(A1))&RIGHT(A1,LEN(A1)-1))
which counts the number of capital letters in the name, but it looks very messy compared to the Ruby solutions. Change the 10 and the A:J range to suit the longest name you expect.
It's an array formula and must be entered with Ctrl+Shift+Enter.
You could also do it by checking to see if there were any lower case letters in the name and if so, just capitalise the first letter.
=IF(SUM(--ISNUMBER(FIND(MID("abcdefghijklmnopqrstuvwxyz",COLUMN(A:Z),1),A1))),UPPER(LEFT(A1))&RIGHT(A1,LEN(A1)-1),PROPER(A1))
A pattern-matching solution would be neater but would need VBA.
[PostgreSQL(9.4), Rails(4.1)]
The problem:
I have a table with the names of tools. The name column is of hstore type and looks like this: name -> ('en': value, 'de': value). Note that 'de' is irrelevant to this problem, because all names are stored only under the 'en' key.
Next I have to construct a search query that will find the right record, but the format of the text in the query is unknown, e.g.:
In DB:
WQXZ 123GT, should match query: WQXZ_123-GT
In DB:
Three Words Name 123-D45, should match query: Three_WORDS_NAME 123D45
and so on...
Solution:
To make this work, I want to normalize both the stored value and the query so that they become identical. To do this I need to downcase both values, remove all whitespace, and remove all non-alphanumeric characters, so the values above become:
wqxz123gt == wqxz123gt
and
threewordsname123d45 == threewordsname123d45
I have no problem formatting a search value in Ruby:
"sTR-in.g24 3".downcase.gsub(/\s/, "").gsub(/\W/, "") # => "string243"
But I can't figure out how to do this in the SQL search query, which should look like:
Tool.where("CODE_I_AM_LOOKING_FOR(name -> 'en') = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, ""))
Thank you for your time.
UPD: I can downcase in the query:
Tool.where("lower(name -> 'en') = (?)", value.downcase)
But it solves only a part of the problem (downcase). The whitespaces and non-word characters (dots, dashes, underscores, etc.) are still an issue.
You can use the Postgres replace function to remove spaces, then use the lower function to match on that value, like this:
Tool.where("lower(replace(name -> 'en', ' ', '')) = (?)", value.downcase.gsub(/\s/, "").gsub(/\W/, "") )
I hope this would be helpful.
Nitin Srivastava's answer pointed me in the right direction. All I needed was the regexp_replace function.
So the proper query is:
Tool.where(
"lower(regexp_replace((name -> 'en'), '[^a-zA-Z0-9]+', '', 'g')) = ?",
value.downcase.gsub(/[^a-z0-9]/, "") # same character class as the SQL side; \W would keep "_"
)
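One detail on the Ruby side is worth checking: \W treats the underscore as a word character, so gsub(/\W/, "") leaves underscores in place, while the SQL class [^a-zA-Z0-9] removes them. The following is a minimal sketch of a helper that mirrors the SQL pattern; normalize is a hypothetical name, and the inputs are the examples from the question.

```ruby
# Hypothetical helper: downcase, then drop everything except a-z and 0-9,
# mirroring lower(regexp_replace(..., '[^a-zA-Z0-9]+', '', 'g')) in SQL.
def normalize(text)
  text.downcase.gsub(/[^a-z0-9]/, "")
end

normalize("WQXZ_123-GT")              # => "wqxz123gt"
normalize("WQXZ 123GT")               # => "wqxz123gt"
normalize("Three_WORDS_NAME 123D45")  # => "threewordsname123d45"
normalize("Three Words Name 123-D45") # => "threewordsname123d45"

# For comparison, the \s + \W version keeps the underscore:
with_w = "WQXZ_123-GT".downcase.gsub(/\s/, "").gsub(/\W/, "")
# => "wqxz_123gt"
```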
I ran into something interesting that I think would be related to operator precedence, but not sure if I'm just leaving something out. I would like to use a ternary statement in my .group_by sort on a DB query in Rails. So I have something like this that works:
@tools = Tool.all.group_by {|tool| tool.name}
#=> @tools {'anvil' => [<#tool....
which returns a hash of tool objects, grouped into keys where the name is the same. It was then brought up that the desired output would be to sort them into alphabetical groups by the first letter of the name, so:
@tools = Tool.all.group_by {|tool| tool.name.downcase[0] }
#=> @tools {'a' => [<#tool.....
So great, now I have a hash of the tools grouped by the first letter of their name. But what if a name starts with a number or something else? Not a problem, it really just pulls the first character and uses that for the group, so tool names starting with "1" get sorted into the hash member whose key is "1". Same for any non-number characters that aren't letters.
Here's the question: I can use a conditional statement to choose to sort all of my alphabetical names into letter groups, but put everything else into a single group with some key like "#". But I can't do it with a ternary statement:
@tools = Tool.all.group_by {|tool| if ('a'..'z').include? tool.name.downcase[0] then tool.name.downcase[0] else '#' end }
works great! I get all of my non-letter names sorted into the @tools['#'] part of the hash.
But this does not work:
@tools = Tool.all.group_by {|tool| ('a'..'z').include? tool.name.downcase[0] ? tool.name.downcase[0] : '#' }
It returns a hash with only two members: @tools[true] and @tools[false]. I can kind of see why, as a ternary operator is returning true or false, but shouldn't it act like the if-then-else statement? Is it something with the group_by that is jumping the gun?
Is there some way to tweak the syntax of the group_by statement to make the ternary operator work like I want it to? I have tried enclosing the two return statements in parens () but that didn't seem to work. I tried the entire ternary statement in parens hoping it would eval the whole thing before returning to the group_by function... any ideas?
This is being parsed by Ruby as
('a'..'z').include?(tool.name.downcase[0] ? tool.name.downcase[0] : tool.name = '#')
which is the same as ('a'..'z').include?(tool.name.downcase[0]), assuming none of the names are empty. For it to be equivalent to your previous version you'd need
('a'..'z').include?(tool.name.downcase[0]) ? tool.name.downcase[0] : tool.name = '#'
As an aside, actually changing the name with tool.name='#' sounds like a really bad idea to me. It might not matter here but could easily bite you later on.
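The difference between the two parses can be reproduced without a database. This is a minimal sketch using a plain array of strings in place of Tool records (the sample names are made up), and a literal '#' key rather than an assignment to the name:

```ruby
names = ["anvil", "Axe", "1 inch drill", "hammer"]

# How Ruby reads the unparenthesized call: the whole ternary is the
# argument to include?, so the block returns true or false.
misparsed = names.group_by do |name|
  first = name.downcase[0]
  ('a'..'z').include?(first ? first : '#')
end
# => {true=>["anvil", "Axe", "hammer"], false=>["1 inch drill"]}

# With the argument parenthesized, the ternary picks the group key.
grouped = names.group_by do |name|
  first = name.downcase[0]
  ('a'..'z').include?(first) ? first : '#'
end
# => {"a"=>["anvil", "Axe"], "#"=>["1 inch drill"], "h"=>["hammer"]}
```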
I have millions of arrays that each contain about five strings. I am trying to remove all of the "junk words" (for lack of a better description) from the arrays, such as all articles of speech, words like "to", "and", "or", "the", "a" and so on.
For example, one of my arrays has these six strings:
"14000"
"Things"
"to"
"Be"
"Happy"
"About"
I want to remove the "to" from the array.
One solution is to do:
excess_words = ["to","and","or","the","a"]
cleaned_array = dirty_array.reject {|term| excess_words.include? term}
But I am hoping to avoid manually typing every excess word. Does anyone know of a Rails function or helper that would help in this process? Or perhaps an array of "junk words" already written?
Dealing with stopwords is easy, but I'd suggest you do it BEFORE you split the string into the component words.
Building a fairly simple regular expression can make short work of the words:
STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i
clean_string = 'to into and sandbar or forest the thesis a algebra'.gsub(STOPWORDS, '')
# => " into  sandbar  forest  thesis  algebra"
clean_string.split
# => ["into", "sandbar", "forest", "thesis", "algebra"]
How do you handle them if you get them already split? I'd join(' ') the array to turn it back into a string, then run the above code, which returns the array again.
incoming_array = [
"14000",
"Things",
"to",
"Be",
"Happy",
"About",
]
STOPWORDS = /\b(?:#{ %w[to and or the a].join('|') })\b/i
# => /\b(?:to|and|or|the|a)\b/i
incoming_array = incoming_array.join(' ').gsub(STOPWORDS, '').split
# => ["14000", "Things", "Be", "Happy", "About"]
You could try to use Array's set operations, but you'll run afoul of the case sensitivity of the words, forcing you to iterate over the stopwords and the arrays which will run a LOT slower.
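For arrays that arrive already split, the case-sensitivity issue can also be handled by downcasing each word before a set lookup. This is a minimal sketch using the standard library's Set; STOPWORD_SET and remove_stopwords are illustrative names, the five-word list is just the one from the question, and no claim is made here about its speed relative to the regex approach.

```ruby
require "set"

# Illustrative stopword list; a real list would be much longer.
STOPWORD_SET = Set.new(%w[to and or the a])

# Hypothetical helper: drops any word whose downcased form is a stopword.
def remove_stopwords(words)
  words.reject { |word| STOPWORD_SET.include?(word.downcase) }
end

cleaned = remove_stopwords(["14000", "Things", "to", "Be", "Happy", "About"])
# => ["14000", "Things", "Be", "Happy", "About"]
```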
Take a look at these two answers for some added tips on how you can build very powerful patterns making it easy to match thousands of strings:
"How do I ignore file types in a web crawler?"
"Is there an efficient way to perform hundreds of text substitutions in Ruby?"
All you need is a list of English stopwords. You can find it here, or google for 'english stopwords list'
I'm trying to pull the user_id from a foursquare URL, like this one:
https://foursquare.com/user/99999999
The following regex pulls exactly what I need (a series of numbers that terminate with the end of the line):
\d+$
However, I'm not sure how to set a string equal to the matched characters. I'm aware of sub and gsub, but those methods substitute a matched string for something else.
I'm looking for a way to specifically pull the section of a string that matches my regex (if it exists)
I like to use the return value of match().
Anything wrapped in a capture group () in the regex gets assigned to the match result array:
"https://foursquare.com/user/99999999".match(/(\d+)\z/)[1] #=> "99999999"
>> "https://foursquare.com/user/99999999"[/(\d+)\z/, 1]
=> "99999999"
>> "https://foursquare.com/user/99999999" =~ /(\d+)\z/
=> 28
>> $1
=> "99999999"
>> "https://foursquare.com/user/99999999".split('/').last
=> "99999999"
There are many ways. I personally like String#[], though.
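Since the question asks for the match "if it exists", it may help to see what happens when there is none. A minimal sketch (foursquare_user_id is a hypothetical helper name): String#[] with a regex returns nil on no match, whereas match(...)[1] would raise NoMethodError because match itself returns nil.

```ruby
# Hypothetical helper: returns the trailing digits of the URL, or nil.
def foursquare_user_id(url)
  url[/\d+\z/]
end

foursquare_user_id("https://foursquare.com/user/99999999") # => "99999999"
foursquare_user_id("https://foursquare.com/user/")         # => nil
```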