Ruby regex to convert uppercased words and keep titleized ones - ruby-on-rails

Given the string "Lorem IPSUM dolor Sit amet", the capital letters in "Lorem" and "Sit" should be kept, while fully uppercased words like "IPSUM" should be converted to "Ipsum".
How can I produce "Lorem Ipsum dolor Sit amet" from the given string using gsub?
Non-working example: s.gsub(/[[:upper:]]/){$&.downcase}

You may use capitalize with the /\b[[:upper:]]{2,}\b/ regex:
s = "Lorem IPSUM dolor Sit amet"
s.gsub(/\b[[:upper:]]{2,}\b/) { $&.capitalize }
# => "Lorem Ipsum dolor Sit amet"
See the online Ruby demo.
Note that the \b[[:upper:]]{2,}\b pattern matches only whole words (the \b are word boundaries) that consist of 2 or more uppercase letters. There seems to be no need to match single-letter words like I, which are already fine.
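To see what the word boundaries do in practice, here is a small sketch that wraps the same gsub call in a helper. The method name titleize_shouting and the second sample sentence are made up for illustration; only the pattern comes from the answer above.

```ruby
# Hypothetical helper (name is an assumption): capitalizes words made up
# entirely of 2+ uppercase letters and leaves everything else alone.
def titleize_shouting(text)
  text.gsub(/\b[[:upper:]]{2,}\b/) { |word| word.capitalize }
end

puts titleize_shouting("Lorem IPSUM dolor Sit amet")
puts titleize_shouting("I saw NASA's HTML5 docs")
```

In the second sentence, "I" is too short to match, "NASA" is followed by an apostrophe (a word boundary) and is converted, and "HTML5" is left untouched because the digit is a word character, so there is no boundary after the uppercase run.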

Related

How to Search for a few words with a character that changes its position in the cell?

I'm trying to figure out how to search and replace text containing a phrase, e.g. "This Is My Name!", that may also be preceded by an extra character, in my case the character "/".
So for example, I'd like to be able to use the search and replace functionality to match this sentence:
This Is My Name! - blah blah / abc 123 ipsum
As well as this sentence:
ipsum lorem $999 - 3 / This Is My Name! $55
Or this:
ipsum lorem $999 - 3 / This Is My Name! $55 / Ipsum Lorem - (34)
I'm assuming some form of regex?
Thank you.
Solution
Based on examples you have provided:
=ArrayFormula(REGEXREPLACE(A1:A3,"^This Is My Name!|/ This Is My Name!","SOMETHINGNEW"))
Some explanation:
The regex looks for either of two alternatives:
^This Is My Name! where the ^ anchor means the text must start with your string,
OR (represented by |)
/ This Is My Name! which is your string preceded by the extra character.
ArrayFormula is added to apply the formula down the whole A1:A3 range.
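The same alternation can be tried outside Sheets. Below is a rough Ruby sketch, not the Sheets formula itself: the helper name replace_name is hypothetical, and \A is used instead of ^ because in Ruby ^ matches at the start of every line rather than only at the start of the string.

```ruby
# Hypothetical helper (name is an assumption) mirroring the two alternatives:
# the phrase at the very start, or the phrase preceded by "/ ".
def replace_name(cell, replacement = "SOMETHINGNEW")
  pattern = %r{\AThis Is My Name!|/ This Is My Name!}
  cell.gsub(pattern, replacement)
end

cells = [
  'This Is My Name! - blah blah / abc 123 ipsum',
  'ipsum lorem $999 - 3 / This Is My Name! $55',
  'ipsum lorem $999 - 3 / This Is My Name! $55 / Ipsum Lorem - (34)'
]
cells.each { |cell| puts replace_name(cell) }
```

Note that the second alternative consumes the "/ " along with the phrase, so the slash disappears from the result.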

Twitter Button, share tweets

I would like to share a tweet like this:
some text Lorem ipsum #myhashtag Duis efficitur risus et augue tempus tristique.
So the format is: text #hashtag text.
In the HTML I wrote:
SHARE
But the resulting tweet is just:
Lorem ipsum #myhashtag
I have searched https://dev.twitter.com/web/tweet-button and YouTube video tutorials but still have no idea.
I hope someone can help me write a tweet in the format
text #hashtag text.
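You need to percent-encode the hash symbol as %23. An unencoded # starts the URL fragment, so whatever follows it is not sent as part of the text parameter.
Try this: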
https://twitter.com/intent/tweet?text=Text%20%23hashtag%20more%20text
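If the link is generated on the server, the encoding does not have to be done by hand. Here is a minimal Ruby sketch using ERB::Util.url_encode from the standard library; the helper name tweet_intent_url is an assumption for illustration.

```ruby
require "erb"

# Hypothetical helper (name is an assumption): builds a tweet intent link
# with the text percent-encoded, so "#" becomes %23 and spaces become %20.
def tweet_intent_url(text)
  "https://twitter.com/intent/tweet?text=#{ERB::Util.url_encode(text)}"
end

puts tweet_intent_url("Text #hashtag more text")
# prints https://twitter.com/intent/tweet?text=Text%20%23hashtag%20more%20text
```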

Ruby gem for text comparison

I am looking for a gem that can compare two strings (in this case paragraphs of text) and be able to gauge the likelihood that they are similar in content (with perhaps only a few words rearranged, changed). I believe that SO uses something similar when users submit questions.
I'd probably use something like Diff::LCS:
>> require "diff/lcs"
>> seq1 = "lorem ipsum dolor sit amet consequtor".split(" ")
>> seq2 = "lorem ipsum dolor amet sit consequtor".split(" ")
>> Diff::LCS.diff(seq1, seq2).length
=> 2
It uses the longest common subsequence algorithm (the method for using LCS to get a diff is described on the wiki page).
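To turn the LCS idea into a single similarity number without the gem, here is a rough pure-Ruby sketch. It is not how Diff::LCS works internally; the method names lcs_length and word_similarity and the scoring formula (twice the LCS length divided by the total word count) are assumptions chosen for illustration.

```ruby
# Length of the longest common subsequence of two arrays,
# computed with the classic row-by-row dynamic programming table.
def lcs_length(a, b)
  prev = Array.new(b.length + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Hypothetical score in 0.0..1.0: the share of words that belong to the LCS.
def word_similarity(text1, text2)
  w1 = text1.split
  w2 = text2.split
  return 1.0 if w1.empty? && w2.empty?
  2.0 * lcs_length(w1, w2) / (w1.length + w2.length)
end

puts word_similarity("lorem ipsum dolor sit amet consequtor",
                     "lorem ipsum dolor amet sit consequtor")
```

For the two sequences above, five of the six words are in the common subsequence, so the score is 10/12. Where to draw the line between "similar" and "different" is a judgment call that depends on the texts.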

Looking for ideas on how to match a pattern, Possible or not?

I'm looking for assistance creating a pattern match to ingest emails. The end goal is to receive an incoming message and extract just the reply message, not all the trailing junk (previous threads, signature, date-stamp header, etc.).
Here are the sample formats:
Format 1:
The Message is here, etc etc can span a random # of lines
On Nov 17, 2010, at 4:18 PM, Person Name wrote:
lots of junk down here which we don't want
Format 2:
The Message is here, etc etc can span a random # of lines
On Nov 17, 2010, at 4:18 PM, Site <yadaaaa+adad#sitename.com> wrote:
lots of junk down here which we don't want
Format 3:
The Message is here, etc etc can span a random # of lines
On Fri, Nov 19, 2010 at 1:57 AM, <customerserviceonline#pge.com> wrote:
lots of junk down here which we don't want
For the examples above, I'd like to create a pattern match that finds the first instance of the "On ... wrote:" line and then returns only what's above that line. I don't want the delimiter line itself.
I can't match on the date stamp, but I can match on everything after the comma as that's in my control.
So the idea is to look for either of these two static items:
, Site <yadaaaa+adad#sitename.com> wrote:
, Person Name wrote:
And then take everything above that position. What do you think? Is this possible?
I would suggest a different approach: why not read everything line by line and stop when you reach the line that serves as your delimiter?
Well this would be a regexp solution :
/(On (?:(?:Sun|Mon|Tues|Wed|Thurs|Fri|Sat), |)(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}(?:|,) at \d{1,2}:\d{1,2} (?:AM|PM), (?:(?:Site |)<[\w.%+-]+#[\w.-]+\.[A-Za-z]{2,4}>|Person \w+) wrote:)/
You just provided one example, so this might not be perfect, but it should do the job quite well.
Then you can get the first captured group with $1, or with [0] on the result of match (the group wraps the whole pattern, so the full match is the same text) :)
regex = /(On (?:(?:Sun|Mon|Tues|Wed|Thurs|Fri|Sat), |)(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{1,2}, [12]\d{3}(?:|,) at \d{1,2}:\d{1,2} (?:AM|PM), (?:(?:Site |)<[\w.%+-]+#[\w.-]+\.[A-Za-z]{2,4}>|Person \w+) wrote:)/
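if str =~ regex
  # $1 holds the first capture group, here the whole delimiter line
  puts "S1 : #{$1}"
end
if res = str.match(regex)
  # res[0] is the entire match; it equals res[1] because the group wraps the whole pattern
  puts "S2 : #{res[0]}"
end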
Btw, you can use the option /i on the regex.
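The regex above captures the delimiter line itself, while the question asks for the text above it. One way to get that text is String#partition, sketched below. The helper name reply_body and the deliberately simplified delimiter pattern (any line that starts with "On " and ends with "wrote:") are assumptions for illustration; the stricter regex above could be passed to partition in the same way.

```ruby
# Hypothetical helper (name is an assumption): returns everything before the
# first "On ... wrote:" line, or the whole message if no such line exists.
def reply_body(message)
  delimiter = /^On .+ wrote:$/ # simplified pattern, an assumption
  above, _delimiter_line, _rest = message.partition(delimiter)
  above.rstrip
end

msg = "The Message is here\ncan span several lines\n" \
      "On Nov 17, 2010, at 4:18 PM, Person Name wrote:\n" \
      "lots of junk down here which we don't want\n"
puts reply_body(msg)
```

A pattern this loose will also cut at any ordinary sentence that happens to begin with "On" and end with "wrote:", so it trades precision for brevity.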
This is not a good use for regex if you're trying to do it all in one pattern. It's possible to do, but I suspect the universe will cool before you work all the bugs out.
To understand the scope of what you are trying to do, read Wikipedia's article on "Posting Style". There are a lot of different ways replies are embedded into an email message, partly controlled by the MUA (mail user agent) and partly by the person doing the reply. There isn't a set method of doing the attribution, and no rule saying that the reply is in one block on the page, or that it is at the top of the page. This means that any code you write will have to be very sophisticated in order to have a chance of working consistently.
Have you looked at Mail? It's already written, it's well tested, it's got all sorts of cool bells and whistles, and it's already written. (I said it again because reinventing wheels that work well can be really painful.)
Parsing plain text email is one task. Then there is MIME-encoded email, with different content types. Then there is "HTML" email that doesn't have MIME blocks, but instead some moron just figured everyone liked HTML formatting and blinking text. Then there's various weirdly broken types of message bodies with four reply quoting types and the full content of all the previous messages appended one below the next, and the signatures of the horribly frustrated wanna-be writers who include the whole text of my favorite book "Girl to Grab", AKA Vol. 5 of Encyclopedia Britannica. Mail can help break out all the garbage for you, giving you a good shot at the content you need.
To grab a range of text in a body, look at Ruby's .. (AKA "flip-flop") operator. Used in a condition, it evaluates to true from the point where the first test succeeds until the second test succeeds, and false otherwise. See "When would a Ruby flip-flop be useful?"
Typically you'd build it like:
if ((string =~ /pattern1/) .. (string =~ /pattern2/))
...
end
As processing occurs, if the first test matches something then subsequent loops will fall into the if block. When the ending test is found the block will be turned off for subsequent loops. In this case you'd want to use either a string literal, or a small regex to locate your starting and ending lines. If you have a chance of seeing the starting pattern in later text then you'll have to figure out how to trap that.
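A small self-contained sketch of that behavior follows. The BEGIN and END marker lines and the variable names are invented for illustration; they are not part of the email formats in the question.

```ruby
lines = ["header", "BEGIN", "keep 1", "keep 2", "END", "footer"]

selected = []
lines.each do |line|
  # The flip-flop turns on when the first test matches and stays on
  # through the line where the second test matches (both ends included).
  selected << line if (line =~ /\ABEGIN\z/)..(line =~ /\AEND\z/)
end

puts selected
```

The marker lines themselves end up in the result, so drop the first and last elements if only the content between them is wanted.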
For instance, here's a way to grab some content that appears to meet your stated requirements if someone does a top-reply:
msg = <<EOT
The Message is here, etc etc can span a random # of lines
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
On Nov 17, 2010, at 4:18 PM, Person Name wrote:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
EOT
body = []
msg.lines.each do |li|
  li.chomp!
  body << li
  break if (li =~ /^On (\S+ )*\w+ \d+, \d+, at [\d:]+ \w+, .+ wrote:/i)
end
puts body[0 .. -2]
puts '=' * 40
msg = <<EOT
The Message is here, etc etc can span a random # of lines
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
On Nov 17, 2010, at 4:18 PM, Site <yadaaaa+adad#sitename.com> wrote:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
EOT
body = []
msg.lines.each do |li|
  li.chomp!
  body << li
  break if (li =~ /^On (\S+ )*\w+ \d+, \d+, at [\d:]+ \w+, .+ wrote:/i)
end
puts body[0 .. -2]
And here is the output:
# >> The Message is here, etc etc can span a random # of lines
# >> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
# >>
# >> ========================================
# >> The Message is here, etc etc can span a random # of lines
# >> Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
# >>
The pattern could be simpler, but if it were, it would increase the chance of returning false positives.

Latex listings package ignores last blank line in listing

I use LaTeX listings package with \lstinputlisting to display text from an external file. The file contains a data format description with a blank line at the end. The package ignores the blank line. How can I show the blank line in a listing?
What it displays:
1 lorem ipsum...
2 more lorem ipsum
3 lorem lorem ipsum
What I want:
1 lorem ipsum
2 more lorem ipsum
3 lorem lorem ipsum
4
See the documentation, section 4.4
showlines=(true|false) or showlines (default = false)
If true, the package prints empty lines at the end of listings. Otherwise these lines are dropped (but they count for line numbering).
Try adding this before your listing:
\lstset{
showlines=true
}
You can escape to LaTeX from within listings by assigning an escape character like so:
\lstset{numbers=left, stepnumber=1, frame=none,basicstyle = \ttfamily}
\begin{lstlisting}[escapechar=\%]
codeline1
codeline2
%
\end{lstlisting}
Comes out as:
1 codeline1
2 codeline2
3
I know it's not \lstinputlisting but hopefully it'll help you anyway.
