Regex markdown string for key-value pairs - ruby-on-rails

My Rails app is retrieving data from a third-party service that doesn't allow me to attach arbitrary data to records, but does have a description area that supports Markdown. I am attempting to pass data for each record over to my Rails app within the description content, via Markdown comments:
[//]: # (POST:28|USERS:102,78,90)
... additional Markdown content.
I found the [//]: # (...) syntax in this answer for embedding comments in Markdown, and my idea was to then pass pipe-separated key-value pairs in as the comment content.
Using my example above, I would like to be able to parse the description content string to interpret the key-value pairs. In this case, POST=28 and USERS=102,78,90. If it helps, this comment will always appear at the first line of the Markdown content.
I imagine Regex is the way to go here? I would really appreciate any help!

You can use \G:
(?:^\[//\]:[^(]+\(\K # match your token [//]: at the beginning
|
\G(?!\A)\| # or right after the previous match
)
(\w+):([\w,]+) # capture word chars (=key)
# followed by :
# followed by word chars and comma (=val)
See a demo on regex101.com.
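Here is a minimal Ruby sketch of the \G idea above. It assumes the comment is on the very first line, so it anchors with \A instead of ^. It relies on the documented behaviour of String#scan, where \G matches where the previous match ended. Because scan returns only the capture groups, \K is not needed here. The constant name and sample string are illustrative.

```ruby
# In String#scan, \G anchors each attempt where the previous match ended,
# so only pairs that directly follow the comment token (or the previous
# pair) are picked up.
PAIR_REGEX = %r{
  (?:
    \A\[//\]:[^(]+\(   # comment token at the very start of the string
    |
    \G(?!\A)\|         # or a pipe right after the previous pair
  )
  (\w+):([\w,]+)       # key, a colon, then a value of word chars and commas
}x

description = "[//]: # (POST:28|USERS:102,78,90)\n... additional Markdown content."

pairs = description.scan(PAIR_REGEX).to_h
users = pairs["USERS"].split(",").map(&:to_i)
p pairs
p users
```

Values come back as strings, so numeric lists like USERS still need splitting and converting.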

You'll need 2 steps to properly parse this:
First find the comment: ^\[\/\/\]: # \(([^)]*)\)
This captures the comment's content.
Then parse the content: (\w+):([^|]+) (with the global flag, i.e. String#scan in Ruby)
This captures the key and value separately.
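The two steps could look roughly like this in Ruby. This is a sketch, not a tested library. The helper name parse_data_comment is hypothetical. The comment regex is anchored with \A rather than ^ because the question says the comment is always on the first line.

```ruby
# Step 1: capture the comment body on the first line.
COMMENT_REGEX = %r{\A\[//\]: # \(([^)]*)\)}
# Step 2: capture each key and value inside that body.
KEY_VALUE_REGEX = /(\w+):([^|]+)/

# Hypothetical helper: returns {} when the description has no data comment.
def parse_data_comment(markdown)
  body = markdown[COMMENT_REGEX, 1]
  return {} unless body

  body.scan(KEY_VALUE_REGEX).to_h
end

data = parse_data_comment("[//]: # (POST:28|USERS:102,78,90)\n... additional Markdown content.")
post_id  = data["POST"].to_i
user_ids = data["USERS"].split(",").map(&:to_i)
```

Keeping the two regexes separate makes each one simple, and a missing comment yields an empty hash instead of an exception.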

As I mentioned in my comment above, you could simplify things a lot by using a standard data serialization format like JSON after the #. For example:
require "json"
MATCH_DATA_COMMENT = %r{(?<=^\[//\]: # ).*}
markdown = <<END
[//]: # {"POST":28,"USERS":[102,78,90]}
... additional Markdown content.
END
p JSON.parse(markdown[MATCH_DATA_COMMENT])
# => { "POST" => 28, "USERS" => [102, 78, 90] }
The regular expression %r{(?<=^\[//\]: # ).*} uses a positive lookbehind to match everything on the line that follows "[//]: # ".
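One caveat with the snippet above: when a description has no data comment, markdown[MATCH_DATA_COMMENT] returns nil and JSON.parse(nil) raises TypeError. Below is a defensive sketch. It swaps the lookbehind for a capture group anchored at \A, which is a choice made here rather than part of the answer. The helper name embedded_data is hypothetical.

```ruby
require "json"

# Capture everything after "[//]: # " on the first line only.
DATA_COMMENT = %r{\A\[//\]: # (.*)$}

# Hypothetical helper: returns the embedded hash, or {} when the first line
# has no data comment or its content is not a valid JSON object.
def embedded_data(markdown)
  json = markdown[DATA_COMMENT, 1]
  return {} unless json

  data = JSON.parse(json)
  data.is_a?(Hash) ? data : {}
rescue JSON::ParserError
  {}
end

markdown = "[//]: # {\"POST\":28,\"USERS\":[102,78,90]}\n... additional Markdown content."
p embedded_data(markdown)
```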

Related

Prevent Ruby from changing & to &amp;?

I need to display some superscript and subscript characters in my webpage title. I have a helper method that recognizes the pattern for a subscript or superscript and converts it to &sub2; or &sup2;.
However, in the rendered page's source code it shows up as:
&sub2;
Which is not right. I have it set up to be:
<% provide(:title, raw(format_title(#hash[:page_title]))) %>
But the raw is not working. Any help is appreciated.
Method:
def format_title(name)
  label = name
  if label.match(/(_[\d]+_)+|(~[\d]+~)+/)
    label = label.gsub(/(_([\d]+)_)+/, '&sub\2;')
    label = label.gsub(/(~([\d]+)~)+/, '&sup\2;')
    label.html_safe
  else
    name
  end
end
I have even tried:
str.gsub(/&/, '&')
but it gives back:
&amp;sub2;
You can also achieve this with Rails I18n.
<%= t(:page_title_html, scope: [:title]) %>
And in your respective locale file (most probably title.en.yml), note that the key must end in _html to match the view:
en:
  title:
    page_title_html: "Title with &sup2;"
Here is a chart of HTML entities for subscripts and superscripts.
For more information check Preventing HTML character entities in locale files from getting munged by Rails3 xss protection
Update:
In case you need to load the page titles dynamically, first, you'll have to install a gem like Page Title Helper.
You can follow the guide in the gem documentation.
There are two issues with your example: one is substantive, and the other is just a coincidence.
The first issue is that you are trying to use character entities that do not exist. There are only &sup1;, &sup2; and &sup3;, which produce the superscript digits 1, 2 and 3 respectively. There is no &sup4; entity, nor one for any other superscript digit. There are, though, bare code points for the other digits (U+2070 and U+2074 through U+2079) that you can use, but this requires more involved code.
More importantly, there are no subscript character entities at all in the HTML5 named character reference list. All subscript digits are bare code points (U+2080 through U+2089). Therefore it makes no sense to replace anything with &sub2; or any other "subscript" entity.
The reason you didn't see your example working is the test string you chose. Input with underscores, like _2_mystring, is correctly replaced with &sub2;, but since the &sub2; character entity does not exist, the string appears as is, creating the impression that the raw method somehow doesn't work.
Try ~2~mystring instead: it will be replaced with the superscript character entity &sup2; and rendered correctly. This shows that your code is correct, but the basic assumption about character entities is not.
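One way to use the bare code points mentioned above is to emit the Unicode digits directly. This avoids entities altogether, so raw or html_safe is no longer needed for these characters. The following is a sketch under that assumption. format_title_unicode is a hypothetical name, and it handles digits only.

```ruby
# Unicode digit tables: position i holds the sub/superscript form of digit i.
SUBSCRIPT_DIGITS   = "\u2080\u2081\u2082\u2083\u2084\u2085\u2086\u2087\u2088\u2089"
SUPERSCRIPT_DIGITS = "\u2070\u00B9\u00B2\u00B3\u2074\u2075\u2076\u2077\u2078\u2079"

# Hypothetical variant of format_title: _N_ becomes subscript digits and
# ~N~ becomes superscript digits, as plain Unicode characters.
def format_title_unicode(name)
  name
    .gsub(/_(\d+)_/) { $1.tr("0-9", SUBSCRIPT_DIGITS) }
    .gsub(/~(\d+)~/) { $1.tr("0-9", SUPERSCRIPT_DIGITS) }
end

puts format_title_unicode("H_2_O and E = mc~2~") # prints "H₂O and E = mc²"
```

String#tr maps each digit in the range 0-9 to the character at the same position in the table, so multi-digit numbers such as ~10~ work too.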

How to find and save a string section in ruby on rails 4?

I need to find the section of a string that includes the substring "print time:" and save the time shown after the colon to the database.
What I've used so far is the downcase method and the include? method to start the search, but I'm not sure how to actually search inside the string.
How can I find the section of the string so that I can save it afterwards?
Use regular expressions, which in Ruby can be written with the /…/ syntax and matched against using String#match.
log = "username: jsmith\nprint time: 08:02:41\npower level: 9001"
print_time = log.match(/print time:\s*([^\n]+?)\s*\n/)[1]
p print_time # => "08:02:41"
The regex /print time:\s*([^\n]+?)\s*\n/ matches any text after “print time:”, on the same line, ignoring surrounding whitespace \s*, and saves it as the first capture group using the (). Then [1] selects the contents of that first capture group.
After extracting the print_time string, you can do whatever you need to with it. For example, if you had a Rails model PrintTime, you might save the extracted time to the database with PrintTime.create(extracted_time: print_time).
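Two things can bite with the one-liner above. First, log.match(...)[1] raises NoMethodError when there is no "print time:" line, because match returns nil. Second, the regex needs a trailing newline, so it misses the time when that line is last. The sketch below is one way around both. It is case-insensitive because the question mentions downcase, and it uses [ \t] instead of \s so it never crosses a line break. extract_print_time is a hypothetical name.

```ruby
# Hypothetical helper: text after "print time:" up to the end of that line,
# with surrounding spaces/tabs trimmed. Returns nil when there is no such line.
def extract_print_time(log)
  log[/print time:[ \t]*(.+?)[ \t]*$/i, 1]
end

log = "username: jsmith\nprint time: 08:02:41\npower level: 9001"
extract_print_time(log) # => "08:02:41"
```

String#[] with a regex and a group index returns nil instead of raising, so the caller can simply skip saving when the result is nil.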

How to format a text area sent as an array via JSON?

I'm pretty new to Ruby, and I'm using it to work with an API. Text areas sent over the API are converted to the format below before being sent to me via a JSON POST request:
"Comment": [
"hdfdhgdfgdfg\r",
"This is just a test\r",
"Thanks!\r",
"- Kyle"
]
And I'm getting the value like this:
comments = params["Comment"]
So each line is broken down into what looks like an array. My issue is that it behaves like one big string instead of an array with 4 values. I tried using comments[0] and just printing comments, but both return the same result; it just displays everything as a string, i.e.:
["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]
But I need to display it as it appears in the text area, i.e.:
hdfdhgdfgdfg
This is just a test
Thanks!
- Kyle
I know I could just strip out all the extra characters, but I feel like there has to be a better way. Is there a good way to convert this back to the original format of a text area, or at least to convert it to an array so I can loop through each item and re-format it?
First, get rid of those ugly \rs:
comments.map!(&:chomp)
Then, join the lines together:
comment = comments.join("\n") # using a newline
# OR, for HTML output:
comment = comments.join('<br>')
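If the '<br>'-joined string is output unescaped (for example with raw), anything the user typed into the text area is rendered as HTML. Below is a plain-Ruby sketch that escapes each line before joining. comment_to_html is a hypothetical name. In a Rails view, safe_join does the escaping for you.

```ruby
require "cgi"

# Hypothetical helper: strips the trailing \r from each line, HTML-escapes it,
# and joins the lines with <br>.
def comment_to_html(lines)
  lines.map { |line| CGI.escapeHTML(line.chomp) }.join("<br>")
end

comments = ["hdfdhgdfgdfg\r", "This is just a test\r", "Thanks!\r", "- Kyle"]
puts comment_to_html(comments)
```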
You should be able to parse the JSON and populate a hash with all of the values:
require 'json'
hash = JSON.parse(params["Comment"])
puts hash
=> {"Comment"=>['all', 'of', 'your', 'values']}
This should work for all valid JSON. One of the rules of JSON syntax is that
Data is in name/value pairs
The JSON you provided doesn't supply names for the values, so this method might not work. If that is the case, parsing the raw string and extracting the values would also do the job (although it is messier).
How you might go about doing that:
json = params["Comment"]
newArray = []
json.split(" ").each do |element|
  if element.length > 1
    newArray << element
  end
end
This would at least give you an array with all of your values.

Nokogiri- Parsing HTML <a href> and displaying only part of the URL

So basically I am scraping a website, and I want to display only part of the address. For instance, if it is www.yadaya.com/nyc/sales/manhattan, I only want to put "sales" in a hash or an array.
{
:listing_class => listings.css('a').text
}
That will give me the whole URL. Would I want to gsub to get the partial output?
Thanks!
When you are dealing with URLs, you should start with URI, then, to mess with the path, switch to using File.dirname and/or File.basename:
require 'uri'
uri = URI.parse('http://www.yadaya.com/nyc/sales/manhattan')
dir = File.dirname(uri.path).split('/').last
which sets dir to "sales".
No regex is needed, except what parse and split do internally.
Using that in your code's context:
File.dirname(URI.parse(listings.css('a').text).path).split('/').last
but, personally, I'd break that into two lines for clarity and readability, which translate into easier maintenance.
A warning though:
listings.css('a')
returns a NodeSet, which is akin to an Array. If the DOM you are searching has multiple <a> tags, you will get more than one Node being passed to text, which will then be concatenated into the text you are treating as a URL. That's a bug in waiting:
require 'nokogiri'
html = '<div><a>foo</a><a>bar</a></div>'
doc = Nokogiri::HTML(html)
doc.at('div').css('a').text
Which results in:
"foobar"
Instead, your code needs to be:
listings.at('a')
or
listings.at_css('a')
so only one node is returned. In the context of my sample code:
doc.at('div').at('a').text
# => "foo"
Even if the code that sets up listings only results in a single <a> node being visible, use at or at_css for correctness.
Since you have the full URL using listings.css('a').text, you could parse out a section of the path using a combination of the URI class and a regular expression, using something like the following:
require 'uri'
uri = URI.parse(listings.css('a').text)
=> #<URI::HTTP:0x007f91a39255b8 URL:http://www.yadaya.com/nyc/sales/manhattan>
match = %r{^/nyc/([^/]+)/}.match(uri.path)
=> #<MatchData "/nyc/sales/" 1:"sales">
match[1]
=> "sales"
You may need to tweak the regular expression to meet your needs, but that's the gist of it.
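If you want to avoid both File.dirname and a hard-coded /nyc/ prefix, here is a small sketch that picks a path segment by position. It takes a URL string. In practice the URL usually lives in the link's href attribute, so you might pass listings.at_css('a')['href'] rather than the node's text. path_segment is a hypothetical name, and the sketch assumes the URL includes a scheme (http://...); otherwise URI treats the host name as part of the path.

```ruby
require "uri"

# Hypothetical helper: returns the index-th (0-based) non-empty segment of the
# URL's path, or nil if the path has fewer segments.
def path_segment(url, index)
  URI.parse(url).path.split("/").reject(&:empty?)[index]
end

path_segment("http://www.yadaya.com/nyc/sales/manhattan", 1) # => "sales"
```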

I am creating a Twitter clone in Ruby On Rails, how do I code it so that the '#...''s in the 'tweets' turn into links?

I am somewhat of a Rails newbie, so bear with me; I have most of the application figured out except for this one part.
def linkup_mentions_and_hashtags(text)
  text.gsub!(/@([\w]+)(\W)?/, '@\1\2')
  text.gsub!(/#([\w]+)(\W)?/, '#\1\2')
  text
end
I found this example here: http://github.com/jnunemaker/twitter-app
The link to the helper method: http://github.com/jnunemaker/twitter-app/blob/master/app/helpers/statuses_helper.rb
Perhaps you could use Regular Expressions to look for "#..." and then replace the matches with the corresponding link?
You could use a regular expression to search for #sometext{whitespace_or_endofstring}
You can use regular expressions. I don't know Ruby, but the code should be almost exactly like my example:
Regex.Replace("this is an example @AlbertEin",
              "(?<type>[@#])(?<nick>\\w{1,}[^ ])",
              "<a href=\"http://twitter.com/${nick}\">${type}${nick}</a>");
If you run it on .NET, this example returns
this is an example <a href="http://twitter.com/AlbertEin">@AlbertEin</a>
The regex (?<type>[@#])(?<nick>\\w{1,}[^ ]) captures the leading @ or # into the group named type, and then captures into the group named nick the text that follows: at least one word character plus one more non-space character.
Perhaps you can use a regular expression to parse out the words starting with #, then update the string at that location with the proper link.
This regular expression will give you words starting with # symbols, but you might have to tweak it:
\#[\S]+\
You would use a regular expression to search for #username and then turn that to the corresponding link.
I use the following for the @ in PHP (note that the /e modifier was removed in PHP 7.0; use preg_replace_callback there):
$ret = preg_replace("#(^|[\n ])@([^ \"\t\n\r<]*)#ise",
                    "'\\1<a href=\"http://www.twitter.com/\\2\" >@\\2</a>'",
                    $ret);
I've also been working on this. I'm not sure it's 100% perfect, but it seems to work:
def auto_link_twitter(txt, options = {:target => "_blank"})
  txt.scan(/(^|\W|\s+)(@|#)(\w{1,25})/).each do |match|
    if match[1] == "#"
      txt.gsub!(/##{match.last}/, link_to("##{match.last}", "http://twitter.com/search/?q=##{match.last}", options))
    elsif match[1] == "@"
      txt.gsub!(/@#{match.last}/, link_to("@#{match.last}", "http://twitter.com/#{match.last}", options))
    end
  end
  txt
end
I pieced it together with some Google searching and some reading up on String#scan in the API docs.
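A variation on the approaches above is to do a single gsub with a block instead of scan followed by repeated gsub! calls. The repeated gsub! calls can misfire when one name is a prefix of another, since /@bob/ also matches inside "@bobby". The sketch below also HTML-escapes the tweet first. It is a hedged sketch, not the twitter-app helper. The /users/ and /tags/ paths are placeholder routes, and link_mentions_and_hashtags is a hypothetical name.

```ruby
require "cgi"

# @ or # at the start of the text or after a non-word character, followed by
# up to 25 word characters. Excluding & skips the &#39; entity that
# CGI.escapeHTML produces for apostrophes.
TOKEN_REGEX = /(?<![\w&])([@#])(\w{1,25})/

# Hypothetical helper: escapes the tweet, then links every mention and
# hashtag in one pass. The /users/ and /tags/ paths are placeholders.
def link_mentions_and_hashtags(text)
  CGI.escapeHTML(text).gsub(TOKEN_REGEX) do
    sigil, name = $1, $2
    href = sigil == "@" ? "/users/#{name}" : "/tags/#{name}"
    %(<a href="#{href}">#{sigil}#{name}</a>)
  end
end

puts link_mentions_and_hashtags("Hi @bob, see #rails!")
```

Because the lookbehind rejects a preceding word character, email addresses such as a@b.com are left alone. The result is already escaped, so in a Rails view it can be marked html_safe.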
