How to implement a Pine Script preprocessor - parsing

The required steps for implementing a Pine Script preprocessor are documented here:
https://www.tradingview.com/pine-script-docs/en/v3/appendix/Pine_Script_v2_preprocessor.html
It is an 8 step process, quoting:
Algorithm of #version=2 Pine Script preprocessor in pseudo-code:
1) Remove comments.
2) Replace \r\n or \r with just \n.
3) Add \n to the end of the text if it’s missing.
4) Lines that contain only whitespace replace with just empty strings.
5) Add |INDENT| tokens. They indicate that statement is in a block of code, such as function body, if or for body. Every tab or four spaces are replaced with token |INDENT|.
6) Add |B| and |E| tokens. They indicate line begin and line end. Replace empty lines with |EMPTY| tokens.
7) Join lines that represent one splitted statement.
8) Add code block tokens (|BEGIN|: beginning of the block, |END|: end of the block, |PE|: possible end of the block).
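As a rough illustration of steps 2 to 6, here is a minimal Ruby sketch. It is not TradingView's implementation: the method name tokenize_lines is hypothetical, step 1 (comment removal) is skipped, and the token spellings are simply copied from the example below.

```ruby
# Sketch of steps 2-6 only; step 1 (removing comments) is not handled.
def tokenize_lines(src)
  text = src.gsub(/\r\n?/, "\n")            # step 2: \r\n or \r -> \n
  text += "\n" unless text.end_with?("\n")   # step 3: ensure a final \n
  # Split on \n keeping empty lines, then drop the "" after the final \n.
  text.split("\n", -1)[0...-1].map do |line|
    line = "" if line.strip.empty?           # step 4: whitespace-only -> ""
    next "|EMPTY|" if line.empty?            # step 6: empty line -> |EMPTY|
    indents = 0
    # step 5: each leading tab or group of four spaces becomes |INDENT|
    while (m = line.match(/\A(?:\t| {4})/))
      indents += 1
      line = line[m[0].length..-1]
    end
    "|B|" + "|INDENT|" * indents + line + "|E|" # step 6: |B| ... |E|
  end
end

src = "fun(x, y) =>\r\n    x + y\n\n     style=linebr"
puts tokenize_lines(src)
```

Note how the five-space line keeps one leftover space after its |INDENT|; that leftover is exactly what step 7 has to detect.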
Now, I find step 7 rather puzzling. We're building a preprocessor here, so nothing has been lexed or parsed yet. How, then, can we tell which lines represent "one splitted statement"?
As per their example:
After step 6):
"|EMPTY|
|B|study('Preprocessor example')|E|
|B|fun(x, y) =>|E|
|B||INDENT|if close > open |E|
|B||INDENT||INDENT|x + y |E|
|B||INDENT|else |E|
|B||INDENT||INDENT|x - y|E|
|EMPTY|
|EMPTY|
|B|a = sma(close, 10)|E|
|B|b = fun(a, 123)|E|
|B|c = security(tickerid, period, b)|E|
|B|plot(c, title='Out', color=c > c[1] ? lime : red, |E|
|B||INDENT| style=linebr, trackprice=true) |E|
|B|alertcondition(c > 100)|E|
|EMPTY|"
After step 7). Note that the line with plot(c, title= has been joined with the next line:
"|EMPTY|
|B|study('Preprocessor example')|E|
|B|fun(x, y) =>|E|
|B||INDENT|if close > open |E|
|B||INDENT||INDENT|x + y |E|
|B||INDENT|else |E|
|B||INDENT||INDENT|x - y|E|
|EMPTY|
|EMPTY|
|B|a = sma(close, 10)|E|
|B|b = fun(a, 123)|E|
|B|c = security(tickerid, period, b)|E|
|B|plot(c, title='Out', color=c > c[1] ? lime : red, style=linebr, trackprice=true) |E|
|EMPTY|
|B|alertcondition(c > 100)|E|
|EMPTY|"
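Note that not only are the lines joined, but the |E| at the end of the first line and the |B||INDENT| at the start of the second are removed, and an |EMPTY| token appears after the joined line.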
Suggestions, anybody?

According to the documentation, a continuation line starts with at least as many INDENTs (four spaces each) as the line being continued, plus a number of extra spaces that is not a multiple of four.
In other words, after converting groups of four leading spaces to INDENTs, if there are still leading spaces, it's a continuation line. So no tokenisation or parsing is necessary.
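The rule above can be applied directly to the token lines from step 6. Below is a minimal sketch, assuming that rule (leftover spaces after the |INDENT| tokens mark a continuation) and mimicking the docs' example: the |E| of the continued line is dropped, the |B|, |INDENT| tokens and leading spaces of the continuation are dropped, and an |EMPTY| is emitted in place of the absorbed line. join_continuations is a hypothetical name, and the check that the continuation has at least as many INDENTs as the line it continues is left out.

```ruby
# A token line whose |INDENT| tokens are followed by leftover spaces is
# treated as a continuation (assumed rule, per the answer above).
CONTINUATION = /\A\|B\|(?:\|INDENT\|)* +(.*)\z/

def join_continuations(lines)
  out = []
  target = nil # index in out of the statement being continued, if any
  lines.each do |line|
    m = CONTINUATION.match(line)
    if m && target
      # Drop |E| from the continued line; the match already excludes the
      # continuation's |B|, |INDENT|s and leftover spaces.
      out[target] = out[target].chomp("|E|") + m[1]
      out << "|EMPTY|" # stands in for the absorbed line, as in the docs
    else
      out << line
      target = line == "|EMPTY|" ? nil : out.length - 1
    end
  end
  out
end

lines = [
  "|B|plot(c, title='Out', color=c > c[1] ? lime : red, |E|",
  "|B||INDENT| style=linebr, trackprice=true) |E|",
  "|B|alertcondition(c > 100)|E|"
]
puts join_continuations(lines)
```

Run on the three lines from the docs' example, this reproduces the joined plot(...) line followed by |EMPTY| and the untouched alertcondition line.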

Related

Lua pattern help (Double parentheses)

I have been coding a program in Lua that automatically formats IRC logs from a roleplay. The roleplay logs follow a specific convention for "out of character" conversation: we wrap it in double parentheses. For example: ((<Things unrelated to roleplay go here>)). I have been trying to make my program remove the text between double parentheses, including the parentheses themselves. The code is:
ofile = io.open("Output.txt", "w")
rfile = io.open("Input.txt", "r")
p = rfile:read("*all")
w = string.gsub(p, "%(%(.*?%)%)", "")
ofile:write(w)
The pattern here is "%(%(.*?%)%)". I've tried multiple variations of the pattern, and all of them failed:
1. %(%(.*?%)%) --Wouldn't do anything.
2. %(%(.*%)%) --Would remove *everything* after the first OOC message.
Then my friend told me that prefixing the parentheses with percent signs wouldn't work, and that I had to use backslashes to 'escape' them.
3. \(\(.*\)\) --resulted in the output file being completely empty.
4. (\(\(.*\)\)) --Same result as above.
5. (\(\(.*?\)\) --for some reason, removed large parts of the text.
6. \(\(.*?\)\) --would just remove all the text except for the last line.
The short, absolute question:
What pattern would I need to use to remove all text between double parentheses, and remove the double parentheses themselves too?
Your friend is thinking of regular expressions. Lua patterns are similar, but different: % is the correct escape character.
Your pattern should be %(%(.-%)%). The - is similar to * in that it matches any number of the preceding sequence, but while * tries to match as many characters as it can (it's greedy), - matches the least amount of characters possible (it's non-greedy). It won't go overboard and match extra double-close-parenthesis.
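For comparison, the regex syntax your friend had in mind (Ruby's, in this sketch) does use backslash escapes and spells the non-greedy quantifier *? instead of -. The sample log line is made up. One more difference: Lua's . matches any character, newlines included, while Ruby's . only does so with the /m flag.

```ruby
log = "Hi ((brb)) back ((ok)) done"

# Greedy: .* runs to the LAST "))" in the string.
greedy = log.gsub(/\(\(.*\)\)/, "")

# Non-greedy: .*? stops at the first "))", like Lua's .- does.
lazy = log.gsub(/\(\(.*?\)\)/, "")

# Without /m, Ruby's . would not cross the "\n" and nothing would match.
multi = "a ((spans\ntwo lines)) b".gsub(/\(\(.*?\)\)/m, "")

p greedy # => "Hi  done"
p lazy   # => "Hi  back  done"
p multi  # => "a  b"
```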

Leading and trailing spaces for each line from textarea

ruby 2.1.3
rails 4.1.7
I want to generate an unordered list from a textarea, so I have to preserve all line breaks between items and remove leading and trailing spaces.
Well, I'm trying to remove all leading and trailing spaces from each line of the textarea, with no success.
I'm using a regex:
string_from_textarea.gsub(/^[ \t]+|[ \t]+$/, '')
I've also tried Ruby's strip and rstrip methods with no luck (they give the same result as the regex):
Leading spaces are removed from each line perfectly.
But for trailing spaces, only those at the very end of the string are removed, and I want them removed from every line.
What am I missing here? What is the deal with textarea and trailing spaces for each line?
UPDATE
Some code example:
I'm using a callback to save the formatted data.
after_validation :format_ingredients
def format_ingredients
  self.ingredients = ingredients.gsub(/^[ \t]+|[ \t]+$/, "")
end
Form view:
= f.text_area :ingredients, class: 'fieldW-600 striped', rows: '10'
You can use String#strip
' test text with multiple spaces '.strip
#=> "test text with multiple spaces"
To apply this to each line:
str = " test \ntext with multiple \nspaces "
str = str.lines.map(&:strip).join("\n")
#=> "test\ntext with multiple\nspaces"
This isn't a good use for a regexp. Instead use standard String processing methods.
If you have text that contains embedded LF ("\n") line-ends and spaces at the beginning and ends of the lines, then try this:
foo = "\n line 1 \n line 2\nline 3\n"
foo # => "\n line 1 \n line 2\nline 3\n"
Here's how to clean the lines of leading/trailing white-space and re-add the line-ends:
bar = foo.each_line.map(&:strip).join("\n")
bar # => "\nline 1\nline 2\nline 3"
If you're dealing with CRLF line-ends, as a Windows system would generate text:
foo = "\r\n line 1 \r\n line 2\r\nline 3\r\n"
bar = foo.each_line.map(&:strip).join("\r\n")
bar # => "\r\nline 1\r\nline 2\r\nline 3"
If you're dealing with the potential of having other forms of white-space, like non-breaking spaces, then switch to a regexp that uses the POSIX [[:space:]] bracket expression, which in Ruby also matches non-ASCII white-space. I'd do something like:
s.sub(/^[[:space:]]+/, '').sub(/[[:space:]]+$/, '')
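Ruby's String#strip only removes ASCII white-space and null bytes, so a non-breaking space (U+00A0) survives it. A minimal per-line sketch using [[:space:]] anchored with \A and \z, assuming the goal is one cleaned line per list item; strip_lines is a hypothetical helper name:

```ruby
# Hypothetical helper: trim each line, including Unicode white-space such as
# U+00A0 (no-break space), which String#strip leaves alone.
def strip_lines(text)
  lines = text.each_line.map do |line|
    line.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
  end
  lines.join("\n")
end

input = "\u00A0 flour \u00A0\r\n  2 eggs\n"
p input.each_line.map(&:strip) # the NBSPs survive String#strip
p strip_lines(input)           # => "flour\n2 eggs"
```

Because each line is trimmed as a whole, a trailing "\r\n" is removed along with the spaces, which also covers the Windows line-end case discussed below.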
I think #sin probably intimated the problem in his/her first comment. Your file was probably produced on a Windows machine, which puts a carriage return/line feed pair ("\r\n") at the end of each line other than (presumably) the last, where it writes just \n. (Check line[-2] on any line other than the last.) That would account for the result you are getting:
r = /^[ \t]+|[ \t]+$/
str = " testing 123 \r\n testing again \n"
str.gsub(r, '')
#=> "testing 123 \r\ntesting again\n"
If this theory is correct the fix should be just a slight tweak to your regex:
r = /^[ \t]+|[ \t\r]+$/
str.gsub(r, '')
#=> "testing 123\ntesting again\n"
Changing the global variable $/, the input record separator (a newline by default), will not change how ^ and $ match in a regex; it only affects methods such as each_line, gets and chomp. Setting it to "\r\n" and using those methods could work, although the last line, if it ends with only a newline, would then need separate handling.
I think you might be looking for String#lstrip and String#rstrip methods:
str = "  this is a line  \n  and so is this  \n  all of the lines have two spaces at the beginning  \n  and also at the end  "
new_string = ""
str.each_line do |line|
  new_string += line.rstrip.lstrip + "\n"
end
puts new_string
# this is a line
# and so is this
# all of the lines have two spaces at the beginning
# and also at the end
new_string
# => "this is a line\nand so is this\nall of the lines have two spaces at the beginning\nand also at the end\n"

Regular expression to remove only beginning and end html tags from string?

I would like to remove for example <div><p> and </p></div> from the string below. The regex should be able to remove an arbitrary number of tags from the beginning and end of the string.
<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>
I have been tinkering with rubular.com without success. Thanks!
def remove_html_end_tags(html_str)
html_str.match(/\<(.+)\>(?!\W*\<)(.+)\<\/\1\>/m)[2]
end
I'm not seeing the problem of \<(.+)> consuming multiple opening tags that Alan Moore pointed out below, which is odd because I agree it's incorrect. It should be changed to \<([^>\<]+)> or something similar to disambiguate.
def remove_html_end_tags(html_str)
html_str.match(/\<([^\>\<]+)\>(?!\W*?\<)(.+)\<\/\1\>/m)[2]
end
The idea is that you want to capture everything between the open/close of the first tag encountered that is not followed immediately by another tag, even with spaces between.
Since I wasn't sure how to say, with a positive lookahead, "give me the first tag whose closing angle bracket is followed by at least one word character before the next opening angle bracket", I used a negative lookahead instead:
\>(?!\W*\<)
That is, find the closing angle bracket that is not followed by only non-word characters up to the next opening angle bracket.
Once you've identified the tag with that property, find its closing mate and return the stuff between them.
Here's another approach: scan forward for tags, match each one with its closing tag, and return the match at index skip_count. It would blow up with nested tags of the same type, but I wouldn't take this approach for any real work.
def remove_first_n_html_tags(html_str, skip_count=0)
  matches = []
  tags = html_str.scan(/\<([\w\s\_\-\d\"\'\=]+)\>/).flatten
  tags.each do |tag|
    close_tag = "\/%s" % tag.split(/\s+/).first
    match_str = "<#{tag}>(.+)<#{close_tag}>"
    match = html_str.match(/#{match_str}/m)
    matches << match if match
  end
  matches[skip_count]
end
Still involves some programming:
str = '<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>'
while (m = /\A<.+?>/.match(str)) && str.end_with?('</' + m[0][1..-1])
  str = str[m[0].size..-(m[0].size + 2)]
end
Cthulhu you out there?
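The loop above compares the whole opening tag with the closing tag, so it stops at an opening tag that carries attributes, such as <div class="x">. Here is a sketch that compares only the tag name (strip_outer_tags is a hypothetical name). Like the loop above, it does not check that the outer tags actually pair with each other, so '<p>a</p><p>b</p>' would be mangled into 'a</p><p>b'.

```ruby
# Hypothetical helper: repeatedly remove an opening tag at the start and a
# closing tag with the same name at the end of the string.
def strip_outer_tags(str)
  loop do
    m = str.match(/\A\s*<([a-zA-Z][\w-]*)[^>]*>/)
    break unless m
    close = "</#{m[1]}>"
    trimmed = str.rstrip
    break unless trimmed.end_with?(close)
    str = trimmed[m[0].length...(trimmed.length - close.length)]
  end
  str
end

html = %Q(<div><p>text to <span class="test">test</span> the selection on.\nKibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>)
puts strip_outer_tags(html)
puts strip_outer_tags('<div class="a"><p>text <b>x</b></p></div>')
```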
I am going to go ahead and answer my own question. Below is the programmatic route:
The input string goes into the first loop as an array in order to remove the front tags. The resulting string is looped through in reverse order in order to remove the end tags. The string is then reversed in order to put it in the correct order.
def remove_html_end_tags(html_str)
  str_no_start_tag = ''
  str_no_start_and_end_tag = ''
  a = html_str.split("")
  i = 0
  is_text = false
  while i <= (a.length - 1)
    if (a[i] == '<') && !is_text
      while (a[i] != '>')
        i += 1
      end
      i += 1
    else
      is_text = true
      str_no_start_tag << a[i]
      i += 1
    end
  end
  a = str_no_start_tag.split("")
  i = a.length - 1
  is_text = false
  while i >= 0
    if (a[i] == '>') && !is_text
      while (a[i] != '<')
        i -= 1
      end
      i -= 1
    else
      is_text = true
      str_no_start_and_end_tag << a[i]
      i -= 1
    end
  end
  str_no_start_and_end_tag.reverse!
end
(?:\<div.*?\>\<p.*?\>)|(?:\<\/p\>\<\/div\>) is the expression you need. But this doesn't check for every scenario... if you are trying to parse any possible combination of tags, you may want to look at other ways to parse.
For example, this expression doesn't allow for any whitespace between the div and p tags. If you wanted to allow for that, you would add \s* in between the \>\< sections of the tag like so: (?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>).
As written, the expression expects the div and p tags to be lowercase. To also match Div or dIV, add the i (case-insensitive) flag to the regex.
Use gskinner's RegEx tool for testing and learning Regular Expressions.
So your final Ruby code should look something like this:
# Ruby sample for showing the use of regular expressions
str = "<div><p>text to <span class=\"test\">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>"
puts 'Before Regular Expression: "', str, '"'
str.gsub!(/(?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>)/, "")
puts 'After Regular Expression', str
system("pause")
EDIT: Replaced div*? to div.*? and replaced p*? to p.*? per suggestions in the comments.
EDIT: This answer doesn't allow for any set of tags, just the two listed in the first line of the question.

Remove hard line breaks from text with Ruby

I have some text with hard line breaks in it like this:
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
I want to remove the single newlines but keep the double newlines so it looks like this:
This should all be on one line since it's one sentence.

This is a new paragraph that should be separate.
Is there a single regular expression to do this? (or some easy way)
So far this is my only solution, which works but feels hackish:
txt = txt.gsub(/(\r\n|\n|\r)/,'[[[NEWLINE]]]')
txt = txt.gsub('[[[NEWLINE]]][[[NEWLINE]]]', "\n\n")
txt = txt.gsub('[[[NEWLINE]]]', " ")
Replace all newlines that are not followed by or preceded by a newline:
text = <<END
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
END
p text.gsub /(?<!\n)\n(?!\n)/, ' '
#=> "This should all be on one line since it's one sentence.\n\nThis is a new paragraph that should be separate. "
Or, for Ruby 1.8, which has no lookbehind:
txt.gsub! /([^\n])\n([^\n])/, '\1 \2'
text.gsub!(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')
The two (\S) groups serve the same purposes as the lookarounds ((?<!\s)(?<!^) and (?!\s)(?!$)) in #sln's regexes:
they confirm that the linefeed really is in the middle of a sentence, and
they ensure that the [^\S\n]*\n[^\S\n]* part consumes any other whitespace surrounding the linefeed, making it possible for us to normalize it to a single space.
They also make the regex easier to read, and (perhaps most importantly) they work in pre-1.9 versions of Ruby that don't support lookbehinds.
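On Ruby 1.9 or later, where lookbehind is available, the same idea can be written with zero-width lookarounds. Because they don't consume the characters on either side of the newline, consecutive short lines such as "a\nb\nc" all get joined, whereas the capture-group version above produces "a b\nc". A minimal sketch; unwrap is a hypothetical name, and the CRLF normalization step is an added assumption for Windows-style input.

```ruby
def unwrap(text)
  text = text.gsub(/\r\n?/, "\n") # assumption: normalize CRLF / CR first
  # A newline with non-whitespace on both sides (ignoring spaces and tabs
  # around it) becomes a single space; blank-line paragraph breaks survive.
  text.gsub(/(?<=\S)[^\S\n]*\n[^\S\n]*(?=\S)/, " ")
end

text = "This should all be on one line\nsince it's one sentence.\n\n" \
       "This is a new paragraph that\nshould be separate.\n"
p unwrap(text)
# => "This should all be on one line since it's one sentence.\n\nThis is a new paragraph that should be separate.\n"
p unwrap("a\nb\nc") # => "a b c"
```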
There is more to formatting (turning off word wrap) than you think.
If the output is the result of a formatting operation, then you should go by those rules to reverse-engineer the original.
For instance, the text you have there is
This should all be on one line
since it's one sentence.
This is a new paragraph that
should be separate.
If you removed just the single newlines only, it would look like this:
This should all be on one line since it's one sentence.
This is a new paragraph thatshould be separate.
Also, other formatting such as intentional newlines will be lost, so something like:
This is Chapter 1
Section a
Section b
Turns into
This is Chapter 1 Section a Section b
Finding the newline in question is easy, /(?<!\n)\n(?!\n)/, but what do you replace it with?
Edit: Actually, it's not that easy even to find standalone newlines, because visually they sit among (horizontal) whitespace that is hidden from view.
There are 4 ways to go.
Remove newline, keep the surrounding formatting
$text =~ s/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/$1$2/g;
Remove newline and formatting, substitute a space
$text =~ s/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/ /g;
Same as above but ignore newline at beginning or end of string
$text =~ s/(?<!\s)(?<!^)[^\S\n]*\n[^\S\n]*(?!$|\s)/ /g;
$text =~ s/(?<!\s)(?<!^)([^\S\n]*)\n([^\S\n]*)(?!$|\s)/$1$2/g;
Example breakdown of regex (this is the minimum required just to isolate a single newline):
(?<!\s) # Not a whitespace behind us (text,number,punct, etc..)
[^\S\n]* # 0 or more whitespaces, but no newlines
\n # a newline we want to remove
[^\S\n]* # 0 or more whitespaces, but no newlines
(?!\s)     # Not a whitespace in front of us (text,number,punct, etc..)
Well, there is this:
s.gsub /([^\n])\n([^\n])/, '\1 \2'
It won't do anything to leading or trailing newlines. If you don't need leading or trailing white space at all, then you will win with this variation:
s.gsub(/([^\n])\n([^\n])/, '\1 \2').strip
$ ruby -00 -pne 'BEGIN{$\="\n\n"};$_.gsub!(/\n+/,"\0")' file
This should all be on one line since it's one sentence.
This is a new paragraph thatshould be separate.

Interpret newlines as <br>s in markdown (Github Markdown-style) in Ruby

I'm using markdown for comments on my site and I want users to be able to create line breaks by pressing enter instead of space space enter (see this meta question for more details on this idea)
How can I do this in Ruby? You'd think GitHub Flavored Markdown would be exactly what I need, but (surprisingly), it's quite buggy.
Here's their implementation:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/^[\w\<][^\n]*\n+/) do |x|
  x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
This logic requires that the line start with a \w (or a <) for a line break at the end to create a <br>. The reason for this requirement is that you don't want to mess with lists (but see the edit below; I'm not even sure this makes sense):
* we don't want a <br>
* between these two list items
However, the logic breaks in these cases:
[some](http://google.com)
[links](http://google.com)
*this line is in italics*
another line
> the start of a blockquote!
another line
I.e., in all of these cases there should be a <br> at the end of the first line, and yet GFM doesn't add one.
Oddly, this works correctly in the JavaScript version of GFM.
Does anyone have a working implementation of "new lines to <br>s" in Ruby?
Edit: It gets even more confusing!
If you check out GitHub's official GitHub Flavored Markdown repository, you'll find yet another newline-to-<br> regex:
# in very clear cases, let newlines become <br /> tags
text.gsub!(/(\A|^$\n)(^\w[^\n]*\n)(^\w[^\n]*$)+/m) do |x|
  x.gsub(/^(.+)$/, "\\1  ")
end
I have no clue what this regex means, but it doesn't do any better on the above test cases.
Also, it doesn't look like the "don't mess with lists" justification for requiring that lines start with word characters is valid to begin with. I.e., standard markdown list semantics don't change regardless of whether you add 2 trailing spaces. Here:
item 1
item 2
item 3
In the source of this question there are 2 trailing spaces after "item 1", and yet if you look at the HTML, there is no superfluous <br>.
This leads me to think the best regex for converting newlines to <br>s is just:
text.gsub!(/^[^\n]+\n+/) do |x|
  x =~ /\n{2}/ ? x : (x.strip!; x << "  \n")
end
Thoughts?
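Wrapped as a method, the regex proposed above can be checked against the failing cases from the question. This is a minimal sketch: hard_breaks is a hypothetical name, it returns a new string instead of modifying text in place, and it uses rstrip rather than strip! so that leading indentation survives (a deviation from the code above). It does not special-case fenced or indented code blocks.

```ruby
# The question's regex, wrapped as a non-destructive method.
def hard_breaks(text)
  text.gsub(/^[^\n]+\n+/) do |chunk|
    # A chunk ending in two or more newlines is a paragraph break: leave it.
    # Otherwise end the line with two spaces, Markdown's hard line break.
    chunk =~ /\n{2}/ ? chunk : chunk.rstrip + "  \n"
  end
end

src = "[some](http://google.com)\n[links](http://google.com)\n\n> quote\nmore\n"
p hard_breaks(src)
```

The link line and the blockquote line both get the two trailing spaces, while the line directly before the blank line is left alone.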
I'm not sure if this will help, but I just use simple_format() from ActionView::Helpers::TextHelper:
my_text = "Here is some basic text...\n...with a line break."
simple_format(my_text)
output => "<p>Here is some basic text...\n<br />...with a line break.</p>"
Even if it doesn't meet your specs, looking at the .gsub! calls in the simple_format() source code might help you write your own version.
A little too late, but perhaps useful for other people. I've gotten it to work (though it's not thoroughly tested) by preprocessing the text with regular expressions, like so. It's hideous due to the lack of zero-width lookbehinds, but oh well.
# Append two spaces to a simple line, if it ends in newline, to render the
# markdown properly. Note: do not do this for lists, instead insert two newlines.
# Also, leave double newlines alone.
text.gsub! /^ ([\*\+\-]\s+|\d+\s+)? (.+?) (\ \ )? \r?\n (\r?\n|[\*\+\-]\s+|\d+\s+)? /xi do
  full, pre, line, spaces, post = $~.to_a
  if post != "\n" && pre.blank? && post.blank? && spaces.blank?
    "#{pre}#{line}  \n#{post}"
  elsif pre.present? || post.present?
    "#{pre}#{line}\n\n#{post}"
  else
    full
  end
end
