Porting POSIX regex to Lua pattern - unexpected results


I'm having a hard time porting a POSIX regex to Lua string patterns.
I'm dealing with an HTML response from which I would like to filter the checkboxes
that are checked. In particular, I'm interested in the value and name fields of
each checked checkbox.
Here are examples of checkboxes I'm interested in:
<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">
<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">
By contrast, I'm not interested in this one (an unchecked checkbox):
<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">
Using POSIX regex, I used the following pattern in Python: pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.
My first approach in Lua was simply to use this: pattern = 'name="(.-)" value="(.-)" checked="checked"'
but it gave strange results (the first capture was as expected, but the
second one returned lots of unneeded HTML).
I've also tried the following pattern:
pattern = 'name="(%d?%[.-%])" value="(.-)"%s?(c?).-="?c.-"%s?type="checkbox"'
This time the second capture returned the content of value, but all
checkboxes were matched (not only those with a checked="checked" field).
For completeness, here's the Lua code (snippet from my Nmap NSE script) that
attempts to do this pattern matching:
pattern = 'name="(.-)" value="(.-)" checked="checked"'
data = {}
for name, value in string.gmatch(res.body, pattern) do
  stdnse.debug(1, string.format("%s %s", name, value))
end

I've used following pattern in Python: pattern=r'name="(.*)" value="(.*)" checked="checked"' and it just worked.
Python's re module is not POSIX compliant: by default, . matches any char except a newline there (in POSIX and Lua, . matches any char, including a newline). That is why the greedy .* stayed within a single line in Python.
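To make the difference concrete, here is a small Python sketch that runs the pattern from the question against the three sample tags, once with default flags and once with re.DOTALL, which lets . cross newlines the way it does in Lua. Placing one tag per line is an assumption about the real response body, and the variable names are only illustrative.

```python
import re

# The three sample tags from the question, one per line (assumed layout).
html = "\n".join([
    '<input class="rid-2 form-checkbox" id="edit-2-access-comments" '
    'name="2[access comments]" value="access comments" checked="checked" type="checkbox">',
    '<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" '
    'name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">',
    '<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" '
    'name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">',
])

pattern = r'name="(.*)" value="(.*)" checked="checked"'

# Default flags: "." stops at "\n", so every match is confined to one line.
per_line = re.findall(pattern, html)

# re.DOTALL: "." also matches "\n", so the first capture can swallow whole tags.
across_lines = re.findall(pattern, html, flags=re.DOTALL)

print(per_line)
print(len(across_lines), "\n" in across_lines[0][0])
```

With default flags the two checked boxes come back cleanly. With re.DOTALL there is a single match whose first capture runs across all three lines, which is the same kind of over-capture the Lua pattern produces.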
If you want to match a string that has 3 attributes above one after another, you should use something like
local pattern = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'
Why not [^\r\n]-? Because it can still match across tags that share a line. If the first tag on a line has only the name and value attributes and a later tag on the same line has checked="checked", the pattern matches anyway: [^\r\n] also matches < and >, so a capture can "overfire" across tag boundaries.
Note that [^"]*, a negated bracket expression, will only match 0+ chars other than " thus restricting the matches within one tag.
See Lua demo:
local rx = 'name="([^"]*)"%s+value="([^"]*)"%s+checked="checked"'
local s = '<li name="n1"\nvalue="v1"><li name="n2"\nvalue="v1" checked="checked"><li name="n3"\nvalue="v3" checked="checked">'
for name, value in string.gmatch(s, rx) do
  print(name, value)
end
Output:
n2 v1
n3 v3
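The "overfire" case can be reproduced with a made-up input where an unchecked tag and a checked tag share one line. The sketch below uses Python's re as a stand-in for Lua patterns: [^\r\n]*? plays the role of Lua's [^\r\n]-, and the sample line is hypothetical, not taken from the question.

```python
import re

# Hypothetical input: an unchecked tag followed by a checked tag on the SAME line.
line = ('<input name="a" value="x" type="checkbox">'
        '<input name="b" value="y" checked="checked">')

# Lazy "anything except a line break" (Python analogue of Lua's [^\r\n]-).
no_newline = r'name="([^\r\n]*?)" value="([^\r\n]*?)" checked="checked"'

# "Anything except a double quote", as recommended in the answer above.
no_quote = r'name="([^"]*)"\s+value="([^"]*)"\s+checked="checked"'

spilled = re.findall(no_newline, line)
contained = re.findall(no_quote, line)

print(spilled)    # the second capture runs on into the next tag
print(contained)  # only the checked tag is reported
```

Excluding the quote character keeps each capture inside a single attribute value, so the first tag cannot borrow checked="checked" from its neighbour.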

(Updated based on comments) The pattern doesn't work when a line without checked="checked" precedes a line with checked="checked" in the input, because the .- expression then captures more than intended. There are several ways to avoid this; one, suggested by @EgorSkriptunoff, is to use ([^"]*) for the captures; another is to exclude newlines with ([^\r\n]-). The following example prints what you expect:
local s = [[
<input class="rid-2 form-checkbox" id="edit-2-access-comments" name="2[access comments]" value="access comments" checked="checked" type="checkbox">
<input class="rid-2 form-checkbox" id="edit-2-access-printer-friendly-version" name="2[access printer-friendly version]" value="access printer-friendly version" type="checkbox">
<input class="rid-3 form-checkbox real-checkbox" id="edit-3-administer-comments" name="3[administer comments]" value="administer comments" checked="checked" type="checkbox">
]]
local pattern = 'name="([^\r\n]-)" value="([^\r\n]-)" checked="checked"'
for name, value in string.gmatch(s, pattern) do
  print(name, value)
end
The output:
2[access comments] access comments
3[administer comments] administer comments

Related

Underscores in Thymeleaf Text Literals

Question: How to escape multiple consecutive underscores in text literals?
I am using the standard Thymeleaf dialect for HTML (I am not using Spring or SpEL here).
In Thymeleaf, I can create an underscore as a text literal as follows:
<div th:text="'_'"></div>
This renders as:
<div>_</div>
I can create literals with 2 and 3 underscores in the same way:
<div th:text="'__'"></div>
<div th:text="'___'"></div>
But for 4 underscores, I get an error:
org.thymeleaf.exceptions.TemplateProcessingException: Could not parse as expression: ""
I assume (maybe incorrectly) this is because two pairs of underscores (__ followed by __) are the markers used by Thymeleaf for the expression preprocessor. And when these are removed, I am left with an empty expression - hence the error.
I can escape the underscores using the backslash (\) escape character. The following all give the required results:
<div th:text="'\_\___'"></div>
<div th:text="'\_\_\_\__'"></div>
<div th:text="'\_\_\_\___'"></div>
<div th:text="'_\_\_\_\___'"></div>
<div th:text="'\_\_\_\_\_\___'"></div>
But I can't just escape every underscore.
This displays a stray backslash:
<div th:text="'\_\_\_\_\_'"></div>
The result is:
<div>____\_</div>
So:
What are the rules for escaping underscores in text literals?
Is it really the preprocessor which is causing this behavior (inside text literals) - or is it something else?

Yeah, this is definitely part of the preprocessor.
It looks to me like the preprocessor only replaces an exact match of \_\_ with __. In any case where you have an odd number of \_'s, you will get the output \_ -- because it's not treating \_ as a real escape and instead only looking for \_\_.
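The hypothesis above can be checked against the examples in the question with a toy model. This is only a sketch of the rule as described in this answer (replace each exact \_\_ pair with __, left to right), not Thymeleaf's actual implementation, and it ignores the separate step that evaluates __...__ blocks. The function name is made up.

```python
def unescape_pairs(literal: str) -> str:
    # Toy model of the answer's hypothesis: each exact pair of escaped
    # underscores becomes two plain underscores; nothing else is touched.
    return literal.replace(r"\_\_", "__")

# Even number of escaped underscores: every escape is consumed.
print(unescape_pairs(r"\_\_\_\_"))
# Odd number: the last escaped underscore has no partner and is left as-is.
print(unescape_pairs(r"\_\_\_\_\_"))
```

Under this model the five-escape literal yields four underscores followed by a stray \_, which matches the output reported in the question.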

I stumbled upon the same issue while providing underscore as placeholder for a code input field and found the following workarounds:
1. Insert a zero-width space as separator
In the first example ('____') the zero-width space is unescaped, but you can copy and paste the string into your IDE of choice.
<div th:text="${'______'}"></div>
<div th:text="${'______'}"></div>
<div th:text="${'______'}"></div>
<div th:text="${'_&ZeroWidthSpace;_&ZeroWidthSpace;_&ZeroWidthSpace;_&ZeroWidthSpace;_&ZeroWidthSpace;_'}"></div>
2. Use string replace
Surprisingly, this also seems to work. You can use any character, but no underscores in the original string, I chose "......". You can also use it with a string of unknown length by specifying a variable instead of a fixed string.
<div th:text="${#strings.replace('......', '.', '_')}"></div>
<div th:text="${#strings.replace('......', '.', '_')}"></div>

Decimal Keyboard not showing in iOS after switching fields

To get a numeric keyboard with a decimal character on iOS, I tried almost every trick in JS and jQuery (tel as input type, 0.01 as step, etc.), but the only solution was plugins:
https://github.com/mrchandoo/cordova-plugin-decimal-keyboard
https://github.com/gbrits/cordova-plugin-ios-decimal-keyboard
https://www.npmjs.com/package/cordova-plugin-decimal-keyboard-wkwebview
...
They are all working. However...
If there is more than one input field in a row and you switch from a "non-decimal" field to our "decimal" input field (either with the arrows or by tapping), you no longer see the decimal character, only the numbers. If you leave the field by tapping "Done" and then select the decimal input field again, it works again.
All of the plugins share this problem. Is it a known issue, or is it something specific to my case?
This is my html:
<input v-if="isIOS" type="text" pattern="[0-9]*" decimal="true" decimal-char=","
maxlength="10" min="0" :max="detailsSelectedPunt.tMax" placeholder="Punt..." v-model="detailsSelectedPunt.Number">
<input type="text" maxlength="255" placeholder="Commentaar..." v-model="detailsSelectedPunt.Commentaar">

GPath Expression: Use HTML element's value as argument

I have this gpath expression
<g:findAll in="${paymentList}" expr="it.account == 10">
and this field
<g:hiddenField name="acct_id"/>
acct_id already has a value and I want to use that value for comparison instead of just putting a static number like 10. How do I do it?

The first question, though, is where the value of the hidden field comes from.
If it's known, or comes from the request or the model, then we can do it like below:
Reason for using ${} in g:set tag is to make the value Integer.
Reason for using ${} in g:findAll tag's expr attribute is to make the value available to expr(Note: it's a string expression which gets evaluated later in taglib).
Hope it helps!

parsley.js telephone digits input validating with spaces

I have an input for telephone number.
I would like to enter it in this format: 0175 6565 6262 (with spaces). But if I type it with spaces I get an error, and if I type it without spaces I get no error.
Here my HTML Input:
<input type="text" data-parsley-minlength="6" data-parsley-minlength-message="minlength six number" data-parsley-type="digits" data-parsley-type-message="only numbers" class="input_text" value="">
Hope someone can help me?

That's a great answer, but it's a bit too narrow for my needs. Input field should be tolerant of all potential inputs – periods, hyphens, parentheses, spaces in unexpected places, plus signs for international folk – and using this document from Microsoft detailing what numbers IE11 should accept, I've come up with this:
data-parsley-pattern="^[\d\+\-\.\(\)\/\s]*$"
Every number in that list passes the test with flying colours. Enjoy!

If you want your input to accept a string like "nnnn nnnn nnnn" you should use a regular expression.
For example, you can use the following HTML:
<input type="text" name="phone" value="" data-parsley-pattern="^\d{4} \d{4} \d{4}$" />
With this pattern the input will only be valid when you have fourdigits«space»fourdigits«space»fourdigits
You can test and tweak the regular expression here: http://regexpal.com/
If you will use this pattern multiple times in your project I suggest you create a custom validator (see http://parsleyjs.org/doc/index.html#psly-validators-craft)
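To compare the two patterns suggested in these answers, the sketch below runs both against a few sample inputs. Parsley evaluates data-parsley-pattern in the browser with JavaScript's regex engine; Python's re is used here only as a stand-in, with re.ASCII so that \d is limited to 0-9. The sample strings other than the one from the question are made up.

```python
import re

# Tolerant pattern from the first answer and strict pattern from the second.
TOLERANT = re.compile(r"^[\d\+\-\.\(\)\/\s]*$", re.ASCII)
STRICT = re.compile(r"^\d{4} \d{4} \d{4}$", re.ASCII)

samples = ["0175 6565 6262", "+49 (175) 6565-6262", "017565656262", "call me", ""]

# Map each sample to (accepted by tolerant, accepted by strict).
results = {s: (bool(TOLERANT.search(s)), bool(STRICT.search(s))) for s in samples}

for sample, (tolerant_ok, strict_ok) in results.items():
    print(repr(sample), tolerant_ok, strict_ok)
```

The strict pattern accepts only the exact 4-4-4 layout, while the tolerant one accepts any mix of digits, spaces and common punctuation, including an empty string, so a length or required check is still worth keeping next to it.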

Rails/Sphinx: search excerpts are also showing search conditions

This gives the results I was expecting:
result = Content.search("minerva", :conditions => {:publication_code => "12345678"})
result.first.element_type #=> "chapter"
result.first.excerpts.text #=> "outdated practice, The Owl of <span class=\"match\">Minerva</span> talks about the “unrealistic ‘Cartesian … major premise The Owl of <span class=\"match\">Minerva</span> details the innumerable combinations possible … concepts?” See The Owl of <span class=\"match\">Minerva</span>, p. 319. “Of course, ideally"
However: if I'm including search conditions that are literally present in the text, for instance the word "section" (which is a content element type) this is what I'm getting:
result = Content.search("minerva", :conditions => {:publication_code => "12345678", :element_type => "section"})
result.first.element_type #=> "section"
result.first.excerpts.text #=> "November 2001. The Owl of <span class=\"match\">Minerva</span>, p. 107. provides as follows: … foreign diplomatic or consular property, <span class=\"match\">section</span> 177 would place the United … source of leverage. In addition, <span class=\"match\">section</span> 177 could seriously affect our"
"Section", literally, is now also considered a match. I'm not getting what's the cause of this response.
Update to illustrate the problem some more:
Here's a query that finds a search term ("certification") near the term I'm using in the search conditions ("section", to limit my search to element_types that are sections).
result = Content.search("certification", :conditions => {:publication_code => "12345678", :element_type => "section"})
The text that gets returned is this (shortened to match following excerpts, and bold text mine):
result.first.text
[…] and operation of section 10 and the section 10 certification process. He noted […]
[…] object of the certification procedure introduced by section 10(1)(b) was not to […]
[…] domestic court. The certification procedure provided for by section 10 is similarly […]
Calling result.first.excerpts.text gives me the following. As you can see, everywhere in the text where either the term 'certification' or 'section' is found, it's marked as a match.
" … and operation of <span class=\"match\">section</span> 10 and the <span class=\"match\">section</span> 10 <span class=\"match\">certification</span> process. He noted: … object of the <span class=\"match\">certification</span> procedure introduced by <span class=\"match\">section</span> 10(1)(b) was not to … domestic court. The <span class=\"match\">certification</span> procedure provided for by <span class=\"match\">section</span> 10 is similarly … "

The excerpts pane uses all query terms when generating output, which includes supplied conditions (as they end up being part of the Sphinx query: your second example, from Sphinx's perspective, is "minerva @publication_code 12345678 @element_type section").
An alternative is to have your own excerpter with just the query you want:
excerpter = ThinkingSphinx::Excerpter.new 'content_core', 'minerva', {}
excerpter.excerpt! result.first.text
The first argument when building the excerpter is the index name, the second is the search query to match against, and the third is options.

I think this is just a coincidence.
Try with a dataset that doesn't have section in the text to see if this also happens.
