I need help with a regex. I have added some working Playground code below to show what I am trying to do.
If key="id", it should return: 862
If key="pos", it should return: -301.5, 61.7, 364.6
My regex is var expression = key+"=(.*?)[^,]+", which is close but not exactly what I want.
Any help is appreciated!
import UIKit
var data = "0. id=862, pos=(-301.5, 61.7, 364.6), rot=(-7.0, -2735.2, 0.0), remote=True, health=125"
var key = "id"
var expression = key+"=(.*?)[^,]+"
var match = data.range(of: expression, options: .regularExpression);
var value = data.substring(with: match!)
print(value)
You may match the strings using
var expression = "(?<=" + key+"=)(?:\\([^()]+\\)|[^,\\s]+)"
The pattern breaks down as follows:
(?<=pos=) - a positive lookbehind: the key followed by = (here pos=) must immediately precede the current position
(?:\([^()]+\)|[^,\s]+) - a non-capturing group that matches either
\([^()]+\) - a (, then 1+ chars other than ( and ), and then a )
| - or
[^,\s]+ - 1+ chars other than whitespace and comma.
Then, after you get the match, remove the leading ( and the trailing ):
var value = data.substring(with: match!).trimmingCharacters(in: ["(",")"])
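The same lookbehind-plus-alternation idea can be tried outside a Playground. The following is a minimal sketch in Python rather than Swift; the helper name extract is an assumption made for illustration, and the pattern is the one from the answer above.

```python
import re

DATA = ("0. id=862, pos=(-301.5, 61.7, 364.6), "
        "rot=(-7.0, -2735.2, 0.0), remote=True, health=125")


def extract(data, key):
    """Return the value stored under `key`, or None if the key is absent.

    `extract` is a hypothetical helper, not part of the original answer.
    """
    # Lookbehind for "key=", then either a parenthesised group
    # or a run of characters that are neither commas nor whitespace.
    pattern = r"(?<=" + re.escape(key) + r"=)(?:\([^()]+\)|[^,\s]+)"
    match = re.search(pattern, data)
    if match is None:
        return None
    # Drop the surrounding parentheses, if any.
    return match.group(0).strip("()")


print(extract(DATA, "id"))
print(extract(DATA, "pos"))
```

Returning None for a missing key avoids the crash that force-unwrapping match! would cause in the Swift version when the key is not present.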
I have stumbled upon this line of code and I am not sure what the [ ? ] part represents (my guess is it's a sort of a wildcard but I searched it for a while and couldn't find anything):
['?'] = function() return is_canadian and "eh" or "" end
I understand that RHS is a functional ternary operator. I am curious about the LHS and what it actually is.
Edit: reference (2nd example):
http://lua-users.org/wiki/SwitchStatement
Actually, it is quite simple.
local t = {
a = "aah",
b = "bee",
c = "see",
The table maps each letter to its pronunciation: a is pronounced aah, b is pronounced bee, and so on. Some letters are pronounced differently in American and Canadian English, so not every letter can be mapped to a single sound.
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
}
In the mapping, the letter z and the character ? are pronounced differently in American and Canadian English. When the program looks up the pronunciation of '?', it calls the stored function, which checks whether the user wants Canadian English and returns either eh or an empty string (for z, the function returns zed or zee).
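The lookup described above, where some entries are plain strings and others are functions evaluated at lookup time, can be sketched in Python as well. This is only an analogy to the Lua table; make_sounds and pronounce are hypothetical names, and is_canadian is passed in as a parameter rather than read from a global.

```python
def make_sounds(is_canadian):
    """Build the letter-to-sound mapping; some values are callables."""
    return {
        "a": "aah",
        "b": "bee",
        "c": "see",
        # Entries whose sound depends on the dialect are stored as functions.
        "z": lambda: "zed" if is_canadian else "zee",
        "?": lambda: "eh" if is_canadian else "",
    }


def pronounce(sounds, letter):
    """Look up a letter and call the stored value if it is a function."""
    value = sounds[letter]
    return value() if callable(value) else value


print(pronounce(make_sounds(True), "?"))
print(pronounce(make_sounds(False), "z"))
```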
Finally, the 2 following notations have the same meaning:
local t1 = {
a = "aah",
b = "bee",
["?"] = "bee"
}
local t2 = {
["a"] = "aah",
["b"] = "bee",
["?"] = "bee"
}
If you look closely at the code linked in the question, you'll see that this line is part of a table constructor (the part inside {}). It is not a full statement on its own. As mentioned in the comments, it would be a syntax error outside of a table constructor. ['?'] is simply a string key.
The other posts already explained what that code does, so let me explain why it needs to be written that way.
['?'] = function() return is_canadian and "eh" or "" end is embedded in {}
It is part of a table constructor and assigns a function value to the string key '?'.
local tbl = {a = 1} is syntactic sugar for local tbl = {['a'] = 1} or
local tbl = {}
tbl['a'] = 1
String keys that allow that convenient syntax must follow Lua's lexical conventions and hence may only contain letters, digits and underscore. They must not start with a digit.
So local a = {? = 1} is not possible. It will cause the syntax error "unexpected symbol near '?'". Therefore you have to explicitly provide a string value in square brackets, as in local a = {['?'] = 1}
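The rule above can be turned into a small check. The sketch below is written in Python and only models the rule; needs_brackets and table_field are hypothetical helpers. It also treats Lua's reserved words as needing brackets, which goes slightly beyond what the answer states, and it assumes keys contain no quotes or backslashes.

```python
import re

# Lua reserved words cannot be used as bare keys either (assumption added
# here for completeness; `goto` is reserved from Lua 5.2 onwards).
LUA_KEYWORDS = {
    "and", "break", "do", "else", "elseif", "end", "false", "for",
    "function", "goto", "if", "in", "local", "nil", "not", "or",
    "repeat", "return", "then", "true", "until", "while",
}

# Letters, digits and underscores, not starting with a digit.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def needs_brackets(key):
    """True if `key` cannot be written with the short `key = value` form."""
    return _NAME.fullmatch(key) is None or key in LUA_KEYWORDS


def table_field(key, value_source):
    """Render one table-constructor field as Lua source text."""
    if needs_brackets(key):
        return "['{}'] = {}".format(key, value_source)
    return "{} = {}".format(key, value_source)


print(table_field("a", "1"))
print(table_field("?", "1"))
```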
The authors also gave each table element its own line:
local a = {
1,
2,
3
}
This greatly improves readability for long table elements or very long tables and allows you to maintain a maximum line length.
You'll agree that
local tbl = {
z = function() return is_canadian and "zed" or "zee" end,
['?'] = function() return is_canadian and "eh" or "" end
}
looks a lot cleaner than
local tbl = {z = function() return is_canadian and "zed" or "zee" end,['?'] = function() return is_canadian and "eh" or "" end}
I got a deprecation warning on this line:
components.append("-b \"\(string.substring(to: string.index(before: string.endIndex)))\"")
So I changed it to:
components.append("-b \"\(String(string[..<string.endIndex]) )\"")
Is the second line okay? My code otherwise seems to be working fine.
Let's see
let string = "12345"
var components = [String]()
var components2 = [String]()
components.append("-b \"\(string.substring(to: string.index(before: string.endIndex)))\"")
components2.append("-b \"\(String(string[..<string.endIndex]) )\"")
print(components)
print(components2)
print(components == components2)
gives us
["-b \"1234\""]
["-b \"12345\""]
false
So the answer is no: the two lines are not equivalent. The second one keeps the last character.
If your intention is to remove the last character, then you can just use dropLast:
components.append("-b \"\(string.dropLast())\"")
Note that you can pass a parameter for the number of elements you want to drop (dropLast(2), for example).
Finally, the equivalent expression using a partial range would be:
string[..<string.index(before: string.endIndex)]
and that is because the first expression translates to:
The index up to but not including the index before the endIndex
while the second one translates to:
The index up to but not including the endIndex
where endIndex refers to the "past the end" position
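The "up to but not including" distinction is the same one that applies to half-open slices in other languages. As a rough analogy, here is a Python sketch; drop_last is a hypothetical helper, and note that Python indexes by code point while Swift indexes by Character, so the two are not identical for every string.

```python
def drop_last(text, count=1):
    """Return `text` without its last `count` characters."""
    # Slice up to (but not including) the position `count` before the end.
    return text[:max(len(text) - count, 0)]


string = "12345"

# Slicing up to the "past the end" position keeps everything,
# which mirrors string[..<string.endIndex] in the question.
whole = string[:len(string)]

print(whole)
print(drop_last(string))
print(drop_last(string, 2))
```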
Given this string:
one#two*three#four#five*
What is a fast solution to extract the list of Pairs?
Each pair contains the word with its separator character like this:
[
['one', '#'],
['two', '*'],
['three', '#'],
['four', '#'],
['five', '*']
]
Specifically in my case I want to use both white space and new line characters as separators.
You'd need a regular expression (inside a character class, | is a literal character, so [#*] is enough):
(\w+)([#*])
See example Dart code here that should get you going: https://dartpad.dartlang.org/ae3897b2221a94b5a4c9e6929bebcfce
Full disclosure: Dart is a relatively new language to me.
That said, regex might be your best bet. Assuming you are only working with lowercase a-z letters followed by a single character, this should do the trick.
RegExp r = RegExp("([a-z]+)(.)");
var matches = r.allMatches("one#two*three#four#five*");
List<dynamic> l = [];
matches.toList().asMap().forEach((i, m) => l.add([m.group(1), m.group(2)]));
print(l);
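For comparison, the same two-group idea can be sketched in Python (not Dart). The function name split_pairs is an assumption for illustration, and the separator class [#*] is taken from the example string in the question.

```python
import re


def split_pairs(text):
    """Return [word, separator] pairs for words followed by '#' or '*'."""
    # Group 1 captures the word, group 2 the single separator character.
    return [[word, sep] for word, sep in re.findall(r"(\w+)([#*])", text)]


print(split_pairs("one#two*three#four#five*"))
```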
Based on other responses here's my solution for white spaces and new lines as separators:
void main() {
RegExp r = RegExp(r"(\S+)([\s]+|$)");
var text = 'one two three \n\n four ';
var matches = r.allMatches(text);
List<dynamic> l = [];
matches.toList().asMap().forEach((i, m) => l.add([m.group(1), m.group(2)]));
print(l);
}
Output
[[one, ], [two, ], [three,
], [four, ]]
Explanation: https://regex101.com/r/cRpMVq/2
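The whitespace variant translates the same way. The sketch below uses Python instead of Dart and reuses the pattern from the answer above; split_on_whitespace is a hypothetical name. The $ alternative lets the last word match even when no whitespace follows it, in which case its separator is an empty string.

```python
import re


def split_on_whitespace(text):
    """Return [word, separator] pairs, where separators are whitespace runs."""
    pattern = re.compile(r"(\S+)(\s+|$)")
    return [[m.group(1), m.group(2)] for m in pattern.finditer(text)]


pairs = split_on_whitespace("one two three \n\n four ")
for word, separator in pairs:
    # repr() makes the newlines inside the separator visible.
    print(word, repr(separator))
```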
Via an enterprise service consumer I connect to a web service, which returns some data and also URLs.
However, I tried all methods of the class mentioned above, and no method seems to convert the Unicode escapes inside my URL into the proper readable characters (in this case '=' and '&').
The only method that runs properly is "is_valid_url", which returns false when I pass URLs like this:
http://not_publish-workflow-dev.hq.not_publish.com/lc/content/forms/af/not_publish/request-datson-internal/v01/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled
What am I missing?
It seems that this format is the one used for JSON values. Usually = and & don't need to be written with the \u prefix. To decode all \u escapes, you may use this code:
DATA(json_value) = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled`.
FIND ALL OCCURRENCES OF REGEX '\\u....' IN json_value RESULTS DATA(matches).
SORT matches BY offset DESCENDING.
LOOP AT matches ASSIGNING FIELD-SYMBOL(<match>).
DATA hex2 TYPE x LENGTH 2.
hex2 = to_upper( substring( val = json_value+<match>-offset(<match>-length) off = 2 ) ).
DATA(uchar) = cl_abap_conv_in_ce=>uccp( hex2 ).
REPLACE SECTION OFFSET <match>-offset LENGTH <match>-length OF json_value WITH uchar.
ENDLOOP.
ASSERT json_value = `http://not_publish-workflow-dev.hq.not_publish.com/lc`
&& `/content/forms/af/not_publish/request-datson-internal/v01`
&& `/request-datson-internal.html?taskId=105862&wcmmode=disabled`.
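The find-and-replace loop above can be cross-checked with a short script in another language. This is a minimal Python sketch under stated assumptions: decode_u_escapes is a hypothetical helper, it only handles four-hex-digit \uXXXX escapes, and it does not combine surrogate pairs or treat an escaped backslash specially as a full JSON parser would. The URL is shortened to its query part.

```python
import re

# A literal backslash, "u", and exactly four hexadecimal digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text):
    """Replace every \\uXXXX escape with the character it encodes."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


raw = r"request-datson-internal.html?taskId\u003d105862\u0026wcmmode\u003ddisabled"
print(decode_u_escapes(raw))
```

Restricting the pattern to hexadecimal digits avoids trying to convert sequences such as \user that merely start with \u.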
I hate to answer my own question, but I found my own solution by manually replacing those Unicode escapes. It is similar to Sandra's idea, but able to convert any Unicode escape.
I share it here in case anyone else needs it.
DATA: lt_res_tab TYPE match_result_tab.
DATA(valid_url) = url.
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
WHILE lines( lt_res_tab ) > 0.
DATA(match) = substring( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length ).
DATA(hex_unicode) = to_upper( match+2 ).
DATA(char) = cl_abap_conv_in_ce=>uccp( uccp = hex_unicode ).
valid_url = replace( val = valid_url off = lt_res_tab[ 1 ]-offset len = lt_res_tab[ 1 ]-length with = char ).
FIND ALL OCCURRENCES OF REGEX '\\u.{4}' IN valid_url RESULTS lt_res_tab.
ENDWHILE.
WRITE / url.
WRITE / valid_url.
local querycreate = "create user 'dddqwd123_dwqd'#'localhost'"
local create, usercreate, username, userhost = querycreate:match("^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$")
print(string.format("query: %s", querycreate))
print(string.format(" var create = %s \n var usercreate = %s \n var username = %s \n var userhost = %s", create, usercreate, username, userhost))
query: create user 'dddqwd123_dwqd'#'localhost'
var create = nil
var usercreate = nil
var username = nil
var userhost = nil
My regex works fine on http://regexr.com?37voi.
If I change it to ("^(%w+)%s+(%w+)%s+"), it outputs:
var create = create
var usercreate = user
var username = nil
var userhost = nil
If I remove quotes from querycreate by setting it to "create user dddqwd123_dwqd # localhost" and use ^(%w+)%s+(%w+)%s+(%w+) # (%w+)$, then the output is normal.
Your initial pattern:
"^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$"
The last two captures in your pattern lack + specifiers to indicate that they are to read 1 or more characters.
_ is not a word character matched by %w. You will need a character class including _ everywhere you would need to match it.
The escaping of ' within a "-quoted string is unnecessary.
With some improvement:
"^(%w+)%s+(%w+)%s+'([%w_]+)'#'([%w_]+)'$"
Alternatively: If you wanted to match anything within a set of quotes ', you could match against its inverse class:
"^(%w+)%s+(%w+)%s+'([^']*)'#'([^']*)'$"
([^']*) will capture anything (including nothing) that isn't ', until the end of the string or a ' is found.
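The three patterns discussed above can be compared side by side. The sketch below uses Python's re module rather than Lua patterns, so the syntax differs: [A-Za-z0-9] stands in for Lua's %w (Python's own \w would also match _, which would hide the problem), and \s stands in for %s. The variable names are assumptions for illustration.

```python
import re

QUERY = "create user 'dddqwd123_dwqd'#'localhost'"

W = "[A-Za-z0-9]"  # stand-in for Lua's %w, which does not match "_"

# Original shape: the quoted captures match exactly one character each.
broken = rf"^({W}+)\s+({W}+)\s+'({W})'#'({W})'$"
# Fixed: one or more characters, with "_" added to the class.
fixed = rf"^({W}+)\s+({W}+)\s+'([A-Za-z0-9_]+)'#'([A-Za-z0-9_]+)'$"
# Alternative: anything that is not a quote, including nothing.
anything = rf"^({W}+)\s+({W}+)\s+'([^']*)'#'([^']*)'$"

print(re.match(broken, QUERY))
print(re.match(fixed, QUERY).groups())
print(re.match(anything, QUERY).groups())
```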
You are missing the + on the parts of the pattern that match the user and domain.
You have "^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$"
You need "^(%w+)%s+(%w+)%s+'([%w_]+)'#'([%w_]+)'$" (the unnecessary single-quote escaping is removed, and [%w_] incorporates RyanStein's comment about _).