local querycreate = "create user 'dddqwd123_dwqd'#'localhost'"
local create, usercreate, username, userhost = querycreate:match("^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$")
print(string.format("query: %s", querycreate))
print(string.format(" var create = %s \n var usercreate = %s \n var username = %s \n var userhost = %s", create, usercreate, username, userhost))
query: create user 'dddqwd123_dwqd'#'localhost'
var create = nil
var usercreate = nil
var username = nil
var userhost = nil
My regex works fine on http://regexr.com?37voi.
If I change it to ("^(%w+)%s+(%w+)%s+"), it outputs:
var create = create
var usercreate = user
var username = nil
var userhost = nil
If I remove quotes from querycreate by setting it to "create user dddqwd123_dwqd # localhost" and use ^(%w+)%s+(%w+)%s+(%w+) # (%w+)$, then the output is normal.
Your initial pattern:
"^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$"
The last two captures in your pattern lack + specifiers to indicate that they are to read 1 or more characters.
_ is not a word character matched by %w. You will need a character class including _ everywhere you would need to match it.
The escaping of ' within a "-quoted string is unnecessary.
With some improvement:
"^(%w+)%s+(%w+)%s+'([%w_]+)'#'([%w_]+)'$"
Alternatively: If you wanted to match anything within a set of quotes ', you could match against its inverse class:
"^(%w+)%s+(%w+)%s+'([^']*)'#'([^']*)'$"
([^']*) will capture anything (including nothing) that isn't ', until the end of the string or a ' is found.
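For comparison, the same idea can be sketched in JavaScript. This is only an illustration and not part of the Lua answer: parseCreateUser is a hypothetical helper name, and the query format is assumed to be exactly the one in the question. Note that JavaScript's \w already includes _, unlike Lua's %w.

```javascript
// Hypothetical helper: split a query of the form
//   create user 'name'#'host'
// into its four parts. Returns null when the query does not match.
function parseCreateUser(query) {
  // [^']* accepts anything except a quote, so "_" inside the name is fine.
  const m = query.match(/^(\w+)\s+(\w+)\s+'([^']*)'#'([^']*)'$/);
  if (m === null) return null;
  const [, create, usercreate, username, userhost] = m;
  return { create, usercreate, username, userhost };
}

console.log(parseCreateUser("create user 'dddqwd123_dwqd'#'localhost'"));
```

Returning null on a failed match mirrors the four nil values that Lua's match produced in the question.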
You are missing the + on the parts of the pattern that match the user and domain.
You have "^(%w+)%s+(%w+)%s+\'(%w)\'#\'(%w)\'$"
You need "^(%w+)%s+(%w+)%s+'([%w_]+)'#'([%w_]+)'$" (with the unnecessary single-quote escaping removed, and incorporating @RyanStein's comment about _).
I would like to check if a string contains any of the following symbols ^ $ * . [ ] { } ( ) ? - " ! # # % & / \ , > < ' : ; | _ ~ ` + =
I tried using the following
string.contains(RegExp(r'[^$*.[]{}()?-"!##%&/\,><:;_~`+=]'))
But that does not seem to do anything. I am also not able to add the ' symbol.
Questions:
How do I check if a string contains any one of a set of symbols?
How do I add the ' symbol in my regex collection?
When writing such a RegExp pattern, you should escape the special symbols (if you want to search specifically by them).
Also, to add the ' to the RegExp, there is no straightforward way, but you could use String concatenation to work around this.
This is what the final result could look like:
void main() {
final regExp = RegExp(
r'[\^$*.\[\]{}()?\-"!##%&/\\,><:;_~`+=' // <-- Notice the escaped symbols (\\ matches a literal backslash)
"'" // <-- ' is added to the expression
']'
);
final string1 = 'abc';
final string2 = 'abc[';
final string3 = "'";
print(string1.contains(regExp)); // false
print(string2.contains(regExp)); // true
print(string3.contains(regExp)); // true
}
To add both ' and " to the same string literal, you can use a multiline (triple-quoted) string.
string.contains(RegExp(r'''[\^$*.[\]{}()?\-"'!##%&/\\,><:;_~`+=]'''))
You also need to escape characters which have meaning inside a RegExp character class (], - and \ in particular, plus ^ when it comes first, where it would otherwise negate the class).
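The same escaping rules can be checked quickly in JavaScript, whose RegExp syntax Dart follows. A minimal sketch: SPECIAL_RE and hasSpecial are names made up for this illustration, and the symbol list is copied from the question as shown here (with # listed once).

```javascript
// Character class with ^, ], - and \ escaped so they are taken literally.
// The / is escaped only because this is a JavaScript regex literal.
const SPECIAL_RE = /[\^$*.[\]{}()?\-"'!#%&\/\\,><:;|_~`+=]/;

// Hypothetical helper: true when text contains at least one listed symbol.
function hasSpecial(text) {
  return SPECIAL_RE.test(text);
}

console.log(hasSpecial('abc'));   // false
console.log(hasSpecial('abc['));  // true
console.log(hasSpecial("'"));     // true
console.log(hasSpecial('a\\b'));  // true (the string holds one backslash)
```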
Another approach is to create a set of character codes, and check if the string's characters are in that set:
var chars = r'''^$*.[]{}()?-"'!##%&/\,><:;_~`+=''';
var charSet = {...chars.codeUnits};
var containsSpecialChar = string.codeUnits.any(charSet.contains);
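The set-based approach carries over to other languages unchanged. Below is a rough JavaScript sketch of it; SPECIAL_CHARS and containsSpecialChar are illustrative names, and the symbol list is an assumption copied from the question as it appears here.

```javascript
// Every symbol we want to detect. A Set built from a string holds one
// entry per character, so no regex escaping is involved at all.
// (\' and \\ are ordinary string escapes for ' and \.)
const SPECIAL_CHARS = new Set('^$*.[]{}()?-"\'!#%&/\\,><:;|_~`+=');

// Hypothetical helper: true when any character of text is in the set.
function containsSpecialChar(text) {
  for (const ch of text) {
    if (SPECIAL_CHARS.has(ch)) return true;
  }
  return false;
}

console.log(containsSpecialChar('abc'));  // false
console.log(containsSpecialChar('abc[')); // true
console.log(containsSpecialChar("'"));    // true
```

The trade-off is that the set only answers "is this character listed", which is all this problem needs, while a regex remains the better fit if the check later grows into a pattern.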
I can use string.gsub(message, " "),
but it only cuts the words.
I searched http://lua-users.org/wiki/StringLibraryTutorial but couldn't find a solution there.
How can I save these words into variables?
For example, I have message = "fun 1 true enjoy"
and I want the variables to be
var level = 1
var good = true
var message = "enjoy"
Use string.match to extract the fields and then convert them to suitable types:
message = "fun 1 true enjoy"
level,good,message = message:match("%S+%s+(%S+)%s+(%S+)%s+(%S+)")
level = tonumber(level)
good = good=="true"
print(level,good,message)
print(type(level),type(good),type(message))
The pattern in match skips the first field and captures the following three fields; fields are separated by whitespace.
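For readers more at home in JavaScript, the same extract-then-convert steps might look like the sketch below. parseMessage is a hypothetical name, and the input is assumed to have exactly the four whitespace-separated fields from the question.

```javascript
// Skip the first field, capture the next three, then convert each
// capture to a suitable type (number, boolean, string).
function parseMessage(message) {
  const m = message.match(/^\S+\s+(\S+)\s+(\S+)\s+(\S+)/);
  if (m === null) return null; // fewer than four fields
  return {
    level: Number(m[1]),   // "1"    -> 1
    good: m[2] === 'true', // "true" -> true
    message: m[3],         // left as a string
  };
}

console.log(parseMessage('fun 1 true enjoy'));
```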
I'm trying to extract the integers after mrp= and talktime=.
var i=0;
var recharge=[];
var recharge_text=[];
var recharge_String="";
var mrp="";
var talktime="";
var validity="";
var mode="";mrp='1100';
talktime='1200.00';
validity='NA';
mode='E-Recharge';
if(typeof String.prototype.trim !== 'function') {
String.prototype.trim = function() {
return this.replace(/^ +| +$/g, '');
}
}
mrp=mrp.trim();
if(isNaN(mrp))
{
recharge_text.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
else
{
mrp=parseInt(mrp);
recharge.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
mrp='2200';
talktime='2400.00';
I've extracted the above text from a webpage, but I do not know how to extract that particular part alone.
You can use regular expressions to parse strings and extract parts of them:
my_text = "blablabla" #just imagine that this is your text
regex_mrp = /mrp='(.+?)';/ #extracts whatever is between single quotes after mrp
regex_talktime = /talktime='(.+?)';/ #extracts whatever is between single quotes after talktime
mrp = my_text.match(regex_mrp)[1].to_i #gets the match, and converts to integer
talktime = my_text.match(regex_talktime)[1].to_f #gets the match, and converts to float
Here's a quick reference to regular expression syntax: https://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx
I'd do something like this:
string = <<EOT
var i=0;
var recharge=[];
var recharge_text=[];
var recharge_String="";
var mrp="";
var talktime="";
var validity="";
var mode="";mrp='1100';
talktime='1200.00';
validity='NA';
mode='E-Recharge';
if(typeof String.prototype.trim !== 'function') {
String.prototype.trim = function() {
return this.replace(/^ +| +$/g, '');
}
}
mrp=mrp.trim();
if(isNaN(mrp))
{
recharge_text.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
else
{
mrp=parseInt(mrp);
recharge.push({MRP:mrp, Talktime:talktime, Validity:validity ,Mode:mode});
}
mrp='2200';
talktime='2400.00';
EOT
hits = string.scan(/(?:mrp|talktime)='[\d.]+'/)
# => ["mrp='1100'", "talktime='1200.00'", "mrp='2200'", "talktime='2400.00'"]
This gives us an array of hits using scan, one for each place where the pattern /(?:mrp|talktime)='[\d.]+'/ matched in the string. Figuring out how the pattern works is left as an exercise for the reader, but Ruby's Regexp documentation explains it all.
Cleaning that up to be a bit more useful:
hash = hits.map{ |s|
str, val = s.split('=')
[str, val.delete("'")]
}.each_with_object(Hash.new { |h, k| h[k] = [] }){ |(str, val), h| h[str] << val }
You also need to read about each_with_object and what's happening with Hash.new as those are important concepts to learn in Ruby.
At this point, hash is a hash of arrays:
hash # => {"mrp"=>["1100", "2200"], "talktime"=>["1200.00", "2400.00"]}
You can easily extract a particular variable's values, and can correlate them if need be.
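Since the scraped text is itself JavaScript, the same scan-and-group step can be sketched in that language too. This is an assumption-laden illustration and not the answer's method: collectAssignments is a made-up name, and the variable names passed in are assumed to be plain identifiers with no regex metacharacters.

```javascript
// Collect every  name='number'  assignment into { name: [values...] }.
function collectAssignments(text, names) {
  // Alternation of the wanted names, then a quoted run of digits and dots.
  const pattern = new RegExp(`(${names.join('|')})='([\\d.]+)'`, 'g');
  const result = {};
  for (const [, name, value] of text.matchAll(pattern)) {
    result[name] = result[name] || []; // start a list on first sight
    result[name].push(value);
  }
  return result;
}

const page = [
  'var mrp="";',
  "var mode=\"\";mrp='1100';",
  "talktime='1200.00';",
  "mrp='2200';",
  "talktime='2400.00';",
].join('\n');

console.log(collectAssignments(page, ['mrp', 'talktime']));
```

Values at the same index of each list come from the same block of the page, which is what makes correlating them possible.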
What if I get a string instead of an integer next to the "=" sign?
...
string.scan(/(?:tariff)='[\p{Print}]+'/)
It's important to understand what the pattern is doing. The regular expression engine has some gotchas that can drastically affect the speed of a search, so indiscriminately throwing in things without understanding what they do can be very costly.
When using (?:...), you're creating a non-capturing group. When you only have one item you're matching it's not necessary, nor is it particularly desirable since it's making the engine do more work. A group earns its place when it holds alternatives, as in (?:mrp|talktime), or, in its capturing form (...), when you need to refer back to what was matched; since you have only one possible thing to match, that becomes a moot point. So, your pattern should be reduced to:
/tariff='[\p{Print}]+'/
Which, when used, results in:
%(tariff='abcdef abc a').scan(/tariff='[\p{Print}]+'/)
# => ["tariff='abcdef abc a'"]
If you want to capture all non-empty occurrences of the string being assigned, it's easier than what you're doing. I'd use something like:
%(tariff='abcdef abc a').scan(/tariff='.+'/)
# => ["tariff='abcdef abc a'"]
%(tariff='abcdef abc a').scan(/tariff='[^']+'/)
# => ["tariff='abcdef abc a'"]
The second is more rigorous, and possibly safer, as it won't be tricked by a line that has multiple single quotes:
%(tariff='abcdef abc a', 'foo').scan(/tariff='.+'/)
# => ["tariff='abcdef abc a', 'foo'"]
%(tariff='abcdef abc a', 'foo').scan(/tariff='[^']+'/)
# => ["tariff='abcdef abc a'"]
Why that works is for you to figure out.
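As a hint, the difference comes from greediness: .+ consumes as much as it can and only backs off to the last quote on the line, while [^']+ cannot cross a quote at all, so it stops at the first closing one. The behaviour is the same in most regex engines; here is a small JavaScript sketch (the variable names are arbitrary):

```javascript
const line = "tariff='abcdef abc a', 'foo'";

// Greedy: runs to the LAST single quote on the line.
const greedy = line.match(/tariff='.+'/)[0];

// Negated class: stops at the FIRST closing single quote.
const bounded = line.match(/tariff='[^']+'/)[0];

console.log(greedy);  // tariff='abcdef abc a', 'foo'
console.log(bounded); // tariff='abcdef abc a'
```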
Hi I've got this function in JavaScript:
function blur(data) {
var trimdata = trim(data);
var dataSplit = trimdata.split(" ");
var lastWord = dataSplit.pop();
var toBlur = dataSplit.join(" ");
}
What this does is take a string such as "Hello my name is bob" and give
toBlur = "Hello my name is" and lastWord = "bob"
Is there a way I can rewrite this in Lua?
You could use Lua's pattern matching facilities:
function blur(data)
  -- returns everything before the last space-separated word
  return string.match(data, "^(.*)[ ][^ ]*$")
end
How does the pattern work?
^ # start matching at the beginning of the string
( # open a capturing group ... what is matched inside will be returned
.* # as many arbitrary characters as possible
) # end of capturing group
[ ] # a single literal space (you could omit the square brackets, but I think
# they increase readability)
[^ ]* # match anything BUT literal spaces... as many as possible
$ # marks the end of the input string
So [ ][^ ]*$ has to match the last word and the preceding space. Therefore, (.*) will return everything in front of it.
For a more direct translation of your JavaScript, first note that there is no split function in Lua. There is table.concat though, which works like join. Since you have to do the splitting manually, you'll probably use a pattern again:
function blur(data)
  local words = {}
  -- collect every run of non-space characters
  for m in string.gmatch(data, "[^ ]+") do
    words[#words+1] = m
  end
  words[#words] = nil -- pops the last word
  return table.concat(words, " ")
end
gmatch does not give you a table right away, but an iterator over all matches instead. So you add them to your own temporary table, and call concat on that. words[#words+1] = ... is a Lua idiom to append an element to the end of an array.
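For reference, the "cut at the last space" idea behind the pattern can also be written without split and join on the JavaScript side. This is just a sketch under the question's assumptions (words separated by single spaces); splitLastWord is a made-up name, and unlike the original blur it returns both parts.

```javascript
// Split a sentence into everything before the last word, and the last word.
function splitLastWord(data) {
  const trimmed = data.trim();
  const cut = trimmed.lastIndexOf(' ');
  if (cut === -1) {
    // Only one word: nothing is left in front of it.
    return { toBlur: '', lastWord: trimmed };
  }
  return {
    toBlur: trimmed.slice(0, cut),    // text before the last space
    lastWord: trimmed.slice(cut + 1), // text after the last space
  };
}

console.log(splitLastWord('Hello my name is bob'));
```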
I have to parse a document containing groups of variable-value-pairs which is serialized to a string e.g. like this:
4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^
Here are the different elements:
Group IDs: 4 and 1
Length of the string representation of each group: 26 and 14
One of the groups: VAR1^6^VALUE1^VAR2^4^VAL2^ (the 26-character content of group 4)
Variables: VAR1 and VAR2 (in group 4), VAR1 (in group 1)
Length of the string representation of the values: 6, 4 and 6
The values themselves: VALUE1, VAL2 and VALUE1
Variables consist only of alphanumeric characters.
No assumption is made about the values, i.e. they may contain any character, including ^.
Is there a name for this kind of grammar? Is there a parsing library that can handle this mess?
So far I am using my own parser, but due to the fact that I need to detect and handle corrupt serializations the code looks rather messy, thus my question for a parser library that could lift the burden.
The simplest way to approach it is to note that there are two nested levels that work the same way. The pattern is extremely simple:
id^length^content^
At the outer level, this produces a set of groups. Within each group, the content follows exactly the same pattern, only here the id is the variable name, and the content is the variable value.
So you only need to write that logic once and you can use it to parse both levels. Just write a function that breaks a string up into a list of id/content pairs. Call it once to get the groups, and then loop through them calling it again for each content to get the variables in that group.
Breaking it down into these steps, first we need a way to get "tokens" from the string. This function returns an object with three methods, to find out if we're at "end of file", and to grab the next delimited or counted substring:
var tokens = function(str) {
var pos = 0;
return {
eof: function() {
return pos == str.length;
},
delimited: function(d) {
var end = str.indexOf(d, pos);
if (end == -1) {
throw new Error('Expected delimiter');
}
var result = str.substr(pos, end - pos);
pos = end + d.length;
return result;
},
counted: function(c) {
var result = str.substr(pos, c);
pos += c;
return result;
}
};
};
Now we can conveniently write the reusable parse function:
var parse = function(str) {
var parts = {};
var t = tokens(str);
while (!t.eof()) {
var id = t.delimited('^');
var len = t.delimited('^');
var content = t.counted(parseInt(len, 10));
var end = t.counted(1);
if (end !== '^') {
throw new Error('Expected ^ after counted string, instead found: ' + end);
}
parts[id] = content;
}
return parts;
};
It builds an object where the keys are the IDs (or variable names). I'm assuming that, as they have names, the order isn't significant.
Then we can use that at both levels to create the function to do the whole job:
var parseGroups = function(str) {
var groups = parse(str);
Object.keys(groups).forEach(function(id) {
groups[id] = parse(groups[id]);
});
return groups;
}
For your example, it produces this object:
{
'1': {
VAR1: 'VALUE1'
},
'4': {
VAR1: 'VALUE1',
VAR2: 'VAL2'
}
}
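Since detecting corrupt serializations was part of the question, here is a more defensive variant as a rough sketch. It is not the code above: parseLevel and parseGroupsStrict are hypothetical names, and it returns arrays of [id, content] pairs instead of objects, on the assumption that you may want to keep the original order and any duplicate IDs.

```javascript
// Parse one level of id^length^content^ records into [id, content] pairs.
// Throws an Error describing the first inconsistency it finds.
function parseLevel(str) {
  const pairs = [];
  let pos = 0;
  // Read up to the next '^' and advance past it.
  const readField = () => {
    const end = str.indexOf('^', pos);
    if (end === -1) throw new Error(`Missing '^' after position ${pos}`);
    const field = str.slice(pos, end);
    pos = end + 1;
    return field;
  };
  while (pos < str.length) {
    const id = readField();
    const lenText = readField();
    if (!/^\d+$/.test(lenText)) {
      throw new Error(`Bad length "${lenText}" for id "${id}"`);
    }
    const len = Number(lenText);
    // The content plus its closing '^' must fit inside the string.
    if (pos + len >= str.length) {
      throw new Error(`Content of "${id}" runs past the end`);
    }
    const content = str.slice(pos, pos + len); // may itself contain '^'
    pos += len;
    if (str[pos] !== '^') {
      throw new Error(`Expected '^' after content of "${id}"`);
    }
    pos += 1;
    pairs.push([id, content]);
  }
  return pairs;
}

// Apply the same routine to the outer groups and to each group's content.
function parseGroupsStrict(str) {
  return parseLevel(str).map(([id, content]) => [id, parseLevel(content)]);
}

const sample = '4^26^VAR1^6^VALUE1^VAR2^4^VAL2^^1^14^VAR1^6^VALUE1^^';
console.log(JSON.stringify(parseGroupsStrict(sample)));
// [["4",[["VAR1","VALUE1"],["VAR2","VAL2"]]],["1",[["VAR1","VALUE1"]]]]
```

Because every content is read by length and never searched for a delimiter, values containing ^ pass through untouched, and a wrong length shows up either as an overrun or as a missing ^ right after the content.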
I don't think it's a trivial task to create a grammar for this. On the other hand, a simple, straightforward approach is not that hard: you know the length of every critical string, so you just chop your string apart according to those lengths.
Where do you see problems?