REGEX for capturing only words? - dart

Here is a text whose words I want to split into a list. The problem is that it contains commas, periods, and a large gap between the two paragraphs. Can anyone suggest a regex for extracting only whole words using the split(RegExp()) method?
input:
String text = '''
You find so many people are fimble
But you, you are mostly humble
I love the way you wear your hair,
Spreading your style everywhere.
You're like a style fountain.
Enough for a whole mountain.
''';
desired output:
[You, find, so, many, people, are, fimble, But, you, you, are, mostly, humble, I, love, the, way, you, wear, your, hair, Spreading, your, style, everywhere, You're, like, a, style, fountain, Enough, for, a, whole, mountain]

I can see the same question was asked on Reddit where I gave my answer. This is a copy of my solution for the problem.
I would not use split in this case. Using the RegExp class to collect matches is much simpler, because it is easier to define what a word is than to define everything we want to delete. Something like this would do the trick:
void main() {
  const text = '''
You find so many people are fimble
But you, you are mostly humble
I love the way you wear your hair,
Spreading your style everywhere.
You're like a style fountain.
Enough for a whole mountain.
''';
  // Collect every run of word characters and apostrophes.
  // group(0) is never null for a successful match, hence the "!".
  final words = [...RegExp(r"[\w']+").allMatches(text).map((e) => e.group(0)!)];
  print(words);
  // [You, find, so, many, people, are, fimble, But, you, you, are, mostly,
  // humble, I, love, the, way, you, wear, your, hair, Spreading, your, style,
  // everywhere, You're, like, a, style, fountain, Enough, for, a, whole,
  // mountain]
}
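To illustrate why matching is simpler than splitting, here is a minimal sketch in JavaScript (Dart's RegExp uses the same pattern syntax as JavaScript regular expressions). The shortened sample text and the variable names are assumptions made for the illustration; they are not part of the original answer.

```javascript
// Shortened sample text with punctuation and a blank line between paragraphs.
const text = "\nBut you, you are mostly humble\n\nYou're like a style fountain.\n";

// Approach 1: describe what a word IS and collect every match.
const matched = text.match(/[\w']+/g) ?? [];

// Approach 2: describe what to DELETE and split on it.
// Separators at the start and end of the text leave empty strings behind,
// so an extra filtering step is needed.
const splitRaw = text.split(/[^\w']+/);
const splitClean = splitRaw.filter((w) => w.length > 0);

console.log(matched);
console.log(splitRaw);
```

The split version needs the extra filter step because the text begins and ends with separators, which is the kind of cleanup the matching approach avoids.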

Try var newVar = text.split(' '). Note that this splits on single spaces only, so punctuation and line breaks stay attached to the words.
See:
https://www.tutorialkart.com/dart/dart-split-string/

I'm pretty sure you can do the same thing with a regex, but I prefer not to use a regex when plain string logic can do the split.
Here is a draft that keeps your solution customizable without having to worry about regex logic. You may well need to come back to the split logic two years from now!
void main() {
  String value = '''
You find so many people are fimble
But you, you are mostly humble
I love the way you wear your hair,
Spreading your style everywhere.
You're like a style fountain.
Enough for a whole mountain.
''';
  // The text may contain "," and ".", so replace each separator with an
  // unusual token ("#") and then split on that token.
  var tokenBySpace = value.split(' ').join('#').split('\n').join('#').split('#');
  tokenBySpace = tokenBySpace.map((word) => cleanString(word.trim())).toList();
  tokenBySpace.removeWhere((item) => item.isEmpty); // Drops tokens that were only punctuation, such as "."
  print(tokenBySpace);
}
// Removes every punctuation character listed in grammarRules from the word.
String cleanString(String word) {
  const grammarRules = ['.', ',', '(', ')'];
  for (final rule in grammarRules) {
    word = word.replaceAll(rule, '');
  }
  return word;
}
The result is
[You, find, so, many, people, are, fimble, But, you, you, are, mostly, humble, , I, love, the, way, you, wear, your, hair, Spreading, your, style, everywhere, You're, like, a, style, fountain, Enough, for, a, whole, mountain]
A solution in a single statement could be:
void main() {
  String value = '''
You find so many people are fimble
But you, you are mostly humble
I love the way you wear your hair,
Spreading your style everywhere.
You're like a style fountain.
Enough for a whole mountain.
''';
  // The text may contain "," and ".", so replace each separator with an
  // unusual token ("#") and then split on that token.
  var tokenBySpace = value.split(' ').join('#')
      .split('\n').join('#')
      .split('.').join('#')
      .split(',').join('#')
      .split(':').join('#')
      .split(';').join('#')
      .split('#');
  tokenBySpace.removeWhere((item) => item.isEmpty); // Drops the empty tokens left between adjacent separators
  print(tokenBySpace);
}
But I prefer the first!
You can find a complete example here.
The code from the example is reproduced below:
void main() {
  var value = '''I preferer the previos solution because is more customizzable,
and help you to preserve some exception inside the text, such as URL
In addition, the this solution, subdivide the logic inside a function.
This solution is made by https://github.com/vincenzopalazzo
''';
  // The text may contain "," and ".", so replace each separator with an
  // unusual token ("#") and then split on that token.
  var tokenBySpace = value.split(' ').join('#').split('\n').join('#').split('#');
  tokenBySpace = tokenBySpace.map((word) => cleanString(word.trim())).toList();
  tokenBySpace.removeWhere((item) => item.isEmpty); // Drops tokens that were only punctuation, such as "."
  print("------------- FIRST SOLUTION -------------------");
  print(tokenBySpace);
  value = '''
You find so many people are fimble
But you, you are mostly humble
I love the way you wear your hair,
Spreading your style everywhere.
You're like a style fountain.
Enough for a whole mountain.
''';
  tokenBySpace = value.split(' ').join('#')
      .split('\n').join('#')
      .split('.').join('#')
      .split(',').join('#')
      .split(':').join('#')
      .split(';').join('#')
      .split('#');
  tokenBySpace.removeWhere((item) => item.isEmpty); // Drops the empty tokens left between adjacent separators
  print("------------- SECOND SOLUTION -------------------");
  print(tokenBySpace);
}
// Removes every punctuation character listed in grammarRules from the word.
String cleanString(String word) {
  const grammarRules = ['.', ',', '(', ')'];
  for (final rule in grammarRules) {
    word = word.replaceAll(rule, '');
  }
  return word;
}

Related

Has anyone found a beautiful way to replace "smth if smth.present?"?

I often run into lines like
result = 'Some text'
result += some_text_variable if some_text_variable.present?
Every time, I want to replace that with something neater, but I don't know how.
Any ideas, please?
result += some_text_variable.to_s
This works when some_text_variable is nil or an empty string, for example, because nil.to_s returns "".
The downside is that it always concatenates, even when all it appends is an empty string.
You can also use
result += some_text_variable.presence.to_s
This covers every case that present? treats as blank (for example a whitespace-only string such as " ").
You could "compact" and join an array, e.g.
['Some text', some_text_variable].select(&:present?).join
I realise this is a longhand form, just offering as an alternative to mutating strings.
This can look a bit nicer, if you have a large number of variables to munge together, or you want to join them in some other way e.g.
[
  var_1,
  var_2,
  var_3,
  var_4
].select(&:present?).join("\n")
Again, nothing gets mutated - which may or may not suit your coding style.

Splitting a string based on a certain set of words?

I'm trying to figure out how to take a phrase and split it up into a list of separate strings based on the occurrence of certain words.
Examples are probably the easiest way to explain what I'm hoping to achieve:
List splitters = ['ABOVE', 'AT', 'NEAR', 'IN'];
INPUT: "ALFALFA DITCH IN ECKERT CO";
OUTPUT: ["ALFALFA DITCH", "IN ECKERT CO"];
INPUT: 'ANIMAS RIVER AT DURANGO, CO';
OUTPUT: ['ANIMAS RIVER', 'AT DURANGO, CO'];
INPUT: 'ALAMOSA RIVER ABOVE WILSON CREEK IN JASPER, CO';
OUTPUT: ['ALAMOSA RIVER', 'ABOVE WILSON CREEK IN JASPER, CO'];
Notice in the third example, when there are multiple occurrences of splitters in the input phrase, I only want to use the first one.
To my knowledge, the split() method doesn't support multiple separator strings, and I can't find a single example of this in Dart. I would think there is a simple solution?
I'd use a RegExp, then:
var splitters = ['ABOVE', 'AT', 'NEAR', 'IN'];
var s = "ALFALFA DITCH IN ECKERT CO";
var splitterRE = RegExp(splitters.join('|'));
var match = splitterRE.firstMatch(s);
if (match != null) {
  // Everything before the first splitter, without the trailing space.
  var partOne = s.substring(0, match.start).trimRight();
  // The splitter itself and everything after it.
  var partTwo = s.substring(match.start);
}
That does what you ask for, but it's slightly unsafe.
It will find "IN" in "BEHIND" if given "BEHIND THE FARM IN ALABAMA".
You likely want to match only complete words. In that case, RegExps are even more helpful, since they can do that too. Change the line to:
var splitterRE = RegExp(r'\b(?:' + splitters.join('|') + r')\b');
then it will only match entire words.
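The whole-word variant can be sketched as a small function. This is a JavaScript illustration (Dart's RegExp accepts the same pattern syntax); the names splitOnFirst and escapeRegExp are hypothetical, and the escaping step is an added assumption for splitters that might contain regex metacharacters.

```javascript
// Escape regex metacharacters so each splitter is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Split `phrase` in two at the first whole-word occurrence of any splitter.
// Returns [phrase] unchanged when no splitter is found.
function splitOnFirst(phrase, splitters) {
  const re = new RegExp("\\b(?:" + splitters.map(escapeRegExp).join("|") + ")\\b");
  const match = re.exec(phrase);
  if (match === null) return [phrase];
  return [phrase.slice(0, match.index).trimEnd(), phrase.slice(match.index)];
}

const splitters = ["ABOVE", "AT", "NEAR", "IN"];
console.log(splitOnFirst("ALAMOSA RIVER ABOVE WILSON CREEK IN JASPER, CO", splitters));
console.log(splitOnFirst("BEHIND THE FARM IN ALABAMA", splitters));
```

Because the pattern has no global flag, only the first match is used, which gives the "first splitter wins" behaviour asked for in the question.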

Computing a label from a label and relative offset

I have a macro that is generating two rules to avoid circularity issues. For a call like yaspl_bootstrap_library(name=foo, deps=[":bar"]) I want to generate the following rules:
yaspl_library(name=foo, deps=[":bar"])
yaspl_srcs(name=foo_srcs, deps=[":bar_srcs"])
Thus I need a function to turn ":bar" into ":bar_srcs". And while the obvious string concatenation works in this example it fails in the case where "//lib/foo" needs to be turned into "//lib/foo:foo_srcs".
This seems like a common thing that would happen in macros yet I cannot seem to find anything that does it easily.
First, I'll point out that this kind of string manipulation will not work with the select function (https://docs.bazel.build/versions/master/be/functions.html#select).
If that's not an issue for you, you can go ahead. This function can be written in a .bzl file. I agree that label-manipulation functions like this should be available out of the box. In the meantime, you can try this function:
def explicit_label(label):
    if ":" in label or "//" not in label:
        return label
    return label + ":" + label[label.rfind("/")+1:]
explicit_label(dep) + "_srcs"

Google Spreadsheet Translate, ignore variable names

An interesting Google Spreadsheet problem: I have a key=value language file that I have copied into a spreadsheet, e.g.
titleMessage=Welcome to My Website
youAreLoggedIn=Hello #{user.name} you are now logged in
facebookPublish=Facebook Publishing
I have managed to split the key/value pairs into two columns, translate the value column, and re-join it with the keys. Voila! This gives me a translated language file back.
But as you may have spotted, there are some variables in there (e.g. #{user.name}) which are injected by my application, and obviously I don't want to translate them.
So here is my question, given the following cell contents...
Hello #{user.name} you are now logged in
Is there a function that will translate the contents using the TRANSLATE function, but ignore anything inside #{ } (which could be at any point in the sentence)?
Do any Google Spreadsheet gurus have a solution for me?
Many thanks
If there is at most one occurrence of #{} then you could use the SPLIT function to divide the string into three parts, arranged as below.
A         B                  C            D           E
Original  =SPLIT(An, "#{}")  First piece  Tag         Rest of string
                             Translate    Keep as is  Translate
Put the pieces together with CONCATENATE.
=CONCATENATE(Cn,Dn,En)
I ran into the same question.
Assume the escape pattern is #{sth.sth} (as a regex: #{[\w.]+}). Replace each occurrence with a string that Google Translate would treat as an untranslatable term, like VAR.
After translation, replace the term with the original pattern.
Here is how I did this in the script editor of the spreadsheet:
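function myTranslate(text, source_language, target_language) {
  if (text.toString()) {
    var str = text.toString();
    var regex = /#\{[\w.]+\}/g; // g flag for multiple matches
    var replace = 'VAR'; // Placeholder that replaces #{variable} to keep it from being translated
    // Original patterns, reversed so that pop() returns them in order.
    // match() returns null when nothing matches, hence the fallback to [].
    var vars = (str.match(regex) || []).reverse();
    str = str.replace(regex, replace);
    str = LanguageApp.translate(str, source_language, target_language);
    var ret = '';
    // indexOf returns -1 once no placeholder is left, which ends the loop.
    for (var idx = str.indexOf(replace); idx !== -1 && vars.length > 0; idx = str.indexOf(replace)) {
      ret += str.slice(0, idx) + vars.pop();
      str = str.slice(idx + replace.length);
    }
    return ret + str; // Append the text that follows the last placeholder
  }
  return null;
}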
You can't just split and concatenate, because different languages use different word order of subject/predicate/object etc., and also because several languages modify nouns with different prefixes/suffixes/spelling changes depending on what they are doing in the sentence. It's all very complicated. Google needs to enable some sort of enclosing parentheses around any term we want to be quoted rather than translated.
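One way to make the placeholder idea tolerate a changed word order is to number the placeholders, so each variable can be restored wherever its placeholder ends up. The sketch below is plain JavaScript; translateKeepingVars and the translate callback are hypothetical names (the callback stands in for LanguageApp.translate), and it is an untested assumption that a real translation service leaves tokens such as VAR0 unchanged.

```javascript
// Mask each #{...} variable with a numbered placeholder, translate, then restore.
// `translate` is a hypothetical callback standing in for the real translation call.
function translateKeepingVars(text, translate) {
  const vars = [];
  const masked = text.replace(/#\{[\w.]+\}/g, (variable) => {
    vars.push(variable);
    return "VAR" + (vars.length - 1);
  });
  const translated = translate(masked);
  // Restore by index, so the order of placeholders in the output does not matter.
  // Unknown placeholder numbers are left as they are.
  return translated.replace(/VAR(\d+)/g, (whole, n) => vars[Number(n)] ?? whole);
}

// Stand-in "translator" that reverses the word order to mimic reordering.
const fakeTranslate = (s) => s.split(" ").reverse().join(" ");
console.log(translateKeepingVars("Hello #{user.name} you have #{msg.count} messages", fakeTranslate));
```

This does not address the grammatical issues described above (inflection around the variable); it only keeps each variable attached to the right placeholder.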

removing whitespaces in ActionScript 2 variables

Let's say that I have an XML file containing this:
<description><![CDATA[
<h2>lorem ipsum</h2>
<p>some text</p>
]]></description>
I want to load and parse this in ActionScript 2 as HTML text, and set some CSS before displaying it. The problem is that Flash takes the whitespace (line feeds and tabs) and displays it as is:
<some whitespace here>
lorem ipsum
some text
where the output I want is
lorem ipsum
some text
I know that I could remove the whitespace directly from the XML file (the Flash developer at my workplace also suggests this; I guess he doesn't have any idea how to do it in Flash [sigh]). But doing so would make that section of the XML file hard to read, especially when lots of tags are involved, and that makes editing more difficult.
So now I'm looking for a way to strip the whitespace in ActionScript. I've tried an equivalent of PHP's str_replace (got it from here). But what should I use as the needle (the string to search for)? I've tried "\t" and "\r", but they don't seem to match the whitespace.
Edit:
Now that I've tried using newline as the needle, it works (meaning that the newline is successfully stripped).
mystring = str_replace(newline, '', mystring);
But newlines are only stripped once: in every run of consecutive newlines (e.g. a newline followed by another newline), only one newline is stripped away.
I don't see this as a problem in the str_replace function, since consecutive occurrences of any character other than newline are stripped just fine.
I'm pretty confused about how stuff like this is handled in ActionScript. :-s
Edit 2:
I've tried str_replace-ing everything I know of: \n, \r, \t, newline, and tab (by pressing the tab key). Replacing \n, \r, and \t seems to have no effect whatsoever.
I know that by successfully doing this, my content can never have real line breaks. That's exactly my intention. I could format the XML the way I want without Flash displaying any of the formatting stuff. :)
Several ways to approach this. Perhaps the simplest answer is, in one sense your Flash developer is probably right, and you should move your whitespace outside of the CDATA container. The reason being, many people (me at least) tend to assume that everything inside a CDATA is "real data", as opposed to markup. On the other hand, whitespace outside a CDATA is normally assumed to be irrelevant, so data like this:
<description>
<![CDATA[<h2>lorem ipsum</h2>
<p>some text</p>]]>
</description>
would be easier to understand and to work with. (The flash developer can use the XML.ignoreWhite property to ignore the whitespace outside the CDATA.)
With that said, if you're editing the XML by hand, then I can see why it would be easier to use the formatting you describe. However, if the extra whitespace is inside the CDATA, then it will inevitably be included in the String data you extract, so your only option is to grab the content of the CDATA and remove the whitespace afterwards.
Then your question reduces to "how do I strip leading/trailing whitespace from a String in AS2?". And unfortunately, since AS2 doesn't support RegEx there's no simple way to do this. I think your best option would be to parse through from the beginning and end to find the first/last non-white character. Something along these lines (untested pseudocode):
myString = stuffFromXML;
whitespace = " " + "\t" + "\n" + "\r" + newline;
start = 0;
end = myString.length;
// The bounds checks stop both loops when the string is entirely whitespace.
while ( start < end && testString( myString.substr(start,1), whitespace ) ) { start++; }
while ( end > start && testString( myString.substr(end-1,1), whitespace ) ) { end--; }
trimmedString = myString.substring( start, end );
// Returns true when needle occurs somewhere in haystack.
function testString( needle, haystack ) {
  return ( haystack.indexOf( needle ) > -1 );
}
Hope that helps!
Edit: I notice that in your example you'd also need to remove tabs and whitespace within your text data. This would be tricky, unless you can guarantee that your data will never include "real" tabs in addition to the ones for formatting. No matter what you do with the CDATA tags, it would probably be wiser not to insert extraneous formatting inside your real content and then remove it programmatically afterward. That's just making your own life difficult.
Second edit: As for what character to remove to get rid of newlines, it depends partially on what characters are actually in the XML to begin with (which probably depends on what OS is running where the file is generated), and partially on what character the client machine (that's showing the flash) considers a newline. Lots of gory details here. In practice though, if you remove \r, \n, and \r\n, that usually does the trick. That's why I added both \r and \n to the "whitespace" string in my example code.
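The scan-from-both-ends idea can be checked with a small runnable sketch. It is written in JavaScript rather than AS2, using only string methods (charAt, indexOf, substring) that have close AS2 counterparts; the names WHITESPACE and trimScan are hypothetical, and the set of characters counted as whitespace is an assumption to adjust as needed.

```javascript
// Characters treated as whitespace; adjust the set as needed.
const WHITESPACE = " \t\n\r";

// Trim by scanning inward from both ends, without regular expressions.
function trimScan(s) {
  let start = 0;
  let end = s.length;
  // The bounds checks stop both loops when the string is entirely whitespace.
  while (start < end && WHITESPACE.indexOf(s.charAt(start)) !== -1) start++;
  while (end > start && WHITESPACE.indexOf(s.charAt(end - 1)) !== -1) end--;
  return s.substring(start, end);
}

console.log(JSON.stringify(trimScan("\n\t<h2>lorem ipsum</h2>\n\t<p>some text</p>\n")));
```

Only the leading and trailing whitespace is removed; the line break and tab between the two tags stay in place, which is the limitation discussed in the first edit above.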
It's been a while since I've tinkered with AS2.
someXML = new XML();
someXML.ignoreWhite = true;
If you want to use str_replace, try '\n'.
Is there a reason that you are using CDATA? Admittedly I have no idea what the best practice for this sort of thing is, but I tend to leave it out and just have the HTML sit there inside the node.
var foo = node.childNodes.join("") parses it out just fine, and I never seem to come across these whitespace problems.
I'm reading this over and over again, and if I'm interpreting you right, all you want to know how to do is strip certain characters (tabs and newlines) from a string in AS2, right? I cannot believe no one has given you the simple one line answer yet:
myString = myString.split("\n").join("");
That's it. Repeat that for \r, \n, and \t and all newlines and tabs will be gone. If you want it as an easy function, then do this:
function stripWhiteSpace(str:String):String
{
  return str.split("\r").join("").split("\n").join("").split("\t").join("");
}
That function won't modify your old string, it will return a new one without \r, \n, or \t. To actually modify the old string use that function like this:
myString = stripWhiteSpace(myString);

Resources