How can I make testing-library's getByText() match a string including a non-breaking space (&nbsp;)?

I'm trying to match a phone number string that includes a non-breaking space:
assert
  .dom(
    screen.getByText(
      [my text with a non-breaking space]
    ) as HTMLElement
  )
  .exists();
However, it is returning this error:
Unable to find an element with the text: [my text with a non-breaking space]. This could be because the text is broken up by multiple elements. In this case, you can provide a function for your text matcher to make your matcher more flexible.
How can I test this?

Testing Library normalizes whitespace by default, so the non-breaking space in the element's text is converted to a regular space before matching. The default behavior is described at:
https://testing-library.com/docs/queries/about/#normalization
To override this behavior and keep the non-breaking space (so that it matches your assertion), set the collapseWhitespace option to false.
It will look something like this:
assert
  .dom(
    screen.getByText(
      [my text with a non-breaking space],
      { collapseWhitespace: false }
    ) as HTMLElement
  )
  .exists();
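To see why the default lookup fails, the normalization step can be modeled in a few lines. The sketch below is in Python purely for illustration: normalize is a hypothetical stand-in for the default normalizer (not the library's code), and the phone number is made up.

```python
import re

NBSP = "\u00a0"  # non-breaking space


def normalize(text: str, trim: bool = True, collapse_whitespace: bool = True) -> str:
    """Hypothetical model of the default normalizer: trim, then collapse whitespace runs."""
    if trim:
        text = text.strip()
    if collapse_whitespace:
        # \s also matches the non-breaking space, so it becomes a regular space
        text = re.sub(r"\s+", " ", text)
    return text


node_text = f"555{NBSP}0100"  # text as it appears in the DOM (made-up number)
matcher = f"555{NBSP}0100"    # string passed to the query

# With the defaults, the DOM text loses its non-breaking space and no longer equals the matcher
default_match = normalize(node_text) == matcher
# With whitespace collapsing turned off, the text is left alone and the comparison succeeds
relaxed_match = normalize(node_text, collapse_whitespace=False) == matcher
print(default_match, relaxed_match)
```

In this model the matcher string is compared as given; only the element's text is normalized, which is why the two sides stop being equal.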


how to tokenize/parse/search&replace document by font AND font style in LibreOffice Writer?

I need to update a bilingual dictionary written in Writer by first parsing all entries into their parts e.g.
main word (font 1, bold)
foreign equivalent transliterated (font 1, italic)
foreign equivalent (font 2, bold)
part of speech (font 1, italic)
Each line of the document is the main word followed by the parts listed above, each separated by a space or punctuation.
I need to automate the process of walking through the whole file, line by line, and placing a delimiter between each part, ignoring spaces and punctuation, so I can mass import it into a Calc file. In other words, "each part" is a sequence of characters (ignoring spaces and punctuation) that have the same font AND font style.
I have tried the standard Search&Replace feature and the AltSearch extension, but neither is able to complete the task. The main problem is that I am not able to write a search query that says:
Find: consecutive characters with the same font AND font_style, ignore spaces and punctuation
Replace: term found above + "delimiter"
Any suggestions how I can write a script for this, or if an existing tool can solve the problem?
Thanks!
Pseudo code for desired effect:
var $delimiter = "|"
Go to beginning of document
While not end of document do:
    var $currLine = get line from doc
    var $currChar = get next character which is not space or punctuation;
    var $font = $currChar.font
    var $font_style = $currChar.font_style (e.g. bold, italic, normal)
    While not end of line do:
        $currChar = next character which is not space or punctuation;
        if ($currChar.font != $font || $currChar.font_style != $font_style) { // font or style has changed
            print $delimiter
            $font = $currChar.font
            $font_style = $currChar.font_style
        }
    end While
end While
Here are tips for each of the things your pseudocode does.
First, the easiest way to move line by line is with the TextViewCursor, although it is slow. Notice the XLineCursor section. For the while loop, oVC.goDown() will return false when the end of the document is reached. (oVC is our variable for the TextViewCursor).
Get each character by calling oVC.goRight(0, False) to deselect, followed by oVC.goRight(1, True) to select. The selected value is then obtained with oVC.getString(). To ignore spaces and punctuation, perhaps use Python's isalnum() or the re module.
To determine the font of the character, call oVC.getPropertyValue(attr). Values for attr could simply be CharAutoStyleName and CharStyleName to check for any changes in formatting.
Or grab a list of specific properties such as 'CharFontFamily', 'CharFontFamilyAsian', 'CharFontFamilyComplex', 'CharFontPitch', 'CharFontPitchAsian' etc. Character properties are described at https://wiki.openoffice.org/wiki/Documentation/DevGuide/Text/Formatting.
To insert the delimiter into the text: oVC.getText().insertString(oVC, "|", 0).
This Python code from GitHub shows how to do most of these things, although you'll need to read through it to find the relevant parts.
Alternatively, instead of using the LibreOffice API, unzip the .odt file and parse content.xml with a script.
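The delimiter logic itself can be sketched independently of how the characters are read. The following Python sketch assumes each line has already been extracted (through the view cursor or from content.xml) into (character, font, style) tuples; that input format and the name delimit_runs are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input format: one (character, font name, font style) tuple per character
Char = Tuple[str, str, str]


def delimit_runs(line: Iterable[Char], delimiter: str = "|") -> str:
    """Insert `delimiter` wherever the font or style changes between
    alphanumeric characters; spaces and punctuation never trigger a change."""
    out: List[str] = []
    current: Optional[Tuple[str, str]] = None  # (font, style) of the run in progress
    for ch, font, style in line:
        if ch.isalnum():
            fmt = (font, style)
            if current is not None and fmt != current:
                out.append(delimiter)  # formatting changed: start a new part
            current = fmt
        out.append(ch)
    return "".join(out)


# Made-up entry: "cat" in bold, a plain space, "gato" in italic
sample = (
    [(c, "Font1", "bold") for c in "cat"]
    + [(" ", "Font1", "normal")]
    + [(c, "Font1", "italic") for c in "gato"]
)
print(delimit_runs(sample))
```

The space between the two words has its own formatting but is skipped by the isalnum() check, so only one delimiter is produced, directly before the first character of the new part.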

Rails 5 - regex - for string not found [duplicate]

I have the following regex handy to match all lines containing a console.log() or alert() call in any JavaScript file opened in an editor that supports PCRE.
^.*\b(console\.log|alert)\b.*$
But I encounter many files containing window.alert() lines that alert important messages, and I don't want to remove or replace them.
So the question is: how do I regex-match (with a single-line regex that doesn't need to be run repeatedly) all lines containing console.log() or alert() but not the word window? Also, how do I escape round brackets (parentheses), which seem unescapable by \, to make them part of the string literal?
I tried the following regex, but in vain:
^.*\b(console\.log|alert)((?!window).)*\b.*$
You should use a negative lookahead, like this:
^(?!.*window\.).*\b(console\.log|alert)\b.*$
The negative lookahead asserts that the line cannot match if the string window. appears anywhere in it.
As for the parentheses, you can escape them with backslashes. But because the pattern has a word boundary (\b) right after the function name, putting the escaped parentheses before that \b will stop it from matching, because parentheses are not word characters.
The metacharacter \b is an anchor like the caret and the dollar sign. It matches at a position that is called a "word boundary". This match is zero-length.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
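The lookahead pattern can be checked quickly outside the editor. The sketch below uses Python's re module, which supports the same lookahead syntax; the sample lines and the helper name matching_lines are made up for illustration. The second pattern shows one way to include an escaped opening parenthesis, by letting it take the place of the trailing \b.

```python
import re

# Pattern from the answer: skip any line that contains "window."
PATTERN = re.compile(r"^(?!.*window\.).*\b(console\.log|alert)\b.*$", re.MULTILINE)

# Illustrative variant: an escaped "(" replaces the trailing \b
PATTERN_PAREN = re.compile(r"^(?!.*window\.).*\b(console\.log|alert)\(.*$", re.MULTILINE)

# Made-up sample file contents
SAMPLE = "\n".join([
    'console.log("debug");',
    'window.alert("important");',
    '  alert("remove me");',
    'var alertness = 1;',
])


def matching_lines(pattern, text):
    """Return every full line matched by `pattern`."""
    return [m.group(0) for m in pattern.finditer(text)]


print(matching_lines(PATTERN, SAMPLE))
print(matching_lines(PATTERN_PAREN, SAMPLE))
```

Both patterns keep the window.alert() line out of the result, and neither matches alertness, since there is no word boundary (or parenthesis) directly after alert in that word.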

Pattern match dropping new lines characters

How can I extract the values from a CSV-like string with a pattern while dropping the newline characters (\r\n or \n)?
A line looks like:
1.1;2.2;Example, 3
Notice there are only 3 values and the separator is ;. The problem I'm having is coming up with a pattern that reads the values while dropping the newline characters (the file comes from a Windows machine, so it has \r\n; I'm reading it on Linux and would like to be independent of the newline character used).
My simple example right now is:
s = "1.1;2.2;Example, 3\r\n";
p = "(.-);(.-);(.-)";
a, b, c = string.match(s, p);
print(c:byte(1, -1));
The two last characters printed by the code above are the \r\n.
The problem is that both \r and \n are matched by the %c and %s classes (control characters and space characters), as shown by this code:
s = "a\r";
print(s:match("%c"));
print(s:match("%s"));
print(s:match("%d"));
So, is it possible to leave the newline characters out of the match? (It should not be assumed that the last two characters will be newline characters.)
The 3rd value may contain spaces, punctuation and alphanumeric characters, and since \r\n are detected as space characters, a pattern like "(.-);(.-);([%w%s%c]-).*" does not work.
Your pattern
p = "(.-);(.-);(.-)";
does not work: the third field is always empty because .- matches as little as possible. You need to anchor it at the end of the string, but then the third field will contain the trailing newline chars:
p = "(.-);(.-);(.-)$";
So, just stop at the first trailing newline char. This also anchors the last match. Try this pattern instead:
p = "(.-);(.-);(.-)[\r\n]";
If trailing newline chars are optional, try this pattern:
p = "(.-);(.-);(.-)[\r\n]*$";
Without any Lua experience, I found a naive solution:
clean_CR = s:gsub("\r","");
clean_NL = clean_CR:gsub("\n","");
With POSIX regex syntax I'd use
^([^;]*);([^;]*);([^\n\r]*).*$
with "\n" and "\r" possibly entered as "^M", "^#" (control/unicode characters), depending on your editor.
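The regex approach from the last answer can be tried in Python's re module (note this is regex syntax, not a Lua pattern). In this sketch parse_line is a hypothetical helper name, and the trailing .*$ is left out because re.match does not need to consume the rest of the string.

```python
import re
from typing import Optional, Tuple

# Two fields without ";", then a third field that stops at the first CR or LF
FIELDS = re.compile(r"^([^;]*);([^;]*);([^\r\n]*)")


def parse_line(line: str) -> Optional[Tuple[str, ...]]:
    """Split a 3-value line on ';' and drop any trailing newline characters."""
    m = FIELDS.match(line)
    return m.groups() if m else None


print(parse_line("1.1;2.2;Example, 3\r\n"))
print(parse_line("1.1;2.2;Example, 3\n"))
print(parse_line("1.1;2.2;Example, 3"))
```

Because the third group simply excludes \r and \n, the same pattern handles Windows line endings, Unix line endings, and lines with no terminator at all.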

String is .blank? but neither empty nor whitespace

I'm trying to use the squish method to reduce multiple whitespace characters in a string to single spaces. However, I have a string with multiple spaces which are not reduced. When I check string[space_position].blank? it returns true, but the character is neither empty nor == ' '.
What could cause this behavior?
Not sure if this is relevant, but the string comes from a mongoDB and was saved there by Locomotive CMS.
the three spaces: [32,160,32]
Character code 160 is a non-breaking space, usually found in HTML, and apparently not recognized by squish as a space. Try replacing it first:
string.gsub(160.chr, ' ').squish
string.squish!
Note that squish! modifies the string itself. Also, any whitespace-only string such as " " will return true for blank?.

How to write a matches excluding a series of space character (' ') from the input?

I am having a problem with my Grails project right now. I want to write a matches constraint that best fits the allowable characters for my input fields. I have written one that produces an error message if the input contains a single space character, but it no longer works if the input consists of a series of spaces. This is my code:
newPassword nullable: false, minSize: 8, matches: /[0-9a-zA-Z_\[\]\\\^\$\.\|\?\*\+\(\)~!##%&-=]*/, blank: false, notEqualToAnyProperty:['username', 'emailAddress'],validator: { value, obj ->
(obj.currentPassword != value && value != '')
}
These are the sample inputs:
1) 'rain drops' - my matches works, it returns an error message that the input contains an invalid character.
2) ' ' - a series of spaces; my program returns the error message for the blank constraint instead of the one for my matches constraint ("input contains an invalid character"), even though the input doesn't match the allowable characters.
Any help from you guys? Thanks!
You shouldn't need to add any begin (^) or end ($) anchors to your regular expression, as the matches constraint attempts to match the entire String input against the Pattern; thus your first test correctly fails the constraint.
For your second test where the input is only a series of spaces ' ', your matches constraint will never run. Both blank and nullable are constraints which can block the running of other constraints if they fail. The matches constraint will not run in your case because the blank constraint returns a failure on an all-whitespace input.
Try matches: /[0-9a-zA-Z_\[\]\\\^\$\.\|\?\*\+\(\)~!##%&-=]+$/
I just added a dollar at the end and changed the star to a plus. Dollar means end of line. Maybe the match is returning true because the first part of the string does indeed match.
The reason I changed the star to a plus is because * matches zero or more. That's the case in your empty string. The + requires one or more.
You can require a minimum sequence of 8 such characters in the regexp, but that might make you lose your minSize validation error message.
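The difference between * and + under whole-string matching can be illustrated outside Grails. The sketch below uses Python's re.fullmatch as a stand-in, assuming the constraint matches the entire input as the first answer describes; the character class is deliberately simplified and is_valid is a hypothetical helper name.

```python
import re

# Simplified, illustrative character class (not the full set from the question)
ALLOWED_STAR = re.compile(r"[0-9a-zA-Z_]*")  # zero or more allowed characters
ALLOWED_PLUS = re.compile(r"[0-9a-zA-Z_]+")  # one or more allowed characters


def is_valid(pattern, value: str) -> bool:
    # fullmatch requires the whole input to match, so no ^ or $ is needed
    return pattern.fullmatch(value) is not None


for value in ["raindrops1", "rain drops", "   ", ""]:
    print(repr(value), is_valid(ALLOWED_STAR, value), is_valid(ALLOWED_PLUS, value))
```

Under this assumption a string of spaces is rejected by both patterns, and the only input on which they differ is the empty string.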
