I've got an application that bridges our help desk system with TFS (one way from Help Desk to TFS). When I create the work item in TFS, in some situations I'm getting an "InvalidCharacters" validation error.
The field I'm using is the standard "Description" field, which is defined as "Plain Text" in the Work Item definition.
This is only happening on one record, so I'm sure it's the data, but I can't figure out what character is being considered to be invalid. Is there any guidance on what will trigger the InvalidCharacters validation on "Plain Text" fields?
It looks like this field is unable to display the extended ASCII characters. There was an a with an accent grave (à) in the string I was trying to save.
-- EDIT --
This actually became even more frustrating. The character representation when I did a ToCharArray() was "à", however, when I finally found the spot in the string where it was bombing, the actual character was a single-character ellipses (...). Which was probably caused by someone copying and pasting from Word into our help-desk system for comments.
My ultimate resolution was a brute force spin through the char array, replacing any character that had an int value of greater than 127 with something else (in my case, a question mark).
A ‘string’ field is invalid if it contains control characters other than newline, carriage return, and tab or if it contains mismatched surrogate characters. Longtext fields (like plaintext) accept everything except mismatched surrogate pairs. Make sure your copy/paste is resulting in Unicode being pasted in.
You can use a Regex function to compress all white space down to a " " character, such as this:
Regex.Replace( text, #"\s+", " " );
Although that actually strips more than you technically need to, since it takes out newline, carriage return and tab too.
Hope this helps!
Related
i am trying to check if a string is empty by doing the following
if trim(somestring) = '' then
begin
//that an empty string
end
i am doing this empty check in my client application , but i notice that some clients inserts empty strings even with this check applied .
on my Linux server side those empty chars showing as squares and when i copy those chars i can be able to bypass the empty string check like the follwing example
if you copy this empty chars the check will not be effected how can i avoid that ?
Your code is working correctly and the strings are not empty. An empty string is one whose length is zero, it has no characters. You refer to empty characters, but there is no such thing. When. your system displays a small empty square that indicates that the chosen font has no glyph for that character.
Let us assume that these strings are invalid. In that case the real problem you have is that you don't yet fully understand what properties are required for a string to be valid. Your code is written assuming that a string is valid if it contains at least one character with ordinal value greater than 32. Clearly that test is not correct. You need to step back and work out the precise rules for validity. Only when these are clear in your mind can you correct you program and check for validity correctly.
On the other hand perhaps these strings are valid and the mistake is simply that you are erroneously determining otherwise when you inspect the data. Only you can know this, we don't have the information.
A useful technique in all of this is to inspect the ordinal values of the strings. Loop through the characters printing the ordinal value of each one. That allows you to see what is really there and not be at the mercy of non-printing characters, characters with no glyph, invalid encodings, etc.
Since Trim is pretty simple function, it omits only characters with less than or equal dec 32 in ASCII table
( Sample from System.SysUtils.pas )
while S.Chars[L] <= ' ' do Dec(L);
Therefore it's possible that You just can't see some exotic chars ( > ASCII 128) due to bad encoding used with Your input string.
try to use :
StrToInt(Ord(SomeChar))
On every char that is not "trimmed" and remove them by hand or check Your encoding.
Kind Regards
I have a string that, by using string.format("%02X", char), I've received the following:
74657874000000EDD37001000300
In the end, I'd like that string to look like the following:
t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (spaces are there just for clarification of characters desired in example).
I've tried to use \x..(hex#), string.char(0x..(hex#)) (where (hex#) is alphanumeric representation of my desired character) and I am still having issues with getting the result I'm looking for. After reading another thread about this topic: what is the way to represent a unichar in lua and the links provided in the answers, I am not fully understanding what I need to do in my final code that is acceptable for this to work.
I'm looking for some help in better understanding an approach that would help me to achieve my desired result provided below.
ETA:
Well I thought that I had fixed it with the following code:
function hexToAscii(input)
local convString = ""
for char in input:gmatch("(..)") do
convString = convString..(string.char("0x"..char))
end
return convString
end
It appeared to work, but didnt think about characters above 127. Rookie mistake. Now I'm unsure how I can get the additional characters up to 256 display their ASCII values.
I did the following to check since I couldn't truly "see" them in the file.
function asciiSub(input)
input = input:gsub(string.char(0x00), "<NUL>") -- suggested by a coworker
print(input)
end
I did a few gsub strings to substitute in other characters and my file comes back with the replacement strings. But when I ran into characters in the extended ASCII table, it got all forgotten.
Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.
The simple way to transform a base16-encoded string is just to
function unhex( input )
return (input:gsub( "..", function(c)
return string.char( tonumber( c, 16 ) )
end))
end
This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)
The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input, then the two should be exactly the same.
Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.
You actually have a couple of problems:
First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example it could be ISO-8859-1, in which case it is U+00EA, the character ê.
The characters 0 to 31 are control characters (eg. 0 is NUL). Use a lookup table for these.
Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.
I am facing one issue in one of my Rails project.
My users database contain names with special character and i want them to be shown in search result while searching it with simple characters.
Example: Lets suppose i have a user whose name is "Noël Nocciolo" (please notice soft sign on e) and i want that to be searched if i pass "Noel Nocciolo" as a parameter.
Can anyone tell me how to handle with these cases because no one knows how to provide input of "e with two dots".
And i am using postgres as my databse.
Regards,
Karan
You can create separate field "indexed_name" for search and fill it only with ASCII characters.
Then you have to preprocess query string with .gsub('ë', 'e') (or any other non ASCII characters to its ASCII analog) and search with this processed query
and i believe there is more elegant way to convert any string to ascii analog i just gave you direction )
.parameterize or ActiveSupport::Inflector.transliterate will probably be acceptable for your use case.
"àáâãäå".parameterize
=> "aaaaaa"
However, it won't handle ligatures such as ffi, so for that you'll need:
"àáâãÀffi".mb_chars.normalize(:kd).gsub(/[^\x00-\x7F]/,'').to_s
=> "aaaaAffi"
I'm trying to use CFStringTokenizer with kCFStringTokenizerUnitSentence to split a string into sentences. The first problem I'm having is that sentences need to be capitalized in order for them to be recognized as sentences. If not, it just thinks it's part of the previous sentence.
I'm splitting user-entered text so I'm expecting the text to be very unclean.
Is there something else I can do with CFStringTokenizer to have it detect uncapitalized sentences? Or will I have to use another method of splitting altogether?
I followed the answer on this SO question for my implementation:
How to get an array of sentences using CFStringTokenizer?
NOTE: After testing a bit more it seems that with kCFStringTokenizerUnitSentence, if a '!' or a '?' is followed by an uncapitalized sentence, it will recognize the sentence. Also, if one of those punctuation marks is followed by a sentence without a space between the '!' and the first word, it will still separate.
So the one case I need to work around is a '.' followed by an uncapitalized sentence.
ANOTHER OPTION I found, if you're getting the text from a textField, is to use this:
textField.autocapitalizationType = UITextAutocapitalizationTypeSentences;
It will automatically capitalize sentences so you don't have to worry about converting for CFStringTokenizer. It still doesn't account for edge cases like abbreviations, but at least in my case the user will have an option to delete the auto-capitalization if it's wrong.
You can convert the input string to all uppercase first and then run it through CFStringTokenizer and use the ranges to get the substrings of the original input string. But you must be careful here because some characters might become more than 1 character after conversion to uppercase.
I'm using FitNesse to test web service responses using check to compare the expected to the actual response.
In some cases the check is failing and I can't see what the differences are between the expected and the actual that is causing it to fail.
Here's a screenshot from what it's telling me in a specific instance (of many similar instances):
Feel free to point out the obvious; it's probably staring at me in the face and I'm looking so hard I can't see it!
I would check that the expected and actual strings are both written with the same text encoding. I've seen this error plenty of times when the text comparing failed due to a comma or apostrophe being written in different encodes.
It is possible that your string contains extra spaces in the actual value. FitNesse, being html based, will not respect leading or trailing spaces. It might not handle any extra spaces inside the actual either. So this can cause the result to be different, but not visibly so.
See if you can add some debug messages that would help you see the extra spaces, or at least count the number of characters in both strings.
This question doesn't specify whether Slim or Fit are being used, or which Slim server/plugin if using Slim, but I found the following to be true for me using FitNesse release 20130530 and fitSharp release 2.2:
Non-ASCII characters and { apostrophes / single quote characters } in input arguments/parameters that are strings are HTML encoded. The values in my FitNesse test tables are HTML encoded, but only the required syntax characters and (double) quotes; not the non-ASCII characters (and FitNesse doesn't seem to have any problems storing those values).
EOL characters in the input arguments that are strings consist of a linefeed character only
I imagine that because I'm using .NET, EOLs in my return values consist of carriage return and linefeed characters.
Because of [1], I'm HTML-encoding non-ASCII characters (but not the HTML syntax characters or quotes). Because of [2] and [3], I'm now removing carriage return characters from my fixture return values. Both changes seem to have resolved this issue for me and expected and actual values are now reported as being the same.
Whitespace has troubled me often. The resulting HTML just collapses whitespace, but the compare in code does not.
I now use a fixture to make differences more explicit to me. Example usage: http://fhoeben.github.io/hsac-fitnesse-fixtures/examples-results/HsacExamples.SlimTests.UtilityFixtures.CompareFixtureTest.html
Newer versions of FitNesse (since 20151230) do a diff on the expected and actual result values. Has that helped you at all?