What is the international charcode value of this string? - character-encoding

I have a string <...> (less-than, three full stops, greater-than), which I need to convert into its charcode values in order to check whether the user pressed these three keys in the specified order on the keyboard (imagine something like the Konami code). How do I get the charcode values of this string?
I tried some combinations, but they worked only with the English keyboard layout, not with other layouts (such as Czech, Italian, or German). Is there a good solution that does not depend on the keyboard layout or on keyboard shortcuts, but only on the characters typed?

Maybe you are looking for something like
void main() {
'<...>'.codeUnits.forEach(print);
}
60
46
46
46
62
or
void main() {
String s = '<...>';
print(s.codeUnitAt(0));
print(s.codeUnitAt(1));
...
}
which produces the same output.
If this is not what you are looking for, improving your question might help.

The character codes of the five characters are easy to get: "<...>".codeUnits.toList().
It's [60, 46, 46, 46, 62] if you want it literally.
It seems your problem is that you want to detect the keyboard codes corresponding to pressing these keys on a keyboard.
I'll assume we are in a browser setting. In that case, the keyboard events will contain a charCode which should be what you need. Even on foreign (from wherever you are) keyboards, the charCode should be correct - unless the browser misunderstands the keyboard completely, and then the user will have bigger issues.
https://api.dartlang.org/apidocs/channels/stable/dartdoc-viewer/dart-dom-html.KeyboardEvent#id_charCode
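To match on the characters typed rather than on physical keys, one option is to keep the last few character codes and compare them with the codes of the target string. Below is a minimal sketch of that idea, written in Ruby purely for illustration; in the browser the codes would come from the keyboard event's charCode. SequenceDetector and push are hypothetical names, not part of any library.

```ruby
# Hypothetical helper: remembers the most recent character codes and
# reports when they equal the codes of the target string.
class SequenceDetector
  def initialize(target)
    @target = target.codepoints # "<...>" gives [60, 46, 46, 46, 62]
    @recent = []
  end

  # Feed one typed character code; returns true when the last
  # target-length codes match the target sequence.
  def push(char_code)
    @recent << char_code
    @recent.shift if @recent.length > @target.length
    @recent == @target
  end
end

detector = SequenceDetector.new("<...>")
hits = "ab<...>".codepoints.map { |code| detector.push(code) }
puts hits.last # prints "true"
```

Because only character codes are compared, the check does not care which physical keys or modifiers produced the characters.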

Related

Alter regex markup to not separate float numbers (like 2.0)

I was looking for a solution to a regex problem I had in Rails, and an answer to a separate question led me 90% of the way there. Basically, I would like a Ruby/Rails script that tidies up messy text by capitalizing every letter after a "./,/!/?". This code is by "Mark S":
ng = Nokogiri::HTML.fragment("<p>hello, how are you? oh, that's nice! i am glad you are fine. i am too.<br />i am glad to have met you.</p>")
ng.traverse{|n| (n.content = n.content.gsub(/(.*?)([\.|\!|\?])/) { " #{$1.strip.capitalize}#{$2}" }.strip) if n.text?}
ng.to_s
The only issue I have with this code, and it is a big issue, is that the code adds a space in between float numbers like "2.0", making a text like:
there is a cat in the hat.it has a 2.0 inch tail!
isn't that awesome?!I think so.
become
There is a cat in the hat. It has a 2. 0 inch tail!
Isn't that awesome?! I think so.
where I obviously want it to be:
There is a cat in the hat. It has a 2.0 inch tail!
Isn't that awesome?! I think so.
Any suggestions on how to alter this text, for example so that any "." will be ignored by this code?
It seems you want to capitalize any lowercase letter at the beginning of the string or after ., !, or ?.
Use
s.gsub(/(\A|[.?!])(\p{Ll})/) { Regexp.last_match(1).length > 0 ? "#{$1} #{$2.capitalize}" : "#{$2.capitalize}" }
See the Ruby demo
Pattern details:
(\A|[.?!]) - Group 1 capturing the start of string location (empty string) or a ., ?, or !
(\p{Ll}) - Group 2 capturing any Unicode lowercase letter
Inside the replacement, we check whether the Group 1 value is empty. If it is (start of string), we just return the capitalized letter. Otherwise, we return the punctuation, a space, and the capitalized letter.
NOTE: However, there is a problem with abbreviations (as usual in these cases), like i.e., e.g., etc. Then there are words like iPhone, iCloud, eSklep, and so on.
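As a quick check, here is the same pattern wrapped in a small helper and run on a one-line version of the sample text. The name capitalize_sentences is made up for this sketch. The second call shows the abbreviation problem mentioned in the note.

```ruby
# Hypothetical wrapper around the gsub call from the answer above.
def capitalize_sentences(s)
  s.gsub(/(\A|[.?!])(\p{Ll})/) do
    prefix = Regexp.last_match(1)             # "" at start of string, else . ? or !
    letter = Regexp.last_match(2).capitalize
    prefix.empty? ? letter : "#{prefix} #{letter}"
  end
end

puts capitalize_sentences("there is a cat in the hat.it has a 2.0 inch tail!")
# prints "There is a cat in the hat. It has a 2.0 inch tail!"

puts capitalize_sentences("i.e. this")
# prints "I. E. this" (the abbreviation gets mangled)
```

Note that the pattern only fires when the lowercase letter follows the punctuation directly, so "2.0" is left alone, but so is a sentence that already has a space after the full stop.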

Checking whether a string contains a phone number

Trying to work out how to parse out phone numbers that are left in a string.
e.g.
"Hi Han, this is Chewie, Could you give me a call on 02031234567"
"Hi Han, this is Chewie, Could you give me a call on +442031234567"
"Hi Han, this is Chewie, Could you give me a call on +44 (0) 203 123 4567"
"Hi Han, this is Chewie, Could you give me a call on 0207-123-4567"
"Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"
And be able to consistently replace any one of them with some other item (e.g. some text, or a link).
Am assuming it's a regex type approach (I'm already doing something similar with email which works well).
I've got to
text.scan(/([^A-Z|^"]{6,})/i)
Which leaves me a leading space I can't work out how to drop (would appreciate the help there).
Is there a standard way of doing this that people use?
It also drops things into arrays, which isn't particularly helpful
i.e. if there were multiple numbers.
[["02031234567"], ["+44207-1234567"]]
as opposed to
["02031234567","+44207-1234567"]
Adding in the third use case with spaces is difficult. I think the only way to successfully meet that acceptance criterion would be to chain a #gsub call on to your #scan.
Thus:
text.gsub(/\s+/, "").scan(/([^A-Z|^"|^\s]{6,})/i)
The following code will extract all the numbers for you:
text.scan(/(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/)
For the examples you supplied this results in:
["02031234567"]
["+442031234567"]
["+44 (0) 203 123 4567"]
["0207-123-4567"]
["02031234567", "+44207-1234567"]
To understand this regex, what we are matching is:
[\d \-+()]+ which is a sequence of one or more digits, spaces, minus, plus, opening or closing brackets (in any order - NB regex is greedy by default, so it will match as many of these characters next to each other as possible)
that must be preceded by a space (?<=[ ]) - NB the space in the positive look-behind is not captured, and therefore this makes sure that there are no leading spaces in the results
and is either at the end of the string $, or | is followed by a space then a word character (?=[ ]\w) (NB this lookahead is not captured)
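Since the goal was to replace the numbers and not only find them, the same pattern can be passed to gsub. A small sketch, assuming the regex above is kept unchanged; PHONE_RE, extract_numbers and replace_numbers are names invented for this example.

```ruby
# The scan pattern from above, stored once so scan and gsub share it.
PHONE_RE = /(?<=[ ])[\d \-+()]+$|(?<=[ ])[\d \-+()]+(?=[ ]\w)/

# Returns a flat array of strings, because the pattern has no capture groups.
def extract_numbers(text)
  text.scan(PHONE_RE)
end

# Replaces each matched number with the given text (or link markup).
def replace_numbers(text, replacement)
  text.gsub(PHONE_RE, replacement)
end

msg = "Hi Han, this is Chewie, Could you give me a call on 02031234567 OR +44207-1234567"
p extract_numbers(msg)
# prints ["02031234567", "+44207-1234567"]
puts replace_numbers(msg, "[number]")
# prints "Hi Han, this is Chewie, Could you give me a call on [number] OR [number]"
```

The flat array is a side effect of dropping the capture group: scan only nests its results when the pattern contains groups.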
This pattern will get rid of the space but not match your third case with spaces:
/([^A-Z|^"|^\s]{6,})/i
This is what I came to in the end in case it helps somebody
numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }
That gives me an array of
["+442031234567", "02031234567"]
I'm sure there is a more elegant way of doing this and possibly you'd want to check the numbers for likelihood of being phonelike - e.g. using the brilliant Phony gem.
numbers = text.scan(/([^A-Z|^"]{6,})/i).collect{|x| x[0].strip }
real_numbers = numbers.keep_if{|n| Phony.plausible? PhonyRails.normalize_number(n, default_country_code: "GB")}
Which should help exclude serial numbers or the like from being identified as numbers. You'll obviously want to change the country code to something relevant for you.

Converting iOS ABPeoplePicker numbers into valid, canonical number

I'm using the iOS ABPeoplePickerNavigationController to allow a user to select a phone number, but the number I get back is formatted like this:
+44 (0) 20 3162 0001
I can strip out the spaces and the parentheses, but the number that remains isn't really a valid phone number.
Does iOS offer any way to force ABPeoplePicker to return a valid, canonical phone number i.e.
+442031620001
or will I be forced to apply a regex or something to it?
You will have to apply a regex, but it should only need to strip everything except the digits and an optional + at the beginning.
Still, there is no guarantee that this will get you a valid phone number!
E.g. in the Address Book I could write +44 353 1232 (-0 / -1)
to name two alternates.
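The stripping step might look roughly like this. It is shown in Ruby only to illustrate the regex logic; on iOS the same patterns would go through the platform's own regex API. Dropping a literal "(0)" trunk prefix is an assumption added here to reach the canonical form from the question, and canonicalize is a made-up name.

```ruby
# Hypothetical helper: keep an optional leading "+", then digits only.
def canonicalize(raw)
  s = raw.strip
  plus = s.start_with?("+") ? "+" : ""
  # Assumption: a literal "(0)" is an optional trunk prefix and is dropped.
  s = s.gsub(/\(0\)/, "")
  plus + s.gsub(/\D/, "")
end

puts canonicalize("+44 (0) 20 3162 0001")
# prints "+442031620001"
puts canonicalize("+44 353 1232 (-0 / -1)")
# prints "+44353123201" (stripped, but not a real number)
```

The second call is the address book entry mentioned above: the output is syntactically clean, yet nothing guarantees it can be dialled.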

ascii character not showing in browser

I have an MVC Razor view
@{
    ViewBag.Title = "Index";
    var c = (char)146;
    var c2 = (short)'’';
}
<h2>@c --- @c2 --’-- ‘Why Oh Why’ & ’</h2>
@String.Format("hi {0} there", (char)146)
Characters stored in varchar fields in my database are not rendering in the browser.
This example demonstrates how character 146 doesn't show up.
How do I make them render?
[EDIT]
When I do this, character 146 gets converted to Unicode 8217, but if 146 is rendered directly in the browser it fails:
public ActionResult Index()
{
using (var context = new DataContext())
{
var uuuuuggghhh = (from r in context.Projects
where r.bizId == "D11C6FD5-D084-43F0-A1EB-76FEED24A28F"
select r).FirstOrDefault();
if (uuuuuggghhh != null)
{
var ca = uuuuuggghhh.projectSummaryTxt.ToCharArray();
ViewData.Model = ca[72]; // this is the character in question
return View();
}
}
return View();
}
@Html.Raw(((char)146).ToString())
or
@Html.Raw(String.Format("hi {0} there", (char)146))
both appear to work. I was testing this in Chrome and kept getting blank data; after viewing with FF I can confirm the character was being printed (however, 146 doesn't appear to be a readable character).
This is confirmed with a readable character '¶' below:
@Html.Raw(((char)182).ToString())
Not sure why you would want this though. But best of luck!
You do not want to use character 146. Character 146 is U+0092 PRIVATE USE TWO, an obscure and useless control character that typically renders as invisible, or a missing-glyph box/question mark.
If you want the character ’: that is U+2019 RIGHT SINGLE QUOTATION MARK, which may be written directly or using the character references &#8217; or &rsquo;.
146 is the byte number of the encoding of U+2019 into the Windows Western code page (cp1252), but it is not the Unicode character number. The bottom 256 Unicode characters are ordered the same as the bytes in the ISO-8859-1 encoding; ISO-8859-1 is similar to cp1252 but not the same.
Bytes 128–159 in cp1252 encode various typographical niceties like smart quotes, whereas bytes 128–159 in ISO-8859-1 (and hence characters 128–159 in Unicode) are seldom-used control characters. For web applications, you usually want to filter out the control characters (0–31 and 128–159 amongst a few others) as they come in, so they never get as far as the database.
If you are getting character 146 out of your database where you expect to have a smart quote, then you have corrupt data and you need to fix it up before continuing, or possibly you are reading the database using the wrong encoding (quite how this works depends what database you're talking to).
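The difference between the two interpretations of byte 146 is easy to see by decoding it both ways. A short Ruby illustration (the variable names are arbitrary):

```ruby
# One raw byte with value 146 (0x92), with no encoding attached yet.
byte = [146].pack("C")

# Interpreted as Windows-1252 (cp1252), the byte is the right single quote.
as_cp1252 = byte.dup.force_encoding("Windows-1252").encode("UTF-8")

# Interpreted as ISO-8859-1, the byte maps straight to code point 146.
as_latin1 = byte.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts as_cp1252.ord # prints 8217, i.e. U+2019
puts as_latin1.ord # prints 146, i.e. U+0092, the control character
```

If the database bytes are really cp1252 but are read as ISO-8859-1 (or cast straight to char), you end up with the second result, which matches the symptom in the question.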
Now here's the trap. If you write:
&#146;
as a character reference, the browser actually displays the smart quote U+2019 ’, and, confusingly, not the useless control character that actually owns that code point!
This is an old browser quirk: character references in the range &#128; to &#159; are converted to the character that maps to that number in cp1252, instead of the real character with that number.
This was arguably a bug, but the earliest browsers did it back before they grokked Unicode properly, and everyone else was forced to follow suit to avoid breaking pages. HTML5 now documents and sanctions this. (Though not in the XHTML serialisation; browsers in XHTML parsing mode won't do this because it's against the basic rules of XML.)
We finally agreed that the data was corrupt, and we have asked users who can't see this character rendered to fix the source data.

How to recognize mobile number in a given text?

I want to extract valid (on the basis of format) mobile numbers from a text.
e.g. input: some text (987) 456 7890, (987)-456-7890 again some text
output: 9874567890 9874567890
The problem is that there are many valid mobile number formats all over the world, like:
text = "Denmark 11 11 11 11, 1111 1111 "
// + "Germany 03333 123456, +49 (3333) 123456 "
// + "Netherlands + 31 44 12345678 Russia +7(555)123-123 "
// + "spain 12-123-12-12 switzerland +41 11 222 22 22 "
// + "Uk (01222) 333333 India +91-12345-12345 "
// + "Austrailia (04) 1231 1231 USA (011) 154-123-4567 "
// + "China 1234 5678 France 01-23-45-67-89 "
// + "Poland (12) 345 67 89 Singapore 123 4567 "
// + "Thailand (01) 234-5678, (012) 34-5678 "
// + "United Kingdom 0123 456 7890, 01234 567890 "
// + "United States (987) 456 7890, (987)-456-7890+ etc."
How do I cover all mobile formats?
What are the minimum and maximum lengths of mobile numbers (with or without country code)?
How do I recognize whether a mobile number has a country code or not?
You might want to check if this fits your needs: A comprehensive regex for phone number validation
From experience I know how this works in my phone OS. It looks for long enough sequences of digits, separated by a set of allowed characters.
In principle something like:
[\+]?([0-9]|[\(\). -]){min,max}
This regex is suboptimal since it also looks for long sequences of separator chars. You will probably need to filter those results out as well.
A very simple method with some false positives, but false positives are IMPO better than misses.
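A rough sketch of this approach in Ruby, including the filtering step for matches that are mostly separators. The bounds 7 and 20 and the minimum digit count are placeholder assumptions, not values from any standard, and the names are invented for the example.

```ruby
# Optional "+", then 7 to 20 digits or separators (bounds are assumptions).
CANDIDATE_RE = /\+?[\d(). -]{7,20}/
# Assumed minimum number of real digits for a match to count.
MIN_DIGIT_COUNT = 7

# Scans for candidates, reduces each to its digits, and drops matches
# that turn out to be mostly separator characters.
def candidate_numbers(text)
  text.scan(CANDIDATE_RE)
      .map { |match| match.gsub(/\D/, "") }
      .select { |digits| digits.length >= MIN_DIGIT_COUNT }
end

p candidate_numbers("some text (987) 456 7890, (987)-456-7890 again some text")
# prints ["9874567890", "9874567890"]
p candidate_numbers("wait ........ what")
# prints [] (a run of separators is filtered out)
```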
You shouldn't use the list of samples you got as a guide to actual mobile phone numbers.
For example the number sequence shown for the Netherlands is incorrect, in that it doesn't cover just mobile numbers but ALL regular phone numbers (it doesn't cover such things as 0800 and 0900 numbers for which different rules apply) and is missing an element even for that.
I can only assume the list is similarly incorrect for other countries (and of course it's far from complete in that it doesn't cover all countries, but maybe you posted only a fragment).
To parse a phone number you'd have to first remove all white space and other formatting characters from what could be a phone number, then check whether it has the correct length to be one, then try to deduce whether it includes a country code or not.
If it includes a country code but doesn't start with either 00 or + (both are used to indicate an international number) it might not be a phone number after all.
Does it include an area code? If so, is the area code one associated with mobile phones? (For example, in the Netherlands all mobile phone numbers have area code 06, BUT in the past this wasn't always the case, so if you have an old document an 06 area code may not be a mobile number anyway.)
After you've deduced that (and AFAIK mobile numbers always include an area code) you have to check whether the remaining digits make up something that could be an actual phone number without area code, based on the length of the number (hint: area code + number together have to be 10 long here, and I think everywhere).
And all that while taking into consideration that the rules may well be different for different countries or even different networks within some countries.
And of course if you find a number that looks like a valid phone number it still may not be.
It could be some other number that just looks like a phone number but isn't.
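The first steps described above (strip the formatting, check the length, look for an international prefix) could be sketched like this in Ruby. The length bounds are placeholders: 15 is the usual upper limit for international numbers, while the lower bound of 7 is an assumption. The area-code and per-country checks are left out, and classify_candidate is a made-up name.

```ruby
# Placeholder bounds; real limits differ per country (assumption).
MIN_LEN = 7
MAX_LEN = 15

# Returns nil when the text cannot be a phone number by length,
# otherwise the bare digits plus a flag for an international prefix.
def classify_candidate(raw)
  has_plus = raw.strip.start_with?("+")
  digits = raw.gsub(/\D/, "")                # remove all formatting characters
  international = has_plus || digits.start_with?("00")
  digits = digits.sub(/\A00/, "") if international && !has_plus
  return nil unless (MIN_LEN..MAX_LEN).cover?(digits.length)
  { digits: digits, international: international }
end

p classify_candidate("+91-12345-12345")
p classify_candidate("(987) 456 7890")
p classify_candidate("12 34")
```

A result from this function is still only a candidate; as noted above, a number that looks right may not be a phone number at all.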
A simple search for all matching string formats is not the right way in this case. The optimal way is to use regular expressions to find all matches of phone numbers, but BlackBerry Java doesn't have built-in capabilities for processing regular expressions.
But you can use a third-party library for J2ME that implements regex processing, something like this.
// Regex: check for valid Singapore mobile numbers.
// Pattern and Matcher come from whichever regex library is in use.
// Note: the outer group is optional, so an empty string also matches.
public static boolean isSingaporeMobileNo(String str) {
    Pattern mobNO = Pattern.compile("^(((0|((\\+)?65([- ])?))|((\\((\\+)?65\\)([- ])?)))?[8-9]\\d{7})?$");
    Matcher matcher = mobNO.matcher(str);
    return matcher.find();
}
