Why 255 is the limit - character-encoding

I've seen lots of places say:
The maximum number of characters is 255.
where characters are ASCII. Is there a technical reason for that?
EDIT: I know ASCII is represented by 8 bits, so there are 256 different characters. The question is why the maximum NUMBER of characters (with duplicates) is specified as 255.

I assume the limit you're referring to is on the length of a string of ASCII characters.
The limit occurs due to an optimization technique where smaller strings are stored with the first byte holding the length of the string. Since a byte can hold only 256 different values (0 to 255), the maximum string length is 255: the first byte is reserved for storing the length.
Some older database systems and programming languages therefore had this restriction on their native string types.
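As a sketch of how such length-prefixed ("Pascal") strings work, here is a minimal illustration in Python (the helper names are made up for this example):
def pack_pascal_string(s: bytes) -> bytes:
    # One length byte, then the data: a single length byte caps the string at 255.
    if len(s) > 255:
        raise ValueError("a one-byte length prefix cannot exceed 255")
    return bytes([len(s)]) + s

def unpack_pascal_string(buf: bytes) -> bytes:
    length = buf[0]          # the first byte holds the length
    return buf[1:1 + length]

packed = pack_pascal_string(b"hello")
print(packed)                        # b'\x05hello'
print(unpack_pascal_string(packed))  # b'hello'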

Extended ASCII is an 8-bit character set. (Original ASCII is 7-bit, but that's not relevant here.)
8 bits means that 2^8 different characters can be referenced.
2^8 equals 256, and as counting starts with 0, the maximum ASCII char code has the value 255.
Thus, the statement:
The maximum number of characters is 255.
is wrong; it should read:
The maximum number of characters is 256, the highest possible character code is 255.
To understand better how characters are mapped to the numbers from 0 to 255, see the 8-bit ASCII table.
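A quick way to see the difference between the count (256) and the highest code (255), sketched here in Python:
codes = range(256)                         # all 8-bit values
print(len(codes), min(codes), max(codes))  # 256 0 255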

The limit is 255 because 9 + 36 + 84 + 126 = 255. The 256th character (which is really the first character) is zero.
Using the combinatoric formula C(n, k) = n! / (k!(n-k)!) to find the number of non-repeating combinations for 1, 2, 3, 4, 5, 6, 7, 8 digits, you get this:
# of digits:       1   2   3   4    5    6   7   8
# of combinations: 9   36  84  126  126  84  36  9
It is unnecessary to include 5 to 8 digits since the sequence is symmetric. In other words, a 4-element generator is a group operation for an octet, and its group action has 255 permutations.
Interestingly, it only requires 3 digits to "count" to 1000 (after 789 the rest of the numbers are repetitions of previous combinations).
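For what it's worth, the binomial coefficients quoted above check out; a quick verification in Python (math.comb needs Python 3.8+):
import math

coeffs = [math.comb(9, k) for k in range(1, 9)]  # C(9, k) for k = 1..8
print(coeffs)           # [9, 36, 84, 126, 126, 84, 36, 9]
print(sum(coeffs[:4]))  # 9 + 36 + 84 + 126 = 255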

The total number of characters in the extended ASCII table is 256 (0 to 255). Codes 0 to 31 (32 characters in all) are the ASCII control characters. Codes 32 to 126 are the ASCII printable characters (code 127, DEL, is also a control character). Codes 128 to 255 are the extended ASCII codes.
The ASCII values of a-z are 97-122.
The ASCII values of A-Z are 65-90.
The ASCII values of 0-9 are 48-57.
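These ranges are easy to verify, for example in Python:
for lo, hi in [("a", "z"), ("A", "Z"), ("0", "9")]:
    print(f"{lo}-{hi}: {ord(lo)}-{ord(hi)}")
# a-z: 97-122
# A-Z: 65-90
# 0-9: 48-57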

Is there a technical reason for that?
Yes, there is. The early ASCII encoding standard is 7 bits long, which can represent 2^7 = 128 (0 .. 127) different character codes.
What you are talking about here is a variant of ASCII developed later, which is 8 bits long and can hold 2^8 = 256 (0 .. 255) character codes.
See Wikipedia for more information.
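For illustration, a check for whether data fits in the original 7-bit ASCII range could look like this in Python (the function name is invented for the example):
def is_7bit_ascii(data: bytes) -> bool:
    # 7-bit ASCII never sets the high bit of a byte
    return all(b < 0x80 for b in data)

print(is_7bit_ascii(b"plain text"))           # True
print(is_7bit_ascii("café".encode("utf-8")))  # False: 0xC3 and 0xA9 exceed 0x7F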

Related

How can I convert a 4-byte string into a Unicode emoji?

A webservice I use in my Delphi 10.3 application returns a string to me consisting of these four bytes: F0 9F 99 82. I expect a slightly smiling emoji. This site shows this byte sequence as the UTF-8 representation of that emoji. So I guess I have a UTF-8 representation in my string, but is it an actual Unicode string? How do I convert my string into the actual Unicode representation, to show it, for example, in a TMemo?
The character 🙂 has the Unicode code point U+1F642. Displaying text is defined through an encoding: how a set of bytes has to be interpreted:
in UTF-8 one character can consist of 8, 16, 24 or 32 bits (1 to 4 bytes); this one is $F0 $9F $99 $82.
in UTF-16 one character can consist of 16 or 32 bits (2 or 4 bytes = 1 or 2 Words); this one is $D83D $DE42 (using surrogates).
in UTF-32 one character always consists of 32 bits (4 bytes = 1 Cardinal or DWord) and always equals the code point, which is $1F642.
In Delphi, you can use:
TEncoding.UTF8.GetString() for UTF-8
(or TEncoding.Unicode.GetString() if you'd have UTF-16LE
and TEncoding.BigEndianUnicode.GetString() if you'd have UTF-16BE).
Keep in mind that 🙂 is just a character like every letter, symbol and whitespace character in this text: it can be selected (e.g. Ctrl+A) and copied to the clipboard (e.g. Ctrl+C). No special care is needed.
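The same conversion can be reproduced outside Delphi; for example, this Python sketch verifies the byte sequences listed above (bytes.hex with a separator needs Python 3.8+):
ch = "\U0001F642"                       # U+1F642
print(ch.encode("utf-8").hex(" "))      # f0 9f 99 82
print(ch.encode("utf-16-be").hex(" "))  # d8 3d de 42 (surrogate pair D83D DE42)
print(ch.encode("utf-32-be").hex(" "))  # 00 01 f6 42 (the code point itself)
print(b"\xf0\x9f\x99\x82".decode("utf-8"))  # the emoji, decoded from UTF-8 bytes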

Why is There an Arbitrary Limit of 2..36 for The Base in an Erlang Number Literal?

When using the base#value notation for integer literals in Erlang, the base can range from 2 to 36 only. Is there a rationale behind the upper limit being 36?
10 digits (0 to 9) plus 26 Latin letters (a to z) gives 36 symbols.
I guess there is no other reason :o)
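That arithmetic is easy to confirm; Python's int() happens to accept the same 2..36 range of bases, presumably for the same reason:
import string
print(len(string.digits + string.ascii_lowercase))  # 10 + 26 = 36 symbols
print(int("zz", 36))  # 1295, the largest two-"digit" base-36 number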

Concatenate two Base64 encoded strings

I want to decode two Base64 encoded strings and combine them to make one 128-bit string. I am able to decode the Base64 encoded strings. Can someone guide me on how to combine these two decoded strings?
This is the code I used for decoding the two encoded strings.
NSData *decodedData_contentKey = [[NSData alloc] initWithBase64EncodedString:str_content options:0];
NSString *decodedString_contentKey = [[NSString alloc] initWithData:decodedData_contentKey encoding:NSUTF8StringEncoding];
NSLog(@"%@", decodedString_contentKey);
Thanks.
Base64 is a statically sized encoding of octets/bytes into characters/text: every 6 bits of input are represented as one printable ASCII character. Hence the name: 2^6 = 64; it uses an alphabet of 64 characters to encode the binary data (plus a delimiter character, '=', that does not contain encoded bits).
UTF-8, used in your sample code, is on the other hand a character encoding. It is used to encode characters into octets, so it works the other way around: what you are actually doing is decoding characters back from bytes. UTF-8 does not use 128-bit values, nor is it statically sized; multiple bytes may be used to represent one character. Decoding will likely fail when it comes across an octet or octets that do not combine into a valid character encoding.
There is no such thing as base 128 encoding. Please think of what you are trying to accomplish and ask a new question that we can decode, if you get stuck.
GUESSED ANSWER:
Base64 encoding outputs 8 characters (64 bits of ASCII text) for every 6 bytes of input. Therefore, if you want 128 bits (16 bytes) of encoded output, you simply have to input 12 bytes. Since Base64 restarts at each 4-character boundary (4 characters = 32 bits of encoding; each character represents 6 bits, and 4 * 6 = 24 bits = 3 bytes, so each 4-character group holds precisely 3 bytes of input), you can simply concatenate the two Base64 strings without decoding.
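A quick sanity check of that claim in Python: two Base64 strings can be concatenated without decoding, as long as each one encodes a multiple of 3 bytes and therefore carries no '=' padding:
import base64

a = base64.b64encode(b"abcdef")  # 6 bytes -> 8 characters, no padding
b = base64.b64encode(b"ghijkl")  # 6 bytes -> 8 characters, no padding
print(a + b)                     # b'YWJjZGVmZ2hpamts'
print(base64.b64decode(a + b))   # b'abcdefghijkl', decodes cleanly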

Finding the correct formula for encoded hex value in decimal

I have a case here where I am trying to figure out how a hex number is converted into a decimal number.
I had a similar case before and found that if I reversed the hex string and swapped each second value (little-endian), then converted it back to a decimal value, I got what I wanted. This one, however, is different.
Here are the values we received.
Value nr. 1 is
Dec: 1348916578
Hex: 0a66ab46
I just have this one decimal/hex for now but I am trying to get more values to compare results.
I hope some math genius out there will be able to see what formula might have been used here :)
Thanks.
1348916578
= 5 0 6 6 D 5 6 2 hex
= 0101 0000 0110 0110 1101 0101 0110 0010
0a66ab46
= 0 A 6 6 A B 4 6 hex
= 0000 1010 0110 0110 1010 1011 0100 0110
So, if a number is like this, in hex digits:
AB CD EF GH
Then a possible conversion is:
rev(B) rev(A) rev(D) rev(C) rev(F) rev(E) rev(H) rev(G)
where rev reverses the order of bits in the nibble; though I can see that the reversal could be done on a byte-wise basis also.
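That recipe can be tested directly; here is a small Python sketch (the function names are invented) that applies it to the value in the question:
def rev4(n):
    # reverse the order of the 4 bits in one nibble
    return int(f"{n:04b}"[::-1], 2)

def convert(x):
    # per byte: swap the two nibbles and bit-reverse each of them
    out = 0
    for shift in (24, 16, 8, 0):
        byte = (x >> shift) & 0xFF
        hi, lo = byte >> 4, byte & 0x0F
        out = (out << 8) | (rev4(lo) << 4) | rev4(hi)
    return out

print(hex(convert(0x0A66AB46)))  # 0x5066d562
print(convert(0x0A66AB46))       # 1348916578, the decimal value received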
Interesting.... I expanded the decimal and hex into binary, and this is what you get, respectively:
1010000011001101101010101100010
1010011001101010101101000110
Slide the bottom one over by padding with some 0s, then split into 8-bit blocks.
10100000 1100110 11010101 01100010
10100 1100110 10101011 01000110
It seems to start to line up. Let's make the bottom look like the top.
Pad the first block with 0s and it's equal.
The second block is ok.
Switch the 3rd block around (reverse it) and 10101011 becomes 11010101.
10100000 1100110 11010101 01000110
Likewise with the 4th.
10100000 1100110 11010101 01100010
Now they're the same.
10100000 1100110 11010101 01100010
10100000 1100110 11010101 01100010
Will this work for all cases? Impossible to know.
The decimal value of 0x0a66ab46 is 174500678 or 1185637898 (depending on which endianness you use, with any 8-, 16- or 32-bit access). There seems to be no direct connection between these values. Maybe you just have the pair wrong? It would help if you posted some code showing how you generate these value pairs.
BTW, Delphi has a fine little method for this: SysUtils.IntToHex
What we found was that our mini USB reader that gave the 10-digit decimal format was actually not showing the whole binary code. The hexadecimal reader finds the full binary code, so essentially it is possible to convert from the hexadecimal value to the 10-digit decimal by taking off 9 characters after binary conversion.
But this does not work the other way around (unless we strip away 2 characters from the hexadecimal value, the 10-digit decimal code will only show part of the full binary code).
So case closed.

What is an Illegal octal digit?

I'm trying to make an array of zip codes.
array = [07001, 07920]
This returns:
(irb):12: Illegal octal digit
array = [07001, 07920]
                  ^
	from (irb):12
	from :0
Never seen this before. Any workarounds?
Ruby is interpreting numbers that have a leading 0 as being in octal (base 8). Thus the digits 8 and 9 are not valid.
It probably makes more sense to store ZIP codes as strings, instead of as numbers (to avoid having to pad with zeroes whenever you display it), as such: array = ["07001", "07920"]
Numbers that start with 0 are assumed to be in octal format, just like numbers that start with 0x are assumed to be in hexadecimal format. Octal digits only go from 0 to 7, so 9 is simply not legal in an octal number.
The easiest workaround would be to simply write the numbers in decimal format: 07001 in octal is the same as 3585 in decimal. Or did you mean to write the numbers in decimal? Then the easiest workaround is to leave off the leading zeroes: 07001 is the same as 7001 anyway.
However, you mention that you want an array of ZIP codes. In that case, the correct solution would be to use, well, an array of ZIP codes instead of an array of integers, since ZIP codes aren't integers, they are ZIP codes.
Your array is of numbers, so the leading zero causes it to be interpreted as octal (valid digits 0-7). If these are zip codes, and the leading zero is significant, they should probably be strings.
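The same pitfall exists in other languages; here is a small Python sketch of the points made above (Python 3 goes one step further and rejects the ambiguous literal outright):
print(0o7001)        # 3585: the value an octal reading of 07001 implies
print(int("07001"))  # 7001: int() parses decimal and ignores the leading zero
zips = ["07001", "07920"]  # storing ZIP codes as strings keeps the leading zero
# A literal like 07001 is a SyntaxError in Python 3, precisely because of
# the octal ambiguity this question ran into.
print(zips)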
