I have been wondering what this is actually called for a long time, because a while ago (about three years ago) I thought it was called bytecode, but since then I have learned what bytecode actually is. I'll give an example, because I don't really know what to call it.
It looks like this:
\234\22\21\65\22\76\54\87. It's basically the byte value of each character, preceded by a backslash.
Does anyone know what this is called?
Thanks.
From the Lua reference manual:
We can specify any byte in a short literal string by its numeric value
(including embedded zeros). This can be done with the escape sequence
\xXX, where XX is a sequence of exactly two hexadecimal digits, or
with the escape sequence \ddd, where ddd is a sequence of up to three
decimal digits. (Note that if a decimal escape sequence is to be
followed by a digit, it must be expressed using exactly three digits.)
Also refer to https://en.wikipedia.org/wiki/String_literal#Escape_sequences
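For example, a quick way to see that these escapes are just numeric byte values (using the string from the question):
-- "\65" is the byte 65, i.e. "A"; "\x41" is the same byte written in hex
-- (\xXX needs Lua 5.2+ or LuaJIT).
print("\65\66\67")        --> ABC
print("\x41\x42\x43")     --> ABC

-- The string from the question is just eight arbitrary byte values:
local s = "\234\22\21\65\22\76\54\87"
print(#s)                 --> 8
print(s:byte(1, -1))      --> 234  22  21  65  22  76  54  87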
Related
From my Lua knowledge (and according to what I have read in the Lua manuals), I've always been under the impression that an identifier in Lua is limited to A-Z, a-z, _, and digits (and cannot start with a digit or be a reserved keyword, i.e. local local = 123 is invalid).
And now I have run into an (obfuscated) Lua program which uses all kinds of weird characters in its identifiers:
https://i.imgur.com/HPLKMxp.png
-- Most likely, copy+paste won't work. Download the file from https://tknk.io/7HHZ
print(_VERSION .. " " .. (jit and "JIT" or "non-JIT"))
local T = {}
T.math = T.math or {}
T.math.​â®â€‹âŞâ®â€‹ď»żâ€Śâ€âŽ = math.sin
T.math.â¬â€‹ââ¬ââ«â®â€â€¬ = math.cos
for k, v in pairs(T.math) do print(k, v) end
Output:
Lua 5.1 JIT
â¬â€‹ââ¬ââ«â®â€â€¬ function: builtin#45
​â®â€‹âŞâ®â€‹ď»żâ€Śâ€âŽ function: builtin#44
It is unclear to me why this set of characters is allowed in an identifier.
In other words, why is it a completely valid Lua program?
Unlike some languages, Lua is not really defined by a formal specification, one which covers every contingency and entirely explains all of Lua's behavior. Something as simple as "what character set is a Lua file encoded in" isn't really explained in Lua's documentation.
All the docs say about identifiers is:
Names (also called identifiers) in Lua can be any string of letters, digits, and underscores, not beginning with a digit and not being a reserved word.
But nothing ever really says what a "letter" is. There isn't even a definition for what character set Lua uses. As such, it's essentially implementation-dependent. A "letter" is... whatever the implementation wants it to be.
So, let's say you're writing a Lua implementation. And you want users to be able to provide Unicode-encoded strings (that is, strings within the Lua text). Lua 5.3 requires this. But you also don't want them to have to use UTF-16 encoding for their files (also because lua_load gets sequences of bytes, not shorts). So your Lua implementation assumes the byte sequence it gets in lua_load is encoded in UTF-8, so that users can write strings that use Unicode characters.
When it comes to writing the lexer/parser part of this implementation, how do you handle this? The simplest, easiest way to handle UTF-8 is to... not handle UTF-8. Indeed, that's the whole point of that encoding. Since everything that Lua defines with specific symbols is encoded in ASCII, and ASCII text is also UTF-8 text with the same meaning, you can basically treat a UTF-8 string like an ASCII string. For in-Lua strings, you just copy the sequence of bytes between the start and end characters of the string.
So how do you go about lexing identifiers? Well, you could ask the question above. Or you could ask a much simpler question: is the character a space, control character, digit, or symbol? A "letter" is merely something that isn't one of those.
Lua defines what things it considers to be "symbols". ASCII can tell you what is a control character, space, and a digit. In such an implementation, any UTF-8 code unit with a value outside of ASCII is a letter. Even if technically, those code units decode into something Unicode thinks of as a "symbol", your lexer just treats it as a letter.
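Sketched in Lua for readability (a real lexer would do this in C, and the function name here is made up), that classification amounts to something like this:
-- A byte counts as a "letter" here if it is an underscore, an ASCII letter,
-- or any byte >= 0x80, i.e. part of a multi-byte UTF-8 sequence.
local function is_letter_byte(b)
    return b == string.byte("_")
        or (b >= string.byte("A") and b <= string.byte("Z"))
        or (b >= string.byte("a") and b <= string.byte("z"))
        or b >= 0x80
end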
This simple form of UTF-8 lexing gives you fast performance and low memory overhead. You don't have to decode UTF-8 into Unicode codepoints, and you don't need a giant Unicode table to tell you whether a codepoint is a "symbol" or "space" or whatever. And of course, it's also something that would naturally fall out of many ASCII-based Lua implementations.
So most Lua implementations will do it this way, if only by accident. Doing something more would require deliberate effort.
It also allows a user to use Unicode character sequences as identifiers. That means that someone can easily write code in their native language (outside of keywords).
But it also means that obfuscators have lots of ways to create "identifiers" that are just strings of nonsensical bytes. Indeed, because there are multiple ways in Unicode to "spell" the same apparent Unicode string (unless you examine the bytes directly), obfuscators can rig up identifiers that appear when rendered in a text editor to all be the same text, while actually being different strings.
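As a simplified, concrete illustration of that last point, here are two strings that most editors render as the same text but which are different byte sequences:
-- U+00E9 ("é") encoded directly, versus "e" followed by U+0301
-- (combining acute accent): same glyph on screen, different bytes.
local precomposed = "\195\169"       -- C3 A9
local decomposed  = "e\204\129"      -- 65 CC 81
print(precomposed == decomposed)     --> false
print(#precomposed, #decomposed)     --> 2   3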
To clarify, there is only one identifier here: T.
T.math is syntactic sugar for T["math"], and this also extends to the obfuscated strings. It is perfectly valid for a key to contain any characters, or even to start with a number.
However, being able to use . rather than [ ] does not work with a string that doesn't conform to the identifier rules. See Nicol Bolas' answer for a great breakdown of those limitations.
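A quick illustration of the difference (the key below is deliberately not a valid identifier):
local T = {}
T["1 definitely not an identifier!"] = math.sin   -- fine as a table key
print(T["1 definitely not an identifier!"])        --> function: ...
-- T.1 definitely not an identifier! = math.sin    -- would be a syntax error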
As you may be able to see in the image, I have a User model and @user.zip is stored as an integer for validation purposes (i.e., so only digits are stored, etc.). I was troubleshooting an error when I discovered that my sample zip code (00100) was automatically being converted to binary, and ending up as the number 64.
Any ideas on how to keep this from happening? I am new to Rails, and it took me a few hours to figure out the cause of this error, as you might imagine :)
I can't imagine any other information would be helpful here, but please inform me if otherwise.
This is not binary, this is octal.
In Ruby, any integer literal starting with 0 will be treated as an octal number. You should check the Ruby documentation on number literals to learn more about this; here's a quote:
You can use a special prefix to write numbers in decimal, hexadecimal, octal or binary formats. For decimal numbers use a prefix of 0d, for hexadecimal numbers use a prefix of 0x, for octal numbers use a prefix of 0 or 0o, for binary numbers use a prefix of 0b. The alphabetic component of the number is not case-sensitive.
For your case, you should not store zip codes as numbers. Don't treat them as numeric values anywhere, not just in the database but in your variables as well. Instead, store and treat them as strings.
The zip code should probably be stored as a string, since you can't have a valid integer with leading zeroes.
As the title of the question states, I'm looking to take the following string in hexadecimal base:
b9c84ee012f4faa7a1e2115d5ca15893db816a2c4df45bb8ceda76aa90c1e096456663f2cc5e6748662470648dd663ebc80e151d4d940c98a0aa5401aca64663c13264b8123bcee4db98f53e8c5d0391a7078ae72e7520da1926aa31d18b2c68c8e88a65a5c221219ace37ae25feb54b7bd4a096b53b66edba053f4e42e64b63
And convert it to its decimal equivalent string:
130460875511427281888098224554274438589599458108116621315331564625526207150503189508290869993616666570545720782519885681004493227707439823316135825978491918446215631462717116534949960283082518139523879868865346440610923729433468564872249430429294675444577680464924109881111890440473667357213574597524163283811
I've tried using this code, found at this link:
unsigned result = 0;
NSScanner *scanner = [NSScanner scannerWithString:hexString];
[scanner setScanLocation:1]; // bypass '#' character
[scanner scanHexInt:&result];
NSLog(@" %u",result);
However, I keep getting the following result: 4294967295. Any ideas on how I can solve this problem?
This sounds like a homework/quiz question, and SO isn't here to write code for you, so here are some hints in the hope that they help.
Your number is BIG, far larger than any standard integer size, so you are not going to be able to do this with long long or even NSDecimal.
Now you could go and source an "infinite" precision arithmetic package, but really what you need to do isn't that hard (though if you are going to be doing more than this, then using such a package would make sense).
Now think back to your school days: how were you taught to do base conversion? The standard method is long division and remainders.
Example: start with BAD in hex and convert to decimal:
BAD ÷ A = 12A remainder 9
12A ÷ A = 1D remainder 8
1D ÷ A = 2 remainder 9
2 ÷ A = 0 remainder 2
now read the remainder back, last first, to give 2989 decimal.
Long division is a digit at a time process, starting with the most significant digit, and carrying the remainder as you move to the next digit. Sounds like a loop.
Your initial number is a string, the most significant digit is first. Sounds like a loop.
Processing characters one at a time from an NSString is, well, painful. So first convert your NSString to a standard C string. If you copy this into a C-array you can then overwrite it each time you "divide". You'll probably find the standard C functions strlen() and strcpy() helpful.
Of course you have characters in your string, not integer values. Include ctype.h in your code and use the digittoint() function to convert each character in your number to its numeric equivalent.
The standard library doesn't have the inverse of digittoint(), so to convert an integer back to its character equivalent you need to write your own code, think indexing into a suitable constant string...
Write a C function, something like int divide(char *hexstring), which does one long division of hexstring, writing the result into hexstring and returning the remainder. (If you wish to write more general code, useful for testing, write something like int divide(char *buf, int base, int divisor) - so you can convert hex to decimal and then back again to check you get back to where you started.)
Now you can loop calling your divide and accumulating the remainders (as characters) into another string.
How big should your result string be? Well a number written in decimal typically has more digits than when written in hex (e.g. 2989 v. BAD above). If you're being general then hex uses the fewest digits and binary uses the most. A single hex digit equates to 4 binary digits, so a working buffer 4 times the input size will always be long enough. Don't forget to allow for the terminating NUL in C strings in your buffer.
And as hinted above, for testing make your code general, convert your hex string to a decimal one, then convert that back to a hex one and check the result is the same as the input.
If this sounds complicated don't despair, it only takes around 30 lines of well-spaced code.
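If it helps to see the overall shape, here is a rough sketch of that loop, written in Lua purely to keep the arithmetic readable (the C version described above has exactly the same structure; the function names here are made up):
local DIGITS = "0123456789abcdef"

-- One long division: divide the base-`base` digit string `numstr` by
-- `divisor`, returning the quotient as a string plus the remainder.
local function divideOnce( numstr, base, divisor )
    local quotient, carry = {}, 0
    for ch in numstr:gmatch( "." ) do
        local acc = carry * base + tonumber( ch, base )
        local q = math.floor( acc / divisor )
        quotient[#quotient + 1] = DIGITS:sub( q + 1, q + 1 )
        carry = acc % divisor
    end
    local result = table.concat( quotient ):gsub( "^0+", "" )  -- strip leading zeros
    return ( result == "" and "0" or result ), carry
end

-- Repeatedly divide, collecting remainders; read them back last-first.
local function convertBase( numstr, frombase, tobase )
    local out = {}
    repeat
        local rem
        numstr, rem = divideOnce( numstr, frombase, tobase )
        out[#out + 1] = DIGITS:sub( rem + 1, rem + 1 )
    until numstr == "0"
    return table.concat( out ):reverse()
end

print( convertBase( "bad", 16, 10 ) )   --> 2989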
If you get stuck coding it ask a new question showing your code, explain what goes wrong, and somebody will undoubtedly help you out.
HTH
Your result is the maximum of an unsigned 32-bit int, the type you are using. As far as I can see in the NSScanner documentation, long long is the biggest supported type.
I have a string that I've converted, by using string.format("%02X", char) on each character, into the following:
74657874000000EDD37001000300
In the end, I'd like that string to look like the following:
t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (spaces are there just for clarification of characters desired in example).
I've tried to use \x..(hex#) and string.char(0x..(hex#)) (where (hex#) is the hex representation of my desired character), and I'm still having issues getting the result I'm looking for. After reading another thread about this topic, "what is the way to represent a unichar in lua", and the links provided in the answers, I still don't fully understand what I need to do in my final code to make this work.
I'm looking for some help in better understanding an approach that would let me achieve the desired result shown above.
ETA:
Well I thought that I had fixed it with the following code:
function hexToAscii(input)
    local convString = ""
    for char in input:gmatch("(..)") do
        convString = convString..(string.char("0x"..char))
    end
    return convString
end
It appeared to work, but I didn't think about characters above 127. Rookie mistake. Now I'm unsure how I can get the additional characters (128-255) to display their extended ASCII values.
I did the following to check since I couldn't truly "see" them in the file.
function asciiSub(input)
    input = input:gsub(string.char(0x00), "<NUL>") -- suggested by a coworker
    print(input)
end
I did a few more gsub calls to substitute in other characters, and my file comes back with the replacement strings. But when I ran into characters from the extended ASCII table, this approach fell apart.
Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.
The simple way to transform a base16-encoded string is just to
function unhex( input )
    return (input:gsub( "..", function(c)
        return string.char( tonumber( c, 16 ) )
    end))
end
This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)
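For example, on the string from the question it gives back the original 14 bytes:
local s = unhex( "74657874000000EDD37001000300" )
print( #s )              --> 14
print( s:byte( 1, -1 ) ) --> 116 101 120 116 0 0 0 237 211 112 1 0 3 0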
The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input; then the two should be exactly the same.
Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.
You actually have a couple of problems:
First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example it could be ISO-8859-1, in which case it is U+00EA, the character ê.
The characters 0 to 31 are control characters (eg. 0 is NUL). Use a lookup table for these.
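As for that lookup table, a sketch might look like this (the names table below only covers a few codes, so extend it as needed, and the function name is made up):
-- Map a few control codes to their conventional names; anything not in
-- the table is shown as a hex escape instead.
local names = { [0] = "NUL", [1] = "SOH", [2] = "STX", [3] = "ETX",
                [9] = "TAB", [10] = "LF", [13] = "CR" }

local function showControls( s )
    return ( s:gsub( "%c", function( c )
        local b = c:byte()
        return "<" .. ( names[b] or string.format( "x%02X", b ) ) .. ">"
    end ) )
end

-- The control characters get replaced; the bytes 0xED and 0xD3 are not
-- control characters, so they pass through unchanged and display however
-- your terminal decodes them.
print( showControls( "text\0\0\0\237\211p\1\0\3\0" ) )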
Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.
I am trying to show \u1F318 in my application, but the iPhone app just uses the first 4 hex digits and creates the glyph from those. Can anyone guide me on what I am doing wrong when trying to show the Unicode character \u1F318 on iPhone?
[(OneLabelTableViewCell *)cell textView].text = #"\u1F318";
The output in the application is:
Note: this answer is based on my experience of Java and C#. If it turns out not to be useful, I'll delete it. I figured it was worth the OP's time to try the options presented here...
The \u escape sequence always expects four hex digits - as such, it can only represent characters in the Basic Multilingual Plane.
If this is Objective-C, I believe that supports \U followed by eight hex digits, e.g. \U0001F318. If so, that's the simplest approach:
[(OneLabelTableViewCell *)cell textView].text = #"\U0001F318";
If that doesn't work, it's possible that you need to specify the character as a surrogate pair of UTF-16 code points. In this case, U+1F318 is represented by U+D83C U+DF18, so you'd write:
[(OneLabelTableViewCell *)cell textView].text = #"\uD83c\uDF18";
Of course, this is assuming that it's UTF-16-based...
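For what it's worth, the arithmetic behind that surrogate pair is easy to check by hand (sketched below in Lua just to show the calculation; the helper name is made up):
-- Split a code point above U+FFFF into a UTF-16 surrogate pair
-- (high surrogate first). For U+1F318 this yields D83C DF18.
local function surrogatePair( cp )
    local v = cp - 0x10000                        -- leaves a 20-bit value
    local hi = 0xD800 + math.floor( v / 0x400 )   -- top 10 bits
    local lo = 0xDC00 + v % 0x400                 -- bottom 10 bits
    return hi, lo
end

print( string.format( "%04X %04X", surrogatePair( 0x1F318 ) ) )   --> D83C DF18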
Even if that's the correct way of representing the character you want, it's entirely feasible that the font you're using doesn't support it. In that case, I'd expect you to see a single character (a question mark, a box, or something similar to represent an error).
(Side-note: I don't know what # is used for in Objective-C. In C# that would stop the \u from being an escape sequence in the first place, but presumably Objective-C is slightly different, given the code in your question and the output.)