I am trying to read a file that has two DWORDs for the FILETIME (this is a prefetch file).
I read at offset 0x81 (0x80 + 1 because of 1-index in lua). How do I go about taking the 8 bytes and converting into a filetime using only lua?
Starting at 0x80 in my hex editor, I have:
FB54B341B70CCf01
Needs to correlate to 01/08/2014
What is FILETIME
The Windows platform defines FILETIME to be a 64-bit integer "count of 100ns intervals since January 1, 1601 UTC".
You will have at least two challenges with dealing with FILETIME in Lua.
First, a FILETIME is a 64-bit integer, and Lua (before 5.3) stores all numbers internally as IEEE double precision, which only supports 53 bits of precision. To the precision of the envelope I just scribbled on, you need about 57 significant bits to name any time today as a FILETIME.
(Aside: I estimated that by noticing that there are about 1e7*pi seconds in a year, 1e7 100ns ticks in a second, and today is about 413 years after the FILETIME epoch. So dates in 2014 need about log2(413e14 * pi) bits, which is a little under 57 and so rounds up to 57 bits.)
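To make the precision problem concrete, here is a minimal C sketch that pushes the sample timestamp from the question through a double and back, which is roughly what a double-only Lua has to do. The function name `through_double` is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round-trips a 64-bit tick count through a double, the way a Lua build
   with double-only numbers would have to store it. */
uint64_t through_double(uint64_t ticks)
{
    /* Keep the value small enough that converting back cannot overflow;
       FILETIMEs for current dates are far below this limit. */
    assert(ticks < (UINT64_C(1) << 63));
    double d = (double)ticks;   /* rounds to 53 significant bits */
    return (uint64_t)d;
}
```

With the usual round-to-nearest conversion, 0x01CF0CB741B354FB comes back as 0x01CF0CB741B35500: the low four bits are lost, an error of 5 ticks (0.5 microseconds). That is harmless for a date, but it means the value no longer compares equal to the original.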
Second, pure Lua doesn't have easy-to-use functions for converting binary data structures to and from native Lua data types. It isn't difficult to build such functions out of string.byte() and string.sub(), and that is safe to do since Lua strings are 8-bit clean. But it is something you have to build yourself or find from a third-party source.
Be aware that although there are binary structure libraries out there, many of them provide only limited support for 64-bit integers due to the limitations of Lua numbers. You may be better served by a hand-crafted module in C that stores a FILETIME in a userdata and provides suitable operators to allow them to be compared, converted to and from a string, and so forth.
Your Example
Starting at 0x80 in my hex editor, I have:
FB54B341B70CCf01
Needs to correlate to 01/08/2014
Windows on a PC is a little-endian platform. That means that values are stored with the least-significant byte at the lowest address. So we can rewrite your sample timestamp to be more readable by reversing the bytes:
01CF0CB741B354FB
As expected, the 57th bit (counting from 1) is the most significant set bit, so this value is plausible for this century.
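As a rough sketch of the whole conversion in C (the kind of helper a hand-written module might contain), the following assembles the eight little-endian bytes and converts the tick count to Unix seconds. The function names are hypothetical; the constant 11644473600 is the number of seconds between 1601-01-01 and 1970-01-01.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Seconds between the FILETIME epoch (1601) and the Unix epoch (1970). */
#define FILETIME_UNIX_DIFF_SECS INT64_C(11644473600)

/* Assemble 8 little-endian bytes into a 64-bit tick count. */
uint64_t filetime_from_le_bytes(const unsigned char *bytes)
{
    assert(bytes != NULL);
    uint64_t ticks = 0;
    for (int i = 7; i >= 0; i--) {
        ticks = (ticks << 8) | bytes[i];   /* highest address is most significant */
    }
    return ticks;
}

/* Convert 100 ns ticks since 1601 into whole seconds since 1970. */
int64_t filetime_to_unix_seconds(uint64_t ticks)
{
    return (int64_t)(ticks / 10000000u) - FILETIME_UNIX_DIFF_SECS;
}
```

For the bytes FB 54 B3 41 B7 0C CF 01 this gives 1389215943 seconds, which is 2014-01-08 21:19:03 UTC and matches the date expected in the question.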
Related
I'm having trouble converting to 16 bit int values from raw data I'm receiving over Ethernet.
For example, I might receive this:
\x00\x0A\x00\x00\x00\x09\x01\x10\x00\x01\x00\x10\x02\x00\x00
I need to take two of these raw data bytes and convert them to a 16 bit unsigned value.
So far I've tried tonumber(), but I can't find a way to make it combine the 2 bytes. I've seen some examples on here that use string.gsub() to do the conversion, but they all deal with an ASCII representation of the raw data.
TIA
Use string.byte() on a single character to turn it into its numerical value, then just multiply the more significant one by 256 (or if you're on Lua 5.3 or newer, shift it left by 8 bits), then add them.
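The same multiply-or-shift idea, sketched in C for comparison. The helper names `u16_from_be` and `u16_from_le` are invented for this example; which one applies depends on the byte order your protocol uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine two raw bytes, most significant byte first (network order). */
uint16_t u16_from_be(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Combine the same two bytes, least significant byte first. */
uint16_t u16_from_le(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

For the first pair in the sample data (00 0A), the big-endian reading is 10 and the little-endian reading is 2560.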
If you're on Lua 5.3 or newer, try also string.unpack. You can select the byte order with < and >:
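-- 16 bytes of raw data (one byte added so the length is even)
s="\x00\x0A\x00\x00\x00\x09\x01\x10\x00\x01\x00\x10\x02\x00\x00\x00"
print("<",">")
-- step through the string two bytes at a time
for i=1,#s,2 do
  -- "I2" is an unsigned 2-byte integer; "<" is little-endian, ">" is big-endian
  print((string.unpack("<I2",s,i)),(string.unpack(">I2",s,i)))
end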
Looking at this question on Quora ("Are data stored in registers and memory in hex or binary?"), I think the top answer is saying that data persistence is achieved through physical properties of hardware and is not directly relatable to either binary or hex.
I've always thought of computers as 'binary', but have just realized that that only applies to the usage of components (magnetic up/down or an on/off transistor) and not necessarily the organisation of, for example, memory contents.
i.e. you could, theoretically, create an abstraction in memory that used 'binary components' but that wasn't binary, like this:
100000110001010001100
100001001001010010010
111101111101010100001
100101000001010010010
100100111001010101100
And then recognize that as the (badly-drawn) image of 'hello', rather than the ASCII encoding of 'hello'.
An answer on SO (What's the difference between a word and byte?) mentions that processors can handle 'words', i.e. several bytes at a time, so while information representation has to be binary I don't see why information processing has to be.
Can computers do arithmetic on hex directly? In this case, would the internal representation of information in memory/registers be in binary or hex?
Perhaps "digital computer" would be a good starting term and then from there "binary digit" ("bit"). Electronically, the terms for the values are sometimes "high" and "low". You are right, everything after that depends on the operation. Most of the time, groups of bits are operated on together. Commonly groups are 1, 8, 16, 32 and 64 bits. The meaning of the bits depends on the program but some operations go hand-in-hand with some level of meaning.
When the meaning of a group of bits is not known or important, humans like to be able to discern the value of each bit. Binary could be used, but more than 8 bits is hard to read. Although it is rare to operate on groups of 4 bits, hexadecimal is much more readable and is generally used regardless of the number of bits. Sometimes octal is used, but that's based on contexts where there is some meaning to a subgrouping of 3 bits or an avoidance of digits beyond 9.
Integers can be stored in two's complement format, and CPUs often have instructions for such integers. One such operation is negation. For a group of 8 bits, it would map 1 to -1, … 127 to -127, and -1 to 1, … -127 to 127, and 0 to 0 and -128 to -128. Decimal is likely the most valuable to humans here, not base 256, base 2 or base 16. In unsigned hexadecimal, that would be 01 to FF, …, 00 to 00, 80 to 80.
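The negation mapping above can be checked with a small C sketch. The names `negate8` and `as_signed8` are made up for this illustration; the rule used is the standard "invert all bits, add one, keep the low 8 bits".

```c
#include <stdint.h>

/* Two's complement negation of an 8-bit pattern: invert all bits, add one,
   and keep only the low 8 bits. */
uint8_t negate8(uint8_t bits)
{
    return (uint8_t)(~bits + 1u);
}

/* Read the same 8 bits as a signed value in the range -128..127. */
int as_signed8(uint8_t bits)
{
    return bits < 128 ? (int)bits : (int)bits - 256;
}
```

So 01 maps to FF (that is, -1), 00 maps to 00, and 80 maps to itself, because +128 has no 8-bit two's complement representation.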
For an intro to how a CPU might do integer addition on a group of bits, see adder circuits.
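For a feel of what such a circuit does, here is a software imitation of an 8-bit ripple-carry adder in C. It is only a sketch (real hardware computes the bits with gates, and often with faster carry schemes), and the function name `ripple_add8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two 8-bit values one bit at a time, the way a chain of full adders
   would. The carry out of the top bit is stored in *carry_out. */
uint8_t ripple_add8(uint8_t a, uint8_t b, unsigned *carry_out)
{
    assert(carry_out != NULL);
    unsigned carry = 0;
    uint8_t sum = 0;
    for (int i = 0; i < 8; i++) {
        unsigned x = (a >> i) & 1u;
        unsigned y = (b >> i) & 1u;
        unsigned s = x ^ y ^ carry;              /* sum bit of a full adder */
        carry = (x & y) | (carry & (x ^ y));     /* carry into the next bit */
        sum |= (uint8_t)(s << i);
    }
    *carry_out = carry;
    return sum;
}
```

Only XOR, AND and OR on single bits are used, which is why the same circuit works regardless of whether a human later reads the result in decimal, hex or binary.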
Other number formats include IEEE-754 floating point and binary-coded decimal.
I think you understand that digital circuits are binary. So, based on the above, yes, operations do operate on a higher conceptual level despite the actual storage.
When we say a specific architecture is either little-endian or big-endian, we are referring to whether numerical significance is stored from left-to-right or right-to-left in memory. My question is: does this ordering refer to how bits are ordered in a byte, or how bytes are ordered in memory?
For example, consider the number 6000=1770h=0001011101110000b. If both bits in a byte and byte in memory are little-endian, this would be stored as
00001110 11101000 = 0E E8,
if bits in a byte were big-endian, but bytes in memory were little-endian, this would be stored as (for what it's worth, this happens to be how Visual Studio seems to be telling me that memory is organized in x64 architecture)
01110000 00010111 = 70 17,
if bits were little-endian, but bytes were big-endian, this would be stored as
11101000 00001110 = E8 0E,
and finally, if both bits and bytes were big-endian, this would be stored as
00010111 01110000 = 17 70
(Hopefully I did that right.)
So then, what do the terms "little-endian" and "big-endian" actually refer to? Do the terms refer to the ordering of bits in a byte, or the ordering of bytes in memory, or both? Furthermore, if VS tells me that, for example, 7C is 'in' a given byte, do they mean that the bits that make up that byte in computer memory are literally 0111 1100, or do they just mean that the value stored in that byte is 7Ch=124, which may or may not actually be represented as 7C=01111100 depending on whether or not the underlying architecture happens to be little-endian?
The ordering of bits in a byte is invisible. Since you can't address individual bits, there would be no difference between the two cases. However, you can address individual bytes, so there it does make a difference.
If we're expressing 6000 in byte-addressable memory, the high byte is 23 decimal (6000 divided by 256) and the low byte is 112 decimal (6000 mod 256). We could store this as 23,112 or 112,23. There are no other options. Only the ordering of bytes is an open choice, and this is what endianness refers to.
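The two possible layouts can be written out explicitly in C. This is a small sketch with invented helper names (`store_u16_le`, `store_u16_be`); it uses division and remainder exactly as described above, so it behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian: the low byte goes to the lower address. */
void store_u16_le(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(value % 256);
    out[1] = (unsigned char)(value / 256);
}

/* Big-endian: the high byte goes to the lower address. */
void store_u16_be(uint16_t value, unsigned char out[2])
{
    assert(out != NULL);
    out[0] = (unsigned char)(value / 256);
    out[1] = (unsigned char)(value % 256);
}
```

Storing 6000 yields the bytes 112,23 (70 17 in hex) in the first case and 23,112 (17 70) in the second.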
In memory, little-endian or big-endian is not so much left-to-right or right-to-left issue as one of addressing. In little endian, the least significant portion of data is stored in the lower addressed locations and the reverse with big endian.
The ordering occurs independently at 2 levels. As most machines address more than 1 bit at a time (recall some graphic CPUs that did have bit addresses), the address will locate a group of bits, typically 8 bits. So if the bits at address 10 are less significant than the bits at address 11, it is a little endian machine. This is generalized as byte endian-ness. The processor's characteristics define this.
The endian-ness of the group of bits, the bit endian-ness, is significant if there is a way to address them in some fashion. Some processors provide operations that use a bit level address within the byte (or word). If your programming language directly allows you to use that or hides that is another question.
In C there are bit fields such as
union u {
    unsigned char uc;
    struct {
        unsigned int a : 1;   /* 1-bit field */
        unsigned int b : 7;   /* 7-bit field */
    } s;
};
This code is non-portable because of bit endian-ness. After u.uc = 7, u.s.b may be 7 (if a occupies the top bit), 3 (if a occupies the bottom bit), or something else. Typically the byte endian-ness and bit endian-ness are the same, but the compiler decides the layout in the above example.
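One common way to avoid depending on the compiler's layout is to pick the bit positions yourself and use shifts and masks. The sketch below assumes, purely for illustration, that field a is bit 0 and field b is bits 1 to 7; the function names are hypothetical.

```c
#include <stdint.h>

/* Layout chosen for this sketch (an assumption): a = bit 0, b = bits 1..7. */

/* Extract the 1-bit field a. */
unsigned field_a(uint8_t byte)
{
    return byte & 0x01u;
}

/* Extract the 7-bit field b. */
unsigned field_b(uint8_t byte)
{
    return (byte >> 1) & 0x7Fu;
}

/* Build a byte from the two fields; excess bits in the inputs are dropped. */
uint8_t pack_fields(unsigned a, unsigned b)
{
    return (uint8_t)((a & 0x01u) | ((b & 0x7Fu) << 1));
}
```

With this fixed layout the byte value 7 always decodes to a = 1 and b = 3, whatever the compiler or CPU.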
Bit endian-ness is also significant in serial communication. As 1 bit is sent/received sequentially, its construction to/from memory needs endian-ness definition.
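As a small illustration of bit order, the following C sketch reverses the bits of a byte, which is the difference between shifting a byte out least-significant-bit first and most-significant-bit first. The name `reverse_bits8` is invented for this example.

```c
#include <stdint.h>

/* Returns the byte with its bit order reversed (bit 0 becomes bit 7, etc.). */
uint8_t reverse_bits8(uint8_t value)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        /* Take bits from the low end of value, push them in at the low end
           of out, so the first bit taken ends up at the top. */
        out = (uint8_t)((out << 1) | ((value >> i) & 1u));
    }
    return out;
}
```

Applied to the two bytes of 6000 (70 and 17 in hex) it gives 0E and E8, the bit-reversed patterns that appear in the question.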
Conclusion:
Big-endian and little endian most often refer to the "byte" level addressing. The endian-ness of the bit is typically either the same or of select importance to the programmer.
BTW, your example "If both bits in a byte and byte in memory are little-endian, this would be stored as
00001110 11101000 = 0E E8"
is, I would suggest, not correct, as the left side and right side use different endian-ness. Had you used the same endian-ness, you might conclude
00001110 11101000 = 07 71
For fun consider:
01000000 (big endian) has value "sixty-four" (A big endian word)
10110000 (little endian) has value "thirteen". (Thirteen is little endian word)
I've been wondering about this for a long time since I've never had a "formal" education in computer science (I'm in high school), so please excuse my ignorance on the subject.
On a platform that supports the three types of integers listed in the title, which one's better and why? (I know that every kind of int has a different length in memory, but I'm not sure what that means or how it affects performance or, from a developer's view point, which one has more advantages over the other).
Thank you in advance for your help.
"Better" is a subjective term, but some integers are more performant on certain platforms.
For example, in a 32-bit computer (referenced by terms like 32-bit platform and Win32) the CPU is optimized to handle a 32-bit value at a time, and the 32 refers to the number of bits that the CPU can consume or produce in a single cycle. (This is a really simplistic explanation, but it gets the general idea across).
In a 64-bit computer (most recent AMD and Intel processors fall into this category), the CPU is optimized to handle 64-bit values at a time.
So, on a 32-bit platform, a 16-bit integer loaded into a 32-bit address would need to have 16 bits zeroed out so that the CPU could operate on it; a 32-bit integer would be immediately usable without any alteration, and a 64-bit integer would need to be operated on in two or more CPU cycles (once for the low 32-bits, and then again for the high 32-bits).
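The "low half, then high half" idea can be sketched in C using only 32-bit arithmetic. This is an illustration of the principle rather than what any particular compiler emits; the type `u64_halves` and the function names are hypothetical.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, as a 32-bit CPU would see it. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_halves;

/* Add two 64-bit values using 32-bit operations and a carry. */
u64_halves add_halves(u64_halves a, u64_halves b)
{
    u64_halves r;
    r.lo = a.lo + b.lo;                          /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo) ? 1u : 0u;    /* wrapped means carry out */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Join the halves into a native 64-bit value, for checking the result. */
uint64_t join_halves(u64_halves v)
{
    return ((uint64_t)v.hi << 32) | v.lo;
}
```

Two additions plus carry handling replace what a 64-bit CPU does in a single add instruction.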
Conversely, on a 64-bit platform, 16-bit integers would need to have 48 bits zeroed, 32-bit integers would need to have 32 bits zeroed, and 64-bit integers could be operated on immediately.
Each platform and CPU has a 'native' bit-ness (like 32 or 64), and this usually limits some of the other resources that can be accessed by that CPU (for example, the 3GB/4GB memory limitation of 32-bit processors). The 80386 processor family (and later x86 processors) made 32-bit the norm, but now companies like AMD and then Intel are making 64-bit the norm.
To answer your first question, the choice of a 16 bit vs a 32 bit vs a 64 bit integer depends on the context in which it is used. Therefore, you really can't say one is better than the other, per se. However, depending on the situation, one may be preferable to another. Consider this example. Let's say you have a database with 10 million users and you want to store the year they were born. If you create a field in your database with a 64 bit integer, then you have used 80 megabytes of your storage; whereas, if you were to use a 16 bit field, only 20 megabytes of your storage would get used. You can use a 16 bit field here because the year people are born is smaller than the largest 16 bit number. In other words 1980, 1990, 1991 < 65535, assuming your field is unsigned. All in all, it depends on the context. I hope this helps.
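The storage arithmetic in that example is easy to reproduce. The C sketch below uses two invented helpers, `column_bytes` and `fits_in_u16`, and ignores any per-row overhead a real database would add.

```c
#include <assert.h>
#include <stdint.h>

/* Raw bytes needed to store one integer of the given width per row. */
uint64_t column_bytes(uint64_t rows, unsigned bits_per_value)
{
    assert(bits_per_value % 8 == 0);   /* whole bytes only in this sketch */
    return rows * (bits_per_value / 8u);
}

/* True if the value can be stored in an unsigned 16-bit field. */
int fits_in_u16(long value)
{
    return value >= 0 && value <= UINT16_MAX;
}
```

For 10 million rows this gives 80,000,000 bytes at 64 bits and 20,000,000 bytes at 16 bits, and a birth year such as 1991 fits comfortably in the 16-bit field.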
A simple answer is to use the smallest one you KNOW will be safe for the range of possible values it will contain.
If you know the possible values are constrained to be smaller than a maximum-length 16-bit integer (e.g. the value corresponding to what day of the year it is - always <= 366) then use that. If you aren't sure (e.g. the record ID of a table in a database that can have any number of rows) then use Int32 or Int64 depending on your judgment.
Others can probably give you a better sense of the performance advantages depending on what programming language you are using, but the smaller types use less memory and hence are 'better' to use if you don't need larger ones.
Just for reference, a 16-bit integer means there are 2^16 possible values - generally represented as between 0 and 65,535. 32-bit values range from 0 to 2^32 - 1, or just over 4.29 billion values.
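These ranges follow directly from the bit width, as the C sketch below shows. The function names are made up for this example, and the signed variants assume the usual two's complement representation.

```c
#include <assert.h>
#include <stdint.h>

/* Largest value of an unsigned integer with the given number of bits. */
uint64_t unsigned_max(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Largest value of a two's complement signed integer of that width. */
int64_t signed_max(unsigned bits)
{
    return (int64_t)(unsigned_max(bits) >> 1);
}

/* Smallest (most negative) value of the same signed type. */
int64_t signed_min(unsigned bits)
{
    return -signed_max(bits) - 1;
}
```

For 16 bits that is 0 to 65,535 unsigned, or -32,768 to 32,767 signed; for 32 bits the unsigned maximum is 4,294,967,295.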
This question, "On 32-bit CPUs, is an 'integer' type more efficient than a 'short' type?", may add some more good information.
It depends on whether speed or storage should be optimized. If you are interested in speed and you are running SQL Server in 64 bit mode, then 64 bit keys are what you need. A 64 bit processor running in 64 bit mode is optimized to use 64 bit numbers and addresses. Likewise, a 64 bit processor running in 32 bit mode is optimized to use 32 bit numbers and addresses. For example, in 64 bit mode, all pushes and pops onto the stack are 8 bytes. Fetches from cache and memory are likewise optimized for 64 bit numbers and addresses.
The processor, running in 64 bit mode, may need more machine cycles to handle a 32 bit number, just as a processor running in 32 bit mode needs more machine cycles to handle a 16 bit number. The increases in processing time come for many reasons, but just think about the example of memory alignment: the 32 bit number may not be aligned on a 64 bit integral boundary, which means loading the number requires shifting and masking the number after loading it into a register. At the very least, every 32 bit number must be masked before each operation. We are talking at least halving the processor's effective speed while handling 32 or 16 bit integers in 64 bit mode.
Here is a simple explanation for novice programmers. A bit is either a 0 or a 1.
A 16 bit int is an integer represented by a string of 16 bits (16 0's and 1's).
A 32 bit int is an integer represented by a string of 32 bits (32 0's and 1's).
A 64 bit int is an integer represented by a string of 64 bits (64 0's and 1's).
Examples to drive those concepts home:
an example of a 16-bit integer would be 0000000000000110 which equals the int 6
an example of a 32-bit integer would be 00000000000000000100001000100110 which equals the int 16934.
an example of a 64-bit integer would be 0000100010000000010000100010011000000000000000000100001000100110 which equals the int 612562280298594854.
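To check these examples yourself, a string of 0's and 1's can be turned into a number by doubling and adding one digit at a time. The C sketch below does this with an invented function name, `parse_binary`; the standard `strtoull` with base 2 would work as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts a string of up to 64 binary digits into its integer value.
   Parsing stops at the first character that is not '0' or '1'. */
uint64_t parse_binary(const char *digits)
{
    assert(digits != NULL);
    uint64_t value = 0;
    for (const char *p = digits; *p == '0' || *p == '1'; p++) {
        value = (value << 1) | (uint64_t)(*p - '0');   /* double, then add digit */
    }
    return value;
}
```

The three bit strings above evaluate to 6, 16934 and 612562280298594854 respectively.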
You can represent a larger number of integers with 64 bits than you can 32 bits than you can 16 bits. So the benefit of using fewer bits is you save space on the machine. The benefit of using more bits is you can represent more integers.
Why is the smallest value that can be stored a Byte(8bit) & not a Bit(1bit) in memory?
Even booleans are stored as bytes. Will we ever bump the smallest unit to 32 or 64 bits, like registers on the CPU?
EDIT: To clarify, as many answers seemed confused about the nature of the question: this question is about why a byte isn't 7-bit, 1-bit, 32-bit, etc. (not why smaller primitives must fit within the hardware's byte at minimum). Is the 8-bit byte simply historical, given that some hardware has 10-bit bytes, for example? Or is there a mathematical reason 8 bits is ideal vs. say 10 bits for general processing?
The hardware is built to read data in blocks (bytes, later words and dwords). This is more efficient than accessing individual bits, and it also offers more addressing range. So most data is aligned to at least a byte boundary. There are encodings that operate on bit sequences rather than bytes, but they are quite rare.
Nowadays data is most often aligned to a dword (32-bit) boundary anyway. Moreover, some hardware (ARM, for example) can't access misaligned multibyte variables, i.e. a 16-bit word can't "cross" a dword boundary; an exception will be thrown.
Because computers address memory at the byte level, so anything smaller than a byte is not addressable.
The underlying methods of processor access are limited to the size of the smallest usable register. On most architectures, that size is 8 bits. You can use smaller portions of these; for instance, C has the bitfield feature in structs that will allow combining fields that only need to be certain bit lengths. Access will still require that the whole byte be read.
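A small C sketch of what "using smaller portions" looks like in practice: eight boolean flags packed into one byte. Note that every access still reads or writes the whole byte. The helper names `set_flag` and `get_flag` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Returns a copy of byte with the flag at the given bit index set or cleared. */
uint8_t set_flag(uint8_t byte, unsigned index, int on)
{
    assert(index < 8);
    uint8_t mask = (uint8_t)(1u << index);
    return on ? (uint8_t)(byte | mask) : (uint8_t)(byte & (uint8_t)~mask);
}

/* Returns 1 if the flag at the given bit index is set, otherwise 0. */
int get_flag(uint8_t byte, unsigned index)
{
    assert(index < 8);
    return (byte >> index) & 1;
}
```

This saves memory at the cost of extra shift and mask instructions, which is one reason booleans are usually just given a byte each.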
Some older exotic architectures actually did have a different "word size." In these machines, 10 bits might be the common size.
Lastly, processors are almost always backwards compatible. Intel, for instance, has maintained complete instruction compatibility from the 386 on up. If you take a program compiled for the 386, it will still run on an i7 processor. Changing the word size would break compatibility. So while it is possible, no manufacturer will ever do it.
Assume we have a native language that consists of 2 characters, such as a and b.
To distinguish the two characters we need at least 1 bit: for example, 0 to represent a and 1 to represent b.
If we count the letters, special characters, and symbols, there are 128 characters, and to distinguish one character from another you need log2(128) = 7 bits, with the 8th bit used for transmission (for example, as a parity bit).
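The counting argument can be sketched in C. `bits_needed` computes the smallest number of bits that can distinguish a given number of symbols. `with_even_parity` shows one possible use of an 8th bit during transmission; even parity is an assumption made for this illustration, not something the answer specifies. Both function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of bits that can tell `symbols` different values apart. */
unsigned bits_needed(unsigned symbols)
{
    assert(symbols >= 1);
    unsigned bits = 0;
    while ((1ull << bits) < symbols) {
        bits++;
    }
    return bits;
}

/* Puts an even-parity bit in bit 7 of a 7-bit code (assumed scheme). */
uint8_t with_even_parity(uint8_t code7)
{
    assert(code7 < 128);
    unsigned ones = 0;
    for (unsigned i = 0; i < 7; i++) {
        ones += (code7 >> i) & 1u;
    }
    /* Set bit 7 only when the count of 1 bits is odd, making the total even. */
    return (uint8_t)(code7 | ((ones & 1u) << 7));
}
```

Two symbols need 1 bit and 128 symbols need 7 bits; adding the parity bit brings each character to 8 bits.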