Three.js unsigned byte attribute buffer - memory

Is it possible to use a Three.js BufferAttribute with unsigned bytes instead of floats?
I am rendering a point cloud using the Potree library, which is based on Three.js, and I am looking for ways to save GPU memory. The library's binary format uses 3 * 4 bytes for each point's position, 4 * 1 byte for RGBA and 2 * 1 byte for the normal (oct16 encoded), for a total of 18 bytes per point. The (perceived) problem is that the data is unpacked on the client side and everything is represented as a 32-bit float, resulting in 36 bytes per point. Why not use the data without converting it to floats first? Is this a limitation of Three.js or an issue with Potree? And if the problem is on Three's side, are there good reasons behind it (e.g. browser compatibility)?
Coming from OpenGL and bare-bones WebGL programming, it seems really wasteful to use almost 2x more memory than needed...
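To make the arithmetic in the question concrete, here is a small C sketch of the two per-point sizes. The function and constant names are made up for illustration; the component counts and widths are the ones quoted above.

```c
#include <stddef.h>

/* Per-point attribute layout taken from the question:
   3 position components, 4 colour components, 2 normal components. */
enum { POS_COMPONENTS = 3, RGBA_COMPONENTS = 4, NORMAL_COMPONENTS = 2 };

/* Packed layout: 4-byte positions, 1-byte colour and normal components. */
static size_t packed_bytes_per_point(void) {
    return POS_COMPONENTS * 4 + RGBA_COMPONENTS * 1 + NORMAL_COMPONENTS * 1;
}

/* Unpacked layout: every component widened to a 4-byte float. */
static size_t float_bytes_per_point(void) {
    return (POS_COMPONENTS + RGBA_COMPONENTS + NORMAL_COMPONENTS) * 4;
}
```

Multiplying either figure by the number of points gives the buffer size, so the float layout costs exactly twice the packed one.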

Related

JVM 64-bit different memory usages?

I've done some reading, but I'm not entirely sure about one thing: how much memory would the following use on a 64-bit JVM? (Sorry if this is a stupid question, but I'm a bit confused and don't know much about this.)
MyObject[] myArray; - I know an array takes up 24 bytes, but how much will each element in this array take? Is every element an object reference, meaning 8 bytes per element? If not, how do I know how many bytes each element in this array needs?
Normally, that is, when using heap sizes of less than 32 GB, the 64-bit JVM uses compressed oops, which store object pointers as 32-bit integers (scaled by three bits when used, since all objects are aligned to 8 bytes; see the link for details), so each element would actually only use 4 bytes.
If you use more than 32 GB of heap or otherwise turn off compressed oops, however, then each element will indeed use 8 bytes.
Also, I suspect that your statement on the array header being 24 bytes is wrong. To begin with, when compressing oops, the class reference in the header is also compressed, and the identity-hash-code and array length fields are 32-bit to begin with, so I suspect it is more likely to use 12 bytes. Even when using full-length oops, it should still only take 16 bytes. I can't find any hard source verifying either figure, though. In general, it should be said that HotSpot does not even use a fixed-size object header but one that varies in size depending on various circumstances of the object. This article describes some of those circumstances.
That is on the HotSpot JVM, at least. Since the JLS doesn't specify how much memory references (or primitives) occupy, a reference could, theoretically, be any size on any given JVM, though 8 bytes is, of course, the most likely implementation choice.
Here is good information on how to calculate the memory usage of a Java array.
For example, let's consider a 10x10 int array. Firstly, the "outer" array has its 12-byte object header followed by space for the 10 elements. Those elements are object references to the 10 arrays making up the rows. That comes to 12+4*10=52 bytes, which must then be rounded up to the next multiple of 8, giving 56. Then, each of the 10 rows has its own 12-byte object header, 4*10=40 bytes for the actual row of ints, and again, 4 bytes of padding to bring the total for that row to a multiple of 8. So in total, that gives 11*56=616 bytes. That's a bit bigger than if you'd just counted on 10*10*4=400 bytes for the hundred "raw" ints themselves.
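The 10x10 calculation can be written out as a short C sketch. This only models the arithmetic; the 12-byte header, 4-byte references and 8-byte alignment are the assumptions from the answers above (compressed oops), passed in as parameters, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Round a size up to the next multiple of 8 (the object alignment assumed above). */
static size_t align8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* Size of one array object: header plus n elements, padded to 8 bytes. */
static size_t array_bytes(size_t header, size_t elem_size, size_t n) {
    return align8(header + elem_size * n);
}

/* Total for an int[rows][cols]: one outer array of references
   plus `rows` inner arrays of 4-byte ints. */
static size_t int2d_bytes(size_t rows, size_t cols, size_t header, size_t ref_size) {
    size_t outer = array_bytes(header, ref_size, rows);
    size_t inner = array_bytes(header, 4, cols);
    return outer + rows * inner;
}
```

With a 12-byte header and 4-byte references, int2d_bytes(10, 10, 12, 4) reproduces the 616 bytes worked out above. Passing 16 and 8 instead models the uncompressed case under the same assumptions.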

Separating decimal value to least & most significant byte

I'm working on some 65802 code (don't ask :P) and I need to separate a 16-bit value into two 8-bit bytes to store it in memory. How would I go about this?
EDIT:
Also, how would I take two similar bytes and combine them into one 16-bit value?
EDIT:
To clarify, many of the solutions available on the internet are not possible with the programming language I'm using (a version of MS-BASIC). I can't take a modulo, and I can't shift left or right. I've figured out that I can put the two bytes together by multiplying the high byte by 256 and adding it to the low byte, but how would I reverse the process?
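One way to reverse the multiply-and-add, sketched in C but using only integer division, multiplication and subtraction (so no modulo and no shifts): the high byte is the value divided by 256 with the fraction dropped, and the low byte is whatever is left after subtracting the high byte times 256. In MS-BASIC the truncating division would presumably be written with INT; the function names here are made up for illustration.

```c
#include <stdint.h>

/* High byte: integer division by 256 (in BASIC, something like INT(V / 256)). */
static uint8_t high_byte(uint16_t v) {
    return (uint8_t)(v / 256);
}

/* Low byte: subtract the high part back out; no modulo or shift needed. */
static uint8_t low_byte(uint16_t v) {
    return (uint8_t)(v - (v / 256) * 256);
}

/* Recombine, exactly as described in the question. */
static uint16_t combine(uint8_t hi, uint8_t lo) {
    return (uint16_t)(hi * 256 + lo);
}
```

For example, 0x1234 (4660 decimal) splits into 0x12 (18) and 0x34 (52), and 18 * 256 + 52 gives 4660 again.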

Reading a bit from memory

I'm looking into reading single bits from memory (RAM, hard disk). My understanding was that one cannot read less than a byte.
However, I read someone saying it can be done with assembly.
I want the bandwidth usage to be as low as possible, and the data to be retrieved is not sequential, so I cannot read a byte and convert it to 8 bits.
I don't think the CPU will read less than the size of a cache line from RAM (64 bytes on recent Intel chips). From disk, the minimum is typically 4 KiB.
Reading a single bit at a time is neither possible nor necessary, since the data bus is much wider than that.
You cannot read less than a byte from any PC or hard disk that I know of. Even if you could, it would be extremely inefficient.
Some machines do memory-mapped port I/O that can read/write less than a byte to the port, but it still shows up as at least a byte when you get it.
Use the bitwise operators to pick off specific bits as in:
#include <stdbool.h>

int main(void) {
    unsigned char someByte = 0x3D; // In binary, 00111101
    bool flag = someByte & 1; // Get the first (lowest) bit: 1
    flag = someByte & 2; // Get the second bit: 0
    // And so on. The number after the & operator is a power of 2 if you want to isolate one bit.
    // You can also pick off several bits like so:
    int value = someByte & 3; // Assume the lower 2 bits are interesting for some reason; here value is 1
    return 0;
}
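For non-sequential bits, the same masking idea extends to a whole buffer: fetch the byte that contains the bit, then isolate it. A minimal sketch, assuming bit 0 is the least significant bit of the first byte (that numbering and the name get_bit are choices made here, not something from the question):

```c
#include <stddef.h>
#include <stdint.h>

/* Return bit `index` of a byte buffer, where bit 0 is the least
   significant bit of buf[0] and bit 8 is the lowest bit of buf[1]. */
static int get_bit(const uint8_t *buf, size_t index) {
    uint8_t byte = buf[index / 8];      /* the smallest unit we can actually fetch */
    return (byte >> (index % 8)) & 1;   /* shift the wanted bit down and mask it */
}
```

The hardware still moves at least a byte (in practice a cache line) per access; the function only hides that from the caller.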
In the 386/486 days, a memory chip was typically one bit wide (say, 1 meg by 1 bit), but you would have 8 chips, or some multiple of 8, one for each bit lane on the bus, and you could only read in widths of the bus. Today the memories are a byte wide, and you can only read in units of 32 or 64 bits or multiples of those. Even when you read a byte, most designs fetch the whole bus width, because isolating byte lanes all the way out to the memory adds unnecessary complication and cost. A byte read looks to most of the system like a 32- or 64-bit read; only as it approaches the edge of the processor (sometimes the physical pins, sometimes the edge of the core inside the chip) is the individual byte lane separated out and the other bits discarded. Having the cache on changes the smallest divisible read size from the memory: you will see a burst or block of reads.
It is possible to design a memory system that is 8 bits wide and reads 8 bits at a time, but why would you? Unless it is for an 8-bit processor, which probably couldn't take advantage of an 8-bit by 2 gig memory. DRAM is pretty slow anyway, something like 133 MHz (even your 1600 MHz memory only delivers short bursts as you read from slow parts; memory has not gotten faster in over 10 years).
Hard disks are similar but different. I think sectors are the smallest divisible unit; you have to read or write in those units. So when reading, you have a memory cycle on the processor, no different than going to a memory, and, depending on the controller, either before you do the read or as a result of it, a sector is read off the disk into a buffer, not unlike a cache line read. Then your memory cycle to the buffer in the disk controller either causes a bus-width read that the processor divides up or, if the bus adds the complexity to isolate byte lanes, isolates a byte. But nobody isolates bit lanes. (I say the word nobody and someone will come back with an exception...)
Most of this is well documented and not hard to find. For ARM platforms, look for the AMBA and/or AXI specifications, which can be downloaded freely. Documents for the various bridges, PCIe controllers, and disk controllers are all available for PCs and other platforms. It still boils down to an address bus and a data bus (or one outgoing and one incoming data bus) and some control signals that indicate the access type. Some buses have byte lane enables, which are generally for writes, not reads.
If I want to write only a byte to a DRAM in a modern 64-bit system, I DO have to tell everyone, almost all the way out to the DRAM, what I want to write. To write a byte on a memory module which must be accessed 64 bits at a time, at a minimum a 64-bit read happens into a temporary place, either the cache or the memory controller; then the byte to be written modifies the specific byte within the 64-bit word; then that 64-bit quantity is, eventually, written back to the memory module itself. You can do this using a combination of the address bits and a few control signals, or you can just provide 8 byte lane enables and ignore the lower address bits.
Hard disk, same deal: you have to read a sector, modify one byte, then eventually write the whole sector at a time. With flash and EEPROM, you can only write zeros (from the programmer's perspective); you erase to ones (from the programmer's perspective; it is actually a zero in the logic, as there is an inversion), and a write has to be a sector at a time. Sectors are typically 64, 128, or 256 bytes.
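The read-modify-write sequence for a single-byte write can be modelled in software. This is only a sketch of the idea in C, not how any particular memory controller does it; the function name and the lane numbering (lane 0 = least significant byte) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a byte write on a 64-bit-wide memory:
   read the whole word, replace one byte lane, write the whole word back. */
static void write_byte_rmw(uint64_t *word, unsigned lane, uint8_t value) {
    assert(lane < 8);                               /* 8 byte lanes in a 64-bit word */
    uint64_t tmp  = *word;                          /* 1. full-width read */
    uint64_t mask = (uint64_t)0xFF << (lane * 8);   /* 2. select the byte lane */
    tmp = (tmp & ~mask) | ((uint64_t)value << (lane * 8));
    *word = tmp;                                    /* 3. full-width write-back */
}
```

The memory only ever sees 64-bit accesses; the single byte exists as a distinct thing only inside the temporary copy.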

Why is the smallest value that can be stored a byte (8 bits) and not a bit (1 bit)?

Why is the smallest value that can be stored in memory a byte (8 bits) and not a bit (1 bit)?
Even booleans are stored as bytes. Will we ever bump the smallest unit to 32 or 64 bits, like registers on the CPU?
EDIT: To clarify, as many answers seemed confused about the nature of the question: this question is about why a byte isn't 7-bit, 1-bit, 32-bit, etc. (not why smaller primitives must fit within the hardware's byte at a minimum). Is the 8-bit byte simply historical, given that some hardware has 10-bit bytes, for example? Or is there a mathematical reason 8 bits is ideal versus, say, 10 bits for general processing?
The hardware is built to read data in blocks (bytes, later words and dwords). This provides greater efficiency than accessing individual bits and also offers more addressing range. So most data is aligned to at least a byte boundary. There exist encodings that operate on bit sequences rather than bytes, but they are quite rare.
Nowadays data is most often aligned to a dword (32-bit) boundary anyway. Moreover, some hardware (ARM, for example) can't access misaligned multibyte variables; i.e., a 16-bit word can't "cross" a dword boundary, or an exception will be thrown.
Because computers address memory at the byte level, so anything smaller than a byte is not addressable.
The underlying methods of processor access are limited to the size of the smallest usable register. On most architectures, that size is 8 bits. You can use smaller portions of these; for instance, C has the bitfield feature in structs that will allow combining fields that only need to be certain bit lengths. Access will still require that the whole byte be read.
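As an illustration of the bitfield feature mentioned above, here is a minimal C sketch. The struct and its field names are hypothetical; how the fields are laid out, and what sizeof reports for the struct, is implementation-defined.

```c
#include <stdbool.h>

/* Three small fields sharing one storage unit. The exact layout and
   sizeof(struct flags) are implementation-defined. */
struct flags {
    unsigned int ready : 1;   /* 1 bit:  0..1  */
    unsigned int mode  : 3;   /* 3 bits: 0..7  */
    unsigned int level : 4;   /* 4 bits: 0..15 */
};

static struct flags make_flags(bool ready, unsigned mode, unsigned level) {
    struct flags f;
    f.ready = ready;          /* the compiler emits the masking and shifting */
    f.mode  = mode & 7u;      /* keep only what fits in 3 bits */
    f.level = level & 15u;    /* keep only what fits in 4 bits */
    return f;
}
```

Reading f.mode still loads at least the containing byte; the compiler then masks and shifts to extract the three bits.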
Some older exotic architectures actually did have a different "word size." In these machines, 10 bits might be the common size.
Lastly, processors are almost always backwards compatible. Intel, for instance, has maintained complete instruction compatibility from the 386 on up. If you take a program compiled for the 386, it will still run on an i7 processor. Changing the word size would break compatibility. So while it is possible, no manufacturer will ever do it.
Assume that we have a native language that consists of 2 characters, such as a and b.
To distinguish the two characters we need at least 1 bit: for example, 0 to represent the character a and 1 to represent the character b.
Likewise, if we count up the letters, special characters, and symbols, there are 128 characters, and to distinguish one character from another you need log2(128) = 7 bits, plus an 8th bit for transmission (e.g. as a parity bit).
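The log2 step generalises to any alphabet size: you need the smallest number of bits whose range covers all symbols. A small C sketch (the function name is made up for illustration, and it does not guard against overflow for very large inputs):

```c
/* Smallest number of bits b such that 2^b >= symbols,
   i.e. ceil(log2(symbols)) for symbols >= 1. */
static unsigned bits_needed(unsigned long symbols) {
    unsigned bits = 0;
    unsigned long capacity = 1;      /* distinct values representable with `bits` bits */
    while (capacity < symbols) {
        capacity *= 2;
        bits++;
    }
    return bits;
}
```

Two symbols need 1 bit and 128 symbols need 7, matching the figures above; a 129th symbol would already push the count to 8.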

Reed-Solomon encoder for embedded application (memory-efficient)

I am looking for a very memory-efficient (like max. 500 bytes of memory for lookup tables etc.) implementation of a Reed-Solomon encoder for use in an embedded application.
I am interested in coding blocks of 10 bytes with 5 bytes of parity. Speed is of little importance.
Do you know any freely available implementations that I can use for this purpose?
Thanks in advance.
Starting here:
http://www.eccpage.com/rs.c
You can pre-compute alpha_to, index_of, and gg.
For the case in the example program, that is 16+16+7 ints (do they need to be ints, or will bytes work?), or 156 bytes.
That example has 9 ints of data and 6 ints of ECC, or 15 total; if these are 4-byte ints, that is another 60 bytes, 216 total.
Or 54 bytes if this could be done with bytes only. I seem to remember it works with bytes.
The encoder routine itself has a modulo, but you can probably replace that with an AND, depending on your lengths. If your embedded processor has a divide, then that is probably not going to hurt you anyway. Otherwise the encoder routine is quite simple. I am thinking that you may approach 500 bytes with the tables, data, and code.
I don't remember how to get from the 9 data and 6 ECC of the example to the 10 and 5 you are looking for. Hopefully the code in the link above will give you a head start on what you are looking for.
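Since speed is of little importance, one option is to drop the lookup tables entirely and do the field multiplication bit by bit. The following is a minimal sketch of that idea, not the code from the link: it assumes byte symbols in GF(2^8) with the field polynomial 0x11D and generator roots alpha^0 to alpha^4 (alpha = 2), which are common choices but assumptions here, so the parity will not match another implementation unless those parameters agree. All names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define RS_DATA   10   /* data bytes per block (from the question) */
#define RS_PARITY 5    /* parity bytes per block (from the question) */

/* Multiply in GF(2^8) by shift-and-add, reducing by the assumed
   field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D). No tables. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1D : 0));
        b >>= 1;
    }
    return p;
}

/* Build g(x) = (x + a^0)(x + a^1)...(x + a^(RS_PARITY-1));
   g[i] is the coefficient of x^i, and g[RS_PARITY] ends up as 1. */
static void rs_generator(uint8_t g[RS_PARITY + 1]) {
    uint8_t root = 1;
    memset(g, 0, RS_PARITY + 1);
    g[0] = 1;
    for (int i = 0; i < RS_PARITY; i++) {
        for (int j = i + 1; j > 0; j--)          /* multiply g by (x + root) */
            g[j] = g[j - 1] ^ gf_mul(g[j], root);
        g[0] = gf_mul(g[0], root);
        root = gf_mul(root, 2);
    }
}

/* Systematic encoding: parity = (data(x) * x^RS_PARITY) mod g(x).
   data[0] is the highest-order symbol; parity[j] is the x^j coefficient. */
static void rs_encode(const uint8_t data[RS_DATA], uint8_t parity[RS_PARITY]) {
    uint8_t g[RS_PARITY + 1];
    rs_generator(g);
    memset(parity, 0, RS_PARITY);
    for (int i = 0; i < RS_DATA; i++) {
        uint8_t fb = data[i] ^ parity[RS_PARITY - 1];   /* LFSR feedback */
        for (int j = RS_PARITY - 1; j > 0; j--)
            parity[j] = parity[j - 1] ^ gf_mul(fb, g[j]);
        parity[0] = gf_mul(fb, g[0]);
    }
}

/* Evaluate the whole codeword polynomial at x (Horner's rule).
   A valid codeword evaluates to 0 at every root of g(x). */
static uint8_t rs_eval(const uint8_t data[RS_DATA],
                       const uint8_t parity[RS_PARITY], uint8_t x) {
    uint8_t acc = 0;
    for (int i = 0; i < RS_DATA; i++)
        acc = gf_mul(acc, x) ^ data[i];
    for (int j = RS_PARITY - 1; j >= 0; j--)
        acc = gf_mul(acc, x) ^ parity[j];
    return acc;
}
```

The working state is 6 bytes for the generator and 5 for the parity, and the generator could be computed once and stored as constants. rs_eval is only there to check the result and would not be needed on the device.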

Resources