So I have been wondering, in some machine learning models I saw there was this:
chunk_size = 16
Num_chunks = 16
And I know that chunk size is the largest unit of physical disc space, but why are there multiple instead of one 16*16? I know maximum size is 4TB but this is even at the small scale projects.
Related
How can size of an image effect training the model for this task?
My current training set holds images that are 2880 X 1800, but I am worried this may be too large to train. In total my sample size will be about 200-500 images.
Would this just mean that I need more resources (GPU,RAM, Distribution) when training my model?
If this is too large, how should I go about resizing? -- I want to mimic real-world photo resolutions as best as possible for better accuracy.
Edit:
I would also be using TFRecord format for the image files
Your memory and processing requirements will be proportional to the pixel size of your image. Whether this is too large for you to process efficiently will depend on your hardware constraints and the time you have available.
With regards to resizing the images there is no one answer, you have to consider how to best preserve information that'll be required for your algorithm to learn from your data while removing information that won't be useful. Reducing the size of your input images won't necessarily be a negative for accuracy. Consider two cases:
Handwritten digits
Here the images could be reduced considerably in size and maintain all the structural information necessary to be correctly identified. Have a look at the MNIST data set, these images are distributed at 28 x 28 resolution and identifiable to 99.7%+ accuracy.
Identifying Tree Species
Imagine a set of images of trees where individual leaves could help identify species. Here you might find that reducing the image size reduces small scale detail on leaf shape in a way that's detrimental to the model, but you might find that you get a similar result with a tight crop (which preserves individual leaves) rather than an image resize. If this is the case you may find that creating multiple crops from the same image gives you an augmented data set for training that considerably improves results (which is something to consider, if possible, given your training set is very small)
Deep learning models are achieving results around human level in many image classification tasks: if you struggle to identify your own images then it's less likely you'll train an algorithm to. This is often a useful starting point when considering the level of scaling that might be appropriate.
If you are using GPUs to train, this will def affect your training time. Tensorflow does most of the GPU allocation so you don't have to worry about that. But with big photos you will be experiencing long training time although your dataset is small. You should consider data-augmentation.
You could complement your resizing with the data-augmentation. Resize in equal dimensions and then perform reflection and translation (as in geometric movement)
If your images are too big, your GPU might run out of memory before it can start training because it has to store the convolution outputs on its memory. If that happens, you can do some of the following things to reduce memory consumption:
resize the image
reduce batch size
reduce model complexity
To resize your image, there are many scripts just one Google search away, but I will add that in your case 1440 by 900 is probably a sweet spot.
Higher resolution images will result in a higher training time and an increased memory consumption (mainly GPU memory).
Depending on your concrete task, you might want to reduce the image size in order to therefore fit a reasonable batch size of let's say 32 or 64 on the GPU - for stable learning.
Your accuracy is probably affected more by the size of your training set. So instead of going for image size, you might want to go for 500-1000 sample images. Recent publications like SSD - Single Shot MultiBox Detector achieve high accuracy values like an mAP of 72% on the PascalVOC dataset - with "only" using 300x300 image resolution.
Resizing and augmentation: SSD for instance just scales every input image down to 300x300, independent of the aspect ratio - does not seem to hurt. You could also augment your data by mirroring, translating, ... etc (but I assume there are built-in methods in Tensorflow for that).
As the title states, I am trying to add the same image with different offsets, stored in a list, to the accumulating image.
The current implementation performs this on a CPU, and with some intrinsics it can be quite fast.
However, with larger images (2048x2048) and many offsets in the list (~10000), the performance is not satisfactory.
My question is, can the accumulation of the image with different offsets be efficiently implemented on a GPU?
Yes, you can. The results will be likely much faster than on CPU. The trick is to not send the data for each addition, and to not even launch a new kernel for each addition: the kernel you have should do some decent number of offset additions at once, at least 16 but possibly a few hundred, depending on your typical list size (and you can have more than one kernel of course).
It is my understanding that PVR textures, made with texturetool, are simply compressed images. Therefore the difference lies in the file size.
Frankly, the file size doesn't interest me. What I want to know is, can a PVR texture consume less RAM than a normal .PNG texture? Or does this depend entirely on the texture format (like RGBA8888 etc)?
The essential question would be:
Given X.png and X.pvr, if I display both with texture format RGBA8888, will one consume less RAM than the other?
Yes, the PVR will consume less RAM at all stages — it's unpacked live by the GPU as it's accessed. There's no intermediate decompression.
A PVR-like approach used in digital video is that instead of storing RGB at every pixel, convert to YUV, then store Y at every pixel and U and V only twice per four-pixel block. So you go from 128 bits for the block to 64 bits. To get back to RGB the outputter reads the exactly correct Y and interpolates or accesses the most nearby U and V as necessary.
Schemes like PVR do a similar thing of not storing the full value at every pixel but inferring parts of it from nearby context. What counts as nearby is picked directly corresponding to however the caching is arranged on that GPU. It's usually more in-depth that just scaling down the sampling resolution of some of the channels, e.g. specifying a base offset for samples and then using a tiny precision for each is also common.
So the GPU can always get a value for pixel X by reading only values in a very small, local region of the data.
This contrasts with traditional schemes like PNG where having to know every pixel in the stream prior to X is acceptable if it improves the compression. Processing such things live would flood the GPU's memory bandwidth and hence be completely impractical, so such textures are decompressed from disk and then uploaded.
So schemes like PVR tend to lead to poorer compression and lower per-pixel quality but the win is that they can sit in VRAM compressed. A game will often increase the resolution of its textures if using PVR to try to find a comfortable balance.
Uncompressed textures are:
16 bit per pixel (RGB565, RGBA4444),
24 bit per pixel (RGB888)
PVRTC textures are either 4bpp or even 2bpp. So yes they do use less memory.
Also they perform better because need less memory bandwidth to fetch textures.
I have noticed in some tile based game engines, tiles are saved as grayscale or sometimes even black or white, and the colour is then added through storing a 'palette' along with it to apply to certain pixels however i've never seen how it knows which pixels.
Just to name a few engines i've seen use this, Notch's Minicraft and the old Pokemon games for Gameboy. This is what informed me of how a colour palette is used in old games: deconstructulator
From the little i've seen of people use this technique in tutorials it uses a form of bit-shifting however i'd like to know how that was so efficient that it was next to mandatory in old 8-bit consoles - how it is possible to apply red, green and blue to specific pixels of an image every frame instead of saving the whole coloured image (some pseudo-code would be nice).
The efficient thing about it is that it saves memory. Storing the RGB values usually requires 24 bits (8 bits per channel). While having a palette of 256 colors (requiring 256*24 bits = 768 bytes) each pixel requires just 8 bits (2^8 = 256 colors). So three times as many pixels can be stored in the same amount of memory (if you don't count the needed palette), but with a limitied set of colors obviously. This used to be supported in hardware so the graphics memory could be more efficiently used too (well this is actually still supported in modern PC hardware but almost never used since the graphics memory isn't that limited anymore).
Some hardware (including the Super Gameboy supported by the first Pokemon games) used more than one hardware palette at a time. Different sets of tiles is mapped to different palettes. But the way the tiles are mapped to different palettes is very hardware dependent and often not very straight forward.
The way the bits of the image is stored isn't always so straight forward either. If the pixels are 8 bits it can be an array of bytes where every byte simply is one pixel (as for the classic VGA mode used in many old DOS games). But the Gameboy for example uses 2 bits per pixel and others used 4, that is 2^4 = 16 colors per palette. A common way to arrange the bits in memory is by using bitplanes. That is (in the case of 16 color graphics) storing 4 separate b/w images. But the bitplanes can in some cases also be interleaved in different ways. So there is no simple and generic answer to how to decode/encode graphics this way. So I guess you have to be more specific what you want pseudo code for.
And I'm not sure how any of this applies to Minicraft. Maybe just how the graphics is stored on disk. But it has no significance once the graphics is loaded into graphics memory. (Maybe you have some other feature of Minicraft in mind?)
Suppose a small computer system has 4 MB of main memory. The system manages it in fixed sized frames. A frames table maintains the status of each frame in memory. How large (how many byte) should a frame be? You have a choice of one of the following: 1K, 5K, or 10K bytes. Which of these choices will minimize the total space wasted by processes due to fragmentation and frames table storage?
Assume the following: On the average, 10 processes will reside in memory. The average amount of wasted space will be 1/2 frame for each process.
The frames table must have one entry for each frame. Each entry requires 10 bytes.
Here is my answer:
1K would minimize the fragmentation, as known small size leads to big tables but smaller wasted space.
10 processes ~ 1/2 frame wasted on each.
Am I on the right track?
Yes, you are. I agree with you that on a system such as this, the smallest size makes the most sense. However, for example, if you take the situation of x86-64, where the options are 4kb, 2MB, 1GB. Considering modern memory sizes of approximation 4GB, obviously 1GB makes no sense, but because most programs nowadays contain quite a bit of compiled code, or in the case of interpreted and VM languages, all of the code of the VM, 2 MB pages make the most sense. In other words, to determine these things, you have to think about the average memory usage of a program in this system, the number of programs, and most importantly, the ration of average fragmentation to page table size. Because while a small memory size like that benefits from the low fragmentation, 4kb pages on 4GB of memory is a very large page table. Very large.