Factors that affect dynamic memory allocation

I want to know all of the factors that affect the behavior of malloc (and other allocation functions).
Things I know:
The vm.overcommit_memory variable (see the sketch below).
Limits set on the allocation size.
32/64-bit platform.
Things I assume matter:
Different kernel versions?
How the OS handles swapping?
And the OS itself?
Please correct me if I am wrong, and if you know of other factors, please comment below.

The swapfile size, found in /, affects allocation.
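As a quick illustration of the overcommit factor mentioned above, here is a minimal C sketch. It assumes a 64-bit Linux build, and the 64 GiB figure is just an arbitrary oversized example; exact behavior varies by machine and kernel settings:

    /* Minimal sketch: observing how vm.overcommit_memory affects malloc.
     * Assumes a 64-bit Linux build; the 64 GiB figure is arbitrary.
     * Heuristic (0) or always-overcommit (1) modes will often let this
     * huge allocation succeed; strict accounting (2) usually rejects it. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        size_t huge = (size_t)64 * 1024 * 1024 * 1024;  /* 64 GiB */
        char *p = malloc(huge);

        if (p == NULL) {
            perror("malloc");   /* likely under vm.overcommit_memory=2 */
            return 1;
        }
        puts("malloc succeeded; pages are not physically backed yet");

        /* Writing to the memory is what actually commits pages; on an
         * overcommitted system this is where the OOM killer may strike:
         * memset(p, 0, huge); */

        free(p);
        return 0;
    }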

iOS Operating System - Brightness Increase Past Limit

For those familiar with iOS, I was wondering if it's possible to program it to exceed the brightness limit, and if so, whether it is legal/safe.
Thanks in advance,
Jon
Without altering the hardware, or possibly going deeper into the root system, it is not going to be possible. The limits are set as they are for safety reasons by Apple themselves, or by hardware limitations (if they could make it brighter, they likely would). However, you are legally free to try to increase the brightness all you want; it will just void your warranty.

How does the OpenCL local work size work?

I use OpenCL for image processing. For example, I have one 1000*800 image.
I use a 2D global size of 1000*800, and the local work size is 10*8.
In that case, will the GPU automatically create 100*100 work groups?
And do these 10,000 groups work at the same time, so the computation is parallel?
If the hardware does not have 10,000 units, will one unit do the same thing more than once?
I tested the local size, and I found that if we use a very small size (1*1) or a big size (100*80), both are very slow, but if we use a middle value (10*8) it is faster. So, the last question: why?
Thanks!
Work group sizes can be a tricky concept to grasp.
If you are just getting started and you don't need to share information between work items, ignore local work size and leave it NULL. The runtime will pick one itself.
Hardcoding a local work size of 10*8 is wasteful and won't utilize the hardware well. Some hardware, for example, prefers work group sizes that are multiples of 32.
OpenCL doesn't specify what order the work will be done in, just that it will be done. It might do one work group at a time, or it may do them in groups, or (for small global sizes) all of them together. You don't know and you can't control it.
To your question "why?": the hardware may run work groups in SIMD (single instruction multiple data) and/or in "Wavefronts" (AMD) or "Warps" (NVIDIA). Too small of a work group size won't leverage the hardware well. Too large and your registers may spill to global memory (slow). "Just right" will run fastest, but it is hard to pick this without benchmarking. So for now, leave it NULL and let the runtime pick for you. Later, when you become an OpenCL expert and understand more about how the hardware works, you can try specifying the work group size. However, be aware that the optimal size may be different for different hardware, and there are other rules (like global size must be a multiple of local size).
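To make the "leave it NULL" advice concrete, here is a minimal host-side C sketch. It assumes a command queue and kernel have already been created elsewhere, and error handling is elided:

    /* Minimal host-side sketch: enqueue a 2D kernel over a 1000x800
     * image and let the OpenCL runtime choose the work group size.
     * Assumes `queue` and `kernel` were created elsewhere; error
     * handling is elided. (On Apple platforms the header is
     * <OpenCL/opencl.h> instead.) */
    #include <CL/cl.h>

    cl_int enqueue_image_kernel(cl_command_queue queue, cl_kernel kernel)
    {
        size_t global[2] = {1000, 800};   /* one work item per pixel */

        return clEnqueueNDRangeKernel(queue, kernel,
                                      2,      /* work dimensions */
                                      NULL,   /* global offset */
                                      global, /* global work size */
                                      NULL,   /* local size: let runtime pick */
                                      0, NULL, NULL);
    }

Hardcoding the 10*8 size from the question would mean passing size_t local[2] = {10, 8} as the local_work_size argument instead, and the global size must then be a multiple of the local size in each dimension.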

Using CGFloat and "memory footprint"

I was reading this and I am curious about what was meant by increasing the memory footprint. I am not an expert in any of this, by any means. I actually know very little, other than what I've come up with thinking about how systems work. If someone could help clarify my thoughts and correct me where I'm wrong, I would really appreciate it.
I know that by using the proper typedefs, I am future-proofing my code in case Apple changes the structure of the typedef, and using typedefs shouldn't affect the processor, since it's the compiler's or preprocessor's job to convert them. But will it actually use any more memory than is necessary, if the typedefs are only used for functions that expect them (and their precision), such as CGRect/CGSize/etc. and NSDate functions that ask for those typedefs?
Basically, is there any EXTRA memory being used, given that they are only being used in situations where functions ask for them, rather than using their current counterparts (CGFloat -> float)?
This is for iOS vs. OS X, since I know that OS X has both 32-bit and 64-bit processors and the typedefs are expected there.
Think of it this way: memory footprint often means how much memory you are consuming at any time. If you use 64-bit values without any reason instead of perfectly useful 32-bit ones, then there is some marginal inflation. That said, I'll bet most of your usage is in automatics and object ivars.
On iOS now, CGFloat == float.
I personally ALWAYS use CGFloat for anything that might interface with iOS - that is, unless I'm doing some math functions. And for exactly the reason you said. The other day I had to grab some code on iOS and move it to a Mac app, and it took almost no time (as I use CGFloat, NSInteger, and friends). You will get no conversion warnings (i.e. moving 64-bit values into 32-bit ones).
In the future, given the popularity of iOS, it's quite likely that there will be processors using 64-bit floating point and integers. It's the nature of progress. If you use CGFloat and friends, your code will compile without warnings on a universal app that does both 32-bit and 64-bit.
If Apple uses CGFloat, why would you be concerned about it? Use the types that match the API calls you are calling. If CGFloat were a memory problem, our phones would all be crashing.
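To ground this discussion, here is roughly how the CGFloat decision works. The snippet imitates the CGBase.h pattern with a hypothetical CGFloat_demo name rather than quoting Apple's header verbatim; the point is that the alias is resolved entirely at compile time, so the only memory cost is the size of the underlying type:

    /* Sketch of the CGBase.h pattern that decides the width of CGFloat
     * (simplified, not Apple's verbatim header; CGFloat_demo is a
     * hypothetical stand-in name). */
    #include <stdio.h>

    #if defined(__LP64__) && __LP64__
    typedef double CGFloat_demo;   /* 64-bit platforms: 8 bytes */
    #else
    typedef float CGFloat_demo;    /* 32-bit platforms: 4 bytes */
    #endif

    int main(void)
    {
        /* The typedef is a pure compile-time alias: a CGFloat occupies
         * exactly as much memory as the type it resolves to. */
        printf("sizeof(CGFloat_demo) = %zu\n", sizeof(CGFloat_demo));
        printf("sizeof(float)        = %zu\n", sizeof(float));
        printf("sizeof(double)       = %zu\n", sizeof(double));
        return 0;
    }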

Does replacing int with short help performance in CUDA?

Assume that we have enough global memory. Does replacing int with short improve performance in CUDA (e.g., does short reduce the usage of shared memory, registers, etc.)?
Advice is welcome. Thanks.
Using short in shared memory will most likely reduce performance due to bank conflicts, unless you use short2.
Also, as far as I know, all registers on the GPU are 32-bit, so it's unlikely that using short would reduce register usage.
It depends:
If your program is memory bound, then yes: transferring the input as shorts could be beneficial.
If your kernel is computation bound, the answer is more likely no, because the kernel has to do an extra operation to convert from short to int and then back to short each time (see the sketch below).
Tesla-class hardware (SM 1.x) has surprisingly rich support for "half registers," so you might get some mileage from using short instead of int on those platforms. You can confirm by using cuobjdump to look at the microcode in the cubin. But Fermi removed that support.
With SM 2.1, NVIDIA added support for "video" instructions that implement 32-bit-wide SIMD operations on 32-bit registers - see section 8.7.9 of the PTX 2.1 spec.
http://developer.download.nvidia.com/compute/cuda/3_1/toolkit/docs/ptx_isa_2.1.pdf
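The "extra operation" point in the compute-bound case mirrors C's usual arithmetic conversions, which CUDA C follows: short operands are promoted to int before arithmetic, so narrowing back to short takes an extra step. A plain C sketch of the semantics:

    /* Sketch of why short arithmetic can cost an extra instruction:
     * C (and CUDA C, which follows the same rules) promotes short
     * operands to int before arithmetic, so storing the result back
     * into a short needs a narrowing conversion. */
    #include <stdio.h>

    int main(void)
    {
        short a = 1000, b = 2000;

        /* a and b are promoted to int, added as 32-bit values, and
         * the sum is then truncated back to 16 bits for the store. */
        short narrow = (short)(a + b);

        /* Keeping the result as int avoids the truncation step. */
        int wide = a + b;

        printf("narrow = %d, wide = %d\n", narrow, wide);
        return 0;
    }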

BlackBerry memory usage

I am looking for some advice on memory usage on mobile devices, BlackBerry in particular. Using some profiling tools, we have calculated a working set size in RAM of 525KB. The problem is we don't really know whether this is acceptable or too high.
Can anyone give any insight into their own experience with memory usage on BlackBerry? What sort of number should we be aiming for?
I am also wondering what sort of things we should be looking out for in particular to reduce memory usage.
512KB is perfectly acceptable on the current generation of BlackBerry devices. You can take a look at JBenchmark to see the exact JVM heap you can expect for each model, but none of the current devices out there go below 20MB of heap. Most are much larger than that.
On JBenchmark you can choose the device you are interested from a drop down on the right side of the page. Then, navigate to the JVM Tab for the device.
When it comes to reducing memory usage, I wouldn't worry about the total bytes used by this application if you are truly in line with 525KB; worry instead about how often allocation/reallocation is required. Try to pool/reuse objects as much as possible, avoiding any unneeded allocation. For instance, use the StringBuffer class to concatenate strings instead of the + operator: multiple String objects will be created for each concatenation using the operator, whereas a StringBuffer will just put the characters in an array and only expand it when needed. Google is a good way to find more tips.
Finally, relying on profiling tools, which the BlackBerry JDE has, is a very important part of understanding exactly how you can optimize heap memory usage.
If I'm not mistaken, BlackBerry apps are written in Java, which is a managed environment, which means the only surefire way to use less memory is to create fewer objects. There's not a whole lot you can do about your working set, I think, since it's managed by the runtime (which is actually probably the point of using Java on devices like this).
