I am looking for some advice on memory usage on mobile devices, BlackBerry in particular. Using some profiling tools, we have calculated a working set size in RAM of 525 KB. The problem is we don't really know whether this is acceptable or too high.
Can anyone give any insight into their own experience with memory usage on BlackBerry? What sort of number should we be aiming for?
I am also wondering what sort of things we should be looking out for in particular to reduce memory usage.
512KB is perfectly acceptable on the current generation of BlackBerry devices. You can take a look at JBenchmark to see the exact JVM heap you can expect for each model, but none of the current devices out there go below 20MB of heap. Most are much larger than that.
On JBenchmark you can choose the device you are interested in from a drop-down on the right side of the page. Then navigate to the JVM tab for that device.
When it comes to reducing memory usage, I wouldn't worry about the total bytes used by this application if you are truly in line with 525K; worry about how often allocation/reallocation is required. Try to pool and reuse objects as much as possible, avoiding any unneeded allocation. For instance, use the StringBuffer class to concatenate strings instead of the + operator: each concatenation with the operator creates multiple String objects, whereas a StringBuffer just puts the characters in an array and only expands it when needed. Google is a good way to find more tips.
Finally, relying on profiling tools, which the BlackBerry JDE includes, is a very important part of understanding exactly how you can optimize heap memory usage.
If I'm not mistaken, BlackBerry apps are written in Java, which is a managed environment, so really the only surefire way to use less memory is to create fewer objects. There's not a whole lot you can do about your working set, I think, since it's managed by the runtime (which is actually probably the point of using Java on devices like this).
I am a mathematician, not a programmer; I have a notion of the basics of programming and am quite an advanced power user on both Linux and Windows.
I know some C and some Python, but nothing much.
I would like to make an overlay so that when I start a game it can get info about AMD and NVIDIA GPUs, like frame time and FPS, because I am quite certain the current way benchmarks compare two GPUs is flawed: small instances and scenes that bump up the FPS momentarily (but are totally irrelevant in terms of user experience) result in a higher average FPS number and mislead the market, either unintentionally or intentionally. (For example, I can't remember the name of the game, probably a COD title, but there was a highly tessellated entity on the map that wasn't even visible to the player, which led AMD GPUs to seemingly underperform when roaming through that area, lowering their average FPS count.)
I have an idea of how to calculate GPU performance in theory, but I don't know how to harvest the data from the GPU. Could you refer me to API manuals or references to help me make such an overlay possible?
I would like to study as little as possible (by that I mean I would like to learn only what I absolutely have to in order to get the job done; I don't intend to become a coder).
I thank you in advance.
This is generally what the Vulkan layer system is for: it allows you to intercept API commands and inject your own. But it is nontrivial to code yourself. Here are some pre-existing open-source options for you:
To get the timing info and draw your custom overlay, you can use (and modify) a tool like OCAT. It supports Direct3D 11, Direct3D 12, and Vulkan apps.
To just get the timing (and other interesting info) as CSV, you can use a command-line tool like PresentMon. It should work with D3D apps, and I have been using it with Vulkan apps too and it seems to accept them.
How can I search for a specific static memory address in iOS games, something like the amount of damage and other things that don't have a value shown in the game?
I use IDA, but it's not helpful because it doesn't show you the static memory address.
Assuming you're trying to hack existing games by twiddling bytes similar to what Game Genie did on old video game systems, good luck with that. Besides the fact that iOS apps use dynamic memory allocation for pretty much everything, iOS also incorporates address space layout randomization, which means even ostensibly static storage probably won't be in the same location in consecutive launches.
The only approach that would even stand a chance of working would involve injecting code that performs introspection of the classes using Objective-C runtime calls. And even if you do that, there's no guarantee that such hacks will be possible or practical on iOS.
I use OpenCL for image processing. For example, I have one 1000*800 image.
I use a 2D global size of 1000*800, and the local work size is 10*8.
In that case, will the GPU give me 100*100 computing units automatically?
And do these 10000 units work at the same time, so the work is parallel?
If the hardware doesn't have 10000 units, will one unit do the same thing more than once?
I tested the local size and found that if we use a very small size (1*1) or a big size (100*80), both are very slow, but if we use a middle value (10*8) it is faster. So, last question: why?
Thanks!
Work group sizes can be a tricky concept to grasp.
If you are just getting started and you don't need to share information between work items, ignore local work size and leave it NULL. The runtime will pick one itself.
Hardcoding a local work size of 10*8 is wasteful and won't utilize the hardware well. Some hardware, for example, prefers work group sizes that are multiples of 32.
OpenCL doesn't specify what order the work will be done in, just that it will be done. It might do one work group at a time, or it may do them in groups, or (for small global sizes) all of them together. You don't know and you can't control it.
To your question "why?": the hardware may run work groups in SIMD (single instruction multiple data) and/or in "Wavefronts" (AMD) or "Warps" (NVIDIA). Too small of a work group size won't leverage the hardware well. Too large and your registers may spill to global memory (slow). "Just right" will run fastest, but it is hard to pick this without benchmarking. So for now, leave it NULL and let the runtime pick for you. Later, when you become an OpenCL expert and understand more about how the hardware works, you can try specifying the work group size. However, be aware that the optimal size may be different for different hardware, and there are other rules (like global size must be a multiple of local size).
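For reference, this is roughly what "leave the local work size NULL" looks like with the plain OpenCL C API for the 1000*800 case from the question. It is a minimal sketch: the `queue` and `kernel` handles are assumed to already exist, and error handling is trimmed.

```cpp
/* Minimal sketch: enqueue a 2D kernel over a 1000x800 image and let the
 * OpenCL runtime choose the work-group size by passing NULL as the local
 * work size. `queue` and `kernel` are assumed to be created elsewhere. */
#include <CL/cl.h>

void run_image_kernel(cl_command_queue queue, cl_kernel kernel)
{
    size_t global_work_size[2] = {1000, 800};   /* one work item per pixel */

    cl_int err = clEnqueueNDRangeKernel(
        queue, kernel,
        2,                  /* work_dim: a 2D range                    */
        NULL,               /* no global offset                        */
        global_work_size,
        NULL,               /* local_work_size: let the runtime decide */
        0, NULL, NULL);     /* no events to wait on                    */

    /* If you later hard-code a local size such as {10, 8}, remember the
     * global size must be a multiple of the local size in each dimension. */
    if (err != CL_SUCCESS) {
        /* handle the error in real code */
    }
}
```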
iOS provides a lot of frameworks. I am new to iOS, but there's one concept I am not sure of:
If I minimise the number of frameworks my app depends on, I assume that is a performance boost, am I correct? Or does it not matter?
For example, am I saving on memory footprint or optimizing performance if I do not link iOS frameworks (like Core Data or Core Graphics) when I really don't need the functionality in them? In my app the code is so simple that I can write it myself without Core Data (very basic functionality).
I couldn't find good articles that discuss this concept. Do all frameworks released by Apple get deployed on the device regardless of whether your project uses them or not?
It's generally best not to link to frameworks you're not using. To answer your more specific questions:
Do all frameworks released by Apple get deployed on the device regardless of whether your project uses them or not?
Essentially yes -- they're provided as part of iOS, so they're already loaded onto the device (and always present).
If I minimise the number of frameworks my app depends on, I assume that is a performance boost, am I correct?
Perhaps -- the linker/loader won't have to load the frameworks into memory and link your app against them, but the performance cost of doing so is so minuscule that avoiding the linking/loading isn't a good reason to skip a framework you would otherwise find useful.
am I saving on memory footprint
Maybe. When you load a shared library, the contents of the library get loaded into your app's memory space, so it will increase the memory footprint. However, if the library is already loaded by another app (or by the OS), it'll be shared amongst all apps using it.
Even so, a possibly increased memory footprint is a bad reason to avoid using a framework you might otherwise find helpful.
I want to port some working OpenCV code to an embedded platform. Earlier, such tasks were very difficult to perform, but now TI has come up with nice embedded platforms which are comparatively hassle-free, as they say.
I want to know following things:
Given that:
The OpenCV code is already running smoothly on a PC (obviously).
Need to determine these before purchasing the device.
Can't put the code here in stackoverflow. :P
To choose from Texas Instruments: C6000.
Questions:
1. How can I make sure that the porting can be done?
2. What steps should be taken to make sure that, after porting, the code will run (at least)?
3. How to determine whether the code might require some changes to make it run smoothly?
Point 3 above is optional.
I need info which will at least give me some starting point in this regard.
What I thought I should do:
First, list down the built-in functions used.
Then find available online benchmarks of those functions for the particular device, like the ones shown towards the end of this doc.
...
I need to know how to proceed further.
However, the C6-Integra™ DSP+ARM processor seems the best.
The best you can do is to try a device simulator (if one is available), but what you'll see there is far from perfect.
Actually, nothing can tell you how fast and how well the app will run on the embedded device before you run your specific app on that specific device.
So:
Step 1: Buy it
Step 2: Try it
Things to consider:
Embedded CPU architecture: does your app need a big cache? How big is the embedded cache?
Algorithm: do you use a lot of floating-point operations? How good is the device at floating-point ops?
Memory transfers: do you have a lot of them? The data bus on a PC is waaay faster than on an embedded device.
Hardware support: do you use a lot of double-precision calculations? They are emulated on ARMs. They are going to kill your app (from milliseconds on a PC it can go to seconds on an ARM).
Acceleration: do your functions use SSE? (Many OpenCV functions are SSE-optimized, even if you don't know it.) Do you have a NEON counterpart? (OpenCV does not have much support for that.) The difference between x86 with SSE and an embedded chip without NEON can be orders of magnitude; see the sketch right after this list.
and many, many others.
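To make the SSE/NEON point concrete, here is a toy sketch. It is not OpenCV code, just an illustration of the kind of per-platform intrinsics work that point implies; `add4` is a made-up example function.

```cpp
/* Illustrative only: the same 4-float addition written with x86 SSE
 * intrinsics and with ARM NEON intrinsics, with a scalar fallback. */
#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
void add4(const float* a, const float* b, float* out)
{
    __m128 va = _mm_loadu_ps(a);                 // load 4 floats
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));      // add and store 4 results at once
}
#elif defined(__ARM_NEON)
#include <arm_neon.h>
void add4(const float* a, const float* b, float* out)
{
    float32x4_t va = vld1q_f32(a);               // load 4 floats
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));           // add and store 4 results at once
}
#else
void add4(const float* a, const float* b, float* out)
{
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];  // scalar fallback, one lane at a time
}
#endif
```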
So, again: no one can tell you how it will work. Only the combination of your specific app and the real device tells the truth.
Even a run on a similar device is not conclusive: the app can run smoothly on a given processor, yet on another one with a similar frequency or listed memory it may slow down too much.
This is an interesting question, but run is a very generic word in this context, so I feel the need to break it down into two other questions:
Will it compile on an embedded device?
Will it run as fast/smoothly as on a PC?
I've used OpenCV on a lot of different devices, including ARM, SH4 and MIPS, and I found out that sometimes the manufacturer of the device itself provides a compiled version of OpenCV (to my surprise), which is great. That's something you can look into; maybe the manufacturer of your device provides OpenCV binaries.
There's no way to know for sure how smooth your OpenCV application will be on the target device unless you are able to find some benchmark of OpenCV running on it. PCs have far better processing power than embedded devices, so you can expect lower performance from the target device.
There are 3rd-party applications, like opencv-performance, that you can use to test/benchmark the environment once you get your hands on it. And if performance is such a big deal in this project, you might also be interested in this nice article, which explains some timing tests done on a couple of OpenCV features, comparing implementations using the C and C++ interfaces of OpenCV.
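If you can't find published numbers for your board, one rough way to produce your own once you have the hardware is to time a representative operation with OpenCV's own tick counter. Below is a minimal sketch using the C++ interface; the image path and the Gaussian blur are placeholders, so substitute something representative of your actual pipeline.

```cpp
// Rough benchmarking sketch: time one OpenCV operation with the tick counter.
// "test.png" and the blur parameters are placeholders for your own workload.
#include <opencv2/opencv.hpp>
#include <cstdint>
#include <iostream>

int main()
{
    cv::Mat src = cv::imread("test.png");        // placeholder input image
    if (src.empty()) return 1;

    cv::Mat dst;
    const int iterations = 100;                  // repeat for a stable average
    int64_t start = cv::getTickCount();
    for (int i = 0; i < iterations; ++i)
        cv::GaussianBlur(src, dst, cv::Size(5, 5), 1.5);
    double seconds = (cv::getTickCount() - start) / cv::getTickFrequency();

    std::cout << "Average GaussianBlur time: "
              << (seconds / iterations) * 1000.0 << " ms" << std::endl;
    return 0;
}
```

Running the same small program on your PC and on the target board gives a direct, apples-to-apples comparison for the operations you actually care about.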