I am currently investigating a performance issue in a large Erlang application. The application exhibits larger-than-expected CPU load. To get a first grasp of which parts of the system are responsible for the load, I'd like to perform call stack sampling as described in this answer.
Is there a better way to do this than calling erlang:process_info(Pid, backtrace) repeatedly and grepping the functions from that output?
Note that the system is too large to use fprof, and that etop did not point me in the right direction either. Using fprof on only parts of the system is not possible right now either, as I first need to pinpoint the general location of the performance issue.
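For reference, the crude sampling I have in mind looks roughly like this, hand-rolled in the shell (using current_function instead of the full backtrace to keep the sketch short; Sample and Tally are just illustrative bindings, not library calls):

    %% repeatedly record each process's current function and tally how often
    %% each one shows up across the samples
    Sample = fun() ->
        [MFA || P <- erlang:processes(),
                {current_function, MFA} <- [erlang:process_info(P, current_function)]]
    end,
    Tally = fun(Rounds) ->
        lists:foldl(
          fun(_, Acc) ->
              timer:sleep(100),
              lists:foldl(fun(MFA, A) -> maps:update_with(MFA, fun(C) -> C + 1 end, 1, A) end,
                          Acc, Sample())
          end, #{}, lists:seq(1, Rounds))
    end,
    %% most frequently sampled functions first
    lists:reverse(lists:keysort(2, maps:to_list(Tally(50)))).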
A simple way to get the actual size of the stack is process_info(Pid, stack_size). While this only returns the size of the stack in words, it is a very simple and efficient way of seeing which processes have large stacks.
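For a quick overview you can rank all processes by that figure from the shell; a minimal sketch (TopByStack is just an illustrative binding):

    %% list the ten processes with the largest stacks, in words
    TopByStack = fun(N) ->
        Sizes = [{S, P} || P <- erlang:processes(),
                           {stack_size, S} <- [erlang:process_info(P, stack_size)]],
        lists:sublist(lists:reverse(lists:sort(Sizes)), N)
    end,
    TopByStack(10).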
I have written a rather complex Torch application and it works quite well, that is, if it doesn't run out of memory. Now I have tried to see what sort of inputs or situations cause it to seemingly randomly run out of memory, but so far I have had little to no success. So now I'm looking for a way to check which variables take how much (v)RAM.
I can switch with a simple statement between running my code on caffe:cuda or caffe:cl, which changes whether or not my program runs in RAM or on the GPU; I imagine that such a switch will make validating my memory usage a lot easier.
I have already tried to use print(collectgarbage("count")*1024) to check how much memory is in use at a given point in time, but this does not clearly show me where the memory is being used, perhaps because the program is relatively complex (although there are a few variables which I suspect are hogging a lot of memory: neural networks, large matrices and such).
I already know that once I have identified what is hogging my memory, I can assign a nil value to it and call the garbage collector to free it.
So, in short: is there a program or a tool that allows me to run a Torch program and then list each variable and its memory usage?
I don't know if you tried google :)
But here you are:
Torch7-profiling
Neural Model profiler script
"How to Profile a Lua Script using Pepperfish"
Easy Lua Profiling
To be honest, I've never had memory issues with Torch7, so it might be that your implementation is just not optimal. A likely culprit is a loop that is missing a collectgarbage call where there should be one, e.g. in the training loop or between epochs.
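If you want to narrow it down by hand rather than with the tools above, a rough sketch of the idea (hand-rolled, not part of any of the linked profilers) is to force a collection and diff collectgarbage("count") around the code you suspect. Keep in mind this only sees memory managed by the Lua allocator, so large tensor storages, and CUDA tensors in particular, may be under-reported:

    require 'torch'

    -- report the Lua-heap delta around a suspect allocation
    local function used_kb()
      collectgarbage()                 -- full collection so the number is stable
      return collectgarbage("count")   -- KB currently held by the Lua allocator
    end

    local before = used_kb()
    local suspect = torch.Tensor(1000, 1000)   -- stand-in for the variable you suspect
    print(string.format("delta: %.1f KB", used_kb() - before))

    -- in a training loop, drop references and collect between epochs
    for epoch = 1, 10 do
      -- train(...)
      suspect = nil
      collectgarbage()
    end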
In my program all the state is held in a giant map in an atom, which is updated by a load of pure functions in each iteration. I have determined that the heap size is increasing; how do I find the code that's responsible? I tried VisualVM, but it gives only generic information, and I can't tell which part of my state is growing or which function is causing it to grow.
Look for common gotchas like forgetting to use with-open, hanging onto the head of a sequence, etc.
Isolate smaller segments of your code and see if you still see the same kinds of memory growth using JVisualVM. If knocking out or mocking some piece makes no difference then put it back, and if it makes a difference then you can focus on that and figure out what is going on.
I don't know of any silver bullet tool or technique, it's just a process of divide and conquer, and thinking about what you are doing in your code.
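One crude way to run that divide and conquer on the state itself, assuming the state lives in an atom holding a map (the state var and the count-all helper below are purely illustrative, not library functions): take a rough per-key size estimate before and after a few iterations and diff the two snapshots to see which branch of the map is growing.

    ;; very rough size proxy: number of nodes under each top-level key
    (defn count-all [m]
      (into {}
            (for [[k v] m]
              [k (count (tree-seq coll? seq v))])))

    (def snapshot-a (count-all @state))
    ;; ... run a few iterations of the update loop ...
    (def snapshot-b (count-all @state))

    ;; top-level keys whose subtrees grew the most between snapshots
    (->> (merge-with - snapshot-b snapshot-a)
         (sort-by val >)
         (take 10))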
Recently my friend attended an interview and faced this question (the interviewer made it up from my friend's answer to another question).
Say we have the option to use either
1) recursion --> uses the system stack; I think the OS takes care of everything
2) our own stack for just the data part, and get things done that way
to fix something. Which one do you prefer, and why?
Assume the stack size wouldn't grow beyond 100.
I would use the system stack. Why re-invent the wheel?
Function calls, while not really slow per se, do take non-zero time. Therefore an iterative solution can be slightly faster.
More often than not, simplicity is better than a slight performance gain.
Don't over-engineer a solution and lose maintainability/readability for 1ms if you are not going to use that 1ms.
Just remember that whatever clever little hack you put together has to be maintained (and proven to work first, for that matter), whereas many standard/system solutions are available that have already been proven (see Reinventing the wheel).
If it is really system-critical that you reduce memory allocation and improve performance, you have your work cut out for you, and you should be prepared to spend some time proving that your solution is better/faster and stable.
Interesting to see the general preference for recursion on here, and a few who assume that the recursive implementation will necessarily be clearer or more maintainable... maybe, maybe not :-).
recursion typically avoids an explicit loop
recursion can sometimes simply use local variables inside the function to avoid a container storing results as they're calculated
recursion can make it trivial to reverse the order in which sub-results are gathered
recursion means there's a limit to the depth of information being processed, whereas a loop implementation often avoids this easily, or at least has memory requirements that more accurately reflect the data-processing needs
the more widely applicable you want your software to be, the more important it is to remove arbitrary limits (e.g. UNIX software like modern vim, less, GNU grep etc. make minimal assumptions about file/line/expression length and dynamically attempt whatever they're asked; many here will remember old editors and vendor-specific utilities, e.g. one "celestial" company's grep that would never match results at the end of a too-long line, or editors that SIGSEGVed, shut down, corrupted data or slowed to uselessness on long lines or files)
naive recursion can result in spectacularly inefficiently combined sub-results
some people find recursion easier to understand, some find it harder - definitely it suits how we think about some problems better than others
Depends on the algorithm. Small stack usage: use the system stack. Lots of stack needed: go on the heap. The stack size is limited by the OS, beyond which the OS throws a stack overflow ;-) If the algorithm uses more stack space than that, I would go with a stack data structure and push the data onto the heap.
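For what it's worth, on a toy problem the two options look roughly like this (a binary-tree sum in Python; the Node class is only for illustration):

    class Node:
        def __init__(self, value, left=None, right=None):
            self.value, self.left, self.right = value, left, right

    def tree_sum_recursive(node):
        # option 1: recursion -- depth is bounded by the call-stack limit
        if node is None:
            return 0
        return node.value + tree_sum_recursive(node.left) + tree_sum_recursive(node.right)

    def tree_sum_iterative(root):
        # option 2: our own stack -- only the data lives on the heap, so depth
        # is limited by available memory rather than by the call stack
        total, stack = 0, [root]
        while stack:
            node = stack.pop()
            if node is not None:
                total += node.value
                stack.append(node.left)
                stack.append(node.right)
        return total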
Hm, I think it depends on the problem...
The stack size, if I got your point, is not the only thing that limits you to one approach or the other.
But wanting to use recursion... well, nothing really wrong with it as far as the stack length goes, but I'd rather build my own solution.
Avoid recursion when you can. :)
Recursion may be the simplest way to solve a particular problem. An iterative solution can require more code and more opportunities for errors. The testing and maintenance cost may be greater than the performance benefit.
I would go with the first and use the system stack. That being said, in the language FORTH there are two system stacks: one is the return stack and the other is the parameter stack. This offers some nice flexibility.
I'm looking at refactoring a lot of large (1000+ lines) methods into nice chunks that can then be unit tested as appropriate.
This started me thinking about the call stack, as many of my refactored blocks have other refactored blocks within them, and my large methods may well have been called by other large methods.
I'd like to open this for discussion to see if refactoring can lead to call stack issues. I doubt it will in most cases, but wondered about refactored recursive methods and whether it would be possible to cause a stack overflow without creating an infinite loop?
Excluding recursion, I wouldn't worry about call stack issues until they appear (which they likely won't).
Regarding recursion: it must be carefully implemented and carefully tested no matter how it's done so this would be no different.
I guess it's technically possible. But not something that I would worry about unless it actually happens when I test my code.
When I was a kid, and computers had 64K of RAM, the call stack size mattered.
Nowadays, it's hardly worth discussing. Memory is huge, stack frames are small, a few extra function calls are hardly measurable.
As an example, Python has an artificially small call stack so it detects infinite recursion promptly. The default size is 1000 frames, but this is adjustable with a simple API call.
The only way to run afoul of the stack in Python is to tackle Project Euler problems without thinking. Even then, you typically run out of time before you run out of stack. (100 trillion loops would take far longer than a human lifespan.)
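For example, the adjustment is a single call into the sys module:

    import sys

    print(sys.getrecursionlimit())   # 1000 by default
    sys.setrecursionlimit(10000)     # raise it if a deep but finite recursion needs it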
I think it's highly unlikely for you to get a stack overflow without recursion when refactoring. The only way I can see this happening is if you are allocating and/or passing a lot of data between methods on the stack itself.
I am trying to write a statistics tool for a game by extracting values from the game's process memory (as there is no other way). The biggest challenge is to find the addresses that store the data I am interested in. What makes it even harder is dynamic memory allocation: I need to find not only the addresses that store data but also the pointers to those memory blocks, because the addresses change every time the game restarts.
For now I am just manually searching the game's memory using a memory editor (ArtMoney), looking for addresses whose values change as the data changes (or don't change). After an address is found, I look for a pointer that points to this memory block in a similar way.
I wonder what techniques/tools exist for such tasks? Maybe there are some articles I can read? Is mastering a disassembler the only way to go? For example, game trainers solve similar tasks, but they manage it in days, while I have already been struggling for weeks.
Thanks.
PS. It's all under Windows.
Is mastering a disassembler the only way to go?
Yes; go download WinDbg from http://www.microsoft.com/whdc/devtools/debugging/default.mspx, or if you've got some money to blow, IDA Pro is probably the best tool for doing this
If you know how to code in C, it is easy to search for memory values. If you don't know C, this page might point you to your solution if you can code in C#. It will not be hard to port the C# they have to Java.
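As a sketch of what the C route looks like on Windows (error handling and region enumeration via VirtualQueryEx are omitted; the PID, address range and target value are placeholders you would fill in):

    #include <windows.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        DWORD pid = 1234;      /* placeholder: the game's process id */
        DWORD target = 9999;   /* the value you are hunting for */
        HANDLE proc = OpenProcess(PROCESS_VM_READ | PROCESS_QUERY_INFORMATION, FALSE, pid);
        if (!proc) return 1;

        unsigned char buf[4096];
        SIZE_T read;
        /* scan a chunk of the (32-bit) user address space for the target value */
        for (ULONG_PTR addr = 0x00400000; addr < 0x7FFF0000; addr += sizeof(buf)) {
            if (!ReadProcessMemory(proc, (LPCVOID)addr, buf, sizeof(buf), &read))
                continue;
            for (SIZE_T i = 0; i + sizeof(DWORD) <= read; i++) {
                if (memcmp(buf + i, &target, sizeof(DWORD)) == 0)
                    printf("candidate at %p\n", (void *)(addr + i));
            }
        }
        CloseHandle(proc);
        return 0;
    }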
You might take a look at DynInst (Dynamic Instrumentation). In particular, look at the Dynamic Probe Class Library (DPCL). These tools will let you attach to running processes via the debugger interface and insert your own instrumentation (via special probe classes) into them while they're running. You could probably use this to instrument the routines that access your data structures and trace when the values you're interested in are created or modified.
You might have an easier time doing it this way than doing everything manually. There are a bunch of papers on those pages you can look at to see how other people built similar tools, too.
I believe the Windows support is maintained, but I have not used it myself.