I am profiling a crucial loop in my app and I see an interesting option in the Time Profiler settings:
"View as value"
When I select it, I see numbers with an 'x' suffix instead of the standard percentage values.
As an example, see attached screenshot (assembly code view):
The "and" instruction has a value of 442x, which, by the way, seems heavy for such a simple assembly instruction compared to the others in the loop.
What do those numbers mean? Do they somehow refer to the CPU cycles for a given line of code?
As the title suggests, I want to see where the value at a specific address came from. I am debugging an iOS game with lldb. The game has a multiplier value of 0.4 (how fast combos decrease). I can change this value with Cheat Engine, but I want to know which instruction in the assembly sets this value at that address, so I can patch that instruction with a hex editor. I used to use watchpoints, breakpoints, etc. for variable values, but in this case the value is constant and is set immediately when the app starts.
The equivalent instructions in lldb for Jester's gdb steps are:
(lldb) image lookup -va <ADDRESS>
That will tell you everything lldb knows about that address.
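If the goal is to catch the instruction that writes to that address, a hardware watchpoint is another option. A minimal sketch (the address and size below are placeholders for your case):

(lldb) watchpoint set expression -w write -s 4 -- 0x14532a0c
(lldb) continue

When the write happens, lldb stops at (or just after) the storing instruction, and "disassemble --pc" shows where it happened. Since the value is set immediately at startup, you would need to create the watchpoint as early as possible, e.g. from a breakpoint hit very early in the program.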
I have some constraints which z3 takes a long time to solve. I am aware of the "-st" command-line flag that prints statistics, but only at the very end, and of the TRACE facility for printing out internal data structure values. Is there a way to get diagnostic information from within z3 (e.g. to monitor memory usage continuously) as it is running, when it is being used from the command line? External tools like ps are not always convenient and do not always serve the purpose. Thanks.
You can use the option -v:100, which sets the verbosity level to 100. It still may not display the memory usage as often as you want.
Another option is to add the following line of code in appropriate places.
timeit tt(get_verbosity_level() >= 3, "report");
It will display memory usage if the verbosity level is >= 3.
For example, a good place is in the beginning of the method lbool context::bounded_search() at src/smt/smt_context.cpp. This method is executed after each restart.
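For illustration, the insertion would look roughly like this (a sketch, not the exact upstream source):

lbool context::bounded_search() {
    // Displays memory usage when the verbosity level is >= 3.
    timeit tt(get_verbosity_level() >= 3, "report");
    // ... rest of the method unchanged ...
}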
I'm trying to write a linker script that places one section's contents into two non-contiguous memory regions.
I have found an old thread about this on the binutils mailing list:
"ld linker script and non-contiguous memory region"
http://sourceware.org/ml/binutils/2012-01/msg00188.html
I know the C28x Compiler has a feature for this problem:
splitting sections across multiple memory segments (with an "or" operator):
SECTIONS { .text: { *(.text) } >> FLASH1 | FLASH3 }
described here:
http://processors.wiki.ti.com/index.php/C28x_Compiler_-_Understanding_Linking
I have tried it without success.
At the moment I have to fill the first memory region manually, but it is difficult to find pieces of code that I will not change in the future and that fit into and completely fill the first memory region.
Is such a feature implemented in the GNU linker? Or does anyone have a better idea of how I can solve this problem?
I think the easiest way (and maybe the only way) would be to split your section up into two sections, then assign one section to the first memory region, and the second section to the second memory region.
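For example, something along these lines (region names, origins, lengths and the choice of which objects go where are placeholders you would adapt to your project):

MEMORY
{
  FLASH1 (rx) : ORIGIN = 0x08000000, LENGTH = 64K
  FLASH3 (rx) : ORIGIN = 0x08020000, LENGTH = 64K
}

SECTIONS
{
  /* code from these objects goes into the first region */
  .text_a : { a.o(.text) b.o(.text) } > FLASH1
  /* everything else goes into the second region */
  .text_b : { *(.text) } > FLASH3
}

The drawback is the one you already ran into: GNU ld will not split a single input section automatically, so you still have to decide by hand which object files (or input sections) fill the first region.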
You have probably already seen this, but it is a pretty concise description of linker scripts:
http://www.math.utah.edu/docs/info/ld_3.html
I've tried every possible field but cannot find the number of times functions are called.
Besides, I don't get Self and # Self. What do these two numbers mean?
There are several other ways to accomplish this. One is obviously to create a static hit counter and an NSLog statement that increments and prints it. That is intrusive, though, and I found a way to do this with lldb.
1. Set a breakpoint.
2. Execute the program until you hit the breakpoint the first time and note the breakpoint number on the right-hand side of the line you hit (e.g. "Thread 1: breakpoint 7.1"; note the 7.1).
3. Context-click on the breakpoint and choose "Edit Breakpoint".
4. Leave the condition blank and choose "Add Action".
5. Choose "Debugger Command".
6. In the command box, enter "breakpoint list 7.1" (using the breakpoint number for your breakpoint from step 2). I believe you can use "info break" if you are using gdb.
7. Check the option "Automatically continue after evaluating".
8. Continue.
Now, instead of stopping, lldb will emit info about the breakpoint, including the number of times it has been passed.
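If you prefer to stay entirely in the lldb console instead of the Xcode breakpoint editor, a rough equivalent would be something like the following (the function name is a placeholder, and the --auto-continue flag is only available in newer lldb versions):

(lldb) breakpoint set --name myFunction
(lldb) breakpoint command add --one-liner "breakpoint list 1" 1
(lldb) breakpoint modify --auto-continue true 1
(lldb) continue

Each time the breakpoint is passed, the listing printed by the command includes its current hit count, and execution continues automatically.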
As for the discussion between Glenn and Mike on the previous answer, I'll describe a performance problem where the function execution count was useful. I had a particular action in my app where performance degraded considerably with each execution of the action. The Instruments time profiler showed that each time the action was executed, a particular function took twice as long as it had the time before, until the app would quickly hang if the action was performed repeatedly. With the count, I was able to determine that with each execution, the function was called twice as many times as it was during the previous execution. It was then pretty easy to look for the reason, which turned out to be that someone was re-registering for a notification in NotificationCenter on each execution of the action. This had the effect of doubling the number of response-handler calls on each execution, and thus doubling the "cost" of the function each time. Knowing that the time was doubling because the function was called twice as many times, and not because its performance was simply getting worse, led me to look at the calling sequence rather than for reasons the function itself could be degrading over time.
While it's interesting, knowing the number of times called doesn't have anything to do with how much time is spent in them. Which is what Time Profiler is all about. In fact, since it does sampling, it cannot answer how many times.
It seems you cannot use Time Profiler for counting function calls. This question seems to address potential methods for counting.
With respect to Self and # Self:
Self is "The number of times the symbol calls itself." according to the Apple Docs on the Time Profiler.
From the way the numbers look though, it seems self is the summed duration of samples that had this symbol at the bottom of its stack trace. That would make:
# self: the number of samples where this symbol was at the bottom of the stack trace
% self: the percentage of self samples relative to the total samples of the currently displayed call tree (i.e. # self / total samples).
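For example (hypothetical numbers, just to illustrate the ratio): if the displayed call tree covers 2,000 samples in total and a symbol shows # self = 200, its % self would be 200 / 2,000 = 10%.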
So this wouldn't tell you how many times a method was called. But it would give you an idea how much time is spent in a method or lower in the call tree.
NOTE: I too am unsure about the various 'self' meanings though. Would love to see someone answer this authoritatively. Arrived here searching for that...
If your objective is to find out what you need to fix to make the program as fast as possible, the number of calls and self time may be interesting, but they are irrelevant.
Look at my answer to this question, in particular points 6 and 8.
EDIT: To clarify the point further, suppose the following is the timeline of execution of the program. Some of that time (in this case about 50%) is spent in an activity that can be removed, if you know what it is, such as needless buried I/O, excessive calls to new, runaway notifications, or "insignificant" data validation. If a random-time sample is taken, it has a 50% chance of occurring in that activity, and an examination of the call stack and/or program variables shows that it is doing something that can be removed. Then, if 10 such samples are taken, the activity will be seen on roughly 5 of them, regardless of whether the activity occurs in a few large chunks of time, or many small ones. The activity may be a few lines of code in a function doing something unnecessary, or it may be something much more generalized. Regardless, you recognize it, fix it, and get roughly a factor of 2 speedup. Call counts and self time contribute nothing to this process.
I am running into the following issue while profiling an application under VC6. When I profile the application, the profiler is indicating that a simple getter method similar to the following is being called many hundreds of thousands of times:
int SomeClass::getId() const
{
    return m_iId;
}
The problem is, this method is not called anywhere in the test app. When I change the code to the following:
int SomeClass::getId() const
{
    std::cout << "Is this method REALLY being called?" << std::endl;
    return m_iId;
}
The profiler never includes getId in the list of invoked functions. Comment out the cout and I'm right back to where I started, 130+ thousand calls! Just to be sure it wasn't some cached profiler data or corrupted function lookup table, I'm doing a clean and rebuild between each test. Still the same results!
Any ideas?
I'd guess that what's happening is that the compiler and/or the linker is 'coalescing' this very simple function to one or more other functions that are identical (the code generated for return m_iId is likely exactly the same as many other getters that happen to return a member that's at the same offset).
Essentially, a bunch of different functions that happen to have identical machine-code implementations all resolve to the same address, confusing the profiler.
You may be able to stop this from happening (if this is the problem) by turning off optimizations.
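A minimal illustration of the kind of code that can get folded this way (hypothetical classes; the point is only that both getters compile to identical machine code):

// Both getters load the int at offset 0 of 'this' and return it, so an
// optimizing linker may fold them into a single function at one address.
// The profiler then attributes every hit to whichever symbol it kept.
struct Widget { int m_iId;    int getId()    const { return m_iId; } };
struct Gadget { int m_iCount; int getCount() const { return m_iCount; } };

In later MSVC toolchains this folding is the linker's identical COMDAT folding (/OPT:ICF, disabled with /OPT:NOICF); I am not certain which switch the VC6 linker uses, so turning off optimizations as suggested above is the safer first check.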
I assume you are profiling because you want to find out if there are ways to make the program take less time, right? You're not just profiling because you like to see numbers.
There's a simple, old-fashioned, tried-and-true way to find performance problems. While the program is running, just hit the "pause" button and look at the call stack. Do this several times, like from 5 to 20 times. The bigger a problem is, the fewer samples you need to find it.
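If you are running under a debugger console rather than an IDE with a pause button, the same thing can be done by interrupting the process and dumping the stacks, e.g. with lldb (gdb's Ctrl-C followed by "bt" works the same way):

(lldb) process interrupt
(lldb) bt all
(lldb) continue

Repeat that a handful of times and look for the calls that keep showing up on the stacks.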
Some people ask whether this isn't basically what profilers do, and the answer is that only very few of them do. Most profilers fall for one or more common myths, with the result that your speedup is limited because they don't find all the problems:
Some programs are spending unnecessary time in "hotspots". When that is the case, you will see that the code at the "end" of the stack (where the program counter is) is doing needless work.
Some programs do more I/O than necessary. If so, you will see that they are in the process of doing that I/O.
Large programs are often slow because their call trees are needlessly bushy, and need pruning. If so, you will see the unnecessary function calls mid-stack.
Any code you see on some percentage of stacks will, if removed, save that percentage of execution time (more or less). You can't go wrong. Here's an example, over several iterations, of saving over 97%.