Techniques for tracking threads in Grand Central Dispatch?

I suspect my app is creating lots of threads using dispatch_async() calls. I've seen crash reports with north of 50 and 80 threads. It's a large code base I didn't write and haven't fully dissected. What I would like to do is get a profile of our thread usage; how many threads we're creating, when we're creating them, etc.
My goal is to figure out if we are spending all of our time context-switching between threads, and whether using an NSOperationQueue would give us more control than just dispatch_async'ing blocks all over willy-nilly.
Any ideas / techniques for investigating this are welcome.

It looks like you need to take a look at Instruments. You can learn about it from the Apple docs, WWDC sessions, or wherever you like; there are many resources available.
Generally, NSOperationQueues are definitely better if you need to implement dependencies between operations.
As Brad Larson pointed out, there are a few WWDC sessions that are helpful in many cases. However, besides optimizing your calls, you should consider making your code more readable and simply better. I have never seen an iOS code base spin up as many as 80 threads; there must be something wrong with the app's architecture.
Anyone, let me know if I am wrong.
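If you want a rough in-process number before reaching for Instruments, one option is to log the live thread count yourself. Here is a small sketch using the Mach task_threads API (the helper name and the two-second timer are just illustrative choices):

```swift
import Darwin
import Foundation

// Rough sketch: returns how many threads the current process has right now.
func currentThreadCount() -> Int {
    var threads: thread_act_array_t?
    var count: mach_msg_type_number_t = 0
    guard task_threads(mach_task_self_, &threads, &count) == KERN_SUCCESS,
          let list = threads else { return -1 }
    // The kernel allocates the thread list; hand the memory back when done.
    let size = vm_size_t(count) * vm_size_t(MemoryLayout<thread_t>.stride)
    vm_deallocate(mach_task_self_, vm_address_t(UInt(bitPattern: list)), size)
    return Int(count)
}

// Log the thread count every couple of seconds while exercising the app,
// or call currentThreadCount() right around the dispatch_async hot spots.
Timer.scheduledTimer(withTimeInterval: 2.0, repeats: true) { _ in
    print("live threads: \(currentThreadCount())")
}
```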

If you are spinning that many threads, you are most likely I/O bound. Also, Mike's article is great, but it's quite old (though still relevant wrt regular queues).
Instead of using dispatch_async, you should be using dispatch_io and friends for your I/O requirements. They handle all the asynchronous monitoring and callbacks for you... and will not overrun your process with extraneous processing threads.
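To illustrate, here is a minimal sketch of that approach using the Swift wrapper (DispatchIO); the file path is a placeholder, and the semaphore is only there so the standalone snippet doesn't exit early:

```swift
import Dispatch
import Foundation

// Minimal sketch: read a file asynchronously with dispatch I/O instead of
// issuing blocking reads from dispatch_async blocks.
let path = "/tmp/example.dat"                       // placeholder input file
let queue = DispatchQueue(label: "io.example")

guard let channel = DispatchIO(type: .stream, path: path, oflag: O_RDONLY, mode: 0,
                               queue: queue,
                               cleanupHandler: { errorCode in
                                   print("channel closed (error code \(errorCode))")
                               })
else { fatalError("could not open \(path)") }

let finished = DispatchSemaphore(value: 0)
channel.read(offset: 0, length: Int.max, queue: queue) { done, data, error in
    if let data = data, !data.isEmpty {
        print("read chunk of \(data.count) bytes")  // handle each chunk as it arrives
    }
    if done {
        channel.close()
        finished.signal()
    }
}
finished.wait()   // keep the snippet alive until the read completes
```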


How are DispatchQueues implemented under the hood?

I'm curious what DispatchQueue really is under the hood. I tried to google this, but all the documentation is rather abstract and doesn't provide any real information about the implementation. My understanding is that a DispatchQueue is some kind of entity that exists somewhere, is able to store blocks of code, and is controlled directly by the kernel (by GCD, which is baked into the kernel), which can inject those blocks into a thread chosen by GCD/the kernel. Is this the correct view of DispatchQueue, or have I misunderstood something?
You've misunderstood, at least in some parts. GCD is not "baked into the kernel", it's a library that runs on top of POSIX threads, which are OS-level primitives with kernel support. GCD is simply a set of APIs that make it easier for developers to do work on multiple threads without having to manage the threads themselves.
For what it's worth, you can see the source code for GCD. It's here: https://opensource.apple.com/tarballs/libdispatch/ That said, it's full of micro-optimizations that take advantage of obscure compiler features (branch prediction directives and things like that) and it can often be hard to read and understand, even for experienced systems programmers.
A full-detail explanation of GCD's inner workings is beyond the scope of a StackOverflow answer, but I'll try to cook up a one or two paragraph explanation.
GCD manages some number of POSIX threads behind the scenes that it will use to execute work in the desired way. It also maintains a number of data structures to organize that work, like "queues" which can be thought of as "lists of blocks of work to be done." There are also groups, which allow you to be notified when a list of work items is completed. There are also various IO mechanisms to allow asynchronous IO to be serviced with these work items. It may (or may not) use various kernel services (like threads, kqueues, etc) to manage parts of its workload, but those aren't specific to GCD.
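To make that concrete, here is a small sketch using the public Swift API, where a queue holds work items and a group tells you when a batch of them has finished (the labels and counts are arbitrary):

```swift
import Dispatch

// A queue is conceptually a list of work items; a group lets you find out
// when a set of submitted items has completed.
let queue = DispatchQueue(label: "example.worker", attributes: .concurrent)
let group = DispatchGroup()

for i in 1...4 {
    queue.async(group: group) {
        // Each block is a work item that GCD schedules onto one of its
        // pooled worker threads.
        print("item \(i) running")
    }
}

// Runs once every item submitted with this group has finished.
group.notify(queue: .main) {
    print("all work items finished")
}

dispatchMain()   // keep the process alive so the notification can fire
```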
At the end of the day though, there's little or nothing "special" or "blessed" about GCD. In fact, there are multiple ports of GCD to various other operating systems out there, like this one for Linux: http://nickhutchinson.github.io/libdispatch/ which should drive home the point that it's not something specific to the Darwin kernel. Put differently, you could write your own version of GCD from scratch without needing to recompile the kernel.

Stacking vs. Queuing: efficiency

From an efficiency standpoint, which would be better: stacking or queuing? And perhaps heaping? I've been doing a lot of research and tried a few things of my own; it seems heaping is worse than both stacking and queuing, but when I was testing stacking and queuing they were similar in speed. I tried finding the answer, but never reached one.
The question is meaningless without an application. If you want to process things in a first-in, first-out manner, you use a queue. If you want to process things first-in, last-out, you use a stack. If you want to process things by priority, you use a heap or some other priority queue implementation.
The question isn't "which is better, stack, queue, or heap?" The question is, "what is the most appropriate data structure for the problem I'm trying to solve?"
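For a concrete sense of the difference, here is a tiny sketch of the three access patterns using plain Swift arrays (illustrative only, not a benchmark):

```swift
var queue: [String] = []            // first-in, first-out
queue.append("a"); queue.append("b")
let nextFIFO = queue.removeFirst()  // "a" -- the oldest element comes out first

var stack: [String] = []            // first-in, last-out
stack.append("a"); stack.append("b")
let nextLIFO = stack.removeLast()   // "b" -- the newest element comes out first

// A heap / priority queue always yields the highest-priority element,
// regardless of insertion order; a max-by lookup stands in for a real heap here.
let items = [(name: "a", priority: 3), (name: "b", priority: 7)]
let nextByPriority = items.max { $0.priority < $1.priority }!.name  // "b"

print(nextFIFO, nextLIFO, nextByPriority)
```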

Simple Concurrent Core Data

I've done a fair amount of research over the past few days, but I'm not sure what the current best practice is for concurrent Core Data. The most relevant post seems to be this blog post, but in light of this analysis about the performance of different concurrency methods, it seems that the modern way with parent contexts might not be the best. Also, this example from Apple doesn't implement the best practice mentioned in Apple's own concurrency guide that recommends NOT using the default NSConfinementConcurrencyType.
In light of all of this, what is the simplest and best way to implement concurrency with Core Data? All I need is a background thread to do some long writes to Core Data without hanging up the UI. Code examples are appreciated.
As always, it really depends on what you are trying to accomplish.
"Long writes" will hang your UI no matter which architecture you implement.
A write operation locks the DB file at both the OS level and the SQLite engine level (if you use that kind of store); all pending read operations have to wait for the write to end before they complete.
One of the most common optimisation methods is segmenting the database "load" process into multiple save operations (you shouldn't mind, as this happens in the background; a rough sketch follows this answer).
So, to answer the question:
The simplest way for you would probably be to use the architecture described in the blog post you mentioned (a parent-child hierarchy). If you notice that this causes too much "stutter" in your UI, try to optimise your data load process or try a different architecture.
Use Instruments to find bottlenecks in your application's execution.
Core Data has "quirks"/bugs in every architecture that I know of, and you will find them gradually, depending on your use of it.
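For what it's worth, here is a rough sketch of that segmented-load idea (the entity name "Record", the attribute "name", and the incoming context are placeholders for your own model):

```swift
import CoreData

// Save in batches so each write lock on the store is short-lived,
// instead of one huge save at the end of the import.
func importInBatches(names: [String],
                     into context: NSManagedObjectContext,
                     batchSize: Int = 500) {
    context.perform {
        for (index, name) in names.enumerated() {
            let record = NSEntityDescription.insertNewObject(forEntityName: "Record",
                                                             into: context)
            record.setValue(name, forKey: "name")

            // Flush to the store every `batchSize` inserts.
            if (index + 1) % batchSize == 0 {
                try? context.save()
                context.reset()   // drop saved objects to keep memory flat
            }
        }
        try? context.save()       // final partial batch
    }
}
```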
My recommendation is to go with the parent/child context pattern. Given the sparse details you provide (e.g. number of records, total volume of data, latency of delivery, etc.), this seems to be the most flexible and proven solution, and one that can also accommodate very large datasets.
Contrary to other claims, you can have a smoothly operating UI regardless of how long the "writes" to your database are; that is what background threads are for. The mechanism that keeps the UI fluid is data-change notifications, which you can react to gracefully without disturbing the user experience.
Your remark about NSConfinementConcurrencyType is correct. As stated in your source, it is there for backward compatibility, so you can just forget about it. Obviously, for concurrency you want to use NSPrivateQueueConcurrencyType when creating your context.
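A minimal sketch of that setup, assuming you already have an NSPersistentStoreCoordinator (all names here are placeholders):

```swift
import CoreData

// Parent/child arrangement: a main-queue context backs the UI and a
// private-queue child takes the heavy writes.
func makeContexts(coordinator: NSPersistentStoreCoordinator)
    -> (main: NSManagedObjectContext, background: NSManagedObjectContext) {

    let main = NSManagedObjectContext(concurrencyType: .mainQueueConcurrencyType)
    main.persistentStoreCoordinator = coordinator

    // Saving the child pushes its changes up into `main` (in memory only);
    // `main` can then save to the store whenever convenient, so the UI stays fluid.
    let background = NSManagedObjectContext(concurrencyType: .privateQueueConcurrencyType)
    background.parent = main
    return (main, background)
}

// Usage: do the long writes on the child, then propagate.
// let (main, background) = makeContexts(coordinator: coordinator)
// background.perform {
//     /* insert or update managed objects here */
//     try? background.save()               // push changes into the parent
//     main.perform { try? main.save() }    // persist to the store
// }
```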

What is the optimal trade off between refactoring and increasing the call stack?

I'm looking at refactoring a lot of large (1000+ lines) methods into nice chunks that can then be unit tested as appropriate.
This started me thinking about the call stack, as many of my refactored blocks have other refactored blocks within them, and my large methods may well have been called by other large methods.
I'd like to open this for discussion to see if refactoring can lead to call stack issues. I doubt it will in most cases, but wondered about refactored recursive methods and whether it would be possible to cause a stack overflow without creating an infinite loop?
Excluding recursion, I wouldn't worry about call stack issues until they appear (which they likely won't).
Regarding recursion: it must be carefully implemented and carefully tested no matter how it's done so this would be no different.
I guess it's technically possible. But not something that I would worry about unless it actually happens when I test my code.
When I was a kid, and computers had 64K of RAM, the call stack size mattered.
Nowadays, it's hardly worth discussing. Memory is huge, stack frames are small, a few extra function calls are hardly measurable.
As an example, Python has an artificially small call stack so it detects infinite recursion promptly. The default size is 1000 frames, but this is adjustable with a simple API call.
The only way to run afoul of the stack in Python is to tackle Project Euler problems without thinking. Even then, you typically run out of time before you run out of stack. (100 trillion loops would take far longer than a human lifespan.)
I think it's highly unlikely for you to get a stack overflow without recursion when refactoring. The only way I can see this happening is if you are allocating and/or passing a lot of data between methods on the stack itself.

Low level programming: How to find data in a memory of another running process?

I am trying to write a statistics tool for a game by extracting values from the game's process memory (as there is no other way). The biggest challenge is finding the addresses that store the data I am interested in. What makes it even harder is dynamic memory allocation: I need to find not only the addresses that store the data, but also the pointers to those memory blocks, because the addresses change every time the game restarts.
For now I am just manually searching the game's memory with a memory editor (ArtMoney), looking for addresses whose values change as the data changes (or stay the same). Once an address is found, I look for a pointer to that memory block in a similar way.
I wonder what techniques/tools exist for such tasks. Are there any articles I can read? Is mastering a disassembler the only way to go? Game trainer authors, for example, solve similar tasks in days, while I have already been struggling for weeks.
Thanks.
PS. It's all under Windows.
Is mastering a disassembler the only way to go?
Yes; go download WinDbg from http://www.microsoft.com/whdc/devtools/debugging/default.mspx, or, if you've got some money to blow, IDA Pro is probably the best tool for the job.
If you know how to code in C, it is easy to search for memory values. If you don't know C, this page might point you to your solution if you can code in C#. It will not be hard to port the C# they have to Java.
You might take a look at DynInst (Dynamic Instrumentation). In particular, look at the Dynamic Probe Class Library (DPCL). These tools will let you attach to running processes via the debugger interface and insert your own instrumentation (via special probe classes) into them while they're running. You could probably use this to instrument the routines that access your data structures and trace when the values you're interested in are created or modified.
You might have an easier time doing it this way than doing everything manually. There are a bunch of papers on those pages you can look at to see how other people built similar tools, too.
I believe the Windows support is maintained, but I have not used it myself.
