Why doesn't glib use private futexes? - glib

When investigating some performance issues, I finally ended up in gthread-posix.c.
There I found code such as:
static void __attribute__((noinline))
g_mutex_lock_slowpath (GMutex *mutex)
{
/* Set to 2 to indicate contention. If it was zero before then we
* just acquired the lock.
*
* Otherwise, sleep for as long as the 2 remains...
*/
while (exchange_acquire (&mutex->i[0], 2) != 0)
syscall (__NR_futex, &mutex->i[0], (gsize) FUTEX_WAIT, (gsize) 2, NULL);
}
I am curious as to why it doesn't use FUTEX_WAIT_PRIVATE here and it other places. At least on the ARM the non-private futexes are significantly slower, and I was under the impression that glib is for multithreading rather than interprocess communication in shared memory.

GLib does now use FUTEX_WAIT_PRIVATE, since March 2015. See this commit and the related bug report.

Related

How to explain this glibc modification on libpthread?

We found an issue while we upgrade Yocto from 1.7 to 2.2.
App below behaves differently on newer glibc, it always got stuck on pthread_cond_destroy on glibc2.25, while on glibc2.20 it ends well.
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
void *thread1(void *);
void *thread2(void *);
int main(void)
{
pthread_t t_a;
pthread_t t_b;
pthread_create(&t_a,NULL,thread1,(void *)NULL);
pthread_create(&t_b,NULL,thread2,(void *)NULL);
sleep(1);
pthread_cond_destroy(&cond);
pthread_mutex_destroy(&mutex);
exit(0);
}
void *thread1(void *args)
{
pthread_cond_wait(&cond,&mutex);
}
void *thread2(void *args)
{
}
After investigation I find a glibc commit here:
https://sourceware.org/git/?p=glibc.git;a=commit;f=sysdeps/arm/nptl/bits/pthreadtypes.h;h=ed19993b5b0d05d62cc883571519a67dae481a14, which seems to be the reason.
We noticed the comment of __pthread_cond_destroy, it mentioned:
A correct program must make sure that no waiters are blocked on the
condvar when it is destroyed.
It must also ensure that no signal or broadcast are still pending to
unblock waiters.
and destruction that is concurrent with still-active waiters is
probably neither common nor performance critical.
From my perspective it brings in a defect. But glibc team seems to have a good reason. Could anybody explain whether this modification is necessary?
From the standard:
Attempting to destroy a condition variable upon which other threads
are currently blocked results in undefined behavior.
From my perspective it brings in a defect.
Your program depended on undefined behavior. Now you pay the price of doing so.

Queues in FreeRTOS

I'm using Freescale FRDM-KL25Z board with Codewarrior 10.6 software. My goal is to make small program in FreeRTOS, which reads voltage from thermistor by analog/digital converter (0-3,3v) and depends on this voltage I'd like turn on/off led diodes. It worked for me till the moment, when I added second task and queues. I'm thinking that problem might be in stack size, but I have no idea how to configure it.
Code is below:
xQueueHandle queue_led;
void TaskLed (void *p)
{
uint16_t temp_val;
xQueueReceive(queue_led, &temp_val, 1);
if (temp_val<60000)
{
LED_1_Neg();
}
}
void TaskTemp (void *p)
{
uint16_t temp_val;
(void)AD1_Measure(TRUE);
(void)AD1_GetValue16(&temp_val);
xQueueSendToBack(queue_led, &temp_val, 1000);
FRTOS1_vTaskDelay(1000);
}
Code in main():
xTaskCreate(TaskLed, (signed char *)"tl", 200, NULL, 1, NULL);
xTaskCreate(TaskTemp, (signed char *)"tt", 200, NULL, 1, NULL);
vTaskStartScheduler();
return(0);
A task is normally a continuous thread of execution - that is - it is implemented as an infinite loop that runs forever. It is very rare for a task to exit its loop - and in FreeRTOS you cannot run off the bottom of a function that implements a task without deleting the task (in more recent versions of FreeRTOS you will trigger an assert if you try). Therefore the functions that implement your tasks are not valid.
FreeRTOS has excellent documentation (and an excellent support forum, for that matter, which would be a more appropriate place to post this question). You can see how a task should be written here: http://www.freertos.org/implementing-a-FreeRTOS-task.html
In the code you post I can't see that you are creating the queue that you are trying to use. That is also documented on the FreeRTOS.org website, and the download has hundreds of examples of how to do it.
If it were a stack issue then Google would show you to go here:
http://www.freertos.org/Stacks-and-stack-overflow-checking.html
You should create the queue and then check that the returned value is not zero (the queue is successfully created)

Is OS X's SecRandomCopyBytes fork safe?

Many userspace CSPRNG's have an issue where after fork(2), it's possible for the two different processes to return the same stream of random bytes.
From looking at dtruss, it's clear that SecRandomCopyBytes is, at a minimum, seeding from /dev/random, but is it doing so in a way that's safe for use after fork()?
With the following source code:
#include <Security/Security.h>
int main() {
uint8_t data[8];
SecRandomCopyBytes(kSecRandomDefault, 8, data);
SecRandomCopyBytes(kSecRandomDefault, 8, data);
printf("%llu\n", *(uint64_t *)data);
}
I get the following from dtruss (with irrelevant stuff removed):
open("/dev/random\0", 0x0, 0x7FFF900D76F5) = 3 0
read(0x3, "\b\2029a6\020+\254\356\256\017\3171\222\376T\300\212\017\213\002\034w\3608\203-\214\373\244\177K\177Y\371\033\243Y\020\030*M\3264\265\027\216r\220\002\361\006\262\326\234\336\357F\035\036o\306\216\227\0", 0x40) = 64 0
read(0x3, "\223??3\263\324\3604\314:+\362c\311\274\326\a_Ga\331\261\022\023\265C\na\211]\356)\0", 0x20) = 32 0
The implementation is actually CCRandomCopyBytes():
http://www.opensource.apple.com/source/Security/Security-55471/libsecurity_keychain/lib/SecRandom.c
int SecRandomCopyBytes(SecRandomRef rnd, size_t count, uint8_t *bytes) {
if (rnd != kSecRandomDefault)
return errSecParam;
return CCRandomCopyBytes(kCCRandomDefault, bytes, count);
}
So the actual code is here:
http://www.opensource.apple.com/source/CommonCrypto/CommonCrypto-60049/lib/CommonRandom.c
The comments in the include for CCRandomCopyBytes state that it is fork() safe:
It is inconvenient to call system random number generators
directly. In the simple case of calling /dev/random, the caller
has to open the device and close it in addition to managing it
while it's open. This module has as its immediate raison d'être
the inconvenience of doing this. It manages a file descriptor to
/dev/random including the exception processing of what happens
in a fork() and exec(). Call CCRandomCopyBytes() and all the
fiddly bits are managed for you. Just get on with whatever you
were really trying to do.
[...]
In my own quick test, the child gets killed when it invokes SecRandomCopyBytes()

Where is my AVPlayer's memory, and how do I get it back?

I'm playing heaps of videos at the same time with AVPlayer. To reduce loading times, I'm storing the corresponding views in a NSCache.
This works fine until reaching a certain number of videos, from which the videos simply stop playing, or even appearing.
There's no error, log or memory warning. In particular, I'm listening to UIApplicationDidReceiveMemoryWarningNotification to clear the cache but this is never received.
If I remove the cache, all the videos play at expense of worse performance.
This makes me suspect that AVPlayer is using memory from a different process (which one?). And when that memory reaches a certain limit, new players cease to work.
Is this correct?
If so, is there a way to be notified when this magic limit is reached to take the appropriate measures (e.g., clear the cache) to ensure playback of other media?
Good news and bad news - good is you can probably fix the problem, bad is it takes work and is somewhat complex.
Root Problem
The reason you don't get notified early happens because iOS does not find out that your app has exceeded its memory budget til its almost too late, then it immediately kills it. The problem has to do with the way iOS (and OS X) manage the file system cache. Normally, when files get opened, as you read the data, the file data gets transferred into a buffer in the Unified Buffer Cache (a term you can google for more info) - I'll call it UBC from now on.
So suppose you have 10 open files, and you have read every file to the end, but have not closed the files. Well, all that data is sitting in the UBC. Now, if you close the files, the buffers are all freed. And technically, the OS can purge this buffers too - only it seems by the time it realizes that memory is tight, it chooses to blow the app away first (and there may be valid reasons for it to do this). So imagine that you app is showing videos, and the way the videos get loaded is through the file system, the number of free buffers starts dropping. At some point iOS notices this, tracks down who most belong to (your app), and wham, kills your app ASAP.
I hit this problem myself in an open source project I support, PhotoScrollerNetwork. Users started complaining that their project was getting terminated by the system, like you, without any notification. I tried in vain to monitor the UBC (there are APIs on OS X to do so, but not on iOS). In the end I found a solution using an heuristic - monitor all your memory usage including the UBC, and don't exceed 50% of the total available iOS memory pool.
So (you might ask) - what is the Apple approved way to solve this problem? Well, there is none. How do I know that? Because I had a half hour long discussion at WWDC 2012 with the Director of Core iOS in one of the labs (after getting ping ponged around by others who had no idea what I was talking about). In the end, after I explained the above heuristic, he told me directly that the solution was probably as good as any he could think of. Without an API to directly monitor the UBC, you can only approximate its usage and adjust accordingly.
But you say, I'm using the NSCache - why doesn't the system account for the AVPlayer memory there? There reason is undoubtedly the UBC - an AVPlayer instance probably only consumes a few thousand K of memory itself - its the open file to the video that is not accounted for by iOS.
Possible Solutions
1) If you can load the videos directly into a NSData object, and keep that in the NSCache, you can most likely totally avoid the UBC issues mentioned above. [I don't know enough about the AV system to know if you can do this.] In this case the system should be capable of purging memory when it needs to.
2) Continue using your original code, but add memory management to it. That is, when you get create an AVPlayer instance, you will need to account for the size of the video in bytes, and keep a running tally of all this memory. When you approach 50% of total device free memory, then start purging old AVPlayers.
Code
For completeness, I've provided the relevant code from PhotoScrollerNetwork below. If you want more details you can peruse the project - however its quite complex so expect to spend some time (its doing JPEG decoding on the fly for massive images and writing tiles to the file system as the decode proceeds).
// Data Structure
typedef struct {
size_t freeMemory;
size_t usedMemory;
size_t totlMemory;
size_t resident_size;
size_t virtual_size;
} freeMemory;
Early on in your app:
// ubc_threshold_ratio defaults to 0.5f
// Take a big chunk of either free memory or all memory
freeMemory fm = [self freeMemory:#"Initialize"];
float freeThresh = (float)fm.freeMemory*ubc_threshold_ratio;
float totalThresh = (float)fm.totlMemory*ubc_threshold_ratio;
size_t ubc_threshold = lrintf(MAX(freeThresh, totalThresh));
size_t ubc_usage = 0;
// Method on some class to monitor the memory pool
- (freeMemory)freeMemory:(NSString *)msg
{
// http://stackoverflow.com/questions/5012886
mach_port_t host_port;
mach_msg_type_number_t host_size;
vm_size_t pagesize;
freeMemory fm = { 0, 0, 0, 0, 0 };
host_port = mach_host_self();
host_size = sizeof(vm_statistics_data_t) / sizeof(integer_t);
host_page_size(host_port, &pagesize);
vm_statistics_data_t vm_stat;
if (host_statistics(host_port, HOST_VM_INFO, (host_info_t)&vm_stat, &host_size) != KERN_SUCCESS) {
LOG(#"Failed to fetch vm statistics");
} else {
/* Stats in bytes */
natural_t mem_used = (vm_stat.active_count +
vm_stat.inactive_count +
vm_stat.wire_count) * pagesize;
natural_t mem_free = vm_stat.free_count * pagesize;
natural_t mem_total = mem_used + mem_free;
fm.freeMemory = (size_t)mem_free;
fm.usedMemory = (size_t)mem_used;
fm.totlMemory = (size_t)mem_total;
struct task_basic_info info;
if(dump_memory_usage(&info)) {
fm.resident_size = (size_t)info.resident_size;
fm.virtual_size = (size_t)info.virtual_size;
}
#if MEMORY_DEBUGGING == 1
LOG(#"%#: "
"total: %u "
"used: %u "
"FREE: %u "
" [resident=%u virtual=%u]",
msg,
(unsigned int)mem_total,
(unsigned int)mem_used,
(unsigned int)mem_free,
(unsigned int)fm.resident_size,
(unsigned int)fm.virtual_size
);
#endif
}
return fm;
}
When you open a video, add the size to ubc_usage, and when you close one decrement it. When you want to open a new video, test ubc_usage against ubc_threadhold, and its it exceeds the value you have to close something first.
PS: you can try calling that freeMemory method at other times, and see, but in my case it hardly changes at all when files get opened - the system seems to consider the whole UBC as "free", since it could purge it if it needed to (I guess).
If you're throwing all of these videos in a NSCache, you have to be prepared for the cache to throw away items when it feels like they are consuming too much memory. From the NSCache documentation:
The NSCache class incorporates various auto-removal policies, which
ensure that it does not use too much of the system’s memory. The
system automatically carries out these policies if memory is needed by
other applications. When invoked, these policies remove some items
from the cache, minimizing its memory footprint.
Check to see if you're getting nils back from the cache, and if you are, you'll have to reconstruct your objects.
Edit:
It is also worth mentioning that objc.io #7 advises against storing large objects in a NSCache:
The eviction method of NSCache is non-deterministic and not
documented. It’s not a good idea to put in super-large objects like
images that might fill up your cache faster than it can evict itself.

iOS Patch program instruction at runtime

How would one go about modifying individual assembly instructions in an application while it is running?
I have a Mobile Substrate tweak that I am writing for an existing application. In the tweak's constructor (MSInitialize), I need to be able to rewrite individual instruction(s) in the app's code. What I mean by this is that there may be multiple places in the application's address space that I wish to modify, but in each instance, only a single instruction needs to be modified. I have already disabled ASLR for the application and know the exact memory address of the instruction to be patched, and I have the hex bytes (as a char[], but this is uninportant and can be changed if necessary) of the new instruction. I just need to figure out how to perform the change.
I know that iOS uses Data Execution Prevention (DEP) to specify that executable memory pages cannot also be writeable and vice versa, but I know that it is possible to bypass this on a jailbroken device. I also know that the ARM processor used by iDevices has an instruction cache that needs to be updated to reflect the change. However, I do not even know where to begin to do this.
So, to answer the question that would surely otherwise be asked, I have not tried anything. This is not because I am lazy; rather, it is because I have absolutely no clue how this could be accomplished. Any help at all would be greatly appreciated.
Edit:
If it helps at all, my ultimate goal is to use this in a Mobile Substrate tweak that hooks an App Store application. Previously, in order to mod this application, one would have to first crack it to decrypt the app so the binary could be patched. I want to make it so people wouldn't have to crack the app, since that can lead to piracy which I am strongly against. I can't use Mobile Substrate normally because all of the work is done in C++, not Objective-C, and the application is stripped, leaving no symbols to use MSHookFunction on.
Completely forgot I asked this question, so I'll show what I ended up with now. The comments should explain how and why it works.
#include <stdio.h>
#include <stdbool.h>
#include <mach/mach.h>
#include <libkern/OSCacheControl.h>
#define kerncall(x) ({ \
kern_return_t _kr = (x); \
if(_kr != KERN_SUCCESS) \
fprintf(stderr, "%s failed with error code: 0x%x\n", #x, _kr); \
_kr; \
})
bool patch32(void* dst, uint32_t data) {
mach_port_t task;
vm_region_basic_info_data_t info;
mach_msg_type_number_t info_count = VM_REGION_BASIC_INFO_COUNT;
vm_region_flavor_t flavor = VM_REGION_BASIC_INFO;
vm_address_t region = (vm_address_t)dst;
vm_size_t region_size = 0;
/* Get region boundaries */
if(kerncall(vm_region(mach_task_self(), &region, &region_size, flavor, (vm_region_info_t)&info, (mach_msg_type_number_t*)&info_count, (mach_port_t*)&task))) return false;
/* Change memory protections to rw- */
if(kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY))) return false;
/* Actually perform the write */
*(uint32_t*)dst = data;
/* Flush CPU data cache to save write to RAM */
sys_dcache_flush(dst, sizeof(data));
/* Invalidate instruction cache to make the CPU read patched instructions from RAM */
sys_icache_invalidate(dst, sizeof(data));
/* Change memory protections back to r-x */
kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_EXECUTE | VM_PROT_READ));
return true;
}
vm_protect to w^x, assuming you're jailbroken with a decent jailbreak (e.g. if mobilesubstrate works)
Writing to instruction memory from processor registers is, as others say above, a bit tricky. Especially with iPhones, since Apple tries to keep the processor details secret.
Permissions on memory access are the first problem. Executable memory is not normally writable. However, if this is overcome, then there is a little dance to go through to get data out of the processor registers and into the instruction pipeline. In general, there are synchronisation instructions, which force a specific order on the memory accesses before and after them, and cache commands, which force dirty write data out to memory and flush out clean and possibly stale read data. Both of these are highly dependent on the detailed implementation of the processor.
Arm Has nice manuals on the web that explain these in detail for specific processors. However, whether the processors inside iPhones do what the public Arm manuals say, I have no idea.
Here's a place to start understanding the Arm memory synchronisation model for one processor:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0092b/ch04s03s04.html
and that goes on to tell how to flush the instruction cache by a write to a control register. It certainly is possible to write self-modifying code for Arm processors because somewhere in that manual I found a statement that said that it is sometimes unavoidable and the has to be supported.
(I'm not claiming this is an answer. But it wouldn't fit in a comment.)

Resources