When I create more than 5 tasks in FreeRTOS, the scheduler does not start. I am using the KL46Z Freedom board from Freescale. I know that the scheduler does not start because, when I debug, the program gets stuck in the for loop that follows the line that starts the scheduler (PEX_RTOS_START()):
#ifdef PEX_RTOS_START
PEX_RTOS_START(); /* Startup of the selected RTOS. Macro is defined by the RTOS component. */
#endif
/*** End of RTOS startup code. ***/
/*** Processor Expert end of main routine. DON'T MODIFY THIS CODE!!! ***/
for(;;){} // I GET STUCK HERE!
/*** Processor Expert end of main routine. DON'T WRITE CODE BELOW!!! ***/
} /*** End of main routine. DO NOT MODIFY THIS TEXT!!! ***/
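The solution to your problem is to increase the heap size in FreeRTOSConfig.h. Each xTaskCreate() call allocates the task's stack and control block from this heap, so once it is exhausted vTaskStartScheduler() can fail to create the idle task and return. The default heap size for the KL46Z is 8192: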
#define configTOTAL_HEAP_SIZE 8192 /* Size of heap in bytes */
I increased this value to 16384, and it worked!
#define configTOTAL_HEAP_SIZE 16384 /* Size of heap in bytes */
:)
If you are using an official FreeRTOS demo (from the FreeRTOS download), read the comment next to the loop you get stuck in: it tells you exactly why you end up there and points you to resources that help you fix it. You can also check the documentation for the xTaskCreate() API function. Alternatively, to find out at compile time (rather than at run time) whether you have enough RAM, you could create a completely statically allocated system by setting configSUPPORT_STATIC_ALLOCATION to 1, configSUPPORT_DYNAMIC_ALLOCATION to 0, and using xTaskCreateStatic() instead of xTaskCreate().
I am using heap_1 memory allocation. There is an initialization task Task_ini, from which 2 tasks Task_1 and Task_2 are launched. Then I delete Task_ini. At some point in time from Task_1 I need to create a new task Task_3. How can I create Task_3 in the FreeRTOS heap in place of Task_ini which has already been deleted by that time, knowing only its TaskHandle_t?
int main(void){
xTaskCreate(Task_ini, "Task_ini", configMINIMAL_STACK_SIZE, NULL, 1, &htask_ini);
vTaskStartScheduler();
for(;;);
}
void Task_ini(void *pParams){
xTaskCreate(Task_function, "Task_1", configMINIMAL_STACK_SIZE, &param1, 1, &htask1);
xTaskCreate(Task_function, "Task_2", configMINIMAL_STACK_SIZE, &param2, 1, &htask2);
vTaskDelete(NULL);
}
void Task_function(void *pParams){
for(;;){
//task code
//...
//end task code
if(create == true){
create = false;
//Here I need to create a task at the address where the "Task_ini" task was.
//My code creates a task in a new heap section, and if there is no space it will cause a memory allocation error.
xTaskCreate(Task_function, "Task_3", configMINIMAL_STACK_SIZE, &param3, 1, &htask3);
}
}
}
The main idea of heap_1 is that it cannot free memory; it is simply not capable of doing so. If you want to delete tasks, you need to use one of the other heap_n implementations. Even then, you should let the kernel do its job: it is the kernel's job to manage memory for FreeRTOS objects, not yours.
Actually, deleting tasks isn't considered good practice in general. Unless you are really low on heap space, you can simply suspend the task. That way, you can wake it up again at no cost if its services are required again.
It's true that an init task becomes useless after system initialization. But there is a well-known solution to your init task problem: it can evolve into another task after it completes the initialization sequence. For example, Task_ini can create only Task_2 and, instead of creating Task_1, do Task_1's job itself.
Update:
"It's the kernel's job to manage memory for FreeRTOS objects, not yours."
Actually, FreeRTOS allows you to manage the memory manually, if you prefer to do so. There are static versions of the object creation functions, like xTaskCreateStatic(). When using these static versions, you pass two statically allocated buffers to the function, one for the task stack and one for the task control block (TCB). You can then literally place a new task in the buffers of another one (provided that the other task has been deleted). To be able to use these functions, configSUPPORT_STATIC_ALLOCATION must be defined as 1.
But I suggest you avoid manual memory management unless you have a specific reason to do it.
How would one go about modifying individual assembly instructions in an application while it is running?
I have a Mobile Substrate tweak that I am writing for an existing application. In the tweak's constructor (MSInitialize), I need to be able to rewrite individual instructions in the app's code. What I mean by this is that there may be multiple places in the application's address space that I wish to modify, but in each instance, only a single instruction needs to be modified. I have already disabled ASLR for the application and know the exact memory address of the instruction to be patched, and I have the hex bytes (as a char[], but this is unimportant and can be changed if necessary) of the new instruction. I just need to figure out how to perform the change.
I know that iOS uses Data Execution Prevention (DEP) to specify that executable memory pages cannot also be writeable and vice versa, but I know that it is possible to bypass this on a jailbroken device. I also know that the ARM processor used by iDevices has an instruction cache that needs to be updated to reflect the change. However, I do not even know where to begin to do this.
So, to answer the question that would surely otherwise be asked, I have not tried anything. This is not because I am lazy; rather, it is because I have absolutely no clue how this could be accomplished. Any help at all would be greatly appreciated.
Edit:
If it helps at all, my ultimate goal is to use this in a Mobile Substrate tweak that hooks an App Store application. Previously, in order to mod this application, one would have to first crack it to decrypt the app so the binary could be patched. I want to make it so people wouldn't have to crack the app, since that can lead to piracy which I am strongly against. I can't use Mobile Substrate normally because all of the work is done in C++, not Objective-C, and the application is stripped, leaving no symbols to use MSHookFunction on.
Completely forgot I asked this question, so I'll show what I ended up with now. The comments should explain how and why it works.
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>
#include <mach/mach.h>
#include <libkern/OSCacheControl.h>
#define kerncall(x) ({ \
kern_return_t _kr = (x); \
if(_kr != KERN_SUCCESS) \
fprintf(stderr, "%s failed with error code: 0x%x\n", #x, _kr); \
_kr; \
})
bool patch32(void* dst, uint32_t data) {
mach_port_t task;
vm_region_basic_info_data_t info;
mach_msg_type_number_t info_count = VM_REGION_BASIC_INFO_COUNT;
vm_region_flavor_t flavor = VM_REGION_BASIC_INFO;
vm_address_t region = (vm_address_t)dst;
vm_size_t region_size = 0;
/* Get region boundaries */
if(kerncall(vm_region(mach_task_self(), &region, &region_size, flavor, (vm_region_info_t)&info, (mach_msg_type_number_t*)&info_count, (mach_port_t*)&task))) return false;
/* Change memory protections to rw- */
if(kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_READ | VM_PROT_WRITE | VM_PROT_COPY))) return false;
/* Actually perform the write */
*(uint32_t*)dst = data;
/* Flush CPU data cache to save write to RAM */
sys_dcache_flush(dst, sizeof(data));
/* Invalidate instruction cache to make the CPU read patched instructions from RAM */
sys_icache_invalidate(dst, sizeof(data));
/* Change memory protections back to r-x */
kerncall(vm_protect(mach_task_self(), region, region_size, false, VM_PROT_EXECUTE | VM_PROT_READ));
return true;
}
Use vm_protect to switch the page between writable and executable (W^X), assuming you're jailbroken with a decent jailbreak (e.g. one on which MobileSubstrate works).
Writing to instruction memory from processor registers is, as others say above, a bit tricky. Especially with iPhones, since Apple tries to keep the processor details secret.
Permissions on memory access are the first problem. Executable memory is not normally writable. However, if this is overcome, then there is a little dance to go through to get data out of the processor registers and into the instruction pipeline. In general, there are synchronisation instructions, which force a specific order on the memory accesses before and after them, and cache commands, which force dirty write data out to memory and flush out clean and possibly stale read data. Both of these are highly dependent on the detailed implementation of the processor.
Arm has nice manuals on the web that explain these in detail for specific processors. However, whether the processors inside iPhones do what the public Arm manuals say, I have no idea.
Here's a place to start understanding the Arm memory synchronisation model for one processor:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0092b/ch04s03s04.html
and that goes on to explain how to flush the instruction cache by writing to a control register. It certainly is possible to write self-modifying code for Arm processors, because somewhere in that manual I found a statement saying that it is sometimes unavoidable and so has to be supported.
(I'm not claiming this is an answer. But it wouldn't fit in a comment.)
I've been trying to find a thread implementation in iOS that suits my project's needs. So far I've failed to find an acceptable solution.
My Problem :
I need to read audio from up to 16 mp3 files on disk simultaneously.
What I have tried:
First off, I tried using a repeating NSTimer. The timer was not fast enough, and the audio would drop out when I played any more than 4 files.
Second, I tried using an NSThread with a priority of 1. The audio just about played correctly, but the UI became wholly unresponsive.
Finally, I tried dispatching blocks using GCD in my callback whenever I needed more audio from a file. Again the audio would drop out, but the UI was responsive.
In all three of the examples above I also tried dividing up the workload by creating 4 threads and having each thread handle 4 audio files, but this caused really bad synchronization problems with the audio.
Are there other thread options that I can try, or does the above sum up what iOS has to offer?
Do you think that reading from 16 files on disk simultaneously is too much of a strain for iOS?
Is there a limit to how many threads iOS can handle?
To avoid making my question sound like a discussion I will summarize as follows.
Which iOS thread technology is best suited to code that is called very frequently, completes quickly, can be easily synchronized and will not impact UI responsiveness?
Any anecdotal advice from solving a similar audio programming problem is also appreciated.
EDIT 1
This is some stripped-down code I modelled on a suggestion from an SO user. All I'm after is solid advice on what setup is going to work best for me. Since my last post I tried NSThread, and it does seem to leave me with audio dropouts. I also tried using NSCondition so that my thread isn't wasting processing power when it's not filling buffers, but using these locks seems like a really bad idea for audio callbacks.
OSStatus channelMixerCallback(void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData) {
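// look up the per-bus state for this callback
AudioInfo *info = myaudio[inBusNumber];
if(info.needsbufferfill==YES)
{
// ask the producer thread to refill this item's buffers
[refToSelf performSelector:@selector(GetAudioForItem:) onThread:engineDescribtion.producerthread withObject:info waitUntilDone:NO];
}
return noErr;
}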
-(void) startthread
{
engineDescribtion.producerthread =[[NSThread alloc]initWithTarget:self selector:@selector(dosinglerunloop) object:nil];
[engineDescribtion.producerthread start];
}
-(void)dosinglerunloop
{
BOOL isstarted=YES;
NSAutoreleasePool *pool=[[NSAutoreleasePool alloc]init];
do {
[[NSRunLoop currentRunLoop]addPort:[NSMachPort port] forMode:NSDefaultRunLoopMode];
[[NSRunLoop currentRunLoop]runMode:NSDefaultRunLoopMode beforeDate:[NSDate distantFuture]];
} while (isstarted);
[pool release];
}
- (void)GetAudioForItem:(AudioInfo *)info
{
// use the data in AudioInfo to seek to
// the correct place in the file
// and extract audio to the buffers
}
Problem 0:
Your audio render callbacks should never lock. Example: Creating a single heap allocation will lock.
Your threads will all compete for the hardware. To keep the UI responsive, you should not have many highest priority threads (the audio playback should be the only one). Consider the number of cores, disks, etc you have available in your design.
If you still have issues once you have correctly fixed that: Loading short files into memory can offload some of the disk's demand to memory.
You should profile to determine what is actually the problem: It may be CPU or I/O. You may be simply missing your render deadlines and equating audio dropouts to "can't read fast enough". If you are using a lot of CPU, then Disk I/O may not be the problem. Decoding and performing sample rate conversion on 16 mp3 files can require relatively high CPU (as one example of the things you need to look for).
pthreads will be fastest, but will require some work to implement right. That really doesn't matter at this time because there seem to be a few high level issues yet and there are multiple APIs which should handle the task just fine.
Your program should be smart enough to detect when read buffers cannot be filled fast enough.
You are pre filling the buffers, correct?
Presumably, you are using a run loop?
Well, there's only one disk… So any solution that requires 16 simultaneous reads might be an issue. (Depending on whether you're I/O bound or CPU bound.)
NSTimer is not going to get you consistent results.
I don't see any reason why NSThread would kill UI responsiveness, perhaps you had a bug.
I'm going with this system being disk-bound, because 16 channels of MP3 is no problem CPU-wise on modern machines. How much rattling is coming from your box? I would probably be tempted to use just one thread to fill the empty buffers, with each buffer sized to accommodate (averageDiskLatency*(bytes/msec)*16*bodgeFactor) bytes of audio stream (bodgeFactor means rounded up to an 8K boundary, plus a few 8Ks). Whenever threads/callbacks/whatever empty a buffer and so start on the other one, they should queue the empty buffer to the disk read thread (via a thread-safe producer-consumer queue) to get it filled up again. Probably, each buffer should include a 'fileControl' instance containing the fileSpec, file handle, state variables for EOF etc., error string space and anything else the read thread needs to work, as well as the buffer space itself.
This design allows the disk to read nice, large chunks without being annoyingly preempted half-way through reads and being avoidably forced to move lumps of metal too often.
Rgds,
Martin
PS - If you haven't got one already, get an SSD - works wonders for multi-channel audio/video latency.
This has been bugging me all day. When a program sets itself up to call a function when it receives a certain interrupt, I know that the registers are pushed onto the stack when the program is interrupted, but what I can't figure out is: how do the registers get off the stack? I know that the compiler doesn't know if the function is an interrupt handler, and it can't know how many arguments the interrupt gave to the function. So how on earth does it get the registers off?
It depends on the compiler, the OS and the CPU.
For low level embedded stuff, where an ISR may be called directly in response to an interrupt, the compiler will typically have some extension to the language (usually C or C++) that flags a given routine as an ISR, and registers will be saved and restored at the beginning and end of such a routine. [1]
For common desktop/server OSs though there is normally a level of abstraction between interrupts and user code - interrupts are normally handled first by some kernel code before being passed to a user routine, in which case the kernel code takes care of saving and restoring registers, and there is nothing special about the user-supplied ISR.
[1] E.g. Keil 8051 C compiler:
void Some_ISR(void) interrupt 0 // this routine will get called in response to interrupt 0
{
// compiler generates preamble to save registers
// ISR code goes here
// compiler generates code to restore registers and
// do any other special end-of-ISR stuff
}
OpenCL doesn't have a global barrier that will stop all threads, so I'm trying to create a workaround with the following code:
void barrier(__global uint* scratch) {
uint nThreads = get_global_size(0);
atom_inc(scratch);
/* this loop never terminates */
while(scratch[0] < nThreads) {
continue;
}
}
The idea is that each thread loops until all of them increment that one piece of memory.
However, the value read from scratch[0] never changes for the threads once it's been read, and it loops forever. I know it's being incremented because it's the correct value when I read it back to the host.
Is the global memory being locally cached? What's going on here?
Found the problem: the order in which work groups are executed is implementation defined. This means that some threads might start only after others have finished.
In the code I gave, the work groups that are started first will loop forever waiting for the others to hit the 'barrier'. And the work groups that would be started later won't ever start, because they're waiting for the first ones to finish.
If the implementation (I'm on a Radeon 5750, using Stream SDK 2.2) executes all work groups concurrently, then it probably wouldn't be an issue. But that's not the case for my setup.