Does anyone know whether it is possible to get the CPU usage of a specific thread, a process, or a section of code in an application?
If you look at AUGraph, it has a function that returns the average CPU usage. How do they do that?
I am not sure whether this also applies to iOS, but on OS X you can get the information you need through the Mach/BSD subsystem. Basically, you get the list of threads for a given process and sum the usage of all its threads to get the process CPU usage. If you only need a single thread's CPU usage, you don't need to sum.
It is not as easy and straightforward as one might like, but here it is (this code is based on Amit Singh's "Mac OS X Internals"):
#include <mach/mach.h>

//-- fragment: runs inside a loop over the processes you want to inspect
pid_t pid = ...; //-- this is the process id you need info for
kern_return_t kr;
task_t port;
kr = task_for_pid(mach_task_self(), pid, &port); //-- other processes need elevated privileges
if (kr != KERN_SUCCESS) {
    continue; //-- skip this task
}
task_info_data_t tinfo;
mach_msg_type_number_t task_info_count;
task_info_count = TASK_INFO_MAX;
kr = task_info(port, TASK_BASIC_INFO, (task_info_t)tinfo, &task_info_count);
if (kr != KERN_SUCCESS) {
    continue; //-- skip this task
}
task_basic_info_t basic_info;
thread_array_t thread_list;
mach_msg_type_number_t thread_count;
thread_info_data_t thinfo;
mach_msg_type_number_t thread_info_count;
thread_basic_info_t basic_info_th;
uint32_t stat_thread = 0; // Mach threads
basic_info = (task_basic_info_t)tinfo;
// get threads in the task
kr = task_threads(port, &thread_list, &thread_count);
if (kr != KERN_SUCCESS) {
    // handle or log the error
    continue;
}
if (thread_count > 0)
    stat_thread += thread_count;
long tot_sec = 0;
long tot_usec = 0;
long tot_cpu = 0; //-- sum of cpu_usage, scaled so TH_USAGE_SCALE (1000) == 100% of one CPU
int j;
for (j = 0; j < (int)thread_count; j++) {
    thread_info_count = THREAD_INFO_MAX;
    kr = thread_info(thread_list[j], THREAD_BASIC_INFO,
                     (thread_info_t)thinfo, &thread_info_count);
    if (kr == KERN_SUCCESS) {
        basic_info_th = (thread_basic_info_t)thinfo;
        if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
            tot_sec = tot_sec + basic_info_th->user_time.seconds + basic_info_th->system_time.seconds;
            tot_usec = tot_usec + basic_info_th->user_time.microseconds + basic_info_th->system_time.microseconds;
            tot_cpu = tot_cpu + basic_info_th->cpu_usage;
        }
    }
    mach_port_deallocate(mach_task_self(), thread_list[j]); //-- release this thread port
} // for each thread
//-- task_threads() allocated thread_list in our address space: free it
vm_deallocate(mach_task_self(), (vm_address_t)thread_list, thread_count * sizeof(thread_t));
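The loop above keeps seconds and microseconds in separate running totals, so `tot_usec` can grow past one second without ever being carried into `tot_sec`. Here is a minimal sketch of summing per-thread user and system time with the carry applied. `cpu_time` and both functions are hypothetical names; `cpu_time` stands in for the `seconds`/`microseconds` pair in Mach's `time_value_t`, so the example compiles without Mach headers.

```c
#include <assert.h>

/* Hypothetical stand-in for Mach's time_value_t (seconds + microseconds). */
typedef struct {
    long seconds;
    long microseconds;
} cpu_time;

/* Add one thread's user and system time to a running total,
   carrying whole seconds out of the microsecond field. */
static void add_thread_time(cpu_time *total, cpu_time user, cpu_time sys)
{
    assert(user.microseconds < 1000000 && sys.microseconds < 1000000);
    total->seconds += user.seconds + sys.seconds;
    total->microseconds += user.microseconds + sys.microseconds;
    total->seconds += total->microseconds / 1000000;
    total->microseconds %= 1000000;
}

/* The normalized total as floating-point seconds. */
static double cpu_time_seconds(cpu_time t)
{
    return (double)t.seconds + (double)t.microseconds / 1e6;
}
```

On OS X, the two arguments would be filled from `basic_info_th->user_time` and `basic_info_th->system_time` for each non-idle thread.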
I don't know if there is a better way to do that, possibly so, but this one has worked for me.
EDIT: Apologies, I misread your question. This doesn't show usage for a specific thread or process in an app, but it does do what AUGraph does.
Just use getloadavg():
#include <stdlib.h>
double loadavg[3];
getloadavg(loadavg, 3);
NSLog(@"Average over last minute: %f", loadavg[0]);
NSLog(@"Average over last 5 minutes: %f", loadavg[1]);
NSLog(@"Average over last 15 minutes: %f", loadavg[2]);
Sample output:
... getloadavg[62486:207] Average over last minute: 0.377441
... getloadavg[62486:207] Average over last 5 minutes: 0.450195
... getloadavg[62486:207] Average over last 15 minutes: 0.415527
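These values are load averages (the average number of runnable threads over each interval), not true percentages. If you still want to display them scaled as percentages: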
#include <stdlib.h>
double loadavg[3];
getloadavg(loadavg, 3);
NSLog(@"Average over last minute: %02.2f%%", loadavg[0] * 100);
NSLog(@"Average over last 5 minutes: %02.2f%%", loadavg[1] * 100);
NSLog(@"Average over last 15 minutes: %02.2f%%", loadavg[2] * 100);
Sample output from this:
... getloadavg[62531:207] Average over last minute: 23.93%
... getloadavg[62531:207] Average over last 5 minutes: 33.01%
... getloadavg[62531:207] Average over last 15 minutes: 36.72%
More details are in Apple's getloadavg(3) man page.
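For a plain C version without `NSLog`, here is a sketch that also checks `getloadavg()`'s return value (the number of samples actually retrieved, or -1). Dividing by the number of online cores from `sysconf(_SC_NPROCESSORS_ONLN)` is an addition here, not part of the answer above. It assumes a load of 1.0 per core means that core is busy, which is only a rough proxy, because the load average also counts threads waiting to run. The helper names are hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes getloadavg() in glibc's <stdlib.h> */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Scale a load average by the number of online CPUs:
   1.0 means roughly one runnable thread per core. */
static double load_per_core(double load, long ncpu)
{
    assert(ncpu > 0);
    return load / (double)ncpu;
}

/* Fill out[0..2] with the 1-, 5- and 15-minute load averages.
   Returns 0 on success, -1 if they are unavailable. */
static int read_load(double out[3])
{
    return getloadavg(out, 3) == 3 ? 0 : -1;
}

static void print_load(void)
{
    double la[3];
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    if (read_load(la) != 0 || ncpu < 1) {
        fprintf(stderr, "load average unavailable\n");
        return;
    }
    static const char *label[3] = {"minute", "5 minutes", "15 minutes"};
    for (int i = 0; i < 3; i++)
        printf("Average over last %s: %.2f (about %.0f%% of %ld cores)\n",
               label[i], la[i], 100.0 * load_per_core(la[i], ncpu), ncpu);
}
```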
Related
I am working on an ARMv7 embedded system that runs an RTOS on a Cortex-A7 SoC (2 cores, 256 KB L2 cache).
Now I want to measure the CPU's memory bandwidth on this system, so I wrote the following functions to do the measurement.
The first function allocates 32 MB of memory, reads through all of it in 10 passes, times each pass, and keeps the slowest one to get the memory read bandwidth. (I know the D-cache is involved, so the measurement is not precise.)
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define T_MEM_SIZE 0x2000000   // 32 MiB
static void print_summary(const char *tst, uint64_t ticks)
{
    uint64_t msz = T_MEM_SIZE/1000000;   // MB
    float msec = (float)ticks/24000;     // 24000 counter ticks per ms (24 MHz timer)
    printf("%s: %.2f MB/Sec\n", tst, 1000 * msz/msec);
}
static void memrd_cache(void)
{
    int *mptr = malloc(T_MEM_SIZE);
    register uint32_t i = 0;
    register int va = 0;
    uint32_t s, e, diff = 0, maxdiff = 0;
    uint16_t loop = 0;
    if (mptr == NULL)
        return;
    while (loop++ < 10) {
        s = read_cntpct();   // user-supplied helper: reads the generic timer count (CNTPCT)
        for (i = 0; i < T_MEM_SIZE/sizeof(int); i++) {
            va = mptr[i];
        }
        e = read_cntpct();
        diff = e - s;
        if (diff > maxdiff) {   // keep the slowest of the 10 passes
            maxdiff = diff;
        }
    }
    free(mptr);
    print_summary("memrd", maxdiff);
}
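For checking the numbers by hand, here is a sketch of the tick-to-bandwidth conversion with the counter frequency as a parameter (`print_summary` hard-codes 24,000 ticks per millisecond, i.e. a 24 MHz counter). Doing the arithmetic in `double` also avoids the integer division in `T_MEM_SIZE/1000000`, which yields 33 instead of 33.55, so `print_summary` under-reports by roughly 1.7%. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes read and elapsed counter ticks -> MB/s (1 MB = 10^6 bytes).
   tick_hz is the counter frequency, e.g. 24e6 for a 24 MHz timer. */
static double bandwidth_mb_per_s(uint64_t bytes, uint64_t ticks, double tick_hz)
{
    assert(ticks > 0 && tick_hz > 0.0);
    double seconds = (double)ticks / tick_hz;
    return (double)bytes / 1e6 / seconds;
}
```

For example, `bandwidth_mb_per_s(T_MEM_SIZE, maxdiff, 24e6)`: 32 MiB read in 24,000 ticks (1 ms at 24 MHz) works out to about 33,554 MB/s, where `print_summary` would print 33000.00 MB/Sec.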
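Below is a read measurement that tries to remove the effect of the D-cache.
It reads one 4-byte word from each cache line (the CPU still fetches the whole line) and moves to the next line on every access. By the time it comes back to the next word of the first line, far more than the 256 KB L2 cache has been read, so that line should have been evicted and reloaded. I therefore think the D-cache effect should be minimized. (I may be wrong; please correct me.)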
#define CLINE_SIZE 16 // 16 * 4B
static void memrd_nocache(void)
{
    int *mptr = malloc(T_MEM_SIZE);
    register uint32_t col = 0, ln = 0;
    register int va = 0;
    uint32_t s, e, diff = 0, maxdiff = 0;
    uint16_t loop = 0;
    if (mptr == NULL)
        return;
    while (loop++ < 10) {
        s = read_cntpct();
        for (col = 0; col < CLINE_SIZE; col++) {
            for (ln = 0; ln < T_MEM_SIZE/(CLINE_SIZE*sizeof(int)); ln++) {
                va = *(mptr + ln * CLINE_SIZE + col);
            }
        }
        e = read_cntpct();
        diff = e - s;
        if (diff > maxdiff) {
            maxdiff = diff;
        }
    }
    free(mptr);
    print_summary("memrd_nocache", maxdiff);
}
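Assigning each load to a local that is never read gives an optimizing compiler room to drop the loads unless that local is `volatile`. One common way to rule this out, without adding stores to a `volatile`, is to accumulate the loaded values and consume the result. Here is a sketch of the `memrd_nocache` access order written that way. The function name is hypothetical, `line_words = 16` corresponds to `CLINE_SIZE` (64-byte lines), and the caller should print or otherwise use the returned sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Visit every word of buf in the same column-major order as memrd_nocache():
   word `col` of every line, then word `col + 1` of every line, and so on.
   Summing into the return value keeps every load observable. */
static uint32_t strided_read_sum(const uint32_t *buf, size_t nwords, size_t line_words)
{
    assert(line_words > 0 && nwords % line_words == 0);
    size_t nlines = nwords / line_words;
    uint32_t sum = 0;
    for (size_t col = 0; col < line_words; col++)
        for (size_t ln = 0; ln < nlines; ln++)
            sum += buf[ln * line_words + col];
    return sum;
}
```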
After running these two functions, I found the bandwidth to be about:
memrd: 1973.04 MB/Sec
memrd_nocache: 1960.67 MB/Sec
The CPU runs at 1 GHz with DDR3 on die, and the two tests give nearly the same result. That was a big surprise to me.
I have used lmbench on a Linux ARM server, but I don't think it can be run on this embedded system.
So I am looking for a software tool to measure memory bandwidth on this embedded system, either an existing one from the community or one I write myself.
I want to calculate the total hours of mobile use per day. I have implemented the code below, but I still don't get a meaningful figure for the total hours. Please suggest another way to find the total hours of daily mobile use.
Import these two headers:
#import <mach/mach.h>
#import <assert.h>
-(float) cpu_usage
{
    kern_return_t kr;
    task_info_data_t tinfo;
    mach_msg_type_number_t task_info_count;
    task_info_count = TASK_INFO_MAX;
    kr = task_info(mach_task_self(), TASK_BASIC_INFO, (task_info_t)tinfo, &task_info_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    task_basic_info_t basic_info;
    thread_array_t thread_list;
    mach_msg_type_number_t thread_count;
    thread_info_data_t thinfo;
    mach_msg_type_number_t thread_info_count;
    thread_basic_info_t basic_info_th;
    uint32_t stat_thread = 0; // Mach threads
    basic_info = (task_basic_info_t)tinfo;
    // get threads in the task
    kr = task_threads(mach_task_self(), &thread_list, &thread_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    if (thread_count > 0)
        NSLog(@"Count::%u", thread_count);
    NSDate *methodStart = [NSDate date];
    NSDate *methodFinish = [NSDate date];
    NSTimeInterval executionTime = [methodFinish timeIntervalSinceDate:methodStart];
    NSLog(@"executionTime = %f", executionTime);
    NSString *sTr = [NSString stringWithFormat:@"%u", thread_count];
    lbl_smartphone.text = sTr;
    stat_thread += thread_count;
    long tot_sec = 0;
    long tot_usec = 0;
    float tot_cpu = 0;
    int j;
    for (j = 0; j < (int)thread_count; j++)
    {
        thread_info_count = THREAD_INFO_MAX;
        kr = thread_info(thread_list[j], THREAD_BASIC_INFO,
                         (thread_info_t)thinfo, &thread_info_count);
        if (kr != KERN_SUCCESS) {
            continue; // skip this thread; thread_list is still freed below
        }
        basic_info_th = (thread_basic_info_t)thinfo;
        if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
            tot_sec = tot_sec + basic_info_th->user_time.seconds + basic_info_th->system_time.seconds;
            tot_usec = tot_usec + basic_info_th->user_time.microseconds + basic_info_th->system_time.microseconds;
            // divide as float: integer division by TH_USAGE_SCALE truncates to 0
            tot_cpu = tot_cpu + basic_info_th->cpu_usage / (float)TH_USAGE_SCALE * 100.0;
            NSLog(@"Total CPU use:: %f", tot_cpu);
        }
    } // for each thread
    kr = vm_deallocate(mach_task_self(), (vm_offset_t)thread_list, thread_count * sizeof(thread_t));
    assert(kr == KERN_SUCCESS);
    return tot_cpu;
}
The code above gives me the thread count, but what I want is the total hours of mobile use per day.
I want to get a notification whenever my application's CPU usage goes above a certain level.
I can then print all the logs and find out what's going wrong.
Maybe the following links will help you:
Get CPU percent usage
iOS - Get CPU usage from application
How to get the active processes running in iOS
Once you have the CPU usage, you can check it against the threshold you want.
Based on this answer:
- (float)cpuUsages
{
    kern_return_t kr;
    task_info_data_t tinfo;
    mach_msg_type_number_t task_info_count;
    task_info_count = TASK_INFO_MAX;
    kr = task_info(mach_task_self(), TASK_BASIC_INFO, (task_info_t)tinfo, &task_info_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    task_basic_info_t basic_info;
    thread_array_t thread_list;
    mach_msg_type_number_t thread_count;
    thread_info_data_t thinfo;
    mach_msg_type_number_t thread_info_count;
    thread_basic_info_t basic_info_th;
    uint32_t stat_thread = 0; // Mach threads
    basic_info = (task_basic_info_t)tinfo;
    // get threads in the task
    kr = task_threads(mach_task_self(), &thread_list, &thread_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    if (thread_count > 0)
        stat_thread += thread_count;
    long tot_sec = 0;
    long tot_usec = 0;
    float tot_cpu = 0;
    int j;
    for (j = 0; j < (int)thread_count; j++)
    {
        thread_info_count = THREAD_INFO_MAX;
        kr = thread_info(thread_list[j], THREAD_BASIC_INFO,
                         (thread_info_t)thinfo, &thread_info_count);
        if (kr != KERN_SUCCESS) {
            continue; // skip this thread; thread_list is still freed below
        }
        basic_info_th = (thread_basic_info_t)thinfo;
        if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
            tot_sec = tot_sec + basic_info_th->user_time.seconds + basic_info_th->system_time.seconds;
            tot_usec = tot_usec + basic_info_th->user_time.microseconds + basic_info_th->system_time.microseconds;
            tot_cpu = tot_cpu + basic_info_th->cpu_usage / (float)TH_USAGE_SCALE * 100.0;
        }
    } // for each thread
    kr = vm_deallocate(mach_task_self(), (vm_offset_t)thread_list, thread_count * sizeof(thread_t));
    assert(kr == KERN_SUCCESS);
    return tot_cpu;
}
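The `/ (float)TH_USAGE_SCALE` in this method matters. `cpu_usage` is an integer scaled so that `TH_USAGE_SCALE` (defined as 1000 in `<mach/thread_info.h>`) means 100% of one CPU. Dividing by an integer instead truncates every value below 100% to zero. Here is a small sketch of both versions. `USAGE_SCALE` is a local stand-in for the Mach constant, and the function names are hypothetical. Because each thread's usage is relative to one CPU, the sum over threads can exceed 100% on a multi-core device.

```c
#include <assert.h>

/* Local mirror of Mach's TH_USAGE_SCALE (1000), so this compiles anywhere. */
#define USAGE_SCALE 1000

/* Scaled cpu_usage -> percentage of one CPU, keeping the fraction. */
static float usage_to_percent(int cpu_usage)
{
    assert(cpu_usage >= 0);
    return cpu_usage / (float)USAGE_SCALE * 100.0f;
}

/* The pitfall: int / int truncates anything under 100% to 0. */
static float usage_to_percent_truncating(int cpu_usage)
{
    return cpu_usage / (int)USAGE_SCALE * 100.0f;
}
```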
You can start a timer that fires, say, every 5 seconds and call the method below on each tick. When usage reaches the level you care about, post your notification or do whatever else you need.
- (void)checkCPUUsagesInEvery5Sec {
    if ([self cpuUsages] > 10.0f) // your threshold
    {
        // your stuff
    }
}
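With a plain `> 10.0f` check, every 5-second tick triggers again for as long as usage stays high. Here is a sketch of a small state machine that reports only the upward crossing and re-arms once usage drops below a lower level (hysteresis). The struct, the function and the 8.0 re-arm level are hypothetical. In the Objective-C method above, you would pass `[self cpuUsages]` as the sample.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical alarm state: fire once when usage crosses `threshold`,
   re-arm only after it falls below `rearm_below`. */
typedef struct {
    float threshold;
    float rearm_below;
    bool above;
} cpu_alarm;

/* Feed one sample; returns true only on an upward crossing. */
static bool cpu_alarm_update(cpu_alarm *a, float usage)
{
    assert(a->rearm_below <= a->threshold);
    if (!a->above && usage > a->threshold) {
        a->above = true;
        return true;            /* post the notification / dump logs here */
    }
    if (a->above && usage < a->rearm_below)
        a->above = false;       /* usage has settled: allow the next alert */
    return false;
}
```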
I am using the following code to get CPU load on iOS and Cocoa (Mach).
The strange thing is: if I call this code at regular intervals, say 30 times per second, available memory shrinks progressively and eventually the program crashes.
Profiling the program with Instruments, I see neither leaks nor new memory allocations (the Leaks graph is empty and the Allocations graph is flat). Still, available physical memory keeps going down until the program crashes (it takes at least 40 minutes on a 256 MB iPod, so the amount lost per call is small).
What I suspect is that this code uses some kernel resource and does not release it correctly.
Is anyone able to explain this behaviour?
#include "cpu_usage.h"
#import <mach/mach.h>
float cpu_usage()
{
    kern_return_t kr;
    task_info_data_t tinfo;
    mach_msg_type_number_t task_info_count;
    task_info_count = TASK_INFO_MAX;
    kr = task_info(mach_task_self(), TASK_BASIC_INFO, (task_info_t)tinfo, &task_info_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    thread_array_t thread_list;
    mach_msg_type_number_t thread_count;
    thread_info_data_t thinfo;
    mach_msg_type_number_t thread_info_count;
    thread_basic_info_t basic_info_th;
    kr = task_threads(mach_task_self(), &thread_list, &thread_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }
    long tot_sec = 0;
    long tot_usec = 0;
    float tot_cpu = 0;
    int j;
    for (j = 0; j < thread_count; j++)
    {
        thread_info_count = THREAD_INFO_MAX;
        kr = thread_info(thread_list[j], THREAD_BASIC_INFO,
                         (thread_info_t)thinfo, &thread_info_count);
        if (kr != KERN_SUCCESS) {
            return -1;
        }
        basic_info_th = (thread_basic_info_t)thinfo;
        if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
            tot_sec = tot_sec + basic_info_th->user_time.seconds + basic_info_th->system_time.seconds;
            tot_usec = tot_usec + basic_info_th->user_time.microseconds + basic_info_th->system_time.microseconds;
            tot_cpu = tot_cpu + basic_info_th->cpu_usage / (float)TH_USAGE_SCALE * 100.0;
        }
    }
    return tot_cpu;
}
The answer to the question "iOS - Get CPU usage from application" shows that the memory allocated by
kr = task_threads(mach_task_self(), &thread_list, &thread_count);
must be released using
kr = vm_deallocate(mach_task_self(), (vm_offset_t)thread_list,
thread_count * sizeof(thread_t));
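Putting this together, here is a sketch of a CPU-usage function that releases everything `task_threads()` returns. Besides the `vm_deallocate()` described above, it also drops the send right on each thread port with `mach_port_deallocate()`, which is generally recommended for `task_threads()` results. Treat that part as an addition rather than part of the linked answer. The function name is hypothetical, and the non-Apple branch is only a stub returning -1 so the example compiles elsewhere.

```c
#include <assert.h>

#ifdef __APPLE__
#include <mach/mach.h>

/* Sum the CPU usage of all non-idle threads in this task, as a percentage
   of one CPU, releasing every resource task_threads() hands back. */
float cpu_usage_percent(void)
{
    thread_act_array_t threads;
    mach_msg_type_number_t count = 0;
    kern_return_t kr = task_threads(mach_task_self(), &threads, &count);
    if (kr != KERN_SUCCESS)
        return -1.0f;

    float total = 0.0f;
    for (mach_msg_type_number_t i = 0; i < count; i++) {
        thread_basic_info_data_t info;
        mach_msg_type_number_t info_count = THREAD_BASIC_INFO_COUNT;
        kr = thread_info(threads[i], THREAD_BASIC_INFO,
                         (thread_info_t)&info, &info_count);
        if (kr == KERN_SUCCESS && !(info.flags & TH_FLAGS_IDLE))
            total += info.cpu_usage / (float)TH_USAGE_SCALE * 100.0f;
        /* each entry is a send right to a thread port: drop our reference */
        mach_port_deallocate(mach_task_self(), threads[i]);
    }
    /* the array itself was allocated in our address space by the kernel */
    kr = vm_deallocate(mach_task_self(), (vm_address_t)threads,
                       count * sizeof(thread_act_t));
    assert(kr == KERN_SUCCESS);
    return total;
}
#else
/* Mach APIs are not available here: report "unknown". */
float cpu_usage_percent(void)
{
    return -1.0f;
}
#endif
```

Called 30 times per second, as in the question, this version should not leave a growing set of kernel-allocated thread lists behind.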
I am running the following code using shared memory:
__global__ void computeAddShared(int *in , int *out, int sizeInput){
    // parameters are not named gidata and godata, to emphasize that they receive a copy of the address and are different from the pointers in host code
    extern __shared__ float temp[];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int ltid = threadIdx.x;
    temp[ltid] = 0;
    while(tid < sizeInput){
        temp[ltid] += in[tid];
        tid += gridDim.x * blockDim.x; // to handle arrays of any size
    }
    __syncthreads();
    int offset = 1;
    while(offset < blockDim.x){
        if(ltid % (offset * 2) == 0){
            temp[ltid] = temp[ltid] + temp[ltid + offset];
        }
        __syncthreads();
        offset *= 2;
    }
    if(ltid == 0){
        out[blockIdx.x] = temp[0];
    }
}
int main(){
    int size = 16; // size of the current input array; changes after every loop iteration
    int cidata[] = {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16};
    /*FILE *f;
    f = fopen("invertedList.txt" , "w");
    a[0] = 1 + (rand() % 8);
    fprintf(f, "%d,",a[0]);
    for( int i = 1 ; i< N; i++){
        a[i] = a[i-1] + (rand() % 8) + 1;
        fprintf(f, "%d,",a[i]);
    }
    fclose(f);*/
    int* gidata;
    int* godata;
    cudaMalloc((void**)&gidata, size* sizeof(int));
    cudaMemcpy(gidata,cidata, size * sizeof(int), cudaMemcpyHostToDevice);
    int TPB = 4;
    int blocks = 10; //to get things kicked off
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    while(blocks != 1 ){
        if(size < TPB){
            TPB = size; // size is a power of 2
        }
        blocks = (size+ TPB -1 ) / TPB;
        cudaMalloc((void**)&godata, blocks * sizeof(int));
        computeAddShared<<<blocks, TPB,TPB>>>(gidata, godata,size);
        cudaFree(gidata);
        gidata = godata;
        size = blocks;
    }
    //printf("The error by cuda is %s",cudaGetErrorString(cudaGetLastError()));
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);
    float elapsedTime;
    cudaEventElapsedTime(&elapsedTime , start, stop);
    printf("time is %f ms", elapsedTime);
    int *output = (int*)malloc(sizeof(int));
    cudaMemcpy(output, gidata, sizeof(int), cudaMemcpyDeviceToHost);
    //Can't free either earlier, as both point to the same location
    cudaError_t chk = cudaFree(godata);
    if(chk!=0){
        printf("First chk also printed error. Maybe error in my logic\n");
    }
    printf("The error by threadsyn is %s", cudaGetErrorString(cudaGetLastError()));
    printf("The sum of the array is %d\n", output[0]);
    getchar();
    return 0;
}
Clearly, the first while loop in computeAddShared should cause an out-of-bounds error, because I am allocating only 4 bytes of shared memory. Why does cuda-memcheck not catch this? Below is the output of cuda-memcheck:
========= CUDA-MEMCHECK
time is 12.334816 msThe error by threadsyn is no errorThe sum of the array is 136
========= ERROR SUMMARY: 0 errors
Shared memory allocation granularity. The hardware undoubtedly has a page size for allocations (probably the same as the L1 cache line size). With only 4 threads per block, there will "accidentally" be enough shared memory in a single page to let your code work. If you used a sensible number of threads per block (i.e., a round multiple of the warp size), the error would be detected because there would not be enough allocated memory.
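To check what the kernel should produce without relying on the GPU, here is a host-side C model of the same interleaved tree reduction and the multi-pass loop in `main()`. The function names are hypothetical, and like the original it assumes the threads-per-block value is a power of 2. Its scratch buffer holds `tpb` elements. In the CUDA launch, the third `<<<...>>>` argument is a byte count, so the matching size there would be `TPB * sizeof(float)`, not `TPB`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One launch of computeAddShared(), modeled on the host: `temp` plays the
   role of shared memory for each block in turn. */
static void reduce_pass(const int *in, int *out, int size, int tpb, int blocks)
{
    assert(tpb > 0 && (tpb & (tpb - 1)) == 0);      /* power of 2, as in main() */
    int *temp = malloc((size_t)tpb * sizeof *temp);  /* tpb elements, not tpb bytes */
    assert(temp != NULL);
    for (int b = 0; b < blocks; b++) {
        for (int t = 0; t < tpb; t++) {              /* grid-stride accumulation */
            temp[t] = 0;
            for (int i = b * tpb + t; i < size; i += blocks * tpb)
                temp[t] += in[i];
        }
        /* interleaved tree: threads with t % (2*offset) == 0 add temp[t + offset] */
        for (int offset = 1; offset < tpb; offset *= 2)
            for (int t = 0; t < tpb; t += 2 * offset)
                temp[t] += temp[t + offset];
        out[b] = temp[0];
    }
    free(temp);
}

/* Repeat passes like main() until a single block remains. */
static int reduce_all(const int *data, int size, int tpb)
{
    int *cur = malloc((size_t)size * sizeof *cur);
    assert(cur != NULL);
    memcpy(cur, data, (size_t)size * sizeof *cur);
    int blocks;
    do {
        if (size < tpb)
            tpb = size;
        blocks = (size + tpb - 1) / tpb;
        int *next = malloc((size_t)blocks * sizeof *next);
        assert(next != NULL);
        reduce_pass(cur, next, size, tpb, blocks);
        free(cur);
        cur = next;
        size = blocks;
    } while (blocks != 1);
    int result = cur[0];
    free(cur);
    return result;
}
```

For the input 1..16 with `tpb = 4`, the first pass yields the partial sums 10, 26, 42 and 58, and the second pass reduces them to 136.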