I'm apparently causing the device (an iPad) to run out of memory, so it is jettisoning my app. I'm trying to understand what is going on, as Instruments tells me I'm using about 80 MB, and there is no other app running on the device.
I found this code snippet to ask the Mach system under iOS for the memory stats:
#import <mach/mach.h>
#import <mach/mach_host.h>

static void print_free_memory(void) {
    mach_port_t host_port = mach_host_self();
    mach_msg_type_number_t host_size = sizeof(vm_statistics_data_t) / sizeof(integer_t);
    vm_size_t pagesize;
    host_page_size(host_port, &pagesize);

    vm_statistics_data_t vm_stat;
    if (host_statistics(host_port, HOST_VM_INFO, (host_info_t)&vm_stat, &host_size) != KERN_SUCCESS) {
        NSLog(@"Failed to fetch vm statistics");
        return;
    }

    /* Stats in bytes */
    natural_t mem_used = (vm_stat.active_count +
                          vm_stat.inactive_count +
                          vm_stat.wire_count) * pagesize;
    natural_t mem_free = vm_stat.free_count * pagesize;
    natural_t mem_total = mem_used + mem_free;
    NSLog(@"used: %u free: %u total: %u", mem_used, mem_free, mem_total);
}
When I use this function to obtain the three memory values, I am finding that the mem_total value is going way down even though the mem_used total is not changing all that much. Here are two successive output lines:
<Warning>: used: 78585856 free: 157941760 total: 236527616
some code executes....
<Warning>: used: 83976192 free: 10551296 total: 94527488
So all of a sudden I go from 157 MB of free memory to 10 MB of free memory, but my usage only increased from 78 MB to 84 MB. The total memory decreased from 236 MB to 94 MB.
Does this make sense to anybody? There are no other applications running on the device during this time period, the device should be basically totally dedicated to my application.
All of the code that executes between the two memory checks is native C++ code that has no interaction at all with any Apple framework. There are indeed many, many calls to the memory system to allocate and deallocate objects from the C++ heap, but as seen, only about 4MB of additional memory is in the end allocated, all the rest is freed / deleted.
Can it be that the missing memory is being consumed by heap fragmentation? I.e. is the heap simply so fragmented that the block overhead is consuming all of the additional, unaccounted, memory?
Has anyone else seen this behavior?
Thanks,
-Eric
You should use task_info instead of host_statistics to retrieve the memory usage of your own application:
#include <mach/mach.h>
#include <mach/mach_host.h>

void dump_memory_usage(void) {
    struct task_basic_info info;
    mach_msg_type_number_t size = TASK_BASIC_INFO_COUNT;
    kern_return_t kerr = task_info(mach_task_self(), TASK_BASIC_INFO,
                                   (task_info_t)&info, &size);
    if (kerr == KERN_SUCCESS) {
        NSLog(@"task_info: 0x%08lx 0x%08lx",
              (unsigned long)info.virtual_size, (unsigned long)info.resident_size);
    }
    else {
        NSLog(@"task_info failed with error %d (0x%08x), '%s'",
              kerr, kerr, mach_error_string(kerr));
    }
}
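If you need the number programmatically rather than in the log, a thin wrapper around the same call can return it directly. A minimal sketch (the helper name is mine, not an Apple API; resident_size is the value that most closely matches "memory my app currently occupies"):
#include <mach/mach.h>

/* Returns this task's resident memory in bytes, or 0 on failure.
   Same task_info call as above, just returning the value. */
static vm_size_t resident_memory_bytes(void) {
    struct task_basic_info info;
    mach_msg_type_number_t size = TASK_BASIC_INFO_COUNT;
    kern_return_t kerr = task_info(mach_task_self(), TASK_BASIC_INFO,
                                   (task_info_t)&info, &size);
    return (kerr == KERN_SUCCESS) ? info.resident_size : 0;
}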
I am using the piece of code below to compute free RAM on the iPhone.
size_t get_free_memory(void)
{
    mach_port_t host_port = mach_host_self();
    mach_msg_type_number_t host_size = sizeof(vm_statistics_data_t) / sizeof(integer_t);
    vm_size_t pagesize;
    host_page_size(host_port, &pagesize);

    vm_statistics_data_t vm_stat;
    if (host_statistics(host_port, HOST_VM_INFO, (host_info_t)&vm_stat, &host_size) != KERN_SUCCESS)
    {
        NSLog(@"Failed to fetch vm statistics");
        return 0;
    }

    /* Stats in bytes */
    size_t mem_free = vm_stat.free_count * pagesize;
    return mem_free;
}
Although the above code also works on iOS, host_statistics is a macOS-only supported API; the official Apple documentation says nothing about it being officially supported on iOS.
My question: is there another way to compute the free and used RAM on the iPhone?
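If your deployment target is recent enough, one documented alternative is os_proc_available_memory (iOS 13+, so treat its availability as an assumption for your project). It reports how much more memory the current app may allocate before hitting its per-app limit, which on iOS is usually more meaningful than system-wide free RAM. A minimal sketch:
#include <os/proc.h>
#include <stdio.h>

/* iOS 13+ only: remaining memory this process may allocate before
   the system's per-app (jetsam) limit kicks in. */
static void log_available_memory(void)
{
    size_t avail = os_proc_available_memory();
    printf("available to this app: %zu bytes\n", avail);
}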
I want to access, from user space, the memory of a PCIe board that exposes 1 GB of memory through BAR0.
Currently I only use the read and write functionality of my character device driver, which is VERY slow (1 MB/s read and 16 MB/s write) on an 8x PCIe Gen3 link.
static ssize_t
MPD_read(struct file *filp, char __user *buffer,
         size_t bufferSize, loff_t *offset)
{
    /* Copy straight out of the BAR into the user buffer. */
    unsigned long unusedBytes = copy_to_user(buffer,
        MPD_AdapterBoard.bars[0].barHWAddress, bufferSize);
    return bufferSize - unusedBytes;
}

static ssize_t
MPD_write(struct file *filp, const char __user *buffer,
          size_t bufferSize, loff_t *offset)
{
    /* Copy from the user buffer straight into the BAR. */
    unsigned long unusedBytes = copy_from_user(
        MPD_AdapterBoard.bars[0].barHWAddress, buffer, bufferSize);
    return bufferSize - unusedBytes;
}
Is it possible to use mmap (via the .mmap file operation) to get more speed?
Or is DMA the only option ?
Thanks in advance!
/Jesko
I found out how to make it work:
static int
MPD_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long offset = vma->vm_pgoff << PAGE_SHIFT;

    /* Refuse mappings that would run past the end of BAR0. */
    if ((offset + (vma->vm_end - vma->vm_start)) > MPD_AdapterBoard.bars[0].barSizeInBytes)
    {
        return -EINVAL;
    }

    offset += (unsigned long)MPD_AdapterBoard.bars[0].mmioStart;

    /* Device memory must not be cached by the CPU. */
    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    if (io_remap_pfn_range(vma, vma->vm_start, offset >> PAGE_SHIFT,
                           vma->vm_end - vma->vm_start, vma->vm_page_prot))
    {
        return -EAGAIN;
    }
    return 0;
}
Attention: This is work in progress, so error checking is fairly limited.
In the hope of helping someone here, the complete code, including a test program, can be downloaded from: https://github.com/jesko42/minipci
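For completeness, here is how user space would consume such a mapping: open the device node, mmap it, then read and write the BAR with plain loads and stores, with no syscall per transfer. This is an untested sketch, and the node name /dev/mpd0 is a placeholder for whatever the driver actually registers:
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BAR_SIZE (1UL << 30)   /* 1 GB BAR0, as in the question */

int main(void)
{
    int fd = open("/dev/mpd0", O_RDWR | O_SYNC);   /* placeholder node name */
    if (fd < 0) { perror("open"); return 1; }

    void *bar = mmap(NULL, BAR_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (bar == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    memset(bar, 0xAB, 4096);   /* direct access to board memory */

    munmap(bar, BAR_SIZE);
    close(fd);
    return 0;
}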
I thought the maximum size of global memory should be limited only by the GPU device, no matter whether it is allocated statically using __device__ __managed__ or dynamically using cudaMalloc.
But I found that when using the __device__ __managed__ way, the maximum array size I can declare is much smaller than the GPU device limit.
The minimal working example is as follows:
#include <stdio.h>
#include <cuda_runtime.h>
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
if (code != cudaSuccess)
{
fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
if (abort) exit(code);
}
}
#define MX 64
#define MY 64
#define MZ 64
#define NX 64
#define NY 64
#define M (MX * MY * MZ)
__device__ __managed__ float A[NY][NX][M];
__device__ __managed__ float B[NY][NX][M];
__global__ void swapAB()
{
int tid = blockIdx.x * blockDim.x + threadIdx.x;
for(int j = 0; j < NY; j++)
for(int i = 0; i < NX; i++)
A[j][i][tid] = B[j][i][tid];
}
int main()
{
swapAB<<<M/256,256>>>();
gpuErrchk( cudaPeekAtLastError() );
gpuErrchk( cudaDeviceSynchronize() );
return 0;
}
It uses 64^5 * 2 * 4 / 2^30 GB = 8 GB of global memory, and I compile and run it on an NVIDIA Tesla K40c GPU, which has 12 GB of global memory.
Compile command:
nvcc test.cu -gencode arch=compute_30,code=sm_30
Output warning:
warning: overflow in implicit constant conversion.
When I run the generated executable, I get an error:
GPUassert: an illegal memory access was encountered test.cu
Surprisingly, if I use dynamically allocated global memory of the same size (8 GB) via the cudaMalloc API instead, there is no compile warning and no runtime error.
I'm wondering whether there are any special limitations on the allocatable size of static global device memory in CUDA.
Thanks!
PS: OS and CUDA: CentOS 6.5 x64, CUDA-7.5.
This would appear to be a limitation of the CUDA runtime API. The root cause is this function (in CUDA 7.5):
__cudaRegisterVar(
void **fatCubinHandle,
char *hostVar,
char *deviceAddress,
const char *deviceName,
int ext,
int size,
int constant,
int global
);
which only accepts a signed int for the size of any statically declared device variable. This would limit the maximum size to 2^31 (2147483648) bytes. The warning you see is because the CUDA front end is emitting boilerplate code containing calls to __cudaRegisterVar like this:
__cudaRegisterManagedVariable(__T26, __shadow_var(A,::A), 0, 4294967296, 0, 0);
__cudaRegisterManagedVariable(__T26, __shadow_var(B,::B), 0, 4294967296, 0, 0);
It is the 4294967296 that is the source of the problem: the size overflows a signed integer and causes the API call to blow up. So it seems you are limited to 2 GB per static variable for the moment. I would recommend raising this as a bug with NVIDIA if it is a serious problem for your application.
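For illustration, here is an untested sketch of the workaround the question already hints at: allocate the same 8 GB dynamically with cudaMalloc and index the flattened arrays by hand, which sidesteps the signed-int size in __cudaRegisterVar entirely:
#include <cuda_runtime.h>

#define MX 64
#define MY 64
#define MZ 64
#define NX 64
#define NY 64

__global__ void swapAB(float *A, float *B, size_t M)
{
    size_t tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int j = 0; j < NY; j++)
        for (int i = 0; i < NX; i++)
            /* equivalent of A[j][i][tid] = B[j][i][tid] on the flat layout */
            A[((size_t)j * NX + i) * M + tid] = B[((size_t)j * NX + i) * M + tid];
}

int main()
{
    size_t M = (size_t)MX * MY * MZ;
    size_t bytes = (size_t)NY * NX * M * sizeof(float);   /* 4 GB per array */
    float *A, *B;
    cudaMalloc(&A, bytes);   /* dynamic allocation: no 2 GB static limit */
    cudaMalloc(&B, bytes);
    swapAB<<<M / 256, 256>>>(A, B, M);
    cudaDeviceSynchronize();
    cudaFree(A);
    cudaFree(B);
    return 0;
}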
I've got a problem getting the free, inactive, active, and wired system memory on the iPhone in code (the purpose is to display this information to the user).
I found, in another thread, code that gives all the memory details:
int mib[2] = { CTL_HW, HW_PAGESIZE };
int pagesize;
size_t length = sizeof(pagesize);
if (sysctl(mib, 2, &pagesize, &length, NULL, 0) < 0)
{
    fprintf(stderr, "getting page size");
}

mach_msg_type_number_t count = HOST_VM_INFO_COUNT;
vm_statistics_data_t vmstat;
if (host_statistics(mach_host_self(), HOST_VM_INFO, (host_info_t)&vmstat, &count) != KERN_SUCCESS)
{
    fprintf(stderr, "Failed to get VM statistics.");
}

double unit = 1024 * 1024;
NSLog(@"Free : %f , inactive : %f, active : %f, wired : %f",
      vmstat.free_count * pagesize / unit,
      vmstat.inactive_count * pagesize / unit,
      vmstat.active_count * pagesize / unit,
      vmstat.wire_count * pagesize / unit);
But the result is very strange:
Free : 344.687500 , inactive : 359.593750, active : 1383.843750, wired : 842.578125
I'm currently testing on an iPhone 5s, which is known to have only 1 GB of RAM.
So if someone has an idea of how to get real information about the memory, that would be great :D
Thanks!
Romain
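One thing worth trying, as a guess rather than a verified fix: on a 64-bit device like the 5s, the 32-bit vm_statistics_data_t is the legacy interface, and the 64-bit variant may produce more coherent counts. A minimal sketch of the same query through host_statistics64:
#include <mach/mach.h>
#include <mach/mach_host.h>
#include <stdio.h>

static void print_vm_stats64(void)
{
    vm_size_t pagesize;
    host_page_size(mach_host_self(), &pagesize);

    vm_statistics64_data_t vmstat;
    mach_msg_type_number_t count = HOST_VM_INFO64_COUNT;
    if (host_statistics64(mach_host_self(), HOST_VM_INFO64,
                          (host_info_t)&vmstat, &count) != KERN_SUCCESS) {
        fprintf(stderr, "Failed to get VM statistics.\n");
        return;
    }

    double unit = 1024.0 * 1024.0;
    printf("Free: %.1f MB, inactive: %.1f MB, active: %.1f MB, wired: %.1f MB\n",
           vmstat.free_count * pagesize / unit,
           vmstat.inactive_count * pagesize / unit,
           vmstat.active_count * pagesize / unit,
           vmstat.wire_count * pagesize / unit);
}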
I am using the following code to get CPU load on iOS and Cocoa (Mach).
The strange thing is: if I call this code at regular intervals, say 30 times per second, available memory shrinks progressively and eventually the program crashes.
Profiling the program with Instruments, I see neither leaks nor new memory allocation (the leaks diagram is empty and the allocation diagram is flat). Still, available physical memory keeps going down until the program crashes (it takes at least 40 minutes on a 256 MB iPod, so it is not a big memory occupation).
What I suspect is that this code uses some kernel resource and does not release it correctly.
Is anyone able to explain this behaviour?
#include "cpu_usage.h"
#import <mach/mach.h>
float cpu_usage()
{
    kern_return_t kr;
    task_info_data_t tinfo;
    mach_msg_type_number_t task_info_count;

    task_info_count = TASK_INFO_MAX;
    kr = task_info(mach_task_self(), TASK_BASIC_INFO, (task_info_t)tinfo, &task_info_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }

    thread_array_t thread_list;
    mach_msg_type_number_t thread_count;
    thread_info_data_t thinfo;
    mach_msg_type_number_t thread_info_count;
    thread_basic_info_t basic_info_th;

    kr = task_threads(mach_task_self(), &thread_list, &thread_count);
    if (kr != KERN_SUCCESS) {
        return -1;
    }

    long tot_sec = 0;
    long tot_usec = 0;
    float tot_cpu = 0;
    int j;

    for (j = 0; j < thread_count; j++)
    {
        thread_info_count = THREAD_INFO_MAX;
        kr = thread_info(thread_list[j], THREAD_BASIC_INFO,
                         (thread_info_t)thinfo, &thread_info_count);
        if (kr != KERN_SUCCESS) {
            return -1;
        }

        basic_info_th = (thread_basic_info_t)thinfo;
        if (!(basic_info_th->flags & TH_FLAGS_IDLE)) {
            tot_sec  = tot_sec  + basic_info_th->user_time.seconds      + basic_info_th->system_time.seconds;
            tot_usec = tot_usec + basic_info_th->user_time.microseconds + basic_info_th->system_time.microseconds;
            tot_cpu  = tot_cpu  + basic_info_th->cpu_usage / (float)TH_USAGE_SCALE * 100.0;
        }
    }

    return tot_cpu;
}
From the answer to the question "iOS - Get CPU usage from application", it can be seen that the memory allocated by
kr = task_threads(mach_task_self(), &thread_list, &thread_count);
must be released using
kr = vm_deallocate(mach_task_self(), (vm_offset_t)thread_list,
thread_count * sizeof(thread_t));
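Concretely, the cleanup belongs right after the per-thread loop in cpu_usage(). A sketch of the tail of the function (the per-port mach_port_deallocate calls are my addition, commonly recommended because task_threads also hands you a send right for every thread port):
    for (j = 0; j < thread_count; j++) {
        /* ... per-thread accounting exactly as in the question ... */
    }

    /* Release what task_threads() allocated in our address space. */
    for (j = 0; j < thread_count; j++) {
        mach_port_deallocate(mach_task_self(), thread_list[j]);
    }
    vm_deallocate(mach_task_self(), (vm_offset_t)thread_list,
                  thread_count * sizeof(thread_t));

    return tot_cpu;
Note also that the early returns inside the loop skip this cleanup, so they would still leak on error paths.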