Filebench error: out of shared memory when setting $nfiles to a very large number (about 1,000,000) - microbenchmark

When I use Filebench to test my file system, the process crashes when I set $nfiles to a very large number (about 1,000,000), and the workaround from the official site does not work. Here is the suggestion from the official website:
The second warning informs you that Filebench was not able to increase the shared memory region size. You can either:
* Run Filebench as root
* Increase the shared memory limit to 256MB (sudo sh -c 'echo 268435456 > /proc/sys/kernel/shmmax') and ignore this warning

The shared memory region size is based on the size of the filebench_shm_t struct. If you want to run with more files, you will need to modify the struct. You can do this by modifying ipc.h before compiling. Since you want to set $nfiles to a large number, I suggest changing the line:
#define FILEBENCH_NFILESETENTRIES (1024 * 1024)
to
#define FILEBENCH_NFILESETENTRIES (1024 * 1024 * 10)
Recompile and retest. On my RHEL6 machine, the shared memory region went from 170MB to close to 1.5GB.
HTH,
Scott

Related

Calculate disk capacity given information about disk blocks

I got stuck on the following question regarding file systems:
Consider a file system that uses inodes to represent files. Disk blocks are 4 KB in size, and a pointer to a disk block requires 4 bytes. This file system has 12 direct disk blocks, as well as single, double, and triple indirect disk blocks.
(1). What is the maximum size of a file that can be stored in this file system?
(2). What is the disk capacity?
Part 1 is simple:
(12*4KB) + (1024*4KB) + (1024*1024*4KB) + (1024*1024*1024*4KB) ≈ 4TB
since each indirect block holds 4KB / 4 bytes = 1024 pointers.
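As a quick sanity check on that arithmetic, here is a small C program (my addition, not part of the original question) that evaluates the same sum:

#include <stdio.h>

int main(void)
{
    const unsigned long long block = 4ULL * 1024;   /* 4 KB block size */
    const unsigned long long ptrs  = block / 4;     /* 4-byte pointers per block = 1024 */

    /* 12 direct blocks, plus single, double and triple indirect blocks */
    unsigned long long max_file = 12 * block
                                + ptrs * block
                                + ptrs * ptrs * block
                                + ptrs * ptrs * ptrs * block;

    printf("max file size = %llu bytes (~%.3f TiB)\n",
           max_file, max_file / (1024.0 * 1024 * 1024 * 1024));
    return 0;
}

It prints roughly 4.004 TiB; the few extra gigabytes contributed by the direct and lower-level indirect blocks are why the total is normally just quoted as 4TB.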
But I got stuck on part 2. My initial thought is that since the maximum size of a file that can be stored in the file system is about 4TB, the disk capacity is 4TB as well. Is that the case?
I hope someone can help with my problem. Thank you very much.

mmap error: cannot allocate memory. How to allocate enough default-sized huge pages as admin?

I was compiling and running this program but received 'mmap error : cannot allocate memory'.
The comment at the top reads
/*
* Example of using hugepage memory in a user application using the mmap
* system call with MAP_HUGETLB flag. Before running this program make
* sure the administrator has allocated enough default sized huge pages
* to cover the 256 MB allocation.
*
* For ia64 architecture, Linux kernel reserves Region number 4 for hugepages.
* That means the addresses starting with 0x800000... will need to be
* specified. Specifying a fixed address is not required on ppc64, i386
* or x86_64.
*/
I want to check whether the administrator has allocated enough default-sized huge pages to cover the 256 MB allocation, but I am the system administrator. What should I do? I'm on an Ubuntu 20.04 x86_64 machine. (A side question: does mmap use the heap area?)
ADD: please see my comment (I added a boot command-line argument and the code works; I added the argument temporarily from the GRUB menu), but I wish I could add an init script so that this takes effect every time the computer boots.
There seem to be two methods:
1. Add vm.nr_hugepages = 16 to /etc/sysctl.conf and reboot. I've checked that this works.
2. (As Nate Eldredge commented) add hugepages=16 inside the quotes of the GRUB_CMDLINE_LINUX="" line in /etc/default/grub and run update-grub.
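You can see how many pages are currently reserved with grep Huge /proc/meminfo (HugePages_Total and HugePages_Free). For reference, here is a minimal sketch (my own, not the kernel's map_hugetlb.c sample itself) of the kind of 256 MB MAP_HUGETLB allocation the sample program performs; with the default 2 MB huge pages on x86_64, a full 256 MB request corresponds to 128 pages:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

#define LENGTH (256UL * 1024 * 1024)   /* 256 MB, as in the kernel sample */

int main(void)
{
    /* MAP_HUGETLB asks for default-sized huge pages; they must already be
     * reserved (vm.nr_hugepages or the hugepages= boot parameter),
     * otherwise mmap fails with "Cannot allocate memory". */
    char *addr = mmap(NULL, LENGTH, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (addr == MAP_FAILED) {
        perror("mmap");
        return EXIT_FAILURE;
    }
    addr[0] = 1;                       /* touch the mapping so a page is faulted in */
    munmap(addr, LENGTH);
    return 0;
}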

OpenCL: async_work_group_copy and maximum work group size

I'm trying to copy from global to local memory in OpenCL.
I use async_work_group_copy to move the data from global memory to local memory:
__local float gau2_sh[1024];
__local float gau4_sh[256];

// Chain both copies onto one event handle, then wait on that single event
event_t tevent = (event_t)0;
tevent = async_work_group_copy(gau2_sh, GAU2, 1024, tevent);
tevent = async_work_group_copy(gau4_sh, GAU4, 256, tevent);
wait_group_events(1, &tevent);
GAU2 in global memory holds 1024 floats (1024 * 4 bytes). When I use fewer than 128 threads it works fine, but with more than 128 threads the kernel fails with CL_INVALID_WORK_GROUP_SIZE.
My GPU is an Adreno 420, whose maximum work group size is 1024.
Do I need to consider anything else for the local memory copy?
This is caused by register usage and local memory consumption.
Similar to NVIDIA's -cl-nv-maxrregcount=<N> build option, the Qualcomm Adreno series has a compile option for reducing register usage.
The official documentation on this is proprietary, so if you are concerned about it, please read the documents included in the Qualcomm Adreno SDK.
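On the host side you can also query the per-kernel work-group limit, which shrinks as a kernel uses more registers and local memory. A minimal sketch (my addition, assuming a built cl_kernel and cl_device_id are already available from the usual setup):

#include <stdio.h>
#include <CL/cl.h>

/* Print the kernel-specific work-group limit and local memory usage. */
static void print_kernel_limits(cl_kernel kernel, cl_device_id device)
{
    size_t kernel_wg_size = 0;
    cl_ulong local_mem = 0;

    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(kernel_wg_size), &kernel_wg_size, NULL);
    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_LOCAL_MEM_SIZE,
                             sizeof(local_mem), &local_mem, NULL);

    /* The device-wide maximum (1024 on the Adreno 420) is only an upper
     * bound; enqueues must respect the per-kernel value printed here. */
    printf("kernel max work-group size: %zu, local mem used: %llu bytes\n",
           kernel_wg_size, (unsigned long long)local_mem);
}

If this reports, say, 128 for your kernel, enqueueing with a larger work-group size returns CL_INVALID_WORK_GROUP_SIZE even though the device maximum is 1024.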
For the details, please refer to the following links:
Using a barrier causes a CL_INVALID_WORK_GROUP_SIZE error
Questions about global and local work size
Qualcomm Forums - Strange Behavior With OpenCL on Adreno 320
Mobile Gaming & Graphics (Adreno) Tools and Resources

Memory access after ioremap very slow

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.
I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:
#define RESERVED_REGION_SIZE (1 * 1024 * 1024 * 1024) // 1GB
#define RESERVED_REGION_OFFSET (1 * 1024 * 1024 * 1024) // 1GB
static int __init memdrv_init(void)
{
struct timeval t1, t2;
printk(KERN_INFO "[memdriver] init\n");
// Remap reserved physical memory (that we grabbed at boot time)
do_gettimeofday( &t1 );
reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
do_gettimeofday( &t2 );
printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );
// Set the memory to a known value
do_gettimeofday( &t1 );
memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
do_gettimeofday( &t2 );
printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );
// Register the character device
...
return 0;
}
I load the driver, and check dmesg. It reports:
[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec
That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?
This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.
EDIT:
The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
See my eventual solution below.
ioremap allocates uncacheable pages, as you'd want for access to a memory-mapped I/O device. That would explain your poor performance.
You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
I don't think ioremap() is what you want here. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio, etc. It is not even guaranteed that the return value is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for I/O registers), leading to the terrible performance.
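Neither answer includes code, so here is a rough sketch (my addition, not from either answerer) of the kmalloc route: it gives ordinary cached kernel memory plus a physical address that can be handed to the PCI device.

#include <linux/errno.h>
#include <linux/io.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical buffer; kmalloc only provides a few MB of physically
 * contiguous memory, so a real driver moving ~1 GB would carve the data
 * into smaller chunks or use the DMA API (e.g. dma_alloc_coherent). */
#define BUF_SIZE (4 * 1024 * 1024)

static void *buf;
static phys_addr_t buf_phys;

static int alloc_shared_buffer(void)
{
    buf = kmalloc(BUF_SIZE, GFP_KERNEL);   /* cached, physically contiguous */
    if (!buf)
        return -ENOMEM;

    buf_phys = virt_to_phys(buf);          /* physical address for the PCI device */
    memset(buf, 0xAB, BUF_SIZE);           /* runs at normal RAM speed */
    return 0;
}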
It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.
Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
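A rough sketch of that workaround (my own reconstruction; the 256-byte chunk size and the ring-buffer bookkeeping are assumptions, not code from the original driver):

#include <linux/io.h>
#include <linux/string.h>
#include <linux/types.h>

#define CHUNK_BYTES 256   /* granularity cited above */

static size_t read_offset;

/* 'ring' is the void __iomem * returned by ioremap(); hw_write_offset is a
 * stand-in for however the hardware reports how far it has written.
 * Wrap-around handling is omitted to keep the sketch short. */
static void drain_ring(void __iomem *ring, u8 *dst, size_t hw_write_offset)
{
    while (hw_write_offset - read_offset >= CHUNK_BYTES) {
        /* Only copy once a full chunk is available, so we never read a
         * partially written region of the uncached mapping. */
        memcpy_fromio(dst, ring + read_offset, CHUNK_BYTES);
        dst += CHUNK_BYTES;
        read_offset += CHUNK_BYTES;
    }
}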
I have tried reserving huge memory chunks with memmap as well. ioremapping such a chunk gives a mapped address space spanning well over a hundred gigabytes. For example, when you reserve 128GB of memory starting at 64GB, you see the following in /proc/vmallocinfo:
0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap
So the mapped address space starts at 0xffffc9001f3a8000 and covers the full 128GB, which is a very large range.
Secondly, your observation is correct: even memset_io results in extremely large delays (tens of minutes) to touch all of this memory.
So the time taken is mainly due to address-space conversion and loading of non-cacheable pages.

ImageMagick/PHP is falling over with large images

I have a PHP script which is used to resize images in a user's FTP folder for use on his website.
While slow to resize, the script has completed correctly for all images in the past. Recently, however, the user uploaded an album of 21-megapixel JPEG images and, as I have found, the script fails to convert the images without giving out any PHP errors. When I consulted various logs, I found multiple Apache processes being killed off with out-of-memory errors.
The functional part of the PHP script is essentially a for loop that iterates through my images on the disk and calls a method that checks if a thumbnail exists and then performs the following:
$image = new Imagick();
$image->readImage($target);
$image->thumbnailImage(1000, 0);
$image->writeImage(realpath($basedir)."/".rescale."/".$filename);
$image->clear();
$image->destroy();
The server has 512MB of RAM, with usually at least 360MB+ free.
PHP currently has its memory limit set at 96MB, but I have set it higher before without any effect on the issue.
By my estimates, a 21-Megapixel image should occupy in the region of 80MB+ when uncompressed, and so I am puzzled as to why the RAM is disappearing so rapidly unless the Image Magick objects are not being removed from memory.
Is there some way I can optimise my script to use less memory or garbage collect more efficiently?
Do I simply not have the RAM to cope with such large images?
Cheers
See this answer for a more detailed explanation.
imagick uses a shared library and its memory usage is out of reach for PHP, so tuning PHP memory and garbage collection won't help.
Try adding this prior to creating the new Imagick() object:
// pixel cache max size
IMagick::setResourceLimit(imagick::RESOURCETYPE_MEMORY, 32);
// maximum amount of memory map to allocate for the pixel cache
IMagick::setResourceLimit(imagick::RESOURCETYPE_MAP, 32);
It will cause imagick to swap to disk (defaults to /tmp) when it needs more than 32 MB for juggling images. It will be slower, but it will not run out of RAM (unless /tmp is on ramdisk, in that case you need to change where imagick writes its temp files).
MattBianco is nearly correct; the only change is that the memory limits are in bytes, so it would be 33554432 for 32MB:
// pixel cache max size
IMagick::setResourceLimit(imagick::RESOURCETYPE_MEMORY, 33554432);
// maximum amount of memory map to allocate for the pixel cache
IMagick::setResourceLimit(imagick::RESOURCETYPE_MAP, 33554432);
Call $image->setSize() before $image->readImage() to have libjpeg downscale the JPEG while loading, which reduces memory usage.
(Edit) Example usage: Efficient JPEG Image Resizing in PHP
