Unable to cause Fragmentation - contiki

I am using the latest build of Contiki.
I have just added the following lines here:
#define SICSLOWPAN_CONF_FRAG 1
#define UIP_CONF_BROADCAST 1
#define UIP_CONF_REASSEMBLY 1
#define NETSTACK_CONF_WITH_IPV6 1
#define UIP_CONF_IPV6_REASSEMBLY 1
#define UIP_CONF_UDP 1
I have done no other changes to any of the files.
I want to force fragmentation, but irrespective of whether SICSLOWPAN_CONF_FRAG is enabled, control reaches this line.
I have experimented with various values for MAX_PAYLOAD_LEN but I could not force fragmentation.
I noticed that large packets are simply not delivered; they are never fragmented.
What is the minimum MAX_PAYLOAD_LEN at which fragmentation should happen?
How do I make fragmentation happen?

I was increasing the payload size, but I was only partially filling the buffer. As soon as I filled the complete buffer with data, the issue was solved.
Credits: https://github.com/contiki-os/contiki/issues/2141

Related

Where does the increment value for sbrk come from when using newlib nano malloc?

I have a target (STM32F030R8) that I am using with FreeRTOS and the newlib reentrant heap implementation (http://www.nadler.com/embedded/newlibAndFreeRTOS.html). This shim defines sbrk in addition to implementing the actual FreeRTOS memory allocation shim. The project is built with GNU ARM GCC and --specs=nosys.specs.
For the following example, FreeRTOS has not been started yet. malloc has not been called before. The mcu is fresh off boot and has only just copied initialized data from flash into ram and configured clocks and basic peripherals.
Simply having the lib in my project, I am seeing that sbrk is being called with very large increment values for seemingly small malloc calls.
The target has 8K of memory, of which 0x12b8 (~4 KB) lies between the start of the heap and the end of RAM (the top of the stack).
I am seeing that if I allocate 1000 bytes with str = (char*) malloc(1000);, sbrk gets called twice: first with an increment value of 0x07e8, then again with an increment value of 0x0c60. The result is a desired total increment of 0x1448 (5192 bytes!), which of course overflows not just the stack but all available RAM.
What on earth is going on here? Why are these huge increment values being used by malloc for such a relatively small desired buffer allocation?
It may not be possible to answer this definitively, other than to advise on debugging. The simplest approach is to step through the allocation code to determine where and why the allocation size request is being corrupted (as it appears to be). You will of course need to build the library from source, or at least include mallocr.c in your code to override any static library implementation.
In Newlib Nano the call-stack to _sbrk_r is rather simple (compared to regular Newlib). The increment is determined from the allocation size s in nano_malloc() in nano-mallocr.c as follows:
malloc_size_t alloc_size;
alloc_size = ALIGN_TO(s, CHUNK_ALIGN); /* size of aligned data load */
alloc_size += MALLOC_PADDING; /* padding */
alloc_size += CHUNK_OFFSET; /* size of chunk head */
alloc_size = MAX(alloc_size, MALLOC_MINCHUNK);
Then if a chunk of alloc_size is not found in the free-list (which is always the case when free() has not been called), then sbrk_aligned( alloc_size ) is called which calls _sbrk_r(). None of the padding, alignment or minimum allocation macros add such a large amount.
sbrk_aligned( alloc_size ) calls _sbrk_r() a second time if the break returned by the first call is not aligned. That second call should never be larger than CHUNK_ALIGN - sizeof(void*).
In your debugger you should be able to inspect the call stack, or step through the calls, to see where the parameter becomes incorrect.

Ever-increasing RAM usage with low series cardinality

I'm just testing InfluxDB 1.3.5 for storing a small number (~30-300) of very long integer series (worst case per device: 86400 * (365*12) [sec/day * days/year * 12 years] = 378,432,000 points).
e.g. for 320 devices, the total number of points would be: 86400 * (365*12) * 320 = 121,098,240,000.
The series cardinality is low; it equals the number of devices. I'm using second-precision timestamps (that mode is enabled when I write to InfluxDB via the PHP API).
Yes, I really need to keep all the samples, so downsampling is not an option.
I'm inserting the samples as point-arrays of size 86400 per request sorted from oldest to newest. The behaviour is similar (OOM in both cases) for inmem and tsi1 indexing modes.
Despite all that, I'm not able to insert this number of points into the database without crashing it due to out-of-memory. The host VM has 8 GiB of RAM and 4 GiB of swap, which fill up completely. I cannot find anything in the documentation about this setup being problematic, nor any notice indicating that it should result in high RAM usage at all...
Does anyone have a hint on what could be wrong here?
Thanks and all the best!
b-
[ I asked the same question here but received no replies, that's the reason for crossposting: https://community.influxdata.com/t/ever-increasing-ram-usage-with-low-series-cardinality/2555 ]
I found out what the issue most likely was:
I had a bug in my feeder that caused timestamps not to be updated, so lots of points with distinct values were written over and over again to the same timestamp/tag combination.
If you experience something similar, try double-checking each step in the pipeline for a timestamp-related error.
This was not the issue, unfortunately; the RAM usage rises nevertheless when importing more points than before.

Filebench Error: out of shared memory when I try to set $nfiles a very big number (about 1000000)

When I use Filebench to test my file system, the process crashes when I try to set $nfiles to a very big number (about 1000000), and the workaround on the official site does not work!
Here is the suggested workaround from the official web site:
Second warning informs that Filebench was not able to increase shared memory region size. You can either:
* Run Filebench as root
* Increase shared memory region size to 256MB (sudo sh -c 'echo 268435456 > /proc/sys/kernel/shmmax') and ignore this warning
The shared memory region size is based on the size of the filebench_shm_t struct. If you want to run with more files, you will need to modify the struct. You can do this by modifying ipc.h before compiling. Since you want to set $nfiles to a large number, I suggest changing the line:
#define FILEBENCH_NFILESETENTRIES (1024 * 1024)
to
#define FILEBENCH_NFILESETENTRIES (1024 * 1024 * 10)
Recompile and retest. On my RHEL6 machine, the shared memory region went from 170MB to close to 1.5GB.
HTH,
Scott

Using inactive memory to my advantage. What is this code storing in RAM or inactive memory?

I'm developing on OS X 10.8.3. The following code is simple. It can perform two operations. If the read call is uncommented, the program opens the file at "address" and transfers all of its contents into data. If instead the memcpy call is uncommented, the program copies the mmapped contents into data. I am developing on a Mac, which caches commonly used files in the inactive memory of RAM for faster future access. I have turned off caching in both the file control and mmap because I am working with large files of 1 GB or greater. If I did not set the NOCACHE option, the entire 1 GB would be stored in inactive memory.
If the read call is uncommented, the program behaves as expected. Nothing is cached, and every time the program is run it takes about 20 seconds to read the entire 1 GB.
But if instead the memcpy call is uncommented, something changes. I still see no increase in memory, and it still takes 20 seconds to copy on the first run. But every execution after the first copies in under a second. This is very analogous to caching the entire file in inactive memory, but I never see an increase in memory. Even if I then do not mmap the file and only perform a read, it completes in the same time, under a second.
Something must be getting stored in inactive memory, but what and how do I track it? I would like to find what is being stored and use it to my advantage.
I am using activity monitor to see a general memory size. I am using Xcode Instruments to compare the initial memcpy execution to an execution where both read and memcpy are commented. I see no difference in the Allocations, File Activity, Reads/Writes, VM Tracker, or Shared Memory tools.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/mman.h>

int main(int argc, const char * argv[])
{
    unsigned char *data;
    unsigned char *mmapdata;
    size_t length;

    int file = open("address", O_RDONLY);
    fcntl(file, F_NOCACHE, 1);

    struct stat st;
    stat("address", &st);
    length = st.st_size;

    data = malloc(length);
    memset(data, 0, length);

    mmapdata = mmap(NULL, length, PROT_READ, MAP_SHARED | MAP_NOCACHE, file, 0);
    if (mmapdata == MAP_FAILED)
        fprintf(stderr, "failure");

    // read(file, data, length);
    close(file);
    // memcpy(data, mmapdata, length);

    munmap(mmapdata, length);
    free(data);
    return 0;
}
UPDATE:
Sorry if I was unclear. During program execution, the active memory portion of my RAM increases according to the data I malloc and the size of the mmapped file. This is surely where the pages are residing. After cleanup, the amount of available memory returns to what it was before; inactive memory is never increased. It makes sense that the OS wouldn't really throw that active memory away, since free memory is useless, but this process is not identical to caching, for the following reason. I've tested two scenarios. In both I load a number of files whose sizes total more than my available RAM. In one scenario I cache the files, and in the other I do not. With caching, my inactive memory increases, and once I fill my RAM everything slows down tremendously. Loading a new file will replace another file's allocated inactive memory, but this process takes exceptionally longer than in the next scenario. The next scenario is with caching off. I again run the program several times, loading enough files to fill my RAM, but inactive memory never increases and active memory always returns to normal, so it appears I've done nothing. The files I've mmapped still load fast, just as before, and mmapping new files completes in a normal amount of time, replacing other files. My system never slows down with this method. Why is the second scenario faster?
How could the OS possibly make a memcpy on an mmapped file work if the file's pages weren't resident in memory? The OS takes your hint that you don't want the data cached, but it still will cache it if it has no choice, or if it has nothing better to do with the memory.
Your pages have the lowest priority, because the OS believes you won't access them again. But they had to be resident for the memcpy to work, and the OS won't throw them away just to have free memory (which is 100% useless). Inactive memory is better than free memory because there's at least some chance it might save I/O operations.

Memory access after ioremap very slow

I'm working on a Linux kernel driver that makes a chunk of physical memory available to user space. I have a working version of the driver, but it's currently very slow. So, I've gone back a few steps and tried making a small, simple driver to recreate the problem.
I reserve the memory at boot time using the kernel parameter memmap=2G$1G. Then, in the driver's __init function, I ioremap some of this memory, and initialize it to a known value. I put in some code to measure the timing as well:
#define RESERVED_REGION_SIZE   (1 * 1024 * 1024 * 1024)  // 1GB
#define RESERVED_REGION_OFFSET (1 * 1024 * 1024 * 1024)  // 1GB

static int __init memdrv_init(void)
{
    struct timeval t1, t2;

    printk(KERN_INFO "[memdriver] init\n");

    // Remap reserved physical memory (that we grabbed at boot time)
    do_gettimeofday( &t1 );
    reservedBlock = ioremap( RESERVED_REGION_OFFSET, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] ioremap() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Set the memory to a known value
    do_gettimeofday( &t1 );
    memset( reservedBlock, 0xAB, RESERVED_REGION_SIZE );
    do_gettimeofday( &t2 );
    printk( KERN_ERR "[memdriver] memset() took %d usec\n", usec_diff( &t2, &t1 ) );

    // Register the character device
    ...

    return 0;
}
I load the driver, and check dmesg. It reports:
[memdriver] init
[memdriver] ioremap() took 76268 usec
[memdriver] memset() took 12622779 usec
That's 12.6 seconds for the memset. That means the memset is running at 81 MB/sec. Why on earth is it so slow?
This is kernel 2.6.34 on Fedora 13, and it's an x86_64 system.
EDIT:
The goal behind this scheme is to take a chunk of physical memory and make it available to both a PCI device (via the memory's bus/physical address) and a user space application (via a call to mmap, supported by the driver). The PCI device will then continually fill this memory with data, and the user-space app will read it out. If ioremap is a bad way to do this (as Ben suggested below), I'm open to other suggestions that'll allow me to get any large chunk of memory that can be directly accessed by both hardware and software. I can probably make do with a smaller buffer also.
See my eventual solution below.
ioremap allocates uncacheable pages, as you'd desire for access to a memory-mapped-io device. That would explain your poor performance.
You probably want kmalloc or vmalloc. The usual reference materials will explain the capabilities of each.
I don't think ioremap() is what you want there. You should only access the result (what you call reservedBlock) with readb, readl, writeb, memcpy_toio etc. It is not even guaranteed that the return is virtually mapped (although it apparently is on your platform). I'd guess that the region is being mapped uncached (suitable for IO registers) leading to the terrible performance.
It's been a while, but I'm updating since I did eventually find a workaround for this ioremap problem.
Since we had custom hardware writing directly to the memory, it was probably more correct to mark it uncacheable, but it was unbearably slow and wasn't working for our application. Our solution was to only read from that memory (a ring buffer) once there was enough new data to fill a whole cache line on our architecture (I think that was 256 bytes). This guaranteed we never got stale data, and it was plenty fast.
I have tried out doing a huge memory chunk reservation with memmap.
The ioremapping of this chunk gave me a mapped virtual address space beyond a few terabytes.
When you ask to reserve 128GB of memory starting at 64GB, you see the following in /proc/vmallocinfo:
0xffffc9001f3a8000-0xffffc9201f3a9000 137438957568 0xffffffffa00831c9 phys=1000000000 ioremap
Thus the address space starts at 0xffffc9001f3a8000 (which is way up in the vmalloc region).
Secondly, your observation is correct: even memset_io results in extremely large delays (tens of minutes) to touch all this memory.
So the time taken has to do mainly with address-space conversion and uncacheable page loading.
