I am working on an embedded system with limited memory. I want a way to estimate, by analyzing an ELF file, how much memory it will use while running.
I would like the result to be close to VmRSS, which I can get with cat /proc/<pid>/status. Memory usage changes from moment to moment while the program runs, so a close estimate or a lower bound is also useful.
Assume there is no dynamic memory (e.g. via malloc) and no mapped memory (via mmap).
Simplifying assumptions:
you don't use shared libraries
you don't run multiple instances of your ELF binary
you don't use swap
your binary accesses all of its code and data
there is no significant malloc or mmap usage
With the above assumptions, you can look at readelf -Wl a.out | grep LOAD and simply add together the PT_LOAD segment sizes (the MemSiz column) for an upper bound on RSS.
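If you want the same number programmatically instead of reading it off readelf, you can sum the MemSiz (p_memsz) fields yourself. Here is a minimal sketch, assuming a Linux toolchain (<elf.h>) and a 64-bit, native-endian ELF, with no validation beyond the basics; the program is hypothetical:

#include <elf.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <elf-file>\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1) { perror("fread"); return 1; }

    unsigned long long total = 0;
    for (int i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        fseek(f, eh.e_phoff + (long)i * eh.e_phentsize, SEEK_SET);
        if (fread(&ph, sizeof ph, 1, f) != 1) { perror("fread"); return 1; }
        if (ph.p_type == PT_LOAD)      /* only loadable segments count toward RSS */
            total += ph.p_memsz;       /* MemSiz, not FileSiz: it includes .bss   */
    }
    fclose(f);

    printf("PT_LOAD total: %llu bytes\n", total);
    return 0;
}

The total it prints should match what you get by adding up the MemSiz column of readelf -Wl by hand.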
If you do use shared libraries, you'll need to add their PT_LOAD segments as well. But if they are used by more than one binary, then total system memory consumed will be less than the total of RSS for each process. Same goes for violating assumption 2.
Violating assumptions 3 and 4 will reduce observed RSS, while violating 5 will increase it.
Related
I am attempting to generate a large workbook-based report with 3 supporting worksheets of 100, 12000 and 12000 rows, plus a final output sheet, all formula based, that ends up representing about 120 entities at 100 rows apiece. I generate a template range, then copy and paste it for each entity, replacing the entity ID cell after pasting each new range. It is working fine, but I noticed that memory usage in the IIS Express process is approx. 500 MB and it is using 100% of the processor as well.
Are there any guidelines for generating workbooks in this manner?
At least in terms of memory utilization, it would help to have some comparison, maybe against Excel, in how much memory is utilized to simply have the resultant workbook opened. For instance, if you were to open the final report in both Excel and the "SpreadsheetGear 2012 for Windows" application (available in the SpreadsheetGear folder under the Start menu), what does the Task Manager measure for each of these applications in terms of memory consumption? This may provide some insight as to whether the memory utilization you are seeing in the actual report-building process is unusually high (is there a lot of extra overhead for your routine?), or just typical given the size of the workbook you are generating.
In terms of CPU utilization, this one is a bit more difficult to pinpoint and is certainly dependent on your hardware as well as implementation details in your code. Running a VS Profiler against your routine would certainly be interesting, if you have this tool available to you. Generally speaking, the CPU time could potentially be broken up into a couple of broad categories: CPU cycles used to "build" your workbook and CPU cycles to "calculate" it. It could be helpful to determine which of these is dominating the CPU. One way to do this might be, if possible, to ensure that calculations don't occur until you are finished actually generating the workbook. In fact, avoiding any unnecessary calculations could potentially speed things up; it depends on the workbook, though. You could avoid calculations by setting IWorkbookSet.Calculation to Manual mode and not calling any of the IWorkbook's "Calculate" methods (Calculate/CalculateFull/CalculateFullRebuild) until you are finished with this process. If you don't have access to a profiler tool, maybe set some timers and Console.WriteLines and monitor the Task Manager to see how your CPU fluctuates during different parts of your routine. With any luck you might be able to better isolate which part of the routine is taking the most time.
I am developing for the Arduino Due, which has 96 KB of SRAM and 512 KB of flash memory for code. If I have a program that will compile to, say, 50 KB, how much SRAM will I use when I run the code? Will I use 50 KB immediately, or only the memory used by the functions I call? Is there a way to measure this memory usage before I upload the sketch to the Arduino?
You can run
arm-none-eabi-size bin.elf
Where:
bin.elf is the generated binary (look it up in the compile log)
arm-none-eabi-size is a tool included with Arduino for ARM which tells you the memory distribution of your binary. This program can be found inside the Arduino directory. On my Mac, this is /Applications/Arduino.app/Contents/Resources/Java/hardware/tools/g++_arm_none_eabi/bin
This command will output:
text data bss dec hex filename
9648 0 1188 10836 2a54 /var/folders/jz/ylfb9j0s76xb57xrkb605djm0000gn/T/build2004175178561973401.tmp/sketch_oct24a.cpp.elf
data + bss is RAM, text is program memory.
Very important: this doesn't account for memory allocated at run time (on the stack or the heap); it is only the RAM needed for static and global variables. There are other techniques to check RAM usage while the program runs, like the free-RAM check sketched below, but they depend on the capabilities of the compiler/linker suite you are using.
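For example, on ARM boards like the Due, a common rough check at run time is to compare the address of a local variable (near the top of the stack) with the end of the heap returned by sbrk(0). This is only an approximation, and it assumes the C library's sbrk() is available, which it is with the stock Due toolchain:

// Rough estimate of free SRAM: the gap between the stack and the end of the heap.
extern "C" char *sbrk(int incr);

int freeRam() {
    char top;                 // a local variable lives near the current top of the stack
    return &top - sbrk(0);    // bytes between the end of the heap and the stack
}

void setup() {
    Serial.begin(9600);
}

void loop() {
    Serial.print("Free RAM (approx. bytes): ");
    Serial.println(freeRam());
    delay(1000);
}

Print freeRam() at interesting points in your sketch and watch how the number changes; it ignores fragmentation, but it is usually enough to see when you are about to run out.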
Your whole program is loaded onto the Arduino, so at least 50 KB of flash memory will be used. Then, as the code runs, you will allocate some variables, some on the stack and some global, which will take up memory too, but in SRAM.
I am not sure there is a way to measure the required memory exactly, but you can get a rough estimate based on the number and types of variables allocated in the code. Remember, the global variables take up space for the entire time the code is running on the Arduino, while the local variables (the ones declared within a pair of {..}) remain in memory only until the closing '}' brace, i.e. for the scope of the variables. Also remember, the compiled 50 KB of code you are mentioning is just the code portion; it does not include your variables, not even the global ones. The code is stored in flash memory and the variables are stored in SRAM. The variables start taking memory only during runtime. The small sketch below shows the difference.
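As a minimal illustration (the variable names are made up): the global buffer below is counted in the bss column of arm-none-eabi-size and is reserved in SRAM for the whole run, while the local buffer only occupies stack space while loop() is executing and never appears in that static report:

// Global: shows up in "bss", lives in SRAM for the entire run.
static unsigned char history[512];

void setup() {
}

void loop() {
    // Local: exists on the stack only between this '{' and the matching '}',
    // so it is invisible to arm-none-eabi-size.
    unsigned char scratch[64];
    scratch[0] = history[0];    // touch both so the compiler keeps them
    history[1] = scratch[0];
}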
Also, I am curious to know how you calculated that your code uses 50 KB of memory?
Here is a little library to output the available RAM memory.
I used it a lot when my program was crashing with no bug in the code. It turned out that I was running out of RAM.
So it's very handy!
Available Memory Library
Hope it helps! :)
If I have 2 files each with this:
"Hello World" (x 1000)
Does that take up more space than 1 file with this:
"Hello World" (x 2000)
What are the drawbacks of dividing content into multiple smaller files (assuming there's reason to divide them into more files, not like this example)?
Update:
I'm using a Macbook Pro, 10.5. But I'd also like to know for Ubuntu Linux.
Marcelo's answer gives the general performance case. I'd argue worrying about this is premature optimization; you should split things into different files where it is logical to split them.
Also, if you really care about the size of such repetitive files, you can compress them.
Your example even hints at this: a simple run-length encoding of
"Hello World" x 1000
is much more space efficient than actually having "Hello World" written out 1000 times.
Files take up space in the form of clusters on the disk. A cluster is a number of sectors, and the size depends on how the disk was formatted.
A typical size for clusters is 8 kilobytes. That would mean that the two smaller files would use two clusters (16 kilobytes) each and the larger file would use three clusters (24 kilobytes).
On average, a file will use half a cluster more than its size. So with a cluster size of 8 kilobytes, each file will on average have an overhead of 4 kilobytes.
Most filesystems use a fixed-size cluster (4 kB is typical but not universal) for storing files. Files below this cluster size will all take up the same minimum amount.
Even above this size, the proportional wastage tends to be high when you have lots of small files. Ignoring skewness of size distribution (which makes things worse), the overall wastage is about half the cluster size times the number of files, so the fewer files you have for a given amount of data, the more efficiently you will store things.
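If you want to see that overhead on a real filesystem, the logical size of a file and the space actually allocated for it can both be read with stat(). A minimal sketch, assuming Linux or macOS where st_blocks is counted in 512-byte units:

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc != 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }

    struct stat st;
    if (stat(argv[1], &st) != 0) { perror("stat"); return 1; }

    long long logical   = (long long)st.st_size;
    long long allocated = (long long)st.st_blocks * 512;  // st_blocks is in 512-byte units here

    printf("logical:   %lld bytes\n", logical);
    printf("allocated: %lld bytes\n", allocated);
    printf("overhead:  %lld bytes\n", allocated - logical);
    return 0;
}

Summing the overhead over a directory full of small files gives roughly the half-a-cluster-per-file waste described above.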
Another consideration is that metadata operations, especially file deletion, can be very expensive, so again smaller files aren't your friends. Some interesting work was done in ReiserFS on this front until the author was jailed for murdering his wife (I don't know the current state of that project).
If you have the option, you can also tune the file sizes to always fill up a whole number of clusters, and then small files won't be a problem. This is usually too finicky to be worth it though, and there are other costs. For high-volume throughput, the optimal file size these days is between 64 MB and 256 MB (I think).
Practical advice: Stick your stuff in a database unless there are good reasons not to. SQLite substantially reduces the number of reasons.
I think how the file(s) will be used should also be taken into consideration, along with the API and the language used to read/write them (and hence any API restrictions).
Disk fragmentation, which tends to decrease when you only have big files, will penalize data access if you're reading one big file in one shot, whereas several accesses to small files spread out over time will not be penalized as much by fragmentation.
Most filesystems allocate space in units larger than a byte (typically 4KB nowadays). Effective file sizes get "rounded up" to the next multiple of that "cluster size". Therefore, dividing up a file will almost always consume more total space. And of course there's one extra entry in the directory, which may cause it to consume more space, and many file systems have an extra intermediate layer of inodes where each file consumes one entry.
What are the drawbacks of dividing content into multiple smaller files (assuming there's reason to divide them into more files, not like this example)?
More wasted space
The possibility of running out of inodes (in extreme cases)
On some filesystems: very bad performance when directories contain many files (because they're effectively unordered lists)
Content in a single file can usually be read sequentially (i.e. without having to move the read/write head) from the HD, which is the most efficient way. When it spans multiple files, this ideal case becomes much less likely.
We are loading a large flat file into BizTalk Server 2006 (Original release, not R2) - about 125 MB. We run a map against it and then take each row and make a call out to a stored procedure.
We receive an OutOfMemoryException during orchestration processing; the Windows service restarts, uses the full 2 GB of memory, and crashes again.
The server is 32-bit and set to use the /3GB switch.
Also I've separated the flow into 3 hosts - one for receive, the other for orchestration, and the third for sends.
Anyone have any suggestions for getting this file to process without error?
Thanks,
Krip
If this is a flat file being sent through a map you are converting it to XML right? The increase in size could be huge. XML can easily add a factor of 5-10 times over a flat file. Especially if you use descriptive or long xml tag names (which normally you would).
Something simple you could try is to rename the xml nodes to shorter names, depending on the number of records (sounds like a lot) it might actually have a pretty significant impact on your memory footprint.
Perhaps a more enterprise approach, would be to subdivide this in a custom pipeline into separate message packets that can be fed through the system in more manageable chunks (similar to what Chris suggests). Then the system throttling and memory metrics could take over. Without knowing more about your data it would be hard to say how to best do this, but with a 125 MB file I am guessing that you probably have a ton of repeating rows that do not need to be processed sequentially.
Where does it crash? Does it make it past the Transform shape? Another suggestion to try is to run the transform in the Receive Port. For more efficient processing, you could even debatch the message and have multiple simultaneous orchestration instances calling the stored procs. This would definitely reduce the memory profile and increase performance.
I attended an interview at Samsung. They asked a lot of questions about the memory layout of a program. I barely know anything about this.
I googled "memory layout of an executable program" and "memory layout of a process".
I'm surprised to see that there isn't much information on these topics; most of the results are forum questions. I just wonder why?
These are the few links I found:
Run-Time Storage Organization
Run-Time Memory Organization
Memory layout of C process (PDF)
I want to learn this from a proper book instead of some web links (Randy Hyde's is also a book, but I mean some other book). In which book can I find clear and more complete information on this subject?
I also wonder why operating systems books don't cover this. I read Stallings, 6th edition; it just discusses the Process Control Block.
This entire layout is created by the linker, right? Where can I read more about this process? I want COMPLETE information, from a program on the disk to its execution on the processor.
EDIT:
Initially, I was not clear even after reading the answers given below. Recently, I came across these articles after reading them, I understood things clearly.
Resources that helped me in understanding:
www.tenouk.com/Bufferoverflowc/Bufferoverflow1b.html
5 part PE file format tutorial: http://win32assembly.online.fr/tutorials.html
Excellent article : http://www.linuxforums.org/articles/understanding-elf-using-readelf-and-objdump_125.html
PE Explorer: http://www.heaventools.com/
Yes, "layout of an executable program(PE/ELF)" != "Memory layout of process"). Findout for yourself in the 3rd link. :)
After clearing my concepts, my questions are making me look so stupid. :)
How things are loaded depends very strongly on the OS and on the binary format used, and the details can get nasty. There are standards for how binary files are laid out, but it's really up to the OS how a process's memory is laid out. This is probably why the documentation is hard to find.
To answer your questions:
Books:
If you're interested in how processes lay out their memory, look at Understanding the Linux Kernel. Chapter 3 talks about process descriptors, creating processes, and destroying processes.
The only book I know of that covers linking and loading in any detail is Linkers and Loaders by John Levine. There's an online and a print version, so check that out.
Executable code is created by the compiler and the linker, but it's the linker that puts things in the binary format the OS needs. On Linux, this format is typically ELF; on Windows it's PE (a COFF derivative); older Unixes used COFF or a.out; and on Mac OS X it's Mach-O. This isn't a fixed list, though. Some OSes can and do support multiple binary formats. Linkers need to know the output format to create executable files.
The process's memory layout is pretty similar to the binary format, because a lot of binary formats are designed to be mmap'd so that the loader's task is easier.
It's not quite that simple, though. Some parts of the binary format (like zero-initialized data, the .bss section) are not stored directly in the binary file. Instead, the binary just records the size of these sections. When the process is loaded into memory, the loader knows to allocate the right amount of memory, so the binary file doesn't need to contain large empty sections.
Also, the process's memory layout includes some space for the stack and the heap, where a process's call frames and dynamically allocated memory go. These generally live at opposite ends of a large address space.
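One way to see that layout for yourself on Linux is to print the addresses of a few symbols and allocations. In a rough sketch like the one below, the code usually sits lowest, .data and .bss above it, the heap above those, and the stack much higher; exact addresses vary with ASLR and the platform:

#include <stdio.h>
#include <stdlib.h>

int initialized = 42;   // .data: stored in the binary
int uninitialized;      // .bss:  only its size is stored

int main(void)
{
    int   on_stack = 0;              // stack
    void *on_heap  = malloc(16);     // heap

    printf("code  (main)         : %p\n", (void *)&main);
    printf(".data (initialized)  : %p\n", (void *)&initialized);
    printf(".bss  (uninitialized): %p\n", (void *)&uninitialized);
    printf("heap  (malloc)       : %p\n", on_heap);
    printf("stack (local)        : %p\n", (void *)&on_stack);

    free(on_heap);
    return 0;
}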
This really just scratches the surface of how binaries get loaded, and it doesn't cover anything about dynamic libraries. For a really detailed treatment of how dynamic linking and loading work, read How to Write Shared Libraries.
Here is one way a program can be executed from a file (*nix).
The process is created (e.g. fork()). This gives the new process its own memory map. This includes a stack in some area of memory (usually high up in memory somewhere).
The new process calls exec() to replace the current executable (often a shell) with the new executable. Often, the new executable's .text (executable code and constants) and .data (read/write initialized variables) are set up for demand paging, that is, they are mapped into the process memory space as needed. Often, the .text section comes first, followed by .data. The .bss section (uninitialized variables) is usually allocated after the .data section; many times it is mapped so that a page of zeros is returned when the page containing a bss variable is first accessed. The heap often starts at the next page boundary after the .bss section. The heap then grows up in memory while the stack grows down (remember I said usually; there are exceptions!).
If the heap and stack collide, that often causes an out of memory situation, which is why the stack is often placed in high memory.
In a system without a memory management unit, demand paging is usually unavailable but the same memory layout is often used.
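A minimal sketch of those first two steps (/bin/ls is just an example program to run): the parent forks, and the child replaces its own image with a new executable via exec, at which point the kernel builds the fresh .text/.data/.bss/heap/stack layout described above:

#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                 // 1. create a new process with its own memory map
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {
        // 2. child: replace the current image with a new executable.
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        perror("execl");                // only reached if exec failed
        return 1;
    }

    int status;
    waitpid(pid, &status, 0);           // parent: wait for the child to finish
    return 0;
}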
Art of assembly programming http://homepage.mac.com/randyhyde/webster.cs.ucr.edu/www.artofasm.com/Windows/PDFs/MemoryAccessandOrg.pdf