Completely restore a binary from memory?

I want to know if it's possible to completely restore a running process's binary from memory.
This is what I've tried:
First read /proc/PID/maps, then dump all relevant sections with gdb (ignoring all libraries):
grep sleep /proc/1524/maps | awk -F '[- ]' \
'{print "dump memory sleep." $1 " 0x" $1 " 0x" $2 }' \
| gdb -p 1524
Then I concatenate all dumps in order:
cat sleep.* > sleep-bin
But the resulting file is very different from /bin/sleep.
The differences seem to be the relocation table and other uninitialized data, so is it impossible to fix a memory dump (i.e. make it runnable)?

Disclaimer: I'm a Windows guy and don't know much about Linux process internals or the ELF format, but I hope I can help!
I would say it's definitely possible to do, but not for ALL programs. The OS loader only loads the parts of the executable that sit at well-defined places in the file. For example, some uninstallers store data appended to the end of the executable file; that data is never loaded into memory, so it is information you cannot restore just by dumping memory.
Another problem is that the image written by the OS is free to be modified by anything on the system that has the rights to do so. No normal program would do something like that, though.
The starting point would be to find the ELF header of your executable module in memory and dump it. It contains pretty much all the data you need for your task, for example:
the number of sections and where they are in memory and in the file
how sections in the file are mapped to sections in virtual memory (they usually have different base addresses and sizes!)
where the relocation data is
For the relocs you would have to read up on how relocation data is stored and processed in the ELF format. Once you know that, it should be pretty easy to undo the changes in your dump.
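As a first sanity check, here is a rough sketch (it assumes the lowest mapped segment was dumped to a file such as sleep.00400000 by the commands above; the exact file name depends on the addresses in your maps output):
$ xxd sleep.00400000 | head -n 1
If the dump starts with the bytes 7f 45 4c 46 ("\x7fELF"), the ELF header is right at the beginning, and readelf should be able to parse the ELF and program headers straight out of the dump:
$ readelf -h -l sleep.00400000
The p_offset, p_vaddr and p_filesz fields of each program header describe how ranges of the file were mapped into memory, which is exactly what you need to lay the dumped segments back out in file order.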

Related

Find size contributed by each external library on iOS

I'm trying to reduce my App Store binary size, and we have lots of external libs that might be contributing to the size of the final ipa. Is there any way to find out how much each external static lib takes up in the final binary (other than removing them one by one)?
All of this information is contained in the link map, if you have the patience for sifting through it (for large apps, it can be quite large). The link map has a listing of all the libraries, their object files, and all symbols that were packaged into your app, all in human-readable text. Normally, projects aren't configured to generate them by default, so you'll have to make a quick project file change.
From within Xcode:
Under 'Build Settings' for your target, search for "map"
In the results below, under the 'Linking' section, set 'Write Link Map File' to "Yes"
Make sure to make note of the full path and file name listed under 'Path to Link Map File'
The next time you build your app you'll get a link map dumped to that file path. Note that the path is relative to your app's location in the DerivedData folder (usually ~/Library/Developer/Xcode/DerivedData/<your-app-name>-<random-string-of-letters-and-numbers>/Build/Intermediates/..., but YMMV). Since it's just a text file, you can read it with any text editor.
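If you lose track of that path later, a quick search should turn up any link maps that have been generated (a sketch; it assumes the default link map naming, which includes "LinkMap" in the file name):
$ find ~/Library/Developer/Xcode/DerivedData -name "*LinkMap*.txt"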
The contents of the link map are divided into 3 sections, of which 2 will be relevant to what you're looking for:
Object Files: this section contains a listing of all of the object files included in your final app, including your own code and that of any third-party libraries you've included. Importantly, each object file also lists the library where it came from;
Sections: this section, not relevant to your question, contains a list of the processor segments and their sections;
Symbols: this section contains the raw data that you're interested in: a list of all symbols/methods with their absolute location (i.e. address in the processor's memory map), size, and most important of all, a cross-reference to their containing object module (under the 'File' column).
From this raw data, you have everything you need to do the required size calculation. From #1, you see that, for every library, there are N possible constituent object modules; from #2, you see that, for every object module, there are M possible symbols, each occupying size S. For any given library, then, your rough order of size will be something like O(N * M * S). That's only to give you an indication of the components that would go into your actual calculations, it's not any sort of a useful formula. To perform the calculation itself, I'm sorry to say that I'm not aware of any existing tools that will do the requisite processing for you, but given that the link map is just a text file, with a little script magic and ingenuity you can construct a script to do the heavy lifting.
For example, I have a little sample project that links to the following library: https://github.com/ColinEberhardt/LinqToObjectiveC (the sample project itself is from a nice tutorial on ReactiveCocoa, here: http://www.raywenderlich.com/62699/reactivecocoa-tutorial-pt1), and I want to know how much space it occupies. I've generated a link map, TwitterInstant-LinkMap-normal-x86_64.txt (it runs in the simulator). In order to find all object modules included by the library, I do this:
$ grep -i "libLinqToObjectiveC.a" TwitterInstant-LinkMap-normal-x86_64.txt
which gives me this:
[ 8] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(LinqToObjectiveC-dummy.o)
[ 9] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(NSArray+LinqExtensions.o)
[ 10] /Users/XXX/Library/Developer/Xcode/DerivedData/TwitterInstant-ecppmzhbawtxkwctokwryodvgkur/Build/Products/Debug-iphonesimulator/libLinqToObjectiveC.a(NSDictionary+LinqExtensions.o)
The first column contains the cross-references to the symbol table that I need, so I can search for those:
$ cat TwitterInstant-LinkMap-normal-x86_64.txt | grep -e "\[ 8\]"
which gives me:
0x100087161 0x0000001B [ 8] literal string: PodsDummy_LinqToObjectiveC
0x1000920B8 0x00000008 [ 8] anon
0x100093658 0x00000048 [ 8] l_OBJC_METACLASS_RO_$_PodsDummy_LinqToObjectiveC
0x1000936A0 0x00000048 [ 8] l_OBJC_CLASS_RO_$_PodsDummy_LinqToObjectiveC
0x10009F0A8 0x00000028 [ 8] _OBJC_METACLASS_$_PodsDummy_LinqToObjectiveC
0x10009F0D0 0x00000028 [ 8] _OBJC_CLASS_$_PodsDummy_LinqToObjectiveC
The second column contains the size of the symbol in question (in hexadecimal), so if I add them all up, I get 0x103, or 259 bytes.
Even better, I can do a bit of stream hacking to whittle it down to the essential elements and do the addition for me:
$ cat TwitterInstant-LinkMap-normal-x86_64.txt | grep -e "\[ 8\]" | grep -e "0x" | awk '{print $2}' | xargs printf "%d\n" | paste -sd+ - | bc
which gives me the number straight up:
259
Doing the same for "\[ 9\]" (13016 bytes) and "\[ 10\]" (5503 bytes), and adding them to the previous 259 bytes, gives me 18778 bytes.
You can certainly improve upon the stream hacking I've done here to make it a bit more robust (in this implementation, you have to make sure you get the exact number of spaces right and quote the brackets), but you at least get the idea.
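If you want to automate the whole thing for one library, here is a rough sketch along the same lines (it assumes the link map's object-file and symbol lines look like the excerpts above, and allows for variable spacing inside the [ ] cross-references):
$ MAP=TwitterInstant-LinkMap-normal-x86_64.txt
$ LIB=libLinqToObjectiveC.a
$ for i in $(grep "$LIB" "$MAP" | sed -n 's/^\[ *\([0-9]*\)\].*/\1/p'); do grep "^0x" "$MAP" | grep "\[ *$i\]" | awk '{print $2}'; done | xargs printf "%d\n" | paste -sd+ - | bc
This pulls out every object-file index belonging to the library, collects the sizes of the symbols cross-referenced to those indices, and sums them, so it should print the same 18778-byte total computed by hand above.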
Make a .ipa file of your app and save it on your system.
Then open the terminal and execute the following command:
unzip -lv /path/to/your/app.ipa
It will return a table of data about your .ipa file. The size column has the compressed size of each file within your .ipa file.
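To see the biggest contributors at a glance, something like this should work (a sketch; it assumes the compressed size is the third column of unzip -lv output, and the header and totals lines will show up in the listing as well):
$ unzip -lv /path/to/your/app.ipa | sort -k3 -n | tail -n 20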
I think you should be able to extract the information you need from this:
symbols -w -noSources YourFileHere
Ref: https://devforums.apple.com/message/926442#926442
IIRC, it isn't going to give you clear summary information on each lib, but you should find that the functions from each library are clustered together, so with a bit of effort you can calculate the approximate contribution from each one.
Also make sure that you set Generate Debug Symbols to NO in your build settings. This can reduce the size of your static library by about 30%.
In case it's part of your concern, a static library is just the relevant .o files archived together plus some bookkeeping. So a 1.7MB static library (even if the code within it is the entire 1.7MB) won't usually add 1.7MB to your product. The usual rules about dead code stripping will apply.
Beyond that you can reduce the built size of your code. The following probably isn't a comprehensive list.
In your target's build settings look for 'Optimization Level'. By switching that to 'Fastest, Smallest -Os' you'll permit the compiler to sacrifice some speed for size.
Make sure you're building for Thumb, the more compact ARM instruction set. Assuming you're using LLVM, that means making sure you don't have -mno-thumb anywhere in your project settings.
Also consider which architectures you want to build for. Apple doesn't allow submission of an app that supports both ARMv6 and the iPhone 5 screen, and has dropped ARMv6 support entirely from the latest Xcode, so there's probably no point including it at this point.
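Relatedly, if you ship fat static libraries, lipo can show which architecture slices a library contains and how big each slice is; only the slice matching your build architecture actually gets linked into the app, but it gives a feel for how much code is in there. A quick check (the library name is just an illustration):
$ lipo -detailed_info libLinqToObjectiveC.a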

eheap allocated in erlang

I use recon_alloc:memory(allocated_types) and get the info below.
34> recon_alloc:memory(allocated_types).
[{binary_alloc,1546650440},
{driver_alloc,21504840},
{eheap_alloc,28704768840},
{ets_alloc,526938952},
{fix_alloc,145359688},
{ll_alloc,403701800},
{sl_alloc,688968},
{std_alloc,67633992},
{temp_alloc,21504840}]
The eheap_alloc is using 28G. But summing up the heap_size of all processes:
>lists:sum([begin {_, X}=process_info(P, heap_size), X end || P <- processes()]).
683197586
Only 683M! Any idea where the 28G is?
You are not comparing the right values. From the documentation of erlang:process_info:
{heap_size, Size}
Size is the size in words of the youngest heap generation of the process. This generation currently includes the stack of the process. This information is highly implementation dependent, and may change if the implementation changes.
recon_alloc:memory(allocated_types) is in bytes by default; you can change the unit with set_unit. It is not the memory that is currently used but the memory reserved by the VM, grouped into the different allocators. Note also that heap_size is in words, so on a 64-bit VM (8 bytes per word) your 683,197,586 words are roughly 5.5 GB, not 683 MB. You can use recon_alloc:memory(used) instead. More details in allocator() - Recon Library.
Searching through the Erlang source code for the eheap_alloc keyword, I didn't come up with much. The most relevant piece was this bit of XML from erts_alloc.xml (https://github.com/erlang/otp/blob/172e812c491680fbb175f56f7604d4098cdc9de4/erts/doc/src/erts_alloc.xml#L46):
<tag><c>eheap_alloc</c></tag>
<item>Allocator used for Erlang heap data, such as Erlang process heaps.</item>
This says that process heaps are stored in eheap_alloc, but it doesn't say what else is stored there. The eheap_alloc area holds everything your application needs to run, plus some additional space, so the VM doesn't have to request more memory from the OS every time something needs to be added. There are also things the VM must keep in memory that aren't associated with a specific process. For example, large binaries, even though they may be used within a process, are not stored inside that process's heap; they are stored in a shared binary heap called binary_alloc. The binary heap, along with the process heaps and some extra memory, is what makes up eheap_alloc.
In your case it looks like you have a lot of memory in your binary_alloc. binary_alloc is probably using a significant portion of your eheap_alloc.
For more details on binary handling checkout these pages:
http://blog.bugsense.com/post/74179424069/erlang-binary-garbage-collection-a-love-hate
http://www.erlang.org/doc/efficiency_guide/binaryhandling.html#id65224

Does awk store an output file in RAM?

I was doing some simple parsing like awk '{print $3 > "file1.txt"}'
What I noticed was that awk was taking up too much RAM (the files were huge). Does streaming awk output to a file consume memory? Does this work like a streamed write, or does the file remain open until the program terminates?
The exact command that I gave was:
for i in ../../*.txt; do j=${i#*/}; mawk -v f=${j%.txt} '{if(NR%8<=4 && NR%8!=0){print >f"_1.txt" } else{print >f"_2.txt"}}' $i & done
As is evident, I used mawk. The five input files were around 6GB each, and when I ran top I saw about 22% of memory (~5GB) being taken up by each mawk process at its peak. I noticed it because my system was hanging due to low memory.
I am fairly sure that redirection outside awk consumes negligible memory. I have done it several times with files much larger than this and with operations more complex than this, and I never faced this problem. Since I had to copy different sections of the input files to different output files, I used redirection inside awk.
I know there are other ways to implement this task and in any case my job is done without much issues. All I was interested in is how awk works when writing to a file.
I am not sure if this question is better suited for Superuser.

mnesia memory allocation

I was testing the application by inserting some 1000 users, each user having 1000 contacts, into a database table under Mnesia, and part way through the insertion I got the following error:
Crash dump was written to: erl_crash.dump
binary_alloc: Cannot allocate 422879872 bytes of memory (of type "binary").
Aborted
I started the erl emulator with erl +MBas af (B: the binary allocator, as af: the "a fit" allocation strategy) and tried again, but the error was the same.
Note: I am using Erlang R12B and the system RAM is 8GB, on Ubuntu 10.04.
So may I know how to solve it?
The record definitions are:
%% database
-record(database,{dbid,guid,data}).
%% changelog
-record(changelog,{dbid,timestamp,changelist,type}).
Here data is a vcard (contact info), dbid and type are "contacts", and guid is an integer automatically generated by the server.
The database table contains all the vcard data of all users. If there are 1000 users and each user has 1000 contacts, then we will have 10^6 records.
The changelog record contains the changes made to the database table at the given timestamp.
The code for creating the tables is:
mnesia:create_table(database, [{type,bag}, {attributes,Record_of_database},
                               {record_name,database},
                               {index,guid},
                               {disc_copies,[node()]}])

mnesia:create_table(changelog, [{type,set}, {attributes,Record_of_changelog},
                                {record_name,changelog},
                                {index,timestamp},
                                {disc_copies,[node()]}])
The code for inserting records into the tables is:
commit_data(DataList = [#database{dbid=DbID}|_]) ->
    io:format("commit data called~n"),
    [mnesia:dirty_write(database,{database,DbId,Guid,Key}) || {database,DbId,Guid,X} <- DataList].

write_changelist(Username,Dbname,Timestamp,ChangeList) ->
    Type="contacts",
    mnesia:dirty_write(changelog,{changelog,DbID,Timestamp,ChangeList,Type}).
I suppose that the list DataList is huge and should not be sent at once from a remote node; it should be sent in small pieces. The client can send the items from the DataList one by one as they are generated at the client. Also, because this problem occurs during insertion, I think we should parallelise the list comprehension. We could have a parallel map where, for each item in the list, the insertion is done in a separate process. I also think that something is still wrong with the list comprehension: the variable Key is unbound and the variable X is unused. Otherwise, probably the entire methodology needs a change. Let's see what others think. Thanks
This error normally occurs when the ERTS memory allocator for binaries, binary_alloc, cannot allocate more memory for the binary heap. Check the current binary heap size using erlang:system_info(), erlang:memory() or erlang:memory(binary). If the binary heap size is huge, run erlang:garbage_collect() to free all unreferenced binary objects in the binary heap. This will free the memory.
In case you use long strings (which are just lists in Erlang) for the vcard or somewhere else, they consume a lot of memory.
If this is the case, you can change them to binaries to reduce memory usage (use list_to_binary before inserting into mnesia).
This may not be helpful, because I don't know your data structure (type, length and so on)...

Finding what hard drive sectors occupy a file

I'm looking for a nice easy way to find what sectors occupy a given file. My language preference is C#.
From my A-Level Computing class I was taught that a hard drive has a lookup table in the first few KB of the disk. In this table there is a linked list for each file detailing what sectors that file occupies. So I'm hoping there's a convenient way to look in this table for a certain file and see what sectors it occupies.
I have tried Googling, but I'm finding nothing useful. Maybe I'm not searching for the right thing, but I can't find anything at all.
Any help is appreciated, thanks.
About Drives
The physical geometry of modern hard drives is no longer directly accessible by the operating system. Early hard drives were simple enough that it was possible to address them according to their physical structure: cylinder-head-sector. Modern drives are much more complex and use systems like zone bit recording, in which not all tracks have the same number of sectors. It's no longer practical to address them according to their physical geometry.
from the fdisk man page:
If possible, fdisk will obtain the disk geometry automatically. This is not necessarily the physical disk geometry (indeed, modern disks do not really have anything like a physical geometry, certainly not something that can be described in simplistic Cylinders/Heads/Sectors form)
To get around this problem, modern drives are addressed using Logical Block Addressing (LBA), which is what the operating system knows about. LBA is an addressing scheme where the entire disk is represented as a linear set of blocks, each block being a uniform number of bytes (usually 512 or larger).
About Files
In order to understand where a "file" is located on a disk (at the LBA level) you will need to understand what a file is. This is going to be dependent on what file system you are using. In Unix style file systems there is a structure called an inode which describes a file. The inode stores all the attributes a file has and points to the LBA location of the actual data.
Ubuntu Example
Here's an example of finding the LBA location of file data.
First, get your file's inode number:
$ ls -i
659908 test.txt
Run the file system debugger. "yourPartition" will be something like sda1; it is the partition your file system is located on.
$ sudo debugfs /dev/yourPartition
debugfs: stat <659908>
Inode: 659908 Type: regular Mode: 0644 Flags: 0x80000
Generation: 3039230668 Version: 0x00000000:00000001
...
...
Size of extra inode fields: 28
EXTENTS:
(0): 266301
The number under "EXTENTS", 266301, is the logical block in the file system where your file's data is located. If your file is large, there will be multiple blocks listed. There's probably an easier way to get that number; I couldn't find one.
To validate that we have the right block, use dd to read that block off the disk. To find out your file system's block size, use dumpe2fs:
dumpe2fs -h /dev/yourPartition | grep "Block size"
Then put your block size in the ibs= parameter, and the extent logical block in the skip= parameter, and run dd like this:
sudo dd if=/dev/yourPartition of=success.txt ibs=4096 count=1 skip=266301
success.txt should now contain the original file's contents.
sudo hdparm --fibmap file
For ext, vfat and NTFS... maybe more.
FIBMAP is also available from C as a Linux ioctl.
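Along the same lines, filefrag from e2fsprogs should report the extent information without entering debugfs (a sketch, reusing the test.txt example above; on older kernels that only support the FIBMAP ioctl it may need root):
$ filefrag -v test.txt
The physical_offset column lists the file system blocks that the file's extents occupy, so it should match the number obtained from debugfs stat above.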
