Cannot allocate memory for gawk coprocess

I have a gawk program that dies when trying to start a coprocess. The error message is "fatal: can't open two way pipe `...' for input/output (Cannot allocate memory)". The gawk process is using around 50% of memory at the time it starts the coprocess.
The gawk program is structured as follows:
BEGIN {
    ## Read big file into memory -- takes about 50% of memory
    while ((getline < "bigFile") > 0) {
        list[$0]
    }
}
{
    print |& "cat"
}
I assume that starting the coprocess involves a fork(), which would double memory usage, and thus cause an error?
If I force the coprocess to start before loading the file into memory, there is no problem starting the coprocess. But the best way I know of to force the coprocess to start is to write an empty line to it:
print "" |& "cat"
And this obviously is not ideal. (Though I can live with it, if there's no better way around this problem.)
Any ideas on cleaner solutions to this problem?


RAM Memory in TCL

I have code that creates a global array, and after I unset the array the memory is still in use.
I have tried this on Windows with Tcl 8.4 and 8.6.
console show
puts "allocating memory..."
update
for {set i 0} {$i < 10000} {incr i} {
    set a($i) $i
}
after 10000
puts "deallocating memory..."
update
foreach v [array names a] {
    unset a($v)
}
after 10000
exit
In a lot of programs, both written in Tcl and in other languages, past memory usage is a pretty good indicator of future memory usage. Thus, as a general heuristic, Tcl's implementation does not try to return memory to the OS (it can always page it out if it wants; the OS is always in charge). Indeed, each thread actually has its own memory pool (allowing memory handling to be largely lock-free), but this doesn't make much difference here where there's only one main thread (and a few workers behind the scenes that you can normally ignore). Also, the memory pools will tend to overallocate because it is much faster to work that way.
Whatever you are measuring with, if it is with a tool external to Tcl at all, it will not provide particularly good real memory usage tracking because of the way the pooling works. Tcl's internal tools for this (the memory command) provide much more accurate information but aren't there by default: they're a compile-time option when building the Tcl library, and are usually switched off because they have a lot of overhead. Also, on Windows some of their features only work at all if you build a console application (a consequence of how they're implemented).

Does awk store an output file in RAM?

I was doing a simple parsing like awk '{print $3 > "file1.txt"}'
What I noticed was that awk was taking up too much RAM (the files were huge). Does streaming awk output to a file consume memory? Does this work like a stream write, or does the file remain open until the program terminates?
The exact command that I gave was:
for i in ../../*.txt; do j=${i#*/}; mawk -v f=${j%.txt} '{if(NR%8<=4 && NR%8!=0){print >f"_1.txt" } else{print >f"_2.txt"}}' $i & done
As is evident, I used mawk. The five input files were around 6GB each, and when I ran top I saw each mawk process taking 22% of memory (~5GB) at its peak. I noticed it because my system was hanging for lack of memory.
I am fairly sure that redirection outside awk consumes negligible memory. I have done it several times with files much larger than these and with operations more complex than this, and I never faced this problem. Since I had to copy different sections of the input files to different output files, I used redirection inside awk.
I know there are other ways to implement this task, and in any case my job finished without much trouble. All I am interested in is how awk works when writing to a file.
I am not sure if this question is better suited for Superuser.

Completely restore a binary from memory?

I want to know if it's possible to completely restore the binary running in memory.
This is what I've tried:
First read /proc/PID/maps, then dump all relevant sections with gdb (ignore all libraries).
grep sleep /proc/1524/maps | awk -F '[- ]' \
'{print "dump memory sleep." $1 " 0x" $1 " 0x" $2 }' \
| gdb -p 1524
Then I concatenate all dumps in order:
cat sleep.* > sleep-bin
But the resulting file is very different from /bin/sleep.
The differences seem to be the relocation table and other uninitialized data, so is it impossible to fix a memory dump (that is, make it runnable)?
Disclaimer: I'm a Windows guy and don't know much about Linux process internals or the ELF format, but I hope I can help!
I would say it's definitely possible to do, but not for ALL programs. The OS loader loads into memory only those parts of the executable that sit in well-defined places in the file. For example, some uninstallers store data that is appended to the executable file; this is not loaded into memory, so it is information you cannot restore just by dumping memory.
Another problem is that the information written by the OS is free to be modified by anything on the system that has the right to do so. No normal program would do something like that though.
The starting point would be to find the ELF headers of your executable module in memory and dump that. It will contain pretty much all the data you need for your task. For example:
the number of sections and where they are in memory and in the file
how sections in the file are mapped to sections in virtual memory (they usually have different base addresses and sizes!)
where the relocation data is
For the relocations, you would have to read up on how relocation data is stored and processed in the ELF format. Once you know that, it should be fairly easy to undo those changes in your dump.
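As a rough illustration of that starting point, here is a minimal Python sketch that decodes the fixed-size ELF file header from the beginning of a dump. It assumes the dump starts at the beginning of the image (the first mapping listed for the binary in /proc/PID/maps normally corresponds to file offset 0). The function name parse_elf_header is hypothetical, and the field order follows the ELF specification.

```python
import struct

ELF_MAGIC = b"\x7fELF"
FIELD_NAMES = ("type", "machine", "version", "entry", "phoff", "shoff", "flags",
               "ehsize", "phentsize", "phnum", "shentsize", "shnum", "shstrndx")


def parse_elf_header(data):
    """Decode the ELF file header found at the start of `data` (hypothetical helper)."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        raise ValueError("data does not start with an ELF header")
    is_64bit = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    byte_order = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian, 2 = big-endian
    # Addresses and offsets are 8 bytes wide in ELF64 and 4 bytes wide in ELF32.
    layout = "HHIQQQIHHHHHH" if is_64bit else "HHIIIIIHHHHHH"
    values = struct.unpack_from(byte_order + layout, data, 16)
    return dict(zip(FIELD_NAMES, values))
```

Passing the first 64 bytes of the lowest-addressed dump file to parse_elf_header gives phoff and phnum, which locate the program header table describing how file ranges map to memory, and shoff and shnum for the section header table. It is worth checking whether the section header table is present in the dump at all, since it may lie outside the loaded segments.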

SBCL used memory reports from top and (room) differs

I am running SBCL 1.0.51 on a Linux (Fedora 15) 32-bit system (kernel 3.6.5) with 1GB RAM and 256MB swap space.
I fire up sbcl --dynamic-space-size 125 and start calling a function that makes ~10000 http-requests (using drakma) to an http (couchDB) server and I just format to the standard-output the results of an operation on the returned data.
After each call I do a (sb-ext:gc :full t) and then (room). The results are not growing. No matter how many times I run the function, (room) reports the same used space (with some ups and downs, but around the same average which does not grow).
BUT: every time I call the function, top reports that the VIRT and RES figures for the sbcl process keep growing, even beyond the 125MB I told sbcl to reserve for itself. So I have the following questions:
Why does the memory reported by top keep growing, while (room) says it does not? The only thing I can think of is some leakage through FFI. I am not calling out through FFI directly, but maybe some drakma dependency does and forgets to free its C garbage. Anyway, I don't know if this could even be an explanation. Could it be something else? Any insights?
Why isn't --dynamic-space-size honoured?

popen() system call hangs in HP-UX 11.11

I have a program which calculates 'Printer Queues Total' value using '/usr/bin/lpstat' through popen() system call.
{
    int n=0;
    FILE *fp=NULL;
    printf("Before popen()");
    fp = popen("/usr/bin/lpstat -o | grep '^[^ ]*-[0-9]*[ \t]' | wc -l", "r");
    printf("After popen()");
    if (fp == NULL)
    {
        printf("Failed to start lpstat - %s", strerror(errno));
        return -1;
    }
    printf("Before fscanf");
    fscanf(fp, "%d", &n);
    printf("After fscanf");
    printf("Before pclose()");
    pclose(fp);
    printf("After pclose()");
    printf("Value=%d",n);
    printf("=== END ===");
    return 0;
}
Note: In the command line, '/usr/bin/lpstat' command is hanging for some time as there are many printers available in the network.
The problem here is that execution hangs at the popen() call, whereas I would expect it to hang at fscanf(), which reads the output from the file stream fp.
If anybody can tell me the reasons for the hang at popen() system call, it will help me in modifying the program to work for my requirement.
Thanks for taking time in reading this post and your efforts.
What people expect does not always have a basis in reality :-)
The command you're running doesn't actually generate any output until it's finished. That would be why it would seem to be hung in the popen rather than the fscanf.
There are two possible reasons for that which spring to mind immediately.
The first is that it's implemented this way, with popen capturing the output in full before delivering the first line. Based on my knowledge of UNIX, this seems unlikely but I can't be sure.
Far more likely is the impact of the pipe. One thing I've noticed is that some filters (like grep) batch up their lines for efficiency. So, while lpstat itself may be spewing forth its lines immediately (well, until it gets to the delay bit anyway), the fact that grep is holding on to the lines until it gets a big enough block may be causing the delay.
In fact, it's almost certainly the pipe-through-wc, which cannot generate any output until all lines are received from lpstat (you cannot figure out how many lines there are until all the lines have been received). So, even if popen just waited for the first character to be available, that would seem to be where the hang was.
It would be a simple matter to test this by simply removing the pipe-through-grep-and-wc bit and seeing what happens.
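A related experiment can be sketched in Python (used here for brevity instead of C), timing the launch and the first read separately. This is only a sketch under assumptions: subprocess.Popen stands in for popen(), the command string passed in stands in for the slow lpstat pipeline, and the function name time_open_and_read is hypothetical. On a typical POSIX system the launch returns almost immediately and the delay shows up in the read; HP-UX 11.11 behaviour has not been verified here.

```python
import subprocess
import time


def time_open_and_read(command):
    """Return (seconds to launch, seconds blocked reading, stripped output)."""
    start = time.monotonic()
    proc = subprocess.Popen(command, shell=True,
                            stdout=subprocess.PIPE, text=True)
    launched = time.monotonic()
    output = proc.stdout.read()   # blocks until the pipeline closes its stdout
    finished = time.monotonic()
    proc.stdout.close()
    proc.wait()
    return launched - start, finished - launched, output.strip()
```

For example, calling it with "(sleep 1; printf 'a\nb\n') | wc -l" lets the sleep play the part of the slow command, and comparing the first two returned values shows which call actually waited.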
Just one other point I'd like to raise. Your printf statements do not have newlines following and, even if they did, there are circumstances where the output may still be fully buffered (so that you probably wouldn't see anything until that program exited, or the buffer filled up).
I would start by changing them to the form:
printf ("message here\n"); fflush (stdout); fsync (fileno (stdout));
to ensure they're flushed fully before continuing. I'd hate this to be a simple misunderstanding of a buffering issue :-)
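The buffering effect can be illustrated with a small Python sketch. It relies on the assumption that Python's standard output, like C stdio, is block-buffered when it is connected to a pipe rather than a terminal. The function name seconds_until_first_byte and the child script are hypothetical stand-ins for the program in the question.

```python
import os
import subprocess
import sys
import time


def seconds_until_first_byte(flush):
    """Run a child that writes 'before', optionally flushes, then sleeps 1 s.

    Returns how long the parent waited before the first byte arrived.
    """
    child = "import sys, time; sys.stdout.write('before'); "
    if flush:
        child += "sys.stdout.flush(); "
    child += "time.sleep(1)"
    env = dict(os.environ)
    env.pop("PYTHONUNBUFFERED", None)   # keep the child's default buffering
    start = time.monotonic()
    proc = subprocess.Popen([sys.executable, "-c", child],
                            stdout=subprocess.PIPE, env=env)
    proc.stdout.read(1)   # blocks until a byte actually reaches the pipe
    elapsed = time.monotonic() - start
    proc.wait()
    proc.stdout.close()
    return elapsed
```

Without the flush, the message only reaches the reader when the child exits, so a "Before ..." message can appear long after the line that printed it has run.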
It sounds as if popen may be hanging whilst lpstat attempts to retrieve information from remote printers. There is a fair amount of discussion on this particular problem. Have a look at that thread, and especially the ones that are linked from that.
