Why is my Ruby script utilizing 90% of my CPU? - ruby-on-rails

I wrote a admin script that tails a heroku log and every n seconds, it summarizes averages and notifies me if i cross a certain threshold (yes I know and love new relic -- but I want to do custom stuff).
Here is the entire script.
I have never been a master of IO and threads, I wonder if I am making a silly mistake. I have a couple of daemon threads that have while(true){} which could be the culprit. For example:
# read new lines
f = File.open(file, "r")
f.seek(0, IO::SEEK_END)
while true do
select([f])
line = f.gets
parse_heroku_line(line)
end
I use one daemon to watch for new lines of a log, and the other to periodically summarize.
Does someone see a way to make it less processor-intensive?

This probably runs hot because you never really block while reading from the temporary file. IO::select is a thin layer over POSIX select(2). It looks like you're trying to block until the file is ready for reading, but select(2) considers EOF to be ready ("a file descriptor is also ready on end-of-file"), so you always return right away from select then call gets which returns nil at EOF.
You can get a truer EOF reading and nice blocking behavior by avoiding the thread which writes to the temp file and instead using IO::popen to fork the %x[heroku logs --ps router --tail --app pipewave-cedar] log tailer, connected to a ruby IO object on which you can loop over gets, exiting when gets returns nil (indicating the log tailer finished). gets on the pipe from the tailer will block when there's nothing to read and your script will only run as hot as it takes to do your line parsing and reporting.
EDIT: I'm not set up to actually try your code, but you should be able to replace the log tailer thread and your temp file read loop with this code to get the behavior described above:
IO.popen( %w{ heroku logs --ps router --tail --app my-heroku-app } ) do |logf|
while line = logf.gets
parse_heroku_line(line) if line =~ /^/
end
end
I also notice your reporting thread does not do anything to synchronize access to #total_lines, #total_errors, etc. So, you have some minor race conditions where you can get inconsistent values from the instance vars that parse_heroku_line method updates.

select is about whether a read would block. f is just a plain old file, so you when get to the end reads don't block, they just return nil instantly. As a result select returns instantly rather than waiting for something to be appending to the file as I assume you're expecting. Because of this you're sitting in a tight busy loop, so high cpu is to be expected.
If you are at eof (you could either check f.eof? or whether gets returns nil), then you could either start sleeping (perhaps with some sort of back off) or use something like listen to be notified of filesystem changes

Related

Lua io.popen freezing after a while of reading a log file

I'm attempting to constantly read and parse a log file (Minecraft log file) by using io.popen in tandem with Ubuntu's tail command so that I can send some messages upon certain events.
Now, I have mostly everything working here, except one small issue. After a while of reading, the entire program just freezes.
Here is the relevant code:
-- Open the tail command, return a file handle for it.
local pop = io.popen(config.listen_command)
-- Simply read a single line, I've pulled this into its own
-- function so that if this ever needs changing I can do so
-- easily.
local function get_line()
logger:log(4, "READ LINE")
return pop:read("*l")
end
-- For each line in the log file, check if it matches any
-- of a list of patterns, return the matches and the
-- pattern information if so.
local function match_line()
local line = get_line()
logger:log(4, "Line: %s", line)
-- This all works, and I've tested that it's not freezing
-- here. I've just included it for completion of the call
-- -stack.
for event_type, data in pairs(config.message_patterns) do
for event_name, pattern in pairs(data) do
local matches = {line:match(pattern)}
if matches[1] then
return event_type, event_name, matches
end
end
end
end
-- The main loop, simply read a line and send a message
-- if there was a match.
logger:log(4, "Main loop begin.")
while true do
local event_type, event_name, matches = match_line()
-- ...
-- The rest of the code here is not relevant.
config.listen_command = "tail -F --lines=1 latest.log"
The issue is in the get_line function. After a while of reading the log file, it completely freezes on the pop:read("*l"). It prints the READ LINE message, but never prints the Line: <whatever data here> line.
This is a really strange issue that I've been getting really confused about. I've tried swapping to different distributions of Lua (Luvit, LuaJIT, Lua) and a very large amount of debugging, changing small things, rerunning, ... But I cannot think of anything that'd be causing this.
Perhaps there's something small I've missed.
So my question here is this: Why is pop:read("*l") freezing, even though more data is being outputted to the logfile? Is there a way to fix this? Perhaps to detect if the next read will freeze indefinitely, so I can try closing the popen'ed file and re-open it (or to preferably stop it happening altogether?)

Redis - monitoring maximum memory before inserts fail?

While this Q/A does not address the actual issue of: How to detect with client (eg redis-py) that redis is running out of memory constraint not by machine but by the maxmem configuration? Before inserts fail which command to use in the programm to detect about to be full?
My first guess is: info and check if used_memory_peak < maxmem setting. Is this correct?
(Besides, for out of machine memory, since defrag, use which setting, none of the returned INFO fields help here)
Well should i just try an insert and see if fail (but that would be after the fact then.)
Trail and error, good enough tested by running
while true; do redis-cli lpush mm longstringhere; done; results on maxmem - used_memory < 0.1MB with insert failures:
(error) OOM command not allowed when used memory > 'maxmemory'.
So i have set i poll it via redis-py client and once the diff goes <1mb threshold throw up, sry raise Error of course. Make sure the user_memory memory addon of your longest command is < threshold too of course otherwise you run into it on insert.
I try to figure how to calc the ~percentage of used mem so i get notification way earlier eg 90% of maxmem, therefore this solution is fine.
Info dump:
# Memory
used_memory:3126272
used_memory_human:2.98M
used_memory_rss:5292032
used_memory_rss_human:5.05M
used_memory_peak:4914296
used_memory_peak_human:4.69M
used_memory_peak_perc:63.62%
used_memory_overhead:696654...
Furthermore maxmem is not a hardcap, when running it further by eg adding members to existing set.
used_memory:3162584
used_memory_human:3.02M
code to get percent 0-100
rmem_info = pipe.info(section='memory')
{'redis_mem_percent': math.ceil(rmem_info['used_memory'] / rmem_info['maxmemory'] *100)}

IPython.parallel client is hanging while waiting for result of map_async

I am running 7 worker processes on a single machine with 4 cores. I may have made a poor choice with this loop while waiting for the result of map_async:
while not result.ready():
time.sleep(10)
for out in result.stdout:
print out
rec_file_list = result.get()
result.stdout keeps growing with all the printed output from the 7 processes running, and it caused the console that initiated the map to hang. The activity monitor on my MacBook Pro shows the 7 processes are still running, and the terminal running the Controller is still active. What are my options here? Is there any way to acquire the result once the processes have completed?
I found an answer:
Remote introspection of ASyncResult objects is possible from another client as long as a 'database backend' has been enabled by the controller with:
ipcontroller --dictb # or --mongodb or --sqlitedb
Then, it is possible to create a new client instance and retrieve the results with:
client.get_result(task_id)
where the task_ids can be retrieved with:
client.hub_history()
Also, a simple way to avoid the buffer overflow I encountered is to periodically print just the last few lines from each engine's stdout history, and to flush the buffer like:
from IPython.display import clear_output
import sys
while not result.ready():
clear_output()
for stdout in result.stdout:
if stdout:
lines = stdout.split('\n')
for line in lines[-4:-1]:
if line:
print line
sys.stdout.flush()
time.sleep(30)

Simple program that reads and writes to a pipe

Although I am quite familiar with Tcl this is a beginner question. I would like to read and write from a pipe. I would like a solution in pure Tcl and not use a library like Expect. I copied an example from the tcl wiki but could not get it running.
My code is:
cd /tmp
catch {
console show
update
}
proc go {} {
puts "executing go"
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line -blocking 0
fileevent $pipe readable [list piperead $pipe]
if {![eof $pipe]} {
puts $pipe "hello cat program!"
flush $pipe
set got [gets $pipe]
puts "result: $got"
}
}
go
The output is executing go\n result:, however I would expect that reading a value from the pipe would return the line that I have sent to the cat program.
What is my error?
--
EDIT:
I followed potrzebie's answer and got a small example working. That's enough to get me going. A quick workaround to test my setup was the following code (not a real solution but a quick fix for the moment).
cd /home/stephan/tmp
catch {
console show
update
}
puts "starting pipe"
set pipe [open "|cat" RDWR]
fconfigure $pipe -buffering line -blocking 0
after 10
puts $pipe "hello cat!"
flush $pipe
set got [gets $pipe]
puts "got from pipe: $got"
Writing to the pipe and flushing won't make the OS multitasking immediately leave your program and switch to the cat program. Try putting after 1000 between the puts and the gets command, and you'll see that you'll probably get the string back. cat has then been given some time slices and has had the chance to read it's input and write it's output.
You can't control when cat reads your input and writes it back, so you'll have to either use fileevent and enter the event loop to wait (or periodically call update), or periodically try reading from the stream. Or you can keep it in blocking mode, in which case gets will do the waiting for you. It will block until there's a line to read, but meanwhile no other events will be responded to. A GUI for example, will stop responding.
The example seem to be for Tk and meant to be run by wish, which enters the event loop automatically at the end of the script. Add the piperead procedure and either run the script with wish or add a vwait command to the end of the script and run it with tclsh.
PS: For line-buffered I/O to work for a pipe, both programs involved have to use it (or no buffering). Many programs (grep, sed, etc) use full buffering when they're not connected to a terminal. One way to prevent them to, is with the unbuffer program, which is part of Expect (you don't have to write an Expect script, it's a stand-alone program that just happens to be included with the Expect package).
set pipe [open "|[list unbuffer grep .]" {RDWR}]
I guess you're executing the code from http://wiki.tcl.tk/3846, the page entitled "Pipe vs Expect". You seem to have omitted the definition of the piperead proc, indeed, when I copy-and-pasted the code from your question, I got an error invalid command name "piperead". If you copy-and-paste the definition from the wiki, you should find that the code works. It certainly did for me.

Capturing output from WshShell.Exec using Windows Script Host

I wrote the following two functions, and call the second ("callAndWait") from JavaScript running inside Windows Script Host. My overall intent is to call one command line program from another. That is, I'm running the initial scripting using cscript, and then trying to run something else (Ant) from that script.
function readAllFromAny(oExec)
{
if (!oExec.StdOut.AtEndOfStream)
return oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
return "STDERR: " + oExec.StdErr.ReadLine();
return -1;
}
// Execute a command line function....
function callAndWait(execStr) {
var oExec = WshShell.Exec(execStr);
while (oExec.Status == 0)
{
WScript.Sleep(100);
var output;
while ( (output = readAllFromAny(oExec)) != -1) {
WScript.StdOut.WriteLine(output);
}
}
}
Unfortunately, when I run my program, I don't get immediate feedback about what the called program is doing. Instead, the output seems to come in fits and starts, sometimes waiting until the original program has finished, and sometimes it appears to have deadlocked. What I really want to do is have the spawned process actually share the same StdOut as the calling process, but I don't see a way to do that. Just setting oExec.StdOut = WScript.StdOut doesn't work.
Is there an alternate way to spawn processes that will share the StdOut & StdErr of the launching process? I tried using "WshShell.Run(), but that gives me a "permission denied" error. That's problematic, because I don't want to have to tell my clients to change how their Windows environment is configured just to run my program.
What can I do?
You cannot read from StdErr and StdOut in the script engine in this way, as there is no non-blocking IO as Code Master Bob says. If the called process fills up the buffer (about 4KB) on StdErr while you are attempting to read from StdOut, or vice-versa, then you will deadlock/hang. You will starve while waiting for StdOut and it will block waiting for you to read from StdErr.
The practical solution is to redirect StdErr to StdOut like this:
sCommandLine = """c:\Path\To\prog.exe"" Argument1 argument2"
Dim oExec
Set oExec = WshShell.Exec("CMD /S /C "" " & sCommandLine & " 2>&1 """)
In other words, what gets passed to CreateProcess is this:
CMD /S /C " "c:\Path\To\prog.exe" Argument1 argument2 2>&1 "
This invokes CMD.EXE, which interprets the command line. /S /C invokes a special parsing rule so that the first and last quote are stripped off, and the remainder used as-is and executed by CMD.EXE. So CMD.EXE executes this:
"c:\Path\To\prog.exe" Argument1 argument2 2>&1
The incantation 2>&1 redirects prog.exe's StdErr to StdOut. CMD.EXE will propagate the exit code.
You can now succeed by reading from StdOut and ignoring StdErr.
The downside is that the StdErr and StdOut output get mixed together. As long as they are recognisable you can probably work with this.
Another technique which might help in this situation is to redirect the standard error stream of the command to accompany the standard output.
Do this by adding "%comspec% /c" to the front and "2>&1" to the end of the execStr string.
That is, change the command you run from:
zzz
to:
%comspec% /c zzz 2>&1
The "2>&1" is a redirect instruction which causes the StdErr output (file descriptor 2) to be written to the StdOut stream (file descriptor 1).
You need to include the "%comspec% /c" part because it is the command interpreter which understands about the command line redirect. See http://technet.microsoft.com/en-us/library/ee156605.aspx
Using "%comspec%" instead of "cmd" gives portability to a wider range of Windows versions.
If your command contains quoted string arguments, it may be tricky to get them right:
the specification for how cmd handles quotes after "/c" seems to be incomplete.
With this, your script needs only to read the StdOut stream, and will receive both standard output and standard error.
I used this with "net stop wuauserv", which writes to StdOut on success (if the service is running)
and StdErr on failure (if the service is already stopped).
First, your loop is broken in that it always tries to read from oExec.StdOut first. If there is no actual output then it will hang until there is. You wont see any StdErr output until StdOut.atEndOfStream becomes true (probably when the child terminates). Unfortunately, there is no concept of non-blocking I/O in the script engine. That means calling read and having it return immediately if there is no data in the buffer. Thus there is probably no way to get this loop to work as you want. Second, WShell.Run does not provide any properties or methods to access the standard I/O of the child process. It creates the child in a separate window, totally isolated from the parent except for the return code. However, if all you want is to be able to SEE the output from the child then this might be acceptable. You will also be able to interact with the child (input) but only through the new window (see SendKeys).
As for using ReadAll(), this would be even worse since it collects all the input from the stream before returning so you wouldn't see anything at all until the stream was closed. I have no idea why the example places the ReadAll in a loop which builds a string, a single if (!WScript.StdIn.AtEndOfStream) should be sufficient to avoid exceptions.
Another alternative might be to use the process creation methods in WMI. How standard I/O is handled is not clear and there doesn't appear to be any way to allocate specific streams as StdIn/Out/Err. The only hope would be that the child would inherit these from the parent but that's what you want, isn't it? (This comment based upon an idea and a little bit of research but no actual testing.)
Basically, the scripting system is not designed for complicated interprocess communication/synchronisation.
Note: Tests confirming the above were performed on Windows XP Sp2 using Script version 5.6. Reference to current (5.8) manuals suggests no change.
Yes, the Exec function seems to be broken when it comes to terminal output.
I have been using a similar function function ConsumeStd(e) {WScript.StdOut.Write(e.StdOut.ReadAll());WScript.StdErr.Write(e.StdErr.ReadAll());} that I call in a loop similar to yours. Not sure if checking for EOF and reading line by line is better or worse.
You might have hit the deadlock issue described on this Microsoft Support site.
One suggestion is to always read both from stdout and stderr.
You could change readAllFromAny to:
function readAllFromAny(oExec)
{
var output = "";
if (!oExec.StdOut.AtEndOfStream)
output = output + oExec.StdOut.ReadLine();
if (!oExec.StdErr.AtEndOfStream)
output = output + "STDERR: " + oExec.StdErr.ReadLine();
return output ? output : -1;
}

Resources