I want to save the standard out from my DAG separately from the logging that Airflow generates.
My scripts print to standard out, and I want to capture that (and only that) into a specified log file each time the DAG runs. I've tried
import sys

logfile = 'mylog_01.log'

def myfunc(**context):
    ...

log = open(logfile, "a")
sys.stdout = log

print_params = PythonOperator(task_id="print_params",
                              python_callable=myfunc,
                              provide_context=True,
                              dag=dag)
... but sys.stdout is reset within each PythonOperator. How do I control where standard out is printed within all PythonOperators?
Also, I am using hundreds of print statements within the scripts, so modifying each of those isn't practical in this case.
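For what it's worth, here is a minimal sketch of one way to approach this, using only the standard library's contextlib.redirect_stdout (which restores sys.stdout when the task finishes); the wrapper name redirect_to_file is made up for illustration:

import contextlib
import functools

def redirect_to_file(func, logfile):
    # Hypothetical helper: redirect print output to logfile for the
    # duration of this one callable, then restore sys.stdout.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with open(logfile, "a") as log, contextlib.redirect_stdout(log):
            return func(*args, **kwargs)
    return wrapper

print_params = PythonOperator(task_id="print_params",
                              python_callable=redirect_to_file(myfunc, logfile),
                              provide_context=True,
                              dag=dag)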
When I execute SPSS syntax commands from a .sps script, each command is written to the output window before it executes, giving me a clear log of exactly how an output was created.
Even if the command is an INSERT command that executes a different script, I get a log of the commands from that script.
This is very useful for many reasons:
sanity checking - I can always see exactly what went in to creating a specific output (which filters I used, etc.)
recreation - I (or someone else with this output) can easily re-run the same commands because they're right there.
debugging - if there's an error, I can see which commands caused it
However, when I run commands using spss.Submit inside a Python block (in a BEGIN PROGRAM ... END PROGRAM block), the actual commands called aren't logged to the output window.
I know I can find a full log in a log file, but that's not helpful here.
Is there a way to tell SPSS to continue to log all the commands in the output window?
You can use set mprint on. before the begin program statement to have the syntax that is run via spss.Submit() show up in the output window. I like simply putting it at the very top of my syntax file as a "set it and forget it".
For example, like so:
set mprint on.
begin program python3.
import spss
vars = list(range(1,11))
for var in vars:
    spss.Submit(f'compute v{var} = 0. ')
end program.
In an existing flow, a whole bunch of namespaces get loaded when running a script job.
However, if I want to check and trace the usage of some command in some namespace, I need to find the script path that defined that namespace.
Is there some way to get that? In particular, I'm talking about PrimeTime scripts.
Technically, namespaces don't have script paths. But we can do something close enough:
proc report_current_file {call code result op} {
    if {$code == 0} {
        # If the [proc] call was successful...
        set cmd [lindex $call 1]
        set qualified_cmd [uplevel 1 [list namespace which $cmd]]
        set file [file normalize [info script]]
        puts "Defined $qualified_cmd in $file"
    }
}
trace add execution proc leave report_current_file
It's not perfect if you've got procedures creating procedures dynamically (the current file might be wrong), but fortunately that's not what most code does.
Another option that might work for you is to use tcl::unsupported::getbytecode, which produces a lot of information in machine-readable format (a dictionary). One of the pieces of information is the sourcefile key. Here's an example running interactively on my machine:
% parray tcl_platform
tcl_platform(byteOrder) = littleEndian
tcl_platform(engine) = Tcl
tcl_platform(machine) = x86_64
tcl_platform(os) = Darwin
tcl_platform(osVersion) = 20.2.0
tcl_platform(pathSeparator) = :
tcl_platform(platform) = unix
tcl_platform(pointerSize) = 8
tcl_platform(threaded) = 1
tcl_platform(user) = dkf
tcl_platform(wordSize) = 8
% dict get [tcl::unsupported::getbytecode proc parray] sourcefile
/opt/local/lib/tcl8.6/parray.tcl
Note that the procedure has to be already defined for this to work. And if Tcl has become confused about what file the code was in (because of dynamic programming trickery), then that key is absent.
In PySpark, reading CSV files from multiple paths fails if even one path does not exist.
Logs = spark.read.load(Logpaths, format="csv", schema=logsSchema, header="true", mode="DROPMALFORMED")
Here Logpaths is a list that contains multiple paths, and these paths are created dynamically depending on the given startDate and endDate range. If Logpaths contains 5 paths and the first 3 exist but the 4th does not, the whole extraction fails. How can I avoid this in PySpark, or how can I check their existence before reading?
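(For illustration, a hypothetical sketch of how such a path list might be built; the helper name build_log_paths is made up, and the /bilal/YYYY.MM.DD/logs.csv layout is taken from the Scala snippet below:)

from datetime import date, timedelta

def build_log_paths(start_date, end_date):
    # Hypothetical helper: one day-partitioned path per day in the range,
    # matching the /bilal/YYYY.MM.DD/logs.csv layout shown below.
    paths = []
    d = start_date
    while d <= end_date:
        paths.append(f"/bilal/{d.strftime('%Y.%m.%d')}/logs.csv")
        d += timedelta(days=1)
    return paths

Logpaths = build_log_paths(date(2018, 12, 14), date(2018, 12, 18))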
In Scala I did this by checking file existence and filtering out non-existent paths using the Hadoop HDFS FileSystem globStatus function.
val Path = "/bilal/2018.12.16/logs.csv"
val hadoopConf = new org.apache.hadoop.conf.Configuration()
val fs = org.apache.hadoop.fs.FileSystem.get(hadoopConf)
val fileStatus = fs.globStatus(new org.apache.hadoop.fs.Path(Path))
So I got what I was looking for. Like the code I posted in the question, which can be used in Scala for a file existence check, the code below can be used in PySpark.
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("bilal/logs/log.csv"))
This is exactly the same code as used in Scala: we are calling the Hadoop Java library, and that Java code runs on the JVM that Spark runs on.
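Putting it together, a minimal sketch (assuming sc is the active SparkContext and that Logpaths and logsSchema are defined as in the question) that drops non-existent paths before reading:

fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())

# Keep only the paths that actually exist before handing the list to the reader.
existing_paths = [p for p in Logpaths
                  if fs.exists(sc._jvm.org.apache.hadoop.fs.Path(p))]

Logs = spark.read.load(existing_paths, format="csv", schema=logsSchema,
                       header="true", mode="DROPMALFORMED")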
I am using Verilator to incorporate an algorithm written in SystemVerilog into an executable utility that manipulates I/O streams passed via stdin and stdout. Unfortunately, when I use the SystemVerilog $display() function, the output goes to stdout. I would like it to go to stderr so that stdout remains uncontaminated for my other purposes.
How can I make this happen?
Thanks to @toolic for pointing out the existence of $fdisplay(), which can be used like so...
$fdisplay(STDERR,"hello world"); // also supports formatted arguments
IEEE Std 1800-2012 states that STDERR should be pre-opened, but it did not seem to be known to Verilator. A workaround for this is:
integer STDERR = 32'h8000_0002;
Alternatively, you can create a log file handle for use with $fdisplay() like so...
integer logfile;
initial begin
    $system("echo 'initial at ['$(date)']'>>temp.log");
    logfile = $fopen("temp.log","a"); // or open with "w" to start fresh
end
It might be nice if you could create a custom wrapper that works like $display but uses your selected file descriptor (without specifying it every time). Unfortunately, that doesn't seem to be possible within the language itself, but maybe you can do it with the DPI; see DPI Display Functions (I haven't gotten this to work so far).
I wrote the following two functions, and call the second ("callAndWait") from JavaScript running inside Windows Script Host. My overall intent is to call one command line program from another. That is, I'm running the initial scripting using cscript, and then trying to run something else (Ant) from that script.
function readAllFromAny(oExec)
{
    if (!oExec.StdOut.AtEndOfStream)
        return oExec.StdOut.ReadLine();
    if (!oExec.StdErr.AtEndOfStream)
        return "STDERR: " + oExec.StdErr.ReadLine();
    return -1;
}
// Execute a command line function....
function callAndWait(execStr) {
    var oExec = WshShell.Exec(execStr);
    while (oExec.Status == 0)
    {
        WScript.Sleep(100);
        var output;
        while ( (output = readAllFromAny(oExec)) != -1) {
            WScript.StdOut.WriteLine(output);
        }
    }
}
Unfortunately, when I run my program, I don't get immediate feedback about what the called program is doing. Instead, the output seems to come in fits and starts, sometimes waiting until the original program has finished, and sometimes it appears to have deadlocked. What I really want to do is have the spawned process actually share the same StdOut as the calling process, but I don't see a way to do that. Just setting oExec.StdOut = WScript.StdOut doesn't work.
Is there an alternate way to spawn processes that will share the StdOut & StdErr of the launching process? I tried using WshShell.Run(), but that gives me a "permission denied" error. That's problematic, because I don't want to have to tell my clients to change how their Windows environment is configured just to run my program.
What can I do?
You cannot read from StdErr and StdOut in the script engine in this way, as there is no non-blocking IO, as Code Master Bob says. If the called process fills up the buffer (about 4KB) on StdErr while you are attempting to read from StdOut, or vice-versa, then you will deadlock/hang. You will starve while waiting for StdOut, and it will block waiting for you to read from StdErr.
The practical solution is to redirect StdErr to StdOut like this:
sCommandLine = """c:\Path\To\prog.exe"" Argument1 argument2"
Dim oExec
Set oExec = WshShell.Exec("CMD /S /C "" " & sCommandLine & " 2>&1 """)
In other words, what gets passed to CreateProcess is this:
CMD /S /C " "c:\Path\To\prog.exe" Argument1 argument2 2>&1 "
This invokes CMD.EXE, which interprets the command line. /S /C invokes a special parsing rule so that the first and last quote are stripped off, and the remainder used as-is and executed by CMD.EXE. So CMD.EXE executes this:
"c:\Path\To\prog.exe" Argument1 argument2 2>&1
The incantation 2>&1 redirects prog.exe's StdErr to StdOut. CMD.EXE will propagate the exit code.
You can now succeed by reading from StdOut and ignoring StdErr.
The downside is that the StdErr and StdOut output get mixed together. As long as they are recognisable you can probably work with this.
Another technique which might help in this situation is to redirect the standard error stream of the command to accompany the standard output.
Do this by adding "%comspec% /c" to the front and "2>&1" to the end of the execStr string.
That is, change the command you run from:
zzz
to:
%comspec% /c zzz 2>&1
The "2>&1" is a redirect instruction which causes the StdErr output (file descriptor 2) to be written to the StdOut stream (file descriptor 1).
You need to include the "%comspec% /c" part because it is the command interpreter that understands the command-line redirect. See http://technet.microsoft.com/en-us/library/ee156605.aspx
Using "%comspec%" instead of "cmd" gives portability to a wider range of Windows versions.
If your command contains quoted string arguments, it may be tricky to get them right: the specification for how cmd handles quotes after "/c" seems to be incomplete.
With this, your script needs only to read the StdOut stream, and will receive both standard output and standard error.
I used this with "net stop wuauserv", which writes to StdOut on success (if the service is running) and StdErr on failure (if the service is already stopped).
First, your loop is broken in that it always tries to read from oExec.StdOut first. If there is no actual output then it will hang until there is. You won't see any StdErr output until StdOut.AtEndOfStream becomes true (probably when the child terminates). Unfortunately, there is no concept of non-blocking I/O in the script engine, that is, a read call that returns immediately when there is no data in the buffer. Thus there is probably no way to get this loop to work as you want.

Second, WshShell.Run does not provide any properties or methods to access the standard I/O of the child process. It creates the child in a separate window, totally isolated from the parent except for the return code. However, if all you want is to be able to SEE the output from the child then this might be acceptable. You will also be able to interact with the child (input) but only through the new window (see SendKeys).
As for using ReadAll(), this would be even worse, since it collects all the input from the stream before returning, so you wouldn't see anything at all until the stream was closed. I have no idea why the example places the ReadAll in a loop which builds a string; a single if (!WScript.StdIn.AtEndOfStream) should be sufficient to avoid exceptions.
Another alternative might be to use the process creation methods in WMI. How standard I/O is handled is not clear and there doesn't appear to be any way to allocate specific streams as StdIn/Out/Err. The only hope would be that the child would inherit these from the parent but that's what you want, isn't it? (This comment based upon an idea and a little bit of research but no actual testing.)
Basically, the scripting system is not designed for complicated interprocess communication/synchronisation.
Note: Tests confirming the above were performed on Windows XP SP2 using Script version 5.6. Reference to the current (5.8) manuals suggests no change.
Yes, the Exec function seems to be broken when it comes to terminal output.
I have been using a similar function that I call in a loop similar to yours:

function ConsumeStd(e) {
    WScript.StdOut.Write(e.StdOut.ReadAll());
    WScript.StdErr.Write(e.StdErr.ReadAll());
}

Not sure if checking for EOF and reading line by line is better or worse.
You might have hit the deadlock issue described on this Microsoft Support site.
One suggestion is to always read both from stdout and stderr.
You could change readAllFromAny to:
function readAllFromAny(oExec)
{
    var output = "";
    if (!oExec.StdOut.AtEndOfStream)
        output = output + oExec.StdOut.ReadLine();
    if (!oExec.StdErr.AtEndOfStream)
        output = output + "STDERR: " + oExec.StdErr.ReadLine();
    return output ? output : -1;
}