PowerShell, wait for first process to end - Docker

The PowerShell script should exit when one of its main processes stops. This script is the Docker container's main process, so the container should stop as soon as either of the apps (app1, app2) stops.
The current approach uses an exit event for one of the apps and Wait-Process for the other. Is there a better approach?
$pApp1 = Start-Process -PassThru app1
$pApp2 = Start-Process -PassThru app2
Register-ObjectEvent -InputObject $pApp1 -EventName Exited -Action {
    Get-EventSubscriber | Unregister-Event
    exit 1
}
Wait-Process -Id $pApp2.Id
exit 1

Wait for the HasExited property on either of them to change:
$apps = 'app1','app2' | ForEach-Object { Start-Process $_ -PassThru }
while (@($apps | Where-Object HasExited -eq $true).Count -lt 1) {
    Write-Host "Waiting for one of them to exit..."
    Start-Sleep -Seconds 1
}
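For the Docker entry-point scenario, a minimal continuation of this loop (a sketch, not part of the original answer) could then report which app exited and terminate the script so the container stops:
# Sketch: after the loop, find the process that exited and end the script
# with a nonzero exit code so the container stops.
$exited = $apps | Where-Object HasExited | Select-Object -First 1
Write-Host "Process with ID $($exited.Id) exited."
exit 1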

As of PowerShell 7.2.1, Wait-Process, when given multiple processes, invariably waits for all of them to terminate before returning. Potentially introducing an -Any switch, so as to wait for only one of them, is the subject of GitHub proposal #16972, which would simplify the solution to Wait-Process -Any -Id $pApp1.Id, $pApp2.Id.
Delegating waiting for the processes to exit to thread / background jobs avoids the need for an event-based or periodic-polling solution.
# Start all processes asynchronously and get process-information
# objects for them.
$allPs = 'app1', 'app2' | ForEach-Object { Start-Process -PassThru $_ }
# Start a thread job for each process that waits for that process to exit
# and then pass the process-info object for the terminated process through.
# Exit the overall pipeline once the first output object from one of the
# jobs is received.
$terminatedPs = $allPs |
ForEach-Object { Start-ThreadJob { $ps = $using:_; Wait-Process -Id $ps.Id; $ps } } |
Receive-Job -Wait |
Select-Object -First 1
Write-Verbose -Verbose "Process with ID $($terminatedPs.Id) exited."
exit 1
Note:
I'm using the Start-ThreadJob cmdlet, which offers a lightweight, much faster thread-based alternative to the child-process-based regular background jobs created with Start-Job.
It comes with PowerShell (Core) 7+ and in Windows PowerShell can be installed on demand with, e.g., Install-Module ThreadJob -Scope CurrentUser.
In most cases, thread jobs are the better choice, both for performance and type fidelity - see the bottom section of this answer for why.
If Start-ThreadJob isn't available to you / cannot be installed, simply substitute Start-Job in the code above.
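For instance, the same pipeline with regular background jobs would look like this (a sketch; only the job-creation cmdlet changes, and the pass-through process object becomes a deserialized copy):
# Fallback sketch using child-process-based background jobs instead of thread jobs.
$terminatedPs = $allPs |
  ForEach-Object { Start-Job { $ps = $using:_; Wait-Process -Id $ps.Id; $ps } } |
  Receive-Job -Wait |
  Select-Object -First 1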
PowerShell (Core) 7+-only solution with ForEach-Object -Parallel:
PowerShell 7.0 introduced the -Parallel parameter to the ForEach-Object cmdlet, which in essence brings thread-based parallelism to the pipeline; it is a way to create multiple, implicit thread jobs, one for each pipeline input object, that emit their output directly to the pipeline (albeit in no guaranteed order).
Therefore, the following simplified solution is possible:
# Start all processes asynchronously and get process-information
# objects for them.
$allPs = 'app1', 'app2' | ForEach-Object { Start-Process -PassThru $_ }
$terminatedPs = $allPs |
ForEach-Object -Parallel { $_ | Wait-Process; $_ } |
Select-Object -First 1
Write-Verbose -Verbose "Process with ID $($terminatedPs.Id) exited."
exit 1

Related

Vivado Synthesis hangs in Docker container spawned by Jenkins

I'm attempting to move our large FPGA build into a Jenkins CI environment, but the build hangs at the end of synthesis when run in a Docker container spawned by Jenkins.
I've attempted to replicate the environment that Jenkins is creating, but when I spawn a Docker container myself, there's no issue with the build.
I've tried:
reducing the number of jobs (aka threads) that Vivado uses, thinking that perhaps there was some thread collision occurring when writing out log files
on the same note, using the -nolog -nojournal options on the vivado commands to remove any log file collisions
taking control of the cloned/checked-out project and running commands as the local user in the Docker container
I also have an extremely small build that makes it through the entire build process in Jenkins with no issue, so I don't think there is a fundamental flaw with my Docker containers.
agent {
    docker {
        image "vivado:2017.4"
        args """
            -v <MOUNT XILINX LICENSE FILE>
            --dns <DNS_ADDRESS>
            --mac-address <MAC_ADDRESS>
            """
    }
}
steps {
    sh "chmod -R 777 ."
    dir(path: "${params.root_dir}") {
        timeout(time: 15, unit: 'MINUTES') {
            // Create HLS IP for use in Vivado project
            sh './run_hls.sh'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create vivado project, add sources, constraints, HLS IP, generated IP
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/setup_proj.tcl'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create block designs from TCL scripts
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_bd.tcl'
        }
        timeout(time: 1, unit: 'HOURS') {
            // Synthesize complete project
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_synth.tcl'
        }
    }
}
The log excerpt below is from a run with 1 job and a 12-hour timeout. You can see that synthesis finished, then a timeout occurred 8 hours later.
[2019-04-17T00:30:06.131Z] Finished Writing Synthesis Report : Time (s): cpu = 00:01:53 ; elapsed = 00:03:03 . Memory (MB): peak = 3288.852 ; gain = 1750.379 ; free physical = 332 ; free virtual = 28594
[2019-04-17T00:30:06.131Z] ---------------------------------------------------------------------------------
[2019-04-17T00:30:06.131Z] Synthesis finished with 0 errors, 0 critical warnings and 671 warnings.
[2019-04-17T08:38:37.742Z] Sending interrupt signal to process
[2019-04-17T08:38:43.013Z] Terminated
[2019-04-17T08:38:43.013Z]
[2019-04-17T08:38:43.013Z] Session terminated, killing shell... ...killed.
[2019-04-17T08:38:43.013Z] script returned exit code 143
Running the same commands in locally spawned Docker containers has no issues whatsoever. Unfortunately, the timeout Jenkins step doesn't appear to flush open buffers, as my post: unsuccessful step that prints out all log files doesn't find synth_1, though I wouldn't expect there to be anything different from the Jenkins capture.
Are there any known issues with Jenkins/Vivado integration? Is there a way to enter a Jenkins spawned container so I can try and duplicate what I'm expecting vs what I'm experiencing?
EDIT: I've since added in a timeout in the actual tcl scripts to move past the wait_on_runs command used in run_synth.tcl, but now I'm experiencing the same hanging behavior during implementation.
The problem lies in the way Vivado deals (or doesn't deal...) with its forked processes. Specifically, I think this applies to the parallel synthesis, which may be why you only see it in some of your projects. In the state you describe above (stuck after "Synthesis finished") I noticed a couple of abandoned zombie processes of vivado. To my understanding these are child processes which ended, but whose exit status was never collected by the parent before it ended itself. Tracing with strace even reveals that vivado tries to kill these processes:
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, 0x7f86edcf4dd0) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, <detached ...>
But (as we all know) you can't kill zombies, they are already dead...
Normally these processes would be adopted by the init process and reaped there. But in the case of a Jenkins Pipeline in Docker there is no init by default. The pipeline spawns the container and runs cat with no inputs to keep it alive. This way cat becomes PID 1 and inherits the abandoned children of vivado. cat of course doesn't know what to do with them and ignores them (a tragedy, really).
cat,1
|-(sh,16)
|-sh,30 -c ...
| |-sh,31 -c ...
| | `-sleep,5913 3
| `-sh,32 -xe /home/user/.jenkins/workspace...
| `-sh,35 -xe /home/user/.jenkins/workspace...
| `-vivado,36 /opt/Xilinx/Vivado/2019.2/bin/vivado -mode tcl ...
| `-loader,60 /opt/Xilinx/Vivado/2019.2/bin/loader -exec vivado -mode tcl ...
| `-vivado,82 -mode tcl ...
| |-{vivado},84
| |-{vivado},85
| |-{vivado},111
| |-{vivado},118
| `-{vivado},564
|-(vivado,319)
|-(vivado,370)
|-(vivado,422)
`-(vivado,474)
Luckily there is a way to have an init process in the Docker container: passing the --init argument to docker run solves the problem for me.
agent {
    docker {
        image 'vivado:2019.2'
        args '--init'
    }
}
This creates the init process vivado seems to rely on and the build runs without problems.
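Outside of Jenkins, the equivalent when starting the container by hand would be something like this (a sketch; the image tag is taken from the pipeline above):
# Sketch: run the container with a minimal init as PID 1 so orphaned
# Vivado child processes get reaped.
docker run --rm --init -it vivado:2019.2 /bin/bash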
Hope this helps you!
Cheers!

How to monitor resources during slurm job?

I'm running jobs on our university cluster (regular user, no admin rights), which uses the SLURM scheduling system, and I'm interested in plotting the CPU and memory usage over time, i.e. while the job is running. I know about sacct and sstat and I was thinking of including these commands in my submission script, e.g. something along the lines of
#!/bin/bash
#SBATCH <options>
# Running the actual job in background
srun my_program input.in output.out &
# While loop that records resources
JobStatus="$(sacct -j $SLURM_JOB_ID | awk 'FNR == 3 {print $6}')"
FIRST=0
#sleep time in seconds
STIME=15
while [ "$JobStatus" != "COMPLETED" ]; do
#update job status
JobStatus="$(sacct -j $SLURM_JOB_ID | awk 'FNR == 3 {print $6}')"
if [ "$JobStatus" == "RUNNING" ]; then
if [ $FIRST -eq 0 ]; then
sstat --format=AveCPU,AveRSS,MaxRSS -P -j ${SLURM_JOB_ID} >> usage.txt
FIRST=1
else
sstat --format=AveCPU,AveRSS,MaxRSS -P --noheader -j ${SLURM_JOB_ID} >> usage.txt
fi
sleep $STIME
elif [ "$JobStatus" == "PENDING" ]; then
sleep $STIME
else
sacct -j ${SLURM_JOB_ID} --format=AllocCPUS,ReqMem,MaxRSS,AveRSS,AveDiskRead,AveDiskWrite,ReqCPUS,AllocCPUs,NTasks,Elapsed,State >> usage.txt
JobStatus="COMPLETED"
break
fi
done
However, I'm not really convinced of this solution:
sstat unfortunately doesn't show how many CPUs are used at the moment (only the average)
MaxRSS is also not helpful if I try to record memory usage over time
there still seems to be some error (the script doesn't stop after the job finishes)
Does anyone have an idea how to do that properly? Maybe even with top or htop instead of sstat? Any help is much appreciated.
Slurm offers a plugin to record a profile of a job (CPU usage, memory usage, even disk/net IO for some technologies) into an HDF5 file. The file contains a time series for each measure tracked, and you can choose the time resolution.
You can activate it with
#SBATCH --profile=<all|none|[energy[,|task[,|filesystem[,|network]]]]>
See the documentation here.
To check that this plugin is installed, run
scontrol show config | grep AcctGatherProfileType
It should output AcctGatherProfileType = acct_gather_profile/hdf5.
The files are created in the folder referred to in the ProfileHDF5Dir Slurm configuration parameter (in slurm.conf)
As for your script, you could try replacing sstat with an SSH connection to the compute nodes to run ps. Assuming pdsh or clush is installed, you could run something like:
pdsh -j $SLURM_JOB_ID ps -u $USER -o pid,state,cputime,%cpu,rssize,command --columns 100 >> usage.txt
This will give you CPU and memory usage per process.
As a final note, your script never terminates on its own: the while loop ends when the job completes, but the job only completes when the submission script itself ends. The condition "$JobStatus" == "COMPLETED" will therefore never be observed from within the script; when the job completes, the script is simply killed.

Convert unix script to use gnu parallel

I have the following piece of code, which works as expected. It ensures that 2 processes are always spawned, and if any process fails, the script comes to a halt.
I have worked with GNU parallel earlier on simple one-line scripts and they have worked really well. I'm sure the one below can also be made simpler.
The sleeper function in reality is MUCH more complex than one shown below.
The objective is that GNU parallel will call sleeper function in parallel and also do error handling
sleeper(){
    stat=$1
    sleep 5
    echo "Status is $1"
    return $1
}
PROCS=2
errfile="errorfile"
rm "$errfile"
while read LINE && [ ! -f "$errfile" ]
do
    while [ ! -f "$errfile" ]
    do
        NUM=$(jobs | wc -l)
        if [ $NUM -lt $PROCS ]; then
            (sleeper $LINE || echo "bad exit status" > "$errfile") &
            break
        else
            sleep 2
        fi
    done
done<sleep_file
wait
Thanks
What you are looking for is --halt (requires version 20150622):
sleeper(){
    stat=$1
    sleep 5
    echo "Status is $1"
    return $1
}
export -f sleeper
parallel -j2 --halt now,fail=1 -v sleeper ::: 0 0 0 1 0 1 0
If you do not want the sleeper to get killed (maybe you want it to finish so it cleans up), then use --halt soon,fail=1 to let the running jobs complete without starting new ones.

Simultaneous PowerShell script execution

I have written the following script for SQL patching:
cls
$computers = Get-Content D:\Abhi\Server.txt
foreach ($line in $computers)
{
psexec \\$line -s -u Adminuser -p AdminPassword msiexec /i D:\SQL_PATCH\rsSharePoint.msi SKIPCA=1 /qb
}
My question is how to parallelize this script's execution across all the servers listed in the text file. That is, as soon as I start the script, it should initiate the patching activity on all servers simultaneously and also track progress on each of them, whereas right now the script handles only one server at a time.
Kindly help me on this.
Thanks Ansgar Wiechers.
This piece of code did it. It executes the .exe on all the servers simultaneously and also tracks their status:
cls
$servers = Get-Content 'D:\Abhi\Server.txt'
$servers | ForEach-Object {
    $comp = $_
    Start-Job -ScriptBlock {
        psexec \\$input -s -u Adminuser -p AdminPassword C:\SQL_PATCH\SQLServer2008R2SP3-KB2979597-x64-ENU.exe /quiet /action=patch /allinstances /IAcceptSQLServerLicenseTerms
    } -InputObject $comp
}
While (Get-Job -State Running)
{
    Get-Job | Receive-Job
    #Start-Sleep 2
    #Write-Host "Waiting for update removal to finish ..."
}
Remove-Job *
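An alternative (a sketch, not from the original answer) is to pass each server name via -ArgumentList, wait for all jobs, and collect their output once instead of polling in a tight loop; the psexec command line here is the one from the question:
# Sketch: one job per server, then wait for all of them and gather output.
$jobs = Get-Content 'D:\Abhi\Server.txt' | ForEach-Object {
    Start-Job -ScriptBlock {
        param($server)
        psexec \\$server -s -u Adminuser -p AdminPassword msiexec /i D:\SQL_PATCH\rsSharePoint.msi SKIPCA=1 /qb
    } -ArgumentList $_
}
$jobs | Wait-Job | Receive-Job
$jobs | Remove-Job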

Long Running Powershell Script Freezes

We are using a long-running PowerShell script to perform a lot of small operations that can take an extremely long time. After about 30 minutes the script froze. We were able to get it running again by pressing Ctrl-C, which caused the script to resume execution instead of killing the process.
Is there some sort of script timeout or mechanism that prevents long running scripts within PowerShell?
I had this problem due to a bad habit of mine: if you select even a little bit of text inside the PowerShell console, the script's output freezes. Make sure nothing is selected after launching a big script :)
As mentioned, when you click or select text in the PowerShell console, the script pauses. You can disable this behaviour like this:
Right-click the title bar
Select Properties
Select Options
Under Edit Options, disable QuickEdit Mode
Note: You won't be able to select text from the PowerShell window anymore.
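If you would rather toggle this setting from a script, the console's defaults live under the HKCU:\Console registry key (a sketch, assuming the classic console host; it affects newly opened console windows, not the current one):
# Sketch: disable QuickEdit mode for consoles opened after this change (0 = off, 1 = on).
Set-ItemProperty -Path HKCU:\Console -Name QuickEdit -Value 0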
Try my kill timer script. Just change the $ScriptLocation variable to the script you want to run. That script will then run as a background job while the current window keeps track of the timer. After the time expires, the current window will kill the background job and write everything to the logs.
Start-Transcript C:\Transcriptlog-Cleanup.txt #write log to this location
$p = Get-Process -Id $PID # process object for the current PowerShell session (used later to check HasExited and to stop it)
Write-Host $p.Id
$BJName = "Clean-up-script" #Define Background job name
$startTime = (Get-Date) # set start time
$startTime
$expiration = (Get-Date).AddMinutes(2)#program expires at this time
# you could change the expiration time by changing (Get-Date).AddSeconds(20) to (Get-Date).AddMinutes(10)or to hours or whatever you like
#-----------------
#Timer update function setup
function UpdateTime
{
    $LeftMinutes = ($expiration) - (Get-Date) | Select -Expand Minutes # minutes remaining until expiration
    $LeftSeconds = ($expiration) - (Get-Date) | Select -Expand Seconds # seconds remaining until expiration
    # Write time to console
    Write-Host "------------------------------------------------------------------"
    Write-Host "Timer started at     : " $startTime
    Write-Host "Current time         : " (Get-Date)
    Write-Host "Timer ends at        : " $expiration
    Write-Host "Time on expire timer : " $LeftMinutes "Minutes" $LeftSeconds "Seconds"
    Write-Host "------------------------------------------------------------------"
}
# Get background-job info, print it, and remove the job afterwards
function BJManager
{
    Receive-Job -Name $BJName # receive background job results
    Remove-Job -Name $BJName -Force # remove the job
    Write-Host "Retrieving Background-Job info and Removing Job..."
}
#-----------------
$ScriptLocation = "C:\\Local-scripts\Windows-Server-CleanUp-Script-V2.4(Beta).ps1" #change this Var for different kind of script locations
Start-Job -Name $BJName -FilePath $ScriptLocation #start this script as background job
# dont start job in the loop.
do{ # start loop
    Write-Host "Working" # start doing other script stuff
    Start-Sleep -Milliseconds 5000 # add delay to reduce spam and processing power
    UpdateTime # call update function to print time
    Get-Job -Name $BJName | select Id, State, Location, Name
    if((Get-Job).State -eq "Failed")
    {
        BJManager
    }
    elseif((Get-Job).State -eq "Completed")
    {
        BJManager
    }
}
until ($p.HasExited -or (Get-Date) -gt $expiration) #check exit time
Write-Host "Timer Script Finished"
Get-Job -Name $BJName | select Id, State ,Location , Name
UpdateTime
BJManager
Start-Sleep -Milliseconds 5000 #give it some time to write to log
Stop-Transcript
Start-Sleep -Milliseconds 5000 #give it some time to stop the logging before killing process
if (-not $p.HasExited) { Stop-Process -Id $p.Id -PassThru } # kill the current PowerShell process after time expires
Try adding a percentage/progress calculation to your script so you can see how far along it is and estimate how much time it will take to complete.
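For example, a minimal Write-Progress sketch (the loop and item count are illustrative assumptions, not from the original script):
# Sketch: report percent complete while working through a set of items.
$items = 1..200                      # hypothetical work items
for ($i = 0; $i -lt $items.Count; $i++) {
    $percent = [int](($i / $items.Count) * 100)
    Write-Progress -Activity "Long running script" -Status "$percent% complete" -PercentComplete $percent
    Start-Sleep -Milliseconds 50     # stand-in for the real per-item work
}
Write-Progress -Activity "Long running script" -Completed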
