How to print elapsed time during a pause in an Ansible playbook (stdout)

I'm using the pause module in my playbook to pause for 10 minutes while EC2 instances are built in AWS. I would like to see the elapsed time during the 10 minutes instead of guessing when the pause started. I see that stdout is an output of the pause module, but I can't seem to get Ansible to print stdout while the pause is in progress.
- name: Wait 10 minutes for the infra machines to be built in AWS
  pause:
    minutes: 10
Output:
TASK [machinesets : Wait 10 minutes for the infra machines to be built in AWS] ********************************************
Pausing for 600 seconds
(ctrl+C then 'C' = continue early, ctrl+C then 'A' = abort)

I don't think the pause module prints progress as it waits. You could instead print the current time right before the pause. Something like:
- name: Print datetime before pause
  run_once: true
  debug:
    msg: "{{ lookup('pipe', 'date') }}"

- name: Wait 10 minutes for the infra machines to be built in AWS
  pause:
    minutes: 10
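If you want elapsed time reported during the wait itself, a rough alternative sketch (not from the original answer) is to split the pause into smaller pauses in a loop, so Ansible prints a line after each increment. The task name and the 10 x 1-minute split are assumptions:

- name: Wait 10 minutes, reporting elapsed time each minute
  pause:
    minutes: 1
  loop: "{{ range(1, 11) | list }}"
  loop_control:
    label: "{{ item }} of 10 minutes elapsed"

Each iteration pauses for one minute and then prints its item label, so the task output shows how far into the wait you are.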

Related

Docker container logging interval: logs are not sent as they are output by the container

I am building my first very simple container. The Python script simply outputs a message with the number of seconds that have passed.
When I try to read the log using:
docker logs "container name"
I do get the logs; however, it seems like the logs are only sent to the log file at a certain interval. In my case, this interval is about 3.5 minutes. I want there to be a constant stream (so that when the container outputs x, it is immediately written to the log file).
Example of the log with timestamps: (only shows a small part)
...
2022-08-09T09:46:24.677360000Z Waiting for file... 208 seconds passed
2022-08-09T09:46:24.677364500Z Waiting for file... 209 seconds passed
2022-08-09T09:46:24.677369700Z Waiting for file... 210 seconds passed
2022-08-09T09:46:24.677388700Z Waiting for file... 211 seconds passed
2022-08-09T09:46:24.677395900Z Waiting for file... 212 seconds passed
2022-08-09T09:49:54.949131800Z Waiting for file... 213 seconds passed
2022-08-09T09:49:54.949169300Z Waiting for file... 214 seconds passed
2022-08-09T09:49:54.949176000Z Waiting for file... 215 seconds passed
2022-08-09T09:49:54.949180700Z Waiting for file... 216 seconds passed
...
Use the follow flag:
--follow, -f
Try it:
docker container logs --follow CONTAINER-NAME
Relevant link: https://docs.docker.com/engine/reference/commandline/container_logs/#usage
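A hedged usage sketch of the same command with timestamps (CONTAINER-NAME is a placeholder), plus a note on a common cause of batched output:

# Stream output as it is produced; --timestamps (-t) prints timestamps like those in the question.
docker container logs --follow --timestamps CONTAINER-NAME

# If lines still arrive in batches even with --follow, the program inside the
# container may be buffering stdout. For a Python script, running it unbuffered
# is a common fix (an assumption about this setup, not part of the original answer):
#   docker run -e PYTHONUNBUFFERED=1 ...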

GNU Parallel: thread id

In GNU Parallel, the -j option makes it possible to specify the number of concurrent jobs.
Is it possible to get an id of the thread running the job? By thread id I mean a number from
1 to 12 on my machine with 12 threads. As of now I use the following workaround:
doit() {
  let var=$1*12+$2
  echo $var $2
}
export -f doit
for ((i=0;i<2;++i))
do
  parallel -j12 doit ::: $i ::: {1..12}
done
This has the problem that every iteration of the loop waits for all 12 threads to finish.
I am only interested in not running iterations with the same thread id concurrently.
My motivation for this is that every thread uses a writelock on one of 12 files. I got exactly 12 files and if a thread on one file finishes, the next thread could immediately use this file again.
As @MarkSetchell writes, you should use the replacement string {%}, which gives the jobslot number:
parallel --line-buffer -j12 'echo starting job {#} on {%}; sleep {=$_=rand()*30=}; echo finishing job {#} on {%}' ::: {1..50}
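Applied to the writelock use case from the question, a hedged sketch might look like this: each jobslot (1..12) owns one of the 12 files, so two jobs never write to the same file concurrently. The doit body and the file naming are assumptions for illustration:

doit() {
  slot=$1   # jobslot number passed in via {%}, 1..12
  arg=$2    # the actual work item
  # each slot appends only to its own file, so no two concurrent jobs share a file
  echo "processing $arg" >> "file_${slot}.log"
}
export -f doit
parallel -j12 doit {%} {} ::: {1..50}

Because GNU Parallel reuses a jobslot as soon as its previous job finishes, the corresponding file becomes available again immediately, without waiting for a whole batch of 12 to complete.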

docker-compose healthcheck retry frequency != interval

I recently set up healthchecks in my docker-compose config.
It is doing great and I like it. Here's a typical example:
services:
  app:
    healthcheck:
      test: curl -sS http://127.0.0.1:4000 || exit 1
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s
My container is quite slow to boot, hence the 30-second start_period.
But it doesn't really fit my expectation: I don't need a check every 5 seconds, but I do need to know as soon as possible when the container is ready for the first time, for my orchestration. And since my start_period is approximate, if the container is not ready at the first check, I have to wait a full interval before the retry.
What I'd like to have is:
While container is not healthy, retry every 5 seconds
Once it is healthy, check every 1 minute
Isn't there a way to achieve this out of the box with docker-compose?
I could write a custom script to achieve this, but I'd rather have a native solution if it is possible.
Unfortunately, this is not possible out of the box.
All the durations set are final; they can't change depending on the container state.
However, according to the documentation, the probe does not seem to wait for start_period to finish before running your test. The only thing start_period does is that any failure happening during that period will not be counted towards the maximum number of retries.
Below is the sentence that makes me think that:
start_period provides initialization time for containers that need time to bootstrap. Probe failure during that period will not be counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
I encourage you to test whether this is really the case, as I've never paid attention to whether the healthcheck is run during the start period or not.
And if it is, you can probably increase your start_period if you're unsure about the duration, and also increase the interval, in order to find a good compromise.
I wrote a script that does this, though I'd rather find a native solution:
#!/bin/sh
HEALTHCHECK_FILE="/root/.healthchecked"
COMMAND=${*?"Usage: healthcheck_retry <COMMAND>"}
if [ -r "$HEALTHCHECK_FILE" ]; then
  LAST_HEALTHCHECK=$(date -r "$HEALTHCHECK_FILE" +%s)
  # FIVE_MINUTES_AGO=$(date -d 'now - 5 minutes' +%s)
  FIVE_MINUTES_AGO=$(( $(date +%s) - 5*60 ))
  echo "Healthcheck file present"
  # if (( $LAST_HEALTHCHECK > $FIVE_MINUTES_AGO )); then
  if [ "$LAST_HEALTHCHECK" -gt "$FIVE_MINUTES_AGO" ]; then
    echo "Healthcheck too recent"
    exit 0
  fi
fi
if $COMMAND; then
  echo "\"$COMMAND\" succeeded: updating file"
  touch "$HEALTHCHECK_FILE"
  exit 0
else
  echo "\"$COMMAND\" failed: exiting"
  exit 1
fi
Which I use: test: /healthcheck_retry.sh curl -fsS localhost:4000/healthcheck
The pain is that I need to make sure the script is available in every container, so I have to create an extra volume for this:
image: postgres:11.6-alpine
volumes:
  - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh
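For reference, a hedged sketch of how these pieces could fit together in one service definition. The service name and image are placeholders, and the timings are assumptions rather than part of the original answer:

services:
  app:
    image: my-app-image        # placeholder image name
    volumes:
      - ./scripts/utils/healthcheck_retry.sh:/healthcheck_retry.sh
    healthcheck:
      test: /healthcheck_retry.sh curl -fsS localhost:4000/healthcheck
      interval: 5s
      timeout: 3s
      retries: 3
      start_period: 30s

With the wrapper in place, compose still probes every interval, but the check short-circuits (exits 0) for five minutes after a success, which approximates "check rarely once healthy".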

Vivado Synthesis hangs in Docker container spawned by Jenkins

I'm attempting to move our large FPGA build into a Jenkins CI environment, but the build hangs at the end of synthesis when run in a Docker container spawned by Jenkins.
I've attempted to replicate the environment that Jenkins is creating, but when I spawn a Docker container myself, there's no issue with the build.
I've tried:
- reducing the number of jobs (aka threads) that Vivado uses, thinking that perhaps there was some thread collision occurring when writing out log files
- on the same note, using the -nolog -nojournal options on the vivado commands to remove any log file collisions
- taking control of the cloned/checked-out project and running commands as the local user in the Docker container
I also have an extremely small build that makes it through the entire build process in Jenkins with no issue, so I don't think there is a fundamental flaw with my Docker containers.
agent {
    docker {
        image "vivado:2017.4"
        args """
            -v <MOUNT XILINX LICENSE FILE>
            --dns <DNS_ADDRESS>
            --mac-address <MAC_ADDRESS>
            """
    }
}
steps {
    sh "chmod -R 777 ."
    dir(path: "${params.root_dir}") {
        timeout(time: 15, unit: 'MINUTES') {
            // Create HLS IP for use in Vivado project
            sh './run_hls.sh'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create vivado project, add sources, constraints, HLS IP, generated IP
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/setup_proj.tcl'
        }
        timeout(time: 20, unit: 'MINUTES') {
            // Create block designs from TCL scripts
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_bd.tcl'
        }
        timeout(time: 1, unit: 'HOURS') {
            // Synthesize complete project
            sh 'source source_vivado.sh && vivado -mode batch -source tcl/run_synth.tcl'
        }
    }
}
The log below is from a run with 1 job and a 12-hour timeout. You can see that synthesis finished, then a timeout occurred 8 hours later.
[2019-04-17T00:30:06.131Z] Finished Writing Synthesis Report : Time (s): cpu = 00:01:53 ; elapsed = 00:03:03 . Memory (MB): peak = 3288.852 ; gain = 1750.379 ; free physical = 332 ; free virtual = 28594
[2019-04-17T00:30:06.131Z] ---------------------------------------------------------------------------------
[2019-04-17T00:30:06.131Z] Synthesis finished with 0 errors, 0 critical warnings and 671 warnings.
[2019-04-17T08:38:37.742Z] Sending interrupt signal to process
[2019-04-17T08:38:43.013Z] Terminated
[2019-04-17T08:38:43.013Z]
[2019-04-17T08:38:43.013Z] Session terminated, killing shell... ...killed.
[2019-04-17T08:38:43.013Z] script returned exit code 143
Running the same commands in locally spawned Docker containers has no issues whatsoever. Unfortunately, the Jenkins timeout step doesn't appear to flush open buffers, as my post { unsuccessful } step that prints out all log files doesn't find synth_1, though I wouldn't expect it to contain anything different from the Jenkins capture.
Are there any known issues with Jenkins/Vivado integration? Is there a way to enter a Jenkins spawned container so I can try and duplicate what I'm expecting vs what I'm experiencing?
EDIT: I've since added in a timeout in the actual tcl scripts to move past the wait_on_runs command used in run_synth.tcl, but now I'm experiencing the same hanging behavior during implementation.
The problem lies in the way vivado deals (or doesn't deal...) with its forked processes. Specifically, I think this applies to the parallel synthesis, which may be why you only see it in some of your projects. In the state you describe above (stuck after "Synthesis finished") I noticed a couple of abandoned zombie processes of vivado. To my understanding these are child processes which have exited, but whose exit status the parent never collected. Tracing with strace even reveals that vivado tries to kill these processes:
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, 0x7f86edcf4dd0) = 0
kill(319, SIG_0) = 0
kill(370, SIG_0) = 0
kill(422, SIG_0) = 0
kill(474, SIG_0) = 0
nanosleep({tv_sec=5, tv_nsec=0}, <detached ...>
But (as we all know) you can't kill zombies, they are already dead...
Normally these processes would be adopted by the init process and reaped there. But in the case of a Jenkins Pipeline in Docker there is no init by default. The pipeline spawns the container and runs cat with no input to keep it alive. This way cat becomes PID 1 and takes on the abandoned children of vivado. cat of course doesn't know what to do with them and ignores them (a tragedy, really).
cat,1
|-(sh,16)
|-sh,30 -c ...
| |-sh,31 -c ...
| | `-sleep,5913 3
| `-sh,32 -xe /home/user/.jenkins/workspace...
| `-sh,35 -xe /home/user/.jenkins/workspace...
| `-vivado,36 /opt/Xilinx/Vivado/2019.2/bin/vivado -mode tcl ...
| `-loader,60 /opt/Xilinx/Vivado/2019.2/bin/loader -exec vivado -mode tcl ...
| `-vivado,82 -mode tcl ...
| |-{vivado},84
| |-{vivado},85
| |-{vivado},111
| |-{vivado},118
| `-{vivado},564
|-(vivado,319)
|-(vivado,370)
|-(vivado,422)
`-(vivado,474)
Luckily there is a way to have an init process in the Docker container. Passing the --init argument to docker run solves the problem for me.
agent {
    docker {
        image 'vivado:2019.2'
        args '--init'
    }
}
This creates the init process vivado seems to rely on, and the build runs without problems.
Hope this helps you!
Cheers!
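To reproduce or debug the same situation outside Jenkins, a hedged sketch is to run the image by hand with the same flag (the image tag is the one from the answer; a shell inside the image is assumed):

# Run the container with a minimal init as PID 1, which reaps orphaned child
# processes the way vivado seems to expect.
docker run --init -it --rm vivado:2019.2 bash

Without --init, whatever command keeps the container alive (cat, in the Jenkins pipeline case) becomes PID 1 and never reaps vivado's abandoned children.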

PowerShell remote invocation mysteriously hangs

I have created a series of functions that basically collect all the IIS configuration for a site. Run locally on the server, they execute without issue (albeit slowly); however, when I run them remotely using Invoke-Command in PowerShell 2, they run for a while and then mysteriously stop approximately 15-20 seconds into the process. It generally stalls on the same request, but not always. The same commands executed locally work without any issues. No exception is raised; it just hangs indefinitely.
I can post the code if necessary; however, it is several hundred lines, so I'm more looking for guidance on how to investigate a problem like this, or whether anyone has encountered something similar.
Comparing IISConfig between [targetserver] and localhost.
Checking Installed IIS version on [targetserver]:
IIS major version : 7
IIS minor version : 5
IIS7+ detected, using WebAdmin module and IIS metabase
Name Value
---- -----
name Default Web Site
id 1
serverAutoStart True
state 1
Site Configuration:
Name Path PSPath Handlers_Ac Access_sslF Asp_AppAllo Asp_AppAllo Asp_limits_ Asp_EnableP Asp_limits_
cessFlags lags wClientDebu wDebugging bufferingLi arentPaths queueTimeou
g mit t
---- ---- ------ ----------- ----------- ----------- ----------- ----------- ----------- -----------
Default ... IIS:Site... WebAdmin... Read,Script False False 25000000 True 00:00:00
WebApp VDir: /MyApp, App Pool: MyApp
App pool Configuration:
AppPoolID Enable32Bit managedPipe managedRunt AppPoolName AppPoolAuto processMode processMode processMode recycling_l
AppOnWin64 lineMode imeVersion Start l_idleTimeo l_identityT l_UserName ogEventOnRe
ut ype cycle
--------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
False Classic v2.0 MyApp True 00:20:00 LocalSer... Time,Req...
Analyzing web directories for /MyApp, this could take a while....
Initial Collection Completed, found 141... took 0.9516122 seconds
0 C:\inetpub\wwwroot\MyApp\Core
1 C:\inetpub\wwwroot\MyApp\Core\AdminTools
2 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Cache
3 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Extra
4 C:\inetpub\wwwroot\MyApp\Core\AdminTools\HTTPPostTest
5 C:\inetpub\wwwroot\MyApp\Core\AdminTools\IISAdmin
6 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Profiling
7 C:\inetpub\wwwroot\MyApp\Core\AdminTools\RecordTestData
8 C:\inetpub\wwwroot\MyApp\Core\AdminTools\ScrambleTest
9 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
Analyzed 10 so far... took 6.7236862 seconds, remaining time 88.08028922 seconds
Current Folder: C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
10 C:\inetpub\wwwroot\MyApp\Core\AdminTools\SoapTest
11 C:\inetpub\wwwroot\MyApp\Core\AdminTools\StaticContent
Sometimes it makes it to 15 or so. I tried from my laptop and from one server to another and the behavior is the same.
Here is the loop which is hanging:
$start = [System.DateTime]::Now
$numanalyzed = 0
if ($true) #skip to test
{
    # loop through all physical folders as it is much faster
    foreach ($folder in $folders)
    {
        write-host $numanalyzed $folder.fullname
        #figure out the virtual path to the folder
        $iis7vwebfolderpath = $folder.FullName.Replace($iis7webapp.PhysicalPath, $iis7VDirWebApppath)
        #Get-item $iis7vwebfolderpath | gm
        $iis7VWebDirConfigItem = Get-LNOSIIS7ConfigForPSPath -PSPath $iis7vwebfolderpath
        # add new item to list
        $iis7VWebDirConfig += $iis7VWebDirConfigItem
        # increment counter and report out progress every 10
        $numAnalyzed++
        if ($numanalyzed % 10 -eq 0)
        {
            $end = [System.DateTime]::Now
            $timeSoFar = (New-TimeSpan -Start $Start -End $End).TotalSeconds
            $timeremaining = ($folders.Count - $numAnalyzed) * ($timeSoFar / $numanalyzed)
            "Analyzed {0} so far... took {1} seconds, remaining time {2} seconds" -f $numanalyzed,$timeSoFar,$timeremaining | Write-Host
            "Current Folder: {0}" -f $folder.FullName | Write-Host
        }
    }
}
$end = [System.DateTime]::Now
"Processed web dirs: {0} took {1} seconds" -f $iis7VWebDirConfig.Count,(New-TimeSpan -Start $Start -End $End).TotalSeconds | Write-Host
The function I'm having performance problems with (and have asked a separate question about) has its source code in this post:
web-administration vs WMI to query web directory properties performance problems
In my case, it seemed my PowerShell call froze due to the Idle-Timeout expiration (the call runs for a very long time).
Setting the IdleTimeout value to a sufficiently long duration fixed my issue.
Once again, query the current configuration using
winrm get winrm/config/winrs
And set the timeout using
winrm set winrm/config/winrs '#{IdleTimeout="18000000"}'
I think I may have discovered the problem; I started getting some odd failures in other parts of the script:
[SEVERNAME] Processing data from remote server SERVERNAME failed with the following error message: The WSMan provider host process did not return a proper response. A provider in the host process may have behaved improperly. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OpenError: (SERVERNAME:String) [], PSRemotingTransportException
+ FullyQualifiedErrorId : 1726,PSSessionStateBroken
and
Processing data for a remote command failed with the following error message: Not enough storage is available to complete this operation. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OperationStopped (System.Manageme...pressionSyncJob:PSInvokeExpressionSyncJob) [], PSRemotingTransportException
+ FullyQualifiedErrorId : JobFailure
This led me to the following site: http://www.gsx.com/blog/bid/83018/Troubleshooting-unknown-PowerShell-error-messages
The following recommendations seem to have cleared up most of the problems, although I still have some testing to do.
Excerpt from the site below:
As the first error message specifies, an overflow of memory in the remote session has occurred. Open a PowerShell prompt on the remote server and display the configuration of winrs using:
winrm get winrm/config/winrs
Check the "MaxMemoryPerShellMB" value. It is set by default to 150 MB on Windows Server 2008 R2 and Windows 7. This is something that Microsoft changed in Windows Server 2012 and Windows 8 to 1024 MB.
In order to resolve this issue, you need to increase the value to at least 512 MB with the following command:
winrm set winrm/config/winrs `#`{MaxMemoryPerShellMB=`"512`"`}
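A hedged follow-up, not from the linked article: after changing these limits it is often suggested to restart the WinRM service so that new remote sessions pick up the new values.

# Restart the Windows Remote Management service (run from an elevated prompt on the remote server)
Restart-Service -Name WinRM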
As an FYI, if Invoke-Command always hangs:
Try a simple command against the system:
Invoke-Command -ComputerName XXXXX -ScriptBlock { Get-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion }
Start the Windows Remote Management Service (on that system)
Check for the listening port:
netstat -aon | findstr "5985"
TCP 0.0.0.0:5985 0.0.0.0:0 LISTENING 4
TCP [::]:5985 [::]:0 LISTENING 4
