Docker container RestartCount not incrementing

Test
def test_can_pop_new_container(self):
    config = {
        'ip': '10.49.0.2',
        'subnet': '10.49.0.0/16',
        'gateway': '10.49.0.202',
        'vlan': 102,
        'hostname': 'test-container',
    }
    container = container_services.pop_new_container(config, self.docker_api)
    inspection = self.docker_api.inspect_container(container.get('Id'))
    print('before', inspection.get('RestartCount'), inspection.get('StartedAt'))
    container_services.restart(container, self.docker_api)
    new_inspection = self.docker_api.inspect_container(container.get('Id'))
    print('after', new_inspection.get('RestartCount'), new_inspection.get('StartedAt'))
Code
def restart(container, docker_client):
    return docker_client.restart(container.get('Id'))
Output
From the test I get
before 0 None
after 0 None
The output of docker ps confirms that the container restarted.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
86f16438ffdd docker.akema.fr:5000/coaxis/coaxisopt_daemon:latest "/usr/bin/supervis..." 28 seconds ago Up 17 seconds confident_dijkstra
Question
Why is RestartCount still at 0 then? Am I using the wrong field?

As already indicated in the comment, the field RestartCount is used in the context of Restart Policies to keep track of restart attempts in case of failures.
It will not be incremented in case of user-initiated restarts.
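For illustration, a minimal docker-py sketch (low-level APIClient assumed, as in the test above; the busybox image and failing command are placeholders) of a container whose RestartCount would actually increment, because the daemon itself restarts it under an on-failure policy:
host_config = docker_api.create_host_config(
    restart_policy={'Name': 'on-failure', 'MaximumRetryCount': 5}
)
container = docker_api.create_container(
    image='busybox',                 # placeholder image for illustration
    command=['sh', '-c', 'exit 1'],  # exits non-zero so the policy triggers restarts
    host_config=host_config,
)
docker_api.start(container['Id'])
# Once the daemon has retried a few times, RestartCount reflects those attempts:
print(docker_api.inspect_container(container['Id']).get('RestartCount'))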
You can watch docker events to keep track of normal container restarts. This is also available in docker-py.
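A minimal sketch of watching those events with docker-py (high-level client assumed; the filter value is the standard daemon event name):
import docker

client = docker.from_env()
# Stream daemon events and keep only container restarts.
for event in client.events(decode=True, filters={'event': 'restart'}):
    actor = event.get('Actor', {})
    print(actor.get('ID'), actor.get('Attributes', {}).get('name'))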

Related

WhatsApp Business API - How to access a wacore container to Check-Health with Postman

After proceeding with the installation of the WhatsApp Business API (developer single instance) in Docker on Windows 10 Enterprise, I'm facing the following error message when calling https://192.168.43.200:8080/v1/health from Postman.
Error message:
{
    "meta": {
        "version": "v2.33.3",
        "api_status": "stable"
    },
    "errors": [
        {
            "code": 1014,
            "title": "Internal error",
            "details": "php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution. Please check if wacore is running: wacore:6252"
        }
    ]
}
Looking at the log files, it seems that the core is listening on a port different from the one expected by the web container.
---> Web log
[2021-02-24 12:46:38.560338] app.INFO: [064af96616514f6f8b41fc530047db4b] Matched route "{route}". {"route":"GET_v1_health","route_parameters":{"_controller":"WhatsApp\Controller\HealthController::getHealth","_route":"GET_v1_health"},"request_uri":"https://192.168.43.200:8080/v1/health","method":"GET"} []
[2021-02-24 12:46:38.587929] app.INFO: [064af96616514f6f8b41fc530047db4b] Guard authentication successful! {"token":"[object] (Symfony\Component\Security\Guard\Token\PostAuthenticationGuardToken: PostAuthenticationGuardToken(user="admin", authenticated=true, roles="ROLE_ADMIN"))","authenticator":"WhatsApp\Security\TokenAuthenticator"} []
[2021-02-24 12:47:14.646964] app.INFO: [064af96616514f6f8b41fc530047db4b] Response: {"meta":{"version":"v2.33.3","api_status":"stable"},"errors":[{"code":1014,"title":"Internal error","details":"php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution. Please check if wacore is running: wacore:6252"}]} []
[2021-02-24 12:47:14.650236] app.INFO: [064af96616514f6f8b41fc530047db4b] Request GET_/v1/health returns 500 in 36269.15 ms [] []
===================================================================================
Core log
D 2021-02-24 12:10:39.282 UTC 28 apiendpointmanager.cpp:190] Endpoint "healthcheck" is listening on address "0.0.0.0" port 6253 req_id=Main
D 2021-02-24 12:10:39.282 UTC 29 apiendpointmanager.cpp:190] Endpoint "control" is listening on address "0.0.0.0" port 6252 req_id=Main
===================================================================================
No changes were made to docker-compose.yml. It is the same as the one on GitHub (https://github.com/WhatsApp/WhatsApp-Business-API-Setup-Scripts), except the network mode was changed from "bridge" to "nat" since I'm using Windows.
$ docker network ls
NETWORK ID NAME DRIVER SCOPE
9d811d5d3283 Default Switch ics local
27dc22b69113 nat nat local
4e2733cd792d none null local
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
8d7000856b95 docker.whatsapp.biz/web:v2.33.3 "/opt/whatsapp/bin/w…" 17 hours ago Exited (4294967295) 6 minutes ago postgres_waweb_1
909781cdb775 docker.whatsapp.biz/coreapp:v2.33.3 "/opt/whatsapp/bin/w…" 17 hours ago Up 5 minutes 6250-6253/tcp postgres_wacore_1
7d68b7a61cad postgres:10.6 "docker-entrypoint.s…" 17 hours ago Up 6 minutes 5432/tcp, 33060/tcp, 0.0.0.0:33060->3306/tcp postgres_db_1
219b1e393f21 nginx "/docker-entrypoint.…" 42 hours ago Exited (4294967295) 41 hours ago nostalgic_jennings
The current WA_API_VERSION is 2.33.3
The database used is Postgres 10.6.
Looking at a similar question answered by @WeiyanWang (How to access wacore container using WhatsApp Business API), I tried to execute the same command in Postgres, but with no success.
Regards,
After some investigation of the scenario described, I found some configuration mistakes in Docker for Windows.
To fix these problems, follow these steps:
Change the Docker settings back to the original installation defaults
Choose the option "Switch to Linux containers..."
Reinstall the WhatsApp Business API following the documentation
Note: it is not necessary to change the network setting from "bridge" to "nat". I only changed the "waweb" port mapping from 9090:443 to 8080:443.
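To verify the fix, a small sketch (host, port and token are placeholders for your own setup) that re-checks the health endpoint from Python instead of Postman:
import requests

resp = requests.get(
    'https://192.168.43.200:8080/v1/health',
    headers={'Authorization': 'Bearer <your-token>'},  # placeholder token
    verify=False,  # the web container likely uses a self-signed certificate
)
print(resp.status_code, resp.json())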

Spark executor sends result to a random port though all the ports are explicitly set up

I am trying to run a Spark job with PySpark through a Jupyter notebook running in Docker. The workers are located on separate machines in the same network. I am performing a take operation on an RDD:
data.take(number_of_elements)
When number_of_elements is 2000 everything works fine. When it is 20000 an exception occurs. From my point of view it breaks when the size of the result exceeds 2 GB (or so it seems to me). The idea about 2 GB comes from the fact that Spark can send results smaller than 2 GB in one block; when the result is bigger than 2 GB another mechanism kicks in, and something breaks there (see here). Here is the exception from the executor log:
19/11/05 10:27:14 INFO CodeGenerator: Code generated in 205.7623 ms
19/11/05 10:27:40 INFO PythonRunner: Times: total = 25421, boot = 3, init = 1751, finish = 23667
19/11/05 10:27:42 INFO MemoryStore: Block taskresult_4 stored as bytes in memory (estimated size 927.7 MB, free 6.4 GB)
19/11/05 10:27:42 INFO Executor: Finished task 0.0 in stage 3.0 (TID 4). 972788748 bytes result sent via BlockManager)
19/11/05 10:27:49 ERROR TransportRequestHandler: Error sending result ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=1585998572000, chunkIndex=0}, buffer=org.apache.spark.storage.BlockManagerManagedBuffer@4399ad49} to /10.0.0.9:56222; closing connection
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
at sun.nio.ch.IOUtil.write(IOUtil.java:65)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471)
at org.apache.spark.util.io.ChunkedByteBufferFileRegion.transferTo(ChunkedByteBufferFileRegion.scala:64)
at org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:121)
at io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355)
at io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224)
at io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362)
at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901)
at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776)
at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768)
at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749)
at io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:983)
at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:248)
at io.netty.channel.nio.AbstractNioByteChannel$1.run(AbstractNioByteChannel.java:284)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
As we can see from the log, the executor tries to send the result to 10.0.0.9:56222. It fails because that port is not exposed in docker compose. 10.0.0.9 is the IP address of the master node, but port 56222 is random, even though I explicitly set up all the ports I could find in the documentation to disable random port selection:
spark = SparkSession.builder \
    .master('spark://spark.cyber.com:7077') \
    .appName('My App') \
    .config('spark.task.maxFailures', '16') \
    .config('spark.driver.port', '20002') \
    .config('spark.driver.host', 'spark.cyber.com') \
    .config('spark.driver.bindAddress', '0.0.0.0') \
    .config('spark.blockManager.port', '6060') \
    .config('spark.driver.blockManager.port', '6060') \
    .config('spark.shuffle.service.port', '7070') \
    .config('spark.driver.maxResultSize', '14g') \
    .getOrCreate()
I mapped these ports with docker compose:
version: "3"
services:
  jupyter:
    image: jupyter/pyspark-notebook:latest
    ports:
      - "4040-4050:4040-4050"
      - "6060:6060"
      - "7070:7070"
      - "8888:8888"
      - "20000-20010:20000-20010"
You should probably configure your Spark driver memory to match your Docker container memory settings.
I added
.config('spark.driver.memory', '14g')
as @ML_TN proposed, and everything works now.
From my point of view it is strange that the memory setting affects the ports that Spark uses.
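For reference, a sketch of the full builder with the driver-memory setting added (same values as in the question; adjust them to your own container limits):
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .master('spark://spark.cyber.com:7077') \
    .appName('My App') \
    .config('spark.task.maxFailures', '16') \
    .config('spark.driver.port', '20002') \
    .config('spark.driver.host', 'spark.cyber.com') \
    .config('spark.driver.bindAddress', '0.0.0.0') \
    .config('spark.blockManager.port', '6060') \
    .config('spark.driver.blockManager.port', '6060') \
    .config('spark.shuffle.service.port', '7070') \
    .config('spark.driver.maxResultSize', '14g') \
    .config('spark.driver.memory', '14g') \
    .getOrCreate()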

Starting Zabbix Server within docker replaces strings with nothing in config file, or totally ignores strings like the name of a new DB for testing purposes

First I tried to add about ~250 hosts to the ~250 already added, and the Zabbix server shut down. I restarted it, and in the docker logs I saw this:
6:20191014:091840.201 using configuration file: /etc/zabbix/zabbix_server.conf
6:20191014:091840.223 current database version (mandatory/optional): 04020000/04020001
6:20191014:091840.223 required mandatory version: 04020000
6:20191014:091840.484 __mem_malloc: skipped 7 asked 108424 skip_min 304 skip_max 12192
6:20191014:091840.484 [file:dbconfig.c,line:94] __zbx_mem_realloc(): out of memory (requested 108424 bytes)
6:20191014:091840.484 [file:dbconfig.c,line:94] __zbx_mem_realloc(): please increase CacheSize configuration parameter
6:20191014:091840.484 === memory statistics for configuration cache ===
The solution to this problem was to increase CacheSize in zabbix_server.conf. Okay, that's not a problem, so I pushed a new config to the Zabbix server and restarted it... and the server stopped right after starting, with the logs reporting the same problem. Reading the config inside the container, I saw that the strings I had edited to my liking were missing. The strings are deleted.
My config:
LogType=console
DBHost=postgres-server
DBName=zabbix_pwd
DBSchema=public
DBUser=zabbix
DBPassword=zabbix
DBPort=5432
StartPollers=5
StartIPMIPollers=5
StartPollersUnreachable=5
SNMPTrapperFile=/var/lib/zabbix/snmptraps/snmptraps.log
StartSNMPTrapper=1
CacheSize=512M
HistoryCacheSize=512M
HistoryIndexCacheSize=512M
TrendCacheSize=512m
ValueCacheSize=256M
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
Fping6Location=/usr/sbin/fping6
SSHKeyLocation=/var/lib/zabbix/ssh_keys
SSLCertLocation=/var/lib/zabbix/ssl/certs/
SSLKeyLocation=/var/lib/zabbix/ssl/keys/
SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
LoadModulePath=/var/lib/zabbix/modules/
And this is what I get after starting the Zabbix server:
LogType=console
DBHost=postgres-server
DBName=zabbix_pwd
DBSchema=public
DBUser=zabbix
DBPassword=zabbix
DBPort=5432
SNMPTrapperFile=/var/lib/zabbix/snmptraps/snmptraps.log
StartSNMPTrapper=1
AlertScriptsPath=/usr/lib/zabbix/alertscripts
ExternalScripts=/usr/lib/zabbix/externalscripts
FpingLocation=/usr/sbin/fping
Fping6Location=/usr/sbin/fping6
SSHKeyLocation=/var/lib/zabbix/ssh_keys
SSLCertLocation=/var/lib/zabbix/ssl/certs/
SSLKeyLocation=/var/lib/zabbix/ssl/keys/
SSLCALocation=/var/lib/zabbix/ssl/ssl_ca/
LoadModulePath=/var/lib/zabbix/modules/
Any suggestions on how to rule the world and not get captured by the doctors?
With docker you need to pass configuration parameters in the docker-compose.yml file, or in your docker run command using the -e flag:
For example from my docker yml file:
zabbix-server:
  image: zabbix/zabbix-server-pgsql:ubuntu-4.2.6
  environment:
    ZBX_MAXHOUSEKEEPERDELETE: 5000
    ZBX_STARTPOLLERS: 15
    ZBX_CACHESIZE: 8M
    ZBX_STARTDBSYNCERS: 4
    ZBX_HISTORYCACHESIZE: 16M
    ZBX_TRENDCACHESIZE: 4M
    ZBX_VALUECACHESIZE: 8M
    ZBX_LOGSLOWQUERIES: 3000
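The same idea with docker run, expressed here as a hedged docker-py sketch; the ZBX_* variable names follow the pattern used by the official zabbix images (double-check them against the image documentation), and the values mirror the config from the question:
import docker

client = docker.from_env()
client.containers.run(
    'zabbix/zabbix-server-pgsql:ubuntu-4.2.6',
    detach=True,
    environment={
        'ZBX_STARTPOLLERS': '5',
        'ZBX_CACHESIZE': '512M',
        'ZBX_HISTORYCACHESIZE': '512M',
        'ZBX_HISTORYINDEXCACHESIZE': '512M',
        'ZBX_TRENDCACHESIZE': '512M',
        'ZBX_VALUECACHESIZE': '256M',
    },
)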
Another way to work with zabbix:
https://hub.docker.com/r/monitoringartist/zabbix-3.0-xxl/

How can I change the UUID of my nixos OS partition, and update the bootloader?

So essentially I've got an exact clone of my partition (I've changed the UUID though), and I'd now like to change over the bootloader to load the new partition.
What I tried:
I naively (while booted / running on the original partition) tried to modify the hardware-configuration.nix (on the original partition) with the new UUID and then tried to:
sudo nixos-rebuild switch
sudo nixos-rebuild boot
Both of which fail** at the point of mounting the drives (I think).
updating GRUB 2 menu...
lsblk: /dev/mapper/no*[0-9]: not a block device
lsblk: /dev/mapper/raid*[0-9]: not a block device
lsblk: /dev/mapper/disks*[0-9]: not a block device
Found Arch Linux on /dev/sdb3
Also, I assume I'd possibly need to mount this new partition somewhere (unless that isn't required to actually boot into it after a reboot)?
** Actually, although it appears to 'fail', when I reboot and select the usual nixos GRUB entry, I see the following (the UUID mentioned is the UUID that does exist, and it's the new partition):
Worst case scenario, it seems I'd be able to use a nixos live USB to mount the new partition to /mnt and then just follow the usual nixos-install (which has worked in the past - with only the /etc/nixos directory present though)?
Firstly, get the system in working order again by changing the UUID back in hardware-configuration.nix and making sure it boots OK.
Next, change the UUID in hardware-configuration.nix, like you have done before, but this time run sudo nixos-rebuild boot.
When you reboot you'll have a new entry in your systemd-boot or GRUB2 menu. The new entry will boot NixOS from the new partition.
I tried using the nixos-install route.
It seems I had issues with my existing hardware-configuration.nix, as I ran into the exact same issue waiting for device....
Finally I ran nixos-generate-config --root /mnt which generated a new config which had the following differences:
diff -u nixos.backup/hardware-configuration.nix /etc/nixos/hardware-configuration.nix
--- nixos.backup/hardware-configuration.nix 2018-11-22 20:18:01.361647120 +0000
+++ /etc/nixos/hardware-configuration.nix 2018-11-22 20:18:41.818644420 +0000
@@ -8,8 +8,8 @@
[ <nixpkgs/nixos/modules/installer/scan/not-detected.nix>
];
- boot.initrd.availableKernelModules = [ "xhci_pci" "ehci_pci" "ahci" "usb_storage" "sd_mod" "rtsx_pci_sdmmc" ];
- boot.kernelModules = [ "kvm-intel" ];
+ boot.initrd.availableKernelModules = [ "nvme" "xhci_pci" "ahci" "usb_storage" "usbhid" "sd_mod" ];
+ boot.kernelModules = [ "kvm-amd" ];
boot.extraModulePackages = [ ];
fileSystems."/" =
@@ -20,6 +20,4 @@
swapDevices = [ ];
nix.maxJobs = lib.mkDefault 4;
- powerManagement.cpuFreqGovernor = "powersave";
}
-
So probably the nvme bit. I should also add that I had kvm-intel even though my CPU (which stayed the same) is an AMD.

Docker container with status "Dead" after consul healthcheck runs

I am using consul's healthcheck feature, and I keep getting these "dead" containers:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
20fd397ba638 progrium/consul:latest "\"/bin/bash -c 'cur 15 minutes ago Dead
What is exactly a "Dead" container? When does a stopped container become "Dead"?
For the record, I run progrium/consul + gliderlabs/registrator images + SERVICE_XXXX_CHECK env variables to do health checking. It runs a healthcheck script running an image every X secs, something like docker run --rm my/img healthcheck.sh
I'm interested in general in what "dead" means and how to prevent it from happening. Another peculiar thing is that my dead containers have no name.
this is some info from the container inspection:
"State": {
"Dead": true,
"Error": "",
"ExitCode": 1,
"FinishedAt": "2015-05-30T19:00:01.814291614Z",
"OOMKilled": false,
"Paused": false,
"Pid": 0,
"Restarting": false,
"Running": false,
"StartedAt": "2015-05-30T18:59:51.739464262Z"
},
The strange thing is that only every now and then a container becomes dead and isn't removed.
Thank you
Edit:
Looking at the logs, I found what makes the container stop fail:
Handler for DELETE /containers/{name:.*} returned error: Cannot destroy container 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc:
Driver aufs failed to remove root filesystem 003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc:
rename /var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc
/var/lib/docker/aufs/diff/003876e41429013e46187ebcf6acce1486bc5011435c610bd163b159ba550fbc-removing:
device or resource busy
Why does this happen?
Edit 2:
I found this: https://github.com/docker/docker/issues/9665
Update March 2016: issue 9665 has just been closed by PR 21107 (for docker 1.11 possibly)
That should help avoid the "Driver aufs failed to remove root filesystem", "device or resource busy" problem.
Original answer May 2015
Dead is one of the container states, which is tested by Container.Start():
if container.removalInProgress || container.Dead {
    return fmt.Errorf("Container is marked for removal and cannot be started.")
}
It is set to Dead when stopping fails, in order to prevent that container from being restarted.
Amongst the possible causes of failure, see container.Kill().
It means kill -15 and kill -9 are both failing.
// 1. Send a SIGTERM
if err := container.killPossiblyDeadProcess(15); err != nil {
    logrus.Infof("Failed to send SIGTERM to the process, force killing")
    if err := container.killPossiblyDeadProcess(9); err != nil {
That usually means, as the OP mentions, a busy device or resource preventing the process from being killed.
There are a lot of bugs caused by EBUSY, in particular when devicemapper is used.
There is a tracker bug for all of the EBUSY related issues.
see https://github.com/docker/docker/issues/5684#issuecomment-69052334
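Until then, a hedged docker-py sketch (modern client assumed) for finding containers the daemon reports as dead and retrying their removal; removal can still fail with the same "device or resource busy" error:
import docker

client = docker.from_env()
# List all containers the daemon marks with status "dead" and try to force-remove them.
for container in client.containers.list(all=True, filters={'status': 'dead'}):
    try:
        container.remove(force=True)
    except docker.errors.APIError as exc:
        # Typically the same EBUSY / "device or resource busy" failure.
        print('could not remove', container.id[:12], exc)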

Resources