Rebuilding a container in Dokku causes an error - docker

So I'm trying to rebuild my container but it returns:
Error response from daemon: Minimum memory limit allowed is 4MB
After running htop on my Ubuntu server, this is what I'm seeing:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
X X.web.1 0.02% 695.2MiB / 11.73GiB 5.79% 246MB / 382MB 22.2MB / 39.2MB 271
Meanwhile, on a test server it shows:
CONTAINER ID NAME CPU MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
X X.web 0.01% 125.9MiB / 7.796GiB 1.58% 1.12MB / 4.48MB 26.9MB / 393kB 257
Any idea what I should look for?

Did you set a resource limit via the resources plugin? If so, you may have it configured incorrectly. See http://dokku.viewdocs.io/dokku/advanced-usage/resource-management/ for usage docs on that module if that is the case.
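If a limit was set without a unit suffix, Docker can end up interpreting the value as bytes, which is exactly what produces the "Minimum memory limit allowed is 4MB" error. As a hedged sketch, assuming the plugin syntax from the linked docs (appname and 512m are placeholders):
# show any resource limits currently applied to the app
dokku resource:report appname
# re-set the memory limit with an explicit unit so Docker doesn't read it as bytes
dokku resource:limit --memory 512m appname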

dokku resource:limit-clear appname
worked for me

Related

uboot fails to execute load cmd from uboot.env

I am working with
U-boot v2021.10
BeagleBone Black rev C
I've created an uboot.env image with the mkenvimage tool from this file:
loadfromsd=load mmc 0:1 0x82000000 /zImage; load mmc 0:1 0x88000000 /am335x-boneblack.dtb
set_bootargs=setenv bootargs console=ttyS0,115200n8 root=/dev/mmcblk0p2 rw rootfstype=ext4 rootwait
uenvcmd=setenv auotload no; run set_bootargs; run loadfromsd; printenv bootargs; bootz 0x82000000 - 0x88000000
The problem is with loading the files into memory via the load command on the first line.
The full boot log from start is:
U-Boot SPL 2021.10 (Oct 14 2021 - 20:41:20 -0700)
Trying to boot from MMC1
U-Boot 2021.10 (Oct 14 2021 - 20:41:20 -0700)
CPU : AM335X-GP rev 2.1
Model: TI AM335x BeagleBone Black
DRAM: 512 MiB
ti_sysc target-module#9000: failed to get fck clock
WDT: Started with servicing (60s timeout)
NAND: nand_base: timeout while waiting for chip to become ready
nand_base: No NAND device found
0 MiB
MMC: ti_sysc target-module#7000: failed to get fck clock
OMAP SD/MMC: 0, OMAP SD/MMC: 1
Loading Environment from FAT... OK
<ethaddr> not set. Validating first E-fuse MAC
Net: eth2: ethernet#4a100000, eth3: usb_ether
=> run uenvcmd
4295456 bytes read in 282 ms (14.5 MiB/s)
'ailed to load '/am335x-boneblack.dtb
bootargs=console=ttyS0,115200n8 root=/dev/mmcblk0p2 rw rootfstype=ext4 rootwait
Kernel image # 0x82000000 [ 0x000000 - 0x418b20 ]
ERROR: Did not find a cmdline Flattened Device Tree
Could not find a valid device tree
The actual error is:
=> run uenvcmd
4295456 bytes read in 282 ms (14.5 MiB/s)
'ailed to load '/am335x-boneblack.dtb
P.S. My U-Boot fails to handle ${} substitutions properly; using
console=ttyS0,115200n8
bootpartition=mmcblk0p2
set_bootargs=setenv bootargs console=${console} root=/dev/${bootpartition} rw rootfstype=ext4 rootwait
caused an error:
syntax error:
rootfstype=ext4 rootwait0n8
This 0n8 was appended after rootwait and shouldn't be there. So I wrote this "straight" file without variables.
Thanks to sawdust for the info that the carriage return character matters and overwrites the first letter of the error message. That gave me the idea that it also matters for the path passed to the load command, and it does.
If each line ends with space+\r, NOT just a bare \r, everything works fine.
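For reference, a minimal sketch of regenerating the environment image with the carriage returns stripped up front, so no \r ever becomes part of a path or value (the file name and the 0x20000 size are assumptions; the size must match your board's CONFIG_ENV_SIZE):
# strip Windows-style \r line endings from the source file
sed -i 's/\r$//' uboot_env.txt
# rebuild the environment image at the size U-Boot expects
mkenvimage -s 0x20000 -o uboot.env uboot_env.txt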

How to significantly reduce apache2 average process size?

Current config:
16 GB RAM, 4 CPU cores, Apache 2.2 using the prefork MPM (MaxClients set to 700, since the average process size is ~18 MB), with the suexec and suphp mods enabled (PHP 5.5).
The site's back end uses CakePHP 2 with a MySQL server. The site consists of text and some compressed images on the front end, and data processing in the back.
I use this shell script to calculate my avg process size:
total_httpd_processes_size=$(ps -ylC apache2 --sort=rss | awk 'NR>1 { sum += $8 } END { print sum }')  # RSS is column 8 of ps -ly output; skip the header row
total_httpd_processes_count=$(ps -ylC apache2 --no-headers | wc -l)  # exclude the header from the count
echo "total_httpd_processes_count=$total_httpd_processes_count"
AVG_httpd_process_size=$(expr $total_httpd_processes_size / $total_httpd_processes_count)
httpd_process_size_MB=$(expr $AVG_httpd_process_size / 1024)
echo "httpd_process_size_MB=$httpd_process_size_MB"
By disabling unused Apache mods I saved ~4 MB, which is pretty nice (from 22 MB down to 18 MB).
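For anyone repeating that step on a Debian-style layout, a quick hedged sketch (the module names here are examples only, not a recommendation):
# list the modules currently loaded
apache2ctl -M
# disable ones you don't need, then restart
a2dismod status autoindex
service apache2 restart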
What else can I do to significantly reduce the average process size?

dask jobqueue worker failure at startup 'Resource temporarily unavailable'

I'm running dask over slurm via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is: what could be causing these failures? At first glance the problem is that too many workers are writing to disk at once, or my workers are forking into many other processes, but it's pretty difficult to track that. I can ssh into the node but I'm not seeing an abnormal number of processes, and each node has a 500 GB SSD, so I shouldn't be writing excessively.
Everything below this is just information about my configurations and such
My setup is as follows:
cluster = SLURMCluster(cores=1, memory=f"{args.gbmem}GB", queue='fast_q',
                       name=args.name, env_extra=["source ~/.zshrc"])
cluster.adapt(minimum=1, maximum=200)
client = await Client(cluster, processes=False, asynchronous=True)
I suppose I'm not even sure if processes=False should be set.
I run this starter script via sbatch with 4 GB of memory, 2 cores (-c) (even though I expect to only need 1), and 1 task (-n). This sets off all of my jobs via the SLURMCluster config above. I dumped my slurm submission scripts to files and they look reasonable.
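As a rough sketch, that starter submission would look something like the following (the script name is a placeholder; only the flags mirror what's described above):
#!/bin/bash
#SBATCH -n 1          # 1 task
#SBATCH -c 2          # 2 cores, even though only 1 is expected to be needed
#SBATCH --mem=4G      # 4 GB of memory
#SBATCH -p fast_q
python start_dask_cluster.py  # placeholder for the starter script described above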
Each job is not complex: it is a subprocess.call() of a compiled executable that takes 1 core and 2-4 GB of memory. I need the client call and further calls to be asynchronous because I have a lot of conditional computations. So each worker, when loaded, should consist of 1 Python process, 1 running executable, and 1 shell.
The scheduler imposes the following limits:
>> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 512
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 1031203
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
And each node has 64 cores, so I don't really think I'm hitting any limits.
I'm using a jobqueue.yaml file that looks like:
slurm:
  name: dask-worker
  cores: 1                    # Total number of cores per job
  memory: 2                   # Total amount of memory per job
  processes: 1                # Number of Python processes per job
  local-directory: /scratch   # Location of fast local storage like /scratch or $TMPDIR
  queue: fast_q
  walltime: '24:00:00'
  log-directory: /home/dbun/slurm_logs
I would appreciate any advice at all! Full log is below.
FORK BLOCKING IO ERROR
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.131.82:13687'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dbun/.local/share/pyenv/versions/3.7.0/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.dask_worker - INFO - End worker
Aborted!
CANT START NEW THREAD ERROR
https://pastebin.com/ibYUNcqD
BLOCKING IO ERROR
https://pastebin.com/FGfxqZEk
EDIT:
Another piece of the puzzle:
It looks like dask_worker is running multiple multiprocessing.forkserver calls? Does that sound reasonable?
https://pastebin.com/r2pTQUS4
This problem was caused by having ulimit -u too low.
As it turns out, each worker has a few processes associated with it, and the Python ones have multiple threads. You end up with approximately 14 threads per worker counting against your ulimit -u. Mine was set to 512, and with a 64-core system I was likely hitting ~896 (64 × 14). It looks like the maximum threads per process I could have had would have been 8.
Solution:
in .zshrc (.bashrc) I added the line
ulimit -u unlimited
Haven't had any problems since.
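To verify the new limit actually applies on the compute nodes, one hedged check is a throwaway job that prints the limit and the current per-user thread count (queue name taken from the config above):
# print the per-user process/thread limit, then count threads currently in use
sbatch -p fast_q --wrap 'ulimit -u; ps -u $USER -L --no-headers | wc -l'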

Why does docker see the container is hitting the rss limit?

I'm trying to understand why the limits have decided a task needs to be killed, and how it's doing the accounting. When my GCE Docker container kills a process, it shows something like:
Task in /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e killed as a result of limit of /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e
memory: usage 2097152kB, limit 2097152kB, failcnt 74571
memory+swap: usage 0kB, limit 18014398509481983kB, failcnt 0
kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
Memory cgroup stats for /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e: cache:368KB rss:2096784KB rss_huge:0KB mapped_file:0KB writeback:0KB inactive_anon:16KB active_anon:2097040KB inactive_file:60KB active_file:36KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 4343] 0 4343 5440 65 15 0 0 bash
[ 4421] 0 4421 265895 6702 77 0 0 npm
[ 4422] 0 4422 12446 2988 28 0 0 gunicorn
[ 4557] 0 4557 739241 346035 1048 0 0 gunicorn
[ 4560] 0 4560 1086 24 8 0 0 sh
[ 4561] 0 4561 5466 103 15 0 0 bash
[14594] 0 14594 387558 168790 672 0 0 node
Memory cgroup out of memory: Kill process 4557 (gunicorn) score 662 or sacrifice child
Killed process 4557 (gunicorn) total-vm:2956964kB, anon-rss:1384140kB, file-rss:0kB
Supposedly the memory hit a 2GB usage limit, and something needs to die. According to the cgroup stats, I appear to have 2GB of usage in active_anon and rss.
When I look at the table of process stats, I don't see where the 2GB is:
For rss, I see the two major processes 346035 + 168790 = 514MB?
For total_vm, I see three major processes 265895 + 739241 + 387558 = 1.4GB?
But when it decides to kill the gunicorn process, it says it had 3GB of Total VM and 1.4GB of Anon RSS. I don't see how this follows from the above numbers at all...
For most of its life, according to top, the gunicorn process appears to hum along with 555m RES and 2131m VIRT and 22% MEM * 2.5GB box = 550MB of memory usage. (I haven't yet been able to time it properly to peek at top values at the time it dies...)
Can someone help me understand this?
Under what accounting do these sum to 2 GB of usage? (virtual? rss? something else?)
Is there something else besides top/ps I should use to track how much memory a process is using for the purposes of docker's killing it?
From what I know, total_vm and rss are counted in 4 kB pages (see: https://stackoverflow.com/a/43611576), not in kB.
So for pid 4557:
rss=346035 pages means anon-rss:1384140kB (=346035*4kB)
total_vm=739241 pages means total-vm:2956964kB (=739241*4kB)
This explains your memory usage.
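A quick sanity check of that conversion (the 4 kB page size is an assumption; confirm it on your host):
getconf PAGESIZE        # typically 4096 bytes on x86-64
echo $(( 346035 * 4 ))  # rss pages  -> 1384140 kB, matching anon-rss in the kill line
echo $(( 739241 * 4 ))  # total_vm   -> 2956964 kB, matching total-vm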

PowerShell remote invocation mysteriously hangs

I have created a series of functions that collect all the IIS configuration for a site. Run locally on the server, they execute without issue (albeit slowly); however, when I run them remotely via Invoke-Command in PowerShell 2, the run mysteriously stops approximately 15-20 seconds in. It generally stalls on the same request, but not always. The same commands executed locally work without any issues. No exception is raised; it just hangs indefinitely.
I can post the code if necessary, however it is several hundred lines, so I'm more looking for guidance on how to investigate a problem like this, or whether anyone has encountered something similar.
Comparing IISConfig between [targetserver] and localhost.
Checking Installed IIS version on [targetserver]:
IIS major version : 7
IIS minor version : 5
IIS7+ detected, using WebAdmin module and IIS metabase
Name Value
---- -----
name Default Web Site
id 1
serverAutoStart True
state 1
Site Configuration:
Name Path PSPath Handlers_Ac Access_sslF Asp_AppAllo Asp_AppAllo Asp_limits_ Asp_EnableP Asp_limits_
cessFlags lags wClientDebu wDebugging bufferingLi arentPaths queueTimeou
g mit t
---- ---- ------ ----------- ----------- ----------- ----------- ----------- ----------- -----------
Default ... IIS:Site... WebAdmin... Read,Script False False 25000000 True 00:00:00
WebApp VDir: /MyApp, App Pool: MyApp
App pool Configuration:
AppPoolID Enable32Bit managedPipe managedRunt AppPoolName AppPoolAuto processMode processMode processMode recycling_l
AppOnWin64 lineMode imeVersion Start l_idleTimeo l_identityT l_UserName ogEventOnRe
ut ype cycle
--------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- ----------- -----------
False Classic v2.0 MyApp True 00:20:00 LocalSer... Time,Req...
Analyzing web directories for /MyApp, this could take a while....
Initial Collection Completed, found 141... took 0.9516122 seconds
0 C:\inetpub\wwwroot\MyApp\Core
1 C:\inetpub\wwwroot\MyApp\Core\AdminTools
2 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Cache
3 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Extra
4 C:\inetpub\wwwroot\MyApp\Core\AdminTools\HTTPPostTest
5 C:\inetpub\wwwroot\MyApp\Core\AdminTools\IISAdmin
6 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Profiling
7 C:\inetpub\wwwroot\MyApp\Core\AdminTools\RecordTestData
8 C:\inetpub\wwwroot\MyApp\Core\AdminTools\ScrambleTest
9 C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
Analyzed 10 so far... took 6.7236862 seconds, remaining time 88.08028922 seconds
Current Folder: C:\inetpub\wwwroot\MyApp\Core\AdminTools\Sessions
10 C:\inetpub\wwwroot\MyApp\Core\AdminTools\SoapTest
11 C:\inetpub\wwwroot\MyApp\Core\AdminTools\StaticContent
Sometimes it makes it to 15 or so. I tried from my laptop and from one server to another and the behavior is the same.
Here is the loop which is hanging:
$start = [System.DateTime]::Now
$numanalyzed = 0
if ($true) #skip to test
{
# loop through all physical folders as it is much faster
foreach ($folder in $folders)
{
write-host $numanalyzed $folder.fullname
#figure out the virtual path to the folder
$iis7vwebfolderpath = $folder.FullName.Replace($iis7webapp.PhysicalPath, $iis7VDirWebApppath)
#Get-item $iis7vwebfolderpath | gm
$iis7VWebDirConfigItem = Get-LNOSIIS7ConfigForPSPath -PSPath $iis7vwebfolderpath
# add new item to list
$iis7VWebDirConfig += $iis7VWebDirConfigItem
# increment counter and report out progress every 10
$numAnalyzed++
if ($numanalyzed % 10 -eq 0)
{
$end = [System.DateTime]::Now
$timeSoFar = (New-TimeSpan -Start $Start -End $End).TotalSeconds
$timeremaining = ($folders.Count - $numAnalyzed) * ($timeSoFar / $numanalyzed)
"Analyzed {0} so far... took {1} seconds, remaining time {2} seconds" -f $numanalyzed,$timeSoFar,$timeremaining | write-host
"Current Folder: {0}" -f $folder.FullName | Write-Host
}
}
}
$end = [System.DateTime]::Now
"Processed web dirs: {0} took {1} seconds" -f $iis7VWebDirConfig.Count,(NEW-TIMESPAN –Start $Start –End $End).TotalSeconds | write-host | Write-Host
The function I'm having performance problems with is covered in a separate question, and that post has its source code:
web-administration vs WMI to query web directory properties performance problems
In my case, it seemed my PowerShell call froze due to the Idle-Timeout expiration (the call runs for a very long time).
Setting the IdleTimeout value to a sufficiently long duration fixed my issue.
Once again, query the current configuration using
winrm get winrm/config/winrs
And set the timeout using
winrm set winrm/config/winrs '@{IdleTimeout="18000000"}'
I think I may have discovered the problem; I started getting some odd failures in other parts of the script:
[SEVERNAME] Processing data from remote server SERVERNAME failed with the following error message: The WSMan provider host process did not return a proper response. A provider in the host process may have behaved improperly. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OpenError: (SERVERNAME:String) [], PSRemotingTransportException
+ FullyQualifiedErrorId : 1726,PSSessionStateBroken
and
Processing data for a remote command failed with the following error message: Not enough storage is available to complete this operation. For more information, see the about_Remote_Troubleshooting Help topic.
+ CategoryInfo : OperationStopped (System.Manageme...pressionSyncJob:PSInvokeExpressionSyncJob) [], PSRemotingTransportException
+ FullyQualifiedErrorId : JobFailure
This led me to the following site: http://www.gsx.com/blog/bid/83018/Troubleshooting-unknown-PowerShell-error-messages
The following recommendations seem to have cleared up most of the problems, although I still have some testing to do.
Excerpt from site below:
As the first error message specifies, an overflow of memory in the remote session has occurred. Open a PowerShell prompt on the remote server and display the configuration of winrs using:
winrm get winrm/config/winrs
Check the "MaxMemoryPerShellMB" value. It is set by default to 150 MB on Windows Server 2008 R2 and Windows 7. This is something that Microsoft changed in Windows Server 2012 and Windows 8 to 1024 MB.
In order to resolve this issue, you need to increase the value to at least 512 MB with the following command:
winrm set winrm/config/winrs `@`{MaxMemoryPerShellMB=`"512`"`}
As an FYI if Invoke-Command always hangs:
Try a simple command against the system:
Invoke-Command -ComputerName XXXXX -ScriptBlock { Get-ItemProperty -Path HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion }
Start the Windows Remote Management Service (on that system)
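For example, from an elevated prompt:
net start winrm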
Check for the listening port:
netstat -aon | findstr "5985"
TCP 0.0.0.0:5985 0.0.0.0:0 LISTENING 4
TCP [::]:5985 [::]:0 LISTENING 4
