After migrating the Informix binaries to a new server by OS-level cloning, oninit -vy warned that it could not open some chunks. I asked the system administrator to link the missing chunks and ran oninit -vy again; this time it warned that those chunks are bad chunks. What is the reason for that? Did something go wrong when the chunks were reconfigured on the new server?
nwnhost#nwn$oninit -vy
Reading configuration file '/informix/strim/inf11/etc/onconfig'...succeeded
Creating /INFORMIXTMP/.infxdirs...succeeded
Checking config parameters...succeeded
Allocating and attaching to shared memory...succeeded
Creating resident pool 1629910 kbytes...succeeded
Allocating 6606044 kbytes for buffer pool of 2K page size...succeeded
Allocating 19267600 kbytes for buffer pool of 8K page size...succeeded
Creating infos file "/informix/strim/inf11/etc/.infos.ocs_test"...succeeded
Linking conf file "/informix/strim/inf11/etc/.conf.ocs_test"...succeeded
Initializing rhead structure...succeeded
Writing to infos file...succeeded
Initialization of Encryption...succeeded
Initializing ASF...succeeded
Initializing Dictionary Cache and SPL Routine Cache...succeeded
Bringing up ADM VP...succeeded
Creating VP classes...succeeded
Forking main_loop thread...succeeded
Initializing DR structures...succeeded
Forking 1 'soctcp' listener threads...succeeded
Starting tracing...succeeded
Initializing 128 flushers...succeeded
Initializing SDS Server network connections...succeeded
Initializing log/checkpoint information...succeeded
Initializing dbspaces...succeeded
Opening primary chunks...Bad Primary Chunk '/dev/chunk1186'.
Bad Primary Chunk '/dev/chunk1188'.
Bad Primary Chunk '/dev/chunk1265'.
Bad Primary Chunk '/dev/chunk1279'.
Bad Primary Chunk '/dev/chunk1317'.
Bad Primary Chunk '/dev/chunk1319'.
Bad Primary Chunk '/dev/chunk1320'.
succeeded
Validating chunks...succeeded
Initialize Async Log Flusher...succeeded
Starting B-tree Scanner...succeeded
Init ReadAhead Daemon...succeeded
Initializing DBSPACETEMP list...succeeded
Checking database partition index...succeeded
Initializing dataskip structure...succeeded
Checking for temporary tables to drop...succeeded
Updating Global Row Counter...succeeded
Forking onmode_mon thread...succeeded
Creating periodic thread...succeeded
Creating periodic thread...succeeded
Starting scheduling system...succeeded
Verbose output complete: mode = 5
Here is the onstat -d output for those chunks:
nwnhost#nwn$onstat -d | egrep 'chunk1188|chunk1186|chunk1265|chunk1279|chunk1317|chunk1319|chunk1320'
7be211028 1252 36 48 2097125 0 PD-B-- /dev/chunk1186
7be211428 1254 36 48 2097125 0 PD-B-- /dev/chunk1188
7be22d028 1331 37 48 2097139 0 PD-B-- /dev/chunk1265
7be22fc28 1345 38 48 2097000 0 PD-B-- /dev/chunk1279
7be241228 1383 48 48 2097139 0 PD-B-- /dev/chunk1317
7be241628 1385 38 48 2097139 0 PD-B-- /dev/chunk1319
7be241828 1386 37 48 2097000 0 PD-B-- /dev/chunk1320
nwnhost#nwn$
I was able to resolve the above error by bringing those chunks back online with the command below.
onspaces -s [dbspace_name] -p [pathname] -o [offset] -O
For example:
onspaces -s dbspace1 -p /dev/chunk1186 -o 96 -O
I'm currently reading Mastering Embedded Linux Programming and I'm on the chapter that covers bootloaders, more specifically U-Boot for the BeagleBone Black.
I have built a crosscompiler and I'm able to build U-Boot, however I can't make it run the way it is described in the book.
After some experimentation and Googling, I can make it work by writing MLO and u-boot.img in raw mode (using these commands).
However, if I put the files in a FAT32 MBR boot partition, the BeagleBone will not boot. It only shows a string of C's, which indicates that it is trying to get its bootloader from the serial interface and has decided it cannot boot from the SD card.
I have also studied this answer. According to that answer I should be doing everything correctly. I've tried to experiment with the MMC raw mode options in the U-Boot build configuration, but I've not been able to find a change that works.
I feel like there must be something obvious I'm missing, but I can't figure it out. Are there any things I can try to debug this further?
Update: some more details on the partition tables.
When using the "raw way" of putting MLO and u-boot.img on the SD card, I have not created any partitions at all. This works:
$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
When trying to use a boot partition, which does not work, I have this configuration:
$ sudo sfdisk /dev/sda -l
Disk /dev/sda: 117,75 GiB, 126437294080 bytes, 246947840 sectors
Disk model: MassStorageClass
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x3d985ec3
Device Boot Start End Sectors Size Id Type
/dev/sda1 * 2048 133119 131072 64M c W95 FAT32 (LBA)
Update 2: The boot partition contains exactly the same 2 files that I use for the raw writes, so they are confirmed to work:
$ ls -al
total 1000
drwxr-xr-x 2 peter peter 16384 Jan 1 1970 .
drwxr-x---+ 3 root root 4096 Jul 18 08:44 ..
-rw-r--r-- 1 peter peter 108184 Jul 14 13:56 MLO
-rw-r--r-- 1 peter peter 893144 Jul 14 13:56 u-boot.img
Update 3: I have already tried the following U-Boot options to try to get it to work (in the SPL / TPL menu):
"Support FAT filesystems". This is enabled by default. I can't really find a good reference for the U-Boot options, but I am guessing this is what enables booting from a FAT partition (which is what I'm trying to do).
"MMC raw mode: by sector". I have disabled this. As expected, this breaks booting in raw mode, which is the only thing I have got working so far.
"MMC raw mode: by partition". I have tried enabling this and using partition 1 to load U-Boot from. I'm not sure how to understand this option. I assume raw mode does not require partitions, but this asks which partition to use...
In general, if anyone can point me to a U-Boot configuration reference, that would already be very helpful. Right now, I'm just randomly turning things on and off that sound like they may help.
I'm running dask over slurm via jobqueue and I have been getting 3 errors pretty consistently...
Basically my question is what could be causing these failures? At first glance the problem is that too many workers are writing to disk at once, or my workers are forking into many other processes, but it's pretty difficult to track that. I can ssh into the node but I'm not seeing an abnormal number of processes, and each node has a 500gb ssd, so I shouldn't be writing excessively.
Everything below this is just information about my configurations and such
My setup is as follows:
cluster = SLURMCluster(cores=1, memory=f"{args.gbmem}GB", queue='fast_q', name=args.name,
env_extra=["source ~/.zshrc"])
cluster.adapt(minimum=1, maximum=200)
client = await Client(cluster, processes=False, asynchronous=True)
I suppose I'm not even sure whether processes=False should be set.
I run this starter script via sbatch with 4 GB of memory, 2 cores (-c) (even though I expect to need only 1) and 1 task (-n). This sets off all of my jobs via the SLURMCluster config from above. I dumped my Slurm submission scripts to files and they look reasonable.
Each job is not complex: it is a subprocess.call() to a compiled executable that takes 1 core and 2-4 GB of memory. I require the client call and further calls to be asynchronous because I have a lot of conditional computations. So each worker, when loaded, should consist of 1 Python process, 1 running executable, and 1 shell.
Imposed by the scheduler we have
>> ulimit -a
-t: cpu time (seconds) unlimited
-f: file size (blocks) unlimited
-d: data seg size (kbytes) unlimited
-s: stack size (kbytes) 8192
-c: core file size (blocks) 0
-m: resident set size (kbytes) unlimited
-u: processes 512
-n: file descriptors 1024
-l: locked-in-memory size (kbytes) 64
-v: address space (kbytes) unlimited
-x: file locks unlimited
-i: pending signals 1031203
-q: bytes in POSIX msg queues 819200
-e: max nice 0
-r: max rt priority 0
-N 15: unlimited
Each node has 64 cores, so I don't really think I'm hitting any limits.
I'm using a jobqueue.yaml file that looks like this:
slurm:
  name: dask-worker
  cores: 1 # Total number of cores per job
  memory: 2 # Total amount of memory per job
  processes: 1 # Number of Python processes per job
  local-directory: /scratch # Location of fast local storage like /scratch or $TMPDIR
  queue: fast_q
  walltime: '24:00:00'
  log-directory: /home/dbun/slurm_logs
I would appreciate any advice at all! Full log is below.
FORK BLOCKING IO ERROR
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.131.82:13687'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dbun/.local/share/pyenv/versions/3.7.0/lib/python3.7/multiprocessing/forkserver.py", line 250, in main
pid = os.fork()
BlockingIOError: [Errno 11] Resource temporarily unavailable
distributed.dask_worker - INFO - End worker
Aborted!
CANT START NEW THREAD ERROR
https://pastebin.com/ibYUNcqD
BLOCKING IO ERROR
https://pastebin.com/FGfxqZEk
EDIT:
Another piece of the puzzle:
It looks like dask_worker is running multiple multiprocessing.forkserver calls. Does that sound reasonable?
https://pastebin.com/r2pTQUS4
This problem was caused by having ulimit -u too low.
As it turns out, each worker has a few processes associated with it, and the Python ones have multiple threads. In the end, each worker accounts for approximately 14 threads that count against ulimit -u. Mine was set to 512, and on a 64-core node I was likely hitting ~896 (64 * 14). With that limit, the maximum number of threads per worker I could have had was 8 (512 / 64).
Solution:
In .zshrc (or .bashrc) I added the line
ulimit -u unlimited
Haven't had any problems since.
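The arithmetic behind that estimate can be sketched in Python. The 64 workers per node, roughly 14 threads per worker, and limit of 512 are the figures from this thread; the function name nproc_budget is made up for illustration, and the resource call only reports the limit seen by the current process.

```python
import resource


def nproc_budget(workers_per_node, threads_per_worker, nproc_limit):
    """Return (threads needed on the node, largest per-worker count that fits)."""
    needed = workers_per_node * threads_per_worker
    max_threads_per_worker = nproc_limit // workers_per_node
    return needed, max_threads_per_worker


# Figures from the post: 64 cores per node, ~14 threads per worker, ulimit -u 512.
needed, max_per_worker = nproc_budget(64, 14, 512)
print(needed, max_per_worker)  # 896 8

# Soft and hard process/thread limits of the current process (what `ulimit -u` shows).
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print("RLIMIT_NPROC soft/hard:", soft, hard)
```

Running the last two lines inside a worker job shows whether the raised limit actually reached the worker's environment.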
All the commands below were run as the root user. To find the PID of Jenkins, I ran this command:
#ps aux | grep jenkins
and with the PID I ran another one, which is
#pmap -x [PID]
Here's the result I got from the command.
Address Kbytes RSS Dirty Mode Mapping
0000000000400000 4 0 0 r-x-- java
0000000000600000 4 4 4 r---- java
0000000000601000 4 4 4 rw--- java
0000000000b3e000 312 216 216 rw--- [ anon ]
...
00007ffc29848000 1156 32 32 rw--- [ stack ]
00007ffc29976000 8 4 0 r-x-- [ anon ]
ffffffffff600000 4 0 0 r-x-- [ anon ]
---------------- ------- ------- -------
total kB 10027288 1172504 1163812
So Jenkins seems to be taking approximately 9.6 gigabytes. Currently there are around 35 items in Jenkins, and only 8 of them are built periodically on a daily basis. I don't believe there is any reason for Jenkins to consume this much memory, so I now have the following 3 doubts:
That I figured out the memory usage in a wrong way (the pmap command did not deliver the right figure),
or there is really a problem with the Jenkins configuration
or it is just natural to consume this amount with that number of items
Any Jenkins experts out there? I do need your help.
I'm not a Jenkins expert, but I have some knowledge of Linux memory management and Java applications.
You said "Jenkins seems to be taking approximately 9.6 gigabytes", but that is not an accurate picture of its memory consumption.
The 9.6 GiB figure is virtual memory, i.e. address space the OS has reserved for the process (check your Jenkins Java heap options). RSS (Resident Set Size) is the real memory usage, and in your output it is about 1.1 GiB (1172504 kB).
So my answer is close to your third option: it is natural to consume this amount with that number of items.
I hope this will help you.
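To make the difference concrete, here is a small Python sketch that reads the "total kB" line from the pmap -x output in the question and converts both columns to GiB. The helper name parse_pmap_total is made up for illustration, and the parser assumes the three-column layout (Kbytes, RSS, Dirty) shown above.

```python
def parse_pmap_total(line):
    """Parse the 'total kB' line printed by `pmap -x` into kB values."""
    fields = line.split()
    # Expected layout: total kB <Kbytes> <RSS> <Dirty>
    if fields[:2] != ["total", "kB"] or len(fields) != 5:
        raise ValueError("not a pmap -x total line: %r" % line)
    virtual_kb, rss_kb, dirty_kb = (int(f) for f in fields[2:])
    return {"virtual_kb": virtual_kb, "rss_kb": rss_kb, "dirty_kb": dirty_kb}


# The total line from the question.
totals = parse_pmap_total("total kB 10027288 1172504 1163812")
kb_per_gib = 1024 * 1024
print("virtual:  %.2f GiB" % (totals["virtual_kb"] / kb_per_gib))  # 9.56 GiB
print("resident: %.2f GiB" % (totals["rss_kb"] / kb_per_gib))      # 1.12 GiB
```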
I'm trying to understand why the limits have decided a task needs to be killed, and how it's doing the accounting. When my GCE Docker container kills a process, it shows something like:
Task in /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e killed as a result of limit of /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e
memory: usage 2097152kB, limit 2097152kB, failcnt 74571
memory+swap: usage 0kB, limit 18014398509481983kB, failcnt 0
kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
Memory cgroup stats for /404daacfcf6b9e55f71b3d7cac358f0dc921a2d580eed460c2826aea8e43f05e: cache:368KB rss:2096784KB rss_huge:0KB mapped_file:0KB writeback:0KB inactive_anon:16KB active_anon:2097040KB inactive_file:60KB active_file:36KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 4343] 0 4343 5440 65 15 0 0 bash
[ 4421] 0 4421 265895 6702 77 0 0 npm
[ 4422] 0 4422 12446 2988 28 0 0 gunicorn
[ 4557] 0 4557 739241 346035 1048 0 0 gunicorn
[ 4560] 0 4560 1086 24 8 0 0 sh
[ 4561] 0 4561 5466 103 15 0 0 bash
[14594] 0 14594 387558 168790 672 0 0 node
Memory cgroup out of memory: Kill process 4557 (gunicorn) score 662 or sacrifice child
Killed process 4557 (gunicorn) total-vm:2956964kB, anon-rss:1384140kB, file-rss:0kB
Supposedly the memory hit a 2GB usage limit, and something needs to die. According to the cgroup stats, I appear to have 2GB of usage in active_anon and rss.
When I look at the table of process stats, I don't see where the 2GB is:
For rss, I see the two major processes 346035 + 168790 = 514MB?
For total_vm, I see three major processes 265895 + 739241 + 387558 = 1.4GB?
But when it decides to kill the gunicorn process, it says it had 3GB of Total VM and 1.4GB of Anon RSS. I don't see how this follows from the above numbers at all...
For most of its life, according to top, the gunicorn process appears to hum along with 555m RES and 2131m VIRT and 22% MEM * 2.5GB box = 550MB of memory usage. (I haven't yet been able to time it properly to peek at top values at the time it dies...)
Can someone help me understand this?
Under what accounting, do these sum to 2GB of usage? (virtual? rss? something else?)
Is there something else besides top/ps I should use to track how much memory a process is using for the purposes of docker's killing it?
As far as I know, total_vm and rss are counted in 4 kB pages (see https://stackoverflow.com/a/43611576), not in kB.
So for pid 4557:
rss=346035 means anon-rss:1384140kB (=346035*4kB)
total_vm=739241 means total-vm:2956964kB (=739241*4kB)
This explains your memory usage.
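The same conversion can be applied to the whole process table from the question. This is a rough Python sketch that assumes a 4 kB page size (os.sysconf("SC_PAGE_SIZE") reports the real value on a given machine). The summed rss comes out slightly above the 2097152 kB limit, which is plausible if some pages are counted in more than one process, so treat it as a sanity check rather than exact accounting.

```python
PAGE_KB = 4  # assumed page size in kB

# (pid, total_vm in pages, rss in pages, name), copied from the OOM report above
tasks = [
    (4343, 5440, 65, "bash"),
    (4421, 265895, 6702, "npm"),
    (4422, 12446, 2988, "gunicorn"),
    (4557, 739241, 346035, "gunicorn"),
    (4560, 1086, 24, "sh"),
    (4561, 5466, 103, "bash"),
    (14594, 387558, 168790, "node"),
]


def pages_to_kb(pages, page_kb=PAGE_KB):
    """Convert a page count from the OOM table into kB."""
    return pages * page_kb


for pid, total_vm, rss, name in tasks:
    print(pid, name, "total-vm:", pages_to_kb(total_vm), "kB rss:", pages_to_kb(rss), "kB")

total_rss_kb = sum(pages_to_kb(rss) for _, _, rss, _ in tasks)
print("sum of rss:", total_rss_kb, "kB")  # 2098828 kB, about 2 GB
```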
I've made a simple Lua host in Ada. It happily runs everything I need via Lua.Load_Buffer("os.execute('wxlua.exe wx.lua')"), but I don't want the win32 cmd.exe window that opens by default on program start-up. Is there any way to control this directly from Ada?
P.S. os.execute() sends commands directly to Windows, not via cmd.exe.
To help see what's going on, try creating a desktop shortcut that executes that command.
If double-clicking on it also brings up the win32 cmd.exe window, then that means they compiled wxlua.exe to be a windows console app. You can try reading their docs or poking around in their exe directory to see if there's a Win32 version instead. If that doesn't do it for you, your best remaining option I know of is to call Win32's CreateProcess with the CREATE_NO_WINDOW or DETACHED_PROCESS flag set instead of using os.execute.
If the shortcut doesn't bring up a console window, that means os.execute is doing it to you. In that case, you'll have to go directly to the CreateProcess solution mentioned above.
GNAT should come with the Win32 bindings you need to call CreateProcess. If you'd like a cleaner interface to that routine, I have some thicker bindings I created for some of the Win32 calls, including that one and some other calls associated with creating Win32 services. I released them to the Public Domain (or tried to...). They are available in the source distribution of my old SETI@Home Service, and I believe they are somewhere in the examples directory of AWS (at least they used to be, before ACT took it over).
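As an illustration of the CREATE_NO_WINDOW flag only (the question is about Ada, so this is not a drop-in fix), here is a minimal Python sketch. Python's subprocess module passes creationflags through to CreateProcess on Windows; the helper name run_without_console is made up, and the child command is a placeholder standing in for wxlua.exe wx.lua.

```python
import subprocess
import sys

# Value of the Win32 CREATE_NO_WINDOW process creation flag.
CREATE_NO_WINDOW = 0x08000000


def run_without_console(argv):
    """Run argv directly (no shell), suppressing the console window on Windows."""
    kwargs = {}
    if sys.platform == "win32":
        # creationflags is forwarded to CreateProcess; it is Windows-only.
        kwargs["creationflags"] = CREATE_NO_WINDOW
    return subprocess.run(argv, **kwargs).returncode


# Placeholder child: the current Python interpreter doing nothing.
# In the question this would be ["wxlua.exe", "wx.lua"].
print(run_without_console([sys.executable, "-c", "pass"]))
```

Because the argument list is handed to the process-creation call directly, no command interpreter is started in between.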
Standard Lua os.execute calls the standard C function system(). The C standard only requires that system() do something platform dependent, and specifically avoids saying what that might be. On Unix-like platforms, system() usually invokes the shell /bin/sh. On Windows it usually invokes cmd.exe.
Of course, its exact implementation could depend on exactly which toolchain you are using. You haven't said where you got your Lua (or, for that matter, whose Ada compiler you have). One likely and recommended source is Lua for Windows, which is linked to the C runtime library from VC8, aka VS2005. Quoting from VS2005's documentation:
int system(const char *command);
.... The system function passes command to the command interpreter, which executes the string as an operating-system command. system refers to the COMSPEC and PATH environment variables that locate the command-interpreter file (the file named CMD.EXE in Windows NT and later). If command is NULL, the function simply checks to see whether the command interpreter exists.
Since os.execute calls system which invokes cmd.exe, you will get the console window. The way to avoid that is to not use os.execute.
If the program being executed is compiled for Windows Console Mode, then you will get a console window, regardless. However, wxlua.exe is almost certainly not compiled as a console application since it is intended to host GUI applications written in Lua based on the wxWidgets library.
Edit:
Naturally, if your lua.exe is built in a way that replaces either the implementation of os.execute or the standard library routine system() with a different implementation then you might see different results.
To demonstrate that standard Lua's os.execute() eventually invokes cmd.exe, try the following:
C:\Documents and Settings\Ross>lua -e "os.execute[[pause]]"
Press any key to continue . . .
C:\Documents and Settings\Ross>wlua -e "os.execute[[pause]]"
C:\Documents and Settings\Ross>
This invokes a simple Lua script from the standard Lua interpreter, first from the usual one that is a console application, and second from the variation supplied with Lua for Windows whose only difference is that it is linked as a Windows GUI application.
Both cause the "Press any key" message to appear: the first in the same console window where I invoked Lua, and the second in a separate console.
Before hitting a key, I used PsList from Sysinternals in a separate console to show the process tree, with the command pslist -t. I've excerpted out only the relevant bits below:
C:\Documents and Settings\Ross>pslist -t
pslist v1.29 - Sysinternals PsList
Copyright (C) 2000-2009 Mark Russinovich
Sysinternals
Process information for LAMPWORK:
Name Pid Pri Thd Hnd VM WS Priv
Idle 0 0 2 0 0 16 0
System 4 8 79 1208 1884 220 0
...
explorer 3592 8 17 1263 115968 33964 25816
...
cmd 4300 8 1 96 35032 4448 2260
PsList 4688 13 2 109 29556 2776 1248
cmd 5208 8 1 33 30340 2704 1984
lua 5592 8 1 17 8528 1564 400
cmd 5680 8 1 30 30144 2428 1956
Notice that the instance of CMD that invoked Lua has a CMD as a child.
Repeating the experiment with wlua -e "os.execute[[pause]]" and pslist -t again:
C:\Documents and Settings\Ross>pslist -t
....
Name Pid Pri Thd Hnd VM WS Priv
Idle 0 0 2 0 0 16 0
...
explorer 3592 8 16 1251 115712 33956 25752
...
cmd 4300 8 1 96 35032 4448 2260
PsList 4888 13 2 109 29556 2780 1248
procexp 4800 13 7 328 108492 33232 29464
cmd 5208 8 1 32 30340 2704 1984
wlua 3272 8 1 15 8536 1576 400
cmd 5104 8 1 30 30144 2440 1956
Again, wlua has a CMD as a child.
Using ProcessExplorer, also from Sysinternals, I can see the command line of the child CMD process. It is CMD.EXE /C pause. In effect, system() prepends CMD /C to its argument and passes the result to spawn() for execution in a child process.