Increase Training Iterations in StyleGAN - machine-learning

I’m a complete noob trying to learn/play with StyleGAN.
I’m using StyleGAN to train a model with an image set that I built, but the output image quality is poor. The training stops after 10 ticks.
How can I change the default parameters in the code to continue training past 10 ticks?
Do I need to change the kimg or fid50k settings? Are those settings in the block below from train.py, or in the metrics file?
{ 'D': {'func_name': 'training.networks_stylegan.D_basic'},
  'D_loss': {'func_name': 'training.loss.D_logistic_simplegp', 'r1_gamma': 10.0},
  'D_opt': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
  'G': {'func_name': 'training.networks_stylegan.G_style'},
  'G_loss': {'func_name': 'training.loss.G_logistic_nonsaturating'},
  'G_opt': {'beta1': 0.0, 'beta2': 0.99, 'epsilon': 1e-08},
  'dataset': {'resolution': 64, 'tfrecord_dir': 'fab_dataset'},
  'desc': 'sgan-fab_dataset-1gpu',
  'grid': {'layout': 'random', 'size': '4k'},
  'metrics': [{'func_name': 'metrics.frechet_inception_distance.FID', 'minibatch_per_gpu': 8, 'name': 'fid50k', 'num_images': 50000}],
  'sched': { 'D_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
             'G_lrate_dict': {128: 0.0015, 256: 0.002, 512: 0.003, 1024: 0.003},
             'lod_initial_resolution': 8,
             'minibatch_base': 4,
             'minibatch_dict': {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}},
  'tf_config': {'rnd.np_random_seed': 1000},
  'train': {'mirror_augment': False, 'run_func_name': 'training.training_loop.training_loop', 'total_kimg': 25000}}
Training output log:
tick 1 kimg 140.3 lod 4.00 minibatch 128 time 8m 38s sec/tick 461.9 sec/kimg 3.29 maintenance 56.1 gpumem 3.0
network-snapshot-000140 time 13m 11s fid50k 353.7462
tick 2 kimg 280.6 lod 4.00 minibatch 128 time 29m 42s sec/tick 453.7 sec/kimg 3.23 maintenance 809.9 gpumem 3.4
tick 3 kimg 420.9 lod 4.00 minibatch 128 time 37m 16s sec/tick 453.8 sec/kimg 3.23 maintenance 0.9 gpumem 3.4
tick 4 kimg 561.2 lod 4.00 minibatch 128 time 44m 52s sec/tick 454.4 sec/kimg 3.24 maintenance 0.9 gpumem 3.4
tick 5 kimg 681.5 lod 3.87 minibatch 128 time 1h 01m 06s sec/tick 973.4 sec/kimg 8.09 maintenance 0.9 gpumem 4.5
tick 6 kimg 801.8 lod 3.66 minibatch 128 time 1h 21m 43s sec/tick 1235.8 sec/kimg 10.27 maintenance 1.6 gpumem 4.5
tick 7 kimg 922.1 lod 3.46 minibatch 128 time 1h 42m 20s sec/tick 1235.3 sec/kimg 10.27 maintenance 1.2 gpumem 4.5
tick 8 kimg 1042.4 lod 3.26 minibatch 128 time 2h 02m 57s sec/tick 1235.9 sec/kimg 10.27 maintenance 1.2 gpumem 4.5
tick 9 kimg 1162.8 lod 3.06 minibatch 128 time 2h 23m 34s sec/tick 1236.4 sec/kimg 10.28 maintenance 1.2 gpumem 4.5
tick 10 kimg 1283.1 lod 3.00 minibatch 128 time 2h 43m 58s sec/tick 1222.6 sec/kimg 10.16 maintenance 1.2 gpumem 4.5
network-snapshot-001283 time 14m 20s fid50k 349.5387
I can post the complete train.py file if required, but it is the standard version available on GitHub:
https://github.com/NVlabs/stylegan/blob/master/train.py
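The log can be read against the 'total_kimg' value in the config dump to see how far the run got. This is a minimal sketch, assuming the tick lines keep the format shown in the log; the regular expression and the helper name last_kimg are illustrative and not part of StyleGAN.

```python
import re

# Two tick lines copied from the training log in the question.
LOG = """\
tick 9 kimg 1162.8 lod 3.06 minibatch 128 time 2h 23m 34s sec/tick 1236.4 sec/kimg 10.28 maintenance 1.2 gpumem 4.5
tick 10 kimg 1283.1 lod 3.00 minibatch 128 time 2h 43m 58s sec/tick 1222.6 sec/kimg 10.16 maintenance 1.2 gpumem 4.5
"""

TOTAL_KIMG = 25000  # 'total_kimg' from the 'train' entry of the config dump


def last_kimg(log_text):
    """Return the kimg value of the last tick line, or None if there is none."""
    matches = re.findall(r"^tick\s+\d+\s+kimg\s+([\d.]+)", log_text, flags=re.MULTILINE)
    return float(matches[-1]) if matches else None


kimg = last_kimg(LOG)
progress = kimg / TOTAL_KIMG
print(f"{kimg} of {TOTAL_KIMG} kimg shown so far ({progress:.1%})")
```

On these numbers, tick 10 corresponds to roughly 5% of the configured total_kimg.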

Related

Why do the same tasks cost different CPU on Linux kernel 4.9 and 5.4?

My application is a compute-intensive task (i.e. video encoding). When it runs on Linux kernel 4.9 (Ubuntu 16.04), the CPU usage is 3300%. But when it runs on Linux kernel 5.4 (Ubuntu 20.04), the CPU usage is just 2850%. The processes do the same job in both cases.
So I wonder whether the Linux kernel gained some CPU scheduling optimization or related work between 4.9 and 5.4. Could you give any advice on how to investigate the reason?
I am not sure whether the glibc version has an effect. For your information, glibc is 2.23 on Linux kernel 4.9 and 2.31 on Linux kernel 5.4.
CPU Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 40
On-line CPU(s) list: 0-39
Thread(s) per core: 2
Core(s) per socket: 10
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
Stepping: 7
CPU MHz: 2200.000
BogoMIPS: 4401.69
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
L3 cache: 14080K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
Output of perf stat on Linux Kernel 4.9
Performance counter stats for process id '32504':
3146297.833447 cpu-clock (msec) # 32.906 CPUs utilized
1,718,778 context-switches # 0.546 K/sec
574,717 cpu-migrations # 0.183 K/sec
2,796,706 page-faults # 0.889 K/sec
6,193,409,215,015 cycles # 1.968 GHz (30.76%)
6,948,575,328,419 instructions # 1.12 insn per cycle (38.47%)
540,538,530,660 branches # 171.801 M/sec (38.47%)
33,087,740,169 branch-misses # 6.12% of all branches (38.50%)
1,966,141,393,632 L1-dcache-loads # 624.906 M/sec (38.49%)
184,477,765,497 L1-dcache-load-misses # 9.38% of all L1-dcache hits (38.47%)
8,324,742,443 LLC-loads # 2.646 M/sec (30.78%)
3,835,471,095 LLC-load-misses # 92.15% of all LL-cache hits (30.76%)
<not supported> L1-icache-loads
187,604,831,388 L1-icache-load-misses (30.78%)
1,965,198,121,190 dTLB-loads # 624.607 M/sec (30.81%)
438,496,889 dTLB-load-misses # 0.02% of all dTLB cache hits (30.79%)
7,139,892,384 iTLB-loads # 2.269 M/sec (30.79%)
260,660,265 iTLB-load-misses # 3.65% of all iTLB cache hits (30.77%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
95.615072142 seconds time elapsed
Output of perf stat on Linux Kernel 5.4
Performance counter stats for process id '3355137':
2,718,192.32 msec cpu-clock # 29.184 CPUs utilized
1,719,910 context-switches # 0.633 K/sec
448,685 cpu-migrations # 0.165 K/sec
3,884,586 page-faults # 0.001 M/sec
5,927,930,305,757 cycles # 2.181 GHz (30.77%)
6,848,723,995,972 instructions # 1.16 insn per cycle (38.47%)
536,856,379,853 branches # 197.505 M/sec (38.47%)
32,245,288,271 branch-misses # 6.01% of all branches (38.48%)
1,935,640,517,821 L1-dcache-loads # 712.106 M/sec (38.47%)
177,978,528,204 L1-dcache-load-misses # 9.19% of all L1-dcache hits (38.49%)
8,119,842,688 LLC-loads # 2.987 M/sec (30.77%)
3,625,986,107 LLC-load-misses # 44.66% of all LL-cache hits (30.75%)
<not supported> L1-icache-loads
184,001,558,310 L1-icache-load-misses (30.76%)
1,934,701,161,746 dTLB-loads # 711.760 M/sec (30.74%)
676,618,636 dTLB-load-misses # 0.03% of all dTLB cache hits (30.76%)
6,275,901,454 iTLB-loads # 2.309 M/sec (30.78%)
391,706,425 iTLB-load-misses # 6.24% of all iTLB cache hits (30.78%)
<not supported> L1-dcache-prefetches
<not supported> L1-dcache-prefetch-misses
93.139551411 seconds time elapsed
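To compare the two runs side by side, the headline ratios can be recomputed from the raw counters. This is a minimal sketch that uses only values copied from the two perf stat outputs; the dictionary layout and the function name derived are illustrative.

```python
# Counter values copied from the two perf stat outputs.
runs = {
    "4.9": {"cpu_clock_msec": 3146297.833447, "elapsed_sec": 95.615072142,
            "cycles": 6_193_409_215_015, "instructions": 6_948_575_328_419},
    "5.4": {"cpu_clock_msec": 2718192.32, "elapsed_sec": 93.139551411,
            "cycles": 5_927_930_305_757, "instructions": 6_848_723_995_972},
}


def derived(run):
    """Recompute the ratios perf prints in its right-hand column."""
    cpus_utilized = run["cpu_clock_msec"] / 1000.0 / run["elapsed_sec"]
    ipc = run["instructions"] / run["cycles"]
    return cpus_utilized, ipc


summary = {kernel: derived(run) for kernel, run in runs.items()}
for kernel, (cpus, ipc) in summary.items():
    print(f"kernel {kernel}: {cpus:.3f} CPUs utilized, {ipc:.2f} insn per cycle")

# Total CPU time of the 5.4 run relative to the 4.9 run.
cpu_time_ratio = runs["5.4"]["cpu_clock_msec"] / runs["4.9"]["cpu_clock_msec"]
print(f"CPU time on 5.4 relative to 4.9: {cpu_time_ratio:.3f}")
```

The instruction counts of the two runs are within about 1.5% of each other, while the total CPU time differs by roughly 14%, which matches the gap between the reported 3300% and 2850%.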
UPDATE:
It is confirmed that the performance gain comes from Linux kernel 5.4, because the performance on Linux kernel 5.3 is the same as on Linux kernel 4.9.
It is confirmed that the performance gain is unrelated to libc, because on Linux kernel 5.10 with libc 2.23 the performance is the same as on Linux kernel 5.4 with libc 2.31.
It seems the performance gain comes from this fix:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=de53fd7aedb100f03e5d2231cfce0e4993282425

Already using Docker limits, why is the server's CPU load average still high?

CPU: 8
Memory: 64G
The application is deployed with docker swarm, and its CPU and memory are already limited:
docker service update --reserve-cpu 2 --limit-cpu 2 --limit-memory 4G --reserve-memory 4G java
Checking the server with top:
top - 11:09:26 up 889 days, 19:31, 16 users, load average: 72.50, 78.28, 55.54
Tasks: 271 total, 2 running, 183 sleeping, 0 stopped, 0 zombie
%Cpu(s): 36.7 us, 7.2 sy, 0.0 ni, 52.4 id, 0.0 wa, 0.0 hi, 3.7 si, 0.0 st
KiB Mem : 65916324 total, 27546636 free, 8884904 used, 29484784 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 56442876 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15566 root 20 0 12.563g 2.871g 17580 S 199.0 4.6 42:06.92 java
45076 root 20 0 19.018g 337692 16680 S 167.2 0.5 0:07.38 java
14692 root 20 0 1941688 122152 50868 S 4.0 0.2 617:42.71 dockerd
There are no other applications running, so why is the load average so high?
Could someone help look into this? Thanks.
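On Linux, the load average counts tasks that are runnable or in uninterruptible sleep, so it is a task count rather than a CPU percentage. One common way to read it is to divide by the number of CPUs. This is a minimal sketch of that arithmetic with the figures from the top output; the function name load_per_cpu is illustrative.

```python
import os


def load_per_cpu(load_average, cpu_count):
    """Divide a load average by the number of CPUs to get a per-CPU figure."""
    return load_average / cpu_count


# Figures from the question: 1-minute load average 72.50 on 8 CPUs.
reported = load_per_cpu(72.50, 8)
print(f"reported: {reported:.2f} tasks per CPU")

# The same check on the local machine (os.getloadavg is Unix-only).
if hasattr(os, "getloadavg"):
    try:
        one_min, _, _ = os.getloadavg()
        local = load_per_cpu(one_min, os.cpu_count() or 1)
        print(f"local: {local:.2f} tasks per CPU")
    except OSError:
        print("load average is not available on this system")
```

A per-CPU figure well above 1 means more tasks are waiting than there are CPUs to run them, even when the %Cpu(s) line still shows idle time.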

Understanding Docker container CPU usage

docker stats shows the CPU usage to be very high, but the top command output shows that 88.3% of the CPU is idle. Inside the container is a Java httpthrift service.
docker stats:
CONTAINER CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
8a0488xxxx5 540.9% 41.99 GiB / 44 GiB 95.43% 0 B / 0 B 0 B / 35.2 MB 286
top output:
top - 07:56:58 up 2 days, 22:29, 0 users, load average: 2.88, 3.01, 3.05
Tasks: 13 total, 1 running, 12 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.2 us, 2.7 sy, 0.0 ni, 88.3 id, 0.0 wa, 0.0 hi, 0.9 si, 0.0 st
KiB Mem: 65959920 total, 47983628 used, 17976292 free, 357632 buffers
KiB Swap: 7999484 total, 0 used, 7999484 free. 2788868 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8823 root 20 0 58.950g 0.041t 21080 S 540.9 66.5 16716:32 java
How can I reduce the CPU usage and bring it under 100%?
According to the top man page:
When operating in Solaris mode (`I' toggled Off), a task's cpu usage will be divided by the total number of CPUs. After issuing this command, you'll be told the new state of this toggle.
So by pressing the key I when using top in interactive mode, you will switch to the Solaris mode and the CPU usage will be divided by the total number of CPUs (or cores).
P.S.: This option is not available on all versions of top.
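The division that Solaris mode applies can be sketched as follows. The CPU count of the host is not given in the question, so the value 8 is purely hypothetical, and the function name solaris_mode_percent is illustrative.

```python
def solaris_mode_percent(irix_percent, cpu_count):
    """Convert top's per-task %CPU from Irix mode to Solaris mode."""
    return irix_percent / cpu_count


# 540.9 is the java process's %CPU from the top output in the question.
# The host's CPU count is NOT stated there; 8 is a hypothetical value.
HYPOTHETICAL_CPU_COUNT = 8

share = solaris_mode_percent(540.9, HYPOTHETICAL_CPU_COUNT)
print(f"{share:.1f}% of the whole machine")
```

Switching modes changes only how the figure is displayed; the process uses the same amount of CPU time either way.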

What are the memory requirements for OrientDB / Can I run it on an EC2 micro?

I would like to run OrientDB on an EC2 micro (free tier) instance. I am unable to find official OrientDB documentation that gives memory requirements; however, I found this question that says 512MB should be fine. I am running an EC2 micro instance, which has 1GB RAM. However, when I try to run OrientDB I get the JRE error shown below. My initial thought was that I needed to increase the JRE memory using -Xmx, but I guess it would be the shell script that does this. Has anyone successfully run OrientDB on an EC2 micro instance, or run into this problem?
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a04a0000, 1431699456, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 1431699456 bytes for committing reserved memory.
An error report file with more information is saved as:
/tmp/jvm-14728/hs_error.log
Here are the contents of the error log:
OS:Linux
uname:Linux 4.14.47-56.37.amzn1.x86_64 #1 SMP Wed Jun 6 18:49:01 UTC 2018 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 3867, NOFILE 4096, AS infinity
load average:0.00 0.00 0.00
/proc/meminfo:
MemTotal: 1011168 kB
MemFree: 322852 kB
MemAvailable: 822144 kB
Buffers: 83188 kB
Cached: 523056 kB
SwapCached: 0 kB
Active: 254680 kB
Inactive: 369952 kB
Active(anon): 18404 kB
Inactive(anon): 48 kB
Active(file): 236276 kB
Inactive(file): 369904 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 18376 kB
Mapped: 31660 kB
Shmem: 56 kB
Slab: 51040 kB
SReclaimable: 41600 kB
SUnreclaim: 9440 kB
KernelStack: 1564 kB
PageTables: 2592 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 505584 kB
Committed_AS: 834340 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 49152 kB
DirectMap2M: 999424 kB
CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 63 stepping 2, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, erms, tsc
/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
stepping : 2
microcode : 0x3c
cpu MHz : 2400.043
cache size : 30720 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4800.05
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Memory: 4k page, physical 1011168k(322728k free), swap 0k(0k free)
vm_info: OpenJDK 64-Bit Server VM (24.181-b00) for linux-amd64 JRE (1.7.0_181-b00), built on Jun 5 2018 20:36:03 by "mockbuild" with gcc 4.8.5 20150623 (Red Hat 4.8.5-28)
time: Mon Aug 20 20:51:08 2018
elapsed time: 0 seconds
OrientDB can easily run in 512MB, though your performance and throughput will not be as high. In OrientDB 3.0.x you can use the environment variable ORIENTDB_OPTS_MEMORY to set it. On the command line I can, for example, run:
cd $ORIENTDB_HOME/bin
export ORIENTDB_OPTS_MEMORY="-Xmx512m"
./server.sh
(where $ORIENTDB_HOME is where you have OrientDB installed) and I'm running with 512MB of memory.
As an aside, if you look in $ORIENTDB_HOME/bin/server.sh you'll see that there is even code to check if the server is running on a Raspberry Pi and those range from 256MB to 1GB so the t2.micro will run just fine.
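The numbers in the error log show why the default settings fail on this instance: the JVM tried to commit more memory in a single call than the machine has in total, and there is no swap. This is a minimal sketch of that arithmetic, using only values copied from the error message and /proc/meminfo; the variable names are illustrative.

```python
# Values copied from the JVM error and /proc/meminfo in the question.
requested_bytes = 1_431_699_456   # size of the failed os::commit_memory call
mem_total_kb = 1_011_168          # MemTotal
mem_available_kb = 822_144        # MemAvailable

MIB = 1024 * 1024

requested_mib = requested_bytes / MIB
mem_total_mib = mem_total_kb * 1024 / MIB
mem_available_mib = mem_available_kb * 1024 / MIB
fits_in_ram = requested_bytes <= mem_total_kb * 1024

print(f"requested:    {requested_mib:8.1f} MiB")
print(f"MemTotal:     {mem_total_mib:8.1f} MiB")
print(f"MemAvailable: {mem_available_mib:8.1f} MiB")
print("request fits in RAM:", fits_in_ram)
```

A 512MB heap set through ORIENTDB_OPTS_MEMORY stays below the MemAvailable figure reported in the log.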

Why are ruby processes at 100% CPU on passenger

I have a Rails app (2.3.5) running on a VPS with 4 cores @ 2 GHz and 4GB memory. I am running nginx (0.7.61) and Phusion Passenger (2.2.14) on Ruby Enterprise (1.8.7-2010.01) with the max pool size set at 30. My problem is that it seems as if every ruby process that is executing a Rails request runs at near 100% CPU. If I run top, they drop off every time the display refreshes, so they are not getting hung, but they are still running at 100%.
Is there any way I can bring this down? Or at least figure out what portion of code is spiking the CPU? Is this a normal behavior?
Here is the TOP output:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2427 psadmin 25 0 91904 76m 2696 R 100 1.9 739:05.96 Rails: /var/www/apps/main_rails_app/current
3457 psadmin 25 0 98180 82m 2532 R 100 2.0 711:21.91 Rails: /var/www/apps/main_rails_app/current
2415 psadmin 25 0 93952 77m 2708 R 99 1.9 727:49.31 Rails: /var/www/apps/main_rails_app/current
3455 psadmin 25 0 99204 83m 2528 R 69 2.0 726:04.70 Rails: /var/www/apps/main_rails_app/current
2791 psadmin 16 0 98044 81m 2492 S 31 2.0 0:10.16 Rails: /var/www/apps/main_rails_app/current
8034 psadmin 15 0 8160 3656 1772 S 1 0.1 0:35.39 nginx: worker process
8035 psadmin 15 0 8324 3696 1732 S 0 0.1 0:31.34 nginx: worker process
2588 psadmin 15 0 197m 183m 2712 S 0 4.5 1:02.16 Rails: /var/www/apps/main_rails_app/current
Thanks!
Edit: Tried strace with follow forks as mentioned below. This is the output that is dumped over and over:
sudo strace -f -p 3455
clock_gettime(CLOCK_MONOTONIC, {394577, 508326476}) = 0
select(0, [], [], [], {0, 0}) = 0 (Timeout)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn()
Check your logs for suspicious behavior. In general, Rails does suck a bunch of CPU, though. You could also try pointing strace at the offending PIDs.
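When strace output repeats like this, a tally of calls and signals can make the pattern easier to see. This is a minimal sketch, assuming one event per line in the format shown in the question; the function name tally and the "signal:" key prefix are illustrative.

```python
import re
from collections import Counter

# The repeating strace lines from the question.
TRACE = """\
clock_gettime(CLOCK_MONOTONIC, {394577, 508326476}) = 0
select(0, [], [], [], {0, 0}) = 0 (Timeout)
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
sigreturn()
"""


def tally(trace_text):
    """Count system calls and signals in strace output."""
    counts = Counter()
    for line in trace_text.splitlines():
        signal = re.match(r"^--- (SIG\w+)", line)
        # With -f, strace may prefix lines with "[pid N] ".
        syscall = re.match(r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(", line)
        if signal:
            counts["signal:" + signal.group(1)] += 1
        elif syscall:
            counts[syscall.group(1)] += 1
    return counts


counts = tally(TRACE)
for name, n in counts.most_common():
    print(f"{n:6d}  {name}")
```

strace's own -c option prints a similar per-call summary without any post-processing.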

Resources