Why doesn't my Spark application finish? (memory)

I run a Spark application (input 90 MB). According to the jobs UI, all my jobs have completed, but I notice that my application doesn't finish. These are the last messages (why don't I get the message that the SparkContext has successfully stopped?):
16/09/13 02:17:31 INFO DAGScheduler: Job 15 finished: saveAsTextFile at slowlyChangingDimension.java:244, took 1,222274 s
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_49_piece0 on 10.0.10.45:46789 in memory (size: 26.2 KB, free: 1424.7 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_49_piece0 on 10.0.10.51:54860 in memory (size: 26.2 KB, free: 1424.7 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.45:46789 in memory (size: 4.4 KB, free: 1424.7 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.53:56003 in memory (size: 4.4 KB, free: 1458.5 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.47:51300 in memory (size: 4.4 KB, free: 1458.5 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.51:54860 in memory (size: 4.4 KB, free: 1424.7 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.54:44644 in memory (size: 4.4 KB, free: 1458.6 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_48_piece0 on 10.0.10.52:32794 in memory (size: 4.4 KB, free: 1458.6 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_47_piece0 on 10.0.10.45:46789 in memory (size: 26.2 KB, free: 1424.7 MB)
16/09/13 02:17:38 INFO BlockManagerInfo: Removed broadcast_47_piece0 on 10.0.10.48:54348 in memory (size: 26.2 KB, free: 1458.6 MB)
16/09/13 02:17:38 INFO ContextCleaner: Cleaned shuffle 28

Since this is a batch application, you should call SparkContext.stop() (sc.stop()) as the last line of the program; the driver will then shut down cleanly and log that the SparkContext stopped.
If that doesn't fix it, can you share the code that you are running?

Related

What are the memory requirements for OrientDB? Can I run it on an EC2 micro?

I would like to run OrientDB on an EC2 micro (free-tier) instance. I am unable to find official OrientDB documentation that gives memory requirements; however, I found this question, which says 512 MB should be fine. I am running an EC2 micro instance, which has 1 GB of RAM. However, when I try to run OrientDB I get the JRE error shown below. My initial thought was that I needed to increase the JVM memory using -Xmx, but I guess the shell script would be the place to do this. Has anyone successfully run OrientDB on an EC2 micro instance, or run into this problem?
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000007a04a0000, 1431699456, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (malloc) failed to allocate 1431699456 bytes for committing reserved memory.
An error report file with more information is saved as:
/tmp/jvm-14728/hs_error.log
Here are the contents of the error log:
OS:Linux
uname:Linux 4.14.47-56.37.amzn1.x86_64 #1 SMP Wed Jun 6 18:49:01 UTC 2018 x86_64
libc:glibc 2.17 NPTL 2.17
rlimit: STACK 8192k, CORE 0k, NPROC 3867, NOFILE 4096, AS infinity
load average:0.00 0.00 0.00
/proc/meminfo:
MemTotal: 1011168 kB
MemFree: 322852 kB
MemAvailable: 822144 kB
Buffers: 83188 kB
Cached: 523056 kB
SwapCached: 0 kB
Active: 254680 kB
Inactive: 369952 kB
Active(anon): 18404 kB
Inactive(anon): 48 kB
Active(file): 236276 kB
Inactive(file): 369904 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 36 kB
Writeback: 0 kB
AnonPages: 18376 kB
Mapped: 31660 kB
Shmem: 56 kB
Slab: 51040 kB
SReclaimable: 41600 kB
SUnreclaim: 9440 kB
KernelStack: 1564 kB
PageTables: 2592 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 505584 kB
Committed_AS: 834340 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 49152 kB
DirectMap2M: 999424 kB
CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 63 stepping 2, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, avx, avx2, aes, erms, tsc
/proc/cpuinfo:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
stepping : 2
microcode : 0x3c
cpu MHz : 2400.043
cache size : 30720 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass
bogomips : 4800.05
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Memory: 4k page, physical 1011168k(322728k free), swap 0k(0k free)
vm_info: OpenJDK 64-Bit Server VM (24.181-b00) for linux-amd64 JRE (1.7.0_181-b00), built on Jun 5 2018 20:36:03 by "mockbuild" with gcc 4.8.5 20150623 (Red Hat 4.8.5-28)
time: Mon Aug 20 20:51:08 2018
elapsed time: 0 seconds
OrientDB can easily run in 512 MB, though your performance and throughput will not be as high. In OrientDB 3.0.x you can use the environment variable ORIENTDB_OPTS_MEMORY to set it. On the command line I can, for example, run:
cd $ORIENTDB_HOME/bin
export ORIENTDB_OPTS_MEMORY="-Xmx512m"
./server.sh
(where $ORIENTDB_HOME is where you have OrientDB installed) and I'm running with 512 MB of memory.
As an aside, if you look in $ORIENTDB_HOME/bin/server.sh you'll see there is even code to check whether the server is running on a Raspberry Pi, and those range from 256 MB to 1 GB, so the t2.micro will run just fine.
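For context, the numbers in the hs_err data above show why startup fails with the default settings, whatever requested that allocation: the JVM tried to commit about 1.4 GB in one go, more than the instance had available, while a 512 MB heap fits with room to spare. A quick sketch of the arithmetic:

```python
# Arithmetic from the hs_err data above: the size of the failed commit
# versus what the 1 GB instance actually had available.

failed_commit_bytes = 1_431_699_456     # from the os::commit_memory failure
mem_available_kb = 822_144              # MemAvailable in /proc/meminfo

failed_commit_mb = failed_commit_bytes / 1024 / 1024
available_mb = mem_available_kb / 1024

print(f"requested: {failed_commit_mb:.0f} MB, available: {available_mb:.0f} MB")
# requested: 1365 MB, available: 803 MB -- the commit cannot succeed,
# while a 512 MB heap (-Xmx512m) fits comfortably
assert failed_commit_mb > available_mb > 512
```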

When Cassandra is running, almost all RAM is consumed. Why?

I have CentOS 6.8, Cassandra 3.9, and 32 GB of RAM. Once Cassandra starts, it begins consuming memory, and the 'Cached' value keeps climbing as I query from CQLSH or Apache Spark; in the process, very little memory is left for other work such as cron jobs.
Here are some details from my system
free -m
total used free shared buffers cached
Mem: 32240 32003 237 0 41 24010
-/+ buffers/cache: 7950 24290
Swap: 2047 25 2022
And here is the output of top -M command
top - 08:54:39 up 5 days, 16:24, 4 users, load average: 1.22, 1.20, 1.29
Tasks: 205 total, 2 running, 203 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.5%us, 1.2%sy, 19.8%ni, 75.3%id, 0.1%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 31.485G total, 31.271G used, 219.410M free, 42.289M buffers
Swap: 2047.996M total, 25.867M used, 2022.129M free, 23.461G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14313 cassandr 20 0 595g 28g 22g S 144.5 91.3 300:56.34 java
You can see only about 220 MB is free and 23.46 GB is cached.
My question is: how can I configure Cassandra so that it limits the 'cached' memory to a certain value and leaves more RAM available for other processes?
Thanks in advance.
In Linux, cached memory like your 23 GB is generally fine. That memory is used as filesystem cache and the like; it is not held by Cassandra itself. Linux systems tend to use all available memory.
This speeds the system up in many ways by avoiding disk reads.
The cached memory is still usable: just start processes and use your RAM, and the kernel will free it immediately.
You can set the heap sizes in cassandra-env.sh in the conf folder. This article should help: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
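A quick check of the `free -m` arithmetic above makes the point concrete: almost all of the "used" memory is page cache that the kernel will hand back on demand.

```python
# Figures from the `free -m` output above, all in MB.
total, used, free_mb = 32240, 32003, 237
buffers, cached = 41, 24010

# "free" really is total - used:
assert total - used == free_mb

# What is actually available to new processes is free + buffers + cached,
# which is what the "-/+ buffers/cache" line reports (within rounding):
available = free_mb + buffers + cached
print(available)    # 24288, vs. 24290 on the "-/+ buffers/cache" line
assert abs(available - 24290) <= 5
```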

Spark: Not enough space to cache rdd in container while a lot of total storage memory is still free

I have a 30-node cluster; each node has 32 cores and 240 GB of memory (AWS cr1.8xlarge instances). I use the following configuration:
--driver-memory 200g --driver-cores 30 --executor-memory 70g --executor-cores 8 --num-executors 90
I can see from the job tracker that I still have a lot of total storage memory left, but in one of the containers I got the message below saying Storage limit = 28.3 GB. I am wondering where this 28.3 GB comes from. My memoryFraction for storage is 0.45.
And how do I solve this "Not enough space to cache rdd" issue? Should I use more partitions or change the default parallelism ... since I still have a lot of total storage memory unused. Thanks!
15/12/05 22:39:36 WARN storage.MemoryStore: Not enough space to cache rdd_31_310 in memory! (computed 1326.6 MB so far)
15/12/05 22:39:36 INFO storage.MemoryStore: Memory use = 9.6 GB (blocks) + 18.1 GB (scratch space shared across 4 tasks(s)) = 27.7 GB. Storage limit = 28.3 GB.
15/12/05 22:39:36 WARN storage.MemoryStore: Not enough space to cache rdd_31_136 in memory! (computed 1835.8 MB so far)
15/12/05 22:39:36 INFO storage.MemoryStore: Memory use = 9.6 GB (blocks) + 18.1 GB (scratch space shared across 5 tasks(s)) = 27.7 GB. Storage limit = 28.3 GB.
15/12/05 22:39:36 INFO executor.Executor: Finished task 136.0 in stage 12.0 (TID 85168). 1272 bytes result sent to driver
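On where the figure comes from: in pre-1.6 Spark (the legacy memory manager, which these 15/12/05 logs suggest), each executor's storage pool is roughly executor heap × spark.storage.memoryFraction × spark.storage.safetyFraction (0.9 by default), and the limit applies per executor, not cluster-wide, which is why one container can run out while plenty of total storage memory remains. A quick check against the poster's settings:

```python
executor_memory_gb = 70        # --executor-memory 70g
memory_fraction = 0.45         # the poster's spark.storage.memoryFraction
safety_fraction = 0.9          # spark.storage.safetyFraction default

storage_limit_gb = executor_memory_gb * memory_fraction * safety_fraction
print(f"{storage_limit_gb:.2f} GB")   # 28.35 GB
# matches the "Storage limit = 28.3 GB" log line; the real limit is computed
# from the JVM's Runtime.maxMemory(), slightly under the nominal 70 GB,
# hence 28.3 rather than 28.35
assert abs(storage_limit_gb - 28.3) < 0.1
```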

Hazelcast memory is continuously increasing

I have a Hazelcast cluster with two machines.
The only object in the cluster is a map. Analysing the log files, I noticed that the health monitor reports a slow increase in memory consumption even though no new entries are being added to the map (see the sample log entries below).
Any ideas what may be causing the memory increase?
2015-09-16 10:45:49 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=97.6M, memory.free=30.4M, memory.total=128.0M, memory.max=128.0M, memory.used/total=76.27%, memory.used/max=76.27%, load.process=0.00%, load.system=1.00%, load.systemAverage=3.00%, thread.count=96, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=1, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2
2015-09-16 10:46:02 INFO InternalPartitionService:? - [10.11.173.129]:5903 [dev] [3.2.1] Remaining migration tasks in queue = 51
2015-09-16 10:46:12 DEBUG TeleavisoIvrLoader:71 - Checking for new files...
2015-09-16 10:46:13 INFO InternalPartitionService:? - [10.11.173.129]:5903 [dev] [3.2.1] All migration tasks has been completed, queues are empty.
2015-09-16 10:46:19 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=103.9M, memory.free=24.1M, memory.total=128.0M, memory.max=128.0M, memory.used/total=81.21%, memory.used/max=81.21%, load.process=0.00%, load.system=1.00%, load.systemAverage=2.00%, thread.count=73, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2
2015-09-16 10:46:49 INFO HealthMonitor:? - [10.11.173.129]:5903 [dev] [3.2.1] memory.used=105.1M, memory.free=22.9M, memory.total=128.0M, memory.max=128.0M, memory.used/total=82.11%, memory.used/max=82.11%, load.process=0.00%, load.system=1.00%, load.systemAverage=1.00%, thread.count=73, thread.peakCount=107, event.q.size=0, executor.q.async.size=0, executor.q.client.size=0, executor.q.operation.size=0, executor.q.query.size=0, executor.q.scheduled.size=0, executor.q.io.size=0, executor.q.system.size=0, executor.q.operation.size=0, executor.q.priorityOperation.size=0, executor.q.response.size=0, operations.remote.size=0, operations.running.size=0, proxy.count=2, clientEndpoint.count=0, connection.active.count=2, connection.count=2
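One observation on the log itself, using values copied from the three HealthMonitor samples: the used/free figures are consistent with the fixed 128 MB heap (memory.total = memory.max), so the slow growth is real heap consumption, roughly 7.5 MB/min in this window, which is the number any explanation needs to account for. A small sketch:

```python
# The three HealthMonitor samples above, copied from the log:
# (seconds since the first sample, memory.used MB, memory.free MB)
samples = [(0, 97.6, 30.4), (30, 103.9, 24.1), (60, 105.1, 22.9)]

# used + free == memory.total (128.0M) in every sample, so the reported
# growth is real heap consumption, not a reporting artifact:
for _, used, free_mb in samples:
    assert abs(used + free_mb - 128.0) < 0.05

# average growth across the one-minute window shown:
growth_mb_per_min = (samples[-1][1] - samples[0][1]) * 60 / (samples[-1][0] - samples[0][0])
print(f"{growth_mb_per_min:.1f} MB/min")   # 7.5 MB/min
```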

Decomposing Passenger memory stats

I've been looking for info on Phusion Passenger's memory stats (passenger-memory-stats) and haven't quite found what I'm looking for. For example, below are the stats for my latest app.
---- Passenger processes -----
PID VMSize Private Name
------------------------------
4238 22.9 MB 0.3 MB PassengerWatchdog
4241 31.7 MB 0.4 MB PassengerHelperAgent
4243 42.5 MB 6.7 MB Passenger spawn server
4246 72.9 MB 0.7 MB PassengerLoggingAgent
4403 273.3 MB 31.4 MB Passenger ApplicationSpawner: /var/www/TheApp
4556 281.0 MB 34.9 MB Rack: /var/www/TheApp
### Processes: 6
### Total private dirty RSS: 74.42 MB
What I'd like to grasp in more detail is:
What is the relation between VMSize and Private, and what should I look for in each?
My current understanding is that the private dirty RSS is the real memory usage. If so, do I need to care about the virtual memory sizes at all?
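To tie the numbers together: the "Total private dirty RSS" line is just the sum of the Private column; presumably the per-process figures are rounded to one decimal place, which would explain the small 74.4 vs 74.42 MB gap.

```python
# Private column from the passenger-memory-stats output above, in MB
private_mb = [0.3, 0.4, 6.7, 0.7, 31.4, 34.9]

total_mb = sum(private_mb)
print(f"{total_mb:.1f} MB")   # 74.4 MB
# the reported 74.42 MB is the same sum taken before the per-process
# values were rounded to one decimal place
assert abs(total_mb - 74.42) < 0.1
```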
