I get 10 to 20 of these errors all within 1 second of each other:
Memcache_connect Connection timed out
This happens several times a day, on a server with about 2500 daily active users and 1 GB of RAM. I don't think the server is swapping: most of the time I'm at less than 75% memory utilization and less than 25% CPU utilization, and the load averages are usually less than 9. I followed the debugging instructions here: http://code.google.com/p/memcached/wiki/Timeouts
Here are my memcache stats:
stats
STAT pid 15365
STAT uptime 173776
STAT time 1329157234
STAT version 1.2.8
STAT pointer_size 32
STAT rusage_user 1171.316354
STAT rusage_system 7046.435826
STAT curr_items 28494
STAT total_items 4039745
STAT bytes 3371127
STAT curr_connections 36
STAT total_connections 102206685
STAT connection_structures 328
STAT cmd_flush 0
STAT cmd_get 73532547
STAT cmd_set 4039745
STAT get_hits 40779162
STAT get_misses 32753385
STAT evictions 0
STAT bytes_read 2153565193
STAT bytes_written 38768040520
STAT limit_maxbytes 67108864
STAT threads 2
STAT accepting_conns 1
STAT listen_disabled_num 0
My hypothesis is that I'm running out of TIME_WAIT buckets:
netstat -n | grep TIME_WAIT | wc -l
51892
But I don't know if that's too high or not.
I'm on Solaris (on the Joyent servers) and tcp_time_wait_interval is set to 60000. Some other articles suggest decreasing this setting to 30000 or 15000, but that doesn't seem like a scalable solution to me.
How do I know that it's running out of buckets? Should I increase the number of TIME_WAIT buckets? If so, how?
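For reference, this is roughly how the interval can be inspected and lowered on Solaris with ndd (values are in milliseconds); I haven't committed to this change since, as I said, it doesn't feel like a scalable fix:

# current value
ndd -get /dev/tcp tcp_time_wait_interval
# lower it to 30 seconds, as some articles suggest
ndd -set /dev/tcp tcp_time_wait_interval 30000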
I have CentOS 6.8, Cassandra 3.9, and 32 GB of RAM. Once Cassandra is started it begins consuming memory, and the 'Cached' value keeps climbing as I query from CQLSH or Apache Spark, so that very little memory remains for other processing such as cron execution.
Here are some details from my system
free -m
total used free shared buffers cached
Mem: 32240 32003 237 0 41 24010
-/+ buffers/cache: 7950 24290
Swap: 2047 25 2022
And here is the output of the top -M command:
top - 08:54:39 up 5 days, 16:24, 4 users, load average: 1.22, 1.20, 1.29
Tasks: 205 total, 2 running, 203 sleeping, 0 stopped, 0 zombie
Cpu(s): 3.5%us, 1.2%sy, 19.8%ni, 75.3%id, 0.1%wa, 0.1%hi, 0.0%si, 0.0%st
Mem: 31.485G total, 31.271G used, 219.410M free, 42.289M buffers
Swap: 2047.996M total, 25.867M used, 2022.129M free, 23.461G cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
14313 cassandr 20 0 595g 28g 22g S 144.5 91.3 300:56.34 java
You can see that only 220 MB is free and 23.46 GB is cached.
My question is: how do I configure Cassandra so that it limits the 'cached' memory to a certain value and leaves more RAM available for other processes?
Thanks in advance.
In Linux, cached memory such as your 23 GB is generally just fine. This memory is used as filesystem cache and so on, not by Cassandra itself; Linux systems tend to use all available memory. This helps speed up your system in many ways by avoiding disk reads.
You can still use the cached memory: just start processes and use your RAM, and the kernel will free it immediately.
You can set the heap sizes in cassandra-env.sh in the conf folder. This article should help: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsTuneJVM.html
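For example, a minimal sketch of the heap settings in cassandra-env.sh (the values here are only placeholders; tune them for your 32 GB box and workload):

# conf/cassandra-env.sh
# Fix the JVM heap instead of letting Cassandra auto-size it from system RAM
MAX_HEAP_SIZE="8G"
# Young generation size, commonly a fraction of MAX_HEAP_SIZE
HEAP_NEWSIZE="2G"

Keep in mind this only caps Cassandra's own heap; the 'Cached' value you see belongs to the kernel page cache, not to the Cassandra process.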
I have a RHEL 6 cluster with cman, corosync, and pacemaker.
After adding new members I got an error in GFS mounting: GFS never mounts on the servers.
group_tool
fence domain
member count 4
victim count 0
victim now 0
master nodeid 1
wait state none
members 1 2 3 4
dlm lockspaces
name clvmd
id 0x4104eefa
flags 0x00000000
change member 4 joined 1 remove 0 failed 0 seq 1,1
members 1 2 3 4
gfs mountgroups
name lv_gfs_01
id 0xd5eacc83
flags 0x00000005 blocked,join
change member 3 joined 1 remove 0 failed 0 seq 1,1
members 1 2 3
In processes:
root 2695 2690 0 08:03 pts/1 00:00:00 /bin/bash /etc/init.d/gfs2 start
root 2702 2695 0 08:03 pts/1 00:00:00 /bin/bash /etc/init.d/gfs2 start
root 2704 2703 0 08:03 pts/1 00:00:00 /sbin/mount.gfs2 /dev/mapper/vg_shared-lv_gfs_01 /mnt/share -o rw,_netdev,noatime,nodiratime
fsck.gfs2 -yf /dev/vg_shared/lv_gfs_01
Initializing fsck
jid=1: Replayed 0 of 0 journaled data blocks
jid=1: Replayed 20 of 21 metadata blocks
Recovering journals (this may take a while)
Journal recovery complete.
Validating Resource Group index.
Level 1 rgrp check: Checking if all rgrp and rindex values are good.
(level 1 passed)
RGs: Consistent: 183 Cleaned: 1 Inconsistent: 0 Fixed: 0 Total: 184
2 blocks may need to be freed in pass 5 due to the cleaned resource groups.
Starting pass1
Pass1 complete
Starting pass1b
Pass1b complete
Starting pass1c
Pass1c complete
Starting pass2
Pass2 complete
Starting pass3
Pass3 complete
Starting pass4
Pass4 complete
Starting pass5
Block 11337799 (0xad0047) bitmap says 1 (Data) but FSCK saw 0 (Free)
Fixed.
Block 11337801 (0xad0049) bitmap says 1 (Data) but FSCK saw 0 (Free)
Fixed.
RG #11337739 (0xad000b) free count inconsistent: is 65500 should be 65502
RG #11337739 (0xad000b) Inode count inconsistent: is 15 should be 13
Resource group counts updated
Pass5 complete
The statfs file is wrong:
Current statfs values:
blocks: 12057320 (0xb7fae8)
free: 9999428 (0x989444)
dinodes: 15670 (0x3d36)
Calculated statfs values:
blocks: 12057320 (0xb7fae8)
free: 9999432 (0x989448)
dinodes: 15668 (0x3d34)
The statfs file was fixed.
Writing changes to disk
gfs2_fsck complete
gfs2_edit -p 0xad0047 field di_size /dev/vg_shared/lv_gfs_01
10 (Block 11337799 is type 10: Ext. attrib which is not implemented)
How do I drop the blocked,join flag from GFS?
I solved it by rebooting all of the servers that use GFS; this is one of the unpleasant behaviours of GFS. GFS locking is kernel-based, and in a few cases it can only be resolved by a reboot.
There is a very useful manual here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/Global_File_System_2/index.html
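Roughly what I did on each node, in case it helps (the paths and init script match your process list):

umount /mnt/share        # will usually hang or fail while the mountgroup is blocked
reboot
# after the node is back up:
group_tool               # re-check: the gfs mountgroup should no longer show "blocked,join"
/etc/init.d/gfs2 start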
I'm running a Docker container for a daemon job, and the container gets killed every few hours. I'd like to add a hook (callback), such as:
restart the container and then run some commands on the restarted container
Is it possible to do that with Docker?
Otherwise, is there a better approach to detect this behaviour with Python or Ruby?
java invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
java cpuset=bcb33ac552c23cfa531814fbc3a64ae5cd8d85aa19245e1560e0ce3e3310c798 mems_allowed=0
CPU: 3 PID: 14182 Comm: java Not tainted 4.1.0-x86_64-linode59 #1
0000000000000000 ffff8800dc520800 ffffffff8195b396 ffff880002cf5ac0
ffffffff81955e58 ffff8800a2918c38 ffff8800f43c3e78 0000000000000000
ffff8800b5f687f0 000000000000000d ffffea0002d7da30 ffff88005bebdec0
Call Trace:
[<ffffffff8195b396>] ? dump_stack+0x40/0x50
[<ffffffff81955e58>] ? dump_header+0x7b/0x1fe
[<ffffffff8119655d>] ? __do_fault+0x3f/0x79
[<ffffffff811789d6>] ? find_lock_task_mm+0x2c/0x7b
[<ffffffff81961c55>] ? _raw_spin_unlock_irqrestore+0x2d/0x3e
[<ffffffff81178dee>] ? oom_kill_process+0xc5/0x387
[<ffffffff811789d6>] ? find_lock_task_mm+0x2c/0x7b
[<ffffffff811b76be>] ? mem_cgroup_oom_synchronize+0x3ad/0x4c7
[<ffffffff811b6c92>] ? mem_cgroup_is_descendant+0x29/0x29
[<ffffffff811796e7>] ? pagefault_out_of_memory+0x1c/0xc1
[<ffffffff81963e58>] ? page_fault+0x28/0x30
Task in /docker/bcb33ac552c23cfa531814fbc3a64ae5cd8d85aa19245e1560e0ce3e3310c798 killed as a result of limit of /docker/bcb33ac552c23cfa531814fbc3a64ae5cd8d85aa19245e1560e0ce3e3310c798
memory: usage 524288kB, limit 524288kB, failcnt 14716553
memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Memory cgroup stats for /docker/bcb33ac552c23cfa531814fbc3a64ae5cd8d85aa19245e1560e0ce3e3310c798: cache:72KB rss:524216KB rss_huge:0KB mapped_file:64KB writeback:0KB inactive_anon:262236KB active_anon:262044KB inactive_file:4KB active_file:4KB unevictable:0KB
[ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
[14097] 1000 14097 5215 20 17 3 47 0 entry_point.sh
[14146] 0 14146 11960 0 30 3 101 0 sudo
[14150] 1000 14150 1112 7 8 3 22 0 xvfb-run
[14162] 1000 14162 51929 11220 90 3 95 0 Xvfb
[14169] 1000 14169 658641 18749 120 6 0 0 java
[14184] 1000 14184 28364 555 58 3 0 0 fluxbox
[24639] 1000 24639 5212 59 16 3 0 0 bash
Memory cgroup out of memory: Kill process 14169 (java) score 96 or sacrifice child
Killed process 14169 (java) total-vm:2634564kB, anon-rss:74996kB, file-rss:0kB
Docker itself doesn't have any such mechanism. All you can do is pass the --restart flag to tell Docker when it should try to bring a failed container back.
However, most places where you want to keep a container up you'll want something more complex than the --restart flag anyway. Once you're using runit or systemd to manage your containers it's easy to add in a little extra shell code to figure out why the last invocation crashed and take some special actions based on that.
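As a rough sketch of that "extra shell code" idea (the container and image names and the hook script path are placeholders):

# Let Docker bring the container back up whenever it dies
docker run -d --name my_daemon --restart=on-failure my_image

# Watch the event stream and run a hook each time the container dies and is restarted
docker events --filter 'container=my_daemon' --filter 'event=die' |
while read event; do
    sleep 5                                   # give --restart a moment to bring it back
    docker exec my_daemon /path/to/post-restart-hook.sh
done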
I currently have a Rails app set up on a DigitalOcean VPS (1 GB RAM) through Cloud 66. The problem is that the VPS' memory fills up with Passenger processes.
The output of passenger-status:
# passenger-status
Version : 4.0.45
Date : 2014-09-23 09:04:37 +0000
Instance: 1762
----------- General information -----------
Max pool size : 2
Processes : 2
Requests in top-level queue : 0
----------- Application groups -----------
/var/deploy/cityspotters/web_head/current#default:
App root: /var/deploy/cityspotters/web_head/current
Requests in queue: 0
* PID: 7675 Sessions: 0 Processed: 599 Uptime: 39m 35s
CPU: 1% Memory : 151M Last used: 1m 10s ago
* PID: 7686 Sessions: 0 Processed: 477 Uptime: 39m 34s
CPU: 1% Memory : 115M Last used: 10s ago
The max_pool_size seems to be configured correctly.
The output of passenger-memory-stats:
# passenger-memory-stats
Version: 4.0.45
Date : 2014-09-23 09:10:41 +0000
------------- Apache processes -------------
*** WARNING: The Apache executable cannot be found.
Please set the APXS2 environment variable to your 'apxs2' executable's filename, or set the HTTPD environment variable to your 'httpd' or 'apache2' executable's filename.
--------- Nginx processes ---------
PID PPID VMSize Private Name
-----------------------------------
1762 1 51.8 MB 0.4 MB nginx: master process /opt/nginx/sbin/nginx
7616 1762 53.0 MB 1.8 MB nginx: worker process
### Processes: 2
### Total private dirty RSS: 2.22 MB
----- Passenger processes -----
PID VMSize Private Name
-------------------------------
7597 218.3 MB 0.3 MB PassengerWatchdog
7600 565.7 MB 1.1 MB PassengerHelperAgent
7606 230.8 MB 1.0 MB PassengerLoggingAgent
7675 652.0 MB 151.7 MB Passenger RackApp: /var/deploy/cityspotters/web_head/current
7686 652.1 MB 116.7 MB Passenger RackApp: /var/deploy/cityspotters/web_head/current
### Processes: 5
### Total private dirty RSS: 270.82 MB
.. 2 Passenger RackApp processes, OK.
But when I use the htop command, there seem to be a lot of Passenger RackApp processes. We're also running Sidekiq with the default configuration.
New Relic Server shows the same picture of memory usage.
I tried tuning Passenger settings and adding a load balancer and another server, but I honestly don't know what to do from here. How can I find out what's causing so much memory usage?
Update: I had to restart nginx because of some changes and it seemed to free quite a lot of memory.
Press Shift-H to hide threads in htop. Those aren't processes but threads within a process. The key column is RSS: you have two passenger processes at 209MB and 215MB and one Sidekiq process at 154MB.
Short answer: this is completely normal memory usage for a Rails app; 1 GB is simply a little small if you want multiple processes for both Passenger and Sidekiq. I'd cut Passenger down to one process.
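If you're on the Nginx integration, that's one directive in the Passenger config (Cloud 66 manages this file, so you may need to change it through their settings instead):

# nginx.conf, http block
passenger_max_pool_size 1;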
Does your application create child processes? If so, then it's likely that those extra "Passenger RackApp" processes are not actually processes created by Phusion Passenger, but are in fact processes created by your own app. You should double check whether your app spawns child processes and whether you clean up those child processes correctly. Also double check whether any libraries you use, also properly clean up their child processes.
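A quick way to check that, using one of the PIDs from your passenger-status output (7675 here, yours will differ):

# Show the process tree under one Passenger app process;
# anything hanging off it was spawned by your app, not by Passenger
pstree -p 7675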
I see that you're using Sidekiq and you've configured 25 Sidekiq processes. Those are also eating a lot of memory. A Sidekiq process eats just as much memory as a Passenger RackApp process, because both of them load your entire application (including Rails) in memory. Try reducing the number of Sidekiq processes.
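If those 25 turn out to be Sidekiq's default 25 worker threads rather than separate processes, lowering the concurrency already helps; a sketch (the value 5 is just an example):

# Start Sidekiq with fewer worker threads (default is 25)
bundle exec sidekiq -c 5 -e production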
I am trying to figure out what is wrong with my Debian server. I am getting warnings about not having enough free memory: top (as you can see below) says that 1.8 GB is consumed, but I am unable to find which application is responsible for it. The only thing running is Tomcat, which according to top consumes ~25%, i.e. about 530 MB. That leaves more than 1 GB that I cannot account for!
Tasks: 54 total, 1 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s):100.0%us, 0.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2150400k total, 1877728k used, 272672k free, 0k buffers
Swap: 0k total, 0k used, 0k free, 0k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3271 root 18 0 1559m 530m 12m S 0 25.2 1:44.31 java
1568 mysql 15 0 270m 71m 7332 S 0 3.4 0:50.79 mysqld
(Full top output here)
Linux systems always try to use as much RAM as is available for various functions, like caching of executables or even just page reads from disk. That's what you bought your fast RAM for, after all.
You can find out more about your system by doing a
cat /proc/meminfo
More info in this helpful blog post
If you find out that a lot of it is used by the cache, then you don't have to worry about the system. If individual processes warn you about memory issues, then you'll have to check their settings for any memory-limiting options; many server processes, such as PHP- or Java-based ones, have those.
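For example, to see how much of the "used" memory is really just cache (these are standard /proc/meminfo fields):

# totals, free memory, buffers and page cache straight from the kernel
grep -E '^(MemTotal|MemFree|Buffers|Cached):' /proc/meminfo

# same picture from free: the "-/+ buffers/cache" line shows usage with cache excluded
free -m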
Questions of this nature are also probably more at home at https://serverfault.com/
As I can see, your output shows NO swap space:
Swap: 0k total, 0k used, 0k free, 0k cached
Either there is no swap partition available, or the swap space is not mounted. You can manually create a swapfile and enable it as the active swap; a how-to sketch follows below.
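A minimal sketch (the path /swapfile and the 1 GB size are just examples):

# create and enable a 1 GB swapfile
dd if=/dev/zero of=/swapfile bs=1M count=1024
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

# make it persistent across reboots
echo '/swapfile none swap sw 0 0' >> /etc/fstab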
To test your real usage: reboot the machine, check the amount of memory used, and re-check after an hour. Some processes are memory hogs, like apache or ntop.
Refer to: check memory; display sorted memory usage.
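For example, to list the biggest memory consumers (standard Linux ps options):

# processes sorted by resident memory, largest first
ps aux --sort=-rss | head -n 15
# or press Shift-M inside top to sort by memory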