I've written a simulator, which is distributed over two hosts. When I launch a few thousand processes, after about 10 minutes and half a million events written, my main Erlang (OTP v22) virtual machine crashes with this message:
no next heap size found: 18446744071789822643, offset 0.
It's always that same number - 18446744071789822643.
Because my server is very capable, the crash dump is also huge and I can't view it on my headless server (no WX installed).
Are there any tips on what I can look at?
What would be the first things I can try out to debug this issue?
First, see what memory() says:
> memory().
[{total,18480016},
{processes,4615512},
{processes_used,4614480},
{system,13864504},
{atom,331273},
{atom_used,306525},
{binary,47632},
{code,5625561},
{ets,438056}]
Check which one is growing - processes, binary, ets?
If it's processes, try typing i(). in the Erlang shell while the processes are running. You'll see something like:
Pid Initial Call Heap Reds Msgs
Registered Current Function Stack
<0.0.0> otp_ring0:start/2 233 1263 0
init init:loop/1 2
<0.1.0> erts_code_purger:start/0 233 44 0
erts_code_purger erts_code_purger:wait_for_request 0
<0.2.0> erts_literal_area_collector:start 233 9 0
erts_literal_area_collector:msg_l 5
<0.3.0> erts_dirty_process_signal_handler 233 128 0
erts_dirty_process_signal_handler 2
<0.4.0> erts_dirty_process_signal_handler 233 9 0
erts_dirty_process_signal_handler 2
<0.5.0> erts_dirty_process_signal_handler 233 9 0
erts_dirty_process_signal_handler 2
<0.8.0> erlang:apply/2 6772 238183 0
erl_prim_loader erl_prim_loader:loop/3 5
Look for a process with a very big heap, and that's where you'd start looking for a memory leak.
(If you weren't running headless, I'd suggest starting Observer with observer:start() and looking at what's happening in the Erlang node.)
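A side note on the number in the error message: it sits just below 2^64, which is what a negative signed 64-bit value looks like when printed as unsigned. The sketch below uses Python purely for the arithmetic; `to_signed64` is a hypothetical helper, not part of Erlang/OTP. This is only an observation about the number, not a diagnosis. It may hint that a requested heap size went negative somewhere, which would fit a single process trying to grow very large.

```python
def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


reported = 18446744071789822643  # the number from the crash message
print(to_signed64(reported))     # -1919728973
```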
Cephadm Pacific v16.2.7
Our Ceph cluster is stuck with degraded PGs, and several OSDs are down.
Reason: the OSDs filled up.
Things we tried
Changed the full-ratio values to the maximum possible combination (not sure if done right?), keeping
backfillfull < nearfull, nearfull < full, and full < failsafe_full
ceph-objectstore-tool - tried to delete some PGs to recover space
Tried to mount an OSD and delete PGs to recover some space, but not sure how to do that in BlueStore.
Global Recovery Event - stuck forever
ceph -s
cluster:
id: a089a4b8-2691-11ec-849f-07cde9cd0b53
health: HEALTH_WARN
6 failed cephadm daemon(s)
1 hosts fail cephadm check
Reduced data availability: 362 pgs inactive, 6 pgs down, 287 pgs peering, 48 pgs stale
Degraded data redundancy: 5756984/22174447 objects degraded (25.962%), 91 pgs degraded, 84 pgs undersized
13 daemons have recently crashed
3 slow ops, oldest one blocked for 31 sec, daemons [mon.raspi4-8g-18,mon.raspi4-8g-20] have slow ops.
services:
mon: 5 daemons, quorum raspi4-8g-20,raspi4-8g-25,raspi4-8g-18,raspi4-8g-10,raspi4-4g-23 (age 2s)
mgr: raspi4-8g-18.slyftn(active, since 3h), standbys: raspi4-8g-12.xuuxmp, raspi4-8g-10.udbcyy
osd: 19 osds: 15 up (since 2h), 15 in (since 2h); 6 remapped pgs
data:
pools: 40 pools, 636 pgs
objects: 4.28M objects, 4.9 TiB
usage: 6.1 TiB used, 45 TiB / 51 TiB avail
pgs: 56.918% pgs not active
5756984/22174447 objects degraded (25.962%)
2914/22174447 objects misplaced (0.013%)
253 peering
218 active+clean
57 undersized+degraded+peered
25 stale+peering
20 stale+active+clean
19 active+recovery_wait+undersized+degraded+remapped
10 active+recovery_wait+degraded
7 remapped+peering
7 activating
6 down
2 active+undersized+remapped
2 stale+remapped+peering
2 undersized+remapped+peered
2 activating+degraded
1 active+remapped+backfill_wait
1 active+recovering+undersized+degraded+remapped
1 undersized+peered
1 active+clean+scrubbing+deep
1 active+undersized+degraded+remapped+backfill_wait
1 stale+active+recovery_wait+undersized+degraded+remapped
progress:
Global Recovery Event (2h)
[==========..................] (remaining: 4h)
Some versions of BlueStore were susceptible to the BlueFS log growing extremely large, to the point of making it impossible to boot the OSD. This state is indicated by a boot that takes very long and fails in the _replay function.
This can be fixed with:
ceph-bluestore-tool fsck --path <osd path> --bluefs_replay_recovery=true
It is advised to first check whether the rescue process would be successful:
ceph-bluestore-tool fsck --path <osd path> --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true
If the above fsck is successful, the fix procedure can be applied.
Special thanks: this was solved with the help of a dewDrive Cloud backup faculty member.
We have a Java application running on Mule. The -Xmx value is configured as 6144M, but we routinely see the overall memory usage of the process climb and climb. It was getting close to 20 GB the other day before we proactively restarted it.
Thu Jun 30 03:05:57 CDT 2016
top - 03:05:58 up 149 days, 6:19, 0 users, load average: 0.04, 0.04, 0.00
Tasks: 164 total, 1 running, 163 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.2%us, 1.7%sy, 0.0%ni, 93.9%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24600552k total, 21654876k used, 2945676k free, 440828k buffers
Swap: 2097144k total, 84256k used, 2012888k free, 1047316k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3840 myuser 20 0 23.9g 18g 53m S 0.0 79.9 375:30.02 java
The jps command shows:
10671 Jps
3840 MuleContainerBootstrap
The jstat command shows:
S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT
37376.0 36864.0 16160.0 0.0 2022912.0 1941418.4 4194304.0 445432.2 78336.0 66776.7 232 7.044 17 17.403 24.447
The startup arguments are (sensitive bits have been changed):
3840 MuleContainerBootstrap -Dmule.home=/mule -Dmule.base=/mule -Djava.net.preferIPv4Stack=TRUE -XX:MaxPermSize=256m -Djava.endorsed.dirs=/mule/lib/endorsed -XX:+HeapDumpOnOutOfMemoryError -Dmyapp.lib.path=/datalake/app/ext_lib/ -DTARGET_ENV=prod -Djava.library.path=/opt/mapr/lib -DksPass=mypass -DsecretKey=aeskey -DencryptMode=AES -Dkeystore=/mule/myStore -DkeystoreInstance=JCEKS -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dmule.mmc.bind.port=1521 -Xms6144m -Xmx6144m -Djava.library.path=%LD_LIBRARY_PATH%:/mule/lib/boot -Dwrapper.key=a_guid -Dwrapper.port=32000 -Dwrapper.jvm.port.min=31000 -Dwrapper.jvm.port.max=31999 -Dwrapper.disable_console_input=TRUE -Dwrapper.pid=10744 -Dwrapper.version=3.5.19-st -Dwrapper.native_library=wrapper -Dwrapper.arch=x86 -Dwrapper.service=TRUE -Dwrapper.cpu.timeout=10 -Dwrapper.jvmid=1 -Dwrapper.lang.domain=wrapper -Dwrapper.lang.folder=../lang
Adding up the "capacity" items from jstat shows that only my 6144m is being used for the Java heap. Where the heck is the rest of the memory being used? Stack memory? Native heap? I'm not even sure how to proceed.
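For reference, the capacity sum can be checked with a short Python sketch. The header and values are copied from the jstat output above; `parse_jstat` is a hypothetical helper, and the columns are assumed to be in KB, as jstat reports them.

```python
JSTAT_HEADER = "S0C S1C S0U S1U EC EU OC OU PC PU YGC YGCT FGC FGCT GCT"
JSTAT_VALUES = (
    "37376.0 36864.0 16160.0 0.0 2022912.0 1941418.4 4194304.0 "
    "445432.2 78336.0 66776.7 232 7.044 17 17.403 24.447"
)


def parse_jstat(header: str, values: str) -> dict:
    """Pair each jstat column name with its numeric value."""
    return dict(zip(header.split(), (float(v) for v in values.split())))


stats = parse_jstat(JSTAT_HEADER, JSTAT_VALUES)

# Survivor spaces + eden + old generation make up the Java heap.
heap_capacity_kb = sum(stats[col] for col in ("S0C", "S1C", "EC", "OC"))
print(heap_capacity_kb / 1024)  # 6144.0 (MB), matching -Xmx6144m
print(stats["PC"] / 1024)       # 76.5 (MB) of permanent generation on top
```

Heap plus permgen comes to a little over 6 GB, so the remaining resident memory has to be outside the areas that jstat reports.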
If left to continue growing, it will consume all memory on the system and we will eventually see the system freeze up throwing swap space errors.
I have another process that is starting to grow. Currently at about 11g resident memory.
pmap 10746 > pmap_10746.txt
cat pmap_10746.txt | grep anon | cut -c18-25 | sort -h | uniq -c | sort -rn | less
Top 10 entries by count:
119 12K
112 1016K
56 4K
38 131072K
20 65532K
15 131068K
14 65536K
10 132K
8 65404K
7 128K
Top 10 entries by allocation size:
1 6291456K
1 205816K
1 155648K
38 131072K
15 131068K
1 108772K
1 71680K
14 65536K
20 65532K
1 65512K
And top 10 by total size:
Count  Size      Aggregate
    1  6291456K  6291456K
   38  131072K   4980736K
   15  131068K   1966020K
   20  65532K    1310640K
   14  65536K    917504K
    8  65404K    523232K
    1  205816K   205816K
    1  155648K   155648K
  112  1016K     113792K
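The aggregate column is simply count times size. A minimal Python sketch of that step follows, assuming the anonymous mapping sizes have already been extracted from the pmap output as integers in KB; `aggregate_by_size` is a hypothetical helper, and the list below is rebuilt from the counts shown above rather than parsed from a real file.

```python
from collections import Counter


def aggregate_by_size(sizes_kb):
    """Return (count, size_kb, total_kb) rows, largest total first."""
    counts = Counter(sizes_kb)
    rows = [(n, size, n * size) for size, n in counts.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Rebuilt from the listing above; a real run would parse these out of pmap.
sizes = [6291456] + [131072] * 38 + [131068] * 15 + [65532] * 20 + [65536] * 14

for count, size, total in aggregate_by_size(sizes):
    print(f"{count:>5}  {size}K  {total}K")
```

With these inputs, the four repeated block sizes alone add up to 9174900K, roughly 8.7 GiB outside the single 6291456K heap mapping.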
This seems to be telling me that because the Xmx and Xms are set to the same value, there is a single allocation of 6291456K for the java heap. Other allocations are NOT java heap memory. What are they? They are getting allocated in rather large chunks.
Expanding a bit on Peter's answer.
You can take a binary heap dump from within VisualVM (right click on the process in the left-hand side list, and then on heap dump - it'll appear right below shortly after). If you can't attach VisualVM to your JVM, you can also generate the dump with this:
jmap -dump:format=b,file=heap.hprof $PID
Then copy the file and open it with Visual VM (File, Load, select type heap dump, find the file.)
As Peter notes, a likely cause for the leak may be non collected DirectByteBuffers (e.g.: some instance of another class is not properly de-referencing buffers, so they are never GC'd).
To identify where these references are coming from, you can use VisualVM to examine the heap and find all instances of DirectByteBuffer in the "Classes" tab. Find the DBB class, right click, go to instances view.
This will give you a list of instances. You can click on one and see what is keeping a reference to it.
Note the bottom pane, we have "referent" of type Cleaner and 2 "mybuffer". These would be properties in other classes that are referencing the instance of DirectByteBuffer we drilled into (it should be ok if you ignore the Cleaner and focus on the others).
From this point on you need to proceed based on your application.
Another equivalent way to get the list of DBB instances is from the OQL tab. This query:
select x from java.nio.DirectByteBuffer x
This gives us the same list as before. The benefit of using OQL is that you can execute more complex queries. For example, this gets all the instances that are keeping a reference to a DirectByteBuffer:
select referrers(x) from java.nio.DirectByteBuffer x
What you can do is take a heap dump and look for objects that store data off heap, such as ByteBuffers. Those objects will appear small but are proxies for larger off-heap memory areas. See if you can determine why lots of them might be retained.
I'm using IPython parallel in an optimisation algorithm that loops a large number of times. Parallelism is invoked in the loop using the map method of a LoadBalancedView (twice), a DirectView's dictionary interface and an invocation of a %px magic. I'm running the algorithm in an IPython notebook.
I find that the memory consumed by both the kernel running the algorithm and one of the controllers increases steadily over time, limiting the number of loops I can execute (since available memory is limited).
Using heapy, I profiled memory use after a run of about 38 thousand loops:
Partition of a set of 98385344 objects. Total size = 18016840352 bytes.
Index Count % Size % Cumulative % Kind (class / dict of class)
0 5059553 5 9269101096 51 9269101096 51 IPython.parallel.client.client.Metadata
1 19795077 20 2915510312 16 12184611408 68 list
2 24030949 24 1641114880 9 13825726288 77 str
3 5062764 5 1424092704 8 15249818992 85 dict (no owner)
4 20238219 21 971434512 5 16221253504 90 datetime.datetime
5 401177 0 426782056 2 16648035560 92 scipy.optimize.optimize.OptimizeResult
6 3 0 402654816 2 17050690376 95 collections.defaultdict
7 4359721 4 323814160 2 17374504536 96 tuple
8 8166865 8 196004760 1 17570509296 98 numpy.float64
9 5488027 6 131712648 1 17702221944 98 int
<1582 more rows. Type e.g. '_.more' to view.>
You can see that about half the memory is used by IPython.parallel.client.client.Metadata instances. A good indicator that results from the map invocations are being cached is the 401177 OptimizeResult instances, the same number as the number of optimize invocations via lbview.map - I am not caching them in my code.
Is there a way I can control this memory usage on both the kernel and the IPython parallel controller (whose memory consumption is comparable to the kernel's)?
IPython parallel clients and controllers store results and other metadata from past transactions.
The IPython.parallel.Client class provides a method for clearing this data:
Client.purge_everything()
documented here. There are also purge_results() and purge_local_results() methods that give you some control over what gets purged.
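One way to apply this in a long-running loop is to purge every so often rather than on every iteration. The sketch below is only an illustration: `purge_periodically` and the `every` interval are hypothetical, and it assumes the results of earlier iterations have already been collected before the purge is called.

```python
def purge_periodically(client, iteration, every=1000):
    """Call client.purge_everything() once every `every` iterations.

    Returns True when a purge was performed, False otherwise.
    """
    if iteration > 0 and iteration % every == 0:
        client.purge_everything()
        return True
    return False


# Hypothetical use inside the optimisation loop:
#   rc = Client()
#   for i in range(n_loops):
#       ...  # lbview.map(...), dview[...] = ..., %px work, collect results
#       purge_periodically(rc, i)
```

A smaller interval keeps memory flatter at the cost of more round trips to the controller, so the value is worth tuning against the heapy numbers.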
Informix 11.70.TC5DE,
Windows Vista with Dual Core Processor, 8GB RAM, 1TB HDD:
During the installation of this server, I specified it was going to be used for a data warehousing application. These are the onconfig parameters the install script generated.
Can any of these parameters be changed to maximize the performance of the server?
#(onconfig.ol_informix1170) - for data warehousing app.
ROOTNAME rootdbs
ROOTPATH C:\PROGRA~1\IBM\Informix\11.70\OL_INF~2\dbspaces\rootdbs.000
ROOTOFFSET 0
ROOTSIZE 312992
MIRROR 0
MIRRORPATH
MIRROROFFSET 0
PHYSFILE 49152
PLOG_OVERFLOW_PATH
PHYSBUFF 512
LOGFILES 6
LOGSIZE 10000
DYNAMIC_LOGS 2
LOGBUFF 256
LTXHWM 70
LTXEHWM 80
MSGPATH C:\PROGRA~1\IBM\Informix\11.70\ol_informix1170_1.log
CONSOLE C:\PROGRA~1\IBM\Informix\11.70\ol_informix1170_1.con
TBLTBLFIRST 0
TBLTBLNEXT 0
TBLSPACE_STATS 1
DBSPACETEMP tempdbs
SBSPACETEMP
SBSPACENAME sbspace
SYSSBSPACENAME
ONDBSPACEDOWN 2
SERVERNUM 6
DBSERVERNAME ol_informix1170_1
DBSERVERALIASES dr_informix1170_1
NETTYPE olsoctcp,1,150,NET
LISTEN_TIMEOUT 60
MAX_INCOMPLETE_CONNECTIONS 1024
FASTPOLL 1
NS_CACHE host=900,service=900,user=900,group=900
MULTIPROCESSOR 0
VPCLASS cpu,num=1,noage
VP_MEMORY_CACHE_KB 0
SINGLE_CPU_VP 1
#VPCLASS aio,num=1
CLEANERS 2
AUTO_AIOVPS 1
DIRECT_IO 0
LOCKS 2000
DEF_TABLE_LOCKMODE page
RESIDENT 0
SHMBASE 0xc000000L
SHMVIRTSIZE 209920
SHMADD 6560
EXTSHMADD 8192
SHMTOTAL 0
SHMVIRT_ALLOCSEG 0,3
#SHMNOACCESS 0x70000000-0x7FFFFFFF
CKPTINTVL 300
AUTO_CKPTS 1
RTO_SERVER_RESTART 60
BLOCKTIMEOUT 3600
CONVERSION_GUARD 2
RESTORE_POINT_DIR $INFORMIXDIR\tmp
TXTIMEOUT 300
DEADLOCK_TIMEOUT 60
HETERO_COMMIT 0
TAPEDEV \\.\TAPE0
TAPEBLK 16
TAPESIZE 0
LTAPEDEV
LTAPEBLK 16
LTAPESIZE 0
BAR_ACT_LOG $INFORMIXDIR\tmp\bar_act.log
BAR_DEBUG_LOG $INFORMIXDIR\tmp\bar_dbug.log
BAR_DEBUG 0
BAR_MAX_BACKUP 0
BAR_RETRY 1
BAR_NB_XPORT_COUNT 20
BAR_XFER_BUF_SIZE 15
RESTARTABLE_RESTORE ON
BAR_PROGRESS_FREQ 0
BAR_BSALIB_PATH
BACKUP_FILTER
RESTORE_FILTER
BAR_PERFORMANCE 0
BAR_CKPTSEC_TIMEOUT 15
ISM_DATA_POOL ISMData
ISM_LOG_POOL ISMLogs
DD_HASHSIZE 31
DD_HASHMAX 10
DS_HASHSIZE 31
DS_POOLSIZE 127
PC_HASHSIZE 31
PC_POOLSIZE 127
PRELOAD_DLL_FILE
STMT_CACHE 0
STMT_CACHE_HITS 0
STMT_CACHE_SIZE 512
STMT_CACHE_NOLIMIT 0
STMT_CACHE_NUMPOOL 1
USEOSTIME 0
STACKSIZE 64
ALLOW_NEWLINE 0
USELASTCOMMITTED NONE
FILLFACTOR 90
MAX_FILL_DATA_PAGES 0
BTSCANNER num=1,threshold=5000,rangesize=-1,alice=6,compression=default
ONLIDX_MAXMEM 188928
MAX_PDQPRIORITY 100
DS_MAX_QUERIES 1
DS_TOTAL_MEMORY 188928
DS_MAX_SCANS 1
DS_NONPDQ_QUERY_MEM 188928
DATASKIP
OPTCOMPIND 2
DIRECTIVES 1
EXT_DIRECTIVES 0
OPT_GOAL -1
IFX_FOLDVIEW 0
AUTO_REPREPARE 1
USTLOW_SAMPLE 0
RA_PAGES 64
RA_THRESHOLD 16
BATCHEDREAD_TABLE 1
BATCHEDREAD_INDEX 1
BATCHEDREAD_KEYONLY 0
EXPLAIN_STAT 1
#SQLTRACE level=low,ntraces=1000,size=2,mode=global
#DBCREATE_PERMISSION informix
#DB_LIBRARY_PATH
IFX_EXTEND_ROLE 1
SECURITY_LOCALCONNECTION
UNSECURE_ONSTAT
ADMIN_USER_MODE_WITH_DBSA
ADMIN_MODE_USERS
PLCY_POOLSIZE 127
PLCY_HASHSIZE 31
USRC_POOLSIZE 127
USRC_HASHSIZE 31
STAGEBLOB
OPCACHEMAX 0
SQL_LOGICAL_CHAR OFF
SEQ_CACHE_SIZE 10
ENCRYPT_HDR
ENCRYPT_SMX
ENCRYPT_CDR 0
ENCRYPT_CIPHERS
ENCRYPT_MAC
ENCRYPT_MACFILE
ENCRYPT_SWITCH
CDR_EVALTHREADS 1,2
CDR_DSLOCKWAIT 5
CDR_QUEUEMEM 4096
CDR_NIFCOMPRESS 0
CDR_SERIAL 0
CDR_DBSPACE
CDR_QHDR_DBSPACE
CDR_QDATA_SBSPACE
CDR_SUPPRESS_ATSRISWARN
CDR_DELAY_PURGE_DTC 0
CDR_LOG_LAG_ACTION ddrblock
CDR_LOG_STAGING_MAXSIZE 0
CDR_MAX_DYNAMIC_LOGS 0
DRAUTO 0
DRINTERVAL 30
DRTIMEOUT 30
HA_ALIAS
DRLOSTFOUND $INFORMIXDIR\etc\dr.lostfound
DRIDXAUTO 0
LOG_INDEX_BUILDS
SDS_ENABLE
SDS_TIMEOUT 20
SDS_TEMPDBS
SDS_PAGING
SDS_LOGCHECK 0
UPDATABLE_SECONDARY 0
FAILOVER_CALLBACK
FAILOVER_TX_TIMEOUT 0
TEMPTAB_NOLOG 0
DELAY_APPLY 0
STOP_APPLY 0
LOG_STAGING_DIR
RSS_FLOW_CONTROL 0
ENABLE_SNAPSHOT_COPY 0
SMX_COMPRESS 0
ON_RECVRY_THREADS 2
OFF_RECVRY_THREADS 5
DUMPDIR $INFORMIXDIR\tmp
DUMPSHMEM 1
DUMPGCORE 0
DUMPCORE 0
DUMPCNT 1
ALARMPROGRAM $INFORMIXDIR\etc\alarmprogram.bat
ALRM_ALL_EVENTS 0
#SYSALARMPROGRAM $INFORMIXDIR\etc\evidence.bat
STORAGE_FULL_ALARM 600,3
RAS_PLOG_SPEED 10982
RAS_LLOG_SPEED 0
EILSEQ_COMPAT_MODE 0
QSTATS 0
WSTATS 0
#VPCLASS MQ,noyield
MQSERVER
MQCHLLIB
MQCHLTAB
#VPCLASS jvp,num=1
#JVPJAVAHOME $INFORMIXDIR\extend\krakatoa\jre
#JVPHOME $INFORMIXDIR\extend\krakatoa
JVPPROPFILE $INFORMIXDIR\extend\krakatoa\.jvpprops
JVPLOGFILE $INFORMIXDIR\jvp.log
#JDKVERSION 1.5
#JVPJAVALIB \bin
#JVPJAVAVM jvm
#JVPARGS -verbose:jni
#JVPCLASSPATH $INFORMIXDIR\extend\krakatoa\krakatoa_g.jar;$INFORMIXDIR\extend\krakatoa\jdbc_g.jar
JVPARGS -Dcom.ibm.tools.attach.enable=no
JVPCLASSPATH $INFORMIXDIR\extend\krakatoa\krakatoa.jar;$INFORMIXDIR\extend\krakatoa\jdbc.jar
BUFFERPOOL default,buffers=10000,lrus=8,lru_min_dirty=50.00,lru_max_dirty=60.50
BUFFERPOOL size=4K,buffers=13108,lrus=16,lru_min_dirty=70.00,lru_max_dirty=80.00
AUTO_LRU_TUNING 1
USERMAPPING OFF
SP_AUTOEXPAND 1
SP_THRESHOLD 0
SP_WAITTIME 30
DEFAULTESCCHAR \
LOW_MEMORY_RESERVE 0
LOW_MEMORY_MGR 0
REMOTE_SERVER_CFG
REMOTE_USERS_CFG
S6_USE_REMOTE_SERVER_CFG 0
GSKIT_VERSION
NETTYPE drsoctcp,1,150,NET
If it is a multiprocessor machine, definitely consider turning on MULTIPROCESSOR by setting it to a non-zero value.
The ONCONFIG parameters of greatest interest to you for DSS are those related to Parallel Data Query, or PDQ: the block that commences with MAX_PDQPRIORITY. It is worth perusing the fine manual on these specifically, because the inter-relationship between them and some other parameters is too complex to go into here.
But in essence, DS_MAX_QUERIES is the maximum number of parallel queries permitted at any time, and DS_MAX_SCANS determines the number of IO threads for scanning your tables. DS_TOTAL_MEMORY determines the amount of memory allocated for PDQ processing, and there is an algorithm in the manual that shows how these variables and the user's PDQPRIORITY setting combine.
You might also want to consider lifting the RA_PAGES and RA_THRESHOLD values - these determine how many pages are read into memory as 'blocks' before grabbing the next batch. If you're wanting to favour table-scans (which generally you do in DSS) then increasing these to something like 256 and 128 might improve performance.
My experience is with SMP and MPP unix boxes, rather than Windows, so I'm not sure how much you can wring out of your architecture, but this is where you want to start.
I would recommend identifying a good DSS query that runs for a decent length of time, and changing one parameter at a time to see the effect. SET EXPLAIN ON is your friend here, too.
One last thing - 11.7 supports table compression, and the tests I've seen show dramatic improvements in a DSS environment with large reads and irregular writes.
Can someone explain this in a practical way? Sample represents usage for one, low-traffic Rails site using Nginx and 3 Mongrel clusters. I ask because I am aiming to learn about page caching, wondering if these figures have significant meaning to that process. Thank you. Great site!
me#vps:~$ free -m
             total       used       free     shared    buffers     cached
Mem:           512        506          6          0         15        103
-/+ buffers/cache:        387        124
Swap:         1023        113        910
Physical memory is all used up. Why? Because it's there, the system should be using it.
You'll note also that the system is using 113M of swap space. Bad? Good? It depends.
See also that there's 103M of cached disk; this means that the system has decided that it's better to cache 103M of disk and swap out these 113M; maybe you have some processes using memory that are not being used and thus are paged out to disk.
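The "-/+ buffers/cache" row is just the Mem row with the reclaimable buffers and cache moved from "used" to "free". A small Python sketch of that arithmetic, using the figures from the question (`effective_memory` is a hypothetical helper name):

```python
def effective_memory(used, free, buffers, cached):
    """Return (used_by_apps, available), in the same units as the inputs."""
    reclaimable = buffers + cached
    return used - reclaimable, free + reclaimable


used_by_apps, available = effective_memory(used=506, free=6, buffers=15, cached=103)
print(used_by_apps, available)  # 388 124
```

free itself reports 387 rather than 388 because it rounds each figure to whole megabytes separately. Either way, about 124M is readily available to applications, not 6M.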
As the other poster said, you should be using other tools to see what's happening:
Your perception: is the site running appropriately when you use it?
Benchmarking: what response times are your clients seeing?
More fine-grained diagnostics:
top: you can see live which processes are using memory and CPU
vmstat: it produces this kind of output:
alex#armitage:~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
2 1 71184 156520 92524 316488 1 5 12 23 362 250 13 6 80 1
0 0 71184 156340 92528 316508 0 0 0 1 291 608 10 1 89 0
0 0 71184 156364 92528 316508 0 0 0 0 308 674 9 2 89 0
0 0 71184 156364 92532 316504 0 0 0 72 295 723 9 0 91 0
1 0 71184 150892 92532 316508 0 0 0 0 370 722 38 0 62 0
0 0 71184 163060 92532 316508 0 0 0 0 303 611 17 2 81 0
which will show you whether swap is hurting you (high numbers in si and so) and gives an easier-to-read view of performance over time.
By my reading of this, you have used almost all your memory, have 6 MB free, and are using about 10% of your swap. A more useful approach is to use top or perhaps ps to see how much RAM each of your individual mongrels is using. Because you're going into swap, you're probably seeing slowdowns. You might find that running only 2 mongrels rather than 3 actually responds faster, because it likely wouldn't go into swap.
Page caching will for sure help a tonne on response time, so if your pages are cacheable (e.g., they don't have content that is unique to the individual user), I would say for sure check it out.
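If you want to watch the si/so columns without eyeballing them, a rough Python sketch like the one below can pull them out of captured vmstat output. `swap_activity` is a hypothetical helper and the sample is the first two rows of the output above. Note that the first row vmstat prints is an average since boot, so sustained non-zero values in later rows are the ones that matter.

```python
def swap_activity(vmstat_text):
    """Return a list of (si, so) pairs, one per data row of vmstat output."""
    pairs = []
    columns = None
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # the column-name header row
            continue
        if columns and len(fields) == len(columns) and fields[0].isdigit():
            row = dict(zip(columns, map(int, fields)))
            pairs.append((row["si"], row["so"]))
    return pairs


sample = """\
 r b swpd free buff cache si so bi bo in cs us sy id wa
 2 1 71184 156520 92524 316488 1 5 12 23 362 250 13 6 80 1
 0 0 71184 156340 92528 316508 0 0 0 1 291 608 10 1 89 0
"""
print(swap_activity(sample))  # [(1, 5), (0, 0)]
```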