On a 6-node Cassandra cluster, the heap size is configured as 31 GB. When I run nodetool info, I see the following:
[root@ip-10-216-86-94 ~]# nodetool info
ID : 88esdsd01-5233-4b56-a240-ea051ced2928
Gossip active : true
Thrift active : false
Native Transport active: true
Load : 53.31 GiB
Generation No : 1549564460
Uptime (seconds) : 734
Heap Memory (MB) : 828.45 / 31744.00
Off Heap Memory (MB) : 277.25
Data Center : us-east
Rack : 1a
Exceptions : 0
Key Cache : entries 8491, size 1.12 MiB, capacity 100 MiB, 35299 hits, 44315 requests, 0.797 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 5414, size 1.22 MiB, capacity 50 MiB, 5387 hits, 10801 requests, 0.499 recent hit rate, 7200 save period in seconds
Chunk Cache : entries 6164, size 249.5 MiB, capacity 480 MiB, 34840 misses, 177139 requests, 0.803 recent hit rate, 121.979 microseconds miss latency
Percent Repaired : 0.0%
Token : (invoke with -T/--tokens to see all 8 tokens)
The heap memory used and allocated matches what I see in JConsole. But for non-heap memory, JConsole shows about 188 MB whereas the info command shows 277 MB. Why is there a mismatch?
Non-Heap Memory in JConsole and Off Heap Memory shown by nodetool are completely different things.
Non-Heap Memory in JConsole is the sum of JVM non-heap memory pools. JVM exports this information through MemoryPoolMXBean. As of JDK 8, these pools include:
Metaspace
Compressed Class Space
Code Cache
So the non-heap pools show how much memory the JVM uses for class metadata and compiled code.
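As a quick illustration, here is a minimal sketch (standard java.lang.management API, nothing Cassandra-specific) that enumerates the NON_HEAP pools from inside the JVM; their sum should land close to the figure JConsole reports:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;

public class NonHeapPools {
    public static void main(String[] args) {
        long totalUsed = 0;
        // Walk every memory pool the JVM exposes and keep only the non-heap ones.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP) {
                long used = pool.getUsage().getUsed();
                totalUsed += used;
                System.out.printf("%-25s %8.1f MB%n", pool.getName(), used / 1048576.0);
            }
        }
        // This total is what JConsole's Non-Heap graph tracks (~188 MB in your case).
        System.out.printf("%-25s %8.1f MB%n", "Total non-heap used:", totalUsed / 1048576.0);
    }
}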
Nodetool gets its Off Heap Memory stats from Cassandra's Column Family Metrics. This is the total size of the Bloom filters, index summaries and compression metadata for all open tables.
See nodetool tablestats for a detailed breakdown of these statistics.
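For example, something along these lines shows the per-table components that add up to the nodetool info figure (the table name is just a placeholder, and the grep pattern assumes the "off heap memory used" wording of recent Cassandra versions):

nodetool tablestats my_keyspace.my_table | grep -i 'off heap'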
Due to (hopefully temporary) financial problems I have to use an old laptop. Its FSB (front-side bus) clock is 333 MHz (https://www.techsiting.com/mt-s-vs-mhz/). It has 2 SO-DIMM slots for DDR2 SDRAM. It previously had only one 2 GB DIMM, and it was a nightmare.
Each slot can handle a maximum of 2 GB, so the maximum amount of memory is 4 GB. Knowing that DDR stands for double data rate, I bought for pocket money (10 euro) two 800 MHz DDR2 SO-DIMMs, hoping to get (assuming the memory divider is 1:2, it's double data rate, isn't it?) 2 x 333 MHz = 667 MT/s (no idea how they avoided 666). Since I have a Core 2 Duo I even had a faint hope of getting 4 x 333 MHz = 1333 MT/s.
But it seems that my memory divider is 1:1, so I get either
2x333MHzxDivider=333MT/s
4x333MHzxDivider=?
And utilities like lshw and dmidecode seem to confirm that:
~ >>> sudo lshw -C memory | grep clock
clock: 333MHz (3.0ns) # notice 333MHz here
clock: 333MHz (3.0ns) # notice 333MHz here
~ >>> sudo dmidecode --type memory | grep Speed
Supported Speeds:
Current Speed: Unknown
Current Speed: Unknown
Speed: 333 MT/s # notice 333MT/s here
Speed: 333 MT/s # notice 333MT/s here
~ >>>
So my 333 MHz FSB has been multiplied by 1 (one) and I've got 333 MT/s (if I understood correctly). I'm still satisfied: the OS does not swap as much, the boot process is faster, programs start faster, the browser does not hang every hour, and I can open many more tabs. I just want to know: since I have a Core 2 Duo, which MT/s do I get of these two? Or maybe it is even more complicated?
2x333MHzxDivider=333MT/s
4x333MHzxDivider=667MT/s # 4 because of Duo
And is there any difference for a 2-processor system with just 4 GB of RAM when MT/s == MHz?
PS: The BIOS is old (although the latest available) and I cannot see the real FSB clock there, nor change it, nor change the memory divider.
Looks like there's no point in checking the I/O bus clock with some Linux command/tool, because it is just always half of the memory clock, if what is written in electronics.stackexchange.com/a/424928 holds:
I/O bus clock is always half of bus data rate.
then my old machine has these parameters:
It is DDR2-333 (not standardized by JEDEC, since their speed grades start from DDR2-400)
It has memory MHz = 333
It has memory MT/s = 333
It has I/O bus MHz = 166.5 # just because
The thing I still don't get: since I have a Core 2 Duo, is my memory MT/s 333 or 666?
What is meant by Active Disk Partition Usage: 100.00% when checking the message-spool config? If this is a concern, how would I go about fixing it?
I have zero messages spooled into Solace but am getting the above status. I also had issues provisioning new queues.
Is there a way to clear all system logs, i.e. command, event, debug and system? I understand that there is an archive policy for that, but I would like to have a clean state for my logs.
A full trace of message-spool is:
Config Status: Enabled (Primary)
Maximum Spool Usage: 1500 MB
Using Internal Disk: Yes
Operational Status: AD-Active
Datapath Status: Up
Synchronization Status: Synced
Spool-Sync Status: Synced
Last Failure Reason: N/A
Last Failure Time: N/A
Max Message Count: 240M
Message Count Utilization: 0.00%
Transaction Resource Utilization: 0.00%
Delivered Unacked Msgs Utilization: 0.00%
Spool Files Utilization: 0.00%
Active Disk Partition Usage: 100.00%
Mate Disk Partition Usage: -%
Next Message Id: 222789873
Defragmentation Status: Idle
Number of delete in-progress: 0
Current Persistent Store Usage (MB) 0.0000 0.0000 0.0000
Number of Messages Currently Spooled 0 0 0
I am using system software SolOS-TR version 7.2.2.34.
When configuring a software message broker, disk space needs to be allocated to the broker for spooling of messages. The Active Disk Partition Usage statistic refers to how much of this allocated space is currently in use.
If disk space is not partitioned for the spool, the disk space is shared with the entire system. In that case you can observe high active disk partition usage with no messages spooled. To resolve the "Active Disk Partition Usage: 100.00%" you are experiencing, spool space should be provisioned for the broker. You can read more about Storage Configuration on Solace Brokers here (https://docs.solace.com/Configuring-and-Managing/Configuring-Storage-Machine-Cloud.htm#).
To address your system logs question: clearing the logs is not supported by the router. Information stored in the system logs is timestamped and kept for troubleshooting purposes; as time progresses, new information is appended to these files.
The AWS Lambda function has 2 GB of RAM and 512 MB of disk space. I have set the environment variables accordingly:
MAGICK_TEMPORARY_PATH /tmp
MAGICK_DISK_LIMIT 512MB
MAGICK_MEMORY_LIMIT 1536MB
Which value should I choose for the environment variable MAGICK_MAP_LIMIT? From the documentation (https://www.imagemagick.org/script/resources.php#environment):
Set maximum amount of memory map in bytes to allocate for the pixel
cache. When this limit is exceeded, the image pixels are cached to
disk (see MAGICK_DISK_LIMIT).
Should it be based on the file system limit, so 512 MB?
And what value is recommended for MAGICK_AREA_LIMIT?
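One way to verify whichever values are chosen is to print the limits ImageMagick actually resolves at runtime; identify ships with the standard ImageMagick install, and -list resource shows the effective Area/Map/Memory/Disk limits picked up from the environment:

identify -list resource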
I'm using Ubuntu Server and have configured an Odoo project on it. The machine has 8 GB of RAM and around 6 GB of it is available, so I want to increase Odoo's default memory limits. How can I do that?
Have you tried playing with some of Odoo's Advanced and Multiprocessing options?
odoo.py --help
Advanced options:
--osv-memory-count-limit=OSV_MEMORY_COUNT_LIMIT
Force a limit on the maximum number of records kept in
the virtual osv_memory tables. The default is False,
which means no count-based limit.
--osv-memory-age-limit=OSV_MEMORY_AGE_LIMIT
Force a limit on the maximum age of records kept in
the virtual osv_memory tables. This is a decimal value
expressed in hours, and the default is 1 hour.
--max-cron-threads=MAX_CRON_THREADS
Maximum number of threads processing concurrently cron
jobs (default 2).
Multiprocessing options:
--workers=WORKERS Specify the number of workers, 0 disable prefork mode.
--limit-memory-soft=LIMIT_MEMORY_SOFT
Maximum allowed virtual memory per worker, when
reached the worker be reset after the current request
(default 671088640 aka 640MB).
--limit-memory-hard=LIMIT_MEMORY_HARD
Maximum allowed virtual memory per worker, when
reached, any memory allocation will fail (default
805306368 aka 768MB).
--limit-time-cpu=LIMIT_TIME_CPU
Maximum allowed CPU time per request (default 60).
--limit-time-real=LIMIT_TIME_REAL
Maximum allowed Real time per request (default 120).
--limit-request=LIMIT_REQUEST
Maximum number of request to be processed per worker
(default 8192).
Also, if you are using WSGI or something similar to run Odoo, those settings may need some tuning as well.
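As a rough sketch, on an 8 GB box with about 6 GB to spare you could start Odoo with something like the following (worker count and limits are assumptions to tune for your own workload, not recommended values):

odoo.py --workers=4 --max-cron-threads=2 \
        --limit-memory-soft=1073741824 \
        --limit-memory-hard=1342177280 \
        --limit-time-cpu=120 --limit-time-real=240

Here each worker is allowed roughly 1 GB (soft) / 1.25 GB (hard) of virtual memory instead of the 640/768 MB defaults, so four workers stay within the ~6 GB you have free.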
I have a kernel which uses a lot of registers and spills them into local memory heavily.
4688 bytes stack frame, 4688 bytes spill stores, 11068 bytes spill loads
ptxas info : Used 255 registers, 348 bytes cmem[0], 56 bytes cmem[2]
Since the spillage seems quite high, I believe it goes beyond the L1 and maybe even the L2 cache. Since local memory is private to each thread, how are accesses to local memory coalesced by the compiler? Is this memory read in 128-byte transactions like global memory? With this amount of spillage I am getting low memory bandwidth utilisation (50%); I have similar kernels without the spillage that reach up to 80% of peak memory bandwidth.
EDIT
I've extracted some more metrics with the nvprof tool. If I understand the technique mentioned here correctly, I have a significant amount of memory traffic due to register spilling (4 * L1 local load hits and misses / sum of the read sector queries across the 4 L2 subpartitions = (4 * (45936 + 4278911)) / (5425005 + 5430832 + 5442361 + 5429185) = 79.6%). Could somebody verify whether I am right here?
Invocations Event Name Min Max Avg
Device "Tesla K40c (0)"
Kernel: mulgg(double const *, double*, int, int, int)
30 l2_subp0_total_read_sector_queries 5419871 5429821 5425005
30 l2_subp1_total_read_sector_queries 5426715 5435344 5430832
30 l2_subp2_total_read_sector_queries 5438339 5446012 5442361
30 l2_subp3_total_read_sector_queries 5425556 5434009 5429185
30 l2_subp0_total_write_sector_queries 2748989 2749159 2749093
30 l2_subp1_total_write_sector_queries 2748424 2748562 2748487
30 l2_subp2_total_write_sector_queries 2750131 2750287 2750205
30 l2_subp3_total_write_sector_queries 2749187 2749389 2749278
30 l1_local_load_hit 45718 46097 45936
30 l1_local_load_miss 4278748 4279071 4278911
30 l1_local_store_hit 0 1 0
30 l1_local_store_miss 1830664 1830664 1830664
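(For reference, these counters can be gathered with nvprof's --events switch; ./mulgg_app below is just a stand-in for the actual binary, and event availability depends on the GPU and CUDA version:)

nvprof --events l1_local_load_hit,l1_local_load_miss,l1_local_store_miss,l2_subp0_total_read_sector_queries ./mulgg_app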
EDIT
I've realised that it is 128-byte, not 128-bit, transactions I was thinking of.
According to
Local Memory and Register Spilling
the impact of register spills on performance entails more than just the coalescing decided at compile time; more importantly, reads and writes to the L2 cache are already quite expensive and you want to avoid them.
The presentation suggests that, using a profiler, you can count at run time the number of L2 queries caused by local memory (LMEM) accesses, check whether they make up a major share of all L2 queries, and then optimize the shared-memory-to-L1 split in favour of the latter, for example through a single host-side call:
cudaDeviceSetCacheConfig( cudaFuncCachePreferL1 );
Hope this helps.