I'm trying to run a relatively large Dask LightGBM task on a relatively small machine (32GB RAM, 8 cores), so I cap the memory usage at 20GB... The dataset is about 100M rows with 50 columns. I know it is large, but aren't we trying to do out-of-core ML?
from dask.distributed import Client
import lightgbm

client = Client(memory_limit='20GB', processes=False,
                n_workers=1, threads_per_worker=7)
params = {"max_depth": 4, "n_estimators": 800, "client": client}
learner = lightgbm.DaskLGBMRegressor(**params)
learner.fit(dd_feature_009a013a_train[x_columns], dd_price_solely_y_train[y_column_now])
However, errors are printed and the process dies:
/home/ubuntu/anaconda3/lib/python3.8/site-packages/lightgbm/dask.py:317: UserWarning: Parameter n_jobs will be ignored.
_log_warning(f"Parameter {param_alias} will be ignored.")
Finding random open ports for workers
distributed.worker - WARNING - Worker is at 85% memory usage. Pausing worker. Process memory: 15.93 GiB -- Worker memory limit: 18.63 GiB
distributed.comm.inproc - WARNING - Closing dangling queue in <InProc local=inproc://172.31.91.159/37355/1 remote=inproc://172.31.91.159/37355/9>
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 26.48 GiB -- Worker memory limit: 18.63 GiB
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 26.48 GiB -- Worker memory limit: 18.63 GiB
Without an MCVE it's difficult to answer your question precisely.
The "Memory use is high" error could be thrown for a few different reasons. I found this resource by a core Dask maintainer helpful in diagnosing the exact issue.
To summarise, consider:
Breaking your data into smaller chunks.
Manually triggering garbage collection and/or tweaking the gc settings on the workers through a Worker Plugin.
Trimming memory using malloc_trim (especially if working with non-NumPy data or small NumPy chunks); a minimal sketch of these last two points follows below.
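For the last two points, here is a minimal sketch of what that can look like, assuming your workers run on Linux with glibc (client is the one from your snippet):

import ctypes
import gc

def trim_worker_memory():
    # Run a full garbage-collection pass on the worker...
    gc.collect()
    # ...then ask glibc to hand freed arenas back to the OS (glibc-only call).
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)

# Execute on every worker in the cluster; a Worker Plugin could run this
# periodically instead of on demand.
client.run(trim_worker_memory)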
I'd also advise you to make sure you can see the Dask Dashboard while your computations are running to figure out which approach is working.
Related
I am dealing with a legacy system (Ruby 2.7.6) which suffers from a memory leak. This led the previous developers to use puma worker killer, which works around the memory issue by restarting the process every 30 minutes.
As traffic increases, we now need to increase the number of instances and decrease the 30-minute kill interval to as little as 20 minutes.
We would like to investigate the source of this memory leak, which apparently originates from one of our many gem dependencies (information given by a previous developer).
The system is on AWS (Elastic Beanstalk) but can also run on docker.
Can anyone suggest a good tool, and a guide on how to find the source of this memory leak?
Thanks
UPDATE:
I made use of rack-mini-profiler and took some memory snapshots to see the influence of about 100 requests on the server [BEFORE, DURING, AFTER].
Judging by the outputs, it does not seem that there is a memory leak in Ruby, but the memory usage did increase and stay up, although the extra memory does not seem to be used by us...
BEFORE:
KiB Mem :  2007248 total,   628156 free,   766956 used,   612136 buff/cache
KiB Swap:  2097148 total,  2049276 free,    47872 used.  1064852 avail Mem
Total allocated: 115227 bytes (1433 objects)
Total retained: 21036 bytes (147 objects)
allocated memory by gem
33121 activesupport-6.0.4.7
21687 actionpack-6.0.4.7
14484 activerecord-6.0.4.7
12582 var/app
9904 ipaddr
6957 rack-2.2.4
3512 actionview-6.0.4.7
2680 mysql2-0.5.3
1813 rack-mini-profiler-3.0.0
1696 audited-5.0.2
1552 concurrent-ruby-1.1.10
DURING:
KiB Mem :  2007248 total,    65068 free,  1800424 used,   141756 buff/cache
KiB Swap:  2097148 total,  2047228 free,    49920 used.    58376 avail Mem
Total allocated: 225272583 bytes (942506 objects)
Total retained: 1732241 bytes (12035 objects)
allocated memory by gem
106497060 maxmind-db-1.0.0
58308032 psych
38857594 user_agent_parser-2.7.0
4949108 activesupport-6.0.4.7
3967930 other
3229962 activerecord-6.0.4.7
2154670 rack-2.2.4
1467383 actionpack-6.0.4.7
1336204 activemodel-6.0.4.7
AFTER:
KiB Mem :  2007248 total,    73760 free,  1817688 used,   115800 buff/cache
KiB Swap:  2097148 total,  2032636 free,    64512 used.    54448 avail Mem
Total allocated: 109563 bytes (1398 objects)
Total retained: 14988 bytes (110 objects)
allocated memory by gem
29745 activesupport-6.0.4.7
21495 actionpack-6.0.4.7
13452 activerecord-6.0.4.7
12502 var/app
9904 ipaddr
7237 rack-2.2.4
3128 actionview-6.0.4.7
2488 mysql2-0.5.3
1813 rack-mini-profiler-3.0.0
1360 audited-5.0.2
1360 concurrent-ruby-1.1.10
Where can the leak be, then? Is it Puma?
It seems from the statistics in the question that most objects get freed properly by the memory allocator.
However, when you have a lot of repeated allocations, the system's malloc can (and often does) hold on to the memory without releasing it to the system (Ruby isn't aware of this memory, which is considered "free").
This is done for two main reasons:
Most importantly: heap fragmentation (the allocator is unable to free the memory and unable to use parts of it for future allocations).
The system's memory allocator knows it would probably need this memory again soon (that's in relation to the part of the memory that can be freed and doesn't suffer from fragmentation).
This can be solved by trying to replace the system's memory allocator with an allocator that's tuned for your specific needs (e.g., jemalloc, as suggested here and here and asked about here).
You could also try to use gems that have a custom memory allocator when using C extensions (the iodine gem does that, but you could make other gems do it too).
This approach should help mitigate the issue, but the fact is that some of your gems appear memory-hungry... I mean:
Is the maxmind-db gem using 106,497,060 bytes (106MB) of memory, or did it allocate that number of objects?
And why is psych so hungry? Are there any round-trips between data and YAML that could be skipped?
There seem to be a lot of user agent strings stored concurrently (the user_agent_parser gem)... maybe you could cache these strings instead of keeping many duplicates. For example, you could put them in a Set and replace each String object with the object already in the Set. That way equal strings would point at the same object (preventing some object duplication and freeing up some memory).
Is it Puma?
Probably not.
Although I am the author of the iodine web server, I really love the work the Puma team did over the years and think it's a super solid server for what it offers. I really doubt the leak is from the server, but you can always switch and see what happens.
Re: the difference between the Linux report and the Ruby profiler
The difference is the memory held by malloc: "free" memory that isn't returned to the system and that Ruby doesn't know about.
Ruby profilers test the memory Ruby allocated ("live" memory, if you will). They have access to the number of objects allocated and the memory held by those objects.
The malloc library isn't part of Ruby. It's part of the C runtime library on top of which Ruby sits.
There's memory allocated for the process by malloc that isn't used by Ruby. That memory is either waiting to be used (retained by malloc for future use) or waiting to be released back to the system (or fragmented and lost for the moment).
That difference between what Ruby uses and what malloc holds should explain the difference between the Linux reporting and the Ruby profiler's reporting.
Some gems might be using their own custom-made memory allocator (e.g., iodine does that). These behave the same as malloc in the sense that the memory they hold will not show up in the Ruby profiler (at least not completely).
I have two Alluxio tiers: MEM + SSD. The MEM tier is almost always around 90% full, and sometimes the SSD tier is also full.
Now messages like the following sometimes spam my log:
2022-06-14 07:11:43,607 WARN TieredBlockStore - Target tier: BlockStoreLocation{TierAlias=MEM, DirIndex=0, MediumType=MEM} has no available space to store 67108864 bytes for session: -4254416005596851101
2022-06-14 07:11:43,607 WARN BlockTransferExecutor - Transfer-order: BlockTransferInfo{TransferType=SWAP, SrcBlockId=36401609441282, DstBlockId=36240078405636, SrcLocation=BlockStoreLocation{TierAlias=MEM, DirIndex=0, MediumType=MEM}, DstLocation=BlockStoreLocation{TierAlias=SSD, DirIndex=0, MediumType=SSD}} failed. alluxio.exception.WorkerOutOfSpaceException: Failed to find space in BlockStoreLocation{TierAlias=MEM, DirIndex=0, MediumType=MEM} to move blockId 36240078405636
2022-06-14 07:11:43,607 WARN AlignTask - Insufficient space for worker swap space, swap restore task called.
Is my setup flawed? What can I do to get rid of these warnings?
Looks like the Alluxio worker is trying to move/swap some blocks, but there is not enough space to finish the operation. I guess it might be caused by both the SSD and MEM tiers being full. Have you tried this property: alluxio.worker.tieredstore.free.ahead.bytes? It can help us determine whether the swap failed due to insufficient storage space.
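For reference, a hedged example of setting it (the property typically goes into conf/alluxio-site.properties on the workers; the value below is only an illustration to be tuned for your block size and workload):

# Keep some headroom free ahead of time so block moves/swaps have space to land
# (illustrative value, not a recommendation)
alluxio.worker.tieredstore.free.ahead.bytes=1GB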
I am working with Dask on a distributed cluster, and I noticed a peak in memory consumption when getting the results back to the local process.
My minimal example consists of instantiating the cluster and creating a simple array of ~1.6G with dask.array.arange.
I expected the memory consumption to be around the array size, but I observed a memory peak around 3.2G.
Is there any copy done by Dask during the computation? Or does JupyterLab need to make a copy?
import dask.array
import dask_jobqueue
import distributed
cluster_conf = {
    "cores": 1,
    "log_directory": "/work/scratch/chevrir/dask-workspace",
    "walltime": '06:00:00',
    "memory": "5GB"
}
cluster = dask_jobqueue.PBSCluster(**cluster_conf)
cluster.scale(n=1)
client = distributed.Client(cluster)
client
# 1.6 G in memory
a = dask.array.arange(2e8)
%load_ext memory_profiler
%memit a.compute()
# peak memory: 3219.02 MiB, increment: 3064.36 MiB
What happens when you do compute():
the graph of your computation is constructed (this is small) and sent to the scheduler
the scheduler gets workers to produce the pieces of the array, which should be a total of about 1.6GB on the workers
the client constructs an empty array for the output you are asking for, knowing its type and size
the client receives bunches of bytes across the network or IPC from each worker which has pieces of the output. These are copied into the output array on the client
the complete array is returned to you
You can see that the penultimate step here necessarily requires duplication of data. The original byte buffers may eventually be garbage collected later.
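If that duplication is a problem, here is a rough sketch of two ways around it (hypothetical alternatives, not taken from your code): keep reductions on the workers so only small results travel back, or pull the array one block at a time into a preallocated buffer.

import dask.array as da
import numpy as np

a = da.arange(2e8, chunks=25_000_000)

# The reduction runs on the workers; only a scalar is sent to the client.
total = a.sum().compute()

# If the full array really is needed locally, fetch it block by block so the
# client-side peak stays near one chunk plus the preallocated output buffer.
out = np.empty(a.shape, dtype=a.dtype)
offset = 0
for i in range(a.numblocks[0]):
    piece = a.blocks[i].compute()
    out[offset:offset + len(piece)] = piece
    offset += len(piece)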
I'm distributing the computation of some functions using Dask. My general layout looks like this:
from dask.distributed import Client, LocalCluster, as_completed
cluster = LocalCluster(processes=config.use_dask_local_processes,
                       n_workers=1,
                       threads_per_worker=1,
                       )
client = Client(cluster)
cluster.scale(config.dask_local_worker_instances)
fcast_futures = []
# For each group do work
for group in groups:
    fcast_futures.append(client.submit(_work, group))
# Wait till the work is done
for done_work in as_completed(fcast_futures, with_results=False):
    try:
        result = done_work.result()
    except Exception as error:
        log.exception(error)
My issue is that for a large number of jobs I tend to hit memory limits. I see a lot of:
distributed.worker - WARNING - Memory use is high but worker has no data to store to disk. Perhaps some other process is leaking memory? Process memory: 1.15 GB -- Worker memory limit: 1.43 GB
It seems that each future isn't releasing its memory. How can I trigger that? I'm using dask==1.2.0 on Python 2.7.
Results are held by the scheduler so long as there is a future on a client pointing to them. Memory is released when (or shortly after) the last future is garbage-collected by Python. In your case you are keeping all of your futures in a list throughout the computation. You could try modifying your loop:
for done_work in as_completed(fcast_futures, with_results=False):
    try:
        result = done_work.result()
    except Exception as error:
        log.exception(error)
    done_work.release()
or replacing the as_completed loop with something that explicitly removes futures from the list once they have been processed.
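For the second option, a small sketch using the same groups, _work and log from the question: hold the futures in a set and discard each one as soon as its result has been handled, so no client-side reference keeps the data alive.

fcast_futures = {client.submit(_work, group) for group in groups}

for done_work in as_completed(list(fcast_futures), with_results=False):
    try:
        result = done_work.result()
    except Exception as error:
        log.exception(error)
    finally:
        # Drop our reference; once nothing points at the future,
        # the scheduler can release the memory held on the worker.
        fcast_futures.discard(done_work)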
I want to run thousands of identical single-thread simulations with different random seeds (which I pass to my program). Some of them have run out of memory, and I don't know why. I submit run_batch_job with sbatch --array=0-999%100 --mem=200M run_batch_job, where run_batch_job contains:
#!/bin/env bash
#SBATCH --ntasks=1 # Number of cores
#SBATCH --nodes=1 # All cores on one machine
srun my_program.out $SLURM_ARRAY_TASK_ID
For a single thread, 200M should be more than enough memory, yet for some simulations, I get the error:
slurmstepd: error: Exceeded step memory limit at some point.
slurmstepd: error: Exceeded job memory limit at some point.
srun: error: cluster-cn002: task 0: Out Of Memory
slurmstepd: error: Exceeded job memory limit at some point.
Am I allocating 200M to each of the thousand threads, or am I doing something wrong?
EDIT: I've tried specifying --cpus-per-task=1 and --mem-per-cpu=200M instead of --ntasks=1, --nodes=1 and --mem=200M, with the same results.
Your submission is correct: with a job array, the --mem limit applies to each array task separately, so each simulation gets its own 200M. But 200M might be low depending on the libraries you use or the files you read. Request at least 2G (e.g. --mem=2G), as virtually all clusters have at least 2GB of memory per core.