How to set a limit on the number of threads in CVXPY

I'm using CVXPY on a shared computer, and I have to limit the number of threads it uses.
import cvxpy as cvx

prob = cvx.Problem(objective, constraints)
prob.solve(solver=cvx.CVXOPT)
Is there any option to limit the number of threads for the CVXPY solver?
thanks!

Update: a solution to the problem. Both CVXPY and NumPy may create threads according to the number of available cores by default. You can limit this by setting OMP_NUM_THREADS before importing NumPy and CVXPY.
import os
os.environ["OMP_NUM_THREADS"] = "1"  # must be set before importing NumPy/CVXPY
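Depending on which BLAS/OpenMP backend your NumPy build links against, other environment variables may also matter. The sketch below shows the general pattern of setting them before any imports; the specific variable names OPENBLAS_NUM_THREADS and MKL_NUM_THREADS are assumptions about which backend is in use:
import os

# Cap threading for the common OpenMP/BLAS backends (assumes one of these
# is what NumPy and the solver actually use on this machine).
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

# Imports must come after the environment variables are set.
import numpy as np
import cvxpy as cvx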

Related

DBSCAN running out of memory and getting killed

I am passing data normalized with MinMaxScaler to DBSCAN's fit_predict. My data is very small (12 MB, around 180,000 rows and 9 columns). However, while running this, the memory usage quickly climbs and the kernel gets killed (I presume by the OOM killer). I even tried it on a server with 256 GB of RAM, and it fails fairly quickly.
Here is my repro code:
import pandas as pd
X_ml = pd.read_csv('Xml.csv')
from sklearn.cluster import DBSCAN
dbscan = DBSCAN(eps=0.28, min_samples=9)
dbscan_pred = dbscan.fit_predict(X_ml)
and here is my Xml.csv data file.
Any ideas how to get it working?

Don't estimate total runtime with tqdm

I'm using tqdm to generate the progress bar for a loop where iterations take an increasing amount of time with increasing value of the iterator. The iterations per second and estimated completion metrics are thus not particularly meaningful, as previous iterations cannot (easily) be used to predict the runtime of future iterations.
Is there an easy way to disable displaying the estimation of iterations per second and total runtime with tqdm?
Relevant example code:
from tqdm import tqdm
import time
for t in tqdm(range(10)):
    time.sleep(t)
tqdm's README describes the bar_format argument as follows:
Specify a custom bar string formatting. May impact performance.
[default: '{l_bar}{bar}{r_bar}'], where l_bar='{desc}: {percentage:3.0f}%|' and r_bar='| {n_fmt}/{total_fmt} [{elapsed}<{remaining}, {rate_fmt}{postfix}]'...
Since the part you don't care about is mostly in "{r_bar}", you can just tweak that part of the default value as follows to omit the [{elapsed}<{remaining}, {rate_fmt}] portion:
from time import sleep
from tqdm import tqdm
for t in tqdm(range(10),
              bar_format="{l_bar}|{bar}| {n_fmt}/{total_fmt}{postfix}"):
    sleep(t)
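If you still want to see the elapsed time without the remaining-time and rate estimates, a small variation of the same trick keeps {elapsed} in the format string (the exact string below is just one possible choice):
from time import sleep
from tqdm import tqdm

# Keep elapsed time, drop the remaining-time and rate estimates.
for t in tqdm(range(10),
              bar_format="{l_bar}{bar}| {n_fmt}/{total_fmt} [{elapsed}{postfix}]"):
    sleep(t)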

Dask dataframe: merging two dataframes, imputing missing values and writing to CSV uses only part of each CPU (~20%)

I want to merge two dask dataframes, impute missing values with the column median, and export the merged dataframe to CSV files.
I have one problem: my current code cannot fully utilize all 8 CPUs (only ~20% of each CPU is used).
I am not sure which part limits the CPU usage. Here is the reproducible code:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(
np.c_[(np.random.randint(100, size=(10000, 1)), np.random.randn(10000, 3))],
columns=['id', 'a', 'b', 'c'])
df2 = pd.DataFrame(
np.c_[(np.array(range(100)), np.random.randn(100, 10000))],
columns=['id'] + ['d_' + str(i) for i in range(10000)])
df1.id=df1.id.astype(int).astype(object)
df2.id=df2.id.astype(int).astype(object)
## some cells are missing in df2
df2.iloc[:, 1:] = df2.iloc[:,1:].mask(np.random.random(df2.iloc[:, 1:].shape) < .05)
## dask codes starts here
import dask.dataframe as dd
from dask.distributed import Client
ddf1 = dd.from_pandas(df1, npartitions=3)
ddf2 = dd.from_pandas(df2, npartitions=3)
ddf = ddf1.merge(ddf2, how='left', on='id')
ddf = ddf.fillna(ddf.quantile())
ddf.to_csv('train_*.csv', index=None, header=None)
Although all 8 CPUs are in use, only ~20% of each is utilized. Can I change the code to improve the CPU usage?
Firstly, note that if you don't specify otherwise, Dask will use threads for execution. In threads, only one Python operation can occur at a time (the "GIL"), except in some lower-level code which explicitly releases the lock. The "merge" operation involves a lot of shuffling of data in memory, and I suspect it releases the lock some of the time.
Secondly, all of the output is being written to the filesystem, so you will always have a bottleneck here: however fast other processing may be, you still need to feed all of it through the storage bus.
If the CPUs are working at ~20%, I daresay this is still faster than a single-core version? Put simply, some workloads just parallelise better than others.
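If the GIL is the main limiter, one thing worth trying is running the same pipeline on the distributed scheduler with process-based workers instead of threads. This is only a sketch (the worker and thread counts are placeholders you would tune for your 8-core machine), and the serialisation cost between processes means it is not guaranteed to be faster:
from dask.distributed import Client

# Process-based workers sidestep the GIL for the pure-Python parts of the
# graph; data is serialised between processes, so this is a trade-off.
client = Client(processes=True, n_workers=4, threads_per_worker=2)

# Same pipeline as above, now executed on the distributed scheduler.
ddf = ddf1.merge(ddf2, how='left', on='id')
ddf = ddf.fillna(ddf.quantile())
ddf.to_csv('train_*.csv', index=None, header=None)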

KNN classifier taking too much time even on GPU

I am classifying the MNIST digits using KNN on Kaggle, but the last step is taking too much time to execute, and the MNIST data is just 15 MB. I am still waiting; can you point out any problem in my code? Thanks.
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
print(os.listdir("../input"))
#Loading datset
train=pd.read_csv('../input/mnist_test.csv')
test=pd.read_csv('../input/mnist_train.csv')
X_train=train.drop('label',axis=1)
y_train=train['label']
X_test=test.drop('label',axis=1)
y_test=test['label']
from sklearn.neighbors import KNeighborsClassifier
clf=KNeighborsClassifier(n_neighbors=3)
clf.fit(X_train,y_train)
accuracy=clf.score(X_test,y_test)
accuracy
There isn't anything wrong with your code per se. KNN is just a slow algorithm: it's slow here because computing distances between images is expensive at scale, and because the problem is large enough that your cache can't really be used effectively.
Without using a different library or coding your own GPU kernel, you can probably get a speed boost by replacing
clf=KNeighborsClassifier(n_neighbors=3)
with
clf=KNeighborsClassifier(n_neighbors=3, n_jobs=-1)
to at least use all of your cores.
Also, you are not actually using the GPU on Kaggle: scikit-learn's KNeighborsClassifier does not support GPUs.
In order to use the GPU for KNN, you need to specify it, otherwise it defaults to the CPU (note this is simbsig's KNeighborsClassifier, not scikit-learn's). The documentation is here: https://simbsig.readthedocs.io/en/latest/KNeighborsClassifier.html
knn = KNeighborsClassifier(n_neighbors=3, device='gpu')

Use already done computation wisely

Say I've got a dask dataframe df, and I apply some computations to it.
Mathematically,
df1 = f1(df)
df2 = f2(df1)
df3 = f3(df1)
Now if I run df2.compute(), and after that I run df1.compute(), how can I stop dask from recomputing the result of df1?
Taking the other case, if I run df3.compute() and then df2.compute(), how can I tell dask to reuse the value of df1 (already computed during df3.compute()) when running df2.compute()?
You can use dask.persist to create a dask dataframe whose subgraph has already been computed (or is computing in the background).
If you are using the local scheduler, then you should take a look at dask.cache.Cache:
from dask.cache import Cache
cache = Cache(4e9)  # opportunistic cache of up to ~4 GB
cache.register()
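As a sketch of the persist approach (assuming df, f1, f2 and f3 are defined as in the question), holding df1 in memory lets both downstream computations reuse it instead of recomputing it from df:
# Compute and keep the intermediate result (in memory, or on the workers
# when using the distributed scheduler).
df1 = f1(df).persist()

df2 = f2(df1)
df3 = f3(df1)

# Both of these now start from the persisted df1 rather than from df.
df2.compute()
df3.compute()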
