CVXPY compilation takes days for MNIST data - cvxpy

I am trying to solve an SOCP optimization problem for a multi-class SVM on the MNIST data. With the solver set to ECOS, CVXPY compilation takes days to complete.
CVXPY v1.2.0
(CVXPY) Apr 20 03:59:48 PM: Your problem has 25680 variables, 45000 constraints, and 0 parameters.
(CVXPY) Apr 20 03:59:48 PM: It is compliant with the following grammars: DCP, DQCP
(CVXPY) Apr 20 03:59:48 PM: (If you need to solve this problem multiple times, but with different data, consider using parameters.)
(CVXPY) Apr 20 03:59:48 PM: CVXPY will first compile your problem; then, it will invoke a numerical solver to obtain a solution.
Compilation
(CVXPY) Apr 20 04:00:00 PM: Compiling problem (target solver=ECOS).
(CVXPY) Apr 20 04:00:00 PM: Reduction chain: Dcp2Cone -> CvxAttr2Constr -> ConeMatrixStuffing -> ECOS
(CVXPY) Apr 20 04:00:00 PM: Applying reduction Dcp2Cone
(CVXPY) Apr 20 04:00:14 PM: Applying reduction CvxAttr2Constr
(CVXPY) Apr 20 04:00:15 PM: Applying reduction ConeMatrixStuffing
This is a sample of the log; the ConeMatrixStuffing reduction in particular is taking a huge amount of time. Is there anything we can do here to reduce the time?
Will CVXPY support big data, or at least the MNIST data set?
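Not a fix for the stuffing time at MNIST scale, but the log's hint about parameters is worth illustrating: if the same problem structure is solved repeatedly, declaring the data as cvxpy.Parameter lets CVXPY reuse the compiled canonicalization across solves. A minimal sketch of that pattern on a small placeholder problem (the names n, m, A, b are illustrative, not the SVM formulation from the question):

import numpy as np
import cvxpy as cp

n, m = 10, 5
A = cp.Parameter((m, n))          # data enters as Parameters, not constants
b = cp.Parameter(m)
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b)), [x >= 0])

A.value = np.random.randn(m, n)
b.value = np.random.randn(m)
prob.solve(solver=cp.ECOS)        # first solve pays the compilation cost

A.value = np.random.randn(m, n)   # new data, same structure
b.value = np.random.randn(m)
prob.solve(solver=cp.ECOS)        # re-solve reuses the cached compilation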

Related

Ideas for model selection for predicting sales at locations based on time component and class column

I am trying to build a model for sales prediction out of three different storages based on previous sales. However, there is an extra (and very important) component to this, which is a column with the values A and B. These letters indicate a price category, where A signifies a comparatively cheaper price compared to similar products. Here is a mock example of the table:
week  Letter  Storage1 sales  Storage2 sales  Storage3 sales
1     A       50              28              34
2     A       47              29              19
3     B       13              11              19
4     B       14              19              8
5     B       21              13              3
6     A       39              25              23
I have previously worked with both types of prediction problems separately, namely time series analysis and regression problems, using both classical methods and machine learning, but I have not built a model which can take both prediction types into account.
I am writing this to hear any suggestions as to how to tackle such a prediction problem. I am thinking of converting the three storage sales columns into one, in order to have a single feature column, and adding three one-hot encoded columns to indicate the storage. However, I am not sure how to tackle this problem with a machine learning approach and would like to hear if anyone knows where to start with such a prediction problem.
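For concreteness, a minimal sketch of that reshaping idea, assuming pandas (the frame below re-creates the first rows of the mock table): melt the three storage sales columns into a single sales column, then one-hot encode the storage indicator and the price category.

import pandas as pd

df = pd.DataFrame({
    "week": [1, 2, 3],
    "Letter": ["A", "A", "B"],
    "Storage1 sales": [50, 47, 13],
    "Storage2 sales": [28, 29, 11],
    "Storage3 sales": [34, 19, 19],
})

# Melt the three storage columns into one "sales" feature column.
long_df = df.melt(
    id_vars=["week", "Letter"],
    value_vars=["Storage1 sales", "Storage2 sales", "Storage3 sales"],
    var_name="storage",
    value_name="sales",
)
# One-hot encode the storage indicator and the price category.
long_df = pd.get_dummies(long_df, columns=["storage", "Letter"])
print(long_df.head())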

If I get 15-minute interval data for predicting an hourly target, should I use the 15-minute data or aggregate to 1-hour data for training?

I have the following dataset, where the data is at 15-minute intervals:
Time              A   B   A+B
2021-01-01 00:00  10  20  30
2021-01-01 00:15  20  30  50
2021-01-01 00:30  30  40  70
2021-01-01 01:00  40  50  90
2021-01-01 01:00  10  20  30
2021-01-01 01:15  20  30  50
2021-01-01 01:30  30  40  70
2021-01-01 02:00  40  50  90
Basically, I need to develop a machine learning model for predicting the hourly A+B:
Time A+B
2021-01-02 00:00
2021-01-02 01:00
2021-01-02 02:00
2021-01-02 03:00
I want to ask, when selecting the target label for my training model:
Should I use the 15-minute data for training and sum the predictions afterward to get the hourly A+B, or should I aggregate the 15-minute data into hourly data for training? What is the difference?
Is there any difference if I train A and B separately and add them up, compared with training on A+B directly?
Thanks a lot.
Here is a possible solution. Since you care about the hourly total and you get data every 15 minutes, I would give the 15-minute interval data for a whole hour as the input to the network. The output would then be the final value at the end of that hour.
So, for example, the input to the net would have shape [4, 2], holding the A and B values; the output would be the final result after the hour.
On another note, this doesn't sound like a problem that needs machine learning, but I'm sure there is more information I don't know about.
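A minimal numpy sketch of that layout; the numbers are the mock values from the question, and the summed target is one possible definition of the hourly A+B:

import numpy as np

# Four 15-minute readings of (A, B) per hour, taken from the mock table.
quarter_hour = np.array([
    [10, 20], [20, 30], [30, 40], [40, 50],   # hour 0
    [10, 20], [20, 30], [30, 40], [40, 50],   # hour 1
])
n_hours = len(quarter_hour) // 4
X = quarter_hour.reshape(n_hours, 4, 2)       # the [4, 2] input per hour
y = X.sum(axis=(1, 2))                        # hourly A+B as a sum: [240 240]
# or X[:, -1, :].sum(axis=1) for the A+B value at the end of each hour
print(X.shape, y)                             # (2, 4, 2) [240 240]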
I would first split the data into a training and validation set as-is.
Then take a third option: use a sliding window of 1 hour over the samples in each set, stepping by 15 minutes instead of a full hour, to produce data at one-hour spans. Stepping four times more often will create roughly 3x more valid training samples than simply aggregating.
Whether to build a model of A, B, or A+B depends on what you want to predict. Do you need predictions for A and B separately, or do you only need A+B? If you only want A+B, then build the model around that. Any basic ML model will be able to handle the summation, so it will likely not make a significant difference. As with most data-driven problems, it will depend on the data, so if you really want to find out whether there is a difference for your data, you may want to try both and compare results on a hold-out set.
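To make the sliding-window option concrete, a minimal numpy sketch (the series here is synthetic filler):

import numpy as np

series = np.arange(12)            # 12 quarter-hour samples = 3 hours of data
window = 4                        # one hour = 4 samples
# Slide the 1-hour window forward one 15-minute step at a time.
windows = np.stack([series[i:i + window]
                    for i in range(len(series) - window + 1)])
print(windows.shape)              # (9, 4): 9 overlapping windows vs 3 non-overlapping hours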

F1 score for a Random Forest model

I have built a Random Forest model (H2O library) and then checked its accuracy on some test data. I would like to use the F1 score as a measure of the model's success. However, I cannot find a way to retrieve it in the documentation.
I know that it is possible, as this appears here:
performance = best_nn.model_performance(test_data = test)
F1 = performance.F1()
However, in my case, for some reason, performance does not have F1 as a method.
What is wrong, and how is it possible to retrieve it?
Environment:
H2O cluster uptime: 7 mins 29 secs
H2O cluster timezone: Asia/Jerusalem
H2O data parsing timezone: UTC
H2O cluster version: 3.22.0.2
H2O cluster version age: 10 days
H2O cluster name: H2O_from_python_user_24aghd
H2O cluster total nodes: 1
H2O cluster free memory: 894 Mb
H2O cluster total cores: 4
H2O cluster allowed cores: 4
H2O cluster status: locked, healthy
H2O connection url: http://localhost:54321
H2O connection proxy: None
H2O internal security: False
H2O API Extensions: Algos, AutoML, Core V3, Core V4
Python version: 2.7.15 final
It seems that I have found the reason, and it is rather a simple one:
F1 is appropriate only for models whose response variable has two possible classes. Mine had more.
So, H2O did not offer the metric.
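For reference, a minimal sketch of the distinction, assuming hypothetical CSV paths and a hypothetical "label" response column: F1() is exposed on binomial performance objects, while multinomial models offer per-class metrics instead.

import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
train = h2o.import_file("train.csv")    # hypothetical paths
test = h2o.import_file("test.csv")
model = H2ORandomForestEstimator(ntrees=50)
model.train(y="label", training_frame=train)

performance = model.model_performance(test_data=test)
# For a two-class (binomial) response:
print(performance.F1())
# For a multiclass (multinomial) response, F1() is not available; use e.g.:
# performance.confusion_matrix()
# performance.mean_per_class_error()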

Why is my GPU slower than CPU when training LSTM/RNN models?

My machine has the following spec:
CPU: Xeon E5-1620 v4
GPU: Titan X (Pascal)
Ubuntu 16.04
Nvidia driver 375.26
CUDA toolkit 8.0
cuDNN 5.1
I've benchmarked the following Keras examples with Tensorflow as the backend:
SCRIPT NAME                 GPU     CPU
stated_lstm.py              5sec    5sec
babi_rnn.py                 10sec   12sec
imdb_bidirectional_lstm.py  240sec  116sec
imdb_lstm.py                113sec  106sec
My GPU clearly outperforms my CPU in non-LSTM models:
SCRIPT NAME     GPU    CPU
cifar10_cnn.py  12sec  123sec
imdb_cnn.py     5sec   119sec
mnist_cnn.py    3sec   47sec
Has anyone else experienced this?
If you use Keras, use CuDNNLSTM in place of LSTM or CuDNNGRU in place of GRU. In my case (2x Tesla M60), I am seeing a 10x performance boost. By the way, I am using batch size 128, as suggested by @Alexey Golyshev.
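A minimal sketch of the swap, assuming Keras 2.x with the TensorFlow backend and a CUDA-capable GPU; the data here is random filler just to make it runnable:

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Dense, CuDNNLSTM

x_train = np.random.randint(1, 20000, size=(256, 80))   # dummy token ids
y_train = np.random.randint(0, 2, size=(256, 1))

model = Sequential([
    Embedding(20000, 128),
    CuDNNLSTM(128),                  # drop-in replacement for LSTM(128)
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(x_train, y_train, batch_size=128, epochs=1)    # batch size 128 as suggested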
Your batch size is too small. Try increasing it.
Results for my GTX1050Ti:
imdb_bidirectional_lstm.py
batch_size time
32 (default) 252
64 131
96 87
128 66
imdb_lstm.py
batch_size time
32 (default) 108
64 50
96 34
128 25
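For reproduction, a minimal sketch of the experiment behind these tables: the model is fixed and only the batch_size argument to fit() changes (random filler data; Keras 2.x assumed).

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, Dense

x = np.random.randint(1, 20000, size=(512, 100))   # dummy token ids
y = np.random.randint(0, 2, size=(512, 1))

model = Sequential([
    Embedding(20000, 128),
    Bidirectional(LSTM(64)),
    Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="adam")
model.fit(x, y, batch_size=128, epochs=1)   # vs the default batch_size=32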
Just a tip:
Using a GPU is powerful when
1. your neural network model is big.
2. the batch size is big.
This is what I found from googling.
I have run into similar issues here:
Test 1
CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
Ubuntu 14.04
imdb_bidirectional_lstm.py: 155s
Test 2
GPU: GTX 860m
Nvidia Driver: 369.30
CUDA Toolkit: v8.0
cuDNN: v6.0
imdb_bidirectional_lstm.py: 450s
Analysis
When I observed the GPU load curve, I found one interesting thing:
for LSTM, the GPU load jumps quickly between ~80% and ~10%.
(figure: GPU load curve)
This is mainly due to the sequential computation in the LSTM layer. Remember that an LSTM requires sequential input to calculate the hidden states iteratively; in other words, you must wait for the hidden state at time t-1 to calculate the hidden state at time t.
That's not a good fit for GPU cores: a GPU has many small cores that like doing computations in parallel, and sequential computation can't fully utilize their computing power. That's why we see GPU load around 10%-20% most of the time.
But in the backpropagation phase, the GPU can run the derivative computations in parallel, so we see the GPU load peak at around 80%.

Inconsistency between libsvm and scikit-learn SVC results

I have a project that is based on the SVM algorithm implemented by libsvm. Recently I decided to try several other classification algorithms, which is where scikit-learn comes into the picture.
The connection to scikit-learn was pretty straightforward: it supports the libsvm format via the load_svmlight_file routine, and its SVM implementation is based on the same libsvm.
When everything was done, I decided to check the consistency of the results by running libsvm directly and via scikit-learn, and the results were different. Among the 18 measures in the learning curves, 7 were different, and the differences are located at the small steps of the learning curve. The libsvm results seem much more stable, but the scikit-learn results show some drastic fluctuations.
The classifiers have exactly the same parameters, of course.
I tried to check the version of libsvm used in the scikit-learn implementation, but I didn't find it; the only thing I found was the libsvm.so file.
Currently I am using libsvm version 3.21 and scikit-learn version 0.17.1.
I would appreciate any help in addressing this issue.
size libsvm scikit-learn
1 0.1336239435355727 0.1336239435355727
2 0.08699516468193455 0.08699516468193455
3 0.32928301642777424 0.2117238289550198 #different
4 0.2835688734876902 0.2835688734876902
5 0.27846766962743097 0.26651875338163966 #different
6 0.2853854654662907 0.18898048915599963 #different
7 0.28196058132165136 0.28196058132165136
8 0.31473956032575623 0.1958710201604552 #different
9 0.33588303670653136 0.2101641630182972 #different
10 0.4075242509025311 0.2997807499800962 #different
15 0.4391771087975972 0.4391771087975972
20 0.3837789445609818 0.2713167833345173 #different
25 0.4252154334940311 0.4252154334940311
30 0.4256407777477492 0.4256407777477492
35 0.45314944605858387 0.45314944605858387
40 0.4278633233755064 0.4278633233755064
45 0.46174762022239796 0.46174762022239796
50 0.45370452524846866 0.45370452524846866
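For anyone trying to reproduce the comparison, a minimal sketch of the scikit-learn side, with SVC parameters matched to the libsvm command-line defaults (the file path is a placeholder):

from sklearn.datasets import load_svmlight_file
from sklearn.svm import SVC

X, y = load_svmlight_file("train.libsvm")   # placeholder path
clf = SVC(C=1.0, kernel="rbf",
          gamma=1.0 / X.shape[1],           # libsvm's default gamma = 1/num_features
          tol=1e-3, shrinking=True,         # libsvm defaults: -e 0.001, -h 1
          cache_size=100)                   # libsvm default: -m 100
clf.fit(X, y)
print(clf.score(X, y))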
