Very large networks in tensorflow - memory

I am trying to train a neural network in tensorflow, but my array of weights is sufficiently large that I am running into the 2GB GraphDef limit. What is my best recourse in this situation?
Note: I am not really using the full functionality of tensorflow (e.g. my network has no optimizers). Rather, I am just using tensorflow as a way to perform some basic array ops on the GPU.

You are probably accidentally initialising a tf.Variable with a large constant, which embeds the whole array in the GraphDef. See https://github.com/tensorflow/tensorflow/issues/2382
Workaround from the GitHub issue:
init_val = np.array(...)  # construct the large numpy array (elided here)
init_placeholder = tf.placeholder(tf.float32, shape=init_val.shape)
v = tf.Variable(init_placeholder)  # initial value comes from the placeholder, not a graph constant
# ...
sess.run(v.initializer, feed_dict={init_placeholder: init_val})  # feed the array at initialisation time

Related

How should I train an SVM using Julia?

Does anyone have experience training a support vector machine (SVM) in Julia (1.4.1)?
I tried the LIBSVM interface, but the example on the GitHub page gave an error:
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = convert(Vector, iris[:Species])
# First dimension of input data is features; second is instances
instances = convert(Array, iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
ERROR: MethodError: no method matching LIBSVM.SupportVectors(::Int32, ::Array{Int32,1}, ::CategoricalArray{String,1,UInt8,String,CategoricalValue{String,UInt8},Union{}}, ::Array{Float64,2}, ::Array{Int32,1}, ::Array{LIBSVM.SVMNode,1})
Closest candidates are:
LIBSVM.SupportVectors(::Int32, ::Array{Int32,1}, ::Array{T,1}, ::AbstractArray{U,2}, ::Array{Int32,1}, ::Array{LIBSVM.SVMNode,1}) where {T, U} at /home/benny/.julia/packages/LIBSVM/5Z99T/src/LIBSVM.jl:18
LIBSVM.SupportVectors(::LIBSVM.SVMModel, ::Any, ::Any) at /home/benny/.julia/packages/LIBSVM/5Z99T/src/LIBSVM.jl:27
It looks like the LIBSVM.jl documentation is rather outdated and the package was not updated accordingly, so it is worth opening an issue (or at least a pull request to update the README).
The error you see is not related to the package itself, but to the fact that in current versions of DataFrames.jl and RDatasets.jl the labels column is no longer a Vector (as it was when LIBSVM.jl was developed) but a CategoricalArray. You can avoid this problem by converting the CategoricalArray to a plain Vector{String}. A complete example looks like this:
using RDatasets, LIBSVM
using StatsBase, Printf # `mean` and `@printf` are no longer in Base and must be brought in explicitly
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = string.(convert(Vector, iris[:Species]))
# First dimension of input data is features; second is instances
instances = convert(Array, iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
# Test model on the other half of the data.
(predicted_labels, decision_values) = svmpredict(model, instances[:, 2:2:end]);
# Compute accuracy
@printf "Accuracy: %.2f%%\n" mean((predicted_labels .== labels[2:2:end]))*100
Alternatively, you can use MLJ.jl or ScikitLearn.jl
which should correctly wrap LIBSVM.jl on their own.
Oskin's answer is for an older version. In the current version, it should be modified as follows:
using RDatasets, LIBSVM
using StatsBase, Printf # `mean` and `@printf` are no longer in Base and must be brought in explicitly
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = string.(convert(Vector, iris[:,:Species]))
# First dimension of input data is features; second is instances
instances = Matrix(iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
# Test model on the other half of the data.
(predicted_labels, decision_values) = svmpredict(model, instances[:, 2:2:end]);
# Compute accuracy
@printf "Accuracy: %.2f%%\n" mean((predicted_labels .== labels[2:2:end]))*100

Improving a boosting model, reducing root mean square error

Hi, I am solving a regression problem. My data set consists of 13 features and 550,068 rows. I tried several different models and found that boosting algorithms (i.e. XGBoost, CatBoost, LightGBM) perform well on this large data set. Here is the code:
import numpy as np
from sklearn.metrics import mean_squared_error

# LightGBM
import lightgbm as lgb
gbm = lgb.LGBMRegressor(objective='regression', num_leaves=100, learning_rate=0.2, n_estimators=1500)
gbm.fit(x_train, y_train,
        eval_set=[(x_test, y_test)],
        eval_metric='l2_root',
        early_stopping_rounds=10)
y_pred = gbm.predict(x_test, num_iteration=gbm.best_iteration_)
accuracy = round(gbm.score(x_train, y_train) * 100, 2)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)

# XGBoost
import xgboost as xgb
boost_params = {'eval_metric': 'rmse'}
xgb0 = xgb.XGBRegressor(
    max_depth=8,
    learning_rate=0.1,
    n_estimators=1500,
    objective='reg:linear',
    gamma=0,
    min_child_weight=1,
    subsample=1,
    colsample_bytree=1,
    scale_pos_weight=1,
    seed=27,
    **boost_params)
xgb0.fit(x_train, y_train)
accuracyxgboost = round(xgb0.score(x_train, y_train) * 100, 2)
predict_xgboost = xgb0.predict(x_test)
msexgboost = mean_squared_error(y_test, predict_xgboost)
rmsexgboost = np.sqrt(msexgboost)

# CatBoost
from catboost import Pool, CatBoostRegressor
train_pool = Pool(x_train, y_train)
cbm0 = CatBoostRegressor(rsm=0.8, depth=7, learning_rate=0.1,
                         eval_metric='RMSE')
cbm0.fit(train_pool)
test_pool = Pool(x_test)
predict_cat = cbm0.predict(test_pool)
acc_cat = round(cbm0.score(x_train, y_train) * 100, 2)
msecat = mean_squared_error(y_test, predict_cat)
rmsecat = np.sqrt(msecat)
With the above models I am getting RMSE values of about 2850. Now I want to improve the model's performance by reducing the root mean square error. How can I do that? As I am new to boosting algorithms, which parameters affect these models, and how can I do hyperparameter tuning for them (XGBoost, CatBoost, LightGBM)? I am using Windows 10 and an Intel i5 7th-generation CPU.
Out of the three tools you have tried, CatBoost provides an edge in categorical feature processing (it could also be faster, but I have not seen a benchmark demonstrating it, and it does not seem to dominate on Kaggle, so most likely it is not as quick as LightGBM, but I might be wrong in that hypothesis). So I would use it if I had many categorical features in my sample. The other two (LightGBM and XGBoost) provide very similar functionality, and I would suggest choosing one of them and sticking to it. At the moment LightGBM seems to outperform XGBoost in training time on CPU while providing very comparable prediction accuracy. See for example the GBM-perf benchmark on GitHub or this in-depth analysis. If you have GPUs available, then XGBoost actually seems preferable, judging by the benchmark above.
In general, you can improve your model performance in several ways:
train longer (if early stopping was not triggered, there is still room for generalisation; if it was triggered, you cannot improve further by training the chosen model with the chosen hyper-parameters for longer)
optimise hyper-parameters (see below)
choose a different model. There is no single silver bullet for all problems. Typically GBMs work very well on large samples of structured data, but for some classes of problems (e.g. linear dependence) it is hard for a GBM to learn how to generalise, as it might require very many splits. So it might be that for your problem a linear model, an SVM or something else will do better out of the box.
Since we have narrowed it down to two options, I cannot advise on CatBoost hyper-parameter optimisation, as I have no hands-on experience with it yet. But for LightGBM tuning you can read this official LightGBM doc and these instructions in one of the issues. There are very many good examples of hyper-parameter tuning for LightGBM; I can quickly dig out my kernel on Kaggle: see here. I do not claim it to be perfect, but it is something that is easy for me to find :)
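As an illustration of what such tuning can look like (a minimal sketch, not taken from the linked kernel; the parameter ranges are arbitrary choices, and x_train/y_train are reused from the question), a randomized search over a few of the LightGBM parameters that usually matter most:
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV

# Candidate ranges for the most influential LightGBM parameters (illustrative values).
param_distributions = {
    'num_leaves': [31, 63, 127, 255],
    'learning_rate': [0.01, 0.05, 0.1, 0.2],
    'n_estimators': [500, 1000, 1500],
    'min_child_samples': [10, 20, 50],
    'subsample': [0.7, 0.9, 1.0],
    'colsample_bytree': [0.7, 0.9, 1.0],
}

search = RandomizedSearchCV(
    lgb.LGBMRegressor(objective='regression'),
    param_distributions,
    n_iter=25,                          # number of random combinations to try
    scoring='neg_mean_squared_error',   # negated MSE; RMSE is the square root of its negation
    cv=3,
    random_state=42,
)
search.fit(x_train, y_train)
print(search.best_params_)
print('CV RMSE:', np.sqrt(-search.best_score_))
A grid search works the same way, but a randomized search usually finds good settings with far fewer fits on a data set of this size.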
If you are using Intel CPU, then try Intel XGBoost. Intel has powered several optimizations for XGBoost to accelerate gradient boosting models and improve its training and inference capabilities. Also, please check out the article, https://www.intel.com/content/www/us/en/developer/articles/technical/easy-introduction-xgboost-for-intel-architecture.html#gs.q4c6p6 on how to use XGBoost with Intel optimizations.
You can also try lasso or ridge regression; these methods could improve performance.
For hyperparameter tuning, you can use loops: iterate over candidate values and check where you get the lowest RMSE (a sketch follows this answer).
You can also try stacked ensemble techniques.
If you use R, try the h2o.ai package; it gives good results.
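A minimal sketch of the loop-based tuning suggested above (the candidate values are arbitrary, x_train/x_test/y_train/y_test are reused from the question, and ideally you would score on a separate validation set rather than the test set):
import numpy as np
import lightgbm as lgb
from sklearn.metrics import mean_squared_error

best_rmse, best_params = float('inf'), None
for num_leaves in (31, 63, 127):
    for learning_rate in (0.05, 0.1, 0.2):
        model = lgb.LGBMRegressor(objective='regression',
                                  num_leaves=num_leaves,
                                  learning_rate=learning_rate,
                                  n_estimators=1000)
        model.fit(x_train, y_train)
        rmse = np.sqrt(mean_squared_error(y_test, model.predict(x_test)))
        if rmse < best_rmse:
            best_rmse, best_params = rmse, (num_leaves, learning_rate)

print('best (num_leaves, learning_rate):', best_params, 'RMSE:', best_rmse)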

How to apply a CNN to multi-channel pixel data with weights for each channel?

I have an image with 8 channels. I have a conventional algorithm where weights are applied to each of these channels to get an output of '0' or '1'. This works fine for several samples and complex scenarios. I would like to implement the same thing in machine learning using a CNN.
I am new to ML and started looking at tutorials, which seem to deal exclusively with image-processing problems: handwriting recognition, feature extraction, etc.
http://cv-tricks.com/tensorflow-tutorial/training-convolutional-neural-network-for-image-classification/
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/content/neural_networks.html
I have set up Keras with Theano as the backend. Basic Keras samples work without problems.
What steps do I need to follow in order to achieve the same result using a CNN? I do not understand the use of filters, kernels, and strides in my use case. How do we provide training data to Keras if the pixel channel values and outputs are in the form below?
Pixel#1 f(C1,C2,...,C8) = 1
Pixel#2 f(C1,C2,...,C8) = 1
Pixel#3 f(C1,C2,...,C8) = 0
...
Pixel#N f(C1,C2,...,C8) = 1
I think you should treat this the same way you use CNN to do semantic segmentation. For an example look at
https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf
You can use the same architecture as they are using, but for the first layer use filters for 8 channels instead of 3 (a minimal sketch follows the links below).
For the loss function you can use the same loss function, or something more specific to a binary output.
There are several implementations for Keras with a TensorFlow backend:
https://github.com/JihongJu/keras-fcn
https://github.com/aurora95/Keras-FCN
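A minimal sketch of the 8-channel first layer idea in Keras (channels-last ordering as with a TensorFlow backend; the filter counts, image size, and the final 1x1 output layer are placeholder choices, not taken from the paper):
from keras.models import Sequential
from keras.layers import Conv2D

height, width = 256, 256   # example image size; use your own

model = Sequential()
# Same idea as an RGB segmentation net, but the first layer takes 8 input channels instead of 3.
model.add(Conv2D(64, (3, 3), padding='same', activation='relu',
                 input_shape=(height, width, 8)))
# ... the rest of the segmentation architecture goes here ...
# For a 0/1 output per pixel, end with a 1-filter convolution and a sigmoid.
model.add(Conv2D(1, (1, 1), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')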
Since the input is in the form of channel values, and those are in sequence, I would suggest using Convolution1D. Here you take each pixel's channel values as the input and predict a label for each pixel. Try something like this:
For example (the filter counts and kernel sizes are placeholders):
from keras.models import Sequential
from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense

model = Sequential([
    Conv1D(32, kernel_size=3, strides=1, padding='valid', activation='relu', input_shape=(8, 1)),
    Conv1D(32, kernel_size=3, activation='relu'),
    MaxPooling1D(pool_size=2),
    # ... add as many layers as you want ...
    Flatten(),
    Dense(1, activation='sigmoid'),
])
Use binary_crossentropy as the loss function.

How does one calculate the GPU memory required to run a model in TensorFlow?

Is there a straightforward way to find the GPU memory consumed by, say, an inception-resnet-v2 model that is initialized in tensorflow? This includes the memory required for inference and for backpropagation.
You can explicitly calculate the memory needed to store the parameters, but I am afraid it would be difficult to compute the size of all the buffers needed for training. Probably a cleverer way is to make TF do it for you: set the gpu_options.allow_growth config option to True and see how much it actually consumes. Another option is to try smaller values of gpu_options.per_process_gpu_memory_fraction until it fails with out of memory.
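A minimal sketch of both suggestions with the TF 1.x session API (the model-building code is assumed to exist elsewhere):
import tensorflow as tf

# Option 1: let TensorFlow grow its allocation on demand,
# then inspect the actual usage with nvidia-smi while the model runs.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# Option 2: cap the fraction of GPU memory and lower it until you hit OOM.
# config.gpu_options.per_process_gpu_memory_fraction = 0.4

sess = tf.Session(config=config)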
Since using gpu_options.allow_growth and gpu_options.per_process_gpu_memory_fraction for model size estimation is currently a tedious trial-and-error solution, I suggest using tf.RunMetadata() in combination with TensorBoard.
Example:
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
# Run one training step with full tracing enabled.
sess.run(train_step, feed_dict=feed_dict, options=run_options, run_metadata=run_metadata)
# train_writer is the tf.summary.FileWriter you already use for TensorBoard.
train_writer.add_run_metadata(run_metadata, 'step%d' % i)
Run your model and TensorBoard, navigate to the desired part of your graph, and read the node statistics.
Source: https://www.tensorflow.org/get_started/graph_viz

Sentiment Analysis classifier using Machine Learning

How can we make a working classifier for sentiment analysis, given that the classifier needs to be trained on huge data sets?
I have a huge data set to train on, but the classifier object (here using Python) gives a memory error with 3000 words, and I need to train with more than 100K words.
What I thought of was dividing the huge data set into smaller parts, making a classifier object for each, and storing them in pickle files to use together. But it seems that using all the classifier objects for testing is not possible, since only one of the objects is used during testing.
The solutions that come to my mind are either to combine all the saved classifier objects stored in the pickle files (which is just not happening) or to keep appending new training data to the same object (but again, it gets overwritten rather than appended).
I don't know why, but I could not find any solution to this problem, even though it is fundamental to machine learning. Every machine learning project needs to be trained on a huge data set, and the objects built from those data sets will always give a memory error.
So how do I solve this problem? I am open to any solution, but would especially like to hear from people who work on real-world machine learning projects.
Code Snippet :
import nltk
from nltk.corpus import movie_reviews

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
word_features = list(all_words.keys())[:3000]

def find_features(document):
    words = set(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

featuresets = [(find_features(rev), category) for (rev, category) in documents]
numtrain = int(len(documents) * 90 / 100)
training_set = featuresets[:numtrain]
testing_set = featuresets[numtrain:]
classifier = nltk.NaiveBayesClassifier.train(training_set)
PS: I am using the NLTK toolkit with NaiveBayes. My training dataset is loaded into documents.
There are two things you seem to be missing:
Datasets for text are usually extremely sparse, and you should store them as sparse matrices. With such a representation you should be able to keep millions of documents in memory with a vocabulary of 100,000.
Many modern learning methods are trained in a mini-batch scenario, meaning that you never need the whole dataset in memory; instead, you feed the model random subsets of the data while still training a single model. This way the dataset can be arbitrarily large, memory consumption stays constant (fixed by the mini-batch size), and only training time scales with the number of samples. A sketch of both ideas follows.
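A minimal sketch of both points with scikit-learn instead of NLTK's NaiveBayesClassifier (the read_batches generator is hypothetical, standing in for however you stream your documents from disk):
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Hashing keeps memory bounded: no 100K-entry vocabulary dict is stored,
# and transform() returns a scipy sparse matrix.
vectorizer = HashingVectorizer(n_features=2**18)
clf = SGDClassifier()          # linear classifier trained incrementally with SGD
classes = ['pos', 'neg']       # the movie_reviews categories

for texts, labels in read_batches():   # hypothetical generator yielding (list of texts, list of labels)
    X = vectorizer.transform(texts)    # sparse features for this mini-batch only
    clf.partial_fit(X, labels, classes=classes)   # incremental update of the same single model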
