Deep Reinforcement Learning (keras-rl) Early stopping - machine-learning

According to these guys (https://nihit.github.io/resources/spaceinvaders.pdf) it is possible to perform Early Stopping with Deep Reinforcement Learning. I used that before with Deep Learning on Keras, but, how to do that on keras-rl? in the same fit() function or before sending the model to the agent?

It looks like you could just use keras's callback; if you really want it in the package, grab it from here and put it in here. Otherwise, I would try:
from keras.callbacks import EarlyStopping
early_stop = EarlyStopping(patience=69) # epochs stagnation before termination
# from their example cem_cartpole.py
cem.fit(env, nb_steps=100000, visualize=False, callbacks=[early_stop], verbose=2)

Related

How to add Reduce Learning Rate On Plateau in XGBoost?

You know in TensorFlow, we use a callback named ReduceLROnPlateau which reduce our learning rate slightly when our model stops learning. Does anyone know how to do this in XGBoost? I want to know if there is any way to reduce learning when XGBoost model stops learning.
This article claims:
A quick Google search reveals that there has been some work done utilizing a decaying learning rate, one that starts out large and shrinks at each round. But we typically see no accuracy gains in Cross Validation and when looking at test error graphs there are minimal differences between its performance and using a regular old constant.
They wrote a Python package called BetaBoost to find an optimal sequence for the learning rate scheduler.
In principle they seem to use a function which returns learning rates for the LearningRateScheduler
from scipy.stats import beta
def beta_pdf(scalar=1.5,
a=26,
b=1,
scale=80,
loc=-68,
floor=0.01,
n_boosting_rounds=100):
"""
Get the learning rate from the beta PDF
Returns
-------
lrs : list
the resulting learning rates to use.
"""
lrs = [scalar*beta.pdf(i,
a=a,
b=b,
scale=scale,
loc=loc)
+ floor for i in range(n_boosting_rounds)]
return lrs
[...]
xgb.train(
[...],
callbacks=[xgb.callback.LearningRateScheduler(beta_pdf())]
)

Does Scikit-learn support transfer learning?

Does Scikit-learn support transfer learning? Please check the following code.
model clf is gotten by fit(X,y)
Can model clf2 learn on the base of clf and transfer learn by fit(X2,y2) ?
>>> from sklearn import svm
>>> from sklearn import datasets
>>> clf = svm.SVC()
>>> X, y= ....
>>> clf.fit(X, y)
SVC()
>>> import pickle
>>> s = pickle.dumps(clf)
>>> clf2 = pickle.loads(s)
>>> clf2.fit(X2,y2)
>>> clf2.predict(X[0:1])
In the context of scikit-learn there's no transfer learning as such, there is incremental learning or continuous learning or online learning.
By looking at your code, whatever you're intending to do won't work the way you're thinking here. From this scikit-learn documentation:
Calling fit() more than once will overwrite what was learned by any
previous fit()
Which means using fit() more than once on the same model will simply overwrite all the previously fitted coefficients, weights, intercept (bias), etc.
However if you want to fit a portion of your data set and then improve your model by fitting a new data, what you can do is look for estimators that include partial_fit API implementation.
If we call partial_fit() multiple times, framework will update the
existing weights instead of re-initialising them.
Another way to do incremental learning with scikit-learn is to look for algorithms that support the warm_start parameter.
From this doc:
warm_start: bool, default=False
When set to True, reuse the solution of
the previous call to fit() as initialization, otherwise, just erase the
previous solution. Useless for liblinear solver.
Another example is Random forrest regressor.

Access Clients Loss while having keras tff NN models

I'm trying to obtain the losses of all clients in tensorflow model without luck. The answer to post how to print local outputs in tensorflow federated?
suggests to create our NN model from scratch. However, I already have my keras NN model. So is there a way to still access the local client losses without having to build NN from scratch?
I tried to use tff.federated_collect(), but not sure how is that possible.
This is partly my attempt:
trainer_Itr_Process = tff.learning.build_federated_averaging_process(model_fn_Federated,server_optimizer_fn=(lambda : tf.keras.optimizers.SGD(learning_rate=learn_rate)),client_weight_fn=None)
FLstate = trainer_Itr_Process.initialize()
#tff.learning.Model
def federated_output_computation():
return{
'num_examples': tff.federated_sum(metrics.num_examples),
'loss': tff.federated_mean(metrics.loss, metrics.num_examples),
'accuracy': tff.federated_mean(metrics.accuracy, metrics.num_examples),
'per_client/num_examples': tff.federated_collect(metrics.num_examples),
'per_client/loss': tff.federated_collect(metrics.loss),
'per_client/accuracy': tff.federated_collect(metrics.accuracy),
}
This is the error I received:
#tff.learning.Model
TypeError: object() takes no parameters
tff.learning.Model is not a decorator for functions, it is the class interface used by the tff.learning module.
Probably the best way to change the implementation of tff.learning.Model.federated_output_computation (what is recommended in how to print local outputs in tensorflow federated?) is to create your own subclass of tff.learning.Model, that implements a different federated_output_computation property. This would be close to re-implementing tff.learning.from_keras_model(), except providing the custom metric aggregation; so looking at the implementation (here) can be useful, but ingesting Keras models is non-trivial at the moment.

Classification with numerical label?

I know of a couple of classification algorithms such as decision trees, but I can't use any of them to the problem I have at hands.
I have a dataset in which each row contains information about a purchase. It's columns are:
- customer id
- store id where the purchase took place
- date and time of the event
- amount of money spent
I'm trying to make a prediction that, given the information of who, where and when, predicts how much money is going to be spent.
What are some possible ways of doing this? Are there any well-known algorithms?
Also, I'm currently learning RapidMiner, and I'm experimenting with some of its features. Everything that I've tried there doesn't allow me to have a real number (amount spent) as a label. Maybe I'm doing something wrong?
You could use a Decision Tree Regressor for this. Using a toolkit like scikit-learn, you could use the DecisionTreeRegressor algo where your features would be store id, date and time, and customer id, and your target would be the amount spent.
You could turn this into a supervised learning problem. This is untested code, but it could probably get you started
# Load libraries
import numpy as np
import pylab as pl
from sklearn import datasets
from sklearn.tree import DecisionTreeRegressor
from sklearn import cross_validation
from sklearn import metrics
from sklearn import grid_search
def fit_predict_model(data_import):
"""Find and tune the optimal model. Make a prediction on housing data."""
# Get the features and labels from your data
X, y = data_import.data, data_import.target
# Setup a Decision Tree Regressor
regressor = DecisionTreeRegressor()
parameters = {'max_depth':(4,5,6,7), 'random_state': [1]}
scoring_function = metrics.make_scorer(metrics.mean_absolute_error, greater_is_better=False)
## fit your data to it ##
reg = grid_search.GridSearchCV(estimator = regressor, param_grid = parameters, scoring=scoring_function, cv=10, refit=True)
fitted_data = reg.fit(X, y)
print "Best Parameters: "
print fitted_data.best_params_
# Use the model to predict the output of a particular sample
x = [## input a test sample in this list ##]
y = reg.predict(x)
print "Prediction: " + str(y)
fit_predict_model(##your data in here)
I took this from a project I was working on almost directly to predict housing prices so there are probably some unnecessary libraries and without doing validation you have no clue how accurate this case would be, but this should get you started.
Check out this link:
http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html
Yes, as comments have pointed out it's regression that you need. Linear regression does sound like a good starting point as you don't have a huge number of variables.
In RapidMiner type regression into the Operators menu and you'll see several options under Modelling-> Functions. Linear Regression, Polynomical, Vector, etc. (There's more, but as a beginner let's start here).
Right click any of these operators and press Show Operator Info and you'll see numerical labels are allowed.
Next scroll through the help documentation of the operator and you'll see a link to a tutorial process. It's really simple to use, but it's good to get you started with an example.
Let me know if you need any help.

SciKit Learn feature selection and cross validation using RFECV

I am still very new to machine learning and trying to figure things out myself. I am using SciKit learn and have a data set of tweets with around 20,000 features (n_features=20,000). So far I achieved a precision, recall and f1 score of around 79%. I would like to use RFECV for feature selection and improve the performance of my model. I have read the SciKit learn documentation but am still a bit confused on how to use RFECV.
This is the code I have so far:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.cross_validation import StratifiedShuffleSplit
from sklearn.cross_validation import cross_val_score
from sklearn.feature_selection import RFECV
from sklearn import metrics
# cross validation
sss = StratifiedShuffleSplit(y, 5, test_size=0.2, random_state=42)
for train_index, test_index in sss:
docs_train, docs_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]
# feature extraction
count_vect = CountVectorizer(stop_words='English', min_df=3, max_df=0.90, ngram_range=(1,3))
X_CV = count_vect.fit_transform(docs_train)
tfidf_transformer = TfidfTransformer()
X_tfidf = tfidf_transformer.fit_transform(X_CV)
# Create the RFECV object
nb = MultinomialNB(alpha=0.5)
# The "accuracy" scoring is proportional to the number of correct classifications
rfecv = RFECV(estimator=nb, step=1, cv=2, scoring='accuracy')
rfecv.fit(X_tfidf, y_train)
X_rfecv=rfecv.transform(X_tfidf)
print("Optimal number of features : %d" % rfecv.n_features_)
# train classifier
clf = MultinomialNB(alpha=0.5).fit(X_rfecv, y_train)
# test clf on test data
X_test_CV = count_vect.transform(docs_test)
X_test_tfidf = tfidf_transformer.transform(X_test_CV)
X_test_rfecv = rfecv.transform(X_test_tfidf)
y_predicted = clf.predict(X_test_rfecv)
#print the mean accuracy on the given test data and labels
print ("Classifier score is: %s " % rfecv.score(X_test_rfecv,y_test))
Three questions:
1) Is this the correct way to use cross validation and RFECV? I am especially interested to know if I am running any risk of overfitting.
2) The accuracy of my model before and after I implemented RFECV with the above code are almost the same (around 78-79%), which puzzles me. I would expect performance to improve by using RFECV. Anything I might have missed here or could do differently to improve the performance of my model?
3) What other feature selection methods could you recommend me to try? I have tried RFE and SelectKBest so far, but they both haven't given me any improvement in terms of model accuracy.
To answer your questions:
There is a cross-validation built in the RFECV feature selection (hence the name), so you don't really need to have additional cross-validation for this single step. However since I understand you are running several tests, it's good to have an overall cross-validation to ensure you're not overfitting to a specific train-test split. I'd like to mention 2 points here:
I doubt the code behaves exactly like you think it does ;).
# cross validation
sss = StratifiedShuffleSplit(y, 5, test_size=0.2, random_state=42)
for train_index, test_index in sss:
docs_train, docs_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index]
# feature extraction
count_vect = CountVectorizer(stop_words='English', min_df=3, max_df=0.90, ngram_range=(1,3))
X_CV = count_vect.fit_transform(docs_train)
Here we first go through the loop, that has 5 iterations (n_iter parameter in StratifiedShuffleSplit). Then we go out of the loop and we just run all your code with the last values of train_index, test_index. So this is equivalent to a single train-test split where you probably meant to have 5. You should move your code back into the loop if you want it to run like a 'proper' cross validation.
You are worried about overfitting: indeed when 'looking for the best method' the risk exists that we're going to pick the method that works best... only on the small sample we're testing the method on.
Here the best practice is to have a first train-test split, then to perform cross-validation only using the train set. The test set can be used 'sparingly' when you think you found something, to make sure the scores you get are consistent and you're not overfitting.
It may look like you're throwing away 30% of your data (your test set), but it's absolutely worth it.
It can be puzzling to see feature selection does not have that big an impact. To introspect a bit more you could look into the evolution of the score with the number of selected features (see the example from the docs).
That being said, I don't think this is the right use case for RFE. Basically with your code you are eliminating features one by one, which probably takes a long time to run and does not make so much sense when you have 20000 features.
Other feature selection methods: here you mention SelectKBest but you don't tell us which method you use to score your features! SelectKBest will pick the K best features according to a score function. I'm guessing you were using the default which is ok, but it's better to have an idea of what the default does ;).
I would try SelectPercentile with chi2 as a score function. SelectPercentile is probably a bit more convenient than SelectKBest because if your dataset grows a percentage probably makes more sense than a hardcoded number of features.
Another example from the docs that does just that (and more).
Additional remarks:
You could use a TfidfVectorizer instead of a CountVectorizer followed by a TfidfTransformer. This is strictly equivalent.
You could use a pipeline object to pack the different steps of your classifier into a single object you can run cross validation on (I encourage you to read the docs, it's pretty useful).
from sklearn.feature_selection import chi2_sparse
from sklearn.feature_selection import SelectPercentile
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
pipeline = Pipeline(steps=[
("vectorizer", TfidfVectorizer(stop_words='English', min_df=3, max_df=0.90, ngram_range=(1,3))),
("selector", SelectPercentile(score_func=chi2, percentile=70)),
('NB', MultinomialNB(alpha=0.5))
])
Then you'd be able to run cross validation on the pipeline object to find the best combination of alpha and percentile, which is much harder to do with separate estimators.
Hope this helps, happy learning ;).

Resources