For an ML library, what would be the arguments in favour of using a new, specific function name for the inverse prediction of a model (e.g. a zero-mean/unit-variance scaler), something like inversePrediction(mod,Xnew), and what would be the arguments in favour of just adding a keyword argument to the already employed predict function, something like predict(mod,Xnew;inv=true)?
Some context:
I use only predict(mod) (and, where applicable, predict(mod,Xnew) for models that generalise to new data) for unsupervised models and so-called transformers, without distinguishing between them
I use camel case
MLJ and scikit-learn use inverse_transform
I care more about user-friendliness than performance
We all hear GPT-3 being called a large language model (LLM), but is it really more of a framework, since you can use GPT-3 with your own dataset to train your own version of a GPT-3 model?
My understanding is that a model is the result of training, and you can use one of many frameworks/libraries to train the model (e.g. TensorFlow). If GPT-3 were just a model, you wouldn't be able to train it on your own data, right? So does that make GPT-3 a framework?
Can anyone help me to better understand the AI terminology for this?
The terminology used is model.
A model, in the LLM sense, is defined as a mathematical representation of language which is used to make predictions based on probabilities. Basically, GPT was trained by turning words (tokens) into mathematical representations. In most cases each word is represented by an array of roughly 1,500 features (known in machine learning as a vector).
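As a toy illustration of the idea (this is not GPT's actual code; the vocabulary, dimensionality, and random values are made up), mapping tokens to vectors can be sketched like this:

```python
# Toy sketch: each token is looked up in an embedding table and becomes a
# dense vector of features. GPT-scale models use far larger vocabularies
# and on the order of ~1,500+ dimensions per token.
import numpy as np

vocab = {"the": 0, "cat": 1, "sat": 2}   # made-up vocabulary
embedding_dim = 8                         # made-up, kept small for readability
embeddings = np.random.rand(len(vocab), embedding_dim)

def embed(word):
    # return the vector (mathematical representation) of a token
    return embeddings[vocab[word]]

print(embed("cat"))
```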
In the case of GPT-3, the latest model, 'text-davinci-003', uses probability to predict the response it gives, based on the training data it was provided.
With GPT-3 you can fine-tune the model to perform actions it hasn't been trained on before. It is still referred to as a model even though you can fine-tune it.
I'm trying to create an ensemble of a given regressor. With this in mind, I've searched for a way to use the already existing sklearn ensemble methods and change the base estimator of the ensemble. The bagging documentation is clear, because it says that you can change the base estimator by passing your regressor to the "base_estimator" parameter, but with GradientBoosting you can pass a regressor in the "init" parameter.
My question is: will passing my regressor in the init parameter of GradientBoosting make it use the regressor I've specified as the base estimator instead of trees? The documentation says that the init value must be "An estimator object that is used to compute the initial predictions", so I don't know whether the estimator I pass in init will in fact be the weak learner enhanced by the boosting method, or whether it will just be used at the beginning and, after that, all the work is done by decision trees.
No.
GradientBoostingRegressor can only use regression trees as base estimators; from the docs (emphasis mine):
In each stage a regression tree is fit
And as pointed out in a relevant Github thread (HT to Ben Reiniger for pointing this out in the comment below):
the implementation is entirely tied to the assumption that the base estimators are trees
In order to boost arbitrary base regressors (similar to bagging), you need AdaBoostRegressor, which, again similarly to bagging, also takes a base_estimator argument. But before doing so, you may want to have a look at my own answer in Execution time of AdaBoost with SVM base classifier; quoting:
Adaboost (and similar ensemble methods) were conceived using decision trees as base classifiers (more specifically, decision stumps, i.e. DTs with a depth of only 1); there is good reason why still today, if you don't specify explicitly the base_classifier argument, it assumes a value of DecisionTreeClassifier(max_depth=1). DTs are suitable for such ensembling because they are essentially unstable classifiers, which is not the case with SVMs, hence the latter are not expected to offer much when used as base classifiers.
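To make the contrast concrete, here is a minimal sketch using Ridge as an arbitrary base regressor (note that recent scikit-learn releases have renamed base_estimator to estimator):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# AdaBoost really boosts the regressor you pass in
# (use estimator=... instead of base_estimator=... on scikit-learn >= 1.2):
ada = AdaBoostRegressor(base_estimator=Ridge(), n_estimators=50).fit(X, y)

# GradientBoosting only uses `init` for the initial prediction;
# every subsequent boosting stage is still a regression tree:
gbr = GradientBoostingRegressor(init=Ridge(), n_estimators=50).fit(X, y)
```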
What is the difference between ALS.train(), ALS.fit(), and ALS.trainImplicit()?
First of all, we should know the difference between implicit and explicit feedback.
Explicit preference (also referred to as "explicit feedback"), such as ratings given to items by users.
Implicit preference (also referred to as "implicit feedback"), such as "view" and "buy" history.
For a better understanding, you can look at the two links below:
Why does ALS.trainImplicit give better predictions for explicit ratings?
https://stats.stackexchange.com/questions/133565/how-to-set-preferences-for-als-implicit-feedback-in-collaborative-filtering
train and trainImplicit belong to the mllib package, which works on RDD data. For Spark DataFrames, Spark has a newer module named ml. The ml package uses Spark DataFrames for computing ratings, and the method name is fit. The fit method in ml uses matrix factorization. For more detail, check the docs for the ALS (ml) class.
https://github.com/apache/spark/blob/926e3a1efe9e142804fcbf52146b22700640ae1b/python/pyspark/ml/recommendation.py
Also, the ml module is faster than mllib.
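A minimal sketch of the two APIs side by side (the tiny toy ratings and column names below are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.mllib.recommendation import ALS as MLlibALS, Rating
from pyspark.ml.recommendation import ALS as MlALS

spark = SparkSession.builder.appName("als-demo").getOrCreate()
sc = spark.sparkContext

# mllib (RDD-based): separate methods for explicit vs implicit feedback
ratings_rdd = sc.parallelize([Rating(0, 0, 4.0), Rating(0, 1, 2.0), Rating(1, 1, 5.0)])
explicit_model = MLlibALS.train(ratings_rdd, rank=10, iterations=10)
implicit_model = MLlibALS.trainImplicit(ratings_rdd, rank=10, iterations=10)

# ml (DataFrame-based): a single ALS estimator; implicit feedback is just a flag
ratings_df = spark.createDataFrame(
    [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 5.0)], ["user", "item", "rating"])
model = MlALS(userCol="user", itemCol="item", ratingCol="rating",
              implicitPrefs=False).fit(ratings_df)
```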
What's the difference between Spark ML and MLLIB packages
Is the classic Q-learning algorithm, using a lookup table (instead of function approximation), equivalent to dynamic programming?
From Sutton & Barto's book (Reinforcement Learning: An Introduction, chapter 4)
The term dynamic programming (DP) refers to a collection of algorithms that can be used to compute optimal policies given a perfect model of the environment as a Markov decision process (MDP). Classical DP algorithms are of limited utility in reinforcement learning both because of their assumption of a perfect model and because of their great computational expense, but they are still important theoretically.
So, although both share the same working principles (whether using tabular Reinforcement Learning/Dynamic Programming or approximate RL/DP), the key difference between classic DP and classic RL is that the former assumes the model is known. This basically means knowing the transition probabilities (which indicate the probability of moving from state s to state s' given action a) and the expected immediate reward function.
On the contrary, RL methods only require access to a set of samples, either collected online or offline (depending on the algorithm).
Of course, there are hybrid methods that can be placed between RL and DP, for example those that learn a model from the samples and then use that model in the learning process.
NOTE: The term Dynamic Programming, in addition to denoting a set of mathematical optimization techniques related to RL, is also used to refer to a "general algorithmic pattern", as pointed out in the comments. In both cases the fundamentals are the same, but the term may have a different meaning depending on the context.
Dynamic Programming is an umbrella term encompassing many algorithms. Q-Learning is a specific algorithm. So, no, it is not the same.
Also, if you mean Dynamic Programming as in Value Iteration or Policy Iteration, it is still not the same. These algorithms are "planning" methods: you have to give them a transition function and a reward function, and they will iteratively compute a value function and an optimal policy.
Q-Learning is a model-free reinforcement learning method. It is "model-free" not because it doesn't use a machine learning model or anything like that, but because it doesn't require, and doesn't use, a model of the environment (the MDP's transition and reward functions) to obtain an optimal policy. There are also "model-based" methods. These, unlike Dynamic Programming methods, are based on learning a model, not simply using one. And, unlike model-free methods, they don't throw away samples after estimating values; instead, they try to reconstruct the transition and reward functions to get better performance.
Model-based methods combine model-free and planning algorithms to get the same good results with fewer samples than required by model-free methods (Q-Learning), and without needing a model like Dynamic Programming methods (Value/Policy Iteration) do.
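To make the distinction concrete, here is a toy sketch on a small, randomly generated MDP (everything below is made up for illustration): value iteration needs the full transition tensor P and reward table R, while the tabular Q-learning update only needs individual sampled transitions.

```python
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# --- Value iteration (dynamic programming): requires the full model P and R ---
P = np.random.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = np.random.rand(n_states, n_actions)                                 # expected reward
V = np.zeros(n_states)
for _ in range(100):
    V = np.max(R + gamma * (P @ V), axis=1)   # Bellman optimality backup over all states

# --- Tabular Q-learning (model-free): only needs sampled transitions (s, a, r, s') ---
Q = np.zeros((n_states, n_actions))
alpha = 0.1

def q_update(s, a, r, s_next):
    # one update from a single observed transition; no access to P or R
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
```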
If you use Q-learning in an offline setup, like AlphaGo, for example, then it is equivalent to dynamic programming. The difference is that it can also be used in an online setup.
What's the best way to use nominal values, as opposed to real or boolean ones, in a feature vector for machine learning?
Should I map each nominal value to a real value?
For example, if I want my program to learn a predictive model for a web service's users, the input features may include
{ gender (boolean), age (real), job (nominal) }
where the dependent variable may be the number of website logins.
The variable job may be one of
{ PROGRAMMER, ARTIST, CIVIL SERVANT... }.
Should I map PROGRAMMER to 0, ARTIST to 1, and so on?
Do a one-hot encoding, if anything.
If your data has categorical attributes, it is recommended to use an algorithm that can deal with such data well without the hack of encoding, e.g. decision trees and random forests.
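For completeness, here is a minimal sketch of one-hot encoding the job variable from the question (using pandas; the column names are the hypothetical ones above):

```python
import pandas as pd

df = pd.DataFrame({
    "gender": [0, 1, 0],
    "age": [31, 45, 27],
    "job": ["PROGRAMMER", "ARTIST", "CIVIL SERVANT"],
})

# Each job level becomes its own 0/1 column instead of an arbitrary integer code.
encoded = pd.get_dummies(df, columns=["job"])
print(encoded)
```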
If you read the book called "Machine Learning with Spark", the author wrote:
Categorical features
Categorical features cannot be used as input in their raw form, as they are not numbers; instead, they are members of a set of possible values that the variable can take. In the example mentioned earlier, user occupation is a categorical variable that can take the value of student, programmer, and so on.
[...]
To transform categorical variables into a numerical representation, we can use a common approach known as 1-of-k encoding. An approach such as 1-of-k encoding is required to represent nominal variables in a way that makes sense for machine learning tasks. Ordinal variables might be used in their raw form but are often encoded in the same way as nominal variables.
[...]
I had exactly the same thought.
I think that if there is a meaningful (well-designed) transformation function that maps categorical (nominal) values to real values, I may also use learning algorithms that only take numerical vectors.
Actually, I've done some projects where I had to do it that way, and no issue was raised concerning the performance of the learning system.