I have a wide model and want to incorporate a deep model, following the "Wide & Deep" model. Should I apply the FTRL optimizer to every node in the deep part? And how can I quickly modify the existing wide model to integrate the deep part?
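For context, this is the kind of setup I mean (a minimal sketch using TensorFlow's legacy estimator API; the feature names are placeholders):

    import tensorflow as tf

    # Wide part: sparse/crossed features, typically trained with FTRL
    wide_cols = [tf.feature_column.categorical_column_with_hash_bucket("query", 1000)]
    # Deep part: dense features fed through hidden layers
    deep_cols = [tf.feature_column.numeric_column("age")]

    model = tf.estimator.DNNLinearCombinedClassifier(
        linear_feature_columns=wide_cols,
        linear_optimizer="Ftrl",       # applies to the whole wide (linear) part
        dnn_feature_columns=deep_cols,
        dnn_optimizer="Adagrad",       # a single optimizer for the whole deep part, not per node
        dnn_hidden_units=[100, 50])

Note that in this API the optimizers are set per part (wide vs. deep), not per node; the Wide & Deep paper itself uses FTRL for the wide part and AdaGrad for the deep part.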
Related
I built a machine learning (ML) model to classify real-time network traffic as an attack or normal traffic using a dataset consisting of approximately 3 million records.
Then, I built a second ML model to classify the real-time network traffic according to its application (Google, Facebook, YouTube, etc.) using another dataset consisting of approximately 1.5 million records.
Now I want to cascade these two models so that if the traffic is normal, then the traffic should be classified by the second ML model. Otherwise, it should be discarded since there is no need to pass through the second model.
Can I cascade these two models even though they are built using different datasets? And if so, how can I do that?
I would do the cascading logic simply in application code (C++ or Python), not using ML-tool features. If the data used by the second model doesn't contribute to the decision of the first model, just keep the models separate.
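For example (a minimal sketch in Python; the model objects and label values are assumptions):

    def classify_traffic(record, attack_model, app_model):
        # First stage: attack vs. normal traffic
        if attack_model.predict([record])[0] == "attack":
            return "discarded"  # no need to run the second model
        # Second stage: application classification (Google, Facebook, YouTube, ...)
        # Note: each model may need its own feature extraction, since they
        # were trained on different datasets.
        return app_model.predict([record])[0]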
I am hoping to build a low-code model using Azure AutoML, which really just means going to the AutoML tab, running a classification experiment with my dataset, and, after it's done, deploying the best selected model.
The model kind of works (meaning I publish the endpoint, do some manual validation, and it seems accurate). However, I am not confident in it, because when I look at the explanation, I see something like this:
The top 4 features are not really the ones that matter. The most "important" one is really not the one I would prefer it to use; I am hoping it will use the Title feature more.
Is there a way to adjust the importance of individual features, such as ranking all features before the experiment starts?
I would love to do more reading, but I only found this:
Increase feature importance
The only answer seems to be about how to measure if a feature is important.
Hence, does it mean that if I want to customize the experiment, such as selecting which features to "focus" on, I should learn how to use the "designer" part of Azure ML? Or is it something I can't do even with the designer? I guess my confusion is that, ML being such a big topic, I am looking for a direction of learning for my situation, so I can improve my current model.
Here is a link to the documentation for featurization customization.
Using the SDK you can specify "featurization": 'auto' / 'off' / 'FeaturizationConfig' in your AutoMLConfig object. Learn more about enabling featurization.
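A minimal sketch with the Azure ML Python SDK (v1); the dataset and column names here are assumptions:

    from azureml.automl.core.featurization import FeaturizationConfig
    from azureml.train.automl import AutoMLConfig

    featurization_config = FeaturizationConfig()
    # Hint that "Title" should be treated as free text rather than categorical
    featurization_config.add_column_purpose("Title", "Text")

    automl_config = AutoMLConfig(
        task="classification",
        training_data=train_dataset,        # assumed: a registered TabularDataset
        label_column_name="label",          # assumed label column
        featurization=featurization_config)

Note that this steers how a column is featurized; as far as I know there is no direct knob that forces AutoML to rank one feature above another.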
Automated ML tries out different ML models with different settings that control for overfitting. It picks the best overfitting-parameter configuration based on the score (e.g. accuracy) it gets on hold-out data. The kinds of overfitting settings these models have include:
Explicitly penalizing overly-complex models in the loss function that the ML model is optimizing
Limiting model complexity before training, for example by limiting the size of trees in an ensemble tree learning model (e.g. gradient boosting trees or random forest)
https://learn.microsoft.com/en-us/azure/machine-learning/concept-manage-ml-pitfalls
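As a concrete illustration of both kinds of settings (a scikit-learn sketch, not the Azure internals; the parameter values are assumptions):

    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    # 1) Penalize overly-complex models in the loss function
    #    (L2 penalty; smaller C = stronger penalty)
    clf = LogisticRegression(penalty="l2", C=0.1)

    # 2) Limit model complexity before training
    #    (shallower trees in the ensemble)
    rf = RandomForestClassifier(n_estimators=100, max_depth=5)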
I am currently working on a KDD project aiming to build a predictor from very little real-world data.
The goal is to predict the quantity Y of an instance of a product while knowing other quantities of this instance.
There are predictors (same task) trained on similar (but not the same) products. Those models are valid for their use cases.
My approach is to use large datasets from other products (similar domain, similar task, but different distributions) and adapt them to the target domain using transfer learning.
At this point I am having trouble finding methods/algorithms that fit my needs.
Looking at the decision tree [1], it should be a domain adaptation problem.
What algorithm or model is suited for this kind of use case?
You can try the deep domain adaptation regression method from the paper "Representation Subspace Distance for Domain Adaptation Regression" (ICML 2021). It uses a labeled source domain and an unlabeled target domain to learn a model that performs well on the target domain.
Awesome Domain Adaptation Python Toolbox (ADAPT) is an open source library providing access to numerous models and algorithms to perform transfer learning and domain adaptation: https://adapt-python.github.io/adapt/index.html.
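For instance, an instance-reweighting baseline from ADAPT might look like this (a sketch only; Xs, ys, Xt are placeholders, and the exact constructor arguments may differ between ADAPT versions, so check the linked docs):

    from adapt.instance_based import KMM
    from sklearn.linear_model import Ridge

    # Xs, ys: labeled data from similar products (source domain)
    # Xt:     unlabeled data from the target product (target domain)
    model = KMM(estimator=Ridge(), Xt=Xt, kernel="rbf")
    model.fit(Xs, ys)        # reweights source samples to better match the target distribution
    y_pred = model.predict(Xt)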
I'm doing feature selection to train my machine learning (ML) models using correlation. I trained each model (SVM, NN, RF) with all features and did 10-fold cross-validation to obtain the mean accuracy score.
Then I removed the features that have a zero correlation coefficient (which implies there is no relationship between the feature and the class), trained each model (SVM, NN, RF) with the remaining features, and did 10-fold cross-validation to obtain the mean accuracy score.
Basically, my objective is to do feature selection based on the accuracy scores I get in the above two scenarios. But I'm not sure whether this is a good approach for feature selection.
Also, I want to do a grid search to identify the best model parameters, but I'm getting confused with GridSearchCV in the scikit-learn API. Since it also does cross-validation (3 folds by default), can I use the best_score_ value obtained from a grid search in the above two scenarios to determine which features are good for model training?
Please advise me on this confusion, or suggest a good reference to read.
Thanks in advance
As page 51 of this thesis says,
In other words, a feature is useful if it is correlated with or
predictive of the class; otherwise it is irrelevant.
The report goes on to say that not only should you remove the features that are not correlated with the targets, you should also watch out for features that correlate heavily with each other. Also see this.
In other words, it seems to be a good thing to look at correlation of features with the classes (targets) and remove the features that have little to no correlation.
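A minimal sketch of that correlation filter with pandas (the file name, column names, and threshold are assumptions):

    import pandas as pd

    df = pd.read_csv("data.csv")                 # assumed: numeric features plus a "class" column
    corr = df.corr()["class"].drop("class")      # correlation of each feature with the target
    keep = corr[corr.abs() > 0.05].index         # drop features with (near-)zero correlation
    X_reduced = df[keep]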
Basically, my objective is to do feature selection based on the accuracy scores I get in the above two scenarios. But I'm not sure whether this is a good approach for feature selection.
Yes, you can totally run experiments with different feature sets and look at the test accuracy to select the features that perform best. It's really important that you only look at the test accuracy, i.e., the performance of the model on unseen data.
Also, I want to do a grid search to identify the best model parameters.
Grid search is performed for finding the best hyper-parameters. Model parameters are learned during training.
Since it also does cross-validation (3 folds by default), can I use the best_score_ value obtained from a grid search in the above two scenarios to determine which features are good for model training?
If the set of hyper-parameters is fixed, the best_score_ value will be affected only by the feature set, and thus it can be used to compare the effectiveness of the features.
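For example (a scikit-learn sketch; X_all, X_reduced, and y are assumed to be your two feature sets and the labels):

    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}

    for name, X in [("all features", X_all), ("selected features", X_reduced)]:
        search = GridSearchCV(SVC(), param_grid, cv=10)  # same grid and folds for both runs
        search.fit(X, y)
        print(name, search.best_score_, search.best_params_)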
I have recently been looking into incorporating Core ML, the machine learning release for iOS developers, into my app. Since this is my first time using anything ML-related, I was very lost when I started reading the descriptions of the different models that Apple has made available. They have the same purpose/description, the only difference being the actual file size. What is the difference between these models, and how would you know which one is the best fit?
The models Apple makes available are just for simple demo purposes. Most of the time, these models are not sufficient for use in your own app.
The models on Apple's download page are trained for a very specific purpose: image classification on the ImageNet dataset. This means they can take an image and tell you what the "main" object is in the image, but only if it's one of the 1,000 categories from the ImageNet dataset.
Usually, this is not what you want to do in your own apps. If your app wants to do image classification, typically you want to train a model on your own categories (like food or cars or whatever). In that case you can take something like Inception-v3 (the original, not the Core ML version) and re-train it on your own data. That gives you a new model, which you then need to convert to Core ML again.
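A minimal sketch of that final conversion step with Apple's coremltools (the paths and source format are assumptions):

    import coremltools as ct

    # Assumes the re-trained Inception-v3 was exported as a TensorFlow SavedModel
    mlmodel = ct.convert("retrained_inception_v3_savedmodel")
    mlmodel.save("MyClassifier.mlmodel")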
If your app wants to do something other than image classification, you can use these pretrained models as "feature extractors" in a larger neural network structure. But again this involves training your own model (usually from scratch) and then converting the result to Core ML.
So only in a very specific use case -- image classification using the 1,000 ImageNet categories -- are these Apple-provided models useful to your app.
If you do want to use any of these models, the difference between them is speed vs. accuracy. The smaller models are fastest but also least accurate. (In my opinion, VGG16 shouldn't be used on mobile. It's just too big and it's no more accurate than Inception or even MobileNet.)
SqueezeNets are fully convolutional and use Fire modules, which have a squeeze layer of 1x1 convolutions that vastly decreases the parameter count by restricting the number of input channels to each layer. This makes SqueezeNets extremely low-latency, in addition to the fact that they don't have dense layers.
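For illustration, a minimal sketch of a Fire module in Keras (the filter counts are assumptions):

    from tensorflow.keras import layers

    def fire_module(x, squeeze=16, expand=64):
        # Squeeze: 1x1 convolutions restrict the channels fed to the expand layers
        s = layers.Conv2D(squeeze, 1, activation="relu")(x)
        # Expand: parallel 1x1 and 3x3 convolutions, concatenated
        e1 = layers.Conv2D(expand, 1, activation="relu")(s)
        e3 = layers.Conv2D(expand, 3, padding="same", activation="relu")(s)
        return layers.Concatenate()([e1, e3])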
MobileNets utilise depth-wise separable convolutions, very similar to the Inception towers in Inception. These also reduce the number of parameters and hence latency. MobileNets also have useful model-shrinking hyper-parameters that you can set before training to make the model exactly the size you want. The Keras implementation can use ImageNet pre-trained weights too.
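Those shrinking knobs look like this in Keras (the values here are just examples):

    from tensorflow.keras.applications import MobileNet

    model = MobileNet(alpha=0.5,                 # width multiplier: 50% of the channels per layer
                      input_shape=(128, 128, 3), # smaller input resolution
                      weights="imagenet")        # ImageNet pre-trained weights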
The other models are very deep, large models. There, the reduced parameter count / style of convolution is used not for low latency but essentially for the ability to train very deep models. ResNet introduced residual connections between layers, which were originally believed to be key to training very deep models; these aren't seen in the previously mentioned low-latency models.