CoreML on-device model training with tabular data - ios

I'm trying to build an app that makes suggestions (distinct classes) based on a table with 4 features: latitude, longitude, time and weekday.
The training data of my app is 100% personal, so it doesn't really make sense to pre-train the model. I wanna be able to train on device. I know CoreML 3 supports updating for neural networks and kNN classifiers, but does this really help me with my tabular data?
Other tabular classifiers like boosted trees, random forests... can't be trained on device unfortunately. Are there alternatives to CoreML for on-device training of those simpler machine learning algorithms? Or can CoreML somehow already do what I want?
Unfortunately I'm not really an expert in neural networks.

Just because Core ML doesn't provide something, doesn't mean it's impossible. :-) You can use existing libraries or implement the algorithm by yourself.
If you're looking to build a logistic regression classifier, this is fairly easy to implement by hand. (You can even use a neural network with a single layer for this and still use Core ML.)
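To give an idea of how little code "by hand" means here, this is a minimal sketch of a binary logistic regression that learns one example at a time. It's in Python for brevity rather than Swift, the class name and the toy feature values are made up for illustration, and it assumes the four features have already been scaled to roughly 0..1. For several distinct classes you could train one such model per class (one-vs-rest).

```python
import math


class OnlineLogisticRegression:
    """Binary logistic regression trained one example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        """One gradient step on the log loss for a single example (y is 0 or 1)."""
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


# Hypothetical, already-scaled rows: [latitude, longitude, time of day, weekday]
examples = [
    ([0.1, 0.2, 0.30, 0.0], 0),
    ([0.2, 0.1, 0.35, 0.2], 0),
    ([0.9, 0.8, 0.80, 0.8], 1),
    ([0.8, 0.9, 0.75, 1.0], 1),
]

model = OnlineLogisticRegression(n_features=4, learning_rate=0.5)
for _ in range(200):
    for features, label in examples:
        model.update(features, label)
```

Because `update` only needs one row at a time, the same loop can run on the device whenever the user produces a new labeled example.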

Related

Phishing Website Detection using Machine Learning

I have a semester project where I have to detect phishing websites using ML. I have been using a support vector binary classifier, trained on an existing dataset, to predict whether a website is legitimate or not. The problem is that SVMs are computationally expensive to train and are sensitive to noisy data, so there is a high chance of overfitting. Is there any other classification model which would help to optimize my model?
I did a similar project in my engineering days; I used a Naive Bayes (NB) classifier.

classification and data mining. Difference?

I have been working in the area of machine learning and pattern recognition for the last 8-10 years. I use it for image classification and recognition. Recently, I started learning about data mining. How much is data mining related to classification? Or can I, as a person with experience in image classification, work on data mining?
Classification is one of many machine learning techniques used in data mining. But usually, you'd simply use the more precise "machine learning" category for classification.
Data mining is the explorative side - you want to understand the data. That can mean learning to predict, but mostly to understand what can be predicted (and what not) and how (which features etc.).
In many cases, classification is used in a way that I'd not include in data mining. If you just want to recognize images as cars or not (but don't care about the "why") it's probably not data mining.

When to use supervised or unsupervised learning?

What are the fundamental criteria for using supervised or unsupervised learning?
When is one better than the other?
Are there specific cases when you can only use one of them?
Thanks
1. If you have a labeled dataset you can use both. If you have no labels you can only use unsupervised learning.
2. It's not a question of "better". It's a question of what you want to achieve. E.g. clustering data is usually unsupervised: you want the algorithm to tell you how your data is structured. Categorizing is supervised, since you need to teach your algorithm what is what in order to make predictions on unseen data.
3. See 1.
On a side note: These are very broad questions. I suggest you familiarize yourself with some ML foundations.
Good podcast for example here: http://ocdevel.com/podcasts/machine-learning
Very good book / notebooks by Jake VanderPlas: http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/Index.ipynb
Depends on your needs. If you have a set of existing data including the target values that you wish to predict (labels) then you probably need supervised learning (e.g. is something true or false; or does this data represent a fish, a cat or a dog? Simply put - you already have examples of right answers and you are just telling the algorithm what to predict). You also need to distinguish whether you need classification or regression. Classification is when you need to categorize the predicted values into given classes (e.g. is it likely that this person will develop diabetes - yes or no? In other words - discrete values) and regression is when you need to predict continuous values (1, 2, 4.56, 12.99, 23 etc.). There are many supervised learning algorithms to choose from (k-nearest neighbors, naive Bayes, SVM, ridge..)
Conversely, use unsupervised learning if you don't have the labels (or target values). You're simply trying to identify the clusters in the data as they come (e.g. with k-means, DBSCAN, spectral clustering..)
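To make the difference concrete, here is a small sketch on the same toy points: a 1-nearest-neighbor classifier that needs the labels, and a k-means clustering that never sees them. The points, the labels and the function names are invented for illustration, and the starting centroids are picked by hand to keep the run deterministic.

```python
def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_neighbor_predict(train_points, train_labels, query):
    """Supervised: return the label of the closest labeled point."""
    best = min(range(len(train_points)),
               key=lambda i: squared_distance(train_points[i], query))
    return train_labels[best]


def k_means(points, centroids, iterations=10):
    """Unsupervised: group points around centroids; no labels involved."""
    assignments = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignments = [min(range(len(centroids)),
                           key=lambda c: squared_distance(p, centroids[c]))
                       for p in points]
        # Move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        centroids = new_centroids
    return assignments, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]  # only the supervised part uses these

predicted = nearest_neighbor_predict(points, labels, (1.1, 0.9))
clusters, centers = k_means(points, centroids=[points[0], points[-1]])
```

The classifier answers with one of the names it was taught; the clustering only returns group indices, and it is up to you to interpret what each group means.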
So it depends and there's no exact answer but generally speaking you need to:
1. Collect and inspect your data. You need to know your data and only then decide which way to go or which algorithm will best suit your needs.
2. Train your algorithm. Be sure your data is clean and of good quality. Bear in mind that with unsupervised learning you can skip this step, since you don't have target values; you run your algorithm on the data right away.
3. Test your algorithm. Run it and see how well it behaves. With supervised learning you can hold back part of your labeled data to evaluate how well your algorithm is doing.
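As a rough sketch of the train and test steps for the supervised case: split the labeled rows, fit on one part, and measure accuracy on the part the model has not seen. The one-feature "nearest class mean" model and the toy rows below are placeholders chosen to keep the example short, not a recommendation.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_class_means(train_rows):
    """'Training': remember the mean feature value of each class."""
    sums, counts = {}, {}
    for value, label in train_rows:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(class_means, value):
    """Pick the class whose mean is closest to the value."""
    return min(class_means, key=lambda label: abs(class_means[label] - value))


def accuracy(class_means, rows):
    correct = sum(1 for value, label in rows
                  if predict(class_means, value) == label)
    return correct / len(rows)


# Toy labeled data: one numeric feature and two well-separated classes.
rows = ([(float(v), "low") for v in range(0, 10)]
        + [(float(v), "high") for v in range(20, 30)])

train_rows, test_rows = train_test_split(rows)
class_means = fit_class_means(train_rows)
test_accuracy = accuracy(class_means, test_rows)
```

Scoring on rows the model was fitted on would look better than it should, which is why the test rows are kept apart.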
There are many books online about machine learning and many online lectures on the topic as well.
Depends on the data set that you have.
If you have a target feature then you should go for supervised learning. If you don't, then it is an unsupervised problem.
Supervised learning is like teaching the model with examples. Unsupervised learning is mainly used to group similar data, and it plays a major role in feature engineering.
Thank you..

How can we train and test a neural network with UNB ISCX benchmark dataset?

I have tried the KDD dataset on my neural net and now I want to extend this to the ISCX dataset. Part of this dataset contains labelled HTTP DoS attacks that replicate real-time network traffic, but I couldn't figure out how to convert them into numeric neural network inputs to train and test my neural net, which would classify these intrusion vectors.
Any pointers would be appreciated.
I haven't worked with this data set, but if you have sufficient information about the features and the values of each feature, you can create an .arff file quickly and then use WEKA easily.
You can use many applications for this, but user-friendly ones such as the WEKA GUI can work with discrete and non-numerical features very easily, and can help you start working with your data set as fast as possible.
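If you prefer to do the conversion yourself instead of going through WEKA, a common approach is to one-hot encode the non-numerical fields and scale the numerical ones to 0..1. A minimal sketch follows; the field names (`protocol`, `duration`, `bytes`) and values are made up for illustration and are not the actual ISCX columns.

```python
def one_hot(value, categories):
    """Turn a categorical value into a 0/1 vector, one slot per category."""
    return [1.0 if value == category else 0.0 for category in categories]


def min_max_scale(value, minimum, maximum):
    """Scale a number into the 0..1 range."""
    if maximum == minimum:
        return 0.0
    return (value - minimum) / (maximum - minimum)


def encode_records(records, categorical_fields, numeric_fields):
    # Collect the categories and value ranges seen in the data.
    categories = {f: sorted({r[f] for r in records}) for f in categorical_fields}
    ranges = {f: (min(r[f] for r in records), max(r[f] for r in records))
              for f in numeric_fields}
    vectors = []
    for record in records:
        vector = []
        for f in categorical_fields:
            vector.extend(one_hot(record[f], categories[f]))
        for f in numeric_fields:
            low, high = ranges[f]
            vector.append(min_max_scale(record[f], low, high))
        vectors.append(vector)
    return vectors


# Hypothetical flow records; the real dataset has different fields.
flows = [
    {"protocol": "tcp", "duration": 2.0, "bytes": 100},
    {"protocol": "udp", "duration": 4.0, "bytes": 300},
    {"protocol": "tcp", "duration": 6.0, "bytes": 500},
]
inputs = encode_records(flows, ["protocol"], ["duration", "bytes"])
```

Each record becomes a fixed-length list of floats that can be fed to the input layer. In a real setup you would compute the categories and ranges on the training split only and reuse them for the test split.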

Using Weka for Game Playing

I am doing a project where I have neural networks (or other algorithms) play each other in poker. After each win or loss, I want the neural network (or other algorithm) to update in response to the error of the loss (how this is calculated is unimportant here).
Weka is very nice and I don't want to reinvent the wheel. However, Weka's API seems primarily designed to train from a dataset. Game playing doesn't use a dataset. Rather, the network plays, and then I want it to update itself based on its loss.
Is it possible to use the Weka API to update a network not on a dataset but on a single instance, and to do this over and over again? Am I thinking about this right?
The other idea I want to implement is to use a genetic algorithm to update the weights in a neural network, instead of the backpropagation algorithm. As far as I can tell, there is no way to manually specify the weights of a neural network in Weka. This, of course, is vital if using a genetic algorithm for this purpose.
Please help :) Thank you.
Normally Weka learning algorithms are batch learning algorithms. What you need is an incremental classifier.
From the Weka docs:
Most classifiers need to see all the data before they can be trained, e.g., J48 or SMO. But there are also schemes that can be trained in an incremental fashion, not just in batch mode. All classifiers implementing the weka.classifiers.UpdateableClassifier interface are able to process data in such a way.
See the UpdateableClassifier interface documentation to find out which classifiers implement it.
You may also look at MOA (Massive Online Analysis), a tool closely related to Weka; all of its classifiers are incremental due to the constraints of online learning.
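To show what "incremental" means in practice, here is a small language-neutral sketch in Python (not Weka code): a perceptron whose `update` method takes a single instance, roughly the role that `updateClassifier` plays in the UpdateableClassifier interface. The class, the feature vectors and the +1/-1 outcome encoding are all assumptions for illustration.

```python
class OnlinePerceptron:
    """A classifier that learns from one instance at a time."""

    def __init__(self, n_features, learning_rate=1.0):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        activation = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1 if activation > 0 else -1

    def update(self, x, y):
        """Adjust the weights only if this single instance was misclassified."""
        if self.predict(x) == y:
            return False
        self.weights = [w + self.learning_rate * y * xi
                        for w, xi in zip(self.weights, x)]
        self.bias += self.learning_rate * y
        return True


# Hypothetical stream of (features, outcome) pairs, e.g. one per finished hand.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1), ([0.9, 0.1], 1), ([0.2, 0.8], -1)]

model = OnlinePerceptron(n_features=2)
for _ in range(10):
    for features, outcome in stream:
        model.update(features, outcome)
```

There is no dataset object anywhere: the model is created once and then nudged after every new instance, which is the pattern a game-playing loop needs.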
Weka, as far as I can tell, does not do online learning (which is what you're asking about).
It might be better to investigate using competitive analysis for your game.
You may have to reinvent the wheel here. I don't think it's a bad use of time.
I'm currently implementing a learning classifier system, which is pretty simple. I'd also advise looking into these kinds of algorithms. There is an implementation on the internet, but I still prefer to code my own.
