CoreML multiple input / multiple output classifier - iOS

After searching questions on SO and Reddit, I can't figure out how to train a multiple-input, multiple-output classifier with an ML text classifier. I can train a single-input, single-output text classifier, but that doesn't fit my use case.
Any help would be appreciated. I understand that there's no code to post and that this is sort of a "show me how" question, but this information doesn't seem readily available via searching or elsewhere, and it would be beneficial to the community.

The classifier objects provided by Core ML (and Create ML) are for very specific use cases. If you try to do anything more advanced than that, you'll have to create a custom model, such as your own neural network.
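For illustration, here is a minimal sketch of what such a custom model could look like, written in Keras rather than Create ML (a trained model along these lines could then be converted for iOS with coremltools); all input names, sizes, and label counts are assumptions:

```python
# A minimal sketch of a custom multi-input, multi-output classifier in Keras.
# All shapes, vocabulary sizes, and label counts are illustrative assumptions;
# a trained model along these lines could be converted with coremltools.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, SEQ_LEN = 10000, 64  # assumed vocabulary size and sequence length

# Two separate tokenized-text inputs.
title_in = keras.Input(shape=(SEQ_LEN,), name="title_tokens")
body_in = keras.Input(shape=(SEQ_LEN,), name="body_tokens")

# Shared embedding; pool each input down to a fixed-size vector.
embed = layers.Embedding(VOCAB, 32)
title_vec = layers.GlobalAveragePooling1D()(embed(title_in))
body_vec = layers.GlobalAveragePooling1D()(embed(body_in))
hidden = layers.Dense(64, activation="relu")(layers.concatenate([title_vec, body_vec]))

# Two classifier heads, e.g. a 5-way topic label and a binary sentiment label.
topic_out = layers.Dense(5, activation="softmax", name="topic")(hidden)
sentiment_out = layers.Dense(2, activation="softmax", name="sentiment")(hidden)

model = keras.Model([title_in, body_in], [topic_out, sentiment_out])
model.compile(optimizer="adam",
              loss={"topic": "sparse_categorical_crossentropy",
                    "sentiment": "sparse_categorical_crossentropy"})

# Dummy data just to show the training call shape.
x = [np.random.randint(VOCAB, size=(8, SEQ_LEN))] * 2
y = {"topic": np.random.randint(5, size=8), "sentiment": np.random.randint(2, size=8)}
model.fit(x, y, epochs=1, verbose=0)
```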

Related

What is the term for using a Neural Network to create new data based on training data?

I have a large set of training data consisting of various texts, which should be the input to my neural network. I have no output, or rather I don't know what to use as output.
Anyway, after the learning phase I want the neural network to create new texts based on the training data.
I have read about projects along the lines of "I made a bot watch 1000 hours of xy and asked it to write a new xy".
Now my question is: what kind of machine learning is this? I am not looking for instructions on how to write it, just a hint on how to find some keywords or tutorials. My Google searches so far have been useless.
Your problem can usually be solved by an encoder-decoder architecture. Such a model learns a set of latent vectors from your input and then decodes them into whatever output form you want. It can be built with RNNs, LSTMs, or CNNs; nowadays, attention-based models such as transformers are more common in large-scale systems. If you want to do text generation, you can also start by reading about Generative Adversarial Networks (GANs).
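As a concrete starting point, here is a minimal sketch of text generation with an LSTM in Keras, trained to predict the next character and then sampled to produce new text; the tiny corpus and all sizes are placeholders, not a recipe:

```python
# A minimal sketch of text generation with an LSTM in Keras: train on
# next-character prediction, then sample repeatedly to produce new text.
# The tiny corpus and all sizes are placeholder assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

corpus = "your training texts would go here " * 20  # placeholder corpus
chars = sorted(set(corpus))
idx = {c: i for i, c in enumerate(chars)}
SEQ = 10

# Build (input sequence, next character) training pairs.
X = np.array([[idx[c] for c in corpus[i:i + SEQ]] for i in range(len(corpus) - SEQ)])
y = np.array([idx[corpus[i + SEQ]] for i in range(len(corpus) - SEQ)])

model = keras.Sequential([
    layers.Embedding(len(chars), 16),
    layers.LSTM(64),
    layers.Dense(len(chars), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(X, y, epochs=5, verbose=0)

# Generate: repeatedly sample the next character from the predicted distribution.
seed = list(corpus[:SEQ])
for _ in range(50):
    probs = model.predict(np.array([[idx[c] for c in seed[-SEQ:]]]), verbose=0)[0]
    probs = probs / probs.sum()  # guard against floating-point drift
    seed.append(chars[np.random.choice(len(chars), p=probs)])
print("".join(seed))
```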

Choosing parameters to create a good deep neural network

In neural networks we have parameters like momentum, learning rate, activation function, etc. My question is: what parameters should I choose in order to create a good deep neural network? Also, are there any criteria on which to base the choice?
This is a bit of a loaded question, and I am not sure the format of your question is right for Stack Overflow, as it is not necessarily a coding question.
However, choosing hyper-parameters is one of the biggest challenges in all of machine learning. There is no universal answer that tells you "X and Y will give you better results than Z and W" because of how many factors go into the question: what kind of modeling you are attempting, what your objective is, what your data looks like, and so on.
The first thing I would suggest follows from my questions above: what is your objective? Is it classification, regression, or something else? Once you can answer that, you probably need to determine your loss function. This can itself be challenging, but here is a link that gives a good overview of different loss functions and their usage:
https://machinelearningmastery.com/how-to-choose-loss-functions-when-training-deep-learning-neural-networks/
Secondly, determining the other parameters (batch size, learning rate, optimizer, and so on) is a whole different challenge, and there is too much involved to ever give a definitive answer. One common technique is grid search, where you essentially train your model in a loop over all the parameter combinations you want to test. Keras Tuner is one tool that can automate this, as sketched below.
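For example, a minimal sketch with Keras Tuner might look like this (the dummy data, search ranges, and metric are placeholder assumptions):

```python
# A minimal sketch of hyper-parameter search with Keras Tuner (the
# keras-tuner package). The dummy data, ranges, and metric are assumptions.
import numpy as np
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    # The tuner calls this repeatedly with different hyper-parameter values.
    model = keras.Sequential([
        layers.Dense(hp.Int("units", 32, 256, step=32), activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    lr = hp.Choice("learning_rate", [1e-2, 1e-3, 1e-4])
    model.compile(optimizer=keras.optimizers.Adam(lr),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

x, y = np.random.rand(200, 8), np.random.randint(2, size=200)  # dummy data

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=5, overwrite=True)
tuner.search(x, y, validation_split=0.2, epochs=3, verbose=0)
print(tuner.get_best_hyperparameters(1)[0].values)
```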
Overall, I think you should do some further research into the topic of deep learning. Here is a helpful link that can provide some insight:
https://www.deeplearning.ai/ai-notes/optimization/

Simple machine learning for website classification

I am trying to write a Python program that determines whether a website is harmful (porn, etc.).
First, I made a Python web-scraping program that counts the number of occurrences of each word.
For harmful websites, the result is a key-value dictionary like
{ word : [ # occurrences in harmful websites, # of websites that contain this word ] }.
Now I want my program to analyze the words from any website to check whether the website is safe or not, but I don't know which methods will suit my data.
The key thing here is your training data. You need some sort of supervised learning technique where your training data consists of each website's text (as a document) and its label (harmful or safe).
You could certainly use an RNN, but there are also other, much faster natural language processing techniques.
Typically, you should apply a proper vectorizer to your training data (think of each site's pages as a text document), for example tf-idf (there are other possibilities too; if you use Python, I strongly suggest scikit-learn, which provides lots of useful machine learning techniques, and the mentioned TfidfVectorizer is already included). The point is to vectorize your text documents in a way that accounts for biases: imagine, for example, how many times the English word "the" typically appears in a text.
Once your training data is vectorized, you can use, for example, a stochastic gradient descent classifier and see how it performs on your test data (in machine learning terminology, test data simply means new, held-out examples on which you check what your ML program outputs).
In any case, you will need to experiment with the above options. There are many nuances, and you need to test on your data and see where you achieve the best results (depending on the ML algorithm's settings, the type of vectorizer, the ML technique itself, and so on). For example, Support Vector Machines are also a great choice when it comes to binary classifiers; you may want to try them as well and see whether they perform better than SGD.
In any case, remember that you will need to obtain quality training data with labels (harmful vs. safe) and find the best-fitting classifier. Along the way you may also want to use cross-validation to determine how well your classifier behaves; it too is already included in scikit-learn.
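Putting the pieces above together, a minimal scikit-learn sketch might look like the following (the documents and labels are made-up placeholders, and SGDClassifier's default hinge loss makes it a linear SVM trained by SGD):

```python
# A minimal sketch of the pipeline described above: tf-idf vectorization,
# an SGD-trained linear classifier, and cross-validation.
# The documents and labels are made-up placeholders.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import cross_val_score

docs = ["adult content keywords ...", "cooking recipes and tips ...",
        "more adult content ...", "gardening advice ...",
        "explicit material ...", "family travel blog ..."]
labels = [1, 0, 1, 0, 1, 0]  # 1 = harmful, 0 = safe (placeholders)

clf = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("sgd", SGDClassifier()),  # default hinge loss: a linear SVM trained by SGD
])

# Cross-validation estimates how well the classifier generalizes.
print(cross_val_score(clf, docs, labels, cv=3))

clf.fit(docs, labels)
print(clf.predict(["new page text to check ..."]))
```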
N.B. Don't forget about benign edge cases: for example, there may be a completely safe online magazine that mentions a harmful topic only in some article; that doesn't mean the website itself is harmful.
Edit: Come to think of it, if you don't have any experience with ML at all, it could be useful to take an online course, because beyond knowing the APIs and libraries you will still need to understand what they do and the math behind the curtain (at least roughly).
What you are trying to do is text classification (the closely related task of deciding whether a text is positive or negative is called sentiment classification), and it is often done with recurrent neural networks (RNNs) or long short-term memory networks (LSTMs). This is not an easy topic to start machine learning with. If you are new, you should first have a look at linear/logistic regression, SVMs, and basic neural networks (MLPs); otherwise it will be hard to understand what is going on.
That said, there are many libraries out there for constructing neural networks. Probably the easiest to use is Keras. While this library simplifies a lot of things immensely, it isn't a magic box that makes gold from trash: you need to understand what happens under the hood to get good results. Here is an example of how you can perform sentiment classification on the IMDB dataset (basically, determining whether a movie review is positive or not) with Keras.
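For reference, a minimal sketch of that kind of IMDB example in Keras (layer sizes and epoch counts are kept small for brevity and are not tuned):

```python
# A minimal sketch of sentiment classification on the IMDB dataset with Keras.
# Layer sizes and epoch counts are kept small for brevity, not tuned.
from tensorflow import keras
from tensorflow.keras import layers

VOCAB, SEQ_LEN = 10000, 200

# Reviews arrive pre-tokenized as integer word indices; pad to equal length.
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=VOCAB)
x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=SEQ_LEN)
x_test = keras.preprocessing.sequence.pad_sequences(x_test, maxlen=SEQ_LEN)

model = keras.Sequential([
    layers.Embedding(VOCAB, 32),
    layers.LSTM(32),
    layers.Dense(1, activation="sigmoid"),  # probability the review is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128, validation_split=0.2)
print(model.evaluate(x_test, y_test))
```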
For people who have no experience in NLP or ML, I recommend using a TF-IDF vectorizer instead of deep learning libraries. In short, it converts sentences into vectors, mapping each word in the vocabulary to one dimension (whose magnitude reflects how often the word occurs).
Then you can calculate the cosine similarity between the resulting vectors.
To improve performance, use the stemming, lemmatizing, and stop-word support in the NLTK library.
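A minimal sketch of that approach (the example documents are placeholders):

```python
# A minimal sketch of the TF-IDF + cosine-similarity approach, with NLTK
# stemming and stop-word removal. The example documents are placeholders.
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

nltk.download("stopwords", quiet=True)
stemmer, stops = PorterStemmer(), set(stopwords.words("english"))

def preprocess(text):
    # Lowercase, drop stop words, and stem whatever remains.
    return " ".join(stemmer.stem(w) for w in text.lower().split() if w not in stops)

docs = ["The movie was great fun", "A terrible and boring film", "Great fun, great movies"]
vectors = TfidfVectorizer().fit_transform([preprocess(d) for d in docs])

# Pairwise cosine similarities between all documents.
print(cosine_similarity(vectors))
```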

Text detection on images

I am a very new student of machine learning. I just wanted to ask: what are possible ways to improve a method (Naive Bayes, for example) to get better results at classifying images as text or non-text images, instead of just inputting some number of images and telling the system which have text and which do not?
Thanks in advance.
The state of the art for such problems is deep neural networks with several convolutional layers. See this article for an example of image classification using deep convolutional nets. Your problem (just determining whether an image has text or not) is much easier than the general image classification problem the authors consider, so you'd probably get away with a much simpler network architecture.
Nowadays you don't need to implement these things yourself; efficient, GPU-accelerated implementations are freely available, for instance Caffe, Torch7, Keras...
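For a sense of scale, here is a minimal sketch of such a simpler architecture in Keras; the 64x64 grayscale input size and the dummy data are assumptions, not a tuned design:

```python
# A minimal sketch of a small convolutional network in Keras for binary
# "contains text / no text" image classification. The 64x64 grayscale input
# size and the dummy data are assumptions.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(64, 64, 1)),         # assumed 64x64 grayscale images
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # probability the image contains text
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy data just to show the training call; real labeled images go here.
x = np.random.rand(16, 64, 64, 1)
y = np.random.randint(2, size=16)
model.fit(x, y, epochs=1, verbose=0)
```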

Can neural bots trained by a neural network be used for the following purpose?

Hey, I have a task to perform, which is basically to somehow retrieve PowerPoint presentations or PDF documents pertaining to a certain field. Let's say I want to retrieve PPT and PDF lecture notes pertaining to the bioinformatics field. I would like to know whether this task can be achieved by adapting the approach of using neural bots trained by a neural network. I just want to confirm that this approach is not completely wrong before I proceed further with my implementation.
And in case someone is wondering why a neural network, or any learning algorithm at all, is required in this case, here is my plan (which might be wrong, or there might be an easier way to achieve this, so please feel free to correct me):
I generate neural bots trained by a neural network (not sure how this training happens yet; I am assuming supervised learning using a sample training set of certain PPT and PDF files), and then these bots retrieve pages that are similar to what they learnt through their training.
So is the above approach a correct way to go about completing this task?
Neural nets are complicated. It seems like you have a generic document classification problem, and the simplest place to start is some kind of Naive Bayes model with bag-of-words features. The next step from there is a linear SVM or logistic regression on the same feature set. If you still don't have the performance you want after trying the simpler things, then move on to neural nets.
Just as you wouldn't say "I want to write an email server, so I'll start by writing an operating system," I'd be wary of using neural nets before simpler things have failed.
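To make the suggested starting point concrete, here is a minimal scikit-learn sketch; the documents, labels, and query string are made-up placeholders:

```python
# A minimal sketch of the progression suggested above: Naive Bayes on
# bag-of-words features first, then a linear SVM on the same features.
# The documents, labels, and query are made-up placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

docs = ["intro to sequence alignment", "quarterly sales report",
        "gene expression clustering notes", "meeting agenda and minutes"]
labels = [1, 0, 1, 0]  # 1 = bioinformatics, 0 = other (placeholders)

vec = CountVectorizer()  # bag-of-words feature extraction
X = vec.fit_transform(docs)

nb = MultinomialNB().fit(X, labels)
print(nb.predict(vec.transform(["lecture on protein folding"])))

# Next step if Naive Bayes isn't good enough: a linear SVM on the same features.
svm = LinearSVC().fit(X, labels)
print(svm.predict(vec.transform(["lecture on protein folding"])))
```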
