Can anyone suggest a good source to learn from?
I am a newbie in ML.
As I am a newbie, I have not done anything in this area yet.
This might be an excellent place to start. You can create a new kernel straight from the dataset page, and the data will be ready for you when you enter the kernel. You can also look at kernels from other people who have used that dataset, and I bet you'll find plenty of helpful examples.
You'll get lots of hate for asking this kind of question, since it doesn't fit in S.O. question parameters, but I prefer to be a useful human.
I've written a program to analyze a given piece of text from a website and classify its validity. The code basically vectorizes the description (taken from the HTML of a given webpage in real time) and uses a few values from that as features to make its decisions. There are some more features, like the domain of the website and some keywords I've explicitly counted.
The highest accuracy I've been able to achieve is with a RandomForestClassifier (>90%). I'm not sure what I can do to improve this accuracy other than using a more sophisticated model. I tried using an MLP, but for no set of hyperparameters does it seem to exceed the previous accuracy. I have around 2000 data points available for training.
Is there any classifier that works best for such projects? Does anyone have any suggestions as to how I can bring about improvements? (If anything needs to be elaborated, I'll do so.)
Any suggestions on how I can improve on this project in general? Should I include the text on a webpage as well? How should I do so? I tried going through a few sites, but the text doesn't seem to be contained in any specific element, whereas the description is easy to obtain from the HTML. Any help?
What else can I take as features? If anyone could suggest any creative ideas, I'd really appreciate it.
Try searching for the keyword NLP. The task you are facing is a hot topic among those who study deep learning, and it is called natural language processing.
RandomForest is a classical machine learning algorithm, and it probably works quite well here. Other classical algorithms might improve your accuracy, or they might not. If you want to try out other lightweight algorithms, that's fine.
Deep learning will most likely outperform your current model. Starting from the keyword NLP, you'll find many models, such as Word2Vec, BERT, and so on. You can find code for all of them on GitHub.
One tip: think carefully about whether you can realistically train the model yourself. Training BERT from scratch is a crazy thing for a beginner to attempt, and even for an expert. Instead, take a pretrained model and fine-tune it, or just use pretrained word vectors.
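To make the "just use the word vectors" option a bit more concrete, here is a minimal sketch of turning a description into a fixed-length feature by averaging word vectors. The vector table below is a made-up toy stand-in (three dimensions, six words); real pretrained vectors would be loaded from a published Word2Vec or GloVe file, and the names `TOY_VECTORS`, `embed` and `cosine` are only illustrative.

```python
import math

# Toy stand-in for pretrained word vectors (hypothetical values, NOT real
# embeddings). Real vectors typically have 100-300 dimensions.
TOY_VECTORS = {
    "free":    [0.9, 0.1, 0.0],
    "prize":   [0.8, 0.2, 0.1],
    "winner":  [0.7, 0.3, 0.0],
    "report":  [0.1, 0.8, 0.3],
    "study":   [0.0, 0.9, 0.2],
    "results": [0.1, 0.7, 0.4],
}
DIM = 3


def embed(text):
    """Average the vectors of the known words; unknown words are skipped."""
    words = text.lower().split()
    known = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not known:
        return [0.0] * DIM  # no known word: fall back to the zero vector
    return [sum(v[i] for v in known) / len(known) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


spam_like = embed("Free prize for every winner")
news_like = embed("Study results in the report")
query = embed("winner of a free prize")
```

The averaged vector can simply be appended to the features you already have (domain, keyword counts) and fed to the same RandomForestClassifier, so you can check whether it helps before investing in fine-tuning a large model.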
I hope that this works out.
I have had an infatuation with a certain concept regarding machine learning that SethBling demonstrated with his MarI/O program: https://youtu.be/qv6UVOQ0F44
I have a decent amount of logical programming experience in a number of different languages and have read around a lot about machine learning and neural networks.
What I'm looking for is a good set of references that could teach me how to apply neural networks in code, rather than just the mathematical treatment found in most of what I have seen thus far.
Thanks in advance!
Sentdex (https://www.youtube.com/user/sentdex) has incredible tutorials on YouTube, including a series that walks through teaching a model to play GTA.
It may seem daunting at first, but the rewards of overcoming such a challenging task will be worth it.
You might want to look at the JavaScript library Neataptic to see how they implemented neural networks for Agar.io, for example.
You might also want to read the NeuroEvolution of Augmenting Topologies (NEAT) paper for a basic understanding of neuroevolution.
I am trying to code a genetic algorithm in Matlab, but I really don't know how it works on images or how to proceed. Is there any basic tutorial that can help me understand how to apply a GA to images (starting from 2D and moving on to multidimensional images)?
That would be a great help for me.
Thanking everyone in anticipation.
Kind Regards.
For GA you need two things: a fitness function that can evaluate any solution and tell how good it is, and a representation of your solution so that you can do crossover and mutation. Once you have these, you are good to go. I'm not an expert on image processing so I can't help you with that exactly.
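The two ingredients above can be sketched on a toy image problem. This is only an illustration in Python rather than Matlab, under assumptions that are not in the question: the "image" is a hypothetical 4x4 grayscale patch flattened into a list of 16 intensities, the fitness is the negative pixel-wise difference to a made-up target, and selection is plain truncation with elitism. A real image task would need its own fitness function.

```python
import random


def fitness(candidate, target):
    """Higher is better: negative total absolute pixel difference."""
    return -sum(abs(c - t) for c, t in zip(candidate, target))


def crossover(a, b, rng):
    """Single-point crossover on the flattened pixel lists."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(candidate, rng, rate=0.05):
    """Replace each pixel by a random intensity with probability `rate`."""
    return [rng.randrange(256) if rng.random() < rate else p for p in candidate]


def evolve(target, pop_size=60, generations=300, seed=0):
    rng = random.Random(seed)
    n = len(target)
    # Representation: one individual = one list of n pixel intensities.
    population = [[rng.randrange(256) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, target), reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [population[0]]             # elitism: keep the best as-is
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    return max(population, key=lambda c: fitness(c, target))


# Hypothetical 4x4 grayscale target, flattened row by row.
TARGET = [0, 64, 128, 255] * 4
best = evolve(TARGET)
```

Because the best individual is always carried over unchanged, the best fitness never gets worse from one generation to the next. Changing the problem only means swapping `fitness` (and, if needed, the representation).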
Look at the book Essentials of Metaheuristics, which is a very good resource for getting started with evolutionary computation (and not only that) in general. It's free.
There is a paper on this subject which you can find at the IEEE library. I believe it solves the problem you vaguely describe.
The problem statement is kind of vague, but I am only looking for directions; because of our privacy policy I can't share exact details, so please help out.
We have a problem at hand where we need to increase the efficiency of equipment; in other words, we need to decide at which values across multiple parameters the machines should operate to produce optimal output.
My query is whether it is possible to come up with such numbers using linear regression or multinomial logistic regression. If not, can you please specify which algorithms would be more suitable? Also, can you please point me to some active research on this kind of problem that is available in the public domain?
Does the type of problem I am asking about fall within the area of machine learning?
Lots of unknowns here but I’ll make some assumptions.
What you are attempting to do could probably be achieved with multiple linear regression. I have zero familiarity with the Amazon service (I didn’t even know it existed until you brought this up, it’s not available in Europe). However, a read of the documentation suggests that the Amazon service would be capable of doing this for you. The problem you will perhaps have is that it’s geared to people unfamiliar with this field and a lot of its functionality might be removed or clumped together to prevent confusion. I am under the impression that you have turned to this service because you too are somewhat unfamiliar with this field.
Something that may suit your needs better is Response Surface Methodology (RSM), which I have applied to industrial optimisation problems that I think are similar to what you suggest. RSM works best if you can obtain your data through an experimental design such as a Central Composite Design or Box-Behnken design. I suggest you spend some time Googling these terms to get your head around them; I don’t think it’s an unmanageable burden to learn how to apply these with no prior experience in this area. Because your question is vague, only you can determine if this really is suitable. If you already have the data in an unstructured format, you can still fit a response surface, but it is less robust. There are plenty of open-access articles using these techniques but Science Direct is conveniently down at the moment!
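To give a rough idea of what fitting a response surface involves, here is a minimal sketch in Python with NumPy. Everything specific in it is an assumption for illustration: two machine parameters in coded units (-1, 0, +1) on a 3x3 grid (which for two factors coincides with a face-centred central composite design), and made-up, noise-free response values. It fits a full second-order model by least squares and then solves for the stationary point of the fitted surface.

```python
import numpy as np


def design_matrix(x1, x2):
    """Columns of the second-order model: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])


def fit_response_surface(x1, x2, y):
    """Ordinary least-squares fit of the quadratic model coefficients."""
    coef, *_ = np.linalg.lstsq(design_matrix(x1, x2), y, rcond=None)
    return coef


def stationary_point(coef):
    """Solve gradient = 0 for the fitted quadratic; also return its Hessian."""
    _, b1, b2, b11, b22, b12 = coef
    hessian = np.array([[2 * b11, b12], [b12, 2 * b22]])
    point = np.linalg.solve(hessian, -np.array([b1, b2]))
    return point, hessian


# Hypothetical experiment: 3 levels per factor, all 9 combinations.
levels = np.array([-1.0, 0.0, 1.0])
x1, x2 = [g.ravel() for g in np.meshgrid(levels, levels)]
# Made-up response with its maximum at x1 = 0.3, x2 = -0.2 (no noise).
y = 50 - 4 * (x1 - 0.3) ** 2 - 6 * (x2 + 0.2) ** 2

coef = fit_response_surface(x1, x2, y)
optimum, hessian = stationary_point(coef)
# The stationary point is a maximum only if both Hessian eigenvalues are negative.
is_maximum = bool(np.all(np.linalg.eigvalsh(hessian) < 0))
```

With real, noisy measurements the stationary point is only an estimate, and it is worth checking that it lies inside the range you actually experimented in before trusting it as an operating point.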
Minitab is a software package that will do all the regression and RSM for you. Its strength is that it has a robust GUI and partially resembles Excel, so it is far less daunting to get into than something like R. It also has plenty of guides online. They offer a 30-day free trial, so it might be worth doing some background reading, collecting the tutorials you need and developing a plan of action before downloading the trial.
Hope that is some help.
I am working with OpenCV for a recognition project, and I had a general question regarding the API and its terms. I've looked online and couldn't find anything specific on this, but I was wondering what the differences are between Discrete AdaBoost, Real AdaBoost, LogitBoost, and Gentle AdaBoost. If anyone could direct me to a pros vs. cons comparison or a general description of these, I could research which would be useful.
Update
I have added a link to a PowerPoint file that goes over the different variations of the boosting techniques. Hope this helps someone else out there.
Adaboost powerpoint
Thanks in advance
There isn't really a simple "always use technique X"; otherwise there wouldn't be a need for all the others. You really have to understand the details and experiment.
See the OpenCV discussion and a list of papers and technical summaries.