Machine learning and python NLP - machine-learning

I want to work on fake news detection for newspapers written in my country's language. I want to compare fake and real news using machine learning. How can a model tell me which news is fake and which is real?

I believe you cannot solve this with machine learning alone. Think about it: you cannot detect fake news just by analyzing the text itself. You need prior knowledge, and you have to compare the article against your picture of the world.
The only approach I can see is to cluster the articles into groups that report on the same story, then compare the articles within each group and look for inconsistencies. It is an "AI-complete" problem, though.
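To illustrate only the grouping step, here is a minimal sketch that assumes plain word-count vectors and cosine similarity are good enough to tell which articles cover the same story. The function names, the threshold of 0.5, and the sample articles are all made up for illustration. Finding the inconsistencies inside each group is the hard part and is not attempted here.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_topic(articles, threshold=0.5):
    """Greedy grouping: an article joins the first group whose first
    member is similar enough, otherwise it starts a new group."""
    vectors = [bag_of_words(a) for a in articles]
    groups = []  # each group is a list of article indices
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups


# Hypothetical sample articles, for illustration only.
articles = [
    "The city council approved the new bridge budget on Monday",
    "New bridge budget approved by the city council",
    "Local football team wins the national cup final",
]
groups = group_by_topic(articles)
print(groups)
```

In practice you would probably want TF-IDF weighting and a proper clustering algorithm, but the idea of "same story, different sources" is the same.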

Related

Deep Learning Algorithm to Predict Bash Commands

I'm new to machine learning, and I want to develop an application that takes all the data from multiple users' bash histories and predicts the next command of another user based on the commands the others executed.
I searched a lot but didn't find any good answer. I'd appreciate an ML expert's help if you know of sample code for something similar, or have any useful comments, such as which algorithms I should look into.
You can look into language modeling, which predicts the next word in a sequence given the words that precede it. You would probably work with RNN- or LSTM-based networks for language modeling.
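Before training an RNN or LSTM, it may help to see the language-modeling idea in its simplest form. The sketch below is a count-based bigram model, not a neural network: it treats each command as a "word" and predicts the most frequent successor. The function names and the toy histories are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigram_model(histories):
    """Count, for each command, which commands followed it in any history."""
    followers = defaultdict(Counter)
    for history in histories:
        for current, nxt in zip(history, history[1:]):
            followers[current][nxt] += 1
    return followers


def predict_next(followers, command, k=1):
    """Return up to k most frequent successors (empty list if unseen)."""
    return [cmd for cmd, _ in followers.get(command, Counter()).most_common(k)]


# Hypothetical bash histories from three users.
histories = [
    ["git status", "git add .", "git commit", "git push"],
    ["cd project", "git status", "git add .", "git commit"],
    ["git status", "git diff", "git add .", "git commit"],
]
model = train_bigram_model(histories)
print(predict_next(model, "git add ."))
print(predict_next(model, "git status"))
```

A neural language model replaces the count table with learned parameters and can condition on more than one previous command, but the input/output contract stays the same, so a baseline like this is a handy sanity check.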

Machine Learning - Derive information from a text

I'm a newbie in the field of Machine Learning and Supervised learning.
My task is the following: from the name of a movie file on a disk, I'd like to retrieve some metadata about the file. I have no control on how the file is named, but it has a title and one or more additional info, like a release year, a resolution, actor names and so on.
Currently I have developed a rule-based heuristic system, where I split the name into tokens and try to understand what each word could represent, either alone or together with adjacent ones. For detecting people's names, for example, I'm using a dataset of English names, and I score a word as a potential person's name if I find it in the dataset. If the word adjacent to it is one that I scored as a potential surname, I score the two words as being an actor. And so on. It works with decent accuracy, but changing heuristic scores manually to "teach" the system is tedious and unpredictable.
Such a rule-based system is hard to maintain or develop further, so, out of curiosity, I was exploring the field of machine learning. What I would like to know is:
Is there some kind of public literature about these kinds of problems?
Is ML a good way to approach the problem, given the limited data set available?
How would I proceed to debug or try to understand the results of such a machine? I already have problems with the "simplistic" heuristic engine I have developed.
Thanks, any advice would be appreciated.
You need to look into NLP (natural language processing). NLP deals with text processing, among other things, for example entity recognition and tagging.
Here is an example using the spaCy library: https://spacy.io/usage/linguistic-features.
Some time ago I did a similar thing; you can see it here: https://github.com/Erlemar/Erlemar.github.io/blob/master/Notebooks/Fate_Zero_explore.ipynb

Search and list person's name in a book with machine learning

I'm not sure whether this task can be solved with machine learning and want to get some suggestions.
I want to search for and list all the names of people in an arbitrary book (suppose the text is all written in lower case). I can manually label a few names at the beginning for training purposes. I guess I can use some supervised learning algorithm, but I don't know what kind of features can be obtained, because the only available information I can think of in this scenario is the text of the book.
Can you give me a high level suggestion on steps to solve this question with machine learning?
Named Entity Recognition is a pretty well-researched problem.
There are existing libraries that solve it; for example, this is a nice tutorial for spaCy.

Online machine learning for obstacle crossing or bypassing

I want to program a robot which will sense obstacles and learn whether to cross over them or bypass around them.
Since my project must be completed within a week and a half, I must use an online learning algorithm (a GA or similar would take a lot of time to test, because the robot needs to try to cross an obstacle in order to determine whether crossing is possible).
I'm really new to online learning so I don't really know which online learning algorithm to use.
It would be a great help if someone could recommend me a few algorithms that would be the best for my problem and some link with examples wouldn't hurt.
Thanks!
I think you could start with A* (A-Star).
It's simple, robust, and widely used.
There are some nice tutorials on the web, like this one: http://www.raywenderlich.com/4946/introduction-to-a-pathfinding
An online algorithm is simply one that can collect new data and update a model incrementally, without retraining on the full dataset (i.e. it can be used in an online service that runs all the time). What you are probably looking for is reinforcement learning.
RL itself is not a method but rather a general approach to the problem. Many concrete methods may be used with it. Neural networks have proven to do well in this field (useful course). See, for example, this paper.
However, to build a real robot that can bypass obstacles you will need much more than just knowledge of neural networks. You will need to set up sensors carefully, preprocess the data from them, work out your model and collect a dataset. I'm not sure it's even possible to learn all of that in a week and a half.
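To make "update a model incrementally" concrete, here is a minimal sketch of the simplest reinforcement-learning setting (a bandit-style learner, not a neural network). It keeps a running average reward for each pair of obstacle type and action, and updates it after every single trial. The class name, the "low"/"high" obstacle buckets, the reward values, and the simulated environment are all assumptions invented for this example; a real robot would get its rewards from sensors.

```python
import random


class OnlineActionValues:
    """Running average reward per (obstacle bucket, action)."""

    def __init__(self, actions=("cross", "bypass"), epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.values = {}  # (bucket, action) -> estimated reward
        self.counts = {}  # (bucket, action) -> number of trials

    def choose(self, bucket, rng=random):
        """Try untried actions first, then act epsilon-greedily."""
        untried = [a for a in self.actions if (bucket, a) not in self.counts]
        if untried:
            return untried[0]
        if rng.random() < self.epsilon:
            return rng.choice(self.actions)  # explore
        return max(self.actions, key=lambda a: self.values[(bucket, a)])

    def update(self, bucket, action, reward):
        """Incremental mean: new = old + (reward - old) / n.
        No past trials are stored or replayed."""
        key = (bucket, action)
        n = self.counts.get(key, 0) + 1
        old = self.values.get(key, 0.0)
        self.counts[key] = n
        self.values[key] = old + (reward - old) / n


def simulated_reward(bucket, action):
    """Hypothetical environment: crossing works only for low obstacles."""
    if action == "bypass":
        return 0.2  # always works, but costs time
    return 1.0 if bucket == "low" else -1.0


rng = random.Random(0)
agent = OnlineActionValues()
for trial in range(200):
    bucket = "low" if trial % 2 == 0 else "high"
    action = agent.choose(bucket, rng)
    agent.update(bucket, action, simulated_reward(bucket, action))

agent.epsilon = 0.0  # stop exploring to inspect what was learned
print(agent.choose("low", rng), agent.choose("high", rng))
```

The point of the sketch is the `update` method: each trial adjusts the estimate in constant time, which is what makes the approach usable while the robot is running.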

Machine learning/information retrieval project

I’m studying for an M.Sc. in Computer Science and have just completed the first year of the course (it is a two-year course). Soon I have to submit a proposal for the M.Sc. project. I have selected the following topic:
“Suitability of machine learning for document ranking in information retrieval systems”. Researchers have been using various machine learning algorithms for ranking documents. So, in the first phase of the project, I will do a complete literature survey and identify the advantages and disadvantages of current approaches. In the second phase of the project, I will propose a new (modified) algorithm in order to overcome the limitations of current approaches.
My question is whether this type of project is suitable as an M.Sc. project. Moreover, if somebody has interesting ideas in the information retrieval field, would you be willing to share them with me?
Thanks
Ranking is always the hardest part of any information retrieval system. I think it is a very good topic, but you have to take care to define the scope of the work as soon as possible. You will probably not be able to develop a new IR engine, but rather build a prototype based on, e.g., Apache Lucene.
Currently there are a lot of datasets available, including the Stack Overflow data dump, which provide all the information you need to define a rich feature vector (number of points, time, popularity of a tag, topics mined from previous questions, etc.) for your machine learning ranking algorithm. In this part of the work you could, e.g., classify types of features (e.g., user-specific, or semantic features such as a software name in the title) and perform a series of experiments to learn which features are most important and which are not for a given dataset.
The second direction of such a project could be how to perform learning efficiently. The reason is the quantity of data on the web or in community forums, and the changes in a forum over time (this would be important if you use community-specific features), e.g., changes in technologies, new software releases, etc.
There are many other topics related to search and machine learning. The best idea is to search scholar.google.com for recent survey papers on ranking, machine learning, and search to learn what the state of the art is. The very next step would be to talk with your M.Sc. supervisor.
Good luck!
Everything you said is good and should be done, but you forgot the most important part:
Prove that your algorithm is better and/or faster than other algorithms, with good experiments and maybe some statistics (p-value, confidence interval).
If you do that and convince people that your algorithm is useful you surely will not fail :)
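As a rough sketch of what such an experiment could look like, assume you have one quality score per query (for example NDCG) for both the baseline and the new ranker. A paired sign-flip permutation test then gives a p-value, and a bootstrap over the per-query differences gives a confidence interval. The function names and the scores below are made up for illustration, and these two tests are just one reasonable choice among several.

```python
import random
from statistics import mean


def paired_permutation_p_value(baseline, candidate, n_resamples=5000, seed=0):
    """Two-sided sign-flip permutation test on per-query differences."""
    rng = random.Random(seed)
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(mean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis, each difference is equally likely
        # to have either sign.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


def bootstrap_ci(baseline, candidate, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-query difference."""
    rng = random.Random(seed)
    diffs = [c - b for b, c in zip(baseline, candidate)]
    means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up per-query scores for ten queries, for illustration only.
baseline = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.73, 0.52, 0.64, 0.57]
candidate = [0.66, 0.58, 0.74, 0.55, 0.69, 0.65, 0.75, 0.60, 0.67, 0.63]

p_value = paired_permutation_p_value(baseline, candidate)
lower, upper = bootstrap_ci(baseline, candidate)
print(f"p-value: {p_value:.4f}, 95% CI for mean gain: [{lower:.3f}, {upper:.3f}]")
```

Both functions work on paired scores, so the two rankers must be evaluated on the same set of queries.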
