I am looking for solutions where I can automatically approve or disapprove different supplier invoices based on historical data.
Let's say, I got an invoice from an HP laptop supplier and based on the previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction from the historical data already available, using artificial intelligence, machine learning, or any other cloud service.
This isn't a direct question, but you can start by looking into classification methods. There is a huge amount of material available online. Try reading about K-Nearest Neighbors, Naive Bayes, K-means (a clustering method rather than a classifier), etc. to get an idea of how machine learning algorithms work. Once you understand what is written in the documentation, start implementing them. You will run into a lot of problems, which you can search for online, and I'm sure you will find most of them answered here on this portal.
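As a rough illustration of the classification idea, here is a minimal K-Nearest Neighbors sketch in plain Python. The invoice features (amount, days since last order, quantity) and the historical decisions are invented for the example; real invoice data would need its own feature choices.

```python
import math
from collections import Counter

# Hypothetical history: (amount, days_since_last_order, quantity) -> decision
HISTORY = [
    ((1200.0, 30, 1), "approve"),
    ((1150.0, 28, 1), "approve"),
    ((1300.0, 35, 1), "approve"),
    ((9800.0, 2, 8), "reject"),
    ((8700.0, 1, 7), "reject"),
    ((9100.0, 3, 9), "reject"),
]

def column_stats(rows):
    """Per-column (min, range), so every feature is scaled to [0, 1]."""
    return [(min(c), (max(c) - min(c)) or 1.0) for c in zip(*rows)]

def normalize(row, stats):
    return [(v - lo) / span for v, (lo, span) in zip(row, stats)]

def knn_predict(history, query, k=3):
    """Majority vote among the k historical invoices closest to the query."""
    stats = column_stats([features for features, _ in history])
    q = normalize(query, stats)
    ranked = sorted(history, key=lambda item: math.dist(normalize(item[0], stats), q))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

decision = knn_predict(HISTORY, (1250.0, 31, 1))
```

Scaling matters here because the raw amount would otherwise dominate the distance.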
I'm training an unsupervised machine learning model and want to make sure my features are as useful as possible!
Do the features of an unsupervised machine learning model need to be independent? For example, I have a feature (subscriptionId) that is the subscription ID of different cloud accounts within a tenant. I also have a feature that is the resourceId of a resource within the subscription.
However, this resourceId contains the subscriptionId. Is it best practice to combine these features or remove one feature (e.g. subscriptionId) to avoid dependence and duplication among dataset features?
For unsupervised learning, commonly used for clustering, association, or dimensionality reduction, features don't need to be fully independent, but if you have many unique values it's likely that your models can learn to differentiate on these high entropy values instead of learning interesting or significant things as you might hope.
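One way to spot such high-entropy identifier columns before training is to measure, per feature, the fraction of unique values and the normalized entropy. The sketch below uses made-up rows and an assumed resourceId layout (subscriptionId as a prefix); both are illustrative only.

```python
import math
from collections import Counter

# Hypothetical rows; the "sub-x/..." resourceId layout is an assumption.
ROWS = [
    {"subscriptionId": "sub-a", "resourceId": "sub-a/vm-1", "resourceType": "vm"},
    {"subscriptionId": "sub-a", "resourceId": "sub-a/vm-2", "resourceType": "vm"},
    {"subscriptionId": "sub-b", "resourceId": "sub-b/db-1", "resourceType": "db"},
    {"subscriptionId": "sub-b", "resourceId": "sub-b/vm-3", "resourceType": "vm"},
]

def cardinality_report(rows, columns):
    """Map each column to (unique-value ratio, entropy / maximum possible entropy)."""
    n = len(rows)
    max_entropy = math.log2(n) if n > 1 else 1.0
    report = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        report[col] = (len(counts) / n, entropy / max_entropy)
    return report

report = cardinality_report(ROWS, ["subscriptionId", "resourceId", "resourceType"])

# True when subscriptionId can always be recovered from resourceId,
# i.e. keeping both duplicates information.
redundant = all(row["subscriptionId"] in row["resourceId"] for row in ROWS)
```

A column whose ratio is close to 1.0 behaves like a row identifier and is usually a candidate for dropping or for splitting into lower-cardinality parts.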
If you're working on generative unsupervised models for customers, I cannot overstate how much security and secret-disclosure risk this may create for Oracle Cloud Infrastructure (OCI) customers. Generative models are premised on regurgitating their inputs, and thousands of papers have been written on getting private information back out of trained models.
It's not clear what problem you're working on, and the question seems early in its formulation.
I recommend you spend time delving into the limits of statistics and data science, which are the foundation of modern popular machine learning methods.
Once you have an idea of what questions can be answered well by ML, and what can't, then you might consider something like fastAI's course.
https://towardsdatascience.com/the-actual-difference-between-statistics-and-machine-learning-64b49f07ea3
https://www.nature.com/articles/nmeth.4642
Again, depending on how the outputs will be used or who can view or (even indirectly) query the model, it seems unwise to train on private values, especially if you want to generate outputs. ML methods are only useful if you have access to a lot of data, and if you have access to the data of many users, you need to be a good steward of Oracle Cloud customer data.
I am developing a website that will recommend recipes to visitors based on their data. I am collecting data from their profile, website activity, and Facebook.
Currently I have data like [username/userId, rating of recipes, age, gender, type (veg/non-veg), cuisine (Italian/Chinese, etc.)]. Based on these features, I want to recommend new recipes that they have not visited.
I have implemented the ALS (alternating least squares) algorithm in Spark. For this, we prepare a CSV containing the columns [userId, RecipesId, Rating], then train on this data and build the model by tuning parameters such as lambda, rank, and iterations. The model then generates recommendations, using pyspark:
model.recommendProducts(userId, numberOfRecommendations)
The ALS algorithm accepts only three features: userId, RecipesId, and Rating. I am unable to include any features beyond these (like type, cuisine, gender, etc.). I want to include those features, then train the model and generate recommendations.
Is there any other algorithm in which I can include the above parameters and generate recommendations?
Any help would be appreciated, Thanks.
Yes, there are a couple of other algorithms. For your case, I would suggest that you use the Naive Bayes algorithm.
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
Since you are working on a web application, a JS solution would, I guess, come in handy.
(simple) https://www.npmjs.com/package/bayes
or for example:
(a bit more powerful) https://www.npmjs.com/package/naivebayesclassifier
In machine learning there is a family of algorithms called recommender systems, which includes content-based recommender systems. They are mainly used to recommend products/movies based on customer reviews. You can apply the same approach, using customer reviews, to recommend recipes. For a better understanding of this algorithm, refer to these links:
https://www.youtube.com/watch?v=Bv6VkpvEeRw&list=PL0Smm0jPm9WcCsYvbhPCdizqNKps69W4Z&index=97
https://www.youtube.com/watch?v=2uxXPzm-7FY
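A content-based recommender can be sketched in a few lines: describe each recipe by its attributes, build a user profile from the attributes of the recipes the user rated, and rank unseen recipes by cosine similarity to that profile. The recipe names, attributes, and ratings below are hypothetical, and weighting the profile by rating is just one simple choice.

```python
import math

# Hypothetical recipe attributes (type and cuisine as binary features).
RECIPES = {
    "margherita": {"veg": 1, "italian": 1},
    "pasta_alfredo": {"veg": 1, "italian": 1},
    "kung_pao_chicken": {"nonveg": 1, "chinese": 1},
    "veg_fried_rice": {"veg": 1, "chinese": 1},
}

def user_profile(ratings):
    """Sum recipe attribute vectors, weighted by the rating the user gave."""
    profile = {}
    for recipe, rating in ratings.items():
        for feature, value in RECIPES[recipe].items():
            profile[feature] = profile.get(feature, 0.0) + rating * value
    return profile

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(ratings, n=2):
    """Rank recipes the user has not rated by similarity to the profile."""
    profile = user_profile(ratings)
    unseen = [r for r in RECIPES if r not in ratings]
    return sorted(unseen, key=lambda r: cosine(profile, RECIPES[r]), reverse=True)[:n]

suggestions = recommend({"margherita": 5, "kung_pao_chicken": 1})
```

User attributes such as age or gender could be added to the same vectors, but how to encode them is left open here.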
You can go with powerful classification algorithms like:
->SVM: works very well if you have a large number of attributes.
->Logistic Regression: if you have a huge amount of customer data.
You are looking for recommender systems that use algorithms like collaborative filtering. I would suggest going through Prof. Andrew Ng's short videos on the collaborative filtering algorithm, low-rank matrix factorization, and building recommender systems. They are part of Coursera's Machine Learning course offered by Stanford University.
The course link:
https://www.coursera.org/learn/machine-learning#%20
You can check week 9 for the content related to recommender systems.
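To give a feel for low-rank matrix factorization before watching the videos, here is a small stochastic gradient descent sketch in plain Python. The ratings, rank, learning rate, and regularization values are all illustrative assumptions, not tuned settings and not the course's exact formulation.

```python
import random

# Hypothetical (user, recipe, rating) triples.
RATINGS = [
    (0, 0, 5.0), (0, 1, 4.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 5.0), (2, 2, 2.0), (3, 0, 1.0), (3, 2, 5.0),
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rmse(ratings, P, Q):
    """Root mean squared error of the reconstructed ratings."""
    total = sum((r - dot(P[u], Q[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5

def factorize(ratings, n_users, n_items, rank=2, lr=0.02, reg=0.02,
              epochs=1000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ rating."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(P[u], Q[i])
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q

P, Q = factorize(RATINGS, n_users=4, n_items=3)
predicted = dot(P[0], Q[2])  # score for a recipe user 0 has not rated
```

The predicted score for an unrated (user, recipe) pair is just the dot product of the two learned factor vectors.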
I have developed an ML model for a binary (0/1) NLP classification task and deployed it in a production environment. The model's prediction is displayed to users, who have the option to give feedback (whether the prediction was right or wrong).
How can I continuously incorporate this feedback into my model? From a UX standpoint, you don't want a user to correct/teach the system more than two or three times for a specific input; the system should learn fast, i.e. the feedback should be incorporated quickly. (Google Priority Inbox does this seamlessly.)
How does one build this "feedback loop" through which my system can improve? I have searched a lot online but could not find relevant material. Any pointers would be a great help.
Please don't say "retrain the model from scratch including the new data points". That's surely not how Google and Facebook build their smart systems.
To further explain my question: think of Google's spam detector, their Priority Inbox, or their recent "smart replies" feature. It's well known that they can learn from and incorporate user feedback fast.
The system should incorporate user feedback fast (i.e. the user has to teach the system the correct output at most 2-3 times per data point before the system starts giving the correct output for that data point), AND it should also retain old learnings, so that it does not start giving wrong outputs on older data points (where it was previously right) while incorporating what it learns from new data points.
I have not found any blog/literature/discussion on how to build such systems, i.e. one that explains in detail the "feedback loop" in ML systems.
Hope my question is a little clearer now.
Update: Some related questions I found are:
Does the SVM in sklearn support incremental (online) learning?
https://datascience.stackexchange.com/questions/1073/libraries-for-online-machine-learning
http://mlwave.com/predicting-click-through-rates-with-online-machine-learning/
https://en.wikipedia.org/wiki/Concept_drift
Update: I still don't have a concrete answer, but such a recipe does exist. Read the section "Learning from the feedback" in the blog post Machine Learning != Learning Machine, in which Jean talks about "adding a feedback ingestion loop to machine". The same idea appears here, here, and here.
There could be a couple of ways to do this:
1) You can use the feedback you get from the user to train only the last layer of your model, keeping the weights of all other layers intact. Intuitively, in the case of a CNN for example, this means you are still extracting features with your model but slightly adjusting the classifier to account for the peculiarities of your specific user.
2) Another way could be to have a global model (trained on your large training set) and a simple, user-specific logistic regression. For final predictions, you can combine the results of the two. See the paper by Google on how they do this for their Priority Inbox.
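Option 2 can be sketched as a frozen global model plus a small per-user logistic regression that is updated on every piece of feedback. Summing the two log-odds is one plausible way to combine them and is an assumption here; the feature names, global weights, and learning rate are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights of a global model trained offline; never updated here.
GLOBAL_WEIGHTS = {"contains_offer": 1.5, "from_known_sender": -2.0}

def logit(weights, features):
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

class UserAdjustedModel:
    """Global log-odds plus a user-specific correction learned from feedback."""

    def __init__(self, learning_rate=1.0):
        self.user_weights = {}
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        return sigmoid(logit(GLOBAL_WEIGHTS, features) + logit(self.user_weights, features))

    def feedback(self, features, correct_label):
        """One SGD step on the logistic loss, touching only the user weights."""
        error = correct_label - self.predict_proba(features)
        for name, value in features.items():
            self.user_weights[name] = (
                self.user_weights.get(name, 0.0) + self.learning_rate * error * value
            )

model = UserAdjustedModel()
message = {"contains_offer": 1.0}
before = model.predict_proba(message)
for _ in range(3):  # the user marks the same prediction as wrong three times
    model.feedback(message, correct_label=0)
after = model.predict_proba(message)
```

Because only the weights of features present in the corrected input change, predictions for inputs with unrelated features stay exactly as the global model left them.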
Build one or more simple, lightweight models that can be updated with each piece of feedback. Online machine learning offers a number of candidates for this.
Most good online classifiers are linear, in which case we can use a couple of them and achieve non-linearity by combining them via a small, shallow neural net.
https://stats.stackexchange.com/questions/126546/nonlinear-dynamic-online-classification-looking-for-an-algorithm
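As one example of a light linear model that can be updated per feedback, here is a minimal sketch of a passive-aggressive style online classifier (chosen for illustration; the answer above does not name a specific algorithm). Labels are -1/+1 and the feature vectors are hypothetical.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

class PassiveAggressive:
    """Online linear classifier: each update moves the weights just far
    enough for the corrected example to be classified with margin 1."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features

    def predict(self, x):
        return 1 if dot(self.w, x) >= 0 else -1

    def update(self, x, y):
        loss = max(0.0, 1.0 - y * dot(self.w, x))  # hinge loss on this example
        norm_sq = dot(x, x)
        if loss > 0 and norm_sq > 0:
            tau = loss / norm_sq  # smallest step that fixes the example
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]

model = PassiveAggressive(n_features=3)
x = [1.0, 0.0, 2.0]
first_guess = model.predict(x)   # untrained model defaults to +1
model.update(x, -1)              # user feedback: the right answer was -1
corrected = model.predict(x)
```

A single update is enough to correct the prediction for that exact input, which matches the "learn within one or two corrections" requirement; how well older examples are preserved depends on how much the feature vectors overlap.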
I am working on a site that needs to present a set of options that have no particular order. I need to sort this list based on the customer who is viewing it. I thought of doing this by generating recommendation rules and sorting the list so that the options the customer is most likely to like are at the top. Furthermore, I think it'd be cool if, when the confidence in the recommendation is high, I could tell the customer why I'm recommending it.
For example, let's say we have an ice cream joint with a website where customers can register and place orders online. The customer information contains basic info like gender, DOB, address, etc. My goal is to mine previous orders made by customers to generate rules of the form
feature -> flavor
where feature would be either information in the profile or in the order itself (like, for example, we might ask how many people are you expecting to serve, their ages, etc).
I would then pull the rules that apply to the current customer and use the ones with higher confidence on the top of the list.
My question: what's the best standard algorithm to solve this? I have some experience with apriori and initially thought of using it, but since I'm interested in having only one consequent, I now think other alternatives might be better suited. In any case, I'm not that knowledgeable about machine learning, so I'd appreciate any help and references.
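The feature -> flavor rules described above can be sketched without a full apriori implementation by counting single-antecedent, single-consequent pairs and computing their confidence. The order data, the minimum support of 2, and the choice to rank by the best matching rule are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical past orders: (customer/order features, flavor ordered)
ORDERS = [
    ({"gender": "F", "party_size": "large"}, "chocolate"),
    ({"gender": "F", "party_size": "small"}, "chocolate"),
    ({"gender": "F", "party_size": "large"}, "vanilla"),
    ({"gender": "M", "party_size": "large"}, "vanilla"),
    ({"gender": "M", "party_size": "small"}, "mint"),
    ({"gender": "F", "party_size": "small"}, "chocolate"),
]

def mine_rules(orders, min_support=2):
    """Return rules ((feature, value), flavor, confidence) seen at least min_support times."""
    antecedent_counts = Counter()
    pair_counts = Counter()
    for features, flavor in orders:
        for item in features.items():
            antecedent_counts[item] += 1
            pair_counts[(item, flavor)] += 1
    return [
        (item, flavor, count / antecedent_counts[item])
        for (item, flavor), count in pair_counts.items()
        if count >= min_support
    ]

def rank_flavors(rules, customer):
    """Order flavors by the most confident rule that applies to this customer."""
    best = {}
    for (name, value), flavor, confidence in rules:
        if customer.get(name) == value:
            best[flavor] = max(best.get(flavor, 0.0), confidence)
    return sorted(best, key=best.get, reverse=True)

rules = mine_rules(ORDERS)
ranking = rank_flavors(rules, {"gender": "F", "party_size": "large"})
```

The rule that produced the top score can be shown to the customer as the explanation when its confidence is high.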
This is a recommendation problem.
First, the apriori algorithm is no longer the state of the art for recommendation systems. (A related discussion is here: Using the apriori algorithm for recommendations.)
Check out Chapter 9, Recommendation Systems, of the book Mining of Massive Datasets, linked below. It's a good tutorial to start with.
http://infolab.stanford.edu/~ullman/mmds.html
Basically you have two different approaches: Content-based and collaborative filtering. The latter can be done in terms of item-based or user-based approach. There are also methods to combine the approaches to get better recommendations.
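To make the item-based flavor of collaborative filtering concrete, here is a minimal sketch: compute cosine similarity between items from the ratings they received, then score each unseen item for a user as a similarity-weighted average of that user's own ratings. The users, flavors, and ratings are invented.

```python
import math

# Hypothetical user -> {flavor: rating}
RATINGS = {
    "ann": {"vanilla": 5, "chocolate": 4, "mint": 1},
    "bob": {"vanilla": 4, "chocolate": 5},
    "cat": {"mint": 5, "lemon": 4},
    "dan": {"vanilla": 5, "lemon": 1},
}

def item_vector(item):
    """Ratings an item received, keyed by user."""
    return {user: r[item] for user, r in RATINGS.items() if item in r}

def cosine(a, b):
    dot = sum(v * b[u] for u, v in a.items() if u in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user):
    """Rank unseen items by similarity-weighted average of the user's ratings."""
    seen = RATINGS[user]
    all_items = {item for r in RATINGS.values() for item in r}
    scores = {}
    for candidate in all_items - set(seen):
        sims = {i: cosine(item_vector(candidate), item_vector(i)) for i in seen}
        total = sum(sims.values())
        if total > 0:
            scores[candidate] = sum(sims[i] * seen[i] for i in seen) / total
    return sorted(scores, key=scores.get, reverse=True)

ranking = recommend("dan")
```

The user-based variant is symmetric: compare users by the items they rated and average the ratings of similar users instead.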
Some further readings that might be useful:
A recent survey paper on recommendation systems:
http://arxiv.org/abs/1006.5278
Amazon item-to-item collaborative filtering: http://www.cs.umd.edu/~samir/498/Amazon-Recommendations.pdf
Matrix factorization techniques: http://research.yahoo4.akadns.net/files/ieeecomputer.pdf
Netflix challenge: http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/
Google news personalization: http://videolectures.net/google_datar_gnp/
Some related stackoverflow topics:
How to create my own recommendation engine?
Where can I learn about recommendation systems?
How do I adapt my recommendation engine to cold starts?
Web page recommender system
I’m reading towards an M.Sc. in Computer Science and have just completed the first year of the course. (This is a two-year course.) Soon I have to submit a proposal for the M.Sc. project. I have selected the following topic.
“Suitability of machine learning for document ranking in information retrieval system”. Researchers have been using various machine learning algorithms for ranking documents. So as the first phase of the project I will be doing a complete literature survey and finding out advantages/disadvantages of current approaches. In the second phase of the project I will be proposing a new (modified) algorithm in order to overcome the limitations of current approaches.
Actually, my question is whether this type of project is suitable as an M.Sc. project. Moreover, if somebody has some interesting ideas in the information retrieval field, would it be possible to share them with me?
Thanks
Ranking is always the hardest part of any Information Retrieval system. I think it is a very good topic, but you have to take care to define the scope of the work as soon as possible. You will probably not be able to develop a new IR engine, but you could build a prototype based on, e.g., Apache Lucene.
Currently there are a lot of datasets, including the Stack Overflow data dump, that provide all the information you need to define a rich feature vector (number of points, time, topics mined from previous questions, popularity of a tag, etc.) for your machine learning ranking algorithm. In this part of the work you could, e.g., classify types of features (e.g., user-specific features, semantic features such as a software name in the title) and perform a series of experiments to learn which features are most important for a given dataset and which are not.
The second direction of such a project could be how to perform learning efficiently. The reasons behind this are the quantity of data on the web or in community forums and the changes in the forum (this would be important if you use community-specific features), e.g., changes in technologies, new software releases, etc.
There are many other topics related to search and machine learning. The best idea is to search scholar.google.com for recent survey papers on ranking, machine learning, and search to learn what the state of the art is. The very next step would be to talk with your M.Sc. supervisor.
Good luck!
Everything you said is good and should be done, but you forgot the most important part:
Prove that your algorithm is better and/or faster than other algorithms, with good experiments and maybe some statistics (p-value, confidence interval).
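As a sketch of what such a statistical comparison could look like, the code below runs an exact paired sign-flip (permutation) test on per-query scores of two rankers. The scores are made up, and the choice of this particular test is an assumption; a paired t-test or a bootstrap confidence interval would be other options.

```python
from itertools import product

def paired_sign_flip_test(baseline, candidate):
    """Exact two-sided p-value for the mean per-query difference.

    Under the null hypothesis each difference is equally likely to have
    either sign, so all 2**n sign assignments are enumerated.
    """
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Hypothetical per-query ranking quality (e.g. NDCG) on the same 8 queries.
baseline_scores = [0.61, 0.55, 0.70, 0.48, 0.66, 0.59, 0.52, 0.63]
new_scores = [0.66, 0.58, 0.74, 0.55, 0.69, 0.65, 0.56, 0.70]

p_value = paired_sign_flip_test(baseline_scores, new_scores)
```

Exhaustive enumeration is only practical for a small number of queries; for larger test sets the same idea works with randomly sampled sign assignments.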
If you do that and convince people that your algorithm is useful you surely will not fail :)