I want to plot a graph of credit vs. debit (amount credited and amount debited) from a bank transaction dataset, but I can't find a suitable dataset.
Can anyone point me to one?
Google recently launched its own dataset search platform, Google Dataset Search, where you can search for the dataset you need. It indexes a large collection of datasets, so you will probably find one that meets your requirements.
Here is one I found for you. I hope it helps.
https://toolbox.google.com/datasetsearch/search?query=Bank%20transactions&docid=sv20cU0dOX3wrVUWAAAAAA%3D%3D
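Once you have a dataset, a rough sketch of the aggregation step could look like the following. The column names (date, type, amount) and the sample rows are made up for illustration; a real dataset will almost certainly name things differently.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; real column names and formats will differ.
SAMPLE_CSV = """date,type,amount
2019-01-05,credit,1200.00
2019-01-17,debit,300.50
2019-02-02,debit,150.00
2019-02-20,credit,1200.00
2019-02-25,debit,75.25
"""

def monthly_totals(csv_text):
    """Sum credited and debited amounts per calendar month (YYYY-MM)."""
    totals = defaultdict(lambda: {"credit": 0.0, "debit": 0.0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        month = row["date"][:7]  # "2019-01-05" -> "2019-01"
        totals[month][row["type"]] += float(row["amount"])
    return dict(totals)

totals = monthly_totals(SAMPLE_CSV)
for month in sorted(totals):
    print(month, totals[month])
```

The per-month credit and debit totals can then be passed to whatever plotting tool you prefer, for example as two bars per month.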
Related
I’m working on a personal Data Science project where I try to flag bots on Instagram. I have already collected public data on 80k users, labelled ~4k of them, and managed to label 3k more thanks to similarities (e.g. same comment, same profile pic, same scammy website in bio, etc.). This last step got me more bots but also changed the ratio of bots to legitimate users in the labelled dataset.
I have heard about semi-supervised learning, but I’m still very new to Data Science and this is my first ML project, so I don’t feel confident about using it.
What are my different options? Can I balance the labelled data and stop labelling after a point? Should I label everything? What would you advise me?
I have a dataset that contains email interactions within a large user group, i.e. which user sends an email to which other users. The most important columns are sender_id, receiver_id, time, etc. I want to suggest receiver_id values for a given sender. I have already solved this using graph theory concepts and now want to try a machine learning solution as a learning exercise.
I need some help and ideas for this particular problem:
What would be a machine learning approach to suggesting multiple receiver IDs (at most 5 to 10 users) based on previous interactions?
Also, how should this problem be framed: as regression or as classification? I'm confused!
As I understand it, this problem is closely related to email recipient recommendation; please share some good papers on that topic. I'm not sure how to apply collaborative filtering here, and since I have no access to the email body, content-based approaches are not possible. Please correct me if I'm wrong.
It depends on your training set. If you have enough features for predicting the receiver and enough data, you could use multi-class classification. But since I assume there are too many receivers for that, clustering would be a better option. You can create clusters from your features and recommend recipients from among the users in the same cluster. For example, this paper uses that approach.
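If you want to try the collaborative filtering idea from the question without any email content, here is a minimal sketch. To be clear, this is a simple neighbour-based scheme that I am swapping in for illustration, not the clustering method from the paper. It assumes the log has been reduced to (sender_id, receiver_id) pairs and it ignores the time column. The function names are made up.

```python
from collections import Counter, defaultdict

def build_counts(log):
    """Map each sender to a Counter of how often they emailed each receiver."""
    counts = defaultdict(Counter)
    for sender, receiver in log:
        counts[sender][receiver] += 1
    return counts

def jaccard(a, b):
    """Overlap between two receiver sets: 0 (disjoint) to 1 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def suggest(counts, sender, k=5):
    """Rank candidate receivers for `sender`, best first, at most k results."""
    own = counts.get(sender, Counter())
    scores = Counter(own)  # start from the sender's own history
    for other, theirs in counts.items():
        if other == sender:
            continue
        sim = jaccard(set(own), set(theirs))
        if sim == 0:
            continue  # senders with no shared receivers contribute nothing
        for receiver, n in theirs.items():
            if receiver != sender:
                scores[receiver] += sim * n
    return [receiver for receiver, _ in scores.most_common(k)]

log = [("a", "x"), ("a", "x"), ("a", "y"),
       ("b", "x"), ("b", "y"), ("b", "z"),
       ("c", "w")]
print(suggest(build_counts(log), "a", k=3))
```

Framed this way, the task is neither plain regression nor plain classification but a top-k ranking problem: every candidate receiver gets a score and you return the best 5 to 10.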
I'm trying to build a model that gives the probability that each customer in a database will show up on a certain day (i.e. I pass in 8/25/19 and get back the list of all customers with their respective probabilities). I have the logs of all customer transactions with their dates. I'm thinking of using some sort of RNN for this. Is that the proper way to do it? If not, what is the best way? I want to discover the patterns and the high-confidence leads for which customers show up. There are around 400,000 records covering 3 years.
You have time series data.
An RNN is a good starting point. Check out these step-by-step instructions for sales prediction. An RNN is easy to start with and may give you good results. There is also an adaptation of the xgboost algorithm for time series that can give good results too, but it might be slower.
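If you go the tree-based route, the model needs one row of features per customer and target date rather than a raw sequence. Here is a rough sketch of that feature step, using only the visit dates. The feature names are my own invention and are only a starting point.

```python
from datetime import date, timedelta

def visit_features(visit_dates, target):
    """Features describing one customer's history strictly before `target`."""
    past = sorted(d for d in visit_dates if d < target)
    if not past:
        return None  # no history before the target date
    window_start = target - timedelta(days=30)
    same_weekday = sum(1 for d in past if d.weekday() == target.weekday())
    return {
        "days_since_last_visit": (target - past[-1]).days,
        "visits_last_30_days": sum(1 for d in past if d >= window_start),
        "same_weekday_share": same_weekday / len(past),
        "target_weekday": target.weekday(),  # Monday == 0, Sunday == 6
    }

def showed_up(visit_dates, target):
    """Training label: 1 if the customer has a transaction on `target`."""
    return int(target in set(visit_dates))

visits = [date(2019, 7, 1), date(2019, 8, 4), date(2019, 8, 11), date(2019, 8, 18)]
print(visit_features(visits, date(2019, 8, 25)))
```

For training, you would build these rows for past dates where the label is already known, then ask the classifier for a probability on the date you care about.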
Good luck!
I am looking for a solution that can automatically approve or reject supplier invoices based on historical data.
Let's say I get an invoice from an HP laptop supplier and, based on previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction from the historical data already available, using artificial intelligence, machine learning, or any other cloud service.
This isn't really a specific question, but you can start by looking into the various classification methods. There is a huge amount of material available online. Try reading about K-Nearest Neighbors, Naive Bayes, K-means (which is a clustering method rather than a classifier), etc. to get an idea of how machine learning algorithms work. Once you understand what is written in the documentation, start implementing them. You will run into a lot of problems, which you can search for online, and I'm sure you will find most of them answered here on this site.
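To give a feel for what such a classifier does, here is a tiny K-Nearest Neighbors sketch for the approve/reject case. The two features and all the numbers are invented and assumed to be pre-scaled; real invoices would need proper feature engineering first.

```python
import math
from collections import Counter

def knn_predict(history, query, k=3):
    """Return the majority label among the k past invoices closest to `query`."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled features per invoice:
# (invoice amount in thousands, price deviation from the agreed contract)
history = [
    ((1.0, 0.0), "approve"),
    ((1.2, 0.1), "approve"),
    ((0.9, 0.2), "approve"),
    ((5.0, 3.0), "reject"),
    ((4.5, 2.5), "reject"),
    ((6.0, 3.5), "reject"),
]

print(knn_predict(history, (1.1, 0.1)))
print(knn_predict(history, (5.2, 3.1)))
```

A new invoice simply gets the label that most of its closest historical invoices received, so the quality depends heavily on which features you choose and how you scale them.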
I'm a novice in ML. I'm short on time and need to choose an algorithm for the following task:
Travelers visit my website. I have them fill in a form, so I have all the necessary signals (attributes): whether they have booked a flight or not, whether the email is genuine or not, whether a phone number is given or not, whether the trip date is fixed, and whether the destination is fixed or not.
But I also have many visitors who don't fill in the form completely or who just use a fake phone number.
To reiterate, I have a lot of signals available, and I need to filter out the travelers who are certain to travel so that I can contact them personally. I also need a score on a scale of 10.
Which ML algorithm is best suited for this job, and why?
I have previously worked with WEKA.
You'll need to create an ensemble model (a combination of several different algorithms).
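As a very small illustration of the ensemble idea, here is a bagging sketch: many one-feature models, each trained on a random resample of the data, with their probabilities averaged and scaled to 0-10. The training rows and labels are invented, and a real ensemble would normally mix stronger base models than these one-feature "stumps".

```python
import random

def train_stump(rows, labels, feature):
    """Estimate P(label = 1 | feature value) for a binary feature, add-one smoothed."""
    probs = {}
    for value in (0, 1):
        matching = [y for x, y in zip(rows, labels) if x[feature] == value]
        probs[value] = (sum(matching) + 1) / (len(matching) + 2)
    return feature, probs

def train_ensemble(rows, labels, n_models=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        feature = rng.randrange(n_features)
        models.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return models

def score_out_of_10(models, row):
    """Average the stumps' probabilities and scale to a 0-10 score."""
    avg = sum(probs[row[feature]] for feature, probs in models) / len(models)
    return round(10 * avg, 1)

# Hypothetical signals per visitor:
# (flight_booked, email_genuine, phone_given, date_fixed, destination_fixed)
rows = [(1, 1, 1, 1, 1), (1, 1, 0, 1, 1), (1, 0, 1, 1, 1), (0, 1, 1, 0, 1),
        (0, 0, 0, 0, 0), (0, 1, 0, 0, 0), (0, 0, 1, 0, 1), (1, 0, 0, 0, 0)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = the visitor actually travelled

models = train_ensemble(rows, labels)
print(score_out_of_10(models, (1, 1, 1, 1, 1)))
print(score_out_of_10(models, (0, 0, 0, 0, 0)))
```

Visitors with more positive signals should end up with a higher score, and you can pick a cut-off above which you contact them personally.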