We are currently working on detecting cyberbullying tweets using machine learning, but we are unable to find a dataset for it. Can someone please help us by sharing a dataset? Our further work depends on it.
We searched specific sites and also created a dataset ourselves, but it doesn't seem to do the job. So please help us by sharing a suitable dataset.
You may use the Offensive Language Identification Dataset here:
https://sites.google.com/site/offensevalsharedtask/olid
Related
It is my first time dealing with audio data, and I'm having a bit of trouble extracting the features of a real audio sample to pass to the model for evaluation, especially since I didn't really understand how the data was pre-processed in the paper below, and "The exact order of appearance of the features is not known",
as stated in the Attribute Information.
https://aclanthology.org/H90-1075.pdf
Any help would be appreciated.
I tried searching for "spectral coefficients; contour features, sonorant features, pre-sonorant features, and post-sonorant features" but ended up with too much information to know where to start.
Can anyone help me find a benchmark dataset for geoparsing of social media texts? I need a benchmark dataset to evaluate my algorithms. I came across the GEOPARSE TWITTER BENCHMARK DATASET from Southampton University, but I am unable to access it. Can anyone help me with this?
I am looking for solutions where I can automatically approve or disapprove different supplier invoices based on historical data.
Let's say I get an invoice from an HP laptop supplier and, based on the previous data, I have to approve or reject that invoice.
Basically, I want to make a decision or prediction from the historical data already available, using artificial intelligence, machine learning or any other cloud service.
This isn't really a specific question, but you can start by looking into various classification methods. There is a huge amount of material available online. Try reading about K-Nearest Neighbors, Naive Bayes, K-means (which is a clustering method rather than a classifier), etc. to get an idea of how algorithms in the Machine Learning domain work. Once you start understanding what is written in the documentation, start implementing them. You will face a lot of problems, which you can search for online, and I'm sure you will find most of them answered here on this portal.
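As a starting point, here is a minimal sketch of K-Nearest Neighbors applied to the approve/reject idea. The two features (invoice amount and delivery delay) and all the numbers are made up for illustration; a real system would use whatever fields your invoice history actually contains.

```python
import math
from collections import Counter


def knn_predict(history, query, k=3):
    """Majority vote among the k past invoices closest to the query
    (Euclidean distance on the feature tuples)."""
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical history:
# (amount in thousands, delivery delay in days) -> past decision
history = [
    ((1.0, 0.0), "approve"),
    ((1.2, 1.0), "approve"),
    ((0.9, 2.0), "approve"),
    ((5.0, 10.0), "reject"),
    ((4.5, 12.0), "reject"),
    ((6.0, 9.0), "reject"),
]

# A new invoice that looks like the previously approved ones.
decision = knn_predict(history, (1.1, 1.0))
print(decision)
```

In practice you would also want to scale the features so that one of them does not dominate the distance.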
I have developed an ML model for a binary (0/1) NLP classification task and deployed it in a production environment. The model's prediction is displayed to users, and the users have the option to give feedback (whether the prediction was right or wrong).
How can I continuously incorporate this feedback into my model? From a UX standpoint you don't want a user to correct/teach the system more than two or three times for a specific input; the system should learn fast, i.e. the feedback should be incorporated "fast". (Google Priority Inbox does this in a seamless way.)
How does one build this "feedback loop" through which my system can improve? I have searched a lot on the net but could not find relevant material. Any pointers will be of great help.
Please don't say "retrain the model from scratch by including the new data points". That's surely not how Google and Facebook build their smart systems.
To further explain my question: think of Google's spam detector, their Priority Inbox, or their recent "smart replies" feature. It's a well-known fact that they have the ability to learn from / incorporate user feedback fast.
All the while, the system incorporates user feedback fast (i.e. the user has to teach the system the correct output at most 2-3 times per data point before it starts giving the correct output for that data point) AND it also ensures that it retains old learnings and does not start giving wrong outputs on older data points (where it was giving the right output earlier) while incorporating the learning from the new data point.
I have not found any blog/literature/discussion on how to build such systems, i.e. something that explains in detail the "feedback loop" in ML systems.
I hope my question is a little clearer now.
Update: Some related questions I found are:
Does the SVM in sklearn support incremental (online) learning?
https://datascience.stackexchange.com/questions/1073/libraries-for-online-machine-learning
http://mlwave.com/predicting-click-through-rates-with-online-machine-learning/
https://en.wikipedia.org/wiki/Concept_drift
Update: I still don't have a concrete answer, but such a recipe does exist. Read the section "Learning from the feedback" in the following blog post: Machine Learning != Learning Machine. In it, Jean talks about "adding a feedback ingestion loop to machine". The same idea is discussed here, here, and here.
There could be a couple of ways to do this:
1) You can use the feedback that you get from the user to train only the last layer of your model, keeping the weights of all other layers intact. Intuitively, in the case of a CNN for example, this means you are extracting the features using your model but slightly adjusting the classifier to account for the peculiarities of your specific user.
2) Another way could be to have a global model (trained on your large training set) and a simple logistic regression which is user-specific. For the final prediction, you can combine the outputs of the two models. See this paper by Google on how they do it for their Priority Inbox.
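Option 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the global model is a fixed linear model with made-up weights, the user-specific part is a tiny logistic regression updated by one gradient step per feedback, and the two are combined by adding their log-odds. The class name, the learning rate and the combination rule are all choices made here for illustration, not something taken from the paper mentioned above.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class UserAdjustedModel:
    """Frozen global linear model plus a small per-user logistic regression.

    The two log-odds are added; only the user-specific part ever changes.
    """

    def __init__(self, global_weights, global_bias, lr=0.5):
        self.gw = list(global_weights)      # frozen global weights
        self.gb = global_bias               # frozen global bias
        self.uw = [0.0] * len(self.gw)      # per-user weights (learned)
        self.ub = 0.0                       # per-user bias (learned)
        self.lr = lr

    def predict_proba(self, x):
        g = self.gb + sum(w * xi for w, xi in zip(self.gw, x))
        u = self.ub + sum(w * xi for w, xi in zip(self.uw, x))
        return sigmoid(g + u)

    def predict(self, x):
        return int(self.predict_proba(x) >= 0.5)

    def feedback(self, x, true_label):
        """One SGD step on the log-loss, applied to the user part only."""
        err = true_label - self.predict_proba(x)
        self.uw = [w + self.lr * err * xi for w, xi in zip(self.uw, x)]
        self.ub += self.lr * err


# Hypothetical global model that predicts 1 for this input,
# while this particular user keeps saying the right answer is 0.
model = UserAdjustedModel(global_weights=[2.0, -1.0], global_bias=0.0)
x = [1.0, 0.0]
corrections = 0
while model.predict(x) != 0 and corrections < 10:
    model.feedback(x, true_label=0)
    corrections += 1
print(corrections, model.predict(x))
```

Because the global weights are never touched, one user's corrections cannot degrade predictions for other users; how quickly a correction takes effect is controlled by the learning rate.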
Build one or more simple, lightweight models that can be updated on each piece of feedback. Online machine learning offers a number of candidates for this.
Most good online classifiers are linear. In that case, we can have a couple of them and achieve non-linearity by combining them via a small, shallow neural net.
https://stats.stackexchange.com/questions/126546/nonlinear-dynamic-online-classification-looking-for-an-algorithm
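To make "a light model that can be updated per feedback" concrete, here is a minimal sketch of one classic online linear classifier, the perceptron, which changes its weights only when the user reports a wrong prediction. The feature vectors and labels in the stream are invented for illustration, and a single perceptron is a stand-in here, not the combined setup described above.

```python
class OnlinePerceptron:
    """Linear classifier updated one example at a time, only on mistakes."""

    def __init__(self, n_features, lr=1.0):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        score = self.b + sum(w * xi for w, xi in zip(self.w, x))
        return 1 if score > 0 else 0

    def feedback(self, x, true_label):
        """Apply one user correction; return True if the model changed."""
        pred = self.predict(x)
        if pred == true_label:
            return False
        step = self.lr * (true_label - pred)  # +lr or -lr
        self.w = [w + step * xi for w, xi in zip(self.w, x)]
        self.b += step
        return True


# Hypothetical feedback stream: (features, label confirmed by the user)
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1), ([0.1, 1.0], 0)]

model = OnlinePerceptron(n_features=2)
for _ in range(10):          # replay the stream a few times
    for x, y in stream:
        model.feedback(x, y)

print([model.predict(x) for x, _ in stream])
```

Each update costs only one pass over the feature vector, so it can run at the moment the feedback arrives instead of waiting for a full retraining job.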
After going through many tutorials on YouTube, I could not find an answer...
I have two ARFF files: one with the actual test results, where the class is numeric (0-48),
and the other with a '?' as the class.
I've used 10-fold cross-validation with REPTree and got a reasonably low error.
My problem is that I don't understand how to use Weka to apply the model built from this training set to the "unpredicted" data that I have.
The training set consists of users who answered an online survey, and the other file contains users who did not answer the survey.
Here is a screenshot of the actual setup I have.
Thank you very much!!