Looking for training data for music accompaniment [closed] - machine-learning

I am building a system that uses machine learning to generate an accompanying melody in real time as a leading melody is being played. It uses a type of recurrent neural network, and at every step it tries to predict the next note on the accompanying track. At this point I am satisfied with just working with MIDI files.
I am having serious trouble finding training data. My original idea was to just download MIDI files from sites such as mididb and convert them to CSV, but the problem is that it's hard to come up with a way to distinguish the leading melody from the accompanying melody. Sometimes this is possible, but I would also prefer the accompanying tracks to always come from the same (or similar) instrument(s), because different instruments are used differently (the duration and pitch of notes vary a lot from one instrument to another), and that would just confuse the network.
I found the Bach Chorales in the UCI Machine Learning Repository. The problem with this dataset, though, is that it only has melodies with one voice. I want datasets with two voices, one of which is the leading melody and the other the accompanying melody.
I understand that this is difficult, so any advice on how to approach the problem would be very much appreciated. I have tools that convert MIDI files to CSV format, and if you can think of certain types/genres of songs for which it would be easy to distinguish leading from accompanying tracks (programmatically or manually), please let me know.
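One heuristic that sometimes works for the "distinguish programmatically" part: treat the non-drum track with the highest average pitch as the lead and the rest as accompaniment. A minimal sketch, assuming the pretty_midi package is available (the file name is a placeholder):

```python
# Rough heuristic, not a robust solution: the non-drum track with the
# highest mean pitch is assumed to be the lead melody.
import pretty_midi

pm = pretty_midi.PrettyMIDI("song.mid")  # placeholder path
melodic = [inst for inst in pm.instruments if not inst.is_drum and inst.notes]

def mean_pitch(inst):
    return sum(note.pitch for note in inst.notes) / len(inst.notes)

lead = max(melodic, key=mean_pitch)
accompaniment = [inst for inst in melodic if inst is not lead]

print("lead:", pretty_midi.program_to_instrument_name(lead.program))
for inst in accompaniment:
    print("accompaniment:", pretty_midi.program_to_instrument_name(inst.program))
```

This will misfire on pieces where the lead sits below the accompaniment, so it is at best a way to bootstrap a dataset you then check manually.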

Exciting topic! There aren't many other databases out there for data mining beyond the set you mentioned, so you'll need to get a bit creative.
Have you read Jürgen Schmidhuber's approach to music composition using LSTM recurrent neural networks? If not, you should definitely do so:
A First Look at Music Composition using LSTM Recurrent Neural Networks
Finding Temporal Structure in Music: Blues Improvisation with LSTM Recurrent Networks
You can browse through his work on his site.
The authors of the first paper created their own dataset, so you might try asking them for it. The training set for the second paper can be found on the study's webpage.
The best approach, I think, is to generate your own dataset:
1) Note that they used sheet music (PDF) and audio (not only MIDI but also wav/mp3), so you might want to think about extracting chords from wav files and labeling them with possible melody harmonies manually.
2) You can search directly for individual scores instead of mining datasets, e.g. on www.free-scores.com. You can edit the scores, import them into Sibelius or Finale, and convert them to MIDI there. The easiest route is to find scores written in Sibelius/Finale itself, so you can export them to MIDI right away.
Edit:
One more comment on your chord/melody structure: keep it simple at the beginning. Try to maintain a format like the one in the "A First Look at..." paper: melody plus chord structure, no instruments. Once that is working, try to reach the same results by building this representation from multiple-instrument scores. If that works, try to build the multiple-instrument scores from MIDI. If that works, start with real audio files.

Related

Use machine learning to analyze sports bets [closed]

Let's say I have a database with over 1 million bets (all kinds of sports) made by a couple of thousand users over a period of 2 years (and still growing).
These data are just lying around doing nothing, so I thought it might be possible to use something like https://www.tensorflow.org/, do a bit of tinkering, and have it analyze all the bets in the database and learn some patterns from them: what's good and what's not.
The point is that we don't have the resources to employ dozens of people for who knows how long to write complicated software from the ground up. So I was thinking we could use some module from TensorFlow and go from there.
I would then feed the network the open bets currently in the system (bets on matches that are about to be played), and it would pick what I should bet on; for example, there is a 90% chance this bet will win, because 10 very successful players made this bet and they have a very high success rate when betting on this particular sport.
We have lots of experienced users who make a lot of money from betting. The system could be trained on the data we have, and then it would know, for example, that if user A bets on a given league/team, it's very likely he will win.
The question is: where do we go from here? Can anybody point us in the right direction? Or is this just too difficult for 2 people to do in a few months? Can we use some pre-built solution like TensorFlow?
Without having a look at the data it is impossible to suggest what direction your next steps should take, but in any case your first step should be to explore your data thoroughly, create a model on a small subset of it, and test your hypothesis.
Overall, you can try to:
- Load and clean the data with Python or R.
- Take a random subset of the data (some 10,000 rows) and build a simple model using an SVM or a random forest; with a win/lose label this looks like a classification problem (see the sketch below).
- Test your results and verify your hypothesis on held-out data.
- Explore the data to see if you can generate better features.
- Design a small neural network first, and only then think about a deep neural network using TensorFlow or Keras.
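A minimal sketch of the subset-and-baseline step, assuming a CSV of historical bets; the column names (user_win_rate, odds, stake, won) are made up for illustration:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Random 10k-row subset of the (placeholder) bets table
bets = pd.read_csv("bets.csv").sample(n=10_000, random_state=0)
X = bets[["user_win_rate", "odds", "stake"]]  # hypothetical features
y = bets["won"]                               # 1 = bet won, 0 = bet lost

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

If a baseline like this barely beats chance on held-out data, better features are worth more effort than a bigger model.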
Have a look at this: https://hackernoon.com/how-to-create-your-own-machine-learning-predictive-system-in-the-nba-using-python-7189d964a371
Yes, this is possible but can be more difficult than it appears.
Consider Microsoft's Cortana, which (while only picking whether a game will be won outright, not against the spread) is only about 63% accurate; that's quite good, but not exactly the 90% you mention in your question (1).
The size of your database should be great for ANN models. It would be a very interesting project for sure!
To your question "where do I go from here", my answer is simply to explore the data in RStudio, or in a cloud service such as Microsoft's Azure ML Studio (2) or Amazon's Machine Learning services (3).
Good luck!
Ref. 1: http://www.businessinsider.com/nfl-picks-microsoft-cortana-elo-week-5-2017-10
Ref. 2: https://studio.azureml.net/
Ref. 3: https://aws.amazon.com/amazon-ai/

Modelling card game for machine learning [closed]

I'm looking for some help modelling this machine learning problem.
A hand consists of three rows (containing 3, 5, and 5 cards respectively). Your goal is to build a hand that scores the most points. You receive the cards in intervals called streets: five cards in the first street, and three in each of the next four streets (in each of those four streets you must discard one of the cards). Cards can't be moved once you place them. More details on scoring.
My goal is to build a system that, given a set of streets, plays the hand like our best players do. It seems pretty clear that I'll need to build a neural network for each street, using features based on the existing hand and the set of cards in the street. I've got plenty of data (streets, placements, and final scores), but I'm a little unsure how to model the problem, given that the possible outputs depend on the particular set of cards dealt (although there are fewer than 3^5 placements in the first street, and 3^3 afterwards). I've previously only dealt with classification problems with fixed categories.
Does anyone have an example of a similar problem, or suggestions for how to prepare the training data when the outputs are unique to each hand?
A vague question gives a vague answer (which is my excuse for being too lazy to code ;-).
You wrote that you have a lot of data, and it seems you want to map the game onto experience gained with supervised learning. But that is not how game optimization usually works. One usually does not perform supervised learning, but rather reinforcement learning. The differences are subtle, but reinforcement learning (with Markov decision processes as its theoretical basis) offers a more local view: optimize the decision given a specific state. Supervised learning corresponds more to optimizing several decisions at once.
Another showstopper for the usual supervised learning approach is that even if you have a lot of data, it will almost surely be too little, and it will not cover the "required paths".
The usual approach, at least since Tesauro's backgammon player, is rather: set up the basic rules of the game, possibly introduce human knowledge as heuristics, and then let the program play against itself as often as possible. This is how Google DeepMind built a master Go player, for example. See also this interesting video.
In your case the task should in principle not be that hard, as there is a comparatively small number of game states and, importantly, issues involving psychology, like bluffing and consistent play, are completely absent.
So again: build a bot that can play against itself. One common basis is a function Q(S, a) that assigns a value to every game state S and possible action a of the player; this is called Q-learning. The function is often implemented as a neural network, although I would think it does not need to be that sophisticated here.
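A bare-bones tabular Q-learning sketch of that idea; the state and action encodings are left as placeholders for whatever representation of the card rows and placements you choose:

```python
import random
from collections import defaultdict

Q = defaultdict(float)        # maps (state, action) -> estimated value
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def choose_action(state, actions):
    # epsilon-greedy: mostly exploit the current Q, sometimes explore
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    # Q-learning: move Q(s, a) toward reward + gamma * max_a' Q(s', a')
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```

With self-play you call choose_action during each street, then update after each move (reward nonzero only at game end, taken from the final score).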
I'll stay that vague for now. But I would be glad to assist you further if necessary.

Neural network: should the algorithm be rewritten for every case? [closed]

I have 2 sequences of numbers and I want to continue them using neural algorithms (there is some logic in them, but I don't know what it is, and there are no external factors affecting the selection). There are relationships within each of the two sequences separately, as well as between them.
So, I'm new to machine learning, but I had an idea: is there any already-written, well-working application (library) that implements the standard algorithms, so that I don't have to learn them all before using them? Something like a "most-frequently-used neural algorithms kit".
I'm thinking of analyzing some music sheets as two sequences: "notes" and "durations".
OK, based on the comments I think I understand what you want.
Generally, no, you don't need to rewrite the standard algorithms. But be aware that "ANN" is not a single algorithm but a cluster of algorithms (including backpropagation ANNs, Hopfield networks, Boltzmann machines, etc.). Among them I recommend the backpropagation ANN (BP-ANN), which is simple and suitable for your project. You would input a sequence of the known notes and durations, and expect the next note and duration as output.
To use a BP-ANN you don't need to write it yourself. Because it is a widely used algorithm, there are many toolkits and open source implementations of it:
Google "back propagation neural network implementation" and you will find them easily. There are also a few open source projects on GitHub (in both C and Matlab): https://github.com/search?q=back+propagation&type=Everything&repo=&langOverride=&start_value=1
For further reading, if you also want a deep understanding of the implementation details, read this: http://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=1279&context=ecetr&sei-redir=1
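As a minimal illustration of that input/output scheme, here is a sketch using scikit-learn's backpropagation-trained MLP rather than a hand-rolled BP-ANN; the note sequence is placeholder data standing in for your own:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder melody as MIDI pitch numbers
notes = np.array([60, 62, 64, 65, 67, 65, 64, 62, 60, 62, 64, 65, 67], float)

window = 3  # predict each note from the previous three
X = np.array([notes[i:i + window] for i in range(len(notes) - window)])
y = notes[window:]

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
model.fit(X, y)
print("next note guess:", model.predict([notes[-window:]]))
```

The same shape of setup works for the duration sequence, either as a second model or as extra input columns.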
If you're interested in neural networks there are plenty of libraries available. ANNIE is one open source example; the MATLAB Neural Network Toolbox is a commercial one. These are libraries you describe the network architecture to, and they let you train, test, verify, and so on. The important part in all of these machine learning methods is how you represent your data, and that is what the comments you were getting (for example Predictor's) were about: sometimes you get excellent results with one representation and very bad results with another.
There are also libraries for training SVMs (support vector machines, a different learning algorithm, not a way of training neural networks) with quadratic regularization; LIBSVM is one great example.
There is also plenty of work on predicting time series with neural networks, if that is what you want to do with music (I am not sure what exactly you want).
If the input is a series of (note, duration) pairs, then I suspect you'd get much further by summarizing the historical note-to-note transitions, or something similar, in an effort to capture the syntax of the music (Markov analysis, etc.) than by stuffing the series into a neural network. It may also help to represent the series as note differentials, measuring how many notes up or down the scale the new note is, rather than using the actual value of the note itself.
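A sketch of that Markov idea, counting transitions between note differentials; all values here are placeholders:

```python
from collections import Counter, defaultdict

notes = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62]  # placeholder melody
diffs = [b - a for a, b in zip(notes, notes[1:])]  # steps up/down the scale

# First-order Markov model over differentials: count what follows what
transitions = defaultdict(Counter)
for prev, nxt in zip(diffs, diffs[1:]):
    transitions[prev][nxt] += 1

# Most likely next step given the last one observed
last = diffs[-1]
print("after a step of", last, "the most common next step is",
      transitions[last].most_common(1))
```

Sampling from these counts instead of taking the argmax gives a simple generative model of the melody's local syntax.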

How to implement an artificial neural network in Delphi? [closed]

I want to have an artificial neural network:
42 input neurons
168 hidden neurons
7 output neurons
This network is to play the game of Connect Four. At the end of each game, the network gets feedback (the game result: win or not).
Learning should be done with Temporal Difference Learning.
My questions:
What values should be in my reward array?
And finally: How can I apply it to my game now?
Thank you so much in advance!
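For the reward array, one common convention (an assumption here, not something stated in the question or the answers) is a purely terminal reward: zero for every move except the last, which is +1 for a win, -1 for a loss, and 0 for a draw. A TD(0) value update over one finished game then looks roughly like:

```python
def td0_update(values, states, final_reward, alpha=0.1, gamma=1.0):
    """states: board states seen in one game, in order.
    values: dict mapping state -> estimated value V(s).
    final_reward: +1 win, -1 loss, 0 draw; all intermediate rewards are 0."""
    for t in range(len(states)):
        v = values.get(states[t], 0.0)
        if t + 1 < len(states):
            target = gamma * values.get(states[t + 1], 0.0)  # reward is 0 mid-game
        else:
            target = final_reward  # terminal state: target is the game result
        values[states[t]] = v + alpha * (target - v)
    return values
```

In a network-based version the dict lookup is replaced by the network's output for the encoded board, and the update becomes a gradient step toward the same target.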
First hit: you're assigning 0 to t in 'main', but your arrays' low bound is 1, so you're accessing a non-existent element in the loops; hence the AV (access violation).
If you had enabled range checking in the compiler options you'd get a range-check error instead, and you probably would have found the reason earlier.
BTW, since I have no idea what the code is doing, I can't spot any other errors at this time.
If you're interested in using a third-party library (free for non-commercial products), I've been very happy with some tools from this company: http://www.mitov.com/html/intelligencelab.html (although I've never used their Intelligence Lab, just the video tools).
Fast Artificial Neural Network (FANN) is a good open source library; it's been optimized and adopted by a large community, with plenty of support and Delphi bindings.
Using dependencies in this area is advised if you don't fully understand what you're doing; the smallest detail can have a big impact on how a neural network performs, so it's best to spend your time on how you apply the network rather than on reimplementing it.
Other links that may be helpful for you:
http://delphimagic.blogspot.com.ar/2012/12/red-neuronal-backpropagation.html
(Includes source code)
The sample codes a backpropagation neural network with two input neurons, two output neurons, and one hidden layer.
It provides two data sets for training the network, and you can see how accurate the learning is by watching the error being minimized in a graph.
By modifying the program you can change the number of times the network is trained with the test data (epochs).

Publicly Available Spam Filter Training Set [closed]

I'm new to machine learning, and for my first project I'd like to write a naive Bayes spam filter. I was wondering if there are any publicly available training sets of labeled spam/not spam emails, preferably in plain text and not a dump of a relational database (unless they pretty-print those?).
I know such a publicly available database exists for other kinds of text classification, specifically news article text. I just haven't been able to find the same sort of thing for emails.
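For reference, a minimal naive Bayes text classifier along these lines can be put together with scikit-learn; the messages and labels below are placeholders for a real corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder data; in practice, load message bodies and labels from a corpus
messages = ["cheap pills now", "meeting moved to 3pm", "win a free prize"]
labels = ["spam", "ham", "spam"]

# Bag-of-words counts feeding a multinomial naive Bayes model
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(messages, labels)
print(clf.predict(["free meeting pills"]))
```

The interesting work is in the corpus and preprocessing, which is what the answers below point at.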
Here is what I was looking for: http://untroubled.org/spam/
This archive contains around a gigabyte of compressed spam messages accumulated from 1998 to 2011. Now I just need non-spam email, so I'll query my own Gmail for that using the getmail program and the tutorial at mattcutts.com.
Sure, there's Spambase, which is, as far as I'm aware, the most widely cited spam dataset in the machine learning literature.
I have used this dataset many times; each time I am impressed by how much effort has been put into its formatting and documentation.
A few characteristics of the Spambase set:
- 4601 data points, all complete
- each comprised of 58 features (attributes)
- each data point is labelled 'spam' or 'no spam'
- approx. 40% are labeled spam
- all of the features are continuous (vs. discrete)
- a representative feature: the average length of a continuous sequence of capital letters
Spambase is archived in the UCI Machine Learning Repository; in addition, it's available on the website for the excellent ML/statistical learning text The Elements of Statistical Learning by Hastie et al.
SpamAssassin has a public corpus of both spam and non-spam messages, although it hasn't been updated in a few years. Read the readme.html file to learn what's there.
You might consider taking a look at the TREC spam/ham corpus (which I think is the collection of Enron emails made public through the court case). TREC generally runs a bunch of competitive text processing tasks, so it might give you some references for comparison.
The downside is that the messages are stored in raw mbox format, though parsers are available in many languages (Apache Tika is a good example).
The webpage isn't TREC, but this seems to be a good overview of the task with links to the data: http://plg.uwaterloo.ca/~gvcormac/spam/
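As one example of such a parser, Python's standard library can read mbox files directly; the file name here is a placeholder:

```python
import mailbox

# Iterate over every message in a (placeholder) mbox archive
for message in mailbox.mbox("spam.mbox"):
    print(message["subject"])
    if not message.is_multipart():
        body = message.get_payload()
        # ... tokenize body, count word frequencies, etc.
```

Multipart messages need a little more care (walking the parts), but this is enough to get plain-text bodies into a training pipeline.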
A more modern spam training set can be found on Kaggle. Moreover, you can test the accuracy of your classifier on their website by uploading your results.
I have an answer as well: here you can find a daily refreshed Bayesian database for initial training, and also a daily archive of captured spam. You will find instructions for using it on the site.
