Using Artificial Intelligence (AI) to predict Stock Prices - f#

Given a set of data very similar to the Motley Fool CAPS system, where individual users enter BUY and SELL recommendations on various equities. What I would like to do is show each recommendation and I guess some how rate (1-5) as to whether it was good predictor<5> (ie. correlation coefficient = 1) of the future stock price (or eps or whatever) or a horrible predictor (ie. correlation coefficient = -1) or somewhere in between.
Each recommendation is tagged to a particular user, so that can be tracked over time. I can also track market direction (bullish / bearish) based off of something like sp500 price. The components I think that would make sense in the model would be:
user
direction (long/short)
market direction
sector of stock
The thought is that some users are better in bull markets than bear (and vice versa), and some are better at shorts than longs- and then a combination the above. I can automatically tag the market direction and sector (based off the market at the time and the equity being recommended).
The thought is that I could present a series of screens and allow me to rank each individual recommendation by displaying available data absolute, market and sector out performance for a specific time period out. I would follow a detailed list for ranking the stocks so that the ranking is as objective as possible. My assumption is that a single user is right no more than 57% of the time - but who knows.
I could load the system and say "Lets rank the recommendation as a predictor of stock value 90 days forward"; and that would represent a very explicit set of rankings.
NOW here is the crux - I want to create some sort of machine learning algorithm that can identify patterns over a series of time so that as recommendations stream into the application we maintain a ranking of that stock (ie. similar to correlation coefficient) as to the likelihood of that recommendation (in addition to the past series of recommendations ) will affect the price.
Now here is the super crux. I have never taken an AI class / read an AI book / never mind specific to machine learning. So I cam looking for guidance - sample or description of a similar system I could adapt. Place to look for info or any general help. Or even push me in the right direction to get started...
My hope is to implement this with F# and be able to impress my friends with a new skill set in F# with an implementation of machine learning and potentially something (application / source) I can include in a tech portfolio or blog space;
Thank you for any advice in advance.

I have an MBA, and teach data mining at a top grad school.
The term project this year was to predict stock price movements automatically from news reports. One team had 70% accuracy, on a reasonably small sample, which ain't bad.
Regarding your question, a lot of companies have made a lot of money on pair trading (find a pair of assets that normally correlate, and buy/sell pair when they diverge). See the writings of Ed Thorpe, of Beat the Dealer. He's accessible and kinda funny, if not curmudgeonly. He ran a good hedge fund for a long time.
There is probably some room in using data mining to predict companies that will default (be unable to make debt payments) and shorting† them, and use the proceeds to buy shares in companies less likely to default. Look into survival analysis. Search Google Scholar for "predict distress" etc in finance journals.
Also, predicting companies that will lose value after an IPO (and shorting them. edit: Facebook!). There are known biases, in academic literature, that can be exploited.
Also, look into capital structure arbitrage. This is when the value of the stocks in a company suggest one valuation, but the value of the bonds or options suggest another value. Buy the cheap asset, short the expensive one.
Techniques include survival analysis, sequence analysis (Hidden Markov Models, Conditional Random Fields, Sequential Association Rules), and classification/regression.
And for the love of God, please read Fooled By Randomness by Taleb.
† shorting a stock usually involves calling your broker (that you have a good relationship with) and borrowing some shares of a company. Then you sell them to some poor bastard. Wait a while, hopefully the price has gone down, you buy some more of the shares and give them back to your broker.

My Advice to You:
There are several Machine Learning/Artificial Intelligence (ML/AI) branches out there:
http://www-formal.stanford.edu/jmc/whatisai/node2.html
I have only tried genetic programming, but in the "learning from experience" branch you will find neural nets. GP/GA and neural nets seem to be the most commonly explored methodologies for the purpose of stock market predictions, but if you do some data mining on Predict Wall Street, you might be able to utilize a Naive Bayes classifier to do what you're interested in doing.
Spend some time learning about the various ML/AI techniques, get a small data set and try to implement some of those algorithms. Each one will have its strengths and weaknesses, so I would recommend that you try to combine them using Naive Bays classifier (or something similar).
My Experience:
I'm working on the problem for my Masters Thesis so I'll pitch my results using Genetic Programming: www.twitter.com/darwins_finches
I started live trading with real money in 09/09/09.. yes, it was a magical day! I post the GP's predictions before the market opens (i.e. the timestamps on twitter) and I also place the orders before the market opens. The profit for this period has been around 25%, we've consistently beat the Buy & Hold strategy and we're also outperforming the S&P 500 with stocks that are under-performing it.
Some Resources:
Here are some resources that you might want to look into:
Max Dama's blog: http://www.maxdama.com/search/label/Artificial%20Intelligence
My blog: http://mlai-lirik.blogspot.com/
AI Stock Market Forum: http://www.ai-stockmarketforum.com/
Weka is a data mining tool with a collection of ML/AI algorithms: http://www.cs.waikato.ac.nz/ml/weka/
The Chatter:
The general consensus amongst "financial people" is that Artificial Intelligence is a voodoo science, you can't make a computer predict stock prices and you're sure to loose your money if you try doing it. None-the-less, the same people will tell you that just about the only way to make money on the stock market is to build and improve on your own trading strategy and follow it closely.
The idea of AI algorithms is not to build Chip and let him trade for you, but to automate the process of creating strategies.
Fun Facts:
RE: monkeys can pick better than most experts
Apparently rats are pretty good too!

I understand monkeys can pick better than most experts, so why not an AI? Just make it random and call it an "advanced simian Mersenne twister AI" or something.

Much more money is made by the sellers of "money-making" systems then by the users of those systems.
Instead of trying to predict the performance of companies over which you have no control, form a company yourself and fill some need by offering a product or service (yes, your product might be a stock-predicting program, but something a little less theoretical is probably a better idea). Work hard, and your company's own value will rise much quicker than any gambling you'd do on stocks. You'll also have plenty of opportunities to apply programming skills to the myriad of internal requirements your own company will have.

If you want to go down this long, dark, lonesome road of trying to pick stocks you may want to look into data mining techniques using advanced data mining software such as SPSS or SAS or one of the dozen others.
You'll probably want to use a combination or technical indicators and fundamental data. The data will more than likely be highly correlated so a feature reduction technique such as PCA will be needed to reduce the number of features.
Also keep in mind your data will constantly have to be updated, trimmed, shuffled around because market conditions will constantly be changing.
I've done research with this for a grad level class and basically I was somewhat successful at picking whether a stock would go up or down the next day but the number of stocks in my data set was fairly small (200) and it was over a very short time frame with consistent market conditions.
What I'm trying to say is what you want to code has been done in very advanced ways in software that already exists. You should be able to input your data into one of these programs and using either regression, or decision trees or clustering be able to do what you want to do.

I have been thinking of this for a few months.
I am thinking about Random Matrix Theory/Wigner's distribution.
I am also thinking of Kohonen self-learning maps.

These comments on speculation and past performance apply to you as well.

I recently completed my masters thesis on deep learning and stock price forecasting. Basically, the current approach seems to be LSTM and other deep learning models. There are also 10-12 technical indicators (TIs) based on moving average that have been shown to be highly predictive for stock prices, especially indexes such as SP500, NASDAQ, DJI, etc. In fact, there are libraries such as pandas_ta for computing various TIs.

I represent a group of academics that are trying to predict stocks in a general form that can also be applied to anything, even the rating of content.
Our algorithm, which we describe as truth seeking, works as follows.
Basically each participant has their own credence rating. This means that the higher your credence or credibility, then the more their vote counts. Credence is worked out by how close to the weighted credence each vote is. It's like you get a better credence value the closer you get to the average vote that has already been adjusted for credence.
For example, let's say that everyone is predicting that a stock's value will be at value X in 30 day's time (a future's option). People who predict on the average get a better credence. The key here is that the individual doesn't know what the average is, only the system. The system is tweaked further by weighting the guesses so that the target spot that generates the best credence is those votes that are already endowed with more credence. So the smartest people (historically accurate) project the sweet spot that will be used for further defining who gets more credence.
The system can be improved too to adjust over time. For example, when you find out the actual value, those people who guessed it can be rewarded with a higher credence. In cases where you can't know the future outcome, you can still account if the average weighted credence changes in the future. People can be rewarded even more if they spotted the trend early. The point is we don't need to even know the outcome in the future, just the fact that the weighted rating changed in the future is enough to reward people who betted early on the sweet spot.
Such a system can be used to rate anything from stock prices, currency exchange rates or even content itself.
One such implementation asks people to vote with two parameters. One is their actual vote and the other is an assurity percentage, which basically means how much a particular participant is assured or confident of their vote. In this way, a person with a high credence does not need to risk downgrading their credence when they are not sure of their bet, but at the same time, the bet can be incorporated, it just won't sway the sweet spot as much if a low assurity is used. In the same vein, if the guess is directly on the sweet spot, with a low assurity, they won't gain the benefits as they would have if they had used a high assurity.

Related

polynomial regression for payroll examination

I serve as internal auditor in few clients ,one of my client has thousands of employees in different location, most of them in the head office, the client looks for corporate control for the salary monitoring
is it make sense to use the regression method in order to find outliers,
potential parameter can be -years of experience, gender, level/rank etc
I planned to go over all the monthly payroll and look for significant outliers ,because of the differences between the global location, it might be a good idea to focus only in the head office
the idea is to train the model for previous months average and test it for the current month
what do you think is too much effort or theoretical ? or can have a good chance to bring value ?
thank you
This answers your question regarding the regression method to use. It makes sense to only use data from the head office, as adding data from different geographies will require you to add more data around general demographics, which you can avoid for a proof of concept.
Coming to the problem itself, you'll need to provide a better explanation of how you're defining outliers. Are you looking for mistakes in payroll? Or are you looking for people who make significantly more/less than their peers? You'll only be able to decide on a modelling framework once you get clarity the basic definitions.
Also, you might want to consider statistical significance tests like Grubbs test (more information on tests here) first, before moving to machine learning approaches. They're easier to set up and explain to non-practitioners.

What model to use for forecasting when customers will arrive?

I'm trying to build a model which will give the probability of every customer in a database will show up on a certain day (i.e. I pass in 8/25/19 and the list of all customers shows up with their respective probability). I have the logs for all customers transactions and the date. I'm thinking of using some sort of RNN to do this. Is this the proper way to do this? If not, what is the best way to do it? I want to discover the patterns and high confidence leads for which customers show up. There is around 400,000 records for 3 years.
You have time series data.
RNN is a good starting point. Check out this step-by-step instructions of sales prediction. RNN is an easy start and might give you really good quality. Also there is an adaptation of xgboost algorithm for time series that also gives a good quality, but might be slower.
Good luck!

Advice on classifying users in machine learning scenario

I'm looking for some advice in the problem of classifying users into various groups based on there answers to a sign up process.
The idea is that these classifications will group people with similar travel habits, i.e. adventurous, relaxing, foodie etc. This shouldn't be a classification known to the user, so isn't as simple as just asking what sort of holidays they like ( The point is to remove user bias/not really knowing where to place yourself).
The way I see it working is asking questions such as apps they use, accounts they interact with on social media (gopro, restaurants etc) , giving some scenarios and asking which sounds best, these would be chosen from a set provided to them, hence we have control over the variables. The main problem I have is how to get numerical values associated to each of these.
I've looked into various Machine learning algorithms and have realised this is most likely a clustering problem but I cant seem to figure out how to use this style of question to assign a value to each dimension that will actually give a useful categorisation.
Another question I have is whether there is some resources where I could find information on the sort of questions to ask users to gain information that'd allow classification like this.
The sort of process I envision is one similar to https://www.thread.com/signup/introduction if anyone is familiar with it.
Any advice welcomed.
The problem you have at hand is that you want to calculate a similarity measure based on categorical variables, which is the choice of their apps, accounts etc. Unless you measure the similarity of these apps with respect to an attribute such as how foodie is the app, it would be a hard problem to specify. Also, you would need to know all the possible states a categorical variable can assume to create a similarity measure like this.
If the final objective is to recommend something that similar people (based on app selection or social media account selection) have liked or enjoyed, you should look into collaborative filtering.
If your feature space is well defined and static (known apps, known accounts, limited set with few missing values) then look into content based recommendation systems, something as simple as Market Basket Analysis can give you a reasonable working model.
Else if you really want to model the system with a bunch of features that can assume random states, this could be done with multivariate probabilistic models, if the structure (relationships and influences between features) is well defined, you could benefit from Probabilistic Graphical Models, such as Bayesian Networks.
You really do need to define your problem better before you start solving it though.
You can use prime numbers. If each choice on the list of all possible choices is assigned a different prime, and the user's selection is saved as a product, then you will always know if the user has made a particular choice if the modulo of selection/choice is 0. Beauty of prime numbers, voila!

Applying AI, recommendation or machine learning techniques to search feature

I'm new to the area of AI, machine learning, recommendation engines and data mining however would like to find a way to get into the area.
I'm working on an conference room booking application which will recommend meeting rooms to employees at which it calculates to be the most suitable time and location. The recommendations are based on criteria which an employee will enter before submitting a search. The criteria can include meeting attendees (which can be in different locations and timezones), room capacity (based on attendees) and types of equipment required.
The recommendation engine will take into consideration timezones and locations and recommend one or more meetings rooms , depending on whether employees are in different builings/geo-graphical regions.
Can anyone recommend recommendation engine, machine learning or AI techniques which i could apply to solving the solution? I'm new to this area so all suggestions are greatly appreciated.
This looks more like an optimization problem. You have some hard constraints and some preferences. Look at Linear Programming. Also google Constraint based Scheduling, there are several tutorials.
Just a warning: This is in general an NP-hard problem, so unless you are trying to solve it for a small number participants, you will need to use some heuristics and approximations. If you want to go a little bit overboard, there is a coursera class on optimization running right now.

How does Collective Intelligence beat Experts' view? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
I am interested in doing some Collective Intelligence programming, but wonder how it can work?
It is said to be able to give accurate predictions: the O'Reilly Programming Collective Intelligence book, for example, says a collection of traders' action actually can predict future prices (such as corn) better than an expert can.
Now we also know in statistics class that, if it is a room of 40 students taking exam, there will be 3 to 5 students who will get an "A" grade. There might be 8 that get "B", and 17 that got "C", and so on. That is, basically, a bell curve.
So from these two standpoints, how can a collection of "B" and "C" answers give a better prediction than the answer that got an "A"?
Note that the corn price, for example, is the accurate price factoring in weather, demand of food companies using corn, etc, rather than "self fulfilling prophecy" (more people buy the corn futures and price goes up and more people buy the futures again). It is actually predicting the supply and demand accurately to give out an accurate price in the future.
How is it possible?
Update: can we say Collective Intelligence won't work in stock market euphoria and panic?
The Wisdom of Crowds wiki page offers a good explanation.
In short, you don't always get good answers. There needs to be a few conditions for it to occur.
Well, you might want to think of the following "model" for a guess:
guess = right answer + error
If we ask a lot of people a question, we'll get lots of different guesses. But if, for some reason, the distribution of errors is symmetric around zero (actually it just has to have zero mean) then the average of the guesses will be a pretty good predictor of the right answer.
Note that the guesses don't necessarily have to be good -- i.e., the errors could indeed be large (grade B or C, rather than A) as long as there are grade B and C answers distributed on both sides of the right answer.
Of course, there are cases where this is a terrible model for our guesses, so collective intelligence won't always work...
Crowd Wisdom techniques, like prediction markets, work well in some situations, and poorly in others, just as other approaches (experts, for instance) have their strengths and weaknesses. The optimal arenas therefore, are ones where no other approaches do very well, and prediction markets can do well. Some examples include predicting public elections, estimating project completion dates, and predicting the prevalence of epidemics. These are areas where information is spread around sparsely, and experts haven't found effective models that reliably predict.
The general idea is that market participants make up for one another's weaknesses. The expectation isn't that the markets will always predict every outcome correctly, but that, due to people noticing other people's mistakes, they won't miss crucial information as often, and that over the long haul, they'll do better. In cases where the exerts actually know the answer, they'll be able to influence the outcome. Different experts can weigh in on different questions, so each has more influence where they have the most knowledge. And as markets continue over time, each participant gets feedback from their gains and losses that makes them better informed about which kinds of questions they actually understand and which ones they should stay away from.
In a classroom, people are often graded on a curve, so the distribution of grades doesn't tell you much about how good the answers were. Prediction markets calibrate all the answers against actual outcomes. This public record of successes and failures does a lot to reinforce the mechanism, and is missing in most other approaches to forecasting.
Collective intelligence is really good at coming up to to problems that have complex behavior behind them, because they are able to take multiple sources of opinions/attributes to determine the end result. With a setup like this, training helps to optimize the end result of the processes.
The fault is in your analogy, both opinions are not equal. Traders predict direct profit for their transaction (the little part of the market they have overview over), while the expert tries to predict the overall field.
IOW the overall traders position is pieced together like a jigsaw puzzle based on a large amount of small opinions for their respective piece of the pie (where they are assumed to be experts).
A single mind can't process that kind of detail, which is why the overall position MIGHT overshadow the real expert. Note that this is particularly phenomon is usually restricted to a fairly static market, not in periods of turmoil. Expert usually do better then, since they are often better trained and motivated to avoid going with general sentiment. (which is often comparable to that of a lemming in times of turmoil)
The problem with the class analogy is that the grading system doesn't assume that the students are masters in their (difficult to predict) terrain, so it is not comparable.
P.s. note that the base axiom depends on all players being experts in a small piece of the field. One can debate if this requirement actually transports well to a web 2 environment.

Resources