I have panel data consisting of monthly time series over 120 months for 45 institutions, with approximately 8 variables for each one. I want to perform a dynamic cluster analysis in order to detect stressed institutions; for instance, to check whether a stressed institution moves from one cluster to another, or whether its behavior changes so much that it no longer belongs in its own cluster.
The idea would be to use the information up to time t to cluster the institutions, so that each institution's cluster assignment can evolve as new information arrives, using all the information available up to that point from all the banks, with time-varying clusters.
My first idea was to use statistical control techniques and anomaly detection for time series, such as those in the package anomaly, but this procedure uses only each bank's own data, not the information from the other banks. It might be that the whole system is stressed, so detecting an anomaly in one bank might be due to the system rather than to that particular bank.
I also tried clustering in each period using hierarchical clustering, and it did a decent job of classifying the institutions based on my knowledge of them. However, this procedure only uses the data at each point in time, not all the data available up to that point.
My next idea was to apply clustering methods for panel data at each point in time, using the data up to that point, and to cycle through each month to obtain dynamic clusters from the whole dataset. However, I don't know whether this approach makes sense, or whether there are better methods for this kind of analysis.
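For concreteness, here is a minimal sketch of the expanding-window loop I have in mind (Python; the long-format layout, the column names 'month' and 'institution', and the per-institution mean summary are all assumptions, not a fixed design):

```python
# Sketch: expanding-window hierarchical clustering of the institutions.
# Assumes a long-format DataFrame `panel` with one row per institution per
# month and columns ['month', 'institution', <the ~8 variables>].
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

def expanding_window_clusters(panel, feature_cols, n_clusters=4):
    labels_by_month = {}
    for t in sorted(panel['month'].unique()):
        window = panel[panel['month'] <= t]                          # all data up to t
        feats = window.groupby('institution')[feature_cols].mean()   # one summary per bank
        z = (feats - feats.mean()) / feats.std().replace(0, 1.0)     # standardise
        tree = linkage(z.values, method='ward')
        labels = fcluster(tree, t=n_clusters, criterion='maxclust')
        labels_by_month[t] = pd.Series(labels, index=feats.index)
    return labels_by_month  # compare labels across months to spot migrations
```

Tracking how each institution's label changes from month to month would then be the "dynamic" part I am after.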
Thank you very much!
This is a more or less general question. In my implementation of the backpropagation algorithm, I start from some "big" learning rate and then decrease it once I see the error start to grow instead of shrinking.
I can apply this decrease either after the error has grown a bit (state A), or just before it is about to grow (state B, a kind of rollback to the previous "successful" state).
So the question is: which is better from a mathematical point of view?
Or do I need to run two parallel tests, i.e. continue learning from state A and from state B, both with the reduced learning rate, and compare which one decreases the error faster?
By the way, I didn't try the approach from the last paragraph; it only popped into my mind while I was writing this question. In the current implementation of the algorithm I continue learning from state A with the decreased learning rate, on the assumption that the decrease in learning rate is small enough that I can still move back in the previous direction toward the global minimum if I have accidentally landed in only a local minimum.
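To make the two options concrete, here is a minimal sketch of my training loop (framework-agnostic Python; `train_epoch`, `save_state` and `load_state` are hypothetical helpers, not real library calls):

```python
# Sketch of the two options: (A) decrease the learning rate only after the
# error has already grown a bit, or (B) first roll back to the last
# "successful" state and then decrease it.
def train(model, lr=0.5, decay=0.5, max_epochs=100, rollback=False):
    best_error = float('inf')
    checkpoint = save_state(model)             # hypothetical helper
    for _ in range(max_epochs):
        error = train_epoch(model, lr)         # hypothetical helper: one pass
        if error < best_error:
            best_error = error
            checkpoint = save_state(model)     # remember the last good state
        else:
            if rollback:                       # option B: undo the bad step
                load_state(model, checkpoint)  # hypothetical helper
            lr *= decay                        # both options: reduce the rate
    return model
```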
What you describe is one of a collection of techniques called learning rate scheduling. Just so you know, there are more than two techniques:
Predetermined piecewise constant learning rate
Performance scheduling (looks like the closest one to yours)
Exponential scheduling
Power scheduling
...
The exact performance of each one depends greatly on the optimizer (SGD, Momentum, NAG, RMSProp, Adam, ...) and on the data manifold (i.e. the training data and objective function), but they have been studied in the context of deep learning problems. For example, I'd recommend this paper by Andrew Senior et al., which compared various techniques on a speech recognition task. The authors concluded that exponential scheduling performed best. If you're interested in the math behind it, you should definitely take a look at their study.
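For reference, here is a minimal sketch of two of these schedules in plain Python (no particular framework; the constants are only illustrative):

```python
# Exponential scheduling: lr(t) = lr0 * 10^(-t / r)
def exponential_schedule(lr0=0.01, r=20.0):
    return lambda epoch: lr0 * 10 ** (-epoch / r)

# Performance scheduling (closest to the question): cut the learning rate
# when the validation error has stopped improving for `patience` epochs.
class PerformanceSchedule:
    def __init__(self, lr0=0.01, factor=0.5, patience=3):
        self.lr, self.factor, self.patience = lr0, factor, patience
        self.best, self.bad_epochs = float('inf'), 0

    def step(self, val_error):
        if val_error < self.best:
            self.best, self.bad_epochs = val_error, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```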
I am working on a machine learning scenario where the target variable is the duration of power outages.
The distribution of the target variable is severely right-skewed (you can imagine that most power outages occur and are over fairly quickly, but there are many, many outliers that can last much longer). A lot of these power outages become less and less 'explainable' by the data as the durations get longer. They become, more or less, 'unique outages', where events occur on site that are not necessarily 'typical' of other outages, and no data is recorded on the specifics of those events beyond what is already available for all the other 'typical' outages.
This causes a problem when creating models: the unexplainable data mingles with the explainable part and degrades the models' ability to predict.
I analyzed some percentiles to decide on a point that I considered would encompass as many outages as possible while the duration still remained mostly explainable. This was somewhere around the 320-minute mark and contained about 90% of the outages.
This was completely subjective, though, and I know there must be some kind of procedure for determining a 'best' cut-off point for this target variable. Ideally, the procedure would be robust enough to consider the trade-off of encompassing as much data as possible, and not tell me to make my cut-off two hours, thereby cutting out a significant number of customers, since the purpose is to provide an accurate estimated restoration time to as many customers as possible.
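To make the percentile analysis reproducible, here is a small pandas sketch (`durations` is assumed to be the outage duration in minutes; the candidate quantiles are arbitrary):

```python
# Sketch: tabulate candidate cut-offs as quantiles of the outage duration.
import pandas as pd

def candidate_cutoffs(durations, quantiles=(0.80, 0.85, 0.90, 0.95, 0.99)):
    durations = pd.Series(durations)
    cutoffs = [durations.quantile(q) for q in quantiles]
    return pd.DataFrame({
        'quantile': quantiles,
        'cutoff_minutes': cutoffs,
        'share_of_outages_kept': [(durations <= c).mean() for c in cutoffs],
    })
```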
FYI: the modeling methods that appear to be working best right now are random forests and conditional random forests. Methods I have used in this scenario include multiple linear regression, decision trees, random forests, and conditional random forests. MLR was by far the least effective. :(
I have exactly the same problem! I hope someone more informed brings their knowledge. I wonder to what extent a long duration is something we want to discard rather than something we want to predict!
Also, I tried log-transforming my data, and the density plot shows a funny artifact on the left side of the distribution (because I only have integer durations, not floats). I think this helps; you should also log-transform the features that have similar distributions.
I finally thought that the solution should be stratified sampling or giving weights to features, but I don't know exactly how to implement that. My attempts didn't produce any good results. Perhaps my data is too stochastic!
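Here is a minimal sketch of the log-transform idea combined with sample weights (a variation on the weighting idea above; scikit-learn, with the 320-minute threshold and the weight values purely illustrative):

```python
# Sketch: model log(1 + duration) and optionally down-weight the extreme
# outages instead of dropping them entirely.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_log_duration_model(X, duration_minutes, downweight_above=320):
    y_log = np.log1p(duration_minutes)                    # compress the right tail
    weights = np.where(duration_minutes > downweight_above, 0.2, 1.0)
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X, y_log, sample_weight=weights)
    return model   # predict with np.expm1(model.predict(X_new))
```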
How to expand spinner time?
From The Nature of Time:
The Nature of Time
We have all been asked the question
'what is the time?', but this entry
will be addressing the subtly
different question: 'what is time?'
Many people, on being asked this
question, would probably say that they
don't have the time to answer. In this
entry, we will explore many theories
about the nature of time. No one
theory has unquestioning truth about
it, so it will be up to you to decide
on the one you see is best.
The Classical Approach to Time
There is not very much to say on this
theory since it is the one with which
we are most familiar. Traditionally,
time is simply seen as a measure of
the distance between events. It has a
past, present and a future. The past
is considered to have already happened
and to be unchangeable, while the
future is considered to be open to
many possibilities. Humans measure
time using many units, some based on
real events like the rotation of the
Earth, others that are even more
arbitrary.
Isaac Newton's classical description
of time in his highly-regarded work
Principia is that it 'flows equably of
itself', which means that time 'flows'
at a constant rate that is the same
for everybody - it is independent of
the events that take place in it. It
would be untrue to say that this idea
was unchallenged until the twentieth
century - the 18th-Century empiricist
philosopher George Berkeley, for
example, disagreed with Newton and
held that time was 'the succession of
ideas in the mind' - but there was no
serious evidence to suggest that
Newton's elegant and absolute
description was wrong until Einstein
destroyed it.
Unfortunately, the classical view of
time is biased towards the human
perception of the 'flow' of time. We
see events in one direction, and we
assume time to be the same everywhere.
The classical approach to time does
not explain exactly why we perceive
time in this way, and it does not
describe how the effect is achieved.
The other theories of the nature of
time challenge the very roots of this
natural point of view.
Relativity
The Theory of Relativity is the
celebrated discovery of the physicist
Albert Einstein. Originally, it was
two theories: the Special Theory of
Relativity came first in 1905 and
states that the rate at which time
passes is not the same all over the
universe - it is dependent on the
observer (in other words, it is
relative). It is not hard to see that
different people perceive the passing
of time at a different rate to others:
as we get older, less information is
processed about our surroundings per
second, so we perceive time to be
going faster.
But Einstein's theory went further
than this. The relativity of time is
based not on our age, but on our speed
of movement through space. The faster
we travel through space, the slower we
travel through time. Although this
sounds crazy at first, it makes sense
when thought of in a particular way.
The theory of relativity demands that
we view space and time not as separate
entities but as one concept called
space-time. Time becomes a fourth
dimension, just like the other three
dimensions of space that we are used
to (height, width and length). This
view of time is crucial to
understanding most of the other
theories about time's ultimate nature.
Humans only possess two-dimensional
retinae (the light-receptive surface
at the back of our eyes), which means
that we can only see in two
dimensions. Our vision of the third
dimension is a result of perspective
and the existence of our binocular
vision. If we had three-dimensional
retinae, we would be able to see all
of an entire room simultaneously - its
walls, its floor and its ceiling at
the same time! For this reason, it is
very difficult, if not totally
impossible, for humans to visualise a
fourth dimension.
To overcome this impairment, it is
useful to use lower-dimensional
analogies when talking about
dimensions above three, even if we are
talking about time as one of these
dimensions. So in this case, let us
imagine that the universe is shaped
like a cuboid, and that humans are
two-dimensional and have
one-dimensional retinae. Imagine that
the spatial dimensions are the width
and height of a cross-section of the
cuboid, meaning that humans can move
up, down, left and right at will
within the cuboid. Imagine that the
depth of the cuboid is time.
Right, now imagine that you are a
two-dimensional human within the
cuboid and that you start off being
midway up the cuboid. Then you start
moving upward (ie, through space, but
not time). Eventually you hit the edge
of the cuboid. Now imagine that you
move downwards, but that you also move
through time in a forward direction.
This time it will take you longer to
get back to being mid-way up the
cuboid because you are not taking a
direct route downwards - you are also
moving through time. As you can see,
moving through time slows down your
movement in space.
It works the other way around too. If
you stay still in space and just move
forward in time, then it will take
less time to get to a particular point
in time than if you move upwards and
forwards in time simultaneously. So
movement in space slows down your
movement in time. This is what
relativity states about how time
really is. However, the amount by
which time is slowed down when you
move through space is very small in
everyday situations, and you would
need to move at a speed of a
considerable percentage the speed of
light in order for it to make any
noticeable difference.
Relativity has been proven too. Atomic
clocks have been placed in aeroplanes
moving at high speeds and then
compared with clocks that were on
Earth. Slight differences that are
exactly predicted by the mathematical
equations of relativity were indeed
detected.
The general theory of relativity goes
a step further and was published in
1916. Einstein stated that mass curves the 'fabric' of space-time to create
the illusion of the force of gravity.
Again, a lower-dimensional analogy is
best. Imagine putting bowling balls on
a sheet of rubber. They bend the
rubber. Any object coming into the
vicinity of the curve begins to move
around the curve like something
spiralling around a sink basin.
Einstein's picture of gravity is that
simple. And again, this has been
proved. Einstein made predictions
about how light would be taking
technically curved paths around large
masses, and this effect was measured
during a total eclipse of the sun.
Time and Determinism
You will have noticed that the theory
of relativity does not carry any
description of a 'flow' of time, and
in fact, it describes time in almost
exactly the same way that we are used
to thinking about space. Relativity
unifies space and time. All points in
space are in existence simultaneously
- this is common sense; so are all points in time in existence
simultaneously too? This would suggest
that all events in time are already
'here' and that there is no scope for
choosing what happens in the future.
This view of time is called
determinism because events are
pre-determined.
It is worth noting that relativity
does not rule out the idea of free
will, but does not provide any support
for it either. Many people can get
upset about the evidence supporting
determinism because humans like to
think they have a free will to make
independent decisions. Such people
would not feel better if they heard
about the many worlds theory of
quantum mechanics.
Time in the Many Worlds Theory of
Quantum Mechanics
To understand this theory, we need to
go back to our cuboid example. You
will notice that each event in time is
simply a cross-section of the cuboid
(a square). Humans effectively
perceive the dimension of time in this
cuboid to be a succession of these
squares. Like frames in a movie, these
create the illusion of a smooth
passage of time. But why is it that we
see time like this? The answer to this
question will be explored later.
If you think about the world around
you, you will most likely notice that
it seems to have been tailor-made for
life. The universe has the precise
properties that led to the formation
of life on Earth. For example, in the
early universe there was a 'battle'
between matter and anti-matter. The
particles with the certain quantum
properties that we now characterise as
'matter', for a hitherto inexplicable
reason, won the battle. If this hadn't
happened, we could not exist, or we
would not be the same as we are today.
Many physicists have speculated that
this and other similar events are too
much of a coincidence to be regarded
as just that: a coincidence.
Martin Rees, the Astronomer Royal of
the UK, paints an analogous picture of
going into a clothes shop. If you go
into a clothes shop that only sells
one size of clothing, it would be a
big coincidence if you found it was
your size. However, we get no surprise
when finding our own clothes size in a
clothes shop because good clothes
shops sell a wide range of clothes
sizes. We can now extend this picture
to the universe. It is very unlikely
that the universe should exist because
of how biased it seems to have been
towards gravitational stability and
the creation of diverse life later on.
However, if we see the universe as
providing a wide range of 'universes'
of different properties, it will come
as no surprise if we find one universe
that supports life.
You can think of this theory as
multiple cuboids in a vast universe of
cuboids, all with their own
space-time. Each cuboid represents one
universe that has a different set of
laws of physics, and therefore could
be wildly different from all the other
universes. There may in fact be a
large number of universes that support
life but with small differences, just
as there might be many shirts of your
size in the clothes shop, but perhaps
in different colours.
In this view, there are multiple
timelines. Some people have likened
this view of time to train tracks. We
move along a train track in one
direction, but there are huge numbers
of other train tracks running parallel
to ours. Each train track may be
different in some way (it might have
trains on it, for example). For this
reason, the other universes around us
in this 'multiverse' are referred to
as parallel universes.
A multiverse of space-times is not
just a theory that solves the question
of why our environment is so suited to
life; it is also a theory of quantum
mechanics. In the quantum theory there
are many events that take place
because of random chance. In electric
currents, for example, the electrons
that make up the current follow a
random path in the wires that is
influenced by the fields of electrical
forces they pass through, which is why
it always seems that the current is
split 50:50. Many physicists believe
that with each quantum decision like
this, every possibility has a separate
universe in which it is enacted.
Hence, in one universe the electron
goes one way; in another, it goes the
other way.
In this theory - which is called the
many worlds interpretation of quantum
mechanics - every possibility gets
enacted. Since quantum interactions
are the fundamentals of any larger (or
'macroscopic') reaction, we can infer
that everything happens in one
universe or other. So if you have a
decision to make, say whether to take
a holiday to Hawaii or not, there is
one universe where you go, and one
universe where you don't. This also
spells trouble for free will. All
possibilities get played out, so it is
just a matter of which universe you
are in to determine which way you go.
There is a variation of this theory.
For this variation we will need to
think another dimension lower. So,
instead of imagining universes as
cuboids, we need to imagine them as
rectangles. Imagine the length of the
rectangle is time; and its other
dimension, space. The rectangle has no
thickness whatsoever, so if you put
multiple rectangles (ie, multiple
universes) on top of each other, the
whole structure becomes no thicker.
This version of the many worlds
interpretation is slightly easier to
grasp, because otherwise we would have
universes branching off from one
another to eternity, which is rather
difficult to imagine. There is no real
evidence for or against either of the
theories of the multiverse,
unfortunately.
You will have noticed that in all
these theories, time has two
directions just like all the other
dimensions. In theory, there is
nothing to stop us from moving in the
other direction. There is another
slightly different theory of time as
being bi-directional, and you might
also be interested to see how this
could lead to possibilities of
time-travel.
Why Do We Perceive Time the Way We Do?
What, then, is time? If no one asks
me, I know what it is. If I wish to
explain it to him who asks me, I do
not know.
- St Augustine

If time is a dimension just like all the others,
why do we experience it so
differently? This is the question that
interests James Hartle of the
University of California in Santa
Barbara, along with physicists Stephen
Hawking, Murray Gell-Mann and Steven
Weinberg. They believe that the
passage of time is just an illusion.
Hartle thinks that time's arrow is a
product of the way we process
information. Gell-Mann gave creatures
that process time in this way the name
'information gathering and utilising
systems' (IGUSs). Humans are IGUSs.
Because of our two-dimensional
retinae, we can't take in multiple
cross-sections of the 'cuboid' - ie
'frames' of time - simultaneously. We
gather information about this frame -
our surrounding environment - using
our senses, and then we store the
information in an input register. This
does not have an unlimited capacity,
so we have to transfer the information
to our memory registers before we can
input the information about the next
frame. Humans have a short-term and a
long-term memory, as well as our
cerebellums that store 'unforgettable'
information (such as how to swim).
IGUSs also carry something called a
'schema', which is a generalised model
of our perception of our environment.
It holds several rules about what is
best to do and what is not a good idea
to do. The information we receive from
our surroundings is passed to the
schema to determine how we react in
certain situations. The decision is
conscious, but we also do unconscious
computation of information: the schema
is updated unconsciously. The
conscious part of the IGUS in humans
focuses on the input register, which
we call the present. The unconscious
part focuses on information in the
memories, and we call that the past.
This is why we consciously experience
the present and remember the past.
The movement of the information
through the IGUSs registers creates
the illusion of the flow of time. It
is not time itself that flows. Each
IGUS has a different speed for the
flow of its information between
registers. This corresponds to
differences between the perception of
the speed of the flow of time. Flies,
for example, need to process more
information per second in order to fly
so quickly but still avoid common
obstacles; therefore, they perceive
time as going slower. To us, a fly's
perception of time would look like
slow motion. Flies only live for a few
days, or a few weeks as a maximum, and
to a human, this is a very short
lifetime. But to a fly, this feels a
lot longer.
So the reason that we experience a
'flow' of time could just be because
of how we process information. It is a
competitive advantage to us as a
species to process information bits at
a time. It wouldn't make sense for us
to have evolved with the capability to
see all time simultaneously.
Digital Time, or, Is Time Like a
Movie?
You may have noticed a reference to
'frames' of time in the explanations
above. We usually think of time as
continuous - a smooth passage of
events. However, most physical
theories define space and time as
being the opposite of a continuous
passage of events. M-theory and Loop
Quantum Gravity, for example, are both
serious scientific theories (not
proven theories, though) that state
that space and time have minimum
units. There was even a theory of
quantum mechanics to suggest that time
was made of particles called
'chronons'!
The theorised minimum length of time
possible is called the Planck time and
is equivalent to 10⁻⁴³ seconds. When
space or time is 'digital' like this,
we say that it is 'discrete'.
If this theory is true, then our
perception of time could be like a
movie. Movies are not continuous: if
you slow them down enough, you see
that they are just collections of
still photographs played in quick
succession. We process information
about our surroundings and obtain a
picture just like one frame of a movie
or animation. When 'played' in quick
succession, this creates the illusion
of smooth, continuous movement.
Is Time Really That Much Like Space?
So far, time has been seen as very
much like a dimension of space, and
its passage in one direction has been
seen as an illusion. But there are
some counter-arguments; there are
still some big differences between
time and space that cannot easily be
explained as illusions.
One way of supporting the idea that an
'arrow of time' is irrelevant is by
proving that all processes are the
same if done forwards or backwards. In
quantum mechanics, most interactions
between particles are 'time-symmetric'
- it doesn't matter whether you look at them from past to future or future
to past because they look the same.
But this is not true of macroscopic
objects. Wine glasses shatter, but you
rarely see shards of glass assemble
themselves into wine glasses.
Physicists can explain why shards of
glass do not form wine glasses by
postulating the existence of 'the
thermodynamic arrow of time'.
Thermodynamics is basically a
collection of laws. Here is how the
chemist PW Atkins summarises them:
There are four laws. The third of
them, the Second Law, was recognised
first; the first, the Zeroth law, was
formulated last; the First Law was
second; the Third Law might not even
be a law in the same sense as the
others. The gist of it is that the
universe is always becoming more
disordered. The disorder of the
universe is called 'entropy', so we
say that entropy is always increasing.
Nobody really knows why this is the
case, but we see it all the time in
experiments. This is why heat always
flows into colder areas, but never the
other way round. Heat is simply the
result of giving particles in a given
system more energy; they begin to move
and vibrate randomly, which is a
disordered state. Colder things are
more ordered because their constituent
particles tend to be harder to move.
This in-built arrow explains why
macroscopic objects have irreversible
interactions. This is a clear
difference from space. If you think of
the spatial manifestation of a table,
it does not follow that one end of the
table is more disordered than the
other, but it does follow that the
table will end up more disordered in
the future than when it has just been
made. Hence, there is a very distinct
difference between time and space.
Can Time Be Reversed?
If time's 'flow' in one direction
really is an illusion, what is there
stopping us from reversing it? In
theory, nothing! Lawrence Schulman of
Clarkson University in New York
thoroughly believes that time can run
backwards. In other words, shards of
glass can turn into wine glasses,
people grow younger and younger and
the universe gets smaller and smaller.
In fact, Schulman goes as far as to
say that such reversed-time zones can
exist as spaces within our own
universe. A computer simulation has
shown that regions with opposite time
arrows do not cancel each other out
and do not disturb each other at all.
The great thing about this theory is
that if a civilisation in a
reversed-time region kept records of
events that occur in our future, the
records might have survived to our
past (which is their future). Finding
these records could tell us the
future. This is, of course, a long
shot, but still a physical
possibility.
Another possibility is that the
universe's arrow of time (as far as
thermodynamics is concerned) will
naturally reverse itself at a crucial
point in the history of the universe.
At this point, the universe would
start to get smaller and everybody
would get younger until there was a
big crunch analogous to the big bang.
This creates a perfect symmetry to the
universe.
Again, there is little evidence that
shows us that reversed-time regions
exist, and there is no evidence that
the universe's thermodynamic arrow of
time will naturally reverse itself.
Equally, there is little evidence
against these theories either.
So what is time? Is it a dimension
just like space? Does it flow, or is
that just an illusion? Is time digital
like the frames of a movie, or does it
flow continuously? And can time really
be reversed or manipulated? None of
these questions can be answered with
definite confidence, but next time
somebody asks you what the time is,
perhaps you'll think of the answer
differently.
Kernel::sleep ?
I have a set of data very similar to the Motley Fool CAPS system, where individual users enter BUY and SELL recommendations on various equities. What I would like to do is show each recommendation and somehow rate it (1-5) as to whether it was a good predictor (i.e. correlation coefficient = 1) of the future stock price (or EPS or whatever), a horrible predictor (i.e. correlation coefficient = -1), or somewhere in between.
Each recommendation is tagged to a particular user, so it can be tracked over time. I can also track market direction (bullish/bearish) based on something like the S&P 500 price. The components I think would make sense in the model are:
user
direction (long/short)
market direction
sector of stock
The thought is that some users are better in bull markets than in bear markets (and vice versa), and some are better at shorts than longs - and then a combination of the above. I can automatically tag the market direction and sector (based on the market at the time and the equity being recommended).
The thought is that I could present a series of screens that allow me to rank each individual recommendation by displaying the available data: absolute, market and sector outperformance over a specific time period. I would follow a detailed checklist for ranking the stocks so that the ranking is as objective as possible. My assumption is that a single user is right no more than 57% of the time - but who knows.
I could load the system and say "Let's rank the recommendation as a predictor of stock value 90 days forward", and that would represent a very explicit set of rankings.
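To illustrate what that explicit 90-day ranking could look like, here is a rough pandas sketch (the column names and data layout are assumptions on my part, not a finished design):

```python
# Rough sketch: score each recommendation by the 90-day forward return of the
# stock, signed by the call's direction, then summarise per user.
# Assumed layouts (hypothetical):
#   recs:   columns ['user', 'ticker', 'date', 'direction']  (+1 long, -1 short)
#   prices: columns ['ticker', 'date', 'close'], with 'date' as datetime
import pandas as pd

def score_recommendations(recs, prices, horizon_days=90):
    prices = prices.sort_values('date')
    recs = recs.sort_values('date')
    # Close price at (or just before) the recommendation date.
    scored = pd.merge_asof(recs, prices, on='date', by='ticker')
    # Close price roughly `horizon_days` later: shift the price dates back so
    # an as-of merge on the recommendation date picks up the future close.
    future = prices.assign(date=prices['date'] - pd.Timedelta(days=horizon_days))
    future = future.rename(columns={'close': 'close_fwd'})
    scored = pd.merge_asof(scored, future, on='date', by='ticker')
    scored['fwd_return'] = scored['direction'] * (scored['close_fwd'] / scored['close'] - 1.0)
    return scored

def user_hit_rate(scored):
    # Fraction of each user's calls whose direction-adjusted return was positive.
    return scored.groupby('user')['fwd_return'].apply(lambda r: (r > 0).mean())
```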
NOW here is the crux - I want to create some sort of machine learning algorithm that can identify patterns over time, so that as recommendations stream into the application we maintain a ranking of that stock (i.e. similar to a correlation coefficient) as to the likelihood that the recommendation (in addition to the past series of recommendations) will affect the price.
Now here is the super crux. I have never taken an AI class, read an AI book, or studied anything specific to machine learning. So I am looking for guidance - a sample or description of a similar system I could adapt, places to look for info, or any general help. Or even a push in the right direction to get started...
My hope is to implement this in F# and be able to impress my friends with a new skill set: an implementation of machine learning in F#, and potentially something (application/source) I can include in a tech portfolio or blog.
Thank you for any advice in advance.
I have an MBA, and teach data mining at a top grad school.
The term project this year was to predict stock price movements automatically from news reports. One team had 70% accuracy, on a reasonably small sample, which ain't bad.
Regarding your question, a lot of companies have made a lot of money on pair trading (find a pair of assets that normally correlate, and buy/sell the pair when they diverge). See the writings of Ed Thorp, of Beat the Dealer fame. He's accessible and kinda funny, if not curmudgeonly. He ran a good hedge fund for a long time.
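For what it's worth, a toy sketch of that pair-trading signal (the z-score window and entry threshold are illustrative, not a recommendation):

```python
# Toy sketch of a pair-trading signal: trade when the (log-price) spread of
# two normally-correlated assets drifts too far from its rolling mean.
import numpy as np
import pandas as pd

def pair_signal(price_a, price_b, window=60, entry_z=2.0):
    # price_a, price_b: pandas Series of prices on the same date index
    spread = np.log(price_a) - np.log(price_b)
    z = (spread - spread.rolling(window).mean()) / spread.rolling(window).std()
    signal = pd.Series(0, index=spread.index)
    signal[z > entry_z] = -1    # spread rich: short A, long B
    signal[z < -entry_z] = 1    # spread cheap: long A, short B
    return signal
```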
There is probably some room for using data mining to predict companies that will default (be unable to make debt payments), shorting† them, and using the proceeds to buy shares in companies less likely to default. Look into survival analysis. Search Google Scholar for "predict distress" etc. in finance journals.
Also, predicting companies that will lose value after an IPO (and shorting them; edit: Facebook!). There are known biases, documented in the academic literature, that can be exploited.
Also, look into capital structure arbitrage. This is when the value of the stocks in a company suggest one valuation, but the value of the bonds or options suggest another value. Buy the cheap asset, short the expensive one.
Techniques include survival analysis, sequence analysis (Hidden Markov Models, Conditional Random Fields, Sequential Association Rules), and classification/regression.
And for the love of God, please read Fooled By Randomness by Taleb.
†Shorting a stock usually involves calling your broker (one you have a good relationship with) and borrowing some shares of a company. Then you sell them to some poor bastard. Wait a while; hopefully the price has gone down, and you buy the shares back and return them to your broker.
My Advice to You:
There are several Machine Learning/Artificial Intelligence (ML/AI) branches out there:
http://www-formal.stanford.edu/jmc/whatisai/node2.html
I have only tried genetic programming, but in the "learning from experience" branch you will find neural nets. GP/GA and neural nets seem to be the most commonly explored methodologies for the purpose of stock market predictions, but if you do some data mining on Predict Wall Street, you might be able to utilize a Naive Bayes classifier to do what you're interested in doing.
Spend some time learning about the various ML/AI techniques, get a small data set and try to implement some of those algorithms. Each one will have its strengths and weaknesses, so I would recommend that you try to combine them using a Naive Bayes classifier (or something similar).
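For example, a bare-bones Naive Bayes up/down classifier with scikit-learn might look like the sketch below (the features and labels are placeholders, not a working strategy):

```python
# Bare-bones sketch: Naive Bayes classifier for next-day direction (up/down).
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def naive_bayes_updown(features, next_day_up):
    # features: 2-D array of indicators per day; next_day_up: 0/1 labels
    X_train, X_test, y_train, y_test = train_test_split(
        features, next_day_up, shuffle=False)    # keep the time ordering
    clf = GaussianNB().fit(X_train, y_train)
    return clf, clf.score(X_test, y_test)         # accuracy on the later period
```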
My Experience:
I'm working on the problem for my Master's thesis, so I'll pitch my results using genetic programming: www.twitter.com/darwins_finches
I started live trading with real money on 09/09/09... yes, it was a magical day! I post the GP's predictions before the market opens (i.e. the timestamps on Twitter) and I also place the orders before the market opens. The profit for this period has been around 25%; we've consistently beaten the buy-and-hold strategy, and we're also outperforming the S&P 500 with stocks that are underperforming it.
Some Resources:
Here are some resources that you might want to look into:
Max Dama's blog: http://www.maxdama.com/search/label/Artificial%20Intelligence
My blog: http://mlai-lirik.blogspot.com/
AI Stock Market Forum: http://www.ai-stockmarketforum.com/
Weka is a data mining tool with a collection of ML/AI algorithms: http://www.cs.waikato.ac.nz/ml/weka/
The Chatter:
The general consensus amongst "financial people" is that artificial intelligence is a voodoo science: you can't make a computer predict stock prices, and you're sure to lose your money if you try. Nonetheless, the same people will tell you that just about the only way to make money on the stock market is to build and improve your own trading strategy and follow it closely.
The idea of AI algorithms is not to build Chip and let him trade for you, but to automate the process of creating strategies.
Fun Facts:
RE: monkeys can pick better than most experts
Apparently rats are pretty good too!
I understand monkeys can pick better than most experts, so why not an AI? Just make it random and call it an "advanced simian Mersenne twister AI" or something.
Much more money is made by the sellers of "money-making" systems than by the users of those systems.
Instead of trying to predict the performance of companies over which you have no control, form a company yourself and fill some need by offering a product or service (yes, your product might be a stock-predicting program, but something a little less theoretical is probably a better idea). Work hard, and your company's own value will rise much quicker than any gambling you'd do on stocks. You'll also have plenty of opportunities to apply programming skills to the myriad of internal requirements your own company will have.
If you want to go down this long, dark, lonesome road of trying to pick stocks you may want to look into data mining techniques using advanced data mining software such as SPSS or SAS or one of the dozen others.
You'll probably want to use a combination of technical indicators and fundamental data. The data will more than likely be highly correlated, so a feature reduction technique such as PCA will be needed to reduce the number of features.
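A minimal sketch of that feature-reduction step with scikit-learn (the indicator matrix and the 95% variance threshold are assumptions):

```python
# Sketch: standardise the correlated features, then reduce them with PCA,
# keeping enough components to explain ~95% of the variance.
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def reduce_features(indicator_matrix, variance_to_keep=0.95):
    pipeline = make_pipeline(StandardScaler(), PCA(n_components=variance_to_keep))
    reduced = pipeline.fit_transform(indicator_matrix)
    return pipeline, reduced   # feed `reduced` to the downstream model
```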
Also keep in mind that your data will constantly have to be updated, trimmed, and shuffled around, because market conditions will constantly be changing.
I've done research on this for a grad-level class, and basically I was somewhat successful at predicting whether a stock would go up or down the next day, but the number of stocks in my data set was fairly small (200) and it covered a very short time frame with consistent market conditions.
What I'm trying to say is that what you want to code has been done in very advanced ways in software that already exists. You should be able to input your data into one of these programs and, using regression, decision trees, or clustering, do what you want to do.
I have been thinking of this for a few months.
I am thinking about Random Matrix Theory/Wigner's distribution.
I am also thinking of Kohonen self-organizing maps.
These comments on speculation and past performance apply to you as well.
I recently completed my master's thesis on deep learning and stock price forecasting. Basically, the current approach seems to be LSTMs and other deep learning models. There are also 10-12 technical indicators (TIs) based on moving averages that have been shown to be highly predictive for stock prices, especially for indexes such as the S&P 500, NASDAQ, DJI, etc. In fact, there are libraries such as pandas_ta for computing various TIs.
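As an illustration, a few of those moving-average-based indicators can be computed directly with pandas (a library such as pandas_ta wraps this kind of thing up; the 'close' series and the window lengths here are assumptions):

```python
# Sketch: a few moving-average-based technical indicators computed from a
# 'close' price series (a pandas Series indexed by date), as model features.
import pandas as pd

def moving_average_features(close, fast=12, slow=26, window=20):
    feats = pd.DataFrame(index=close.index)
    feats['sma'] = close.rolling(window).mean()            # simple moving average
    feats['ema_fast'] = close.ewm(span=fast).mean()         # fast exponential MA
    feats['ema_slow'] = close.ewm(span=slow).mean()         # slow exponential MA
    feats['macd'] = feats['ema_fast'] - feats['ema_slow']   # MACD line
    feats['dist_from_sma'] = close / feats['sma'] - 1.0     # distance from SMA
    return feats.dropna()
```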
I represent a group of academics trying to predict stocks in a general form that can also be applied to anything, even the rating of content.
Our algorithm, which we describe as truth seeking, works as follows.
Basically, each participant has their own credence rating. This means that the higher your credence or credibility, the more your vote counts. Credence is worked out by how close each vote is to the credence-weighted average: you get a better credence value the closer you get to the average vote, which has already been adjusted for credence.
For example, let's say that everyone is predicting that a stock will be at value X in 30 days' time (a futures option). People who predict close to the average get better credence. The key here is that the individual doesn't know what the average is; only the system does. The system is tweaked further by weighting the guesses, so that the target spot that generates the best credence is set by the votes that already carry more credence. So the smartest people (those who have been historically accurate) define the sweet spot that is used to further determine who gets more credence.
The system can also be improved to adjust over time. For example, when the actual value becomes known, the people who guessed it can be rewarded with higher credence. In cases where you can't know the future outcome, you can still account for whether the weighted credence average changes in the future. People can be rewarded even more if they spotted the trend early. The point is that we don't even need to know the future outcome; the fact that the weighted rating changed later is enough to reward the people who bet early on the sweet spot.
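As a rough illustration of the mechanism (not our production code; the closeness function and the update rate are simplified placeholders):

```python
# Toy illustration of the credence mechanism: the "sweet spot" is the
# credence-weighted mean of the votes, and each voter's credence is nudged
# towards how close their vote was to that consensus.
import numpy as np

def update_credence(votes, credence, rate=0.1):
    votes = np.asarray(votes, dtype=float)
    credence = np.asarray(credence, dtype=float)
    sweet_spot = np.average(votes, weights=credence)       # weighted consensus
    closeness = 1.0 / (1.0 + np.abs(votes - sweet_spot))   # 1.0 = right on the spot
    new_credence = (1 - rate) * credence + rate * closeness
    return sweet_spot, new_credence
```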
Such a system can be used to rate anything from stock prices, currency exchange rates or even content itself.
One such implementation asks people to vote with two parameters. One is their actual vote and the other is an assurance percentage, which basically means how confident a particular participant is in their vote. In this way, a person with high credence does not need to risk downgrading their credence when they are not sure of their bet; at the same time, the bet can still be incorporated, it just won't sway the sweet spot as much if a low assurance is used. In the same vein, if the guess lands directly on the sweet spot with a low assurance, they won't gain as much benefit as they would have with a high assurance.