Methods to Find 'Best' Cut-Off Point for a Continuous Target Variable - machine-learning

I am working on a machine learning scenario where the target variable is Duration of power outages.
The distribution of the target variable is severely right-skewed. (You can imagine that most power outages occur and are over fairly quickly, but there are many, many outliers that can last much longer.) These outages become less and less 'explainable' by the data as the durations grow longer. They become more or less 'unique outages', where events occur on site that are not typical of other outages, and no data is recorded on the specifics of those events beyond what is already available for all the 'typical' outages.
This causes a problem when creating models: the unexplainable data mingles with the explainable part and degrades the model's ability to predict.
I analyzed some percentiles to choose a point that encompasses as many outages as possible while, in my judgment, keeping the durations mostly explainable. This was somewhere around the 320-minute mark and contained about 90% of the outages.
That choice was entirely subjective, though, and I suspect there is some kind of procedure for determining a 'best' cut-off point for this target variable. Ideally, the procedure would be robust enough to weigh the trade-off of encompassing as much data as possible, rather than telling me to make my cut-off 2 hours and thereby excluding a significant number of customers. The purpose is to provide an accurate Estimated Restoration Time to as many customers as possible.
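One semi-principled starting point is to treat the cut-off as a tunable quantity rather than a single guess: sweep candidate percentiles and tabulate the trade-off between the cut-off value and the share of outages retained. A minimal sketch with NumPy, using synthetic lognormal durations as a stand-in for the real (unavailable) data:

```python
import numpy as np

# Hypothetical right-skewed outage durations in minutes (a lognormal
# stand-in for the real data, which is not available here).
rng = np.random.default_rng(0)
durations = rng.lognormal(mean=4.0, sigma=1.0, size=10_000)

# Sweep candidate percentile cut-offs and report the trade-off:
# the cut-off value versus the share of outages it retains.
for q in (80, 85, 90, 95):
    cutoff = np.percentile(durations, q)
    retained = float(np.mean(durations <= cutoff))
    print(f"{q}th percentile cut-off: {cutoff:7.1f} min, retains {retained:.0%}")
```

A natural extension is to refit the model below each candidate cut-off and pick the one that best balances coverage against held-out error, i.e. treat the cut-off as a hyperparameter.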
FYI: the modeling methods that appear to be working best right now are random forests and conditional random forests. Methods I have tried in this scenario include multiple linear regression, decision trees, random forests, and conditional random forests. MLR was by far the least effective. :(

I have exactly the same problem! I hope someone more informed weighs in. I wonder at what point a long duration becomes something we want to discard rather than something we want to predict!
Also, I tried log-transforming my data, and the density plot shows a funny artifact on the left side of the distribution (because my durations are integers, not floats). I think this helps; you should also log-transform any features with similar distributions.
I eventually thought the solution should be stratified sampling or giving weights to features, but I don't know exactly how to implement that. My attempts didn't produce any good results. Perhaps my data is too stochastic!
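For reference, a small sketch of the log transform mentioned above, on hypothetical integer durations; log1p is a common choice because it handles zeros safely, and the integer rounding is what produces the spikes on the left of the density plot:

```python
import numpy as np

# Hypothetical integer outage durations (minutes); rounding to integers is
# what produces the discrete spikes on the left of a log-scale density plot.
rng = np.random.default_rng(1)
durations = np.clip(np.round(rng.lognormal(3.5, 1.0, size=5_000)), 1, None)

# log1p handles zeros safely and compresses the long right tail.
log_durations = np.log1p(durations)

def skewness(x):
    return float(((x - x.mean()) ** 3).mean() / x.std() ** 3)

print("skew before:", round(skewness(durations), 2))
print("skew after :", round(skewness(log_durations), 2))
```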

Related

How to treat outliers if you have a data set with ~2,000 features and can't look at each feature individually

I'm wondering how one goes about treating outliers at scale. In my experience, I usually need to understand why the outliers are there in the first place: what causes them, whether there are patterns, or whether they just happen randomly. I know that, theoretically, we often define outliers as data points more than 3 standard deviations out. But when the data is so big that you can't treat each feature one by one, and you don't know whether the 3-standard-deviation rule still applies because of sparsity, how do we treat the outliers most effectively?
My intuition about high-dimensional data is that it is sparse, so the definition of "outlier" is harder to pin down. Do you think we can get away with using ML algorithms that are more robust to outliers (tree-based models, robust SVM, etc.) instead of treating outliers in the preprocessing step? And if we really want to treat them, what is the best way to do it?
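One way to apply the 3-standard-deviation idea at scale without inspecting features one by one is to compute robust z-scores for every feature at once and flag rows with an unusually high fraction of extreme values. A sketch on synthetic data; the 0.01 threshold is an arbitrary illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2000))   # hypothetical wide dataset
X[0, :50] = 15.0                    # plant one clearly outlying row

# Median/MAD scaling is less distorted by the outliers we are hunting
# than mean/std; 1.4826 makes the MAD comparable to std under normality.
med = np.median(X, axis=0)
mad = np.median(np.abs(X - med), axis=0) * 1.4826
z = np.abs(X - med) / mad

# Flag rows where an unusually high fraction of features exceed |z| > 3,
# instead of eyeballing 2,000 features individually.
outlier_fraction = (z > 3).mean(axis=1)
suspect_rows = np.where(outlier_fraction > 0.01)[0]
print("suspect rows:", suspect_rows)
```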
I would first propose a framework for understanding the data. Imagine you are handed a dataset with no explanation of what it is. Analytics can actually be used to build that understanding. Usually rows are observations and columns are parameters of some sort regarding the observations. You first want a framework for what you are trying to achieve. No matter what is going on, all data centers on the interests of people; that is why we decided to record it in some format. Given that, we are mostly interested in:
1.) Object
2.) Attributes of object
3.) Behaviors of object
4.) Preferences of object
5.) Behaviors and preferences of object over time
6.) Relationships of object to other objects
7.) Effects of attributes, behaviors, preferences and other objects on object
So you want to identify these items. You open a dataset and maybe you instantly recognize a timestamp. You then see some categorical variables and start doing relationship analysis: what is one-to-one, one-to-many, many-to-many. You then identify continuous variables. These all come together to give a foundation for identifying what an outlier is.
If we are evaluating objects over time, is the rare event indicative of something that happens rarely but that we want to know about? Forest fires are outlier events, but they are events of great concern. If I am analyzing machine data and seeing rare events that are tied to machine failure, then they matter. Basically: does the rare event or parameter show evidence that it correlates with something you care about?
Now if you have so many dimensions that the above approach is not feasible in your judgment, then you are seeking dimensionality-reduction alternatives. I am currently employing Singular Value Decomposition as a technique, and I am already seeing situations where I achieve the same level of predictive ability with 25% of the data. Which segues into my final thought: find a mark to decide whether the outliers matter or not.
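For reference, a minimal Singular Value Decomposition sketch in NumPy showing the kind of reduction described above; the 99% variance target and the synthetic low-rank matrix are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical data: a rank-10 signal spread over 200 columns, plus noise.
A = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 200)) \
    + 0.01 * rng.normal(size=(500, 200))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only enough components to explain 99% of the variance.
explained = np.cumsum(s ** 2) / np.sum(s ** 2)
k = int(np.searchsorted(explained, 0.99)) + 1
A_reduced = U[:, :k] * s[:k]   # 500 x k representation of the 500 x 200 data

print(f"kept {k} of {A.shape[1]} components; shape {A_reduced.shape}")
```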
Begin by leaving them in, run your analysis, then run the work again with them removed. What were the effects? I believe that when in doubt, simply do both and see how different the results are. If there is little difference, then maybe you are good to go. If there is a significant difference of concern, then you want to take an evidence-based approach to the outliers. Simply because something is rare in your data does not mean it is rare. Think of certain types of crime that are under-reported (via arrest records): a lack of data showing politicians being arrested for insider trading does not mean that politicians are not doing insider trading en masse.

How do I represent the chromosome in a genetic algorithm?

My task is to compute the clashes between an alert time schedule and the user's calendar schedule, in order to generate an alert schedule with fewer clashes.
How should I represent the chromosome for this problem?
How should I represent the time slots? (Binary or numeric)
Thank you.
(Please consider that I'm a beginner in genetic algorithms.)
Questions would be: what have you tried so far? How good are your results so far? Also, your problem is stated quite unspecifically. So here is what I can offer:
The chromosome should probably be the start times of the alerts in your schedule (if I understood your problem correctly).
Just as important is to think about how you want to evaluate and calculate the fitness of your individuals (here, clashes, e.g. their number or the time overlap between appointments; but you might well find better heuristics that give better solutions or faster convergence).
Binary or continuous numbers might both work: I usually go for numbers whenever there is no strong reason not to (since they are easier to interpret, debug, etc.). Binary comes with some nice opportunities with respect to mutation and recombination.
I strongly recommend playing around with and reading about these things. It might look like a lot of extra work to implement, but you should come to see them as hyperparameters which need to be tuned in order to get the best outcome.
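To make the representation and fitness ideas concrete, here is a small hypothetical sketch: a chromosome is a list of alert start times, fitness counts clash minutes against busy calendar slots, and a simple mutate-and-accept loop stands in for a full GA (no crossover shown). All names and constants are illustrative assumptions:

```python
import random

# All names and constants here are hypothetical illustrations.
ALERT_LEN = 10          # each alert occupies 10 minutes
DAY = 24 * 60           # minutes in a day

def clashes(chromosome, busy):
    """Fitness (to minimise): total minutes of overlap between the
    alerts and the user's busy calendar slots."""
    total = 0
    for start in chromosome:
        end = start + ALERT_LEN
        for b_start, b_end in busy:
            total += max(0, min(end, b_end) - max(start, b_start))
    return total

def mutate(chromosome):
    """Move one randomly chosen alert to a new random start time."""
    child = chromosome[:]
    child[random.randrange(len(child))] = random.randrange(DAY - ALERT_LEN)
    return child

random.seed(0)
busy = [(540, 600), (720, 780)]   # busy 9:00-10:00 and 12:00-13:00
best = [550, 730, 900]            # initial schedule: two alerts clash
for _ in range(2000):
    cand = mutate(best)
    if clashes(cand, busy) <= clashes(best, busy):
        best = cand
print("remaining clash minutes:", clashes(best, busy))
```

A real GA would keep a population and add crossover, but the chromosome encoding and the fitness function would look the same.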

Is there any technique to know in advance the amount of training examples you need to make deep learning get good performance?

Deep learning has been a revolution recently, and its success is related to the huge amount of data that we can currently manage and the generalization of GPUs.
So here is the problem I'm facing. I know that deep neural nets can have the best performance; there is no doubt about it. However, they perform well when the number of training examples is huge. If the number of training examples is low, it is better to use an SVM or decision trees.
But what is huge? What is low? In this paper on face recognition (FaceNet by Google) they show the performance vs. the FLOPs (which can be related to the number of training examples).
They used between 100M and 200M training examples, which is huge.
My question is:
Is there any method to predict in advance the number of training examples I need to get good performance with deep learning? The reason I ask is that it is a waste of time to manually label a dataset if the performance is not going to be good.
The short answer is no. You do not have this kind of knowledge, and you never will; these kinds of problems are impossible to solve in general.
What you can have are just some general heuristics and empirical knowledge, which can say whether it is probable that DL will not work well (it is possible to predict the failure of a method, while it is nearly impossible to predict its success), nothing more. In current research, DL rarely works well for datasets smaller than hundreds of thousands to millions of samples (I do not count MNIST, because everything works well on MNIST). Furthermore, DL is currently heavily studied in just two types of problems, NLP and image processing, so you cannot really extrapolate to other kinds of problems (no free lunch theorem).
Update
Just to make it a bit clearer: what you are asking is to predict whether a given estimator (or set of estimators) will yield good results on a particular training set. In fact, you restrict it to just the size.
The simplest proof (based on your simplification) is as follows: for any N (sample size) I can construct an N-mode (or N²-mode, to make it even more obvious) distribution which no estimator can reasonably estimate (including a deep neural network), and I can construct trivial data with just one label (so a perfect model requires just one sample). End of proof: there are two different answers for the same N.
Now let us assume that we have access to the training samples themselves (without labels, for now), not just the sample size. We are given X (training samples) of size N. Again, I can construct an N-mode labeling yielding a distribution impossible to estimate (by anything), and a trivial labeling (just a single label!). Again, two different answers for the exact same input.
Ok, so maybe given training samples and labels we can predict what will behave well? Now we cannot manipulate the samples or labels to show that no such function exists, so we have to go back to statistics and what we are actually trying to answer. We are asking about the expected value of the loss function over the whole probability distribution that generated our training samples. So once again, the whole "clue" is to see that I can manipulate the underlying distributions (construct many different ones, many of which are impossible for a deep neural network to model well) and still expect my training samples to come from them. This is what statisticians call the problem of having a non-representative sample from a pdf. In ML, we often relate this to the curse of dimensionality. In simple words: in order to estimate a probability well, we need an enormous number of samples. Silverman showed that even if you know that your data is just a normal distribution and you ask "what is the value at 0?", you need exponentially many samples (compared to the space dimensionality). In practice our distributions are multi-modal, complex and unknown, so this number is even higher. We are quite safe to say that, with any number of samples we could ever gather, we cannot reasonably estimate distributions with more than 10 dimensions. Consequently, whatever we do to minimize the expected error, we are just using heuristics which connect the empirical error (fit to the data) with some kind of regularization (removing overfitting, usually by putting prior assumptions on families of distributions). To sum up: we cannot construct a method able to tell whether our model will behave well, because this would require deciding which "complexity" of distribution generated our samples. There will be some simple cases where we can do it, and they will probably say something like "oh! this data is so simple even knn will work well!".
You cannot have a generic tool for a DNN or any other (complex) model, though. (To be strict, we can have such a predictor for very simple models, because they are so limited that we can easily check whether your data follows their extreme simplicity or not.)
Consequently, this boils down to nearly the same question as actually building a model: you will need to try and validate your approach (thus, train a DNN to answer whether a DNN works well). You can use cross-validation, bootstrapping or anything else here, but all of them essentially do the same thing: build multiple models of your desired type and validate them.
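In practice, this "try and validate" advice usually takes the form of a learning curve: train on growing subsets and watch how validation accuracy changes with sample size, then extrapolate the trend. A toy sketch with a nearest-class-mean classifier on synthetic data (the shape of the curve, not the model, is the point):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 4000, 10
X = rng.normal(size=(n, d))
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

# Hold out the last 1000 points for validation.
X_tr, y_tr = X[:3000], y[:3000]
X_va, y_va = X[3000:], y[3000:]

def fit_predict(X_fit, y_fit, X_query):
    """Nearest-class-mean classifier: assign each query point to the
    class whose training mean is closer."""
    m0 = X_fit[y_fit == 0].mean(axis=0)
    m1 = X_fit[y_fit == 1].mean(axis=0)
    d0 = ((X_query - m0) ** 2).sum(axis=1)
    d1 = ((X_query - m1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)

accs = []
for m in (30, 100, 300, 1000, 3000):
    acc = float((fit_predict(X_tr[:m], y_tr[:m], X_va) == y_va).mean())
    accs.append(acc)
    print(f"train size {m:5d}: validation accuracy {acc:.3f}")
```

If the curve is still climbing at your current sample size, more data is likely to help; if it has flattened, more data probably won't.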
To sum up
I do not claim we will never have good heuristics; heuristics drive many parts of ML quite well. I am only answering whether there is a method able to answer your question, and there is no such thing, nor can there be. There can be many rules of thumb which will work well for some problems (classes of problems). And we already have such rules:
for NLP/2D images you should have at least ~100,000 samples to work with a DNN
having lots of unlabeled instances can partially substitute for the above number (e.g. 30,000 labeled + 70,000 unlabeled) with pretty reasonable results
Furthermore, this does not mean that given this size of data a DNN will be better than a kernelized SVM or even a linear model. This is exactly what I was referring to earlier: you can easily construct counterexample distributions where an SVM will work just as well or better, regardless of the number of samples. The same applies to any other technique.
Yet even if you are only interested in whether a DNN will work well (and not whether it beats everything else), these are just empirical, trivial heuristics based on at most 10 (!) types of problems. It could be very harmful to treat them as rules or methods. They are just rough first intuitions, gained through the extremely unstructured, haphazard research of the last decade.
Ok, so I am lost now... when should I use DL? And the answer is extremely simple:
Use deep learning only if:
You already tested "shallow" techniques and they do not work well
You have large amounts of data
You have huge computational resources
You have experience with neural networks (these are very tricky and unforgiving models, really)
You have a great amount of time to spare, even if the payoff is just a few % better results.

Should I remove test samples that are identical to some training sample?

I've been having a bit of a debate with my adviser about this issue, and I'd like to get your opinion on it.
I have a fairly large dataset that I've used to build a classifier. I have a separate, smaller testing dataset that was obtained independently from the training set (in fact, you could say that each sample in either set was obtained independently). Each sample has a class label, along with metadata such as collection date and location.
There is no sample in the testing set that has the same metadata as any sample in the training set (as each sample was collected at a different location or time). However, it is possible that the feature vector itself could be identical to some sample in the training set. For example, there could be two virus strains that were sampled in Africa and Canada, respectively, but which both have the same protein sequence (the feature vector).
My adviser thinks that I should remove such samples from the testing set. His reasoning is that these are like "freebies" when it comes to testing, and may artificially boost the reported accuracy.
However, I disagree and think they should be included, because it may actually happen in the real world that the classifier sees a sample that it has already seen before. To remove these samples would bring us even further from reality.
What do you think?
It would be nice to know whether you're talking about a couple of repetitions in a million samples or 10 repetitions in 15 samples.
In general I don't find what you're doing reasonable. I think your advisor has a very good point. Your evaluation needs to be as close as possible to using your classifier outside your control: you can't just assume you're going to be evaluated on a data point you've already seen. Even if each data point is independent, you're going to be evaluated on never-before-seen data.
My experience is in computer vision, where it would be highly questionable to train and test with the same picture of one subject. In fact, I wouldn't be comfortable training and testing with frames of the same video (not even different frames).
EDIT:
There are two questions:
1. The distribution permits that these repetitions naturally happen. I believe you; you know your experiment and your data, you're the expert.
2. The issue is that you're getting a boost by doing this, and that this boost is possibly unfair. One possible way to address your advisor's concerns is to evaluate how significant a leverage you're getting from the repeated data points. Generate 20 test cases: 10 in which you train with 1000 and test on 33, making sure there are no repetitions in the 33, and another 10 in which you train with 1000 and test on 33 with repetitions allowed as they occur naturally. Report the mean and standard deviation of both experiments.
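The experiment proposed above can be mocked up in a few lines. In this toy version (all data hypothetical), labels are a deterministic function of the feature, so any test sample whose feature also appears in training is a guaranteed "freebie", which makes the boost from repetitions visible:

```python
import random
from collections import Counter, defaultdict

random.seed(0)
# Hypothetical pool of 1,033 samples; the label is a deterministic
# function of the (integer) feature, so repeated features are "freebies".
POOL = [(f, f % 2) for f in random.choices(range(2000), k=1033)]

def run_trial(allow_repeats):
    data = POOL[:]
    random.shuffle(data)
    train, test = data[:1000], data[1000:]
    if not allow_repeats:
        seen = {f for f, _ in train}
        test = [s for s in test if s[0] not in seen]
    # Majority vote per feature stands in for a real classifier;
    # unseen features fall back to predicting label 0.
    votes = defaultdict(Counter)
    for f, lab in train:
        votes[f][lab] += 1
    def predict(f):
        return votes[f].most_common(1)[0][0] if votes[f] else 0
    hits = sum(1 for f, lab in test if predict(f) == lab)
    return hits / max(len(test), 1)

with_reps = [run_trial(True) for _ in range(10)]
no_reps = [run_trial(False) for _ in range(10)]
for name, accs in (("repeats allowed", with_reps), ("repeats removed", no_reps)):
    mean = sum(accs) / len(accs)
    sd = (sum((a - mean) ** 2 for a in accs) / len(accs)) ** 0.5
    print(f"{name}: mean accuracy {mean:.3f}, sd {sd:.3f}")
```

If the two means differ noticeably, as they do here by construction, the repeated points are inflating the reported accuracy.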
It depends... Your adviser suggested the common practice: you usually test a classifier on samples which have not been used for training. If the samples in the test set matching the training set are very few, your results will not differ statistically because of the reappearance of the same vectors. If you want to be formal and still keep your logic, you have to show that the reappearance of the same vectors has no statistically significant effect on the testing process. If you proved this, I would accept your logic. See this ebook on statistics in general, and this chapter as a starting point on statistical significance and null-hypothesis testing.
Hope I helped!
Inasmuch as the training and testing datasets are representative of the underlying data distribution, I think it's perfectly valid to leave in repetitions. The test data should be representative of the kind of data you expect your method to perform on. If you can genuinely get exact replicates, that's fine. However, I would question what your domain is where it's possible to generate exactly the same sample multiple times. Are your data synthetic? Are you using a tiny feature set with few possible values per feature, such that different points in input space map to the same point in feature space?
The fact that you're able to encounter the same instance multiple times is suspicious to me. Also, if you have 1,033 instances, you should be using far more than 33 of them for testing. The variance in your test accuracy will be huge. See the answer here.
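To see why 33 test samples is too few, treat each test prediction as a Bernoulli trial and compute the standard error of the measured accuracy (the p = 0.8 below is an arbitrary illustration):

```python
import math

def accuracy_std_error(p, n):
    """Standard error of an observed accuracy p measured on n test
    samples, treating each prediction as an independent Bernoulli trial."""
    return math.sqrt(p * (1 - p) / n)

for n in (33, 330, 1033):
    half_width = 1.96 * accuracy_std_error(0.8, n)
    print(f"n={n:5d}: 95% CI half-width at p=0.8 is +/-{half_width:.3f}")
```

With n = 33 the 95% interval spans roughly 14 percentage points, which swamps most realistic differences between classifiers.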
Having several duplicate or very similar samples seems somewhat analogous to the distribution of the population you're attempting to classify being non-uniform. That is, certain feature combinations are more common than others, and the high occurrence of them in your data is giving them more weight. Either that, or your samples are not representative.
Note: Of course, even if a population is uniformly distributed there is always some likelihood of drawing similar samples (perhaps even identical depending on the distribution).
You could probably make some argument that identical observations are a special case, but are they really? If your samples are representative it seems perfectly reasonable that some feature combinations would be more common than others (perhaps even identical depending on your problem domain).

how to expand spinner time? [closed]

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. Closed 12 years ago.
how to expand spinner time ?
From The Nature of Time:
The Nature of Time
We have all been asked the question 'what is the time?', but this entry will be addressing the subtly different question: 'what is time?' Many people, on being asked this question, would probably say that they don't have the time to answer. In this entry, we will explore many theories about the nature of time. No one theory has unquestioning truth about it, so it will be up to you to decide on the one you see is best.
The Classical Approach to Time
There is not very much to say on this theory since it is the one with which we are most familiar. Traditionally, time is simply seen as a measure of the distance between events. It has a past, present and a future. The past is considered to have already happened and to be unchangeable, while the future is considered to be open to many possibilities. Humans measure time using many units, some based on real events like the rotation of the Earth, others that are even more arbitrary.
Isaac Newton's classical description of time in his highly-regarded work Principia is that it 'flows equably of itself', which means that time 'flows' at a constant rate that is the same for everybody - it is independent of the events that take place in it. It would be untrue to say that this idea was unchallenged until the twentieth century - the 18th-Century empiricist philosopher George Berkeley, for example, disagreed with Newton and held that time was 'the succession of ideas in the mind' - but there was no serious evidence to suggest that Newton's elegant and absolute description was wrong until Einstein destroyed it.
Unfortunately, the classical view of time is biased towards the human perception of the 'flow' of time. We see events in one direction, and we assume time to be the same everywhere. The classical approach to time does not explain exactly why we perceive time in this way, and it does not describe how the effect is achieved. The other theories of the nature of time challenge the very roots of this natural point of view.
Relativity
The Theory of Relativity is the celebrated discovery of the physicist Albert Einstein. Originally, it was two theories: the Special Theory of Relativity came first, in 1905, and states that the rate at which time passes is not the same all over the universe - it is dependent on the observer (in other words, it is relative). It is not hard to see that different people perceive the passing of time at a different rate to others: as we get older, less information is processed about our surroundings per second, so we perceive time to be going faster.
But Einstein's theory went further than this. The relativity of time is based not on our age, but on our speed of movement through space. The faster we travel through space, the slower we travel through time. Although this sounds crazy at first, it makes sense when thought of in a particular way.
The theory of relativity demands that we view space and time not as separate entities but as one concept called space-time. Time becomes a fourth dimension, just like the other three dimensions of space that we are used to (height, width and length). This view of time is crucial to understanding most of the other theories about time's ultimate nature.
Humans only possess two-dimensional retinae (the light-receptive surface at the back of our eyes), which means that we can only see in two dimensions. Our vision of the third dimension is a result of perspective and the existence of our binocular vision. If we had three-dimensional retinae, we would be able to see all of an entire room simultaneously - its walls, its floor and its ceiling at the same time! For this reason, it is very difficult, if not totally impossible, for humans to visualise a fourth dimension.
To overcome this impairment, it is useful to use lower-dimensional analogies when talking about dimensions above three, even if we are talking about time as one of these dimensions. So in this case, let us imagine that the universe is shaped like a cuboid, and that humans are two-dimensional and have one-dimensional retinae. Imagine that the spatial dimensions are the width and height of a cross-section of the cuboid, meaning that humans can move up, down, left and right at will within the cuboid. Imagine that the depth of the cuboid is time.
Right, now imagine that you are a two-dimensional human within the cuboid and that you start off being midway up the cuboid. Then you start moving upward (ie, through space, but not time). Eventually you hit the edge of the cuboid. Now imagine that you move downwards, but that you also move through time in a forward direction. This time it will take you longer to get back to being mid-way up the cuboid because you are not taking a direct route downwards - you are also moving through time. As you can see, moving through time slows down your movement in space.
It works the other way around too. If you stay still in space and just move forward in time, then it will take less time to get to a particular point in time than if you move upwards and forwards in time simultaneously. So movement in space slows down your movement in time. This is what relativity states about how time really is. However, the amount by which time is slowed down when you move through space is very small in everyday situations, and you would need to move at a considerable percentage of the speed of light for it to make any noticeable difference.
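The qualitative trade-off described above is captured quantitatively by the standard time-dilation formula of special relativity (added here for reference; it assumes a clock moving at constant speed v relative to an observer):

```latex
\Delta t = \frac{\Delta \tau}{\sqrt{1 - v^2/c^2}}
```

where Δτ is the time elapsed on the moving clock, Δt is the time elapsed for the stationary observer, and c is the speed of light; as v approaches c, the moving clock ticks ever more slowly relative to the stationary one.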
Relativity has been proven too. Atomic clocks have been placed in aeroplanes moving at high speeds and then compared with clocks that were on Earth. Slight differences that are exactly predicted by the mathematical equations of relativity were indeed detected.
The general theory of relativity goes a step further and was published in 1916. Einstein stated that mass curves the 'fabric' of space-time to create the illusion of the force of gravity. Again, a lower-dimensional analogy is best. Imagine putting bowling balls on a sheet of rubber. They bend the rubber. Any object coming into the vicinity of the curve begins to move around the curve like something spiralling around a sink basin. Einstein's picture of gravity is that simple. And again, this has been proved. Einstein made predictions about how light would take technically curved paths around large masses, and this effect was measured during a total eclipse of the sun.
Time and Determinism
You will have noticed that the theory of relativity does not carry any description of a 'flow' of time, and in fact, it describes time in almost exactly the same way that we are used to thinking about space. Relativity unifies space and time. All points in space are in existence simultaneously - this is common sense; so are all points in time in existence simultaneously too? This would suggest that all events in time are already 'here' and that there is no scope for choosing what happens in the future. This view of time is called determinism, because events are pre-determined.
It is worth noting that relativity does not rule out the idea of free will, but does not provide any support for it either. Many people can get upset about the evidence supporting determinism because humans like to think they have a free will to make independent decisions. Such people would not feel better if they heard about the many worlds theory of quantum mechanics.
Time in the Many Worlds Theory of Quantum Mechanics
To understand this theory, we need to go back to our cuboid example. You will notice that each event in time is simply a cross-section of the cuboid (a square). Humans effectively perceive the dimension of time in this cuboid to be a succession of these squares. Like frames in a movie, these create the illusion of a smooth passage of time. But why is it that we see time like this? The answer to this question will be explored later.
If you think about the world around you, you will most likely notice that it seems to have been tailor-made for life. The universe has the precise properties that led to the formation of life on Earth. For example, in the early universe there was a 'battle' between matter and anti-matter. The particles with the certain quantum properties that we now characterise as 'matter', for a hitherto inexplicable reason, won the battle. If this hadn't happened, we could not exist, or we would not be the same as we are today. Many physicists have speculated that this and other similar events are too much of a coincidence to be regarded as just that: a coincidence.
Martin Rees, the Astronomer Royal of the UK, paints an analogous picture of going into a clothes shop. If you go into a clothes shop that only sells one size of clothing, it would be a big coincidence if you found it was your size. However, we get no surprise when finding our own clothes size in a clothes shop, because good clothes shops sell a wide range of clothes sizes. We can now extend this picture to the universe. It is very unlikely that the universe should exist because of how biased it seems to have been towards gravitational stability and the creation of diverse life later on. However, if we see the universe as providing a wide range of 'universes' of different properties, it will come as no surprise if we find one universe that supports life.
You can think of this theory as multiple cuboids in a vast universe of cuboids, all with their own space-time. Each cuboid represents one universe that has a different set of laws of physics, and therefore could be wildly different from all the other universes. There may in fact be a large number of universes that support life but with small differences, just as there might be many shirts of your size in the clothes shop, but perhaps in different colours.
In this view, there are multiple timelines. Some people have likened this view of time to train tracks. We move along a train track in one direction, but there are huge numbers of other train tracks running parallel to ours. Each train track may be different in some way (it might have trains on it, for example). For this reason, the other universes around us in this 'multiverse' are referred to as parallel universes.
A multiverse of space-times is not just a theory that solves the question of why our environment is so suited to life; it is also a theory of quantum mechanics. In the quantum theory there are many events that take place because of random chance. In electric currents, for example, the electrons that make up the current follow a random path in the wires that is influenced by the fields of electrical forces they pass through, which is why it always seems that the current is split 50:50. Many physicists believe that with each quantum decision like this, every possibility has a separate universe in which it is enacted. Hence, in one universe the electron goes one way; in another, it goes the other way.
In this theory - which is called the
many worlds interpretation of quantum
mechanics - every possibility gets
enacted. Since quantum interactions are the fundamental building blocks of any larger (or 'macroscopic') interaction, we can infer
that everything happens in one
universe or other. So if you have a
decision to make, say whether to take
a holiday to Hawaii or not, there is
one universe where you go, and one
universe where you don't. This also
spells trouble for free will. All
possibilities get played out, so which way you go is simply a matter of which universe you happen to be in.
There is a variation of this theory.
For this variation we will need to think one dimension lower. So,
instead of imagining universes as
cuboids, we need to imagine them as
rectangles. Imagine the length of the
rectangle is time; and its other
dimension, space. The rectangle has no
thickness whatsoever, so if you put
multiple rectangles (ie, multiple
universes) on top of each other, the
whole structure becomes no thicker.
This version of the many worlds
interpretation is slightly easier to
grasp, because otherwise we would have
universes branching off from one
another to eternity, which is rather
difficult to imagine. There is no real
evidence for or against either of the
theories of the multiverse,
unfortunately.
You will have noticed that in all
these theories, time has two
directions just like all the other
dimensions. In theory, there is
nothing to stop us from moving in the
other direction. There is another
slightly different theory of time as
being bi-directional, and you might
also be interested to see how this
could lead to possibilities of
time-travel.
Why Do We Perceive Time the Way We Do?
What, then, is time? If no one asks
me, I know what it is. If I wish to
explain it to him who asks me, I do
not know.
- St Augustine
If time is a dimension just like all the others,
why do we experience it so
differently? This is the question that
interests James Hartle of the
University of California in Santa
Barbara, along with physicists Stephen
Hawking, Murray Gell-Mann and Steven
Weinberg. They believe that the
passage of time is just an illusion.
Hartle thinks that time's arrow is a
product of the way we process
information. Gell-Mann gave creatures
that process time in this way the name
'information gathering and utilising
systems' (IGUSs). Humans are IGUSs.
Because of our two-dimensional
retinae, we can't take in multiple
cross-sections of the 'cuboid' - ie
'frames' of time - simultaneously. We
gather information about this frame -
our surrounding environment - using
our senses, and then we store the
information in an input register. This
does not have an unlimited capacity,
so we have to transfer the information
to our memory registers before we can
input the information about the next
frame. Humans have a short-term and a
long-term memory, as well as our
cerebellums that store 'unforgettable'
information (such as how to swim).
IGUSs also carry something called a
'schema', which is a generalised model
of our perception of our environment.
It holds several rules about what is
best to do and what is not a good idea
to do. The information we receive from
our surroundings is passed to the
schema to determine how we react in
certain situations. The decision is
conscious, but we also do unconscious
computation of information: the schema
is updated unconsciously. The
conscious part of the IGUS in humans
focuses on the input register, which
we call the present. The unconscious
part focuses on information in the
memories, and we call that the past.
This is why we consciously experience
the present and remember the past.
The movement of the information
through the IGUSs registers creates
the illusion of the flow of time. It
is not time itself that flows. Each
IGUS has a different speed for the
flow of its information between
registers. This corresponds to differences in how fast time seems to flow. Flies,
for example, need to process more
information per second in order to fly
so quickly but still avoid common
obstacles; therefore, they perceive
time as going slower. To us, a fly's
perception of time would look like
slow motion. Flies only live for a few
days, or a few weeks as a maximum, and
to a human, this is a very short
lifetime. But to a fly, this feels a
lot longer.
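The register model described above can be sketched as a toy program. Everything here (the class, method names, register capacities) is invented purely for illustration of the idea that the 'present' is an input register whose contents must be pushed into memory before the next frame arrives:

```python
from collections import deque

class IGUS:
    """Toy model of an 'information gathering and utilising system':
    frames of sensory input flow from an input register (experienced
    as the 'present') into memory registers (remembered as the 'past')."""

    def __init__(self, short_term_capacity=2):
        self.input_register = None                       # the conscious 'present'
        self.short_term = deque(maxlen=short_term_capacity)
        self.long_term = []                              # the distant 'past'

    def perceive(self, frame):
        # The old present must be transferred to memory before the next
        # frame can be taken in -- this movement of information through
        # the registers is what creates the illusion of time 'flowing'.
        if self.input_register is not None:
            if len(self.short_term) == self.short_term.maxlen:
                self.long_term.append(self.short_term.popleft())
            self.short_term.append(self.input_register)
        self.input_register = frame

igus = IGUS(short_term_capacity=2)
for frame in ["frame0", "frame1", "frame2", "frame3"]:
    igus.perceive(frame)

print(igus.input_register)    # frame3 -- the 'present'
print(list(igus.short_term))  # ['frame1', 'frame2'] -- the recent past
print(igus.long_term)         # ['frame0'] -- the distant past
```

An IGUS with a faster transfer rate between registers would, on this picture, correspond to a creature (like the fly) that perceives time as passing more slowly.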
So the reason that we experience a
'flow' of time could just be because
of how we process information. It is a
competitive advantage to us as a
species to process information bits at
a time. It wouldn't make sense for us
to have evolved with the capability to
see all time simultaneously.
Digital Time, or, Is Time Like a
Movie?
You may have noticed a reference to
'frames' of time in the explanations
above. We usually think of time as
continuous - a smooth passage of
events. However, some physical theories suggest that space and time are not continuous at all. M-theory and Loop Quantum Gravity, for example, are both serious (though unproven) scientific theories stating that space and time have minimum units. There was even a theory of quantum mechanics suggesting that time is made of particles called 'chronons'!
The theorised minimum length of time
possible is called the Planck time and
is equivalent to 10⁻⁴³ seconds. When
space or time is 'digital' like this,
we say that it is 'discrete'.
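The Planck time can be computed from three fundamental constants as t_P = √(ħG/c⁵). A quick check of the figure above, using standard CODATA constant values (the formula and values are standard physics, not taken from this text):

```python
import math

# Fundamental constants in SI units (CODATA values)
hbar = 1.054571817e-34  # reduced Planck constant, J*s
G = 6.67430e-11         # Newtonian gravitational constant, m^3 kg^-1 s^-2
c = 2.99792458e8        # speed of light, m/s

# Planck time: the theorised smallest meaningful unit of time
t_planck = math.sqrt(hbar * G / c**5)
print(f"{t_planck:.3e} s")  # ~5.391e-44 s, i.e. of order 10⁻⁴³ s
```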
If this theory is true, then our
perception of time could be like a
movie. Movies are not continuous: if
you slow them down enough, you see
that they are just collections of
still photographs played in quick
succession. We process information
about our surroundings and obtain a
picture just like one frame of a movie
or animation. When 'played' in quick
succession, this creates the illusion
of smooth, continuous movement.
Is Time Really That Much Like Space?
So far, time has been seen as very
much like a dimension of space, and
its passage in one direction has been
seen as an illusion. But there are
some counter-arguments; there are
still some big differences between
time and space that cannot easily be
explained as illusions.
One way of supporting the idea that an
'arrow of time' is irrelevant is by
proving that all processes are the
same if done forwards or backwards. In
quantum mechanics, most interactions
between particles are 'time-symmetric'
- it doesn't matter whether you look at them from past to future or future
to past because they look the same.
But this is not true of macroscopic
objects. Wine glasses shatter, but you
rarely see shards of glass assemble
themselves into wine glasses.
Physicists can explain why shards of
glass do not form wine glasses by
postulating the existence of 'the
thermodynamic arrow of time'.
Thermodynamics is basically a
collection of laws. Here is how the
chemist PW Atkins summarises them:
There are four laws. The third of
them, the Second Law, was recognised
first; the first, the Zeroth law, was
formulated last; the First Law was
second; the Third Law might not even
be a law in the same sense as the
others. The gist of it is that the
universe is always becoming more
disordered. The disorder of the
universe is called 'entropy', so we
say that entropy is always increasing.
Nobody really knows why this is the
case, but we see it all the time in
experiments. This is why heat always
flows into colder areas, but never the
other way round. Heat is simply the
result of giving particles in a given
system more energy; they begin to move
and vibrate randomly, which is a
disordered state. Colder things are
more ordered because their constituent
particles tend to be harder to move.
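A toy calculation (all numbers invented for illustration) shows the one-way character of this arrow. Whenever a small amount of heat dQ flows from a hot body at T_hot to a cold one at T_cold, the total entropy changes by dS = dQ/T_cold − dQ/T_hot, which is positive as long as T_hot > T_cold, so repeated transfers can only push entropy up as the temperatures equalise:

```python
# Toy model of the thermodynamic arrow of time: heat flows from a
# hot body to a cold one, and total entropy can only increase.
T_hot, T_cold = 400.0, 200.0   # kelvin (made-up starting temperatures)
heat_capacity = 100.0          # J/K, assumed equal for both bodies
dQ = 1.0                       # joules transferred per step

total_entropy_change = 0.0
for _ in range(10_000):
    if T_hot - T_cold < 1e-9:  # temperatures have equalised; heat flow stops
        break
    # dS = dQ/T_cold - dQ/T_hot is always positive while T_hot > T_cold
    total_entropy_change += dQ / T_cold - dQ / T_hot
    T_hot -= dQ / heat_capacity
    T_cold += dQ / heat_capacity

print(round(T_hot), round(T_cold))  # 300 300 -- both bodies end up equal
print(total_entropy_change > 0)     # True -- entropy only went up
```

Running the process in reverse (heat flowing from cold to hot) would make every dS negative, which is exactly what the Second Law forbids; that asymmetry is the 'in-built arrow' the text refers to.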
This in-built arrow explains why
macroscopic objects have irreversible
interactions. This is a clear
difference from space. If you think of
the spatial manifestation of a table,
it does not follow that one end of the
table is more disordered than the
other, but it does follow that the
table will end up more disordered in
the future than when it has just been
made. Hence, there is a very distinct
difference between time and space.
Can Time Be Reversed?
If time's 'flow' in one direction
really is an illusion, what is there
stopping us from reversing it? In
theory, nothing! Lawrence Schulman of
Clarkson University in New York
firmly believes that time can run
backwards. In other words, shards of
glass can turn into wine glasses,
people grow younger and younger and
the universe gets smaller and smaller.
In fact, Schulman goes as far as to
say that such reversed-time zones can
exist as spaces within our own
universe. A computer simulation has
shown that regions with opposite time
arrows do not cancel each other out
and do not disturb each other at all.
The great thing about this theory is
that if a civilisation in a
reversed-time region kept records of
events that occur in our future, the
records might have survived to our
past (which is their future). Finding
these records could tell us the
future. This is, of course, a long
shot, but still a physical
possibility.
Another possibility is that the
universe's arrow of time (as far as
thermodynamics is concerned) will
naturally reverse itself at a crucial
point in the history of the universe.
At this point, the universe would
start to get smaller and everybody
would get younger until there was a
big crunch analogous to the big bang.
This creates a perfect symmetry to the
universe.
Again, there is little evidence that
shows us that reversed-time regions
exist, and there is no evidence that
the universe's thermodynamic arrow of
time will naturally reverse itself.
Equally, there is little evidence
against these theories either.
So what is time? Is it a dimension
just like space? Does it flow, or is
that just an illusion? Is time digital
like the frames of a movie, or does it
flow continuously? And can time really
be reversed or manipulated? None of
these questions can be answered with
definite confidence, but next time
somebody asks you what the time is,
perhaps you'll think of the answer
differently.
Resources