Predicting possible inputs leading to an output satisfying a certain condition - machine-learning

Suppose there is a data set of statistical data with a number of input columns and one output column. The predictors characterize a particular process that is repeated, so each data row corresponds to one occurrence of that process. For these process characteristics, order and duration matter: some may be absent entirely, and some may be repeated but with a different speed or other parameter.
Let's say our process is named P and it can have many child parts that together form the process. Say that, on one occasion, the process had N sub processes:
Sub process 1, with: speed = SpdA, duration = DurA, depth = DepA
Right after sub process 1, the next sub process happened:
Sub process 2, with: speed = SpdB, duration = DurB, depth = DepB
...
Sub process N.
So each process, that is, each data row, may contain from 1 to N child processes, and their number may vary from one row to another. That covers the input data.
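To make the structure concrete, here is a minimal sketch in Python of what two such rows might look like (the field names are hypothetical, chosen only for illustration):

row_1 = {
    "children": [  # ordered: sub process 1 happened first
        {"speed": 1.2, "duration": 30.0, "depth": 4.0},
        {"speed": 0.7, "duration": 12.5, "depth": 2.0},
    ],
    "output": 42.5,  # time by which the process finished successfully
}
row_2 = {
    "children": [  # a different occasion may have a different number of children
        {"speed": 2.1, "duration": 8.0, "depth": 1.5},
    ],
    "output": float("inf"),  # +infinity: this process failed
}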
As for the output: in the simplest case it is binary, either success or failure, but in reality it will be a non-negative number, from 0 to positive infinity. This number represents the time by which the process finished successfully. If the output value is positive infinity, it means the process failed to succeed.
An important note: if we go with the simplest case, where the output is binary, the rows in the statistical data set will mostly have failure as the output. The goal is to find the hypothetical values that the test predictors should take to make the process succeed.
For example, after learning, we should be able to tell which concrete input parameter values will most likely lead to process success. That was the simplest, binary-output case.
However, in real life the output will represent the time by which the process finished successfully, with +infinity meaning failure. The goal is the same: make the process succeed, or get as close to success as possible. We want to generate test inputs that we can use in the future to prevent an output of +infinity.
The ultimate goal is, given a target time, to find the exact input values that will make the process finish successfully as close to that time as possible. Here we expect the model to predict the set of child processes, their order, and the values for each child process.
In this problem, I suppose, the output will play the role of the input and the input will play the role of the output.
What is the approach to solving these problems? How do I handle the variable number of characteristics, and the order, which may vary from one data row to another?
I am a novice in machine learning and would appreciate concrete suggestions or examples of similar problems being solved.
Any help and advice welcome!

Related

Store machine status on Graphite time-series to later extract KPIs

We have a machine which sends (not regularly) its status values 0, 1, or 2, and we're storing them in Graphite. The status means:
0 - stopped
1 - working
2 - stopped by anomaly
The requested KPIs to extract are the classical ones: how much time was spent in status 0, 1, or 2 in a day or a week? Before reinventing the wheel, we're looking for the best way to compute those KPIs, and whether Graphite (or possibly another time-series solution) already has functions for summing the time during which the data point value satisfies a condition. Clearly the time intervals to sum are not stored; each interval is the time elapsed between one data point and the next.
Or should the data be pre-processed to compute the time intervals, and then stored as three data sets, like status.working, status.stopped, and status.alarm, where each records when the specific "event" started and how long it lasted?
There are other KPIs, for example the number of alarms in a day. Receiving two status data points in a row both indicating status "2" is actually a single alarm condition and must count as 1.
So, is there a best way to store such data without pre-processing it? It sounds like a common pattern, but (shame on us?) we have not found this topic well explored.
Thanks.
Graphite has a number of functions that could help you here. One that stands out is the summarize() function, to which you can pass an aggregation method (in this case sum) and a duration (in minutes/hours/days/weeks/etc.); take a look here.
isNonNull is another useful function: it can be used to determine the existence of a datapoint regardless of the value.
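For instance, assuming the status were stored under a metric named machine.status (a hypothetical name), render targets could look like:

summarize(machine.status, "1day", "sum")
isNonNull(machine.status)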
When you say that the machine reports a value 0 to indicate it has stopped - does it actually send that value, or does it report nothing? This is an important detail and will have some bearing on the end result of your solution.
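Separately, if you do decide to pre-process, here is a minimal Python sketch of the interval computation, assuming the raw data is a time-ordered list of (timestamp, status) pairs (names and sample values are illustrative):

from collections import defaultdict

# Time-ordered (unix_timestamp, status) pairs; 0 = stopped, 1 = working, 2 = anomaly.
points = [(1000, 1), (1600, 2), (1650, 2), (2000, 0), (2600, 1)]

time_in_status = defaultdict(int)  # seconds spent in each status
alarm_count = 0
prev_status = None

for (ts, status), (next_ts, _) in zip(points, points[1:]):
    # Each point "owns" the interval until the next point arrives;
    # the last point owns no interval until a newer one shows up.
    time_in_status[status] += next_ts - ts
    # Two consecutive "2" points count as a single alarm condition.
    if status == 2 and prev_status != 2:
        alarm_count += 1
    prev_status = status

print(dict(time_in_status))  # {1: 600, 2: 400, 0: 600}
print(alarm_count)           # 1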

Stateful LSTM - Hidden State transfer between and within batches (Keras)

I've been confused about how the hidden/cell states transfer within one batch when you have a batch_size > 1, and across batches.
My assumption currently is that hidden states never transfer from one series to another WITHIN a batch. I.e. when batch_size = 3, the hidden state of the first item is not passed to the second.
Next, setting stateful=True will mean that the hidden state of the 1st element of the 1st batch gets passed to the 1st element of the 2nd batch, and so on.
This is what the docs state:
If True, the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch.
I have been struggling to find confirmation of this, and it also doesn't seem very intuitive: assuming my data is sequential and I've batched it with a batch size of 3, I don't see why I would want the hidden state of the 1st unit to transfer to the 4th unit, the 2nd unit to the 5th, and so on.
Yes, it works as you described above.
And yes, you're right: from the perspective of a data stream t[], it is maybe not very intuitive to order the data like that:
Batch1 = [t[0], t[3], t[6]]
Batch2 = [t[1], t[4], t[7]]
Batch3 = [t[2], t[5], t[8]]
But I think it makes complete sense from a data-processing perspective. This ordering allows you to completely parallelize/vectorize the processing within each batch. In other words, [t[0], t[3], t[6]] can be processed in parallel, whereas the (sequential) sequence [t[0], t[1], t[2]] can only be processed one step after another.
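To make that concrete, here is a minimal Keras sketch of a stateful LSTM fed with exactly that ordering (the shapes, layer sizes, and random data are illustrative):

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

batch_size, timesteps, features = 3, 1, 8  # illustrative shapes

model = Sequential([
    # stateful=True: the final hidden/cell state of sample i in batch k
    # becomes the initial state of sample i in batch k+1.
    LSTM(16, stateful=True, batch_input_shape=(batch_size, timesteps, features)),
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# A single stream t[0..8] is laid out so that row i of consecutive
# batches holds consecutive steps of sub-sequence i:
t = np.random.rand(9, timesteps, features).astype("float32")
y = np.zeros((batch_size, 1), dtype="float32")  # dummy targets
batches = [t[[0, 3, 6]], t[[1, 4, 7]], t[[2, 5, 8]]]

for batch in batches:          # states carry over from batch to batch
    model.train_on_batch(batch, y)
model.reset_states()           # clear states at the end of the sequences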

Real-time pipeline feedback loop

I have a dataset with potentially corrupted/malicious data. The data is timestamped. I'm rating the data with a heuristic function. After a period of time I know that all new data items coming in with certain IDs need to be discarded, and they represent a significant portion of the data (up to 40%).
Right now I have two batch pipelines:
The first one just runs the rating over the data.
The second one first filters out the corrupted data and then runs the analysis.
I would like to switch from batch mode (say, running every day) to an online processing mode (hoping for a delay of under 10 minutes).
The second pipeline uses a global window, which makes processing easy. When a corrupted-data key is detected, all further records with that key are simply discarded (using the discarded keys from previous days as a pre-filter is also easy). Additionally, it makes decisions about the output data easier, since all historic data for a given key is available during processing.
The main question is: can I create a loop in a Dataflow DAG? Let's say I would like to accumulate the quality rates given to each session window I process, and if the rate sum for a key exceeds X, a filter function in an earlier stage of the pipeline should filter out that malicious key.
I know about side inputs, but I don't know whether they can change during runtime.
I'm aware that a DAG by definition cannot have cycles, but how can I achieve the same result without one?
One idea that comes to mind is to use a side output to mark an ID as malicious and to make a fake unbounded output/input: the output would dump the data to some storage, and the input would load it every hour and stream it so it can be joined.
Side inputs in the Beam programming model are windowed.
So you were on the right path: it seems reasonable to have a pipeline structured as two parts: 1) computing a detection model for the malicious data, and 2) taking the model as a side input and the data as a main input, and filtering the data according to the model. This second part of the pipeline will get the model for the matching window, which seems to be exactly what you want.
In fact, this is one of the main examples in the MillWheel paper (page 2), upon which Dataflow's streaming runner is based.
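A minimal sketch of that two-part shape in the Beam Python SDK; the rating function, the threshold X, and the fixed 10-minute window are hypothetical stand-ins:

import apache_beam as beam
from apache_beam.transforms import window

THRESHOLD = 100.0  # hypothetical "X"

def rate(record):
    return record["score"]  # hypothetical heuristic rating

def build(records):
    # records: an unbounded PCollection of dicts, each with an "id" key
    windowed = records | "Window" >> beam.WindowInto(window.FixedWindows(600))

    # Part 1: the "detection model" -- the malicious keys for this window.
    malicious = (
        windowed
        | "KeyRate" >> beam.Map(lambda r: (r["id"], rate(r)))
        | "SumRates" >> beam.CombinePerKey(sum)
        | "OverX" >> beam.Filter(lambda kv: kv[1] > THRESHOLD)
        | "Keys" >> beam.Map(lambda kv: kv[0])
    )

    # Part 2: filter the main input against the model. The side input is
    # windowed too, so each main-input window sees its matching model.
    return windowed | "Drop" >> beam.Filter(
        lambda r, bad: r["id"] not in bad,
        bad=beam.pvalue.AsList(malicious),
    )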

Is this a correct implementation of Q-Learning for Checkers?

I am trying to understand Q-Learning. My current algorithm operates as follows:
1. A lookup table is maintained that maps a state to information about its immediate reward and utility for each action available.
2. At each state, check whether it is contained in the lookup table, and initialise it if not (with a default utility of 0).
3. Choose an action to take:
with probability 1 − ϵ, choose the state-action pair with the highest utility;
with probability ϵ, choose a random move
(where ϵ, 0 < ϵ < 1, is the probability of taking a random action; ϵ decreases over time).
4. Update the current state's utility based on:
Q(sₜ, aₜ) += α·[rₜ₊₁ + γ·maxₐ Q(sₜ₊₁, a) − Q(sₜ, aₜ)], where α is the learning rate and γ the discount factor.
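For reference, a minimal tabular sketch of steps 1-4 in Python (the state/action encoding and the reward signal are left abstract and hypothetical):

import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9   # learning rate and discount factor
Q = defaultdict(float)    # lookup table: (state, action) -> utility, defaults to 0

def choose_action(state, actions, epsilon):
    # With probability epsilon take a random move, otherwise the greedy one.
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    # Q(s_t, a_t) += alpha * [r_{t+1} + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t)]
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])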
I am currently playing my agent against a simple heuristic player, who always takes the move that will give it the best immediate reward.
The results are very poor: even after a couple hundred games, the Q-Learning agent is losing a lot more than it is winning. Furthermore, the change in win rate is almost non-existent, especially after the first couple hundred games.
Am I missing something? I have implemented a couple agents:
(Rote-Learning, TD(0), TD(Lambda), Q-Learning)
But they all seem to be yielding similar, disappointing results.
There are on the order of 10²⁰ different states in checkers, and you need to play a whole game for every update, so it will be a very, very long time until you get meaningful action values this way. Generally, you'd want a generalizing function approximator, such as a neural network, over a simplified state representation to solve this kind of problem with reinforcement learning.
Also, a couple of caveats:
Ideally, you should update 1 value per game, because the moves in a single game are highly correlated.
You should initialize action values to small random values to avoid large policy changes from small Q updates.

If I called arc4random_uniform(6) at 5 o'clock OR 5:01, would I get the same number?

I'm making an iOS dice game and one beta tester said he liked the idea that the rolls were already predetermined, as I use arc4random_uniform(6). I'm not sure that they are. So, leaving aside the possibility that the code may choose the same number consecutively, would I generate a different number if I tapped the dice in 5 or 10 seconds' time?
Your tester was probably thinking of the idea that software random number generators are in fact pseudo-random. Their output is not truly random as a physical process like a die roll would be: it's determined by some state that the generators hold or are given.
One simple implementation of a PRNG is a "linear congruential generator": the function rand() in the standard library uses this technique. At its core, it is a straightforward mathematical function, and each output is generated by feeding in the previous one as input. It thus takes a "seed" value, and -- this is what your tester was thinking of -- the sequence of output values that you get is completely determined by the seed value.
If you create a simple C program using rand(), you can (must, in fact) use the companion function srand() (that's "seed rand") to give the LCG a starting value. If you use a constant as the seed value: srand(4), you will get the same values from rand(), in the same order, every time.
One common way to get an arbitrary -- note, not random -- seed for rand() is to use the current time: srand(time(NULL)). If you did that, and re-seeded and generated a number fast enough that the return of time() did not change, you would indeed see the same output from rand().
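To see that determinism concretely, here is a tiny Python sketch of an LCG; the constants are the classic textbook ones and are illustrative, not those of any particular rand() implementation:

def make_lcg(seed):
    # Each output is a pure function of the previous state,
    # so the seed fixes the entire sequence.
    state = seed
    def rand():
        nonlocal state
        state = (1103515245 * state + 12345) % 2**31
        return state
    return rand

a = make_lcg(4)
b = make_lcg(4)
print([a() for _ in range(3)])  # same seed ...
print([b() for _ in range(3)])  # ... same sequence, every time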
This doesn't apply to arc4random(): it does not use an LCG, and it does not share this trait with rand(). It was considered* "cryptographically secure"; that is, its output is indistinguishable from true, physical randomness.
This is partly due to the fact that arc4random() re-seeds itself as you use it, and the seeding is itself based on unpredictable data gathered by the OS. The state that determines the output is entirely internal to the algorithm; as a normal user (i.e., not an attacker) you don't view, set, or otherwise interact with that state.
So no, the output of arc4random() is not reliably repeatable by you. Pseudo-random algorithms which are repeatable do exist, however, and you can certainly use them for testing.
*Wikipedia notes that weaknesses have been found in the last few years, and that it may no longer be usable for cryptography. Should be fine for your game, though, as long as there's no money at stake!
Basically, it's random. No, it is not based on time. Apple has documented how it is randomized here: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/arc4random_uniform.3.html
