How to design an input matrix for binary classification in Keras

I want to predict lane changes with an RNN, based on sensor data extracted from vehicles. Therefore I have an input matrix in which every row contains bus data sampled periodically (about every 40 ms). Each column contains the data from a single sensor. In total, I have more than 300 columns of sensor data.
My output matrix holds the label data: for every row of the input it contains a boolean value indicating a lane change. So both matrices have the same number of rows.
Here for example:
..my input matrix:                             ..and output matrix:
+------+----------+----------+----------+---+  +-----------+
|  ##  | sensor 1 | sensor 2 | sensor 3 |...|  |lane change|
+------+----------+----------+----------+---+  +-----------+
|time 1|   15.0   |   42.0   |    1     |...|  |     0     |
+------+----------+----------+----------+---+  +-----------+
|time 2|   14.8   |   38.2   |    1     |...|  |     0     |
+------+----------+----------+----------+---+  +-----------+
|time 3|   14.3   |   27.0   |    0     |...|  |     1     |
+------+----------+----------+----------+---+  +-----------+
|time n|    .     |    .     |    .     |...|  |     .     |
For recurrent neural networks (RNNs) such as LSTM, Keras needs an input structure in the shape of
[samples, time steps, features]
Now my questions:
How do I have to reshape my input matrix to make it work with an LSTM?
Do I have to shift the whole input matrix down by one row for each timestep?
I suppose the last 2-3 seconds before a lane change could be interesting for prediction, so I need many timesteps, which would also mean many copies of my input matrix.
Here is what I have in mind:
Input matrix at the next timestep (rows shifted down by 1 line):
+------+----------+----------+----------+---+
|  ##  | sensor 1 | sensor 2 | sensor 3 |...|
+------+----------+----------+----------+---+
|time 1|   0.0    |   0.0    |    0     |...|
+------+----------+----------+----------+---+
|time 2|   15.0   |   42.0   |    1     |...|
+------+----------+----------+----------+---+
|time 3|   14.8   |   38.2   |    1     |...|
+------+----------+----------+----------+---+
|time 4|   14.3   |   27.0   |    0     |...|
Input matrix at the timestep after that (rows shifted down by 2 lines):
+------+----------+----------+----------+---+
|  ##  | sensor 1 | sensor 2 | sensor 3 |...|
+------+----------+----------+----------+---+
|time 1|   0.0    |   0.0    |    0     |...|
+------+----------+----------+----------+---+
|time 2|   0.0    |   0.0    |    0     |...|
+------+----------+----------+----------+---+
|time 3|   15.0   |   42.0   |    1     |...|
+------+----------+----------+----------+---+
|time 4|   14.8   |   38.2   |    1     |...|
Is this right?
Please note that I will try to reduce the sensor data later. That is not my primary aim at the moment.

From what I understand, you need to do the following:
Decide how much history the network needs to see in order to predict. That is, you show it n steps and it predicts step n+1.
Split your matrix into multiple matrices of n rows each, and save the lane change value of row n+1 as the label for each one.
You should now have a list of k matrices, each of shape (n, m_sensor), which together have the shape [k_samples, n_steps, m_sensor]. This is exactly what the RNN needs as input.
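
The splitting described above can be sketched with NumPy as follows. This is only an illustration under a few assumptions: the helper name make_windows, the toy values, and the use of overlapping windows (one window starting at every row) are my own choices and are not stated in the question or in the steps above.

```python
import numpy as np

def make_windows(X, y, n_steps):
    """Turn a (rows, sensors) matrix into (samples, n_steps, sensors) windows.

    Window i covers rows i .. i+n_steps-1; its label is y[i + n_steps],
    i.e. the lane change value of the row right after the window.
    (make_windows is a hypothetical helper name, not a Keras function.)
    """
    X = np.asarray(X)
    y = np.asarray(y)
    k = len(X) - n_steps  # windows that still have a following row for the label
    if k <= 0:
        raise ValueError("need more than n_steps rows")
    windows = np.stack([X[i:i + n_steps] for i in range(k)])
    labels = y[n_steps:]
    return windows, labels

# Toy data: 6 time steps, 3 sensors (values are made up for illustration)
X = np.arange(18, dtype=float).reshape(6, 3)
y = np.array([0, 0, 0, 1, 0, 1])

windows, labels = make_windows(X, y, n_steps=2)
print(windows.shape)  # (4, 2, 3) -> [samples, time steps, features]
print(labels)         # [0 1 0 1]
```

Each window is taken as a slice of the original matrix, so zero-padded shifted copies of the whole matrix are not needed. With one row about every 40 ms, the 2-3 seconds mentioned in the question would correspond to roughly 50 to 75 time steps per window.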

Related

Classification with Integers and Types

Let's say we have the following dataset
Label | Features |
-----------------------------------
Age | Size | Weight | shoeSize |
20 | 180 | 80 | 42 |
40 | 173 | 56 | 38 |
As far as I know, features in machine learning should be normalized, and the ones mentioned above can be normalized easily. But what if I want to extend the feature list with, for example, the following features?
| Gender | Ethnicity |
| 0 | 1 |
| 1 | 2 |
| 0 | 3 |
| 0 | 2 |
Here the Gender values 0 and 1 stand for female and male, and the Ethnicity values 1, 2 and 3 stand for Asian, Hispanic and European. Since these values refer to types, I am not sure whether they can be normalized.
If they cannot be normalized, how can I handle mixing values like the size with types like the ethnicity?

Mapping timeseries+static information into an ML model (XGBoost)

So let's say I have multiple probs, where one prob has two input DataFrames:
Input:
One constant stream of data (e.g. from a sensor). In a second step: multiple streams from multiple sensors.
> df_prob1_stream1
timestamp | ident | measure1 | measure2 | total_amount |
----------------------------+--------+--------------+----------+--------------+
2019-09-16 20:00:10.053174 | A | 0.380 | 0.08 | 2952618 |
2019-09-16 20:00:00.080592 | A | 0.300 | 0.11 | 2982228 |
... (1 million more rows - until a pre-defined ts) ...
One static DataFrame of information, mapped to a unique identifier called ident, which needs to be matched to the ident column in each df_probX_streamX so that the system recognizes that this data is related.
> df_global
ident | some1 | some2 | some3 |
--------+--------------+----------+--------------+
A | LARGE | 8137 | 1 |
B | SMALL | 1234 | 2 |
Output:
A binary classifier [0,1]
So how can I suitably train XGBoost to make the best use of one time series DataFrame in combination with one static DataFrame (containing additional context information) in one prob? Any help would be appreciated.
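
The mapping of the static information onto the stream via ident, as described above, could be sketched with a pandas left join. This is a minimal sketch that only covers the join step: the two frames below are rebuilt from the sample rows shown in the question, and the variable name merged is an assumption for illustration. It does not address how the time series itself should be turned into features for XGBoost.

```python
import pandas as pd

# Two sample rows of the stream, copied from the question
df_prob1_stream1 = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2019-09-16 20:00:10.053174",
        "2019-09-16 20:00:00.080592",
    ]),
    "ident": ["A", "A"],
    "measure1": [0.380, 0.300],
    "measure2": [0.08, 0.11],
    "total_amount": [2952618, 2982228],
})

# Static context information, one row per ident
df_global = pd.DataFrame({
    "ident": ["A", "B"],
    "some1": ["LARGE", "SMALL"],
    "some2": [8137, 1234],
    "some3": [1, 2],
})

# Left join: every stream row keeps its values and gains the static columns
merged = df_prob1_stream1.merge(df_global, on="ident", how="left")
print(merged.columns.tolist())
```

After the join, a text column such as some1 would still need a numeric or categorical encoding before it can be passed to a model.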

How can we convert time series data into a supervised learning problem?

I am preparing data for a machine learning model. I want to treat time series data as a normal supervised learning prediction problem. Let's say I have car speed data for several car models, such as
+-----+---------+-------------+
| day | Model | Speed |
+-----+---------+-------------+
| 1 | Bentley | 20.47 km/h |
| 2 | Bentley | 32.22 km/h |
| 3 | Bentley | 23.11 km/h |
| 1 | BMW | 37.60 km/h |
| 2 | BMW | 27.90 km/h |
| 3 | BMW | 40.47 km/h |
I want to include several car models in the training so that my machine learning model can predict the speed for both Bentley and BMW.
I have converted the data for training like this:
+---------+------------+------------+-------------------+
| Model | day_1 | day_2 | label == day_3 |
+---------+------------+------------+-------------------+
| Bentley | 20.47 km/h | 32.22 km/h | 23.11 km/h |
| BMW | 37.60 km/h | 27.90 km/h | 40.47 km/h |
+---------+------------+------------+-------------------+
Is this a correct approach?
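
The conversion shown in the two tables above can be reproduced with a pandas pivot. This is a sketch under assumptions: the speeds are stored as plain floats (the "km/h" unit is dropped), and the variable names df and wide are chosen for illustration only.

```python
import pandas as pd

# Long format: one row per (day, Model), speeds in km/h as plain floats
df = pd.DataFrame({
    "day": [1, 2, 3, 1, 2, 3],
    "Model": ["Bentley"] * 3 + ["BMW"] * 3,
    "Speed": [20.47, 32.22, 23.11, 37.60, 27.90, 40.47],
})

# Wide format: one row per Model, one column per day
wide = df.pivot(index="Model", columns="day", values="Speed")
wide.columns = [f"day_{d}" for d in wide.columns]

# Treat the last day as the value to predict
wide = wide.rename(columns={"day_3": "label"}).reset_index()
print(wide)
```

Here day_1 and day_2 act as the input features and label is the target, one row per car model.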

How to set more conditions (targets) in the Time Series Node in SPSS Modeler?

Could you please advise whether it is possible to calculate predictions in SPSS Modeler when the model has two conditions,
i.e. we need to calculate the future values for the respective ID and at the same time see the split per Var1?
So far we have used the Time Series node, but there we set just one target, Value (currency1). Would it be possible to have the output in a format where we get one figure as the prediction for the respective ID, with Var1 also reflected in the split? We need this split per Var1 because one ID can have several values in Var1, unlike Var3, where just one value is assigned to each ID.
ID | Value (currency1) | Value (currency2) | Period | Var1 | Var2 | Var3
---------------------------------------------------------------------------
U1 | 1000 | 1200 | 1/1/2000 | 100 | abc | 1p1
U1 | 500 | 600 | 2/1/2000 | 100 | abc | 1p1
U1 | 700 | 840 | 3/1/2000 | 200 | def | 1p1
U2 | 500 | 600 | 1/1/2000 | 100 | ghj | 1p2
U2 | 800 | 960 | 4/1/2000 | 300 | abc | 1p2
Thank you very much in advance for any help / advice.

if statement to check other rows - Google Docs

I have a spreadsheet along this format:
------------------------------------------------
| A | B | C | D |
------------------------------------------------
| type | wins | loss | ratio |
------------------------------------------------
| cat | 1 | | 1 |
------------------------------------------------
| dog | 2 | | 2 |
------------------------------------------------
| rabbit | | 1 | -1 |
------------------------------------------------
| dog | 1 | | 1 |
------------------------------------------------
| horse | 1 | | 1 |
------------------------------------------------
| dog | | 2 | -2 |
------------------------------------------------
What I want is to check whether the "A" column contains the word "dog" and, if it does, count the number of times it has won in the table.
So, the formula would output:
dog total wins: 2
That is because it won the first time it appeared in the table, and it won the second time too. If there is a winner, the loss cell will be empty, and vice versa. I don't need it to add up the wins and give me the total sum.
Then, as a second part, I need it to give me the average ratio for "dog". The formula needs to check whether the "A" column contains "dog" and, if it does, add up all the "dog" ratios and divide by how many there are.
I have the formula to count how many times "dog" appears, but I'm stumped on the rest!
=COUNTIF(A2:A;"dog")
Can anyone advise on the correct formula, please?

I would do this:
FIRST PART:
=counta(filter(B:B,A:A="dog"))
SECOND PART:
=sum(filter(D:D,A:A="dog"))/countif(A:A;"dog")
Hope this helps.
