Predict car accidents with TensorFlow - machine-learning

The task is fairly simple. From what I know of machine learning it should be possible, I just don't know how to do it.
Basically, I want to predict how many car accidents there will be in my city. I have historical data on weather conditions and accident counts, and I want to use the latest accident data to test or validate my model.
weather = [["20150601 130100", 23, 60],  # ["yearmonthday hoursminssecs", temperature_C, humidity_%]
           ["20150601 130100", 23, 50],
           ["20150601 130200", 23, 51],
           # ...
           ["20150601 132300", 23, 49]]
accidents = [["20150601 130700", 1],  # ["yearmonthday hoursminssecs", count_of_accidents]
             ["20150601 131000", 2],
             ["20150601 131100", 1],
             # ...
             ["20150601 132300", 1]]
So now I want to predict the accident count for every minute based on temperature and humidity per date (note that sometimes the input data is not provided every minute, and there are time gaps). To improve my model I want to feed it with new accident and weather data each day.
The bottom line is that at the end we'll have a program that can say when there will be an accident based on weather, and thus whether it is safe to drive today. In the future I'll update it with other data sets, but for now let's train it this way.
So the question is: how do I make this happen in TensorFlow? Can someone please help?
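Before any model can be trained, the two series above need to be aligned on a common per-minute grid, with the time gaps handled. A minimal sketch of that preprocessing, assuming pandas and toy data in the question's format (column names are illustrative assumptions):

```python
import pandas as pd

# Toy data matching the question's format (timestamps as strings).
weather = pd.DataFrame(
    [["20150601 130100", 23, 60], ["20150601 130200", 23, 51]],
    columns=["timestamp", "temp_c", "humidity"],
)
accidents = pd.DataFrame(
    [["20150601 130100", 1], ["20150601 130200", 2]],
    columns=["timestamp", "count"],
)
for df in (weather, accidents):
    df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y%m%d %H%M%S")
    df.set_index("timestamp", inplace=True)

# Resample both onto a 1-minute grid: forward-fill weather across gaps,
# and treat minutes with no recorded accidents as zero accidents.
weather = weather.resample("1min").mean().ffill()
accidents = accidents.resample("1min").sum().fillna(0)
data = weather.join(accidents)
```

The joined frame `data` then has one row per minute with weather features and an accident count target, which is the shape an RNN windowing step would consume.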

There will be many ways to handle this. However, since you have series data, and car accidents may actually be related to the weather at time t-n, an RNN might be a good start.
Please see the RNN (LSTM) based classification example at https://github.com/nlintz/TensorFlow-Tutorials/blob/master/7_lstm.py.
I'm also interested in this. Let me know how it goes.
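To make the RNN suggestion concrete, here is a minimal Keras LSTM sketch for this kind of regression. The window length, layer sizes, and random data are illustrative assumptions, not tuned values:

```python
import numpy as np
import tensorflow as tf

# Toy setup: predict the accident count for the next minute from the
# last 10 minutes of [temperature, humidity] readings.
window, n_features = 10, 2
X = np.random.rand(100, window, n_features).astype("float32")
y = np.random.rand(100, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(window, n_features)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),  # regression head: predicted accident count
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```

Each training example is a sliding window over the per-minute series, so the model can pick up lagged weather effects rather than only the current minute's reading.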

Related

Aggregating daily to weekly data with categorical variables

My question doesn't regard any particular software; it's a broad question that could concern any type of data mining problem.
I have a data set with daily data and a bunch of attributes, like the example below. 'Sales' is numeric and represents the revenue of sales on a given day. 'Open' is categorical and indicates whether a store is open (=1) or closed (=0). And 'Promo' is categorical, stating whether a type of promo is happening on the given day (it takes the values a, b and c).
day          sales   open   promo
06/12/2022   15      1      a
05/12/2022   0       0      a
04/12/2022   12      1      b
Now, my goal is to develop a model that predicts weekly sales. In order to do this, I will need to aggregate daily data into weekly data.
For the variable sales this is quite straightforward, because the value of weekly sales is the sum of daily sales within a certain week.
My question regards the categorical variables (open and promo): what kind of aggregation function should I use? I have tried converting the variables to numerical values and using the weekly mean as an aggregation method for these attributes, but I don't know if this is a common approach.
I would like to know if anyone knows the best/usual way to tackle this.
Thanks, anyway!
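One common pattern for exactly this aggregation, sketched with pandas (one-hot encoding the promo column first is an assumption, not the only option): sum the additive measure, and take the mean of the 0/1 columns so they become "fraction of days in the week".

```python
import pandas as pd

# Toy data matching the question's table.
df = pd.DataFrame({
    "day": pd.to_datetime(["2022-12-06", "2022-12-05", "2022-12-04"]),
    "sales": [15, 0, 12],
    "open": [1, 0, 1],
    "promo": ["a", "a", "b"],
})
df = pd.get_dummies(df, columns=["promo"])  # adds promo_a, promo_b

weekly = df.resample("W", on="day").agg({
    "sales": "sum",     # additive: total weekly revenue
    "open": "mean",     # fraction of days the store was open
    "promo_a": "mean",  # fraction of days promo 'a' ran
    "promo_b": "mean",  # fraction of days promo 'b' ran
})
```

The mean of a 0/1 indicator has a clean interpretation ("open 5 out of 7 days" becomes 0.71), which is why averaging converted categoricals, as the asker tried, is indeed a standard approach.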

What should I do with date column in dataset?

Actually I'm working on the Australia weather dataset to predict whether it will rain tomorrow or not.
I'm new to machine learning and I don't know what to do with the date column in my dataset, because I know machines only take numerical values.
So please tell me how I should deal with this date column.
You can extract the year, month and day number of the date as categorical variables. It can be significant which month or year it is, or whether it is the beginning or the end of the month.
If you tell me what language you are using, I can help you with code :)
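In case the asker is using Python, a minimal sketch of that extraction with pandas (the column name 'Date' is an assumption based on the common Kaggle version of the dataset):

```python
import pandas as pd

# Toy frame standing in for the weather dataset's date column.
df = pd.DataFrame({"Date": ["2017-06-25", "2017-06-26"]})
df["Date"] = pd.to_datetime(df["Date"])

# Derive numeric features the model can consume.
df["year"] = df["Date"].dt.year
df["month"] = df["Date"].dt.month
df["day"] = df["Date"].dt.day
df["dayofweek"] = df["Date"].dt.dayofweek  # 0 = Monday
df = df.drop(columns=["Date"])
```

Day-of-week is often worth adding alongside year/month/day, since weekly cycles show up in many datasets.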
Hey Navin Bondade,
When you use one-hot encoding and the column contains too many values, too many columns automatically pile up in your dataset. So you can pick the 10 or 15 most frequent values in that particular feature, encode only those, and leave the rest out. I haven't used it personally, but I did see a team win the KDD Orange Cup with this technique.
Happy Learning !!
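A sketch of that "encode only the most frequent categories" idea (keeping the top 2 here for brevity; in practice 10-15 as the answer suggests):

```python
import pandas as pd

# Toy high-cardinality categorical column.
s = pd.Series(["a", "a", "a", "b", "b", "c", "d"])

# Keep only the N most frequent values and one-hot encode just those;
# rarer values ('c', 'd') simply get all-zero rows.
top = s.value_counts().nlargest(2).index
encoded = pd.DataFrame({f"is_{v}": (s == v).astype(int) for v in top})
```

Rows whose value was dropped end up as all zeros, which acts as an implicit "other" bucket.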

Predicting next 4 quater customer count based on last 3 years quarterly customer count

I am currently working on a project where I need to predict the next 4 quarters' customer count for a retail client, based on the previous customer count of the last three years, i.e. quarterly data, a total of 12 data points. Please suggest the best approach to predict the customer count for the next 4 quarters.
Note: I can't share the data, but the customer count has a declining trend YOY.
Please let me know if more information is required or the question is not clear.
With only 12 data points you would be hard-pushed to justify anything more than a simple regression analysis.
If the declining trend was so strong that you were at risk of passing below 0 sales you could look at taking a log to linearise the data.
If there is a strong seasonal cycle you will need to factor that in, but doing so also reduces the effective sample size from 12 to 9 quarters of data (three degrees of freedom being used up by the seasonalisation).
That's about it really.
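The answer's recipe (simple regression, a log transform to linearise the decline, quarterly dummies for seasonality) can be sketched with plain NumPy least squares. The data values here are illustrative assumptions, since the real counts can't be shared:

```python
import numpy as np

# 12 quarterly customer counts with a declining trend and a seasonal cycle.
counts = np.array([120, 100, 95, 110, 105, 88, 84, 97, 92, 77, 74, 85], float)
t = np.arange(12)
quarter = t % 4

# Design matrix: intercept, linear trend, dummies for quarters 2-4
# (three degrees of freedom spent on seasonality, as the answer notes).
X = np.column_stack(
    [np.ones(12), t] + [(quarter == q).astype(float) for q in (1, 2, 3)]
)
y = np.log(counts)  # log to linearise the decline
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Forecast the next 4 quarters, transforming back from log scale.
t_new = np.arange(12, 16)
X_new = np.column_stack(
    [np.ones(4), t_new] + [((t_new % 4) == q).astype(float) for q in (1, 2, 3)]
)
forecast = np.exp(X_new @ beta)
```

Fitting in log space also guarantees the back-transformed forecasts stay above zero, which addresses the risk of the trend line crossing below 0.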
You don't specify explicitly how far in the future you want to make your predictions; rather, you do that implicitly when you make sure your model is robust and does not over-fit.
What does that mean?
Make sure that the distribution of labels given your available independent variables is similar to the distribution you expect in the future. You can't expect your model to learn patterns that were not there in the first place. So variables that carry the same information for distinct customer-count values 4 quarters in the future are what you want to include.

Analyzing raw data from a Google Sheet into a 'dashboard'

Situation:
Every time I visit a member of my field team, I put their 'score' into a Google Form, which then puts the raw data into this Sheet.
I'd like a second Dashboard sheet that has:
all of the raw data, plus a calculation of their "overall score" (an average of parts A, B, C, and D)
a way to easily see the average score of each person per quarter (Q4 2016, Q1 2017)
a way to easily see the average score for each type of Observation (Live vs scenario #1, 2, 3)
a drop-down where I can select a user and see their scores on a chart compared to the rest of the group's averages
I've done some of the work [here], but would love some help figuring out the best way to do this (keeping in mind the performance of the sheet, considering I'd eventually have thousands of rows of raw data).
*Things to note:*
* I might score one person twice in a row before getting to the next person
* I might score one person twice in a month (I'm not sure how to show that in the Dashboard)
Thank you in advance for your help. I'm trying to learn as much as I can, but it's all still pretty new to me.

Data warehouse fact measurements that cannot be meaningfully aggregated over time?

Is there an example of a time-varying numerical quantity that might be in a data warehouse that cannot be meaningfully aggregated over time? If so why?
Stock levels cannot, because they represent a value that is already an aggregation at a particular moment in time.
If you have ten items in stock today and ten yesterday, and ten in stock every day this week, you cannot add them up to "70" meaningfully for the whole week, unless you are measuring something like space utilisation efficiency.
Other examples: bank balance, or speed of flywheel, or time since overhaul.
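The stock-level example can be made concrete with a tiny pandas sketch: for such semi-additive measures, the sum over time is meaningless, while the average or the closing snapshot is valid (the data here is the toy "ten in stock every day" scenario from the answer):

```python
import pandas as pd

# Ten items in stock on each of seven consecutive days.
stock = pd.Series([10] * 7, index=pd.date_range("2023-01-02", periods=7))

weekly_sum = stock.sum()   # 70 -- NOT a meaningful stock figure
weekly_avg = stock.mean()  # 10 -- average stock on hand that week
closing = stock.iloc[-1]   # 10 -- end-of-week snapshot
```

This is why warehouse tools typically offer "average" and "last value" aggregations for such measures instead of defaulting to "sum".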
Many subatomic processes can be observed using our notion of "time" but probably wouldn't make much sense when aggregated. This is because our notion of "time" doesn't make much sense at the quantum level.
