How can I define period for weekly data in SPSS? - time-series

I have weekly data and would like to perform time series analysis on it. According to Rob J. Hyndman, the period of weekly data can be approximated as 365.25/7 ≈ 52.18, or about 52.
How can I define the period in SPSS for weekly data? Below is a sample of the data; the dates are in mm/dd/yyyy format:
date,count
04/16/2013,17
04/23/2013,13
04/30/2013,13
.
.
.
09/27/2016,20

COMPUTE week=XDATE.WEEK(date).
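The arithmetic behind the period can also be checked outside SPSS. Here is a minimal Python sketch, assuming the sample dates from the question are mm/dd/yyyy; the variable names are only illustrative and nothing here is SPSS syntax:

```python
from datetime import datetime

# First three sample rows from the question, read as mm/dd/yyyy.
rows = ["04/16/2013", "04/23/2013", "04/30/2013"]
dates = [datetime.strptime(r, "%m/%d/%Y") for r in rows]

# Gap between consecutive observations, in days (7 means weekly data).
gaps = [(b - a).days for a, b in zip(dates, dates[1:])]

# Average number of weekly observations in a year.
period_exact = 365.25 / 7
period_rounded = round(period_exact)

print(gaps, round(period_exact, 2), period_rounded)
```

Because the exact value is not an integer, a seasonal period of 52 is an approximation that drifts slightly over several years.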

Related

Time series- Not periodic, despite having included frequency

This is actually part of my thesis research, where I have to run a time series analysis on pollution and economic growth of a single country.
I have data of over 144 years of the two variables with each value representing a single year. I imported, set the values as numeric and attached the dataset through the console and ran:
ts_gdp = ts(data=`GDP per capita`, start=1871, end=2014, frequency=1, names=gdp)
I can see all the values for the first variable, but when I follow up with stl() I get the error below. Any clues why this shows up, although I have set frequency=1, which is the number of observations per unit of time (in this case, a year)? Thank you in advance!
Error in stl(GDP, s.window = "periodic") :
series is not periodic or has less than two periods
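The wording of the error suggests a simple precondition: the series needs a seasonal period of at least 2 and more than two full periods of data. The Python sketch below mirrors that reading. It is an assumption based on the message text, not a copy of R's implementation, and can_decompose is a hypothetical helper:

```python
def can_decompose(n_obs: int, frequency: int) -> bool:
    """Assumed precondition: period >= 2 and more than two full periods of data."""
    return frequency >= 2 and n_obs > 2 * frequency

n_years = 2014 - 1871 + 1  # 144 annual observations, as in the question

# Annual data with frequency=1 has one observation per cycle, so there is
# no within-cycle pattern for a seasonal decomposition to estimate.
print(can_decompose(n_years, 1))

# Hypothetical monthly data over the same span would pass the check.
print(can_decompose(n_years * 12, 12))
```

Under this reading, frequency=1 is the cause of the error: yearly values carry no seasonal component, so a trend-only method would be the natural choice.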

ML.NET - Normalizing date time data

The task is simply to forecast some feature value Y (say, of type float) at a specific time T.
Currently I have simple two-column data like:
2019-10-18 10:00 | 1.0
2019-10-18 12:00 | 2.5
and so on.
The input could, for example, represent the values of the sinusoid f(x)=sin(x) over time.
I'm interested in how to convert a date-time series in ML.NET so that I can later ask the engine to predict the feature value Y for a given date-time T (maybe given as a Unix timestamp?).
I would recommend converting to Unix timestamps, yes. ML.NET algorithms use floats as features, so timestamps will work fine.
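To illustrate the conversion itself, here is a minimal sketch in Python rather than ML.NET. It assumes the timestamps are in UTC, and the helper names are hypothetical. It also shows one thing that may be worth checking: if features end up stored as 32-bit floats, a raw Unix timestamp of this size is only resolved to roughly two minutes, so subtracting a reference time first keeps more precision.

```python
import struct
from datetime import datetime, timezone

def to_unix_seconds(text: str) -> int:
    # Assumption: timestamps are UTC; change tzinfo if they are local times.
    dt = datetime.strptime(text, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())

def as_float32(value: float) -> float:
    # Round-trip through a 32-bit float to see what precision survives.
    return struct.unpack("f", struct.pack("f", value))[0]

rows = [("2019-10-18 10:00", 1.0), ("2019-10-18 12:00", 2.5)]
stamps = [to_unix_seconds(t) for t, _ in rows]

# Seconds elapsed since the first observation: small values, exact in float32.
origin = stamps[0]
offsets = [s - origin for s in stamps]

print(stamps, offsets)
print(as_float32(stamps[0]) - stamps[0])  # rounding error of the raw timestamp
```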

Extract some keywords like rent, deposit, liabilities etc. from unstructured document

I am writing an algorithm to extract keywords like rent, deposit, liabilities, etc. from a rent agreement document. I used a naive Bayes classifier, but it is not giving the desired output.
My training data looks like this:
train = [
("refundable security deposit Rs 50000 numbers equal 5 months","deposit"),
("Lessee pay one month's advance rent Lessor","security"),
("eleven (11) months commencing 1st march 2019","duration"),
("commence 15th feb 2019 valid till 14th jan 2020","startdate")]
The code below does not return the desired keyword:
classifier.classify(test_data_features)
Please share if there are any libraries in NLP to accomplish this.
It seems you need to build your own NER (Named Entity Recognizer) for parsing your unstructured documents, where every word of a sentence is tagged with a label. Based on the surrounding words and the context window, your trained NER will be able to give you the results you are looking for.
Check the Stanford CoreNLP implementation of NER.
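To make "tag every word with a label" concrete, here is a small Python sketch of token-level training labels. The BIO scheme (B- for the first token of an entity, I- for the rest, O for everything else) is a common convention that I am assuming here; it is not something specific to CoreNLP, and the DEPOSIT label and the span annotation are hypothetical:

```python
def bio_tags(tokens, spans):
    """Turn (start, end, label) token spans into one BIO tag per token."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the entity
    return tags

# Part of the first training sentence from the question.
tokens = "refundable security deposit Rs 50000".split()

# Hypothetical annotation: tokens 3..4 ("Rs 50000") are the deposit amount.
spans = [(3, 5, "DEPOSIT")]

tagged = list(zip(tokens, bio_tags(tokens, spans)))
for token, tag in tagged:
    print(token, tag)
```

The difference from the sentence-level classifier in the question is that each word gets its own label, so the model can point at the exact value rather than only name a category.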

Binning time values in SPSS modeler

I have a Time (24-hour format) column in my dataset and I would like to use SPSS Modeler to bin the times into the respective parts of the day.
For example, 0500-0900 = early morning ; 1000-1200 = late morning ; 1300-1500 = afternoon
May I know how to go about doing that? Here is what my Time column looks like -
Here is how to read the data - e.g. 824 = 0824AM ; 46 = 0046AM
I've actually tried to use the Binning node by adjusting the bin-width in SPSS modeler and here's the result:
It's weird because I do not have any negative data in my dataset but the starting number of bin 1 is a negative amount as shown in the photo.
The images that you added are blocked for me, but here's an idea for a solution:
Create a Derive node with an expression similar to this (new categorical variable):
if (TIME >= 500 and TIME <= 900) then 'early morning' elseif (TIME >= 1000 and TIME <= 1200) then 'late morning' else 'afternoon' endif
Hope this helps.
You can easily export the bins (generate a Derive node from the window shown in the image) and edit the boundaries according to your needs. Or try another binning method that fits your expected output better.
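The same range logic can be written as a short Python sketch, which may help when checking the boundaries before building the Derive node. It uses the ranges listed in the question; the "other" label for times outside those ranges is my assumption, since the question does not say how they should be handled:

```python
def part_of_day(hhmm: int) -> str:
    """Map an integer clock time such as 824 (08:24) or 46 (00:46) to a label."""
    # Both bounds must hold at once, hence chained comparisons (logical AND).
    if 500 <= hhmm <= 900:
        return "early morning"
    if 1000 <= hhmm <= 1200:
        return "late morning"
    if 1300 <= hhmm <= 1500:
        return "afternoon"
    return "other"  # assumption: anything outside the listed ranges

for value in (824, 46, 1100, 1400, 930):
    print(value, part_of_day(value))
```

Note that the listed ranges leave gaps (for example 0901-0959), so some fallback label is needed whichever tool you use.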

Cyclic ordinal features in random forest

How do you prepare cyclic ordinal features like time in a day or day in a week for the random forest algorithm?
If time is encoded simply as minutes after midnight, the numerical difference between 23:55 and 00:05 will be very large, although they are only 10 minutes apart.
I found a solution here where the time feature is split into two features using the cosine and sine of the seconds-after-midnight feature. But will that be appropriate for random forest? With random forest, one can't be sure that all features will be present for every split, so half of the time information will often be missing for a decision.
Looking forward to your thoughts!
If you have a date variable, with values like '2019/11/09', you can extract individual features like year (2019), month (11), day (09), day of the week (Saturday), quarter (4), semester (2). You can go ahead and add additional features like "is bank holiday", "is weekend", or "advertisement campaign", if you know the dates of specific events.
If you have a time variable with values like 23:55, you can extract the hour (23), the minutes (55) and, if present, seconds, nanoseconds, etc. If you have information about the timezone, you can extract that as well.
If you have datetime variable with values like '2019/11/09 23:55', you can combine the above.
If you have more than 1 datetime variable, you can capture differences between them, for example if you have date of birth, and date of application, you can determine the feature "age at time of application".
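The extractions listed above can be sketched with Python's standard library alone. The function and key names are hypothetical, and the birth date in the usage lines is made up for illustration:

```python
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def date_features(text: str) -> dict:
    """Split a 'YYYY/MM/DD HH:MM' string into simple calendar features."""
    dt = datetime.strptime(text, "%Y/%m/%d %H:%M")
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "day_of_week": DAY_NAMES[dt.weekday()],
        "quarter": (dt.month - 1) // 3 + 1,
        "semester": 1 if dt.month <= 6 else 2,
        "is_weekend": dt.weekday() >= 5,
        "hour": dt.hour,
        "minute": dt.minute,
    }

def age_in_years(birth: datetime, application: datetime) -> int:
    """Whole years between two dates, e.g. age at time of application."""
    had_birthday = (application.month, application.day) >= (birth.month, birth.day)
    return application.year - birth.year - (0 if had_birthday else 1)

features = date_features("2019/11/09 23:55")
print(features)
print(age_in_years(datetime(1990, 12, 1), datetime(2019, 11, 9)))
```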
More info about the options for datetime can be found in the pandas dt accessor documentation. Check methods here.
The cyclical transformation in your link is used to re-code circular variables like hours of the day or months of the year, where for example December (month 12) is closer to January (month 1) than to July (month 7), whereas if you encode them as plain numbers, this relationship is not captured. You would use this transformation if this is what you want to represent. But it is not the standard go-to method for transforming these variables (to my knowledge).
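As a minimal sketch of that transformation, the Python below maps seconds after midnight onto a circle and compares distances in the encoded space. The function name is hypothetical and this is a hand-rolled illustration, not any particular library's implementation:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def encode_time(hour: int, minute: int) -> tuple:
    """Return (sin, cos) of the time of day placed on a 24-hour circle."""
    seconds = hour * 3600 + minute * 60
    angle = 2 * math.pi * seconds / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)

late = encode_time(23, 55)
early = encode_time(0, 5)
noon = encode_time(12, 0)

# Distance in the encoded space: small across midnight, large across half a day.
print(math.dist(late, early))
print(math.dist(late, noon))
```

The two columns only locate a time unambiguously when used together, which is the concern raised in the question; comparing model performance with and without the encoding is a reasonable way to see whether it matters in practice.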
You can check Scikit-learn's tutorial on time related feature engineering.
Random forests capture non-linear relationships between features and targets, so they should be able to handle both numerical features like month, or the cyclical variation.
To be absolutely sure, the best way is to try both engineering methods and see which feature returns better model performance.
You can apply the cyclical transformation straight away with the open source package Feature-engine. Check the CyclicalTransformer (I believe recent releases name it CyclicalFeatures).
