Plotting time series + offset not working

I want to plot two time series that use different time formats on a
common time axis, but one of the two doesn't work for me.
I have a time series in seconds with a given starting time t0.
The series looks like this:
Starting time: t0
0 -21.0028
5 -21.0067
10 -21.007
...
17875 -20.9943
I want to plot this time series with the time (e.g. 9:03) instead
of seconds and I want to show another time series where the
data is given in standard time in the same plot.
I can plot the second series just fine.
But when I try to plot the first series, the measured points
start jumping from x=100 back to 0 and then back and forth.
Additionally I can't add the offset t0 to the series.
Here's what I tried (t0 = 32615s) to get at least the first time series plotted:
gnuplot> set xdata time
gnuplot> set timefmt "%S"
gnuplot> set format x "%H:%M"
gnuplot> plot 'measurement.dat' u (timecolumn(1)+32615):2 w l, 'measurement2.dat' u 1:2 w l
Does anyone know how I can plot these time series?
Thanks in advance!

My understanding of your problem is that you have two data sets with different time data: one in seconds and the other in what you call "standard" time, which I assume means something like 09:03:00.
And you want to shift the one given in seconds by some value.
Internally, gnuplot stores dates and times as seconds since January 1st 1970 00:00:00.
Since the first dataset is already in seconds, just add your shift value.
You need to convert $Data2 into seconds via timecolumn(1,"%H:%M:%S").
The way the xtic labels are displayed is set via set format x "%H:%M" time. Also check help timecolumn. It also depends on whether you want to display a time of day, i.e. 00:00 to 23:59, or hours beyond 24 h. In the latter case you might want to use "%tH:%tM:%tS" as the format. Check help time_specifiers.
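The arithmetic behind this can also be sketched outside gnuplot. The Python snippet below is only an illustration, not part of the gnuplot solution: the helper names seconds_to_clock and clock_to_seconds are made up for this example. It treats both time formats as plain seconds, adds the offset t0 = 32615 from the question, and formats the result as a time of day.

```python
from datetime import datetime, timedelta, timezone

# Reference point for "seconds since January 1st 1970 00:00:00"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def seconds_to_clock(seconds, fmt="%H:%M"):
    """Format a count of seconds since the epoch as a time of day (UTC)."""
    return (EPOCH + timedelta(seconds=seconds)).strftime(fmt)

def clock_to_seconds(text):
    """Convert an "HH:MM:SS" string into seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds

t0 = 32615                   # offset from the question, in seconds
raw = [0, 5, 10, 17875]      # sample values from the first column of the first data set
shifted = [t + t0 for t in raw]

print([seconds_to_clock(t, "%H:%M:%S") for t in shifted])
# ['09:03:35', '09:03:40', '09:03:45', '14:01:30']
print(clock_to_seconds("09:03:35"))
# 32615
```

Once both series are expressed in seconds, they share one x axis and only the label format decides how the tics look.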
Code:
### shifted time data
reset session

# create some test data
set print $Data1
    do for [i=1:20000:200] {
        print sprintf("%d %.3f",i,sin(i/2000.))
    }
set print

set print $Data2
    do for [i=30000:50000:200] {
        print sprintf("%s %.3f",strftime("%H:%M:%S",i),2*sin(i/2000.-2.5))
    }
set print

set format x "%H:%M" time
set key top center

set multiplot layout 2,1

    plot $Data1 u 1:2 w lp ti "Data1", \
         $Data2 u (timecolumn(1,"%H:%M:%S")):2 w lp ti "Data2"

    t0 = 30000
    plot $Data1 u ($1+t0):2 w lp ti "shifted Data1", \
         $Data2 u (timecolumn(1,"%H:%M:%S")):2 w lp title "Data2"

unset multiplot
### end of code
Result:


How to automatically detect and correct an abnormal period of data in a time series using Python

I am trying to find a method that allows me to automatically detect and correct abnormal periods of data points in a time series (a sequence of outliers). I've already tried ThymeBoost but it only corrects point outliers and not outlier periods.
Here is an example of a time series containing a period of outliers:
01/02/2018 288.000000
01/03/2018 332.000000
01/04/2018 277.000000
01/05/2018 233.000000
01/06/2018 204.000000
01/07/2018 216.000000
01/08/2018 175.000000
01/09/2018 218.000000
01/10/2018 413.000000
01/11/2018 416.000000
01/12/2018 151.000000
01/01/2019 224.000000
01/02/2019 563.000000
01/03/2019 413.000000
01/04/2019 238.000000
01/05/2019 343.000000
01/06/2019 176.000000
01/07/2019 533.103060
01/08/2019 230.000000
01/09/2019 364.000000
01/10/2019 324.000000
01/11/2019 437.000000
01/12/2019 738.000000
01/01/2020 619.000000
01/02/2020 728.000000
01/03/2020 506.000000
01/04/2020 500.000000
01/05/2020 886.000000
01/06/2020 892.000000
01/07/2020 268.000000
01/08/2020 32.000000
01/09/2020 45.000000
01/10/2020 51.000000
01/11/2020 373.000000
01/12/2020 61.000000
01/01/2021 73.000000
01/02/2021 779.000000
01/03/2021 584.718872
01/04/2021 614.000000
01/05/2021 489.000000
01/06/2021 534.000000
01/07/2021 455.000000
The plot:
I have also tried seasonal decomposition, but it doesn't work since the series doesn't seem to have any seasonality.

SPSS Compute a new variable with the last four digits of a numeric variable

I have a numeric variable (year of birth) in SPSS and I would like to extract the last four digits from it. Most values look like 1988, 2001, 1948, etc. But about 250 respondents entered their year of birth as 30-2-1947, 2-9-1984, etc. That means not all values have the same length. By putting the last 4 digits into a new variable I could create an age category for all the respondents.
How can I do that?
I tried converting the variable to a string and using substr to get part of the value, but I always had to choose a starting point. I want to start from the last digit and then move backwards.
Instead of using SUBSTR() you can try using RIGHT() to grab the last four digits.
* convert yob to string variable.
ALTER TYPE yob (A10).
EXE .
* use RIGHT to extract the last 4 digits and convert to numeric.
COMPUTE n_yob = NUMBER(RIGHT(yob, 4), F4) .
EXE .
You can now use n_yob to calculate age (e.g. COMPUTE age = 2022-n_yob .). You can also use ALTER TYPE again if you want to convert yob back to its original type.
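For comparison, the same "count from the right" idea can be sketched in Python. This is an illustration only, not SPSS syntax; the function name birth_year and the sample entries are assumptions based on the values quoted in the question.

```python
def birth_year(value):
    """Return the last four characters of an entry as an integer year."""
    text = str(value).strip()   # drop surrounding blanks before counting from the right
    return int(text[-4:])

# Sample entries in the mixed formats described in the question
entries = [1988, "2001", "30-2-1947", "2-9-1984"]
years = [birth_year(entry) for entry in entries]

print(years)
# [1988, 2001, 1947, 1984]
```

Stripping blanks first matters because a padded string would otherwise end in spaces instead of digits.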

How to calculate the difference in minutes between two bedtimes (NOT time elapsed)?

I'm trying to get Google Sheets functions to calculate the difference in minutes between two bedtimes and have been spinning my wheels for at least five hours on this. Here are four examples of what I'm trying to accomplish:
BEDTIME 1 BEDTIME 2 DIFF IN MINS
9:00 PM 9:15 PM 15
9:00 PM 10:00 PM 60
11:30 PM 1:00 AM 90
1:00 AM 11:00 PM 120
As you can see, the date doesn't figure at all. I apologize for not offering up code, but I've tried at least half a dozen approaches from other answers and they aren't working -- mainly, I suspect, because most people are looking to find the time elapsed between the two times whereas I'm looking to determine "how much earlier" or "how much later" one bedtime is relative to another (always expressed as a positive value).
Any help would be appreciated. Thanks.
Times are stored as numbers between 0 and 1. If you subtract two times and multiply the result by 24 x 60 = 1440 and format as a number you’ll get number of minutes. I think you’ll need something like:
=MIN(ABS(1440*(B1-A1)), ABS(1440*(B1-A1-1)), ABS(1440*(B1-A1+1)))
The difference between two times is a duration. The question requests that durations be converted to "digital minutes", but that is often not as readable as one would think. 175 minutes is more difficult to understand than 2:55 hours.
There is therefore usually no point in multiplying by 24 * 60 — instead, just use the duration value as is:
=min( abs(B2 - A2), abs(B2 - A2 - 1), abs(B2 - A2 + 1) )
Format the result cell as Format > Number > Duration.
See this answer for an explanation of how date and time values work in spreadsheets.
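The wrap-around logic in the formulas above can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function names minutes_of_day and bedtime_gap are invented here, and the input is assumed to look like "9:00 PM". Taking the smaller of the direct difference and its complement to 24 hours is equivalent to taking the minimum over the shifts of -1, 0 and +1 day.

```python
MINUTES_PER_DAY = 24 * 60

def minutes_of_day(text):
    """Convert a 12-hour clock string such as "9:15 PM" to minutes after midnight."""
    clock, meridiem = text.split()
    hours, minutes = (int(part) for part in clock.split(":"))
    hours = hours % 12 + (12 if meridiem.upper() == "PM" else 0)
    return hours * 60 + minutes

def bedtime_gap(first, second):
    """Smallest distance in minutes between two clock times, ignoring the date."""
    diff = abs(minutes_of_day(second) - minutes_of_day(first))
    return min(diff, MINUTES_PER_DAY - diff)

# The four rows from the question
examples = [("9:00 PM", "9:15 PM"), ("9:00 PM", "10:00 PM"),
            ("11:30 PM", "1:00 AM"), ("1:00 AM", "11:00 PM")]
print([bedtime_gap(a, b) for a, b in examples])
# [15, 60, 90, 120]
```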
Use an array formula:
=INDEX(IFERROR(1/(1/TRANSPOSE(QUERY(TRANSPOSE(
IF(A2:A&B2:B="", 0, ABS(1440*(B2:B-A2:A+{-1, 0, 1})))),
"select "&TEXTJOIN(",", 1,
"min(Col"&ROW(A2:A)-ROW(A2)+1&")"))))),, 2)
or:
=INDEX(IFERROR(1/(1/QUERY(SPLIT(FLATTEN(
ROW(A2:A)&"×"&ABS(1440*(B2:B-A2:A+{-1, 0, 1}))), "×"),
"select min(Col2) group by Col1 label min(Col2)''"))))
Try implementing an absolute value (modulus) function in your code. It would basically do something like this:
If x = -5, then y = f(x) = -(-5) = 5, since x is less than zero
If x = 10, then y = f(x) = 10, since x is greater than zero
If x = 0, then y = f(x) = 0, since x is equal to zero
That way you calculate how much time passed without negative values.

Analyze Outlook calendar data in SPSS

I downloaded conference room usage data from Outlook.
I want to know:
How busy are the conference rooms?
What are the hot times?
Who are the super users?
Who are not the super users?
How many recurring meetings take place?
The issue I'm having is that I need the duration between the "StartTime" and the "EndTime", but they are currently strings!
start end starttime endtime
1/1/2014 1/1/2014 5:00:00 PM 5:00:00 PM
Also, it's likely safe to assume that StartTimes and EndTimes do not straddle two days, but I may want to check for this.
Perhaps conversion to a 24-hour clock would help; "Duration" is then "EndTime" - "StartTime". How can I convert back to a 12-hour clock for the uninitiated? Finally, I need the day of the week (Monday, Tuesday, etc.) that an event falls on.
This can mostly be accomplished through the wizard.
Some pseudocode that should do the trick:
COMPUTE Start=number(StartDate, ADATE10).
VARIABLE LEVEL Start (SCALE).
FORMATS Start (ADATE10).
VARIABLE WIDTH Start(10).
EXECUTE.
COMPUTE starttimetest=number(StartTime, TIME8).
VARIABLE LEVEL starttimetest (SCALE).
FORMATS starttimetest (TIME8).
VARIABLE WIDTH starttimetest(8).
EXECUTE.
compute teststartadd=start+starttimetest.
DO if index(starttime,'PM') gt 0 and substr(starttime,1,2) ne '12' .
COMPUTE Realstart=datesum(teststartadd,12,'hours').
ELSE.
COMPUTE REALstart=TESTstartADD.
END IF.
COMPUTE End=number(EndDate, ADATE10).
VARIABLE LEVEL End (SCALE).
FORMATS End (ADATE10).
VARIABLE WIDTH End(10).
EXECUTE.
COMPUTE endtimetest=number(endTime, TIME8).
VARIABLE LEVEL endtimetest (SCALE).
FORMATS endtimetest (TIME8).
VARIABLE WIDTH endtimetest(8).
EXECUTE.
compute testendadd=end+endtimetest.
DO if index(endtime,'PM') gt 0 and substr(endtime,1,2) ne '12' .
COMPUTE RealEnd=datesum(testendadd,12,'hours').
ELSE.
COMPUTE REALEND=TESTENDADD.
END IF.
exe.
delete vars Start
starttimetest
teststartadd
End
endtimetest
testendadd.
exe.
formats RealEnd RealStart(datetime23).
compute Length=datediff(realend,realstart,'hours').
if length > 12 check=1.
freq check.
compute StartWkDay=XDATE.WKDAY(realstart).
compute EndWkDay=XDATE.WKDAY(realEnd).
string StartDayText EndDayText(a8).
To fill in the day names, you'll have to convert the numeric weekday using something like:
*if XDATE.WKDAY(realstart)=1 startdaytext="Sunday".
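As a cross-check of the logic above, here is a small Python sketch of the same steps: combine the date and time strings, compute the duration, flag events that straddle two days, and look up the weekday. It is an illustration rather than SPSS syntax. The function names and the second sample end time (6:30:00 PM) are assumptions made for this example; the date and time layout follows the sample row in the question.

```python
from datetime import datetime

# Layout of the strings in the question, e.g. "1/1/2014" and "5:00:00 PM"
FMT = "%m/%d/%Y %I:%M:%S %p"
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def parse(date_text, time_text):
    """Combine a date string and a 12-hour time string into one datetime."""
    return datetime.strptime(f"{date_text} {time_text}", FMT)

def describe(start_date, start_time, end_date, end_time):
    """Return duration in hours, weekday of the start, and a two-day flag."""
    start = parse(start_date, start_time)
    end = parse(end_date, end_time)
    return {
        "hours": (end - start).total_seconds() / 3600,
        "weekday": WEEKDAYS[start.weekday()],
        "straddles_days": start.date() != end.date(),
        "start_12h": start.strftime("%I:%M %p"),   # back to a 12-hour display
    }

# Hypothetical meeting: the end time is made up for illustration
event = describe("1/1/2014", "5:00:00 PM", "1/1/2014", "6:30:00 PM")
print(event["hours"], event["weekday"], event["straddles_days"])
# 1.5 Wednesday False
```

Parsing the AM/PM marker as part of the format avoids the separate "add 12 hours" step, including the 12 AM and 12 PM edge cases.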

Generating means of a variable using dummy variables & foreach in Stata

My dataset includes TWO main variables X and Y.
Variable X represents distinct codes (e.g. 001X01, 001X02, etc) for multiple computer items with different brands.
Variable Y represents the tax charged for each code of variable X (e.g. 15 = 15% for 001X01) at a store.
I've created categories for these computer items using dummy variables (e.g. HD dummy variable for Hard-Drives, takes value of 1 when variable X represents a HD, etc). I have a list of over 40 variables (two of them representing X and Y, and the rest is a bunch of dummy variables for the different categories I've created for computer items).
I would like to display the averages of all these categories using a loop in Stata, but I'm not sure how to do this.
For example the code:
mean Y if HD == 1
Mean estimation                   Number of obs   =          5
--------------------------------------------------------------
             |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         Tax |        7.1   2.537716      1.154172    15.24583
gives me the mean Tax for the category representing Hard Drives. How can I use a loop in Stata to automatically display all the mean Taxes charged for each category? I would do it by hand without a problem, but I want to repeat this process for multiple years, so I would like to use a loop for each year in order to come up with this output.
My goal is to create a separate Excel file with each of the computer categories I've created (38 total) and the average tax for each category by year.
Why bother with the loop and creating the indicator variables? If I understand correctly, your initial dataset allows the use of a simple collapse:
clear all
set more off
input ///
code tax str10 categ
1 0.15 "hd"
2 0.25 "pend"
3 0.23 "mouse"
4 0.29 "pend"
5 0.16 "pend"
6 0.50 "hd"
7 0.54 "monitor"
8 0.22 "monitor"
9 0.21 "mouse"
10 0.76 "mouse"
end
list
collapse (mean) tax, by(categ)
list
To take the results to Excel you can try export excel or putexcel.
Run help collapse and help export for details.
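For readers who want to verify what collapse (mean) tax, by(categ) produces on this example data, here is a minimal Python sketch of the same grouping. It is only an illustration, not Stata code; the function name mean_by_category is made up for this example, and the rows are copied from the input block above.

```python
from collections import defaultdict

# (code, tax, categ) rows copied from the Stata input block
rows = [
    (1, 0.15, "hd"), (2, 0.25, "pend"), (3, 0.23, "mouse"),
    (4, 0.29, "pend"), (5, 0.16, "pend"), (6, 0.50, "hd"),
    (7, 0.54, "monitor"), (8, 0.22, "monitor"), (9, 0.21, "mouse"),
    (10, 0.76, "mouse"),
]

def mean_by_category(rows):
    """Average the tax column within each category, like collapse (mean)."""
    totals = defaultdict(lambda: [0.0, 0])   # categ -> [sum of tax, count]
    for _code, tax, categ in rows:
        totals[categ][0] += tax
        totals[categ][1] += 1
    return {categ: total / count for categ, (total, count) in totals.items()}

means = mean_by_category(rows)
for categ in sorted(means):
    print(f"{categ:8s} {means[categ]:.4f}")
```

The result has one row per category, which is the shape the dataset has after collapse.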
Edit
Because you insist, below is an example that gives the same result using loops.
I assume the same data input as before. Some testing using this example database
with expand 1000000 shows that speed is virtually the same. But almost surely,
you (including your future you) and your readers will prefer collapse.
It is much clearer, cleaner and more concise. It is even prettier.
levelsof categ, local(parts)
gen mtax = .
quietly {
    foreach part of local parts {
        summarize tax if categ == "`part'", meanonly
        replace mtax = r(mean) if categ == "`part'"
    }
}
bysort categ: keep if _n == 1
keep categ mtax
Stata has features that make it quite different from other languages. Once you
start getting the hang of it, you will find that many things done with loops elsewhere
can be made loop-less in Stata. In many cases, the latter style will be preferred.
See the corresponding help files using help <command>, and if you are not familiar with saved results (e.g. r(mean)), type help return.
A supplement to Roberto's excellent answer: after collapse, you will need a loop to export the results to Excel, one file per category.
levelsof categ, local(levels)
foreach x of local levels {
    export excel using "`x'.xlsx" if categ == "`x'", replace
}
I prefer to use numerical codes for variables such as your category variable. I then assign them value labels. Here's a version of Roberto's code which does this and which, for closer correspondence to your problem, adds a "year" variable:
input code tax categ year
1 0.15 1 1999
2 0.25 2 2000
3 0.23 3 2013
4 0.29 1 2010
5 0.16 2 2000
6 0.50 1 2011
7 0.54 4 2000
8 0.22 4 2003
9 0.21 3 2004
10 0.76 3 2005
end
#delim ;
label define catl
1 hd
2 pend
3 mouse
4 monitor
;
#delim cr
label values categ catl
collapse (mean) tax, by(categ year)
levelsof categ, local(levels)
foreach x of local levels {
    export excel using "`: label (categ) `x''.xlsx" if categ == `x', replace
}
The #delim ; command makes it possible to easily list each code on a separate line. The "label" function in the export statement is an extended macro function that inserts a value label into the file name.
