I have dataset1 with two variables:
Date Date
Type Numeric //Can be either 1 or 2
It looks like this (simplified):
Date Type
04.04.15 1
04.04.15 1
04.04.15 1
04.04.15 1
04.04.15 1
04.04.15 1
04.04.15 1
04.04.15 1
When I try to aggregate this into a new dataset, I get:
04.04.15 6
04.04.15 2
Why does it split into two subgroups? I have tried changing the data type, but it still does the same thing. Why does it not simply aggregate to 8?
I use the Aggregate Data command, with Date as the break variable, and I check the tick-box to count the number of cases.
Date variables in SPSS can contain a time component in addition to the date, while being formatted to show only the date. Your two groups most likely have the same date but different times. Try going to Variable View and changing the Date variable's format to one that shows the time; then you'll see whether this is indeed the explanation.
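A quick way to check this in syntax, assuming your variable is named Date (a minimal sketch, not your exact setup):

FORMATS Date (DATETIME20).
EXECUTE.

If hidden times turn out to be the cause, XDATE.DATE() strips the time portion, so aggregating on the stripped variable collapses everything to one row per day:

COMPUTE DateOnly = XDATE.DATE(Date).
FORMATS DateOnly (DATE11).
EXECUTE.

Then use DateOnly as the break variable instead.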
I am not sure how best to ask this question. I am looking to select data, but with a minimum time interval between the results. For example:
This measurement:
time field
2015-08-18T00:00:00Z 12
2015-08-18T00:00:00Z 1
2015-08-18T00:06:00Z 11
2015-08-18T00:06:00Z 3
2015-08-18T05:54:00Z 2
2015-08-18T06:00:00Z 1
2015-08-18T06:06:00Z 8
2015-08-18T06:12:00Z 7
This query:
select sum(*) from measurement where field > 0
would return the sum of all of the rows. I would like to be able to specify a minimum interval between results and only match the first row in a set of closely timed rows. For example, an 8-minute minimum interval would only match these rows (and result in a sum of 22):
time field
2015-08-18T00:00:00Z 12
2015-08-18T05:54:00Z 2
2015-08-18T06:06:00Z 8
Is there a way to get my expected output from InfluxDB?
The only alternative I can think of is to return all of the rows without the sum() aggregate function and then loop through the results, doing lots of time comparisons or date math in my application.
Probably not with InfluxQL.
InfluxQL has a function elapsed() which returns the time elapsed between consecutive data points: https://docs.influxdata.com/influxdb/v1.7/query_language/functions/#elapsed
That's about the only function that deals with time directly, but I can't think of a way to apply it to what you need.
You may have better luck with Flux's window function: https://v2.docs.influxdata.com/v2.0/query-data/guides/window-aggregate/
I'm not familiar enough with Flux to say how, or whether it's possible at all.
Doing it in your application may be the way to go.
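For the application-side route, here is a minimal Python sketch of that filtering loop. The rows are hard-coded stand-ins for what a raw query without sum() would return (already time-ordered); adapt the input to your client library:

from datetime import datetime, timedelta

# Stand-in for the rows of a raw query with no sum() aggregate.
rows = [
    ("2015-08-18T00:00:00Z", 12),
    ("2015-08-18T00:00:00Z", 1),
    ("2015-08-18T00:06:00Z", 11),
    ("2015-08-18T00:06:00Z", 3),
    ("2015-08-18T05:54:00Z", 2),
    ("2015-08-18T06:00:00Z", 1),
    ("2015-08-18T06:06:00Z", 8),
    ("2015-08-18T06:12:00Z", 7),
]

def sum_with_min_interval(rows, min_interval=timedelta(minutes=8)):
    # Keep a row only if it is at least min_interval after the last kept row.
    total = 0
    last_kept = None
    for ts, value in rows:
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
        if last_kept is None or t - last_kept >= min_interval:
            total += value
            last_kept = t
    return total

print(sum_with_min_interval(rows))  # => 22, matching the expected output
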
I have a list of time series, and I have extracted the time and date fields for my calculation. I would like to insert all the missing dates that fall between two rows, like in the screenshot.
P.S. I do not have any code to add here.
Update: I have tried adding a lag column to get the next time, and then a Java snippet to find the number of intervals. Now I have the number of rows to be inserted, but I am finding it difficult to insert them. Also, is there a more efficient way than this?
Update 2:
I have tried generating a time series like this:
Date and time Group
2012-02-24 0
2012-02-24 1
2012-02-24 2
2012-02-24 3
2012-02-25 0
2012-02-25 1
2012-02-25 2
2012-02-25 3
And I have a time series like
Date and time Group
24.2.2012 1
24.2.2012 2
24.2.2012 3
25.2.2012 0
25.2.2012 1
25.2.2012 2
25.2.2012 3
How can I merge them in KNIME to achieve the following?
Date and time Group
2012-02-24 Null
2012-02-24 1
2012-02-24 2
2012-02-24 3
2012-02-25 0
2012-02-25 1
2012-02-25 2
2012-02-25 3
I was able to produce it by creating a unique date series, then using a Joiner node, and then sorting based on Date. Thank you.
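For reference, the underlying idea (generate the complete series, left-join the observed series onto it, then sort) looks like this outside KNIME, sketched in pandas with hypothetical column names; the Joiner node in left-outer mode does the same thing:

import pandas as pd

# Complete generated series: every (Date, Group) combination.
full = pd.DataFrame({
    "Date":  ["2012-02-24"] * 4 + ["2012-02-25"] * 4,
    "Group": [0, 1, 2, 3] * 2,
})
# Observed series, with the (2012-02-24, 0) row missing.
observed = full.drop(0).assign(Value=[1, 2, 3, 0, 1, 2, 3])

# Left join keeps every generated row; combinations absent from the
# observed series come back as missing values (KNIME's "?" cells).
merged = full.merge(observed, on=["Date", "Group"], how="left")
print(merged.sort_values(["Date", "Group"]))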
I'm using Tableau Desktop, my data are like this:
KPI,date,monthValue
coffee break,01/06/2015,10.50
coffee break,01/07/2015,8.30
and I want to build a table like this
KPI, year(date), last value
coffee break, 2015, 8.30
How can I set a calculated field in order to show me the last value available in that year? I tried to do:
LOOKUP([MonthValue], LAST())
But it didn't work and told me 'cannot mix aggregate and non-aggregate', so I did:
LOOKUP(sum([MonthValue]), LAST())
But that didn't work either. How should I proceed?
If you are using Tableau 9 then you can do this with a LOD calc that finds the max value of your date field and then checks whether the current date value is the same as that max:
[Date] = {FIXED : MAX([Date])}
As you can see in the example below, when you use this calc as a filter you will only get the last row from your example above.
UPDATE: To get the values per year you can do something like the following.
Here I am using a table calculation to find the max date per year, ranking those dates, and filtering down to the latest date in each year (the one whose rank equals 1):
!max date is WINDOW_MAX(ATTR([Date]))
!rank is RANK(ATTR([Date]))
You need to make sure that the table calculations are computed in the correct way (in this case, across the values within each year).
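If you'd rather avoid table calculations, a year-scoped LOD can express the same idea directly. A sketch, using the field names from the question (add [KPI] to the FIXED dimensions if you have several KPIs per year):

[date] = {FIXED YEAR([date]) : MAX([date])}

Used as a filter keeping True, this leaves only each year's last row, and SUM([monthValue]) on what remains shows the last value per year.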
I have a view that uses a five-element array to display some numbers, where the elements correspond to Mon, Tue, Wed, Thu, Fri:
[100,200,50,300,200]
This is built (not very cleanly) by the method:
def this_weeks_sales
  self.sales.where(sale_date: (Date.today.beginning_of_week..(Date.today.beginning_of_week + 5)))
      .group("DATE(sales.sale_date)").sum("sales.amount")
      .sort.to_h.values.in_groups_of(5, 0)[0]
end
If there are no sales for a day, it should have 0 as that element, and it should always have 5 elements for Monday to Friday.
I've run some tests and it's not working as I thought it would: if there is a £100 sale for Tuesday but nothing for Monday, then I get the array [100,0,0,0,0] instead of the expected [0,100,0,0,0], i.e. the first sale of the week always ends up as element[0].
I don't want to change all my views; how can I get the desired output?
self.sales.where(sale_date: (Date.today.beginning_of_week..(Date.today.beginning_of_week + 5)))
.group("DATE(sales.sale_date)").sum("sales.amount").sort.to_h
returns a hash, e.g. from the example above, if there's only a sale on Tuesday: {Tue, 28 Jul 2015=>100}
ETA: You're grouping and sorting correctly, but then transforming the sorted array back into a hash and pulling out its values. You just need to leave it as an array and map it to the sum:
sales.where(sale_date: Time.zone.now.all_week).group("DATE(sales.sale_date)").sum(:amount).sort_by{|date,sum| date}.map{|date,sum| sum}
Edit 2: If you want to get 0 for dates that don't exist in the database, you'll have to loop through the desired dates:
daily_sale_totals = sales.where(sale_date: Date.today.all_week).group("DATE(sales.sale_date)").sum(:amount)
Date.today.all_week.map{|date| daily_sale_totals[date] || 0}
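Note that Date.today.all_week spans Monday through Sunday, so this yields seven elements. If the view really wants the five weekdays only (an assumption based on the Mon-Fri array above), you can trim the range first:

week_days = Date.today.all_week.to_a.first(5)  # Mon..Fri only
week_days.map { |date| daily_sale_totals[date] || 0 }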
I'm using SPSS and have a dataset comprised of individuals' responses to a survey question. This is longitudinal data, so the subjects have taken the survey at least twice and some as many as four or five times.
My variables are ID (scale), date of survey completion (date - dd-mmm-yyyy), and response to survey question (scale).
The dataset is sorted by ID, then date (ascending). Each date corresponds to survey time 1, time 2, etc. What I would like to do is compute a new variable, Time, that numbers the survey completion dates for a particular participant. I would then like to use that variable to do a long-to-wide restructuring of the dataset.
So, I'd like to accomplish the following and am not sure how to go about doing it:
1) I have something like this:
ID Date Assessment_Answer
----------------------------------
1 01-Jan-2009 4
1 01-Jan-2010 1
1 01-Jan-2011 5
2 15-Oct-2012 6
2 15-Oct-2012 0
2) I want to compute another variable that would give me this:
ID Date Assessment_Answer Time
-----------------------------------------
1 01-Jan-2009 4 Time1
1 01-Jan-2010 1 Time2
1 01-Jan-2011 5 Time3
2 15-Oct-2012 6 Time1
2 15-Oct-2013 0 Time2
3) And restructure so that I have something like this:
ID Time1 Time2 Time3 Time4
--------------------------
1 4 1 5
2 6 0
You can use sequential case processing to create a variable that is a counter within each ID. So for example:
*Making fake data.
DATA LIST FREE / ID (F1.0) Date (DATE11) Assessment_Answer (F1.0).
BEGIN DATA
1 01-Jan-2009 4
1 01-Jan-2010 1
1 01-Jan-2011 5
2 15-Oct-2012 6
2 15-Oct-2012 0
END DATA.
*Making counter within ID.
SORT CASES BY Id Date.
DO IF ($casenum = 1) OR (Id <> LAG(ID)).
COMPUTE Time = 1.
ELSE.
COMPUTE Time = LAG(Time) + 1.
END IF.
FORMATS Time (F2.0).
EXECUTE.
Now you can use CASESTOVARS to reshape the data like you requested.
CASESTOVARS
/ID = Id
/INDEX = Time
/DROP Date.
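With the fake data above, the restructured file should look something like this (by default, CASESTOVARS appends the index value to each variable name):

ID Assessment_Answer.1 Assessment_Answer.2 Assessment_Answer.3
1  4                   1                   5
2  6                   0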