DAX: Create table with LATEST entries - iot

I have a number of sensors in my home, and I want to use PowerBI to display a graph of the temperatures in the different rooms as well as gauge the current/most recent values.
I am having the hardest time writing this in DAX.
The data comes to PowerBI from an Azure Table named "DeviceReadings" in the form:
Location (Text, Partition Key)
RowKey (Numeric value based on date stored, not used)
Date (DateTime, TimeStamp)
Temperature (Decimal)
Humidity (Decimal)
What I would like is: (PSEUDOCODE)
Select Location, Last(Date), Temperature, Humidity FROM DeviceReadings
GROUP BY Location
Expected/Wanted outcome is:
"Mediaroom", "01.10.2017 09:00", 26, 17
"Office", "01.10.2017 09:03", 28, 23
"Livingroom", "01.10.2017 09:13", 22, 32
Obviously, I'm trying to create a calculated DAX table based on these readings. The idea is to create a calculated table that at all times contains the most recent temperature/humidity so that I can display those values in gauge-style visuals.
I have been trying to set table = SUMMARIZECOLUMNS, grouping by Location, and then adding the named columns "LastSampled" as MAX(DeviceReadings[Date]) and "Temperature" as LASTNONBLANK(DeviceReadings[Temperature];DeviceReadings[Temperature]), but that does not return the temperature reading connected to that date; it returns something else.
Ideally, I want to group by location, then by max date per location, and then display the raw temperature + humidity value.
I simply want a "Most recent temperature reading" by location displayed on my PowerBI dashboard. (I'm using PowerBI desktop to write all the queries and make the reports, and have not yet uploaded to PowerBI portal)
My DAX skills are fairly low, so I need help in writing the calculated query.

Try using this expression:
Table = SELECTCOLUMNS (
FILTER (
CROSSJOIN (
SELECTCOLUMNS (
GROUPBY (
DeviceReadings,
DeviceReadings[Location],
"LASTDATE", MAXX ( CURRENTGROUP (), DeviceReadings[Date] )
),
"Location", DeviceReadings[Location],
"Date", [LASTDATE]
),
SELECTCOLUMNS (
DeviceReadings,
"DateJoin", DeviceReadings[Date],
"LocationJoin", DeviceReadings[Location],
"Humidity", DeviceReadings[Humidity],
"Temperature", DeviceReadings[Temperature]
)
),
[Date] = [DateJoin]
&& [Location] = [LocationJoin]
),
"Location", [Location],
"LastDate", [Date],
"Temperature", [Temperature],
"Humidity", [Humidity]
)
I am pretty sure there is an easier way to achieve what you are after, but I cannot figure it out right now.
Let me know if this helps.
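The logic of the expression above (find the latest date per location, then keep the row that matches it) can also be sketched outside DAX. The following is only an illustration in Python, not part of the original answer; the sample rows and the function name latest_per_location are hypothetical, loosely based on the expected outcome in the question.

```python
from datetime import datetime

# Hypothetical sample rows: (location, timestamp, temperature, humidity)
readings = [
    ("Mediaroom", datetime(2017, 10, 1, 8, 50), 25, 18),
    ("Mediaroom", datetime(2017, 10, 1, 9, 0), 26, 17),
    ("Office", datetime(2017, 10, 1, 9, 3), 28, 23),
    ("Livingroom", datetime(2017, 10, 1, 9, 13), 22, 32),
    ("Livingroom", datetime(2017, 10, 1, 9, 1), 21, 33),
]


def latest_per_location(rows):
    """Return {location: row}, keeping the row with the greatest timestamp."""
    latest = {}
    for row in rows:
        location, timestamp = row[0], row[1]
        # Keep this row if the location is new or the reading is more recent.
        if location not in latest or timestamp > latest[location][1]:
            latest[location] = row
    return latest


latest = latest_per_location(readings)
for location, row in latest.items():
    print(location, row[1].strftime("%d.%m.%Y %H:%M"), row[2], row[3])
```

Because the whole row is kept, the temperature and humidity always belong to the same reading as the latest date, which is the "connected" value the question asks for.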

Related

Query data based on time interval/frequency

I'm trying to have a calendar display a series of activities based on their type, time and frequency for an easier visualization of data.
So far, I have managed to create a formula that correctly fetches the data that I have on a repository and displays it on the calendar. However, I'm not sure how I can have it account for entries that have a frequency (happening every x days).
For an easier understanding here are screenshots of both the table and the schedule
And here's the current formula I'm using to display the event/activity title in each day/hour at C12 for example:
=IFERROR(
INDEX(Repository!$K:$K,
MATCH(
C$10,
IF(
(Repository!$G:$G=$G$8)*
(Repository!$H:$H=$K$8)*
(Repository!$N:$N>=$B12)*
(Repository!$N:$N<$B12+TIME(2,0,0)),
Repository!$D:$D),
0)
),
"")
What I'm currently missing in the formula is a way to correctly account for the start/end date as well as the frequency, and to determine whether each day falls under the specified criteria. If the frequency is 0, I'd like it to ignore the end date altogether (in case I forget to set the end date for some reason).
I have tried to adapt the formula above to account for the frequency, but nothing I tried worked.
Minimal example requested by #GabrielCarballo
Entry on the table with a 2 days frequency:
Expected result on the schedule:
So basically, the formula on each cell should check for the start date, end date and frequency of the activity and identify if the specific date on the schedule falls under the specified timeframe.
In this minimal example, the activity starts on the 7th December and repeats every 2 days until the 14th of December.
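The rule described here (start date, end date, repeat every x days) can be written down as a small check. This is a minimal sketch in Python, not a Sheets formula; the function name occurs_on, the year 2022, and the reading of "frequency 0" as a one-off activity on its start date are assumptions made for illustration.

```python
from datetime import date, timedelta


def occurs_on(day, start, end, every_days):
    """Return True if an activity repeating every `every_days` days falls on `day`."""
    if every_days == 0:
        # Assumption: frequency 0 means a one-off activity; the end date is ignored.
        return day == start
    if day < start or day > end:
        return False
    # The activity recurs when the distance from the start is a multiple of the frequency.
    return (day - start).days % every_days == 0


# Minimal example from the question: starts 7 December, every 2 days, until 14 December.
start, end = date(2022, 12, 7), date(2022, 12, 14)
schedule = [date(2022, 12, 1) + timedelta(days=i) for i in range(20)]
hits = [d.day for d in schedule if occurs_on(d, start, end, 2)]
print(hits)
```

The same test, a modulo on the number of days since the start date, is what a per-cell formula would need to evaluate for each date on the schedule.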
Use in C6:
=INDEX(IFNA(VLOOKUP(TEXT(C4:P4+B6:B14, "e-m-d-h-m")&G$2&K$2, SPLIT(FLATTEN(MAP(
Repository!D$4:D, Repository!O$4:O, Repository!P$4:P, Repository!N$4:N,
Repository!G$4:G, Repository!H$4:H, Repository!K$4:K, LAMBDA(d, o, p, n, g, h, k,
IF(DAYS(o, d)>=SEQUENCE(1, MAX(DAYS(o, d)), 0, p), TEXT(d+SEQUENCE(1,
MAX(DAYS(o, d)), 0, p)+IF(ISODD(HOUR(n)), (HOUR(n)*"1:00")-"1:00", HOUR(n)*"1:00"),
"e-m-d-h-m")&g&h&"×"&k, )))), "×"), 2, )))

Trying to label rows in Google Sheets where there are duplicate values in one column in order of date from another column

I have a list of 1000s of users, their assigned devices, and the device's end of life dates.
I need to label everyone's newest device (furthest end of life date) as Primary and all others as Secondary. The problem is with the duplicate users where some have multiple devices. For example, If someone has 3 devices I need to label 2 of them secondary and their newest one primary.
I hope this makes sense. I've attached an example screenshot of what I'm trying to achieve.
Tried using UNIQUE and COUNTIF and pivot tables but I'm not getting anywhere
Put this formula into D1.
What it does: it takes the data range A1:C8 and passes it to QUERY, which sorts the data first by User and then by EndDate (in descending order).
Then BYROW walks over each row of the data, and FILTER narrows the sorted query result to that row's user. If the row's date equals the date in the first row of the filtered result, the row is labeled "Primary"; otherwise it is labeled "Secondary".
If you need to change the data range, just change the range reference on the very last line of the formula.
=LAMBDA(DATA,
LAMBDA(QUERY,
LAMBDA(USER,PROFILE,DATE,
BYROW(DATA,
LAMBDA(ROW,
IF(INDEX(ROW,,1)="User",
"Device Usage",
IF(INDEX(ROW,,3)=INDEX(FILTER(QUERY,USER=INDEX(ROW,,1)),1,3),"Primary","Secondary")
)
)
)
)(INDEX(QUERY,,1),INDEX(QUERY,,2),INDEX(QUERY,,3))
)(QUERY({DATA},"ORDER BY Col1,Col3 DESC"))
)(A1:C8)
Just noticed that there are 2 unused variables, the code can be simplified a bit:
=LAMBDA(DATA,
LAMBDA(QUERY,
LAMBDA(USER,
BYROW(DATA,
LAMBDA(ROW,
IF(INDEX(ROW,,1)="User",
"Device Usage",
IF(INDEX(ROW,,3)=INDEX(FILTER(QUERY,USER=INDEX(ROW,,1)),1,3),"Primary","Secondary")
)
)
)
)(INDEX(QUERY,,1))
)(QUERY({DATA},"ORDER BY Col1,Col3 DESC"))
)(A1:C8)
This can also be done without the QUERY function:
=LAMBDA(DATA,
LAMBDA(USER_COL,
BYROW(DATA,LAMBDA(ROW,
LAMBDA(USER,DATE,
IF(USER="User",
"Device Usage",
IF(DATE=MAX(INDEX(FILTER(DATA,USER_COL=USER),,3)),"Primary","Secondary")
)
)(INDEX(ROW,,1),INDEX(ROW,,3))
))
)(INDEX(DATA,,1))
)(A1:C8)
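For readers who find the nested LAMBDAs hard to follow, the idea behind the last formula (compare each row's date with the maximum date for that user) can be sketched in Python. This is only an illustration; the sample rows and the function name label_devices are hypothetical.

```python
from datetime import date

# Hypothetical rows: (user, device, end-of-life date)
devices = [
    ("alice", "Laptop A", date(2024, 5, 1)),
    ("alice", "Laptop B", date(2026, 3, 1)),
    ("alice", "Tablet C", date(2023, 1, 1)),
    ("bob", "Laptop D", date(2025, 7, 1)),
]


def label_devices(rows):
    """Label each row 'Primary' if it has the user's latest end-of-life date."""
    newest = {}
    for user, _device, end_of_life in rows:
        # Track the furthest end-of-life date seen for each user.
        if user not in newest or end_of_life > newest[user]:
            newest[user] = end_of_life
    return [
        "Primary" if end_of_life == newest[user] else "Secondary"
        for user, _device, end_of_life in rows
    ]


labels = label_devices(devices)
print(labels)
```

As with the MAX-based formula, two devices of the same user that share the latest date would both be labeled "Primary" in this sketch.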

In Tableau how do I use RANK to calculate an "OTHER" field?

I'm new to Tableau so this may be an easy question about computations using RANK. I can't find any tableau HELP or other stack-overflow answer to this. Maybe this is a GROUP question. Maybe it's about OTHER.
I have a data set of 160 countries ( rows ) with a field for jetfuel consumption for each country.
I just want to make a bar chart like the attached image showing the 20 highest fuel-consumption countries by name ranked by jetfuel_consumption ( I can do that much) AND an 21st row computed country name titled "Rest of world" summing the remaining 140 countries together as if it were just another country like the bottom of this model .
I have a working valid computed field labelled "myrank" = RANK(AVG([Jetfuel Consumption]),'desc')
My thought was to simply calculate a new text field that would equal the country name for rank < 21 and then be the string "Rest of World" otherwise.
Such as:
IF ( [therank] < 11 ) [Country] ELSE "Rest of World" END
But that is not valid for an unspecified reason. I know I'm confused already about how to just specify the value of a field without something like SUM or AVG or AGG wrapping it, but this is a larger question.
What's the right way to make this view?
I've created simple dataset:
And I want to group the TOP 3 countries by Consumption.
To do that, create a set (click on Country in Dimensions) and select TOP 3 by SUM(Consumption):
Then create a calculated field that shows the countries IN the set and "Others" for the rest.
In that field, IF [Country Set] is a boolean expression meaning "the country is IN the set".
Drag and Drop corresponding fields and configure sort, for example:
Sets are convenient to dynamically change, expand and customize any visualization. More detailed: https://help.tableau.com/current/pro/desktop/en-us/sortgroup_sets_topn.htm
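What the set plus the calculated field compute can be illustrated outside Tableau. The sketch below is plain Python with made-up numbers; the function name top_n_with_rest and the sample values are hypothetical, and it only shows the arithmetic (keep the top N, sum everything else into one extra row).

```python
def top_n_with_rest(consumption, n, rest_label="Rest of World"):
    """Return the n largest (name, value) pairs plus one row summing the remainder."""
    ranked = sorted(consumption.items(), key=lambda item: item[1], reverse=True)
    top, rest = ranked[:n], ranked[n:]
    if rest:
        # Collapse every country outside the top n into a single labeled row.
        top.append((rest_label, sum(value for _name, value in rest)))
    return top


# Hypothetical consumption figures.
sample = {"A": 50, "B": 30, "C": 20, "D": 10, "E": 5}
result = top_n_with_rest(sample, 3)
print(result)
```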

Merge Columns without deleting or affecting the rows

I'm seeking some experienced advice, as I have been working on a little project to automate how we gather data in my office using Google Sheets (please note that I can't use add-ons).
I'm encountering difficulties in finding a way to merge cells in a column that contain the same name, but without deleting/merging the rows (because I'm pulling stats for the different tasks employees handle).
In the example you can see that column A has names that repeat because each individual completes one or more tasks, so my goal would be to find a way to automatically merge the repeating names in column A without affecting the rest of the columns.
I believe it is important to know that the table auto-populates, as I'm currently using a filter function: I paste all my data into the spreadsheet and it filters only my agents' names.
Here is the formula that im using in c26:
=FILTER(A3:G22,ARRAYFORMULA(ISNUMBER(MATCH(A3:A22,{"Mary";"Jason";"Ana";"Jen";"Ben";"Helen";"Dan";"Richard";"Breg"},0))))
Please tell me that there is a way to do this!
Here is a link to the example Doc That I made
https://docs.google.com/spreadsheets/d/1RsCeHfzzbRsUDnj6UmmCdfzjryp-2xW09UTCx_qfIpA/edit?usp=sharing
Here is how you can get this layout, although cells can't actually be merged by a formula:
= ARRAYFORMULA (
ifna (
vlookup (
ifna (
sort (
{ row(A4:A22) *
len ( vlookup ( A4:A22,
{"Ana";"Jen";"Ben";"Helen";"Dan";"Richard";"Breg"},1,0)
)^0}
,1,true
),""
)
, { row(A4:A22),
left(A4:A22, 1000 *
transpose (
split (join("","1," & rept("0,",countif(A4:A22,unique(A4:A22))-1)),",",true,true))
),
B4:G22
},{2,3,4,5,6,7,8},false
),""
)
)
Another formula with the same result:
= ARRAYFORMULA (
ifna(vlookup (
sort (
{ row(A4:A22) *
len ( vlookup ( A4:A22,
{"Ana";"Jen";"Ben";"Helen";"Dan";"Richard";"Breg"},1,0)
)^0}
,1,true
),
{ row(A4:A22),
if(transpose(split(join(
"","1," & rept("0,",COUNTIF(A4:A22,unique(A4:A22))-1)),",",true,true))=1,
A4:A22,""
),B4:G22
},{2,3,4,5,6,7,8},false),""
)
)
After a little bit of discussion on the sheet, this query() will display only one name per agent, but still list all the rows for that agent.
=ARRAYFORMULA(ARRAY_CONSTRAIN(QUERY({Sheet1!A1:G22,IF(COUNTIFS(Sheet1!A1:A22,Sheet1!A1:A22,ROW(Sheet1!A1:A22),"<="&ROW(Sheet1!A1:A22))>1,"",Sheet1!A1:A22)},"select Col8,Col2,Col3,Col4,Col5,Col6,Col7,Col1 where Col1 matches '"&TEXTJOIN("|",TRUE,A2:A)&"' order by Col1 label Col8'Name'",3),9^99,7))
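The effect of these formulas (show each name only on its first row and leave it blank on the following rows, while keeping every row) can be sketched in a few lines of Python. This is an illustration only; the sample rows and the function name blank_repeated_names are hypothetical.

```python
def blank_repeated_names(rows):
    """Keep every row, but replace a name with '' after its first occurrence."""
    seen = set()
    result = []
    for name, *rest in rows:
        # Show the name only the first time it appears.
        shown = "" if name in seen else name
        seen.add(name)
        result.append((shown, *rest))
    return result


# Hypothetical rows: (agent name, task, count)
rows = [
    ("Ana", "Calls", 12),
    ("Ana", "Emails", 7),
    ("Ben", "Calls", 9),
    ("Ana", "Chats", 3),
]
cleaned = blank_repeated_names(rows)
print(cleaned)
```

Like the COUNTIFS-based query above, this blanks a repeated name wherever it reappears, so sorting the rows by name first keeps each agent's rows together.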

What is the best way to store sensor data in Clickhouse?

We have a set of devices, and all of them have sensors. All devices share a common set of sensors, but some devices have additional ones. Every sensor has a different discretization level; some sensors change very fast at times and do not change at all for a while at other times.
For example, we have DeviceA and a stream of packets in the form (NULL means the value didn't change):
Timestamp, Temp, Latitude, Longitude, Speed...
111, 20, 54.111, 23.111, 10
112, 20, NULL, NULL, 13
113, 20, NULL, 23.112, 15
And DeviceB:
Timestamp, Temp, Latitude, Longitude, Speed..., AdditionalSensor
111, 24, 54.111, 23.121, 10 ... 1
112, 23, 55.111, 23.121, 13 ... 2
113, 23, 55.111, 23.122, 15 ... 1
After some time new sensors could be added to some device.
Every sensor could be any of numeric types(Int32, UInt8, Float32)
After that data will be used to calculate: dau, mau, retention, GPS coordinates clustering and so on.
We could simply create some table:
CREATE TABLE Sensors
(
Date Date,
Id FixedString(16),
DeviceNumber FixedString(16),
TimeUtc DateTime,
DeviceTime DateTime,
Version Int32,
Mileage Int32,
Longitude Float64,
Latitude Float64,
AccelX Float64,
AccelY Float64,
AccelZ Float64
...
) ENGINE = MergeTree(Date, (DeviceNumber, TimeUtc), 8192);
But there are two problems here: there is no support for a different set of sensors per device, and sometimes we have null values for a sensor when nothing changed, where it would be great to see the last non-null value before a timestamp.
The first problem we could solve by creating a table with the fields SensorName, Timestamp, Date, Value. But how do we choose the correct type? Should we use different tables for different types?
Probably we need to use the graphite engine; unfortunately, I have no experience with that. So any help is really appreciated. It would be great to be able to keep only the changed values of any sensor.
Update
I found a way to deal with null values. We could use the "anyLast" function to request the last received value for a column:
SELECT anyLast(Lights) FROM test where TimeUtc <= toDateTime('2017-11-07 11:13:59');
Unfortunately, we can't fill all missing values using some kind of overlapping window function (ClickHouse does not support them). With a nullable field, the aggregate function uses only non-null values; with a non-nullable field, all values are used, including zero values. Both ways are incorrect. A workaround is to fill null values before insert, using a select with anyLast values for all null values in a row.
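The "fill null values before insert" workaround can also be done on the client side before the rows reach ClickHouse. The following is a minimal sketch in Python under that assumption; the function name forward_fill is hypothetical, and the packets are the DeviceA rows from above with NULL written as None.

```python
def forward_fill(rows):
    """Replace None with the most recent non-None value from the same column."""
    filled = []
    last_seen = None
    for row in rows:
        if last_seen is None:
            last_seen = list(row)
        else:
            # Take the new value when present, otherwise carry the previous one over.
            last_seen = [new if new is not None else old
                         for new, old in zip(row, last_seen)]
        filled.append(tuple(last_seen))
    return filled


# DeviceA packets: (Timestamp, Temp, Latitude, Longitude, Speed)
packets = [
    (111, 20, 54.111, 23.111, 10),
    (112, 20, None, None, 13),
    (113, 20, None, 23.112, 15),
]
filled = forward_fill(packets)
print(filled)
```

This assumes the packets of one device arrive in timestamp order; a value that is missing in the very first packet stays None.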
You can use Clickhouse like a time-series database.
Your table definition is restricting you from having dynamic metrics. That's why you are trying to deal with NULL values.
You can use this table for sensor values:
CREATE TABLE ts1(
entity String,
ts UInt64, -- timestamp, milliseconds from January 1 1970
s Array(String), -- names of the sensors
v Array(Float32), -- sensor values
d Date MATERIALIZED toDate(round(ts/1000)), -- auto generate date from ts column
dt DateTime MATERIALIZED toDateTime(round(ts/1000)) -- auto generate date time from ts column
) ENGINE = MergeTree(d, entity, 8192)
Here we are loading sensor values of device A:
INSERT INTO ts1(entity, ts, s, v)
VALUES ('deviceA', 1509232010254, ['temp','lat','long','speed'], [24, 54.111, 23.121, 11])
Querying deviceA temp data:
SELECT
entity,
dt,
ts,
v[indexOf(s, 'temp')] AS temp
FROM ts1
WHERE entity = 'deviceA'
┌─entity─┬──────────────────dt─┬────────────ts─┬─temp─┐
│ deviceA│ 2017-10-28 23:06:50 │ 1509232010254 │ 24 │
└────────┴─────────────────────┴───────────────┴──────┘
Check this full answer to get a detailed usage.
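The lookup v[indexOf(s, 'temp')] treats s and v as two parallel arrays: find the position of the sensor name, then read the value at the same position. A rough Python equivalent, given only to illustrate that idea, is shown below; the function name sensor_value is hypothetical, and returning None for a missing sensor is a choice of this sketch, not a statement about ClickHouse behavior.

```python
def sensor_value(names, values, sensor):
    """Return the value stored at the same position as `sensor`, or None if absent."""
    try:
        return values[names.index(sensor)]
    except ValueError:
        # The sensor name is not present in this row.
        return None


# Row inserted for deviceA in the example above.
names = ["temp", "lat", "long", "speed"]
values = [24, 54.111, 23.121, 11]
print(sensor_value(names, values, "temp"))
print(sensor_value(names, values, "humidity"))
```

Because each row carries its own list of names, devices with additional sensors fit into the same table without schema changes.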
