InfluxDB subquery in WHERE clause - influxdb

Having some issues wrapping my brain around this one. I have two tables in InfluxDB 1.8.x, here's the relevant data layout
table a
-------------------------------------------
|time               |hostname|device_cache|
|6/14/2022 9:00:30PM|device1 |dm-4        |
|6/14/2022 9:00:30PM|device2 |dm-4        |
|6/14/2022 9:00:30PM|device3 |dm-8        |
-------------------------------------------
table b
-----------------------------------------------------
|time               |hostname|diskiodevice|diskiola1|
|6/14/2022 9:00:30PM|device1 |dm-0        |8        |
|6/14/2022 9:00:30PM|device1 |dm-4        |7        |
|6/14/2022 9:00:30PM|device3 |dm-3        |9        |
|6/14/2022 9:00:30PM|device2 |dm-2        |8        |
|6/14/2022 9:00:30PM|device3 |dm-8        |15       |
|6/14/2022 9:00:30PM|device2 |dm-4        |9        |
|6/14/2022 9:00:30PM|device3 |dm-3        |1        |
-----------------------------------------------------
So, what I am trying to do is get all the diskiola1 values for the diskiodevices from table b that are defined as device_cache items from table a for a particular hostname entry. Here's what I've tried:
SELECT max("diskiola1")
FROM "table b"
WHERE hostname = 'device1'
AND
time > now() - 10m
AND
"cache_device" IN
( Select distinct("device_cache") as "cache_device" FROM "table a" WHERE hostname = 'device1')
GROUP BY time(20s)
My goal is to have this as a time series in a graph to show the values of diskiola1 for a given host over a period of time for only the device_cache items. This data is given to me to work with, I really can't modify it unfortunately.
Anyone see where I'm going wrong? The error I receive is
ERR: error parsing query: found IN, expected ;

Unfortunately, InfluxQL doesn't support the IN operator and isn't expected to in the foreseeable future (see details here). InfluxQL doesn't support JOIN operations either (see details here).
It seems your "table a" is more of a mapping table, while "table b" stores the actual time series data. I'm assuming that in "table a" hostname is a tag and device_cache is a field, and that in "table b" hostname is a tag while diskiodevice and diskiola1 are fields. You could try enabling Flux and adapting the following sample code:
aDistinctDeviceCache = from(bucket: "yourDatabaseName/yourRetentionPolicyName")
    |> range(start: 2018-05-22T23:30:00Z, stop: 2018-05-23T00:00:00Z) // start and stop can be changed
    |> filter(fn: (r) => r._measurement == "table a" and r.hostname == "device1" and r._field == "device_cache")
    |> distinct()
bDevice1 = from(bucket: "yourDatabaseName/yourRetentionPolicyName")
    |> range(start: -10m)
    |> filter(fn: (r) => r._measurement == "table b" and r.hostname == "device1")
    |> rename(columns: {diskiodevice: "device_cache"})
maxDiskiola1ForDevice1 =
    join(tables: {aPlus: aDistinctDeviceCache, bPlus: bDevice1}, on: ["hostname", "device_cache"])
    |> window(every: 20s)
    |> max(column: "diskiola1") // Flux arguments must be named
    |> yield()
This will first grab distinct values from "table_a" and then rename some field of "table_b" so that we can join the two tables together in the last step.
Here are some more tips to convert your InfluxQL to Flux and convert your subqueries.
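If enabling Flux is not an option, the same filter can be applied client-side after fetching both measurements. The following is only a sketch in plain Python over the sample rows from the question, not an InfluxDB API: the function name cache_device_max and the in-memory row layout are assumptions made for illustration. It collects the device_cache values for one hostname, keeps only the matching diskiodevice rows, and takes the maximum diskiola1 per 20-second window.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Sample rows copied from the question: (time, hostname, device_cache)
TABLE_A = [
    ("2022-06-14T21:00:30", "device1", "dm-4"),
    ("2022-06-14T21:00:30", "device2", "dm-4"),
    ("2022-06-14T21:00:30", "device3", "dm-8"),
]
# Sample rows copied from the question: (time, hostname, diskiodevice, diskiola1)
TABLE_B = [
    ("2022-06-14T21:00:30", "device1", "dm-0", 8),
    ("2022-06-14T21:00:30", "device1", "dm-4", 7),
    ("2022-06-14T21:00:30", "device3", "dm-3", 9),
    ("2022-06-14T21:00:30", "device2", "dm-2", 8),
    ("2022-06-14T21:00:30", "device3", "dm-8", 15),
    ("2022-06-14T21:00:30", "device2", "dm-4", 9),
    ("2022-06-14T21:00:30", "device3", "dm-3", 1),
]


def cache_device_max(table_a, table_b, hostname, window_seconds=20):
    """Hypothetical helper: max diskiola1 per time window, cache devices only."""
    # Emulates: SELECT DISTINCT device_cache FROM "table a" WHERE hostname = ...
    cache_devices = {dev for _, host, dev in table_a if host == hostname}

    windows = defaultdict(list)
    for ts, host, dev, value in table_b:
        if host != hostname or dev not in cache_devices:
            continue  # emulates the unsupported "IN (subquery)" filter
        # Timestamps are treated as UTC here (an assumption for the sketch).
        moment = datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)
        epoch = int(moment.timestamp())
        # Align to epoch-based window boundaries, like GROUP BY time(20s).
        windows[epoch - epoch % window_seconds].append(value)

    return {
        datetime.fromtimestamp(start, tz=timezone.utc): max(values)
        for start, values in sorted(windows.items())
    }
```

With the sample data, device1 has dm-4 as its only cache device, so only the row with diskiola1 = 7 survives the filter.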

Related

Gremlin, combine two queries and join data

I have a problem making a query for the following case:
+--------------------hasManager-------------------+
| | |
| property:isPersonalMngr=true (bool) |
| v
[ Employee ]-- hasShift -->[ Shift ]-- hasManager -->[ Manager ]
| | |
| | property:isPersonalMngr=false (bool)
| |
| property:name (text)
|
property:baseShift (bool)
For a manager 'John', who manages shifts and can also be the personal manager of an employee, I want to return all the employees he's managing, with the list of shifts for each employee. Each employee has a 'baseShift' (say: 'night' / 'day') and a scheduled shift ('wed123').
Eg:
[ 'Employee1', [ 'night', 'wed123', 'sat123' ]]
[ 'Employee2', [ 'day', 'mon123', 'tue123' ]]
For the shift employees I have this:
g.V('John').in('hasManager').in('hasShift').hasLabel('Employee')
For the personally managed employees I have this:
g.V('John').in('hasManager').hasLabel('Employee')
How do I combine these two AND add the name property of the shift in a list?
Thanks.
To test this, I created the following graph. Hope this fits your data model from above:
g.addV('Manager').property(id,'John').as('john').
addV('Manager').property(id,'Terry').as('terry').
addV('Manager').property(id,'Sally').as('sally').
addV('Employee').property(id,'Tom').as('tom').
addV('Employee').property(id,'Tim').as('tim').
addV('Employee').property(id,'Lisa').as('lisa').
addV('Employee').property(id,'Sue').as('sue').
addV('Employee').property(id,'Chris').as('chris').
addV('Employee').property(id,'Bob').as('bob').
addV('Shift').property('name','mon123').as('mon123').
addV('Shift').property('name','tues123').as('tues123').
addV('Shift').property('name','sat123').as('sat123').
addV('Shift').property('name','wed123').as('wed123').
addE('hasManager').from('tom').to('john').property('isPersonalMngr',true).
addE('hasManager').from('tim').to('john').property('isPersonalMngr',true).
addE('hasManager').from('lisa').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('sue').to('terry').property('isPersonalMngr',true).
addE('hasManager').from('chris').to('sally').property('isPersonalMngr',true).
addE('hasManager').from('bob').to('sally').property('isPersonalMngr',true).
addE('hasShift').from('tom').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('tues123').property('baseShift','night').
addE('hasShift').from('lisa').to('wed123').property('baseShift','night').
addE('hasShift').from('sue').to('sat123').property('baseShift','night').
addE('hasShift').from('chris').to('wed123').property('baseShift','day').
addE('hasShift').from('bob').to('sat123').property('baseShift','day').
addE('hasShift').from('bob').to('mon123').property('baseShift','day').
addE('hasShift').from('tim').to('wed123').property('baseShift','day').
addE('hasManager').from('mon123').to('terry').property('isPersonalMngr',false).
addE('hasManager').from('tues123').to('sally').property('isPersonalMngr',false).
addE('hasManager').from('wed123').to('john').property('isPersonalMngr',false).
addE('hasManager').from('sat123').to('terry').property('isPersonalMngr',false)
From this, the following query generates output in the format that you're looking for:
gremlin> g.V('John').
union(
inE('hasManager').has('isPersonalMngr',true).outV(),
inE('hasManager').has('isPersonalMngr',false).outV().in('hasShift')).
dedup().
map(union(id(),out('hasShift').values('name').fold()).fold())
==>[Tom,[mon123]]
==>[Tim,[tues123,wed123]]
==>[Lisa,[wed123]]
==>[Chris,[wed123]]
A note on your data model - you could likely simplify things by having two different types of edges for hasManager and that would remove the need for a boolean property on those edges. Instead, you could have hasOrgManager and hasShiftManager edges and that would remove the need for the property checks when traversing those edges.
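To make the shape of the traversal easier to follow, here is a rough in-memory sketch in plain Python (no graph database involved). The edge lists mirror the sample graph above, shifts are identified by their name, and the function name employees_with_shifts is made up for illustration. It follows the same three stages as the query: take the union of the two paths, dedup, then fold each employee's shift names into a list.

```python
# hasManager edges from the sample graph: (source vertex, manager, isPersonalMngr)
HAS_MANAGER = [
    ("Tom", "John", True), ("Tim", "John", True),
    ("Lisa", "Terry", True), ("Sue", "Terry", True),
    ("Chris", "Sally", True), ("Bob", "Sally", True),
    ("mon123", "Terry", False), ("tues123", "Sally", False),
    ("wed123", "John", False), ("sat123", "Terry", False),
]
# hasShift edges from the sample graph: (employee, shift name)
HAS_SHIFT = [
    ("Tom", "mon123"), ("Tim", "tues123"), ("Lisa", "wed123"),
    ("Sue", "sat123"), ("Chris", "wed123"), ("Bob", "sat123"),
    ("Bob", "mon123"), ("Tim", "wed123"),
]


def employees_with_shifts(manager):
    """Hypothetical helper mirroring union(...).dedup().map(...) in the query."""
    # Branch 1: employees with a personal hasManager edge to the manager.
    personal = [src for src, mgr, is_personal in HAS_MANAGER
                if mgr == manager and is_personal]
    # Branch 2: shifts the manager runs, then the employees on those shifts.
    managed_shifts = [src for src, mgr, is_personal in HAS_MANAGER
                      if mgr == manager and not is_personal]
    via_shift = [emp for shift in managed_shifts
                 for emp, name in HAS_SHIFT if name == shift]
    # dedup(): drop repeats while keeping first-seen order.
    employees = list(dict.fromkeys(personal + via_shift))
    # map(...): pair each employee with the folded list of shift names.
    return [(emp, [name for e, name in HAS_SHIFT if e == emp])
            for emp in employees]
```

Calling employees_with_shifts("John") yields the same four employees as the Gremlin output above: Tom and Tim through the personal edges, Lisa and Chris through the wed123 shift.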

Dataflow stream python windowing

I am new to Dataflow. I have the following logic:
An event is added to Pub/Sub.
Dataflow reads from Pub/Sub and gets the event.
For each event, I look in MySQL to find the segments the event is related to, and this step returns the list of related segments. These segments are independent of one another.
Each segment's MySQL results can be divided into two tables, one for email and one for mobile, and these are independent as well.
Each segment has 1 to n rules. I would like to process this step in parallel and collect all the results. I have tried to use windows, but I am not sure how to write the logic so that the combined results from all the rules inside one segment are collected in a final function, which then writes to MySQL depending on the rule results (boolean).
Here is what I have so far:
testP = beam.Pipeline(options=options)
ReadData = (
testP | 'ReadData' >> beam.io.ReadFromPubSub(subscription=str(options.pubsubsubscriber.get())).with_output_types(bytes)
| 'Decode' >> beam.Map(lambda x: x.decode('utf-8'))
| 'GetSegments' >> beam.ParDo(getsegments(options))
)
processEmails = (ReadData
| 'GetSubscribersWithRulesForEmails' >> beam.ParDo(GetSubscribersWithRules(options, 'email'))
| 'ProcessSubscribersSegmentsForEmails' >> beam.ParDo(ProcessSubscribersSegments(options, 'email'))
)
processMobiles = (ReadData
| 'GetSubscribersWithRulesForMobiles' >> beam.ParDo(GetSubscribersWithRules(options, 'mobile'))
| 'ProcessSubscribersSegmentsForMobiles' >> beam.ParDo(ProcessSubscribersSegments(options, 'mobile'))
)
# For the sake of testing, only the window for email is written
windowThis = (processEmails
| beam.WindowInto(
beam.window.FixedWindows(1),
trigger=beam.transforms.trigger.Repeatedly(
beam.transforms.trigger.AfterProcessingTime(1 * 10)),
accumulation_mode=beam.transforms.trigger.AccumulationMode.DISCARDING)
| beam.CombinePerKey(beam.combiners.ToListCombineFn())
| beam.ParDo(print_windows)
)
In this case, because all of your elements have the exact same timestamp, I would use their message ID, and their timestamp to group them with Session windows. It would be something like this:
testP = beam.Pipeline(options=options)
ReadData = (
testP | 'ReadData' >> beam.io.ReadFromPubSub(subscription=str(options.pubsubsubscriber.get())).with_output_types(bytes)
| 'Decode' >> beam.Map(lambda x: x.decode('utf-8'))
| 'GetSegments' >> beam.ParDo(getsegments(options))
)
# At this point, ReadData contains (key, value) pairs with a timestamp.
# Now we perform all of the processing.
processEmails = (ReadData | ....)
processMobiles = (ReadData | .....)
# Now we window by sessions with a 1-second gap. This is okay because all of
# the elements for any given key have the exact same timestamp.
windowThis = (processEmails
| beam.WindowInto(beam.window.Sessions(1)) # Default trigger is fine
| beam.CombinePerKey(beam.combiners.ToListCombineFn())
| beam.ParDo(print_windows)
)
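To see why session windows group the per-rule results of one message together, here is a small simulation in plain Python. It is not Beam code and does not reproduce Beam's window-merging implementation; the function name sessionize and the (key, timestamp, value) tuple layout are assumptions for illustration. Elements sharing a key are placed in the same session as long as consecutive timestamps are less than the gap apart.

```python
from collections import defaultdict


def sessionize(events, gap_seconds=1.0):
    """Group (key, timestamp, value) tuples into per-key sessions.

    Simplified stand-in for WindowInto(Sessions(gap)) followed by
    CombinePerKey(ToListCombineFn()): returns {key: [[values], ...]}.
    """
    by_key = defaultdict(list)
    for key, timestamp, value in events:
        by_key[key].append((timestamp, value))

    sessions = {}
    for key, items in by_key.items():
        items.sort(key=lambda item: item[0])  # stable sort by timestamp
        grouped = []
        last_timestamp = None
        for timestamp, value in items:
            # Start a new session when the gap since the last element is reached.
            if last_timestamp is None or timestamp - last_timestamp >= gap_seconds:
                grouped.append([])
            grouped[-1].append(value)
            last_timestamp = timestamp
        sessions[key] = grouped
    return sessions


# Hypothetical rule results keyed by message ID; all results for one message
# carry the message's own timestamp, as described in the answer above.
EVENTS = [
    ("msg-1", 10.0, "rule-a"),
    ("msg-1", 10.0, "rule-b"),
    ("msg-2", 10.0, "rule-a"),
    ("msg-1", 25.0, "rule-c"),
]
```

Because every result for a message shares one timestamp, all of them fall into a single session per key, which is what lets the final step see the complete list of rule results at once.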

How do I perform a "diff" on two Sources given a key using Apache Beam Python SDK?

I posed the question generically, because maybe it is a generic answer. But a specific example is comparing 2 BigQuery tables with the same schema, but potentially different data. I want a diff, i.e. what was added, deleted, modified, with respect to a composite key, e.g. the first 2 columns.
Table A
C1 C2 C3
-----------
a a 1
a b 1
a c 1
Table B
C1 C2 C3 # Notes if comparing B to A
-------------------------------------
a a 1 # No Change to the key a + a
a b 2 # Key a + b Changed from 1 to 2
# Deleted key a + c with value 1
a d 1 # Added key a + d
I basically want to be able to make/report the comparison notes.
Or, from a Beam perspective, I may want to just output up to 4 labeled PCollections: Unchanged, Changed, Added, Deleted. How do I do this, and what would the PCollections look like?
What you want to do here, basically, is join two tables and compare the result of that, right? You can look at my answer to this question, to see the two ways in which you can join two tables (Side inputs, or CoGroupByKey).
I'll also code a solution for your problem using CoGroupByKey. I'm writing the code in Python because I'm more familiar with the Python SDK, but you'd implement similar logic in Java:
import apache_beam as beam
from apache_beam import pvalue
from apache_beam.io import WriteToText

def make_kv_pair(x):
    """Output the record keyed by the composite key (x[0], x[1])."""
    return ((x[0], x[1]), x)

table_a = (p | 'ReadTableA' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysA' >> beam.Map(make_kv_pair))
table_b = (p | 'ReadTableB' >> beam.Read(beam.io.BigQuerySource(....))
             | 'SetKeysB' >> beam.Map(make_kv_pair))
joined_tables = ({'table_a': table_a, 'table_b': table_b}
                 | beam.CoGroupByKey())

# The tag names must match the ones used in TaggedOutput below.
output_types = ['changed', 'added', 'removed', 'unchanged']

class FilterDoFn(beam.DoFn):
    def process(self, element):
        # Each element is (key, {'table_a': [...], 'table_b': [...]}).
        key, values = element
        table_a_value = list(values['table_a'])
        table_b_value = list(values['table_b'])
        if table_a_value == table_b_value:
            yield pvalue.TaggedOutput('unchanged', key)
        elif len(table_a_value) < len(table_b_value):
            yield pvalue.TaggedOutput('added', key)
        elif len(table_a_value) > len(table_b_value):
            yield pvalue.TaggedOutput('removed', key)
        elif table_a_value != table_b_value:
            yield pvalue.TaggedOutput('changed', key)

key_collections = (joined_tables
                   | beam.ParDo(FilterDoFn()).with_outputs(*output_types))
# Now you can handle each output
key_collections.unchanged | WriteToText(...)
key_collections.changed | WriteToText(...)
key_collections.added | WriteToText(...)
key_collections.removed | WriteToText(...)
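The classification itself can be checked without a pipeline. The sketch below is plain Python rather than Beam, and it assumes each composite key appears at most once per table; the function name diff_tables and the key_len parameter are illustrative only. It applies the same four-way split to the sample tables from the question.

```python
def diff_tables(rows_a, rows_b, key_len=2):
    """Classify composite keys as unchanged, changed, added, or removed.

    Compares table B against table A. Assumes keys are unique per table.
    """
    index_a = {tuple(row[:key_len]): tuple(row) for row in rows_a}
    index_b = {tuple(row[:key_len]): tuple(row) for row in rows_b}

    result = {"unchanged": [], "changed": [], "added": [], "removed": []}
    for key in sorted(index_a.keys() | index_b.keys()):
        if key not in index_a:
            result["added"].append(key)      # only in B
        elif key not in index_b:
            result["removed"].append(key)    # only in A
        elif index_a[key] == index_b[key]:
            result["unchanged"].append(key)  # same row in both
        else:
            result["changed"].append(key)    # same key, different values
    return result


# Sample tables from the question: columns C1, C2, C3.
TABLE_A = [("a", "a", 1), ("a", "b", 1), ("a", "c", 1)]
TABLE_B = [("a", "a", 1), ("a", "b", 2), ("a", "d", 1)]
```

For the sample data this reports key (a, a) as unchanged, (a, b) as changed, (a, c) as removed, and (a, d) as added, matching the notes in the question.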

Calculate hierarchical labels for Google Sheets using native functions

Using Google Sheets, I want to automatically number rows like so:
The key is that I want this to use built-in functions only.
I have an implementation working where child items are in separate columns (e.g. "Foo" is in column B, "Bar" is in column C, and "Baz" is in column D). However, it uses a custom JavaScript function, and the slow way that custom JavaScript functions are evaluated, combined with the dependencies, possibly combined with a slow Internet connection, means that my solution can take over one second per row (!) to calculate.
For reference, here's my custom function (that I want to abandon in favor of native code):
/**
* Calculate the Work Breakdown Structure id for this row.
*
* @param {range} priorIds IDs that precede this one.
* @param {range} names The names for this row.
* @return A WBS string id (e.g. "2.1.5") or an empty string if there are no names.
* @customfunction
*/
function WBS_ID(priorIds,names){
if (Array.isArray(names[0])) names = names[0];
if (!names.join("")) return "";
var lastId,pieces=[];
for (var i=priorIds.length;i-- && !lastId;) lastId=priorIds[i][0];
if (lastId) pieces = (lastId+"").split('.').map(function(s){ return s*1 });
for (var i=0;i<names.length;i++){
if (names[i]){
var s = pieces.concat();
pieces.length=i+1;
pieces[i] = (pieces[i]||0) + 1;
return pieces.join(".");
}
}
}
For example, cell A7 would use the formula:
=WBS_ID(A$2:A6,B7:D7)
...to produce the result "1.3.2"
Note that in the above example blank rows are skipped during numbering. An answer that does not honor this, where the ID is calculated deterministically from ROW(), is acceptable (and possibly even desirable).
Edit: Yes, I've tried to do this myself. I have a solution that uses three extra columns which I chose not to include in the question. I have been writing equations in Excel for at least 25 years (and Google Spreadsheets for 1 year). I have looked through the list of functions for Google Spreadsheets and none of them jumps out to me as making possible something that I didn't think of before.
When the question is a programming problem and the problem is an inability to see how to get from point A to point B, I don't know that it's useful to "show what I've done". I've considered splitting by periods. I've looked for a map equivalent function. I know how to use isblank() and counta().
Lol, this is hilariously the longest (and very likely the most unnecessarily complicated) way to combine formulas, but I thought it was interesting that it does in fact work, so long as you put a 1 in the first row and then, in the second row, add:
=if(row()=1,1,if(and(istext(D2),counta(split(A1,"."))=3),left(A1,4)&n(right(A1,1)+1),if(and(isblank(B2),isblank(C2),isblank(D2)),"",if(and(isblank(B2),isblank(C2),isnumber(indirect(address(row()-1,column())))),indirect(address(row()-1,column()))&"."&if(istext(D2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,)),if(and(isblank(B2),istext(C2)),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+0.1,2),if(istext(B2),round(max(indirect(address(1,column())&":"&address(row()-1,column())))+1,),))))))
In my defense, I've had a very long day at work. Complicating what should be a simple thing seems to be my thing today :)
Foreword
Spreadsheet built-in functions don't include an equivalent to JavaScript's .map. The alternative is to use the spreadsheet's array-handling features and iteration patterns.
A "complete solution" could use built-in functions to automatically transform the user input into a simple table and return the Work Breakdown Structure (WBS) number. Some people refer to transforming the user input into a simple table as "normalization", but including this would make this post too long for the Stack Overflow format, so it focuses on presenting a short formula to obtain the WBS.
It's worth saying that using formulas to transform large data sets into a simple table as part of the continuous spreadsheet calculations (in this case, of the WBS) will make the spreadsheet slow to refresh.
Short answer
To keep the WBS formula short and simple, first transform the user input into a simple table including task name, id and parent id columns, then use a formula like the following:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
Explanation
First, prepare your data
Put each task in one row. Include a General task / project to be used as the parent of all the root level tasks.
Add an ID to each task.
Add a reference to the ID of the parent task for each task. Leave it blank for the General task / project.
After the above steps the data should look like the following:
+---+--------------+----+-----------+
|   | A            | B  | C         |
+---+--------------+----+-----------+
| 1 | Task         | ID | Parent ID |
| 2 | General task | 1  |           |
| 3 | Subtask 1    | 2  | 1         |
| 4 | Subtask 2    | 3  | 1         |
| 5 | Subsubtask 1 | 4  | 2         |
| 6 | Subsubtask 2 | 5  | 2         |
+---+--------------+----+-----------+
Remark: This could also help reduce the processing time required by a custom function.
Second, add the formula below to D2, then fill down as needed:
=ArrayFormula(
IFERROR(
INDEX($D$2:$D,MATCH($C2,$B$2:$B,0))
&"."
&COUNTIF($C$2:$C2,C2),
RANK($B2,FILTER($B$2:B,LEN($C$2:$C)=0),TRUE)&"")
)
The result should look like the following:
+---+--------------+----+-----------+-------+
|   | A            | B  | C         | D     |
+---+--------------+----+-----------+-------+
| 1 | Task         | ID | Parent ID | WBS   |
| 2 | General task | 1  |           | 1     |
| 3 | Subtask 1    | 2  | 1         | 1.1   |
| 4 | Subtask 2    | 3  | 1         | 1.2   |
| 5 | Subsubtask 1 | 4  | 2         | 1.1.1 |
| 6 | Subsubtask 2 | 5  | 2         | 1.1.2 |
+---+--------------+----+-----------+-------+
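The logic of the formula may be easier to verify outside the spreadsheet. The Python sketch below is not part of the answer's method; it assumes the normalized (ID, Parent ID) layout shown above, with parents listed before their children and root tasks numbered in order of appearance, and the function name wbs_numbers is made up. A root task gets its position among the roots, and any other task gets its parent's WBS followed by a running count of siblings.

```python
def wbs_numbers(tasks):
    """Compute WBS strings from (task_id, parent_id) pairs.

    parent_id is None for root tasks. Assumes each parent appears
    before its children, as in the table above.
    """
    wbs = {}
    child_counts = {}  # parent_id -> number of children numbered so far
    root_count = 0
    for task_id, parent_id in tasks:
        if parent_id is None:
            # Mirrors the RANK(...) branch: position among the root tasks.
            root_count += 1
            wbs[task_id] = str(root_count)
        else:
            # Mirrors INDEX/MATCH (parent's WBS) & "." & COUNTIF (sibling count).
            child_counts[parent_id] = child_counts.get(parent_id, 0) + 1
            wbs[task_id] = f"{wbs[parent_id]}.{child_counts[parent_id]}"
    return wbs


# Rows 2-6 of the table above: (ID, Parent ID)
TASKS = [(1, None), (2, 1), (3, 1), (4, 2), (5, 2)]
```

Running wbs_numbers(TASKS) reproduces column D of the result table.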
Here's an answer that does not allow a blank line between items, and requires that you manually type "1" into the first cell (A2). This formula is applied to cell A3, with the assumption that there are at most three levels of hierarchy in columns B, C, and D.
=IF(
COUNTA(B3), // If there is a value in the 1st column
INDEX(SPLIT(A2,"."),1)+1, // find the 1st part of the prior ID, plus 1
IF( // ...otherwise
COUNTA(C3), // If there's a value in the 2nd column
INDEX(SPLIT(A2,"."),1) // find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1, // add the 2nd part of the prior ID (or 0), plus 1
INDEX(SPLIT(A2,"."),1) // ...otherwise find the 1st part of the prior ID
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),2),1) // add the 2nd part of the prior ID or 1 and
& "." // add a period and
& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1) // add the 3rd part of the prior ID (or 0), plus 1
)
) & "" // Ensure the result is a string ("1.2", not 1.2)
Without comments:
=IF(COUNTA(B3),INDEX(SPLIT(A2,"."),1)+1,IF(COUNTA(C3),INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),0)+1,INDEX(SPLIT(A2,"."),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),2),1)& "."& IFERROR(INDEX(SPLIT(A2,"."),3)+1,1))) & ""
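For anyone who wants to trace the branches of this formula, here is an equivalent sketch in Python. It is only an illustration: the names next_wbs_id and number_rows are hypothetical, and the level argument (0, 1, or 2) stands for which of columns B, C, or D holds a value in the row.

```python
def next_wbs_id(prior_id, level):
    """Derive a row's ID from the prior row's ID, like the formula in A3.

    level 0, 1, 2 means the row has its value in column B, C, D.
    """
    parts = [int(piece) for piece in prior_id.split(".")]
    first = parts[0]
    if level == 0:
        # Value in the 1st column: bump the 1st part.
        return str(first + 1)
    if level == 1:
        # Value in the 2nd column: keep the 1st part, bump the 2nd (or 0).
        second = parts[1] if len(parts) > 1 else 0
        return f"{first}.{second + 1}"
    # Otherwise: keep the 1st part and the 2nd part (or 1), bump the 3rd (or 0).
    second = parts[1] if len(parts) > 1 else 1
    third = parts[2] + 1 if len(parts) > 2 else 1
    return f"{first}.{second}.{third}"


def number_rows(levels, first_id="1"):
    """Number a sheet: first_id is typed by hand, the rest are derived."""
    ids = [first_id]
    for level in levels:
        ids.append(next_wbs_id(ids[-1], level))
    return ids
```

For example, number_rows([1, 2, 2, 1, 0]) walks through a child, two grandchildren, a second child, and a new top-level item after the manually entered "1".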

How to build a table where each cell is built using the 2 dimensions

Desired output
Each User has child Plan which has child PlanDate objects. PlanDate has an attribute ddate which is just a date. Plan has an attribute mtype that can either be M, V, or C (haha MVC, subconscious techy much?). For a given week (let's just say the current week), I'd like to print out a table that looks like this:
----------------------------------------------------------------------------
| User | Mon | Tue | Wed | Thu | Fri | Other attributes of User
----------------------------------------------------------------------------
| Eric | M | | M | | M | ...
----------------------------------------------------------------------------
| Erin | V | V | V | V | V | ...
----------------------------------------------------------------------------
| Jace | | C | C | | | ...
----------------------------------------------------------------------------
| Kris | C | | | | | ...
----------------------------------------------------------------------------
| Tina | V | | V | | V | ...
----------------------------------------------------------------------------
| Lily | M | M | M | M | M | ...
----------------------------------------------------------------------------
The order of the Users on the rows doesn't really matter to me; I may add Ransack gem to make it ordered, but for now ignore. A given User may not have PlanDates with a ddate for every day in a given week, and certainly there's no relationship between the PlanDates across Users.
Proposed options
I feel like there are two options:
In the view, print the column headers with a data-attribute of the day in question, and print the row headers with a data-attribute of the user id in question (will have to first select all the users who do have a grandchild PlanDate with a ddate somewhere in the current week). Then in the intersection, use the two data-attributes to query ActiveRecord.
In the model, generate a data hash that can create the table, and then pass that hash to the view via the controller.
Option 1 makes more intuitive sense to me having been a lifelong Excel user, but it breaks MVC entirely, so I'm trying to go with Option 2, and the challenges below are related to Option 2. That said if you can make a case for Option 1 go for it! I feel like if you can convince me to do Option 1, I can implement it without the same problems...
Challenges with Option 2
I can build a hash with one dimension as the key and an array of the other dimension as the value. For example, if the days of the current week were used as the key:
{
Mon => [Eric, Erin, Kris, Tina, Lily],
Tue => [Erin, Jace, Lily],
Wed => [Eric, Erin, Jace, Tina, Lily],
Thu => [Erin, Lily],
Fri => [Eric, Erin, Tina, Lily]
}
But the problem is once I get there, I'm not sure how to deal with the fact that there are blanks in the data... If I were to convert the hash above into a table, I would only know how to make the Users appear as a list under each date; but then that wouldn't look like my desired output at all, because there wouldn't be gaps in the data. For example on Monday, there's no Jace, but there needs to be a blank space for Jace so that it's easy for the Viewer to look across and see, ah there's no Jace on Monday, but there is a Jace on Tuesday and Wednesday.
Oh actually I just needed a minute to think logically about this... I just need a nested hash, one with the first dimension, one with the second. So a method like this:
def plan_table
  # 1: get an array of the current week's dates (Monday through Friday)
  week = (Date.today.at_beginning_of_week..(Date.today.at_end_of_week - 2.days)).to_a
  # 2: find all the users (collecting the ids) that have a plan date in the current week
  current_week_users = User.select { |u| u.plans.select { |p| p.plan_dates.select { |pd| week.include? pd.ddate.to_date }.count > 0 }.count > 0 }.map(&:id)
  # 3: build the nested hash: day => { user_id => mtype }
  @week_table = Hash.new
  week.each do |day|
    @week_table[day] = {}
    current_week_users.each do |user_id|
      # For each user in the array we built already, check whether that user has a pd that: 1) falls on this date, 2) has a non-canceled plan, 3) has a user that matches this current user. The extra checks for user_id and plan_id are my ole' paranoia about objects being created without parents
      potential_pd = PlanDate.select { |pd| pd.ddate.to_date == day && pd.plan_id != nil && pd.plan.status != "canceled" && pd.plan.user_id != nil && pd.plan.user.id == user_id }
      if potential_pd == []
        # Blank cell: this user has nothing planned on this day
        @week_table[day][user_id] = ""
      else
        @week_table[day][user_id] = potential_pd.first.plan.mtype
      end
    end
  end
  return @week_table
end
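The nested-hash idea is independent of Rails, so here is a language-neutral sketch of it in plain Python with no database access. The sample tuples are a subset of the desired table at the top, and the names build_week_table and render_rows are made up for illustration. Every (day, user) cell is pre-filled with an empty string so the gaps survive when the table is rendered.

```python
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# A subset of the desired output above: (user, day, mtype)
PLAN_DATES = [
    ("Eric", "Mon", "M"), ("Eric", "Wed", "M"), ("Eric", "Fri", "M"),
    ("Jace", "Tue", "C"), ("Jace", "Wed", "C"),
    ("Kris", "Mon", "C"),
]


def build_week_table(plan_dates, days):
    """Build {day: {user: mtype or ""}} with a cell for every combination."""
    users = sorted({user for user, _, _ in plan_dates})
    # Pre-fill every cell with a blank so missing days stay visible.
    table = {day: {user: "" for user in users} for day in days}
    for user, day, mtype in plan_dates:
        if day in table:
            table[day][user] = mtype
    return table


def render_rows(table, days, users):
    """Turn the nested dict into one row per user: [user, Mon, ..., Fri]."""
    return [[user] + [table[day][user] for day in days] for user in users]


week_table = build_week_table(PLAN_DATES, DAYS)
rows = render_rows(week_table, DAYS, ["Eric", "Jace", "Kris"])
```

The view then only has to loop over the rows and print each cell, blank or not, which keeps the lookup logic out of the template.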

Resources