I have influxdb and grafana set up. Templating works fine.
What I'm now trying to achieve is to have a FROM clause built from multiple templated values.
I have the following measurements defined in influxdb:
Game1_Draw, Game1_Home, Game1_Away
Game2_Draw, Game2_Home, Game2_Away
.... ... ...
GameN_Draw, GameN_Home, GameN_Away
I want the user to select the game name (Game1, Game2...), and then have three graphs (not queries) with measurements (GameSelected)_Home, (GameSelected)_Away, (GameSelected)_Draw
Getting the game names from templating was easy.
What I need is to generate a query whose FROM clause depends on the game selected plus a constant suffix. Something like:
SELECT mean("myvalue") FROM /^$game_Home/ WHERE ....
SELECT mean("myvalue") FROM /^$game_Draw/ WHERE ....
SELECT mean("myvalue") FROM /^$game_Away/ WHERE ....
I cannot make this work. I cannot find anything in the documentation about building a FROM clause from a template variable combined with a constant suffix.
Encoding information in measurement names is generally an anti-pattern in InfluxDB. The information in the separate measurement suffixes, e.g. _Home, _Draw, _Away, would be much more useful if it was recorded in a tag.
game,odds_type=home myvalue=0.5 1469923200000000000
game,odds_type=draw myvalue=0.6 1469923200000000000
game,odds_type=away myvalue=0.2 1469923200000000000
Then displaying these series on the same graph in Grafana would only require a group by on the odds_type tag.
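For example, a single panel query along these lines would then cover all three outcomes on one graph (a sketch, assuming the measurement is named game as above and using Grafana's $timeFilter and $__interval macros):

SELECT mean("myvalue") FROM "game" WHERE $timeFilter GROUP BY time($__interval), "odds_type"

Grafana will then draw one series per odds_type value.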
Related
I have developed a project using InfluxDB and I am currently trying to understand why my Influx container keeps crashing due to OOM exits.
The way I designed my database is quite basic. I have several buildings, and for each building I need to store time-based values. So I created a database for each building, and a measurement for each type of value (for example, energy consumption).
I do not use tags at all, because with the design described above, all I have left to store is the float values and their timestamp index. I like this design because every building is completely separated from the others (as they should be), and if I want to get data from one of them, I just need to connect to the building's database (or bucket) and query it like so:
SELECT * FROM field1,field2 WHERE time>d1 and time<d2
According to this Influx article, if I understand correctly (English isn't my first language), I have a cardinality of:
3 buildings (databases/buckets) * 1000 fields (measurements) * 1 (default tag?) = 3000 series cardinality
This doesn't seem to be much, thus I think I misunderstand something.
Assume a DB with the following data records:
2018-04-12T00:00:00Z value=1000 [series=distance]
2018-04-12T00:00:00Z value=10 [series=signal_quality]
2018-04-12T00:01:00Z value=1100 [series=distance]
2018-04-12T00:01:00Z value=0 [series=signal_quality]
There is one field called value. Square brackets denote the tags (further tags omitted). As you can see, the data is captured in different data records instead of using multiple fields on the same record.
Given the above structure, how can I query the time series of distances, filtered by signal quality? The goal is to only get distance data points back when the signal quality is above a fixed threshold (e.g. 5).
"Given the above structure", there's no way to do it in plain InfluxDB.
Please keep in mind: InfluxDB is not a relational database. It is different, even though the query language looks familiar.
Again, given that structure, you can proceed with Kapacitor, as already mentioned.
But I strongly suggest you rethink the structure, if possible, i.e. if you are able to control the way the metrics are collected.
If that is not an option, here's the way: spin up a simple job in Kapacitor that joins the two points into one based on time (check this out for how), and then writes the result into a new measurement.
The data point would then look like this:
DistanceQualityTogether,tag1=if,tag2=you,tag3=need,tag4=em distance=1000,signal_quality=10 2018-04-12T00:00:00Z
With such a measurement, the rest is straightforward.
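For instance, the filter from the question becomes a plain WHERE clause (a sketch, using the joined measurement above):

SELECT "distance" FROM "DistanceQualityTogether" WHERE "signal_quality" > 5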
But again, if you can configure your metrics to be sent like this in the first place, it is better to do it that way.
I have a bunch of measurements, all starting with task_runtime.
e.g.
task_runtime.task_a
task_runtime.task_b
task_runtime.task_c
Is there a way to select all of them by a partial measurement name?
I'm using grafana on top of influxdb and I want to display all of these measurements in a single graph, but I don't have a closed list of these measurements.
I thought about something like
select * from (select table_name from all_tables where table_name like "task_runtime.*")
But I'm not sure of the InfluxDB syntax for this.
You can use a regular expression when specifying measurements in the FROM clause as described in the InfluxDB documentation.
For example, in your case:
SELECT * FROM /^task_runtime.*/
Grafana also supports this and will display all measurements separately.
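Note that the unescaped dot in the regex matches any character; if you want it to match the literal dot in your measurement names, you can escape it:

SELECT * FROM /^task_runtime\./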
Is there a way to add a tag to an existing entry in InfluxDB measurement? If not in the existing db measurement, is there a way to insert the records with a new tag into a new influx measurement?
Currently I have a set of measurements that should probably be entries in a single measurement where their current measurement names should be tag-keys in the superset of the merged measurements.
e.g.
show measurements
measurement1
measurement2
measurement3
measurement4
should instead be tags on the data included in each measurement, unioned to form a single measurement joinedmeasurement with indexed tags measurement1, measurement2, ...
It would have to be done manually via queries.
For example, in Python using the official client:
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', database='my_db')

measurement = 'measurement1'
db_data = client.query('SELECT value FROM %s' % measurement)

# Tag each point with the measurement it came from. Note that 'tags'
# must be a dict of tag key/value pairs (the tag key name used here,
# source_measurement, is just illustrative).
data_to_write = [{'measurement': 'joinedmeasurement',
                  'tags': {'source_measurement': measurement},
                  'time': d['time'],
                  'fields': {'value': d['value']},
                  }
                 for d in db_data.get_points()]

client.write_points(data_to_write)
And so on for the rest of the measurements. You can run the above in a loop to do all of them in one go, for example as sketched below.
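A sketch of that loop, assuming every existing measurement in the database should be merged and that your influxdb-python version provides get_list_measurements():

for m in client.get_list_measurements():
    source = m['name']
    if source == 'joinedmeasurement':
        continue  # skip the target measurement itself
    points = client.query('SELECT value FROM "%s"' % source).get_points()
    client.write_points([{'measurement': 'joinedmeasurement',
                          'tags': {'source_measurement': source},
                          'time': d['time'],
                          'fields': {'value': d['value']}}
                         for d in points])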
Consider using named fields in addition to tags, though. The above example only uses one field; you can have as many as you want.
This improves performance further, although fields are not indexed, so do not use them for data that queries filter on.
I'm struggling to find a real-world example of how to use Google Cloud Dataflow combiners to run a common ETL task which aggregates records on multiple keys (e.g. Date, Location) and sums values over different measures (e.g. GrossValue, NetValue, Quantity). I can only find examples with a typical Key/Value (e.g. Day/Value) aggregation. Any hints on how this is done with the Python SDK would be appreciated.
I'm not 100% sure I understand your question. Do you have separate elements whose data you are trying to join together, in which case you may wish to use CoGroupByKey? Or does a single element have multiple fields?
Hope some of this info helps,
I would suggest looking at windowing, which will allow you to subdivide a PCollection according to the timestamps of its individual elements. If you want to see all the events for a particular day, this may be useful. There are Python examples of windowing. You may want to window across a day's worth of data. This link is also useful for understanding how you can use GroupByKey in different ways.
Another option is to determine what date your elements belong to, and use a group by key to key them with "[location][date][other]". You may need to do something like this if you want to join the data based on multiple fields.
See this GroupByKey example, but change the key to use your multiple fields concatenated.
Here is an example for reducing with a custom combiner. You can add logic here to do a custom aggregation for multiple different measurements.
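For instance, here is a minimal sketch with the Beam Python SDK, assuming each input element is a dict with hypothetical Date, Location, GrossValue, NetValue and Quantity keys: key each record by (Date, Location), then sum the three measures with a custom CombineFn passed to CombinePerKey.

import apache_beam as beam

class SumMeasuresFn(beam.CombineFn):
    # Sums GrossValue, NetValue and Quantity for each (Date, Location) key.
    def create_accumulator(self):
        return (0.0, 0.0, 0)

    def add_input(self, acc, record):
        return (acc[0] + record['GrossValue'],
                acc[1] + record['NetValue'],
                acc[2] + record['Quantity'])

    def merge_accumulators(self, accumulators):
        return tuple(map(sum, zip(*accumulators)))

    def extract_output(self, acc):
        return {'GrossValue': acc[0], 'NetValue': acc[1], 'Quantity': acc[2]}

# Hypothetical input records for illustration only.
records = [
    {'Date': '2016-07-31', 'Location': 'NYC', 'GrossValue': 120.0, 'NetValue': 100.0, 'Quantity': 3},
    {'Date': '2016-07-31', 'Location': 'NYC', 'GrossValue': 60.0, 'NetValue': 50.0, 'Quantity': 1},
]

with beam.Pipeline() as p:
    (p
     | 'Read' >> beam.Create(records)
     | 'KeyByDateLocation' >> beam.Map(lambda r: ((r['Date'], r['Location']), r))
     | 'SumMeasures' >> beam.CombinePerKey(SumMeasuresFn())
     | 'Print' >> beam.Map(print))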