What are series and bucket in InfluxDb - influxdb

While trying to understand the different concepts of InfluxDB, I came across this documentation, which compares InfluxDB terms with those of an SQL database:
An InfluxDB measurement is similar to an SQL database table.
InfluxDB tags are like indexed columns in an SQL database.
InfluxDB fields are like unindexed columns in an SQL database.
InfluxDB points are similar to SQL rows.
But there are a couple of other terms I came across that I could not clearly understand, and I am wondering whether there is an SQL equivalent for them:
Series
Bucket
From what I understand from the documentation, a series is the collection of data that share a retention policy, measurement, and tag set.
Does this mean a series is a subset of data in a database table? Or is it like a database view?
I could not find any documentation explaining buckets. I guess this is a new concept in the 2.0 release.
Can someone please clarify these two concepts?

I have summarized my understanding below:
A bucket is a named location with a retention policy where time-series data is stored.
A series is a logical grouping of data defined by a shared measurement, tag set, and field key.
A measurement is similar to an SQL database table.
A tag is similar to an indexed column in an SQL database.
A field is similar to an unindexed column in an SQL database.
A point is similar to an SQL row.
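For example, an SQL table workdone:
+--------------------+--------+---------------------+-----------+
| Email              | Status | time                | Completed |
+--------------------+--------+---------------------+-----------+
| lorr#influxdb.com  | start  | 1636775801000000000 | 76        |
| lorr#influxdb.com  | finish | 1636775868000000000 | 120       |
| marv#influxdb.com  | start  | 1636775801000000000 | 0         |
| marv#influxdb.com  | finish | 1636775868000000000 | 20        |
| cliff#influxdb.com | start  | 1636775801000000000 | 54        |
| cliff#influxdb.com | finish | 1636775868000000000 | 56        |
+--------------------+--------+---------------------+-----------+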
The columns Email and Status are indexed.
Hence:
Measurement: workdone
Tags: Email, Status
Field: Completed
Series (Cardinality = 3 x 2 = 6):
Measurement: workdone; Tags: Email: lorr#influxdb.com, Status: start; Field: Completed
Measurement: workdone; Tags: Email: lorr#influxdb.com, Status: finish; Field: Completed
Measurement: workdone; Tags: Email: marv#influxdb.com, Status: start; Field: Completed
Measurement: workdone; Tags: Email: marv#influxdb.com, Status: finish; Field: Completed
Measurement: workdone; Tags: Email: cliff#influxdb.com, Status: start; Field: Completed
Measurement: workdone; Tags: Email: cliff#influxdb.com, Status: finish; Field: Completed
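The enumeration above can be reproduced with a short Python sketch. This is not InfluxDB code: the rows list and the series_key helper are illustrative names, and the key layout (measurement, sorted tag set, field key) simply mirrors the definition of a series given in this answer.

```python
# Illustrative only: the rows of the workdone table as plain tuples.
rows = [
    ("lorr#influxdb.com", "start", 1636775801000000000, 76),
    ("lorr#influxdb.com", "finish", 1636775868000000000, 120),
    ("marv#influxdb.com", "start", 1636775801000000000, 0),
    ("marv#influxdb.com", "finish", 1636775868000000000, 20),
    ("cliff#influxdb.com", "start", 1636775801000000000, 54),
    ("cliff#influxdb.com", "finish", 1636775868000000000, 56),
]


def series_key(measurement, tags, field_key):
    """Build a hashable key from measurement, sorted tag set, and field key."""
    return (measurement, tuple(sorted(tags.items())), field_key)


series = set()
for email, status, _time, completed in rows:
    tags = {"Email": email, "Status": status}
    fields = {"Completed": completed}
    # The timestamp and the field value are not part of the series identity.
    for field_key in fields:
        series.add(series_key("workdone", tags, field_key))

print(len(series))  # 6
```

Two points with the same tags and field key but different timestamps would map to the same key, which is why only the distinct tag combinations count toward cardinality.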
Splitting a logical series across multiple buckets may not improve performance, but it may complicate Flux queries, since each query then needs to read from multiple buckets.

The InfluxDB documentation that you link to has an example of what a series is, even if it doesn't label it as such. In InfluxDB, you can think of each combination of measurement and tag set as being in its own "table". The documentation splits it like this.
This table in SQL:
+---------+---------+---------------------+--------------+
| park_id | planet  | time                | #_foodships  |
+---------+---------+---------------------+--------------+
| 1       | Earth   | 1429185600000000000 | 0            |
| 2       | Saturn  | 1429185601000000000 | 3            |
+---------+---------+---------------------+--------------+
Becomes these two Series in InfluxDb:
name: foodships
tags: park_id=1, planet=Earth
----
name: foodships
tags: park_id=2, planet=Saturn
...etc...
This has implications when you query the data, and it is also the reason for the recommendation to avoid tags with high-cardinality values. For example, if you had a temperature tag (especially one precise to multiple decimal places), InfluxDB would create a "table" for each distinct combination of tag values.
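A rough way to see the effect is the Python sketch below. It assumes a hypothetical weather measurement with 1,000 random readings from a single sensor (none of this comes from the InfluxDB documentation) and counts distinct measurement and tag-set combinations when temperature is stored as a field versus as a tag.

```python
import random

random.seed(0)  # fixed seed so repeated runs use the same readings
readings = [round(random.uniform(-10.0, 40.0), 3) for _ in range(1000)]


def count_series(points):
    """Count distinct (measurement, tag set) combinations."""
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})


# Temperature as a field: the tag set is only the sensor id.
as_field = [("weather", {"sensor": "s1"}) for _ in readings]

# Temperature as a tag: every distinct value produces a new tag set.
as_tag = [("weather", {"sensor": "s1", "temperature": str(t)}) for t in readings]

print(count_series(as_field))  # 1
print(count_series(as_tag))    # one series per distinct temperature value
```

The second count grows with every new distinct reading, which is the behaviour the recommendation is meant to avoid.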
A Bucket is much easier to understand. It's just a combination of a database with a retention policy. In previous versions of InfluxDb these were separate concepts which have now been combined.

According to the InfluxDB glossary:
Bucket
A bucket is a named location where time-series data is stored in InfluxDB 2.0. In InfluxDB 1.8+, each combination of a database and a retention policy (database/retention-policy) represents a bucket. Use the InfluxDB 2.0 API compatibility endpoints included with InfluxDB 1.8+ to interact with buckets.
Series
A logical grouping of data defined by shared measurement, tag set, and field key.

Related

Influxdb query on tag returns nothing

I have an Influxdb with lots of fields and a single tag:
> show tag keys
name: rtl433
tagKey
------
model
Now, I want a list of all possible values for model, so I run
SELECT model FROM rtl433
>
and it returns nothing. Why? There's lots of data in model if I select *.
You are trying to use a classic SQL approach, but InfluxDB is not a classic SQL database. model is a tag, and an InfluxQL SELECT that names only tags and no fields returns no data. To list the values of a tag, use SHOW TAG VALUES, as described in the InfluxDB documentation:
SHOW TAG VALUES WITH KEY = "model"

Create InfluxDB Continuous Query where the measurement name is based on tag values

I have a measurement called reading where all the rows are of the form:
time channel host value
2018-03-05T05:38:41.952057914Z "1" "4176433" 3.46
2018-03-05T05:39:26.113880408Z "0" "5222355" 120.23
2018-03-05T05:39:30.013558256Z "1" "5222355" 5.66
2018-03-05T05:40:13.827140492Z "0" "4176433" 3.45
2018-03-05T05:40:17.868363704Z "1" "4176433" 3.42
where channel and host are tags.
Is there a way I can automatically generate a continuous query such that:
The CQ measurement's name is of the form host_channel
Until now I have been creating them one by one, for example:
CREATE CONTINUOUS QUERY "4176433_1" ON database_name
BEGIN
SELECT mean(value) INTO "4176433_1"
FROM reading
WHERE host = '4176433' AND channel = '1'
GROUP BY time(1m)
END
but is there a way I can automatically get 1m sampling per host & channel any time a new host is added to the database? Thanks!
There is no way of doing this in InfluxDB, for a number of reasons. Encoding tag values in measurement names contradicts the official InfluxDB best practices and is discouraged.
I suggest you just go with a single continuous query that groups by both tags, so that each host/channel pair is kept as its own series in the target measurement:
CREATE CONTINUOUS QUERY reading_aggregator ON database_name
BEGIN
SELECT mean(value)
INTO mean_reading
FROM reading
GROUP BY time(1m), host, channel
END
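As a sanity check on what grouping by time(1m), host, and channel computes, here is a small Python sketch (not InfluxDB code) that applies the same grouping to the sample rows from the question. The downsample function is a hypothetical name, and the timestamps are truncated to whole seconds to keep the parsing simple.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows from the question: (time, channel, host, value).
rows = [
    ("2018-03-05T05:38:41", "1", "4176433", 3.46),
    ("2018-03-05T05:39:26", "0", "5222355", 120.23),
    ("2018-03-05T05:39:30", "1", "5222355", 5.66),
    ("2018-03-05T05:40:13", "0", "4176433", 3.45),
    ("2018-03-05T05:40:17", "1", "4176433", 3.42),
]


def downsample(rows):
    """Mean of value per (host, channel, one-minute window)."""
    buckets = defaultdict(list)
    for ts, channel, host, value in rows:
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        buckets[(host, channel, minute)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}


for (host, channel, minute), mean in sorted(downsample(rows).items()):
    print(host, channel, minute.isoformat(), mean)
```

Each host/channel pair gets its own sequence of one-minute means, and a new host shows up automatically because nothing in the grouping names a specific host.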

Influxdb select data from a specific shard

I would like to know if it is possible somehow to select the data of a specific shard from the influx CLI. I would also like to select the series within two timestamps, but I haven't yet found out how. Any input would be appreciated, thank you.
Q: I would like to know if it is possible somehow to select the data of a specific shard from the influx CLI.
A: As of InfluxDB 1.3, this is not possible. However, you should be able to work out what data lives in a given shard.
Query to get the shard information:
show shards
It should tell you the start and end time of the data (across all series in the database) contained in each shard.
For instance
Given Shard info:
id  database retention_policy shard_group start_time           end_time             expiry_time          owners
--  -------- ---------------- ----------- ----------           --------             -----------          ------
123 mydb     autogen          123         2012-11-26T00:00:00Z 2012-12-03T00:00:00Z 2012-12-03T00:00:00Z
124 mydb     autogen          124         2013-01-14T00:00:00Z 2013-01-21T00:00:00Z 2013-01-21T00:00:00Z
125 mydb     autogen          125         2013-04-29T00:00:00Z 2013-05-06T00:00:00Z 2013-05-06T00:00:00Z
Given Measurements:
name: measurements
name
----
measurement_abc
measurement_def
measurement_123
Shard 123 will contain all of the data across the measurements above that falls between the start time of 2012-11-26T00:00:00Z and the end time of 2012-12-03T00:00:00Z. That is, running DROP SHARD 123 would remove the data in that range across all of those measurements.
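Since a shard is identified by its time range, working out which shard holds a given timestamp is a range lookup over the show shards output. A minimal Python sketch using the three shards from the example above follows. The shard_for helper is a hypothetical name, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime, timezone


def parse(ts):
    """Parse a UTC timestamp such as 2012-11-26T00:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# (shard id, start_time, end_time) copied from the show shards example.
shards = [
    (123, parse("2012-11-26T00:00:00Z"), parse("2012-12-03T00:00:00Z")),
    (124, parse("2013-01-14T00:00:00Z"), parse("2013-01-21T00:00:00Z")),
    (125, parse("2013-04-29T00:00:00Z"), parse("2013-05-06T00:00:00Z")),
]


def shard_for(ts):
    """Return the id of the shard whose range contains ts, or None."""
    t = parse(ts)
    for shard_id, start, end in shards:
        # Assumption: start is inclusive, end is exclusive.
        if start <= t < end:
            return shard_id
    return None


print(shard_for("2012-11-28T12:00:00Z"))  # 123
print(shard_for("2013-02-01T00:00:00Z"))  # None
```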

How to create a measurement in InfluxDB

I am a beginner with InfluxDB and I've read the intro documentation, but cannot find any details on how to create a new measurement. Am I missing something ?
As noted in the comments, to "create" a new measurement you simply insert data into that measurement.
For example
$ influx
> CREATE DATABASE mydb
> USE mydb
Using database mydb
> SHOW MEASUREMENTS
> INSERT cpu,host=serverA value=10
> SHOW MEASUREMENTS
name: measurements
name
----
cpu
> INSERT mem,host=serverA value=10
> SHOW MEASUREMENTS
name: measurements
name
----
cpu
mem
In InfluxDB, you can't create empty measurements.
You need to add some data as well.
For example,
INSERT xyz,name=serverA value=10,count=10
This will create a measurement named xyz where
tag keys: name
field keys: value and count
You can check the field and tag keys by executing SHOW FIELD KEYS or SHOW TAG KEYS.
In the INSERT command, the format is:
measurement_name,tag_key=tag_value[,tag_key=tag_value...] field_key=field_value[,field_key=field_value...]
That is, the measurement name and the comma-separated tag pairs come first, followed by a space and then the comma-separated field pairs.
e.g. INSERT xyz,name=serverA value=10,count=10
In this way, you can create a measurement with your required field and tag keys.
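The INSERT format described above can be assembled programmatically. The following Python sketch builds such a line from a measurement name, a dict of tags, and a dict of fields. The function names are hypothetical and this is not an official client: it handles only the common escaping cases (commas, spaces, and equals signs in keys and tag values, double quotes in string fields) and omits the optional timestamp.

```python
def escape(text, chars):
    """Backslash-escape every character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: bool, integer (i suffix), float, or quoted string."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, tags, fields):
    """Build one line: measurement,tag pairs <space> field pairs."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = escape(measurement, ", ")
    for key, value in sorted(tags.items()):
        head += "," + escape(key, ",= ") + "=" + escape(value, ",= ")
    body = ",".join(
        escape(key, ",= ") + "=" + format_field(value)
        for key, value in fields.items()
    )
    return head + " " + body


print(to_line("xyz", {"name": "serverA"}, {"value": 10.0, "count": 10.0}))
# xyz,name=serverA value=10.0,count=10.0
```

The ValueError mirrors the point made above: a line without at least one field is not a valid point, so a measurement cannot be created empty.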
You cannot create an empty measurement, as far as I know.
As said above, if you want one, you need to start writing to it; that creates the measurement along with the data.
insert load,app_name=app3,groupname=second,performance=degraded uuid=003,loading=50,frequency=1
In the above, we are using INSERT to write new data into a new measurement called "load".
app_name, groupname, and performance are tags; uuid, loading, and frequency are fields.
Create a database: create database <database name of your choice>
Create a user: create user "<username>" with password '<password>'
To see all databases: SHOW DATABASES
Enter the database: use <database name>
To see all measurements (the equivalent of tables) inside the database: SHOW MEASUREMENTS
Grant privileges: grant all on <database name> to <username>
Insert data (here MotionSense is a measurement, which is similar to a table name in SQL): INSERT MotionSense,SensorType=Gyro roll=1.2,yaw=5,pitch=3
See the data of the measurement: SELECT * FROM "MotionSense"

Complex query melting my brain! Rails and Postgres

I apologize if I'm missing something really obvious here, but hopefully you'll humour me!
I have these models
Employee - with id, first_name, last_name
Shift Type - with id, shift_name
Date Indices - with id, date
Locations - with id, location
Allocated shifts - with employee_id, shift_type_id, date_index_id, location_id
Now I can write queries that show me allocated shifts and join with locations, names, etc., but what I want is to be able to produce a table that takes dates as columns and employees as rows to produce a roster like this:
______________________________________________
|employee|date 1 |date 2 | date 3 |
|'dave' |early shift|late shift |day off |
|'martha'|day off |early shift|early shift|
etc.
I'm sure I'm just pretty dumb, but how can I create these 'virtual' columns and link them to the employee?
You are looking for a "pivot" or "crosstab" query. Postgres has the additional module tablefunc for that. More info in this related answer:
PostgreSQL Crosstab Query
And many links to similar questions on SO from there.
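To illustrate what a pivot does independently of Postgres, here is a small Python sketch that turns long-format rows (employee, date, shift) into one row per employee with one column per date. The sample rows come from the roster in the question; the pivot function is a hypothetical helper and is not the tablefunc API.

```python
def pivot(rows):
    """Turn (employee, date, shift) rows into a roster keyed by employee."""
    dates = sorted({date for _, date, _ in rows})
    by_employee = {}
    for employee, date, shift in rows:
        by_employee.setdefault(employee, {})[date] = shift
    # One list per employee, with an empty cell where no shift is allocated.
    roster = {
        employee: [shifts.get(date, "") for date in dates]
        for employee, shifts in by_employee.items()
    }
    return dates, roster


rows = [
    ("dave", "date 1", "early shift"),
    ("dave", "date 2", "late shift"),
    ("dave", "date 3", "day off"),
    ("martha", "date 1", "day off"),
    ("martha", "date 2", "early shift"),
    ("martha", "date 3", "early shift"),
]

dates, roster = pivot(rows)
print(["employee"] + dates)
for employee, shifts in roster.items():
    print([employee] + shifts)
```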
