Create a histogram with raw MySQL data in Grafana - histogram

I need to create a histogram in Grafana using data from a MySQL database. The data is a 0 or 1 value: 1 if the bus is charging, 0 if it is not.
I need the histogram to be by hour of the day, over the time range that I select in Grafana's time picker.
I would appreciate any ideas or help on how to build this.
Thanks :)
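One way to think about the aggregation being asked for, as a rough sketch rather than a Grafana or MySQL recipe: sum the charging samples per hour of day over the selected range. The function name, the fixed sample spacing and the sample data below are all assumptions made for illustration; the question does not say how often the 0/1 value is recorded.

```python
from collections import defaultdict
from datetime import datetime, timedelta

def charging_seconds_by_hour(samples, sample_period_s):
    """Sum charging time per hour of day (0-23).

    samples: iterable of (datetime, state) pairs, where state is 0 or 1.
    sample_period_s: ASSUMED fixed spacing between samples, in seconds.
    """
    totals = defaultdict(int)
    for ts, state in samples:
        # Each sample with state 1 counts as one sample period of charging.
        totals[ts.hour] += state * sample_period_s
    # Return all 24 hours so that empty hours appear as 0 in the histogram.
    return [totals[h] for h in range(24)]

# Hypothetical data: one sample per minute over two days,
# with the bus charging from 01:30 to 02:29 each day.
start = datetime(2024, 1, 1)
samples = []
for day in range(2):
    for minute in range(24 * 60):
        ts = start + timedelta(days=day, minutes=minute)
        charging = 1 if 90 <= minute < 150 else 0
        samples.append((ts, charging))

hist = charging_seconds_by_hour(samples, sample_period_s=60)
print(hist[1], hist[2])  # seconds of charging in hours 1 and 2
```

The same grouping could presumably be expressed in SQL by grouping on the hour part of the timestamp and summing the 0/1 column, restricted to the dashboard's time range.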

Related

Influxdb speed up query over long time periods with group by

I write sensor data every second to an InfluxDB database. Displaying weekly, monthly or yearly summaries in Grafana is quite slow since it needs to query many thousands of values.
To speed things up, I was thinking about using a cron job to run queries like
select mean(sensor1) into data_avg_1h from data where time > start and time <= end group by time(1h)
select mean(sensor1) into data_avg_1d from data where time > start and time <= end group by time(1d)
select mean(sensor1) into data_avg_1w from data where time > start and time <= end group by time(1w)
This would mean I need more storage, but queries would run much faster.
Is this a bodge job or is it acceptable, and is there a cleverer way to do something like that?
Yes. It is perfectly OK, and it is also recommended, to downsample the data as you have described in the question.
However, instead of using a cron job it is better to use the Continuous Query feature of InfluxDB to achieve the same result.
Downsampling & Continuous Query documentation.
Please be aware that when you store the average value for a short period and later want to calculate the average for a longer period from this downsampled data, you will have to calculate the weighted average. Otherwise, you will be calculating the average of averages, which may not be equal to the average calculated from the original data.
This is because each downsampled average value might be based on a different number of data points.
So while calculating the mean at a regular interval, also store the number of data points received in that interval. This way you will be able to calculate the weighted average.
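A small worked example of the point above, as a sketch with made-up numbers: two hourly buckets with different point counts, where the plain average of the two means differs from the true mean and the count-weighted average recovers it. The variable names and readings are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical raw readings, grouped into two hourly buckets of unequal size.
hour_1 = [10.0, 10.0, 10.0, 10.0]
hour_2 = [20.0]

# Downsampled rows store both the mean and the number of points behind it.
buckets = [(mean(hour_1), len(hour_1)), (mean(hour_2), len(hour_2))]

# Mean computed from the original data.
true_mean = mean(hour_1 + hour_2)

# Average of averages: ignores how many points each bucket holds.
naive = mean([m for m, _ in buckets])

# Weighted average: each bucket mean is weighted by its point count.
weighted = sum(m * n for m, n in buckets) / sum(n for _, n in buckets)

print(true_mean, naive, weighted)  # 12.0 15.0 12.0
```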

influxdb : Display over a 24 hour period whether a machine is off, idle or on

I’m new to InfluxDB and Grafana and want to know, as a percentage over a 24-hour period, how much of the time a machine is off, idle or on. This is an IoT project and we're recording the state (off, idle, on) based on the power used. The data ends up being stored in InfluxDB under state as 0 = off, 1 = idle or 2 = on. How can I achieve this with an InfluxDB query or in Grafana? Any pointers or help is appreciated.
I would use the pie chart plugin to show the three states and respective percentages over time. You can do something like:
SELECT mean("value") FROM "state" WHERE value=[0|1|2] GROUP BY time($__interval) fill(null)
For each part of the chart.
Have a look at grafana-discrete-panel (also available in the official Grafana plugin repository). It is designed specifically to visualize state changes over time.
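For reference, the percentage itself can be sketched outside of InfluxDB as a simple share of samples per state. This assumes the state is recorded at evenly spaced intervals, so that the share of samples approximates the share of time; the function name and the sample data are hypothetical.

```python
from collections import Counter

# State codes as described in the question.
STATE_NAMES = {0: "off", 1: "idle", 2: "on"}

def state_percentages(states):
    """Return the percentage of samples spent in each state.

    ASSUMPTION: samples are evenly spaced in time.
    """
    total = len(states)
    if total == 0:
        return {name: 0.0 for name in STATE_NAMES.values()}
    counts = Counter(states)
    return {name: 100.0 * counts[code] / total
            for code, name in STATE_NAMES.items()}

# Hypothetical 24 hourly samples: 6 h off, 6 h idle, 12 h on.
states = [0] * 6 + [1] * 6 + [2] * 12
pct = state_percentages(states)
print(pct)  # {'off': 25.0, 'idle': 25.0, 'on': 50.0}
```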

InfluxDB performance

In my case, I need to capture 15 performance metrics for devices and save them to InfluxDB. Each device has a unique device id.
Metrics are written into InfluxDB in the following way. Here I only show one as an example:
new Serie.Builder("perfmetric1")
.columns("time", "value", "id", "type")
.values(getTime(), getPerf1(), getId(), getType())
.build()
Writing data is fast and easy. But I see bad performance when I run a query. I'm trying to get all 15 metric values for the last hour.
select value from perfmetric1, perfmetric2, ..., perfmetric15
where id='testdeviceid' and time > now() - 1h
For an hour, each metric has 120 data points, in total it's 1800 data points. The query takes about 5 seconds on a c4.4xlarge EC2 instance when it's idle.
I believe InfluxDB can do better. Is this a problem of my schema design, or is it something else? Would splitting the query into 15 parallel calls go faster?
As @valentin's answer says, you need to build an index for the id column for InfluxDB to perform these queries efficiently.
In 0.8 stable you can do this "indexing" using continuous fanout queries. For example, the following continuous query will expand your perfmetric1 series into multiple series of the form perfmetric1.id:
select * from perfmetric1 into perfmetric1.[id];
Later you would do:
select value from perfmetric1.testdeviceid, perfmetric2.testdeviceid, ..., perfmetric15.testdeviceid where time > now() - 1h
This query will take much less time to complete since InfluxDB won't have to perform a full scan of the timeseries to get the points for each testdeviceid.
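The idea behind the fanout can be illustrated with a toy in-memory model. This is only a sketch of the concept and not how InfluxDB stores data: points are regrouped once into one list per series-and-id name, so a later lookup touches only the points for that device instead of filtering every point. All names and values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical points: (series name, device id, timestamp, value).
points = [
    ("perfmetric1", "testdeviceid", 1, 0.5),
    ("perfmetric1", "otherdevice", 1, 0.9),
    ("perfmetric1", "testdeviceid", 2, 0.7),
    ("perfmetric2", "testdeviceid", 1, 3.0),
]

def full_scan(points, series, device_id):
    """Filter every point, like a query with a where clause on id."""
    return [(t, v) for s, d, t, v in points if s == series and d == device_id]

def fan_out(points):
    """Regroup points into one series per '<series>.<id>' name."""
    by_series = defaultdict(list)
    for s, d, t, v in points:
        by_series[f"{s}.{d}"].append((t, v))
    return by_series

fanned = fan_out(points)
# Direct lookup by the fanned-out series name, no filtering needed.
direct = fanned["perfmetric1.testdeviceid"]
print(direct == full_scan(points, "perfmetric1", "testdeviceid"))  # True
```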
Build an index on the id column. It seems that the engine uses a full table scan to retrieve the data. If you split your query into 15 threads, the engine will run 15 full scans and the performance will be much worse.

Identifying the delta between two graph images

How can we identify the difference between two nodes in Neo4j from one point in time to another? In other words, I have two graph images and my task is to identify the delta between the two.
Any suggestions?
I have access to a Neo4j DB that is updated by an application to which I don't have access. My job is to identify the changes made to the DB within a time interval, for example once every 4 hours. The nodes don't have any timestamps. I referred to two images earlier because my requirement is to identify the changes that happened to the same DB over some time gap, e.g. from 1 pm (one graph instance) to 4 pm (second graph instance), with a 4-hour time interval.
I hope my question is clear.
Thanks.

How do I handle large amounts of logfile data for display in dynamic charts?

I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for Rails, that help handle this kind and amount of data? Are there database engines specifically tailored towards this, or with helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
I think you can use MySQL timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select and yields correct results. To get different granularity, you can do integer arithmetic on the timestamp columns - select abs(timestamp/factor)*factor and group by abs(timestamp/factor)*factor.
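The per-minute pre-processing and the regrouping by a coarser granularity can be sketched roughly as follows. This is an illustration under assumptions, not the original implementation: timestamps are taken to be integer epoch seconds, connections are hypothetical (user, start, end) tuples, and Python's floor division (//) stands in for the integer arithmetic, which in SQL only buckets correctly if the division truncates.

```python
from collections import defaultdict

MINUTE = 60  # seconds

def minute_rows(connections):
    """Expand each (user, start_ts, end_ts) into one row per minute touched."""
    rows = []
    for user, start, end in connections:
        first = (start // MINUTE) * MINUTE  # align to the start of the minute
        for minute_ts in range(first, end, MINUTE):
            rows.append((user, minute_ts))
    return rows

def users_per_bucket(rows, factor):
    """Count distinct users per bucket of `factor` seconds."""
    buckets = defaultdict(set)
    for user, minute_ts in rows:
        # Integer arithmetic on the timestamp: floor to the bucket start.
        buckets[(minute_ts // factor) * factor].add(user)
    return {start: len(users) for start, users in sorted(buckets.items())}

# Hypothetical connections in epoch seconds:
# alice from 00:50 to 02:10, bob from 01:05 to 01:20.
connections = [("alice", 3000, 7800), ("bob", 3900, 4800)]
rows = minute_rows(connections)
hourly = users_per_bucket(rows, 3600)
print(hourly)  # {0: 1, 3600: 2, 7200: 1}
```

Because alice has a row for every minute she was connected, she is counted in all three hours she spans, not only in the hour where the connection started or ended.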

Resources