What is the best structure in HDF5 for a sensor signal? - hdf5

I have a project in which I want to export measured sensor signals into an HDF5 file.
The data I have consists of (time, value) pairs per sensor. The sensor data is sparse, meaning each sensor sends its data at different times and intervals: one sensor may send data every second, another every minute. All data is 64-bit floating point for now.
What is the best format for this? Should I create an Nx2 signal and store it in a group called sensors, with the timestamp and value next to each other? Or should I create a group per sensor and store the values and timestamps in separate arrays?
I'm looking for best practices here. I would like to be able to plot the signals easily in Python.
In case anyone is wondering what I'm doing, this is the project in question: https://github.com/windelbouwman/lognplot

Related

Cross-correlating two data sets to find similarity

I have data sets of heart rate variability (HRV) data, and I am trying to determine whether the data collected on one day is similar or related to the data collected the next day. How can I go about this? I am considering dynamic time warping or cross-correlation, but I am confused about how to apply them. I am open to suggestions. I am hoping to write my code in either MATLAB or Python.
I have tried using dynamic time warping and cross-correlation to compare the signals.
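As a starting point, both measures mentioned in the question can be sketched in plain Python. This is a minimal illustration, not a tuned HRV analysis: the function names `dtw_distance` and `correlation_at_zero_lag` are made up for this example, the DTW cost is the absolute difference between samples, and the correlation assumes two equal-length, already aligned series.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses |x - y| as the local cost and allows unconstrained warping.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

def correlation_at_zero_lag(a, b):
    """Normalized cross-correlation (Pearson r) of two equal-length series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        raise ValueError("a constant series has no defined correlation")
    return sum(x * y for x, y in zip(da, db)) / denom

# Hypothetical day-to-day comparison with made-up numbers.
day1 = [1.0, 2.0, 3.0]
day2 = [1.0, 2.0, 2.0, 3.0]
print(dtw_distance(day1, day2))
```

A small DTW distance means the shapes match after stretching in time, while a correlation near 1 means the aligned series rise and fall together. Which of the two is appropriate depends on whether timing shifts between days should be ignored.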

Storing any number series data in a time-series database

I would like to use the time-series database InfluxDB to store data points indexed by another number instead of time, so that I can take advantage of all its features for a series of data points against this number.
For example, I have a rocket doing multiple launches, with several sensors recording temperature, air pressure, fuel level, etc. I want to graph these data points against elevation, not time.
I realise I could store elevation itself against time and then, from the time of, say, a temperature reading, work out the elevation and project the results. But that working out would lose the performance characteristics of simply querying the data points indexed by elevation. Also, third-party tools that use the time-series database, e.g. Grafana, won't be able to simply get these data points against elevation rather than time to graph them, without me putting something in between to marry the data up.
One idea I had was to use a fake time where meters = seconds and store against that. I would then need to combine it with something else to differentiate rocket launches, e.g. increment the year by 1 starting at year 0, so that I don't see every launch starting at the same elevation and can separate the "number-series" from each other. I guess I would have that problem anyway, and the proper way to do that would be through tags.
What makes you believe that this approach would be more efficient than storing the elevation jointly with your other sensor data? Fetching data is pretty cheap, so the performance gain might be very small compared to the added complexity of your keys. Not to mention that you would still need to make the time part of your elevation-timestamp; otherwise you will end up with duplicate pseudo-timestamps and therefore incomplete data, as most time-series databases do not allow multiple values at the same timestamp for a given series.
I would encourage you to also have a look at other time series databases which include elevation as part of their standard data model. Check out Warp 10 for that matter (std disclaimer, I am the co-founder of SenX, maker of Warp 10).

Point in polygon based search vs geo hash based search

I'm looking for some advice.
I'm developing a system with geographic triggers, which enable my device to perform certain actions depending on where it is. The triggers are contained within polygons that are stored in my database. I've explored multiple options to get this working; however, I'm not very familiar with geospatial systems.
An option would be to use the current location of the device and query the DB directly to give me all the polygons that contain that point, thus, all the triggers since they are linked together. A potential problem with this approach, I think, would be the possible amount of polygons stored, and the frequency of the queries, since this system serves multiple devices simultaneously and each one of them polls every few seconds.
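The containment check behind this first option can be illustrated in plain Python. This is only a sketch under stated assumptions: coordinates are treated as planar (reasonable for small polygons), the `triggers` mapping and the names `point_in_polygon` and `triggers_at` are hypothetical, and a real deployment would more likely rely on a spatial index in the database than on a linear scan.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon given as [(x, y), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses the horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def bounding_box(polygon):
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def triggers_at(x, y, triggers):
    """Return names of all triggers whose polygon contains (x, y).

    `triggers` maps a trigger name to its polygon (hypothetical structure).
    A cheap bounding-box check rejects most polygons before the exact test.
    """
    hits = []
    for name, polygon in triggers.items():
        min_x, min_y, max_x, max_y = bounding_box(polygon)
        if min_x <= x <= max_x and min_y <= y <= max_y:
            if point_in_polygon(x, y, polygon):
                hits.append(name)
    return hits

# Made-up example data.
example_triggers = {
    "square": [(0, 0), (4, 0), (4, 4), (0, 4)],
    "far_away": [(10, 10), (12, 10), (11, 12)],
}
print(triggers_at(2, 2, example_triggers))
```

The cost per query grows with the number of polygons, which is the concern raised above; precomputing the bounding boxes or indexing them spatially reduces how many exact tests are needed.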
Another option I'm exploring is to encode the polygons as an array of geohashes and then attach the trigger to each one of them.
Green is the geohashes that the trigger will be attached to, yellow are areas that need to be recalculated with a higher precision. The idea is to encode the polygon in the most efficient way down to X precision.
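To make the geohash option concrete, here is a minimal sketch of the device-side half: encoding a position and looking it up in a table of cells stored at mixed precision. It uses the standard geohash bit interleaving and base-32 alphabet; the `cell_table` structure and the function `triggers_for` are assumptions for illustration, and covering a polygon with cells (the green and yellow areas) is not shown.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)

def triggers_for(lat, lon, cell_table, max_precision=7):
    """Collect triggers attached to any cell that contains the position.

    `cell_table` maps a geohash cell (of any length up to max_precision)
    to a set of trigger ids. A short key covers a large cell, so every
    prefix of the device's geohash is checked.
    """
    code = geohash_encode(lat, lon, max_precision)
    found = set()
    for k in range(1, max_precision + 1):
        found.update(cell_table.get(code[:k], ()))
    return found

# Hypothetical table: one coarse cell and one unrelated cell.
table = {"u4pr": {"trigger_a"}, "u4x": {"trigger_c"}}
print(triggers_for(57.64911, 10.40744, table, max_precision=5))
```

Because each lookup is a handful of dictionary probes, this part is cheap enough to run on the device offline. The trade-off is the size of the cell table and the work of regenerating it when polygons change.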
Another optimization I came up with is to store only the intersection of polygons with roads, since these devices are only used in motor vehicles.
Doing this enables the device to work offline, performing its own encoding and lookup. A potential disadvantage is that the device will have to implement logic to stay up to date with triggers added or removed (potentially every 24 hours).
I'm looking for the most efficient way to implement this given some constraints, such as:
Potentially unreliable networks (the device has LTE connectivity)
Limited processing power: the devices for now are based on a Raspberry Pi Compute Module 3; however, they also perform other tasks such as image processing.
Limited storage, since they store videos and images.
Potentially large number of triggers/polygons
Potentially large number of devices.
Any thoughts are greatly appreciated.

Analyzing Sensor Data stored in cassandra and draw graphs

I'm collecting data from different sensors and write them to a Cassandra database.
The sensor ID acts as the partition key and the timestamp of the sensor data as the clustering column. Additionally, the sensor's value is stored.
Each sensor collects about 30,000 to 60,000 values a day.
The simplest thing I want to do is draw a graph of this data. This is not a problem for a few hours, but when showing a week or an even longer range, all the data has to be loaded into the backend (a Rails application) for further processing. This isn't really fast with my test dataset and won't be faster in production, I think.
So my question is how to speed this up. I thought about pre-processing the data directly in the database, but it seems that Cassandra isn't able to do such things.
For a graph with a width of 1000px, there is no point in drawing tens of thousands of points, so it would make sense to fetch only relevant, pre-aggregated data from the database.
For example, when showing the data for a whole day in a graph 1000px wide, it would be enough to take 1000 average values (each an average over about 86 seconds: 60*60*24 / 1000 = 86.4).
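The averaging described here can be sketched as follows. It is written in Python rather than Ruby purely for illustration, the function name `downsample_mean` is made up, and it assumes the raw points have already been fetched as (timestamp in seconds, value) pairs.

```python
def downsample_mean(points, start, end, buckets):
    """Average (timestamp, value) points into `buckets` equal time slices.

    Only points with start <= timestamp < end are used. Returns a list of
    (bucket_center_time, mean_value); buckets without data are skipped.
    """
    width = (end - start) / buckets
    sums = [0.0] * buckets
    counts = [0] * buckets
    for t, v in points:
        if start <= t < end:
            # Clamp to guard against rounding at the upper edge.
            i = min(int((t - start) / width), buckets - 1)
            sums[i] += v
            counts[i] += 1
    return [
        (start + (i + 0.5) * width, sums[i] / counts[i])
        for i in range(buckets)
        if counts[i]
    ]

# One day reduced to 1000 buckets gives 86.4 seconds per bucket.
seconds_per_bucket = 60 * 60 * 24 / 1000
print(seconds_per_bucket)

# Tiny made-up example: 10 seconds of data reduced to 2 buckets.
print(downsample_mean([(0, 1.0), (1, 3.0), (5, 10.0)], 0, 10, 2))
```

Note that a plain mean flattens short spikes; if outliers matter, keeping the minimum and maximum per bucket alongside the mean is a common variation.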
Is this a good approach? Or are there other techniques to speed this up? How would I handle this in the database? Create a second table and store some average values? But the resolution of the graph may change...
Other approaches would be drawing mean values by day, week, month and so on. Maybe a second table could do a good job for this!
Cassandra is all about letting you write and read your data quickly. Think of it as just a data store. It can't (really) do any processing on that data.
If you want to do operations on it, then you are going to need to put the data into something else. Storm is quite popular for building computation clusters for processing data from Cassandra, but without knowing exactly the scale you need to operate at, then that may be overkill.
Another option which might suit you is to aggregate data on the way in, or perhaps in nightly jobs. This is how OLAP is often done with other technologies. This can work if you know in advance what you need to aggregate. You could build your sets into hourly, daily, whatever, then pull a smaller amount of data into Rails for graphing (and possibly aggregate it even further to exactly meet the desired graph requirements).
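A minimal sketch of aggregating on the way in, assuming hourly buckets and an in-memory dictionary standing in for a second table. The class name `HourlyRollup` and its methods are hypothetical, and the sketch is in Python for illustration rather than Rails.

```python
HOUR = 3600  # bucket size in seconds

class HourlyRollup:
    """Keeps running count/sum/min/max per (sensor, hour) as values arrive."""

    def __init__(self):
        # Stand-in for a rollup table keyed by (sensor_id, hour_start).
        self._agg = {}

    def add(self, sensor_id, timestamp, value):
        hour_start = int(timestamp // HOUR) * HOUR
        key = (sensor_id, hour_start)
        agg = self._agg.get(key)
        if agg is None:
            self._agg[key] = {"count": 1, "sum": value, "min": value, "max": value}
        else:
            agg["count"] += 1
            agg["sum"] += value
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)

    def mean(self, sensor_id, hour_start):
        agg = self._agg[(sensor_id, hour_start)]
        return agg["sum"] / agg["count"]

    def hours(self, sensor_id):
        """Sorted hour_start values that hold data for this sensor."""
        return sorted(h for (s, h) in self._agg if s == sensor_id)

# Made-up readings: two in the first hour, one in the second.
rollup = HourlyRollup()
rollup.add("sensor-1", 10, 2.0)
rollup.add("sensor-1", 3599, 4.0)
rollup.add("sensor-1", 3600, 10.0)
print(rollup.hours("sensor-1"), rollup.mean("sensor-1", 0))
```

Storing count and sum rather than the mean itself means hourly rows can later be combined into daily or weekly means without going back to the raw data.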
For the purposes of storing, aggregating, and graphing your sensor data, you might consider RRDtool which does basically everything you describe. Its main limitation is it does not store raw data, but instead stores aggregated, interpolated values. (If you need the raw data, you can still use Cassandra for that.)
AndySavage is onto something here when it comes to precomputing aggregate values. This does require you to understand in advance the sorts of metrics you'd like to see from the sensor values generally.
You correctly identify the limitation of a graph in informing the viewer. Questions you need to ask really fall into areas such as:
When you aggregate are you interested in the mean, median, spread of the values?
What's the biggest aggregation that you're interested in?
What's the goal of the data visualisation - is it really necessary to be looking at a whole year of data?
Are outliers the important part of the dataset?
Each of these questions will lead you down a different path with visualisation and the application itself too.
Once you know what you want to do, you will need an ETL process that harnesses some form of analytical processing. This is where the Hadoop world would be worth investigating.
Regarding your decision to use Cassandra as your timeseries historian, how is that working for you? I'm looking at technical solutions for a similar requirement at the moment and it's one of the options on the table.

Getting full audio frequency spectrum with Tobybears VST Template?

I'm trying to make a simple frequency analyzer VST plugin using Tobybears VST Template for Delphi.
The problem I'm having is that I can't seem to find any documentation or information about how to get something like an array of values that represent the different frequencies from a chunk of audio data that is received from the host.
Does anybody have a clue on how to do this?
Also, my VST host keeps crashing whenever I try to use the DelphiASIOVst library, which is another library for making custom VSTs.
Thanks!
The Tobybears VST Template is obsolete (VST 2.3). Use the DAV project on SourceForge instead, as suggested by Shannon (it produces VST 2.4 plugins).
As for the analysis, it's quite easy: you basically run an FFT on the signal (buffer the input, and once 2^n samples have accumulated, perform an FFT), then compute the hypotenuse of each (real, imaginary) pair to get the approximate amplitude of a band, and then plot it on a graph. In combination with an envelope follower and some GUI programming skills, you'll get something like the Voxengo Span.
VST plugins receive audio signals as time domain signals. The audio signal data doesn't contain frequency information (which is why you can't find any documentation).
To implement a frequency analyzer you'll need to transform the received time domain signal into a frequency domain signal. Performing a fast Fourier transform (FFT) is the standard way to transform time domain signals into frequency domain signals.
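The FFT-and-magnitude step described in these answers can be sketched as follows. This is Python for illustration only, not Delphi and not any VST or DAV API: `fft` is a textbook recursive radix-2 implementation, `band_magnitudes` is a made-up helper name, and the block of samples is assumed to have already been buffered to a power-of-two length.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def band_magnitudes(block, sample_rate):
    """Return (frequency_hz, magnitude) for each bin below the Nyquist frequency.

    The magnitude is the hypotenuse of the real and imaginary parts and is
    left unscaled; divide by len(block) / 2 to approximate sine amplitude.
    """
    n = len(block)
    if n == 0 or n & (n - 1):
        raise ValueError("block length must be a power of two")
    spectrum = fft(block)
    return [
        (k * sample_rate / n, math.hypot(spectrum[k].real, spectrum[k].imag))
        for k in range(n // 2)
    ]

# Synthetic test signal: an 8 Hz sine sampled at 64 Hz for one second.
rate = 64
block = [math.sin(2 * math.pi * 8 * i / rate) for i in range(64)]
bands = band_magnitudes(block, rate)
peak_freq, peak_mag = max(bands, key=lambda fm: fm[1])
print(peak_freq)
```

In a real plugin the block would come from the host's audio callback, and a window function (for example Hann) is usually applied before the FFT to reduce spectral leakage.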
