Auto-trigger for thresholds with time series data

I wonder if ClickHouse is a possible solution for the following task.
I'm collecting time-series data (for example, pulse measurements of people).
I have different types of thresholds (for example, min and max pulse values based on age).
Once the pulse for an individual reaches the appropriate threshold, I want to trigger an external service.
In other words, what I'm looking for beyond regular time-series storage is:
the ability to set multiple thresholds
automatic detection of values that go beyond a threshold
emission of some kind of event to a 3rd party
Any other tool suggestions are appreciated. Thanks in advance.

ClickHouse has partial support for this task.
You can write your own code (Python, Go, or anything else) as an external process
that uses LIVE VIEW and WATCH for trigger event detection; see the article that describes these features:
https://www.altinity.com/blog/2019/11/13/making-data-come-to-life-with-clickhouse-live-view-tables
That code should then emit an event to the 3rd-party system.
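For illustration, here is a rough Python sketch of such an external process, using the ClickHouse HTTP interface and the requests package. The table name, the hard-coded thresholds and the alert endpoint are hypothetical, and the streaming behaviour of WATCH over HTTP can differ between ClickHouse versions, so treat this as a sketch rather than a finished implementation:

    import requests

    CH = "http://localhost:8123/"
    SETTINGS = {"allow_experimental_live_view": 1}  # LIVE VIEW is an experimental feature

    # Live view over a hypothetical table pulse(user_id, age, value, ts);
    # thresholds are hard-coded here but could come from a joined lookup table.
    requests.post(CH, params=SETTINGS, data="""
        CREATE LIVE VIEW IF NOT EXISTS pulse_breaches AS
        SELECT user_id, value FROM pulse WHERE value > 180 OR value < 40
    """).raise_for_status()

    # WATCH blocks and returns a new result block whenever the view changes.
    with requests.post(CH, params=SETTINGS, data="WATCH pulse_breaches",
                       stream=True) as resp:
        for line in resp.iter_lines():
            if line:
                # Forward the breach to the 3rd-party system (hypothetical endpoint).
                requests.post("https://example.com/alerts", data=line)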

Related

Dynamic clustering for panel data

I have panel data consisting of time series for 120 months, 45 institutions and approximately 8 variables for each one. I want to do a cluster analysis in order to detect stressed institutions based on dynamic clustering analysis; for instance, to check whether a stressed institution moves from one cluster to another, or whether its behavior changes so much that it is no longer part of its own cluster.
The idea would be to use the information up to time t to cluster the institutions, so that each institution's cluster assignment can evolve as new information arrives, using all the information available up to that point from all the banks, i.e. time-varying clusters.
My first idea was to use statistical control techniques and anomaly detection for time series, such as the ones in the package anomaly, but this procedure does not use the information from the other banks, just the bank's own. It might be that the whole system is stressed, so detecting an anomaly in one bank might be due to the system and not to the particular bank.
I also tried clustering in each period through hierarchical clustering, and it did a decent job of classifying the institutions based on my knowledge of them. However, this procedure only uses the data at each point in time, not all the data available up to that point.
I had the idea of using clustering methods for panel data at each point in time, using the data up to that point, and cycling through each month to get dynamic clusters over the whole dataset. However, I don't know whether this approach makes sense, or whether there are better methods for this kind of analysis.
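To make that concrete, this is roughly what I have in mind (the column names are hypothetical and hierarchical clustering is done via scipy; just a sketch of the expanding-window idea, not a finished method):

    import pandas as pd
    from scipy.cluster.hierarchy import linkage, fcluster

    # df is hypothetical: one row per (institution, month) with ~8 numeric variables.
    def expanding_window_clusters(df, n_clusters=4):
        """For each month t, cluster institutions on all data up to t (summarised as means)."""
        labels_by_month = {}
        for month in sorted(df["month"].unique()):
            window = df[df["month"] <= month].drop(columns=["month"])
            features = window.groupby("institution").mean()      # one row per institution
            z = linkage(features.values, method="ward")
            labels = fcluster(z, t=n_clusters, criterion="maxclust")
            labels_by_month[month] = pd.Series(labels, index=features.index)
        return labels_by_month

    # Note: cluster labels are arbitrary in each month, so movements between clusters
    # would have to be tracked by matching cluster contents, not label numbers.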
Thank you very much!

Difference between Cloud Data Fusion and Dataflow on GCP

What is the difference between the GCP pipeline services
Cloud Dataflow and Cloud Data Fusion,
and which should you use when?
I did a high-level pricing comparison, taking 10 instances with the Basic edition in Data Fusion
and a 10-instance cluster (n1-standard-8) in Dataflow.
The pricing is more than double for Data Fusion.
What are the pros and cons of each over the other?
Cloud Dataflow is purpose-built for highly parallelized graph processing, and it can be used for both batch and stream processing. It is also fully managed, abstracting away the need to manage and understand underlying resource-scaling concepts, e.g. how to optimize shuffle performance or deal with key-imbalance issues. The user/developer is responsible for building the graph via code, creating N transforms and/or operations to achieve the desired goal. For example: read files from storage, process each line in a file, extract data from the line, cast the data to numeric, sum the data in groups of X, write the output to a data lake.
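For illustration, that example graph might be expressed roughly like this with the Apache Beam Python SDK, which is what Dataflow executes (the paths and field positions are placeholders, not a working job):

    import apache_beam as beam

    # A minimal batch pipeline: read lines, parse, cast, sum per group, write out.
    # Runs locally by default; pass DataflowRunner pipeline options to run on GCP.
    with beam.Pipeline() as p:
        (p
         | "Read"      >> beam.io.ReadFromText("gs://my-bucket/input/*.csv")
         | "Parse"     >> beam.Map(lambda line: line.split(","))
         | "ToKV"      >> beam.Map(lambda fields: (fields[0], float(fields[1])))
         | "SumPerKey" >> beam.CombinePerKey(sum)
         | "Format"    >> beam.Map(lambda kv: f"{kv[0]},{kv[1]}")
         | "Write"     >> beam.io.WriteToText("gs://my-bucket/output/sums"))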
Cloud Data Fusion is focused on enabling data integration scenarios: reading from sources (via an extensible set of connectors) and writing to targets, e.g. BigQuery, storage, etc. It does have parallelization concepts, but they are not fully managed like Cloud Dataflow's. CDF rides on top of Cloud Dataproc, which is a managed service for Hadoop-based processing. Its sweet spot is visual graph development leveraging an extensible set of connectors and operators.
Your question is framed around cost. My advice is to take a step back and define what your processing/graph goals look like, then look at each product's value. If you want full control over processing semantics, with a greater focus on analytics, and you want to run in batch and/or must have streaming, focus on Dataflow. If you want point-and-click data movement, with less need for data analytics, and do not need streaming, then look at CDF.

Point in polygon based search vs geo hash based search

I'm looking for some advice.
I'm developing a system with geographic triggers; these enable my device to perform certain actions depending on where it is. The triggers are contained within polygons that are stored in my database. I've explored multiple options to get this working; however, I'm not very familiar with geospatial systems.
One option would be to use the current location of the device and query the DB directly to give me all the polygons that contain that point, and thus all the triggers, since they are linked together. A potential problem with this approach, I think, is the number of polygons stored and the frequency of the queries, since this system serves multiple devices simultaneously and each one of them polls every few seconds.
Another option I'm exploring is to encode the polygons into an array of geohashes and then attach the trigger to each of them.
Green shows the geohashes the trigger would be attached to; yellow marks areas that need to be recalculated at a higher precision. The idea is to encode the polygon as efficiently as possible down to X precision.
Another optimization I came up with is to only store the intersection of the polygons with roads, since these devices are only used in motor vehicles.
Doing this enables the device to work offline, performing its own encoding and lookup, with the potential disadvantage that the device has to implement logic to stay up to date with triggers being added or removed (potentially every 24 hours).
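Roughly what I have in mind for the device-side lookup, sketched in Python (pygeohash is just one possible geohash library, and the cached index and trigger names are hypothetical):

    import pygeohash as pgh

    # The backend pre-computes, for each trigger polygon, the set of geohash cells
    # covering it (precision 7 is roughly 150 m per cell) and the device caches that
    # mapping, refreshing it e.g. every 24 hours.
    def triggers_at(triggers_by_cell, lat, lon, precision=7):
        """Return the triggers attached to the geohash cell containing (lat, lon)."""
        cell = pgh.encode(lat, lon, precision=precision)
        return triggers_by_cell.get(cell, [])

    # Hypothetical cached index and a lookup for the device's current position.
    cached = {pgh.encode(37.757, -122.447, precision=7): ["speed-limit-trigger"]}
    print(triggers_at(cached, 37.757, -122.447))   # -> ['speed-limit-trigger']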
I'm looking for the most efficient way to implement this given some constraints, such as:
Potentially unreliable networks (the device has LTE connectivity)
Limited processing power; the devices for now are based on a Raspberry Pi 3 Compute Module, but they also perform other tasks such as image processing
Limited storage, since they also store videos and images
A potentially large number of triggers/polygons
A potentially large number of devices
Any thoughts are greatly appreciated.

RapidMiner - Time Series Segmentation

I am fairly new to RapidMiner. I have a historical financial data set (with attributes Date, Open, Close, High, Low, Volume Traded) from Yahoo Finance, and I am trying to find a way to segment it as in the image below:
I am also planning on performing this segmentation on more than one such data set and then comparing the segmentations (i.e. Segment 1 for Data Set A against Segment 1 for Data Set B), so I would preferably require an equal number of segments in each.
I am aware that certain extensions are available within the RapidMiner Marketplace, however I do not believe that any of them have what I am looking for. Your assistance is much appreciated.
Edit: I am currently trying to replicate Voting-Based Outlier Mining for Multiple Time Series (V-BOMM) across multiple data sets. So far, I am able to perform the operation by recording and comparing common dates against each other.
However, I would like to enhance the process to compare segments rather than simply dates. I have gone through the existing functionality of RapidMiner, and so far I don't believe any of it fits my requirements.
I have also considered Dynamic Time Warping, but I can't seem to find an available functionality in RapidMiner.
Ultimate question: can someone point me to functionality that can help replicate the segmentation in the attached image, such that the segments can be compared between historical data sets in RapidMiner? Also, can someone guide me on how to implement Dynamic Time Warping in RapidMiner?
I would use the new version of the Time Series extension and its windowing features to segment the time series into whatever parts you want. There is a nice explanation of the new tools in the blog section of the community.
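As for Dynamic Time Warping: if RapidMiner itself doesn't offer it, the classic DTW distance is simple enough to prototype outside RapidMiner. A minimal Python sketch (not a RapidMiner operator; the example segments are hypothetical):

    import numpy as np

    def dtw_distance(a, b):
        """Classic O(len(a)*len(b)) dynamic time warping distance between two 1-D series."""
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m]

    # Hypothetical example: compare segment 1 of data set A with segment 1 of data set B.
    print(dtw_distance([1, 2, 3, 4], [1, 1, 2, 3, 4]))  # 0.0: same shape despite different lengths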

Getting full audio frequency spectrum with Tobybears VST Template?

I'm trying to make a simple frequency-analyzer VST plugin using Tobybear's VST Template for Delphi.
The problem I'm having is that I can't seem to find any documentation or information about how to get something like an array of values representing the different frequencies from a chunk of audio data received from the host.
Does anybody have a clue how to do this?
Also, my VST host keeps crashing whenever I try to use the DelphiASIOVst library, which is another library for making custom VSTs.
Thanks!
The Tobybear VST Template is obsolete (VST 2.3). Rather, use the DAV project on SourceForge, as suggested by Shannon (it produces VST 2.4 plugins).
As for the analysis, it's quite easy: you basically run an FFT on the signal (you buffer the input and, once 2^n samples have accumulated, perform an FFT), then compute the hypotenuse of each real/imaginary pair to get the approximate amplitude of a band, and plot that on a graph. In combination with an envelope follower and some GUI programming skills you'll get something like Voxengo SPAN.
VST plugins receive audio as time-domain signals. The audio signal data doesn't contain frequency information (which is why you can't find any documentation).
To implement a frequency analyzer you'll need to transform the received time-domain signal into a frequency-domain signal. A Fast Fourier Transform (FFT) is the standard way to do this.
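Whatever language the plugin ends up in, the analyzer core is just an FFT over a buffered block of samples followed by a magnitude per bin. A minimal NumPy sketch of the idea (the Delphi/VST code would do the same on its input buffer; the test signal is made up):

    import numpy as np

    def magnitude_spectrum(samples, sample_rate):
        """FFT a block of time-domain samples and return (frequency, amplitude) per bin."""
        samples = np.asarray(samples, dtype=float)
        n = len(samples)                      # typically a power of two, e.g. 2048
        window = np.hanning(n)                # reduce spectral leakage
        spectrum = np.fft.rfft(samples * window)
        freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
        amps = np.abs(spectrum) / n           # "hypotenuse" of each real/imaginary pair
        return freqs, amps

    # Hypothetical block: a 440 Hz sine sampled at 44.1 kHz.
    t = np.arange(2048) / 44100.0
    freqs, amps = magnitude_spectrum(np.sin(2 * np.pi * 440 * t), 44100)
    print(freqs[np.argmax(amps)])  # peaks near 440 Hz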
