Accumulate delayed data and trigger it after 5 minutes - google-cloud-dataflow

I have a Google Dataflow job that reads data from PubSub, aggregates the data and, in the end, sends it to an InfluxDB. I want to aggregate the data in 1-minute windows but write only one entry to the DB for each minute. The problem is that I also want to allow late data, so I need to accumulate the data for 5 minutes and then send a single entry to the DB.
Is it possible? I tried to do it with the code below, but I don't get what I want:
input.apply(Window
    .<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1)))
    .triggering(
        AfterProcessingTime
            .pastFirstElementInPane()
            .plusDelayOf(Duration.standardMinutes(5)))
    .withAllowedLateness(Duration.standardMinutes(5))
    .discardingFiredPanes());

I answered a similar question before. You can use .triggering(Never.ever()) to suppress the ON TIME panes. Then, as you are already doing, set the allowed lateness to 5 minutes for late records.
It's also important to change the Window.ClosingBehavior to FIRE_ALWAYS. This way we account for the case where there is no late data but we haven't emitted the on-time records. Once the window is closed it will always emit a final pane with PaneInfo.isLast set to true.
So, for your case, the code would be something like:
input.apply(Window
    .<KV<String, String>>into(FixedWindows.of(Duration.standardMinutes(1)))
    .triggering(Never.ever())
    .withAllowedLateness(Duration.standardMinutes(5), Window.ClosingBehavior.FIRE_ALWAYS)
    .discardingFiredPanes());
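To make the timing concrete, here is a minimal plain-Java sketch (no Beam dependency) of the arithmetic behind the configuration above: which 1-minute window an element falls into, and roughly where the watermark has to be before the window expires and its single final pane is emitted. The class and method names are hypothetical, and the sketch ignores details such as the exact millisecond at which Beam considers a window expired.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: models fixed-window timing only, not Beam itself.
class WindowTiming {
    static final Duration WINDOW_SIZE = Duration.ofMinutes(1);
    static final Duration ALLOWED_LATENESS = Duration.ofMinutes(5);

    // Start of the 1-minute fixed window that contains the given event time.
    static Instant windowStart(Instant eventTime) {
        long sizeMillis = WINDOW_SIZE.toMillis();
        long millis = eventTime.toEpochMilli();
        return Instant.ofEpochMilli(millis - Math.floorMod(millis, sizeMillis));
    }

    // Approximate watermark position at which the window expires
    // (end of window + allowed lateness) and the final pane fires.
    static Instant finalPaneTime(Instant eventTime) {
        return windowStart(eventTime).plus(WINDOW_SIZE).plus(ALLOWED_LATENESS);
    }

    public static void main(String[] args) {
        Instant event = Instant.parse("2017-06-01T12:00:42Z");
        System.out.println(windowStart(event));   // prints 2017-06-01T12:00:00Z
        System.out.println(finalPaneTime(event)); // prints 2017-06-01T12:06:00Z
    }
}
```

So an element stamped 12:00:42 belongs to the 12:00 window, and with Never.ever() plus 5 minutes of allowed lateness its data would be expected in the DB only once the watermark passes about 12:06.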

Related

What is the time unit for "datadelay" attribute on GoogleFinance?

I'm using the GoogleFinance() function in a Google spreadsheet to keep track of my stocks. With the "datadelay" attribute I can check how long ago the data was last updated. But it only returns a raw number, like "54000" for one ticker and "15" for another. What time unit is that supposed to be? Minutes? Seconds? Milliseconds?
When I checked the Google Finance documentation, I found a page explaining that there might be a delay of up to 20 minutes. It also mentions that different exchanges are used to retrieve market data, and that these exchanges might each have a different data delay. That could explain the differences in the "datadelay" column.
As for the unit of this column, my assumption is that it is shorter than seconds, since 54000 seconds = 900 min, which is far higher than the maximum delay defined on the help page. But I am not sure what the value of this column would be when you query on non-trading days.
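The unit arithmetic in this reasoning can be checked with a small sketch. It only converts the raw value under two candidate units; it does not establish which unit GoogleFinance actually uses, and the class and method names are made up for illustration.

```java
import java.time.Duration;

// Hypothetical helper: interprets a raw "datadelay" number under candidate units.
class DataDelayUnits {
    // If the raw value were seconds, how many whole minutes is that?
    static long minutesIfSeconds(long raw) {
        return Duration.ofSeconds(raw).toMinutes();
    }

    // If the raw value were milliseconds, how many whole seconds is that?
    static long secondsIfMillis(long raw) {
        return Duration.ofMillis(raw).getSeconds();
    }

    public static void main(String[] args) {
        System.out.println(minutesIfSeconds(54000)); // prints 900
        System.out.println(secondsIfMillis(54000));  // prints 54
    }
}
```

Read as seconds, 54000 is 900 minutes, well beyond the documented 20-minute maximum; read as milliseconds, it is 54 seconds.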
The page shows delays for each exchange.
The GoogleFinance() function updates every minute (if it is set to do so), but keep in mind that results may be delayed by up to 20 minutes, so the answer is between 1 and 20 minutes.

ActiveRecord query for all of last day's data and every 100th record prior

I have a process that generates a new record every 10 minutes. That worked well for some time, but now Datum.all returns 30k+ records, which is unnecessary since the only purpose is to display them on a chart.
So as a simple solution, I'd like to provide all available data generated in the past 24 hours, but low-res data (every 100th record) prior to the last 24 hours (right back to the beginning of the dataset).
I suspect the solution is some combination of this answer, which selects every nth record (but was provided in 2010), and this answer, which combines two ActiveRecord objects.
But I cannot work out how to get a working implementation that loads all the required data into one instance variable.
You can use an OR query:
Datum.where("created_at>?", 1.day.ago).or(Datum.where("id%100=0"))
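The predicate behind that query (newer than one day, or id divisible by 100) can be illustrated in memory. The sketch below is in Java rather than Ruby and uses a hypothetical Datum stand-in class; it is not ActiveRecord code, only a model of which rows the OR condition keeps.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the OR condition; not ActiveRecord.
class Downsample {
    // Stand-in for a database row with an id and a created_at timestamp.
    static final class Datum {
        final long id;
        final Instant createdAt;

        Datum(long id, Instant createdAt) {
            this.id = id;
            this.createdAt = createdAt;
        }
    }

    // Keeps every record from the last 24 hours plus every record whose id is a multiple of 100.
    static List<Datum> forChart(List<Datum> all, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(1));
        return all.stream()
                .filter(d -> d.createdAt.isAfter(cutoff) || d.id % 100 == 0)
                .collect(Collectors.toList());
    }
}
```

One design note: selecting on id % 100 yields exactly every 100th record only if ids are contiguous; with gaps from deleted rows the older data is thinned unevenly, which is usually acceptable for a chart.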

How to add Data to Firebase?

My goal is to add +1 every day to a global variable in Firebase to track how many days have passed. I'm building an app that gives new facts every day, and at the 19:00 UTC mark, I want the case statement number (the global day variable) to increment by 1.
Some have suggested that I compare two dates and get the number of days that have passed that way. If I were to do that, I could hard-code the initial time at which I first want the app to start, at 19:00 on some day. Then, when the function reached1900UTC() is called every day thereafter, I would compare it to a Firebase timestamp of the current time, which should be 19:00. In theory, it should show that one or more days have passed.
This is the best solution so far, thanks to @DavidSeek and @Jay, but I would still like to figure it out with concurrent writes if anyone has a solution on that front. Until then, I'm marking David's answer as the correct one.
How would I make sure it can't increase by more than 1 if multiple people call this? My fear is that when, say, 100 people call this function, it increases by 1 for every person who has called it.
My app works on a global time, and this function is called every day at 19:00 UTC. So when that function is called, I want the day count to increase by one.
You should use transactions to handle concurrent writes:
https://firebase.google.com/docs/database/ios/read-and-write#save_data_as_transactions
You may know this but Firebase doesn't have a way to auto-increment a counter as there's no server side logic, so having a counter increment at 19:00 UTC isn't going to be possible without interaction from a client that happens to be logged on at that time.
That being said, it's fairly straightforward to have the first user that logs in increment that counter - then any other clients logging in after that would not increment it and would have access to that day's new content.
Take a look at Zapier.com - that's a service that can fire time based triggers for your app which may do the trick.
As of this writing, Zapier and Firebase don't play nice together, however, there are a number of other trigger options that Zapier can do with your app while continuing to use Firebase for storage.
One other thought...
Instead of dealing with counters and counting days, why not just store each day's content within a node for that day, so that when each user logs on, the app gets that day's content:
2016-10-10
    fact: "The Earth is an Oblate Spheroid"
2016-10-11
    fact: "Milli Vanilli is neither a Milli or a Vanilli. Discuss."
2016-10-12
    fact: "George Washington did not have a middle name"
This would eliminate a number of issues such as counters, updates, concurrent writing to Firebase, triggers etc.
It's also dynamic and expandable and a user could easily see that day's facts or the fact for any prior day(s)
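With date-keyed nodes, the client only has to compute which key is current. A minimal Java sketch of that computation follows, assuming (this is not stated in the answer) that a day's content goes live at 19:00 UTC and that the node is named after the UTC date on which it went live. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper: maps "now" to the key of the node holding the current fact.
class DailyFactKey {
    // Assumption: content rolls over at 19:00 UTC, so shift the clock back
    // 19 hours before taking the UTC calendar date.
    static String keyFor(Instant now) {
        LocalDate day = now.minus(Duration.ofHours(19)).atZone(ZoneOffset.UTC).toLocalDate();
        return day.toString(); // ISO-8601 date, e.g. "2016-10-10"
    }

    public static void main(String[] args) {
        System.out.println(keyFor(Instant.parse("2016-10-10T18:59:59Z"))); // prints 2016-10-09
        System.out.println(keyFor(Instant.parse("2016-10-10T19:00:00Z"))); // prints 2016-10-10
    }
}
```

Because every client derives the same key from the current time, nothing has to be written to Firebase at rollover, so the concurrent-write problem does not arise.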
I'll try to split your question into different sections.
1) If you want to use a global variable to count the days from, let's say, today, then I would hard-code a timestamp into the app as the starting NSDate.
Then, whenever the app needs to know how many days have passed, I would call a function that counts the days from that timestamp to NSDate().
2) If you have a function in your app that adds +1 to a Firebase value, then your fear is correct: it would add +1 for every person that uses the app.
3) If you want every user to have their own count of days since they first used the app, then I would handle user registration. That gives me a "UserID", and I would set up a Firebase tree like this:
UserID
    FirstOpen: Date
That way you can handle each user's first open.
Then you are able to set a timestamp AND add +1 for every user independently, because you store the +1 for each user under their own UserID child.
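Point 1 above (deriving the day count from a hard-coded start instead of incrementing a shared counter) could look roughly like this. The sketch is in Java rather than Swift, and the start instant is a made-up example value.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: derives the day number from elapsed time, with no shared writes.
class DayCounter {
    // Assumed launch moment, hard-coded in the app (example value only).
    static final Instant START = Instant.parse("2016-10-10T19:00:00Z");

    // Whole 24-hour periods elapsed since START; the value steps up at 19:00 UTC each day.
    static long daysSinceStart(Instant now) {
        return Duration.between(START, now).toDays();
    }

    public static void main(String[] args) {
        System.out.println(daysSinceStart(Instant.parse("2016-10-11T18:59:59Z"))); // prints 0
        System.out.println(daysSinceStart(Instant.parse("2016-10-11T19:00:00Z"))); // prints 1
    }
}
```

Every client computes the same number from the same inputs, so 100 simultaneous callers still agree on the day without any of them writing to the database. In practice the "now" value would ideally come from a server timestamp rather than the device clock.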

zabbix trigger based on one week old data

I am very new to Zabbix and have been trying my hand at triggers. From what I can make out, a trigger can be set on some constant threshold. What I need is for it to compare the current value with the data that is exactly one week old for that same time of day, and to raise an alert if the change is above some particular % threshold.
I tried keeping the current data and the one-week-old data in an external database and then querying that data with the Zabbix ODBC drivers, but I got stuck because I was not able to compare the two items.
If my description of the issue is confusing, let me know and I will explain the problem more clearly.
You can use the last() function for this.
For example, if we sample our data every 5 minutes and want to compare the last value with the value from 10 minutes ago, we can use
(item1.last(#1)/item2.last(#3)) > 1.2 - this will trigger an alert if the latest value is more than 20% greater than the value from 10 minutes ago.
From the documentation it is not very clear to me whether you can use seconds or whether they will be ignored (for example item.last(60) to get the value from 1 minute ago), but you can read more about the last function here:
https://www.zabbix.com/documentation/2.4/manual/appendix/triggers/functions
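The comparison that the trigger expression performs is a plain ratio check. As an illustration only (this is Java, not Zabbix trigger syntax, and the names are hypothetical), the same test with a configurable percentage might be sketched as:

```java
// Hypothetical model of the ratio test used in the trigger expression above.
class RatioThreshold {
    // True when current exceeds reference by more than thresholdPercent.
    // Assumption: a zero reference never raises an alert, to avoid dividing by zero.
    static boolean exceeds(double current, double reference, double thresholdPercent) {
        if (reference == 0.0) {
            return false;
        }
        return current / reference > 1.0 + thresholdPercent / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(exceeds(130.0, 100.0, 20.0)); // prints true
        System.out.println(exceeds(110.0, 100.0, 20.0)); // prints false
    }
}
```

Like the expression in the answer, this only detects increases; detecting a drop of the same size would need a second comparison against the lower bound.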

Storing large amount of boolean values in Rails

I need to store quite a large number of boolean values in a database used by a Rails application: 60 boolean values in a single record per day. What is the best way to do this in Rails?
Queries that I will need to program or execute:
* CRUD
* summing up how many true values are for each day
* possibly (but not necessarily) other reports, like how often true is recorded in each field
UPDATE: This is to store events that may or may not occur in 5 minute intervals between 9am and 1pm. If an event occurs, I need to set it to true; if not, false. Measurements are done manually, and users will report this information using checkboxes on the website. There might be small updates, but most of the time it's just a one-time entry followed by queries as listed above.
UPDATE 2: The 60 values per day are per user, and there will be between 1000 and 2000 users. If there isn't a library that helps with this, I will go for the simplest approach and deal with it later if I run into performance issues. Every day the user reports events by checking the desired checkboxes on the website, so there is normally a single data entry moment per day (or a few if it is not done on a daily basis).
This is dependent on a lot of different things. Do you need callbacks to run? Do you need AR objects instantiated? What is the frequency of these updates? Is it done frequently but not many at a time or rarely but a bunch at once? Could you represent these booleans as a mask instead? We definitely need more context.
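The mask idea mentioned here can be sketched as follows: 60 booleans fit into one 64-bit integer, so a day's record could be a single integer column, and counting the true values is a bit count. This is a language-neutral illustration written in Java, not Rails code, and all names are hypothetical.

```java
// Hypothetical sketch: packs up to 60 daily boolean slots into one 64-bit value.
class DailyEvents {
    static final int SLOTS = 60;

    // Returns a copy of mask with the given slot set to value.
    static long set(long mask, int slot, boolean value) {
        check(slot);
        long bit = 1L << slot;
        return value ? (mask | bit) : (mask & ~bit);
    }

    // Reads one slot.
    static boolean get(long mask, int slot) {
        check(slot);
        return (mask & (1L << slot)) != 0;
    }

    // Number of slots set to true for the day.
    static int countTrue(long mask) {
        return Long.bitCount(mask);
    }

    private static void check(int slot) {
        if (slot < 0 || slot >= SLOTS) {
            throw new IllegalArgumentException("slot out of range: " + slot);
        }
    }
}
```

The trade-off is that per-field reports (how often a given slot is true across days) become bitwise queries instead of simple column counts, so the mask suits the "sum per day" query better than the per-field one.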
Why do these need to be in a single record? Can't you use a 'days' table to tie them all together, then use a day_id column in your 'events' table?
Specify in the Day model that it 'has_many :events' and specify in the Event model file that it 'belongs_to :day'. Then you can find all the events for a day with just the id for the day.
For the third day record, you'd do this:
this_day = Day.find 3
Then you can use 'this_day.events' to get all the events for that day.
You'll need to decide what you wish to use to identify each day, so that you can query for a day's events using something that you understand. The id column I used above to find it probably won't work.
You could use the timestamp of the first moment of each day to do that, for example. Or you could rely upon the 'created_at' column of the table being between the start and end of a day.
And you'll want to be sure to think about what time zone you are using and how this will be stored in the database.
And if your data will be stored close to midnight, daylight saving time could also be an issue. I find it best to use GMT to avoid that issue.
Good luck.
