Composer Airflow - Cross DAG Task Dependency - google-cloud-composer

I have the two DAGs and tasks below:
DAGA - Task1, Task2, Task3
DAGB - Task4, Task5, Task6
Task4 and Task5 of DAGB depend on Task1 of DAGA, and Task6 depends on Task3. I want these dependencies to apply only on Mondays; on all other days the tasks should run without them.

For this kind of cross-DAG dependency, where certain tasks of a DAG should run only on a particular day of the week, use TriggerDagRunOperator together with BranchPythonOperator. The callable passed to BranchPythonOperator must return the task ID, or a list of task IDs, that the DAG should proceed with, based on whatever logic you need (here, the day of the week).
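As a rough illustration of the branching logic only, here is a minimal sketch of a callable that could be handed to BranchPythonOperator. The task IDs are hypothetical placeholders, and the function takes a plain date so it can be tried without Airflow installed; in a real DAG you would pass it the run's date from the task context.

```python
from datetime import date

# Hypothetical task IDs; replace them with the IDs used in your own DAG.
MONDAY_PATH = ["trigger_dag_b_after_task1"]
OTHER_DAYS_PATH = ["skip_cross_dag_dependency"]


def choose_branch(run_date: date) -> list:
    """Return the task IDs to follow: the cross-DAG path on Mondays only."""
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    if run_date.weekday() == 0:
        return MONDAY_PATH
    return OTHER_DAYS_PATH


print(choose_branch(date(2024, 1, 1)))  # 2024-01-01 was a Monday
print(choose_branch(date(2024, 1, 2)))  # a Tuesday
```

The task behind the Monday branch would be the one that enforces the dependency (for example the TriggerDagRunOperator mentioned above), while the other branch simply lets the remaining tasks run.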

InfluxDB Continuous Query running on entire time series data

If my interpretation is correct, according to the InfluxDB Downsampling documentation, when we downsample data using a Continuous Query that runs every 30 minutes, it runs only on the previous 30 minutes of data.
Relevant part of the document:
Use the CREATE CONTINUOUS QUERY statement to generate a CQ:
CREATE CONTINUOUS QUERY "cq_30m" ON "food_data" BEGIN
SELECT mean("website") AS "mean_website",mean("phone") AS "mean_phone"
INTO "a_year"."downsampled_orders"
FROM "orders"
GROUP BY time(30m)
END
That query creates a CQ called cq_30m in the database food_data.
cq_30m tells InfluxDB to calculate the 30-minute average of the two
fields website and phone in the measurement orders and in the DEFAULT
RP two_hours. It also tells InfluxDB to write those results to the
measurement downsampled_orders in the retention policy a_year with the
field keys mean_website and mean_phone. InfluxDB will run this query
every 30 minutes for the previous 30 minutes.
When I create a Continuous Query, it actually runs on the entire dataset, not on the previous 30 minutes. My question is: does this happen only the first time, after which it runs on the previous 30 minutes of data instead of the entire dataset?
I understand that the query itself uses GROUP BY time(30m), which means it will return all data grouped into 30-minute buckets, but does this also hold true for the Continuous Query? If so, should I include a filter in the Continuous Query so that it only processes the last 30 minutes of data?
What you have described is expected functionality.
Schedule and coverage
Continuous queries operate on real-time data. They use the local server’s timestamp, the GROUP BY time() interval, and InfluxDB database’s preset time boundaries to determine when to execute and what time range to cover in the query.
CQs execute at the same interval as the cq_query’s GROUP BY time() interval, and they run at the start of the InfluxDB database’s preset time boundaries. If the GROUP BY time() interval is one hour, the CQ executes at the start of every hour.
When the CQ executes, it runs a single query for the time range between now() and now() minus the GROUP BY time() interval. If the GROUP BY time() interval is one hour and the current time is 17:00, the query’s time range is between 16:00 and 16:59.999999999.
So it should only process the last 30 minutes.
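To make the time-range rule in the quoted documentation concrete, here is a small sketch that computes the window a CQ would cover when it fires. It is only an illustration of the arithmetic, not InfluxDB code; the function name is made up, and it assumes the preset boundaries are multiples of the GROUP BY time() interval counted from the Unix epoch.

```python
from datetime import datetime, timedelta, timezone


def cq_window(now: datetime, interval: timedelta):
    """Return (start, end) of the range a CQ covers when it fires at `now`.

    The range is half-open: start <= t < end.
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    # Align `now` down to the nearest boundary (a multiple of the interval).
    end = now - ((now - epoch) % interval)
    return end - interval, end


start, end = cq_window(
    datetime(2024, 1, 1, 17, 0, tzinfo=timezone.utc), timedelta(hours=1)
)
print(start.time(), end.time())  # 16:00:00 17:00:00
```

This matches the documentation's example: at 17:00 with a one-hour interval, the query covers 16:00 up to, but not including, 17:00.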
It's a good point about the first run.
I did manage to find a snippet from an old (v0.8) document:
Backfilling Data
In the event that the source time series already has data in it when you create a new downsampled continuous query, InfluxDB will go back in time and calculate the values for all intervals up to the present. The continuous query will then continue running in the background for all current and future intervals.
https://influxdbcom.readthedocs.io/en/latest/content/docs/v0.8/api/continuous_queries/#backfilling-data
That would explain the behaviour you have found.

Maintaining a Relative Order of Flex Tasks in a SQL DB

I have a task scheduling app that allows people to create 2 types of tasks...
• Strict: tasks with a set start time and duration
• Flex: tasks that have a duration but no specific start time
It's also important to understand how flex tasks operate. Flex tasks continuously reschedule themselves throughout your day into the nearest open time slot. For example, if the only task on your schedule today is a flex task like "Go workout - duration: 60 mins" and you open the app at 4pm, it will have "Go workout" scheduled from 4-5pm for you. If you don't click the checkbox indicating you completed the task and open the app again at 5pm, "Go workout" will be rescheduled to 5-6pm, so that the things you mean to get done are constantly in your face and trying to fit themselves into the gaps of your life.
When a user views their schedule here are the steps I go through:
• Grab an array of all strict tasks
• Grab an array of all flex tasks
• Loop through each strict task and work out how big the time gap is between the end time of the current task and the start time of the next task.
• If a gap exists, loop through the flex tasks and check whether any of them fits in that gap. If a flex task is small enough to fit, add it to the strictTasksArray between the current task and the next task.
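The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: times are minutes since midnight, the function and variable names are made up, and day_start stands in for "the moment the user opens the app".

```python
def fill_gaps(strict, flex, day_start, day_end):
    """Place flex tasks into the free time around strict tasks.

    strict: list of (name, start, end), sorted by start time.
    flex:   list of (name, duration) in the order they should be tried.
    Returns (schedule, unplaced_flex_tasks).
    """
    schedule = []
    remaining = list(flex)
    cursor = day_start  # earliest minute that is still free
    # A zero-length sentinel at day_end lets the loop also fill the gap
    # after the last strict task.
    for name, start, end in strict + [(None, day_end, day_end)]:
        i = 0
        while i < len(remaining):
            flex_name, duration = remaining[i]
            if cursor + duration <= start:  # fits before this strict task
                schedule.append((flex_name, cursor, cursor + duration))
                cursor += duration
                remaining.pop(i)
            else:
                i += 1
        if name is not None:
            schedule.append((name, start, end))
        cursor = max(cursor, end)
    return schedule, remaining


strict = [("Meeting", 600, 660), ("Lunch", 720, 780)]
flex = [("Workout", 60), ("Email", 30)]
schedule, unplaced = fill_gaps(strict, flex, day_start=540, day_end=1020)
for entry in schedule:
    print(entry)
```

Because the flex list is tried in order, whatever order the list arrives in decides which task gets the first gap it fits into.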
This approach works great as long as flex tasks need no ordering, but we have now added the ability for users to drag and drop flex tasks into a specific relative order. For example, if I have tasks A, B, C, D and I drag tasks D and B to the front so that the order is now D, B, A, C, the app needs to save that ordering so that when you close and reopen it, it still tries to fit task D in first, followed by B, A and C. I'm having trouble thinking of an efficient way to do this, considering the ordering is relative and not strict. Any ideas how to save relative ordering in a SQLite DB without having to update every task's DB record each time a user drags and drops a task and changes the relative ordering?
If you have ever coded in Basic, you might remember numbering code lines. It was advisable to number in increments of 10 so that if you later had to insert a line or two you wouldn't have to renumber all the code; you would just assign a new number between those of the previous and next lines.
So, in your situation I would create a numeric Rank field and assign each new Flex task Rank = max(Rank) + 1024 (for example). Afterwards, if the tasks are rearranged, I would update just the one "moved" task's Rank with the average of the Ranks of its new previous and next neighbours. That way any Rank change is an update to one row only. Of course, if the Rank is an int and I run out of integers between two tasks I would have to update them all, but that should be a rare occasion, and I would just re-Rank them in new increments of 1024.
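Here is a minimal sketch of that gapped-rank idea using Python's built-in sqlite3 module. The table name flex_tasks, the column sort_rank and the helper functions are illustrative assumptions, not part of the original app.

```python
import sqlite3

GAP = 1024  # spacing between neighbouring ranks


def add_task(conn, name):
    """Append a flex task with sort_rank = max(sort_rank) + GAP."""
    (max_rank,) = conn.execute(
        "SELECT COALESCE(MAX(sort_rank), 0) FROM flex_tasks"
    ).fetchone()
    conn.execute(
        "INSERT INTO flex_tasks (name, sort_rank) VALUES (?, ?)",
        (name, max_rank + GAP),
    )


def rerank_all(conn):
    """Rare fallback: respace every task in its current order."""
    ids = [r[0] for r in conn.execute(
        "SELECT id FROM flex_tasks ORDER BY sort_rank").fetchall()]
    conn.executemany(
        "UPDATE flex_tasks SET sort_rank = ? WHERE id = ?",
        [((i + 1) * GAP, task_id) for i, task_id in enumerate(ids)],
    )


def move_task(conn, name, prev_name=None, next_name=None):
    """Move `name` between two neighbours by updating only its own row."""
    def rank_of(task_name):
        return conn.execute(
            "SELECT sort_rank FROM flex_tasks WHERE name = ?", (task_name,)
        ).fetchone()[0]

    def bounds():
        low = rank_of(prev_name) if prev_name else 0
        high = rank_of(next_name) if next_name else low + 2 * GAP
        return low, high

    low, high = bounds()
    if high - low < 2:  # no free integer left between the neighbours
        rerank_all(conn)
        low, high = bounds()
    conn.execute(
        "UPDATE flex_tasks SET sort_rank = ? WHERE name = ?",
        ((low + high) // 2, name),
    )


def ordered_names(conn):
    return [r[0] for r in conn.execute(
        "SELECT name FROM flex_tasks ORDER BY sort_rank")]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flex_tasks ("
    "id INTEGER PRIMARY KEY, name TEXT UNIQUE, sort_rank INTEGER NOT NULL)"
)
for task in "ABCD":
    add_task(conn, task)
move_task(conn, "D", next_name="A")                 # drag D to the front
move_task(conn, "B", prev_name="D", next_name="A")  # drop B between D and A
print(ordered_names(conn))  # ['D', 'B', 'A', 'C']
```

Each drag and drop touches one row; the full re-rank only runs when two neighbours have no integer left between them.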
It sounds like you need a priority or order_number column to set the order in which the tasks come. Just make it an int and weight the tasks accordingly. If you need the DBMS to return them in order, sort in the query:
SELECT task_id, task_group_id, task_name, completed, priority
FROM tasks
WHERE user = ? AND task_group_id = ? AND completed = 0
ORDER BY priority ASC
You can use a foreign key to a task_group table to group certain tasks together if they're multipart, and then build a query to find all the ones that are either complete or incomplete. The weights assigned would still be correct, because the tasks don't refer to each other by ID.

Task in daily products generated not in duration

I have a project that contains a task to create 1,000 products in 10 days (100 products per day).
I have distributed the work among 10 employees, meaning every employee has to create 10 products per day, no matter how many hours it takes. I cannot work out how to create that task and monitor it accordingly.
You cannot use MS-Project to model this kind of task scheduling behaviour since its resource loading and task scheduling is based on knowing how much work (expressed as time) is needed to complete a task.
You can force MS-Project to have a 10-day duration, irrespective of the resources applied, by setting Work=10d and making the task Fixed Duration (both before adding any resources), but that cannot be used to divide up piece-work amongst resources assigned to that task.

Rails ActiveRecord jobs queue with maximum execution frequency

I have an ActiveRecord model:
User(id,user_specific_attributes, last_check:datetime, check_priority:integer, today_api_calls:integer)
I make an API call for each User once a day. The API has some important limits:
it's accessible from 4am to 8pm
call frequency limit: 10 per minute = 6 seconds timeout
call count limit: 3000/day
I need to run get_some_data_from_api() for each User once a day (start at 4am). Execution order is defined by check_priority column.
In case of error from get_some_data_from_api() it should restart job after 6 seconds (api limit).
Is there any gem suitable for this case?
Gems like Sidekiq, Delayed Job and Resque seem unsuitable. With them I would need to queue every job with a specific time. Consider:
Adding new job with high priority (requeue all next jobs?)
Job execution can take more than 6 seconds
Restarting job in case of error (requeue all next jobs?)
Delayed Job would also work. It has options to run a job at a specific time and to reschedule it if it fails.
DJ runs as a separate instance of your app. You can assign priorities to jobs. The time of execution does not matter. It has built-in options to configure retry on failure.
To schedule jobs at a specific time and to reschedule them, I use self-perpetuating jobs: after the job is done, it reschedules itself. Something like:
def run_me
  # ... do the work (e.g. the API call) ...
  # Re-enqueue this same job to run again one day from now.
  delay(run_at: 1.day.from_now).run_me
end
You can handle certain errors in the same way. For example, if the API limit is exceeded, you may want to catch the exception and reschedule for the next day instead of after 6 seconds.

How to automatically measure/monitor the average of sums of consecutive samplers in JMeter?

I have the following JMeter test plan.
+Test Plan
  +Login Thread Group
    HttpRequest1
    HttpRequest2
    HttpRequest3
Is there a way to automatically view/monitor the average of the sums of HttpRequest1, 2 and 3?
I couldn't find a way to do it in "Summary Report" or "Aggregate Report".
Is it possible, or do I have to do it manually?
Do you explicitly mean "the average of sums", as in the average of the total sum for each request over the duration of the test run? If so, I'm not aware of any JMeter listener that will show you the sum of elapsed time for a sampler; it's not something typically required. Instead, you could probably get what you need fairly easily by reading the jtl file at the command line.
But perhaps you meant something else, in which case you might find that a Transaction Controller serves your requirements. This will record and show the total elapsed time for multiple requests. So in your example, you might wrap HttpRequest1, 2 and 3 in a Transaction Controller, and this would give you the sum of all three requests. The Aggregate and Summary listeners would then show you the average for this transaction as a separate line.
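For the jtl route, a short script along these lines might be enough. It is a sketch that assumes the results were saved as CSV with a header row containing the label and elapsed columns, and that every iteration runs all three samplers, in which case the average of the per-iteration sums equals the sum of the per-sampler averages. The function name and the sample data are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def average_of_sums(jtl_file, labels):
    """Sum the mean elapsed time (ms) of each named sampler."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for row in csv.DictReader(jtl_file):
        if row["label"] in labels:
            totals[row["label"]] += int(row["elapsed"])
            counts[row["label"]] += 1
    return sum(totals[label] / counts[label] for label in labels if counts[label])


# Two iterations of the three samplers (abbreviated columns, invented values).
sample = io.StringIO(
    "timeStamp,elapsed,label,success\n"
    "1700000000000,100,HttpRequest1,true\n"
    "1700000000100,200,HttpRequest2,true\n"
    "1700000000300,300,HttpRequest3,true\n"
    "1700000001000,120,HttpRequest1,true\n"
    "1700000001120,180,HttpRequest2,true\n"
    "1700000001300,340,HttpRequest3,true\n"
)
result = average_of_sums(sample, ["HttpRequest1", "HttpRequest2", "HttpRequest3"])
print(result)  # 620.0
```

With a real results file you would pass an opened file object instead of the in-memory sample.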
