I have an application where users can write custom Python function code and a request payload. The user should be able to test the function at the click of a button. My requirement is to dynamically deploy this code (adding extra wrapper code if necessary) as a function in my Kubernetes cluster using OpenFaaS, send the payload to test it, and fetch the results.
I need all these steps executed within 60 seconds. I am mostly worried about the cold starts incurred while creating the function.
Can anyone suggest whether this requirement is feasible with OpenFaaS?
In Docker we have used deploy: replicas: 3 for our microservice. We have some cron jobs, and the problem is that each cron job gets called three times (once per replica), which is not what we want; we want it to run only once. Sample of a cron in NestJS:
@Cron(CronExpression.EVERY_5_MINUTES)
async runBiEventProcessor() {
  const calculationDate = new Date();
  Logger.log(`Bi Event Processor started at ${calculationDate}`);
}
How can I run this cron only once without changing the replicas to 1?
This is quite a generic problem whenever a cron or background job is part of an application that has multiple instances running concurrently.
There are multiple ways to deal with this kind of scenario. The following are some workarounds if you don't have a concrete solution:
Create a separate service only for the background processing and ensure only one instance is running at a time.
Expose the cron job as an API and trigger the API to start background processing. In this scenario, the load balancer will hand over the request to only one instance. This approach will ensure that only one instance will handle the job. You will still need an external entity to hit the API, which can be in-house or third-party.
Use the repeatable jobs feature from Bull Queue, or any other tool or library that provides similar features.
Bull will hand over each job to a single active processor. That way, it ensures the job is processed only once, by only one active processor.
NestJS has a wrapper for Bull. Read more about Bull queue repeatable jobs here.
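A minimal sketch of the Bull approach, assuming @nestjs/bull with a shared Redis instance; the queue name 'bi-events' and job name 'run' are illustrative. Every replica can safely register the same repeatable job, because Bull de-duplicates it by name and repeat options, and only one active processor picks up each run:

import { Module, OnModuleInit } from '@nestjs/common';
import { BullModule, InjectQueue, Process, Processor } from '@nestjs/bull';
import { Queue } from 'bull';

@Processor('bi-events')
class BiEventProcessor {
  @Process('run')
  async handle() {
    // Only one active processor receives each repeat of the job
    console.log(`Bi Event Processor started at ${new Date()}`);
  }
}

@Module({
  imports: [
    BullModule.forRoot({ redis: { host: 'localhost', port: 6379 } }),
    BullModule.registerQueue({ name: 'bi-events' }),
  ],
  providers: [BiEventProcessor],
})
export class AppModule implements OnModuleInit {
  constructor(@InjectQueue('bi-events') private readonly queue: Queue) {}

  async onModuleInit() {
    // Safe to call from every replica: Bull stores a single repeatable job definition
    await this.queue.add('run', {}, { repeat: { cron: '*/5 * * * *' } });
  }
}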
Implement a custom locking mechanism
It is not as difficult as it sounds. Many schedulers in other frameworks work on similar principles to handle concurrency.
If you are using an RDBMS, make use of transactions and locking. Create cron records in the database and acquire the lock as soon as the first cron run enters and starts processing. Other concurrent runs will either fail or time out because they cannot acquire the lock. You will need to handle a few edge cases in this approach to make it bug-free.
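One way to sketch this, assuming PostgreSQL and the pg library: instead of locking cron records with SELECT ... FOR UPDATE, you can use a session-level advisory lock, which serves the same purpose here. The lock key 42 is an arbitrary placeholder:

import { Pool } from 'pg';

const pool = new Pool();  // connection settings taken from the standard PG* environment variables
const CRON_LOCK_ID = 42;  // arbitrary application-defined lock key for this cron

async function runBiEventProcessorOnce() {
  const client = await pool.connect();
  try {
    // pg_try_advisory_lock returns false immediately if another replica already holds the lock
    const { rows } = await client.query('SELECT pg_try_advisory_lock($1) AS locked', [CRON_LOCK_ID]);
    if (!rows[0].locked) {
      return; // another replica won the race; skip this run
    }
    try {
      // ... do the actual cron work here ...
    } finally {
      await client.query('SELECT pg_advisory_unlock($1)', [CRON_LOCK_ID]);
    }
  } finally {
    client.release();
  }
}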
If you are using MongoDB, or any similar database that supports TTL (time-to-live) and unique indexes, insert a document whose key field carries a unique constraint; any other replica trying to insert the same document will fail because of the database-level unique constraint. Also put a TTL (time-to-live) index on the document so that it is deleted automatically after a configured time.
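A rough sketch of the MongoDB variant, assuming the official mongodb Node driver; the collection and field names are illustrative, and the jobKey would typically combine the cron name with the scheduled time slot:

import { MongoClient } from 'mongodb';

const client = new MongoClient('mongodb://localhost:27017');
const locks = client.db('app').collection('cron_locks');

// One-time setup: a unique index on the run key, plus a TTL index so stale locks expire on their own
async function ensureIndexes() {
  await client.connect();
  await locks.createIndex({ jobKey: 1 }, { unique: true });
  await locks.createIndex({ createdAt: 1 }, { expireAfterSeconds: 300 });
}

// Each replica calls this at the start of the cron; only the first insert succeeds
async function tryAcquire(jobKey: string): Promise<boolean> {
  try {
    await locks.insertOne({ jobKey, createdAt: new Date() });
    return true;                          // this replica owns the run
  } catch (err: any) {
    if (err.code === 11000) return false; // duplicate key: another replica got there first
    throw err;
  }
}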
These are workarounds if you don't have any other concrete options.
There are quite a few options for solving this, but I would suggest creating a NestJS microservice (or a plain Node.js app) that runs only the cron job and stores the result in a shared database, for example Redis.
The microservice that runs the cron job does not expose anything; it only starts your cron job:
import { NestFactory } from '@nestjs/core';
import { WorkerModule } from './worker.module';

const app = await NestFactory.create(WorkerModule);
await app.init();
Your WorkerModule imports and configures the scheduler. The result of the cron job can be written to a shared database like Redis.
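A rough sketch of what the WorkerModule could look like, assuming @nestjs/schedule for the cron and ioredis for the shared store; the service name and the Redis key are illustrative:

import { Injectable, Logger, Module } from '@nestjs/common';
import { Cron, CronExpression, ScheduleModule } from '@nestjs/schedule';
import Redis from 'ioredis';

@Injectable()
class BiEventProcessorService {
  private readonly redis = new Redis(); // defaults to localhost:6379; point this at the shared Redis

  @Cron(CronExpression.EVERY_5_MINUTES)
  async runBiEventProcessor() {
    const calculationDate = new Date();
    Logger.log(`Bi Event Processor started at ${calculationDate}`);
    // ... do the work, then write the result where the API replicas can read it
    await this.redis.set('bi-event:last-run', calculationDate.toISOString());
  }
}

@Module({
  imports: [ScheduleModule.forRoot()],
  providers: [BiEventProcessorService],
})
export class WorkerModule {}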
Now you can still use 3 replicas of your main service but avoid registering the cron jobs in all of them.
I know this question is weird, but I am not 100% sure whether it is possible or not, and I need expert advice.
I am using this architecture (see Fig 1): an MVC Web API puts data into an Azure Queue, and the queue then calls an Azure Function to perform tasks that are small but very large in number, e.g. the queue sends 5k-10k requests to the Azure Function per minute.
Fig 1
We want to remove the Azure Function because it costs us a lot, and go for an alternative.
For this, someone shared an idea to replace the Azure Function with another MVC Web API (see Fig 2).
Fig 2
Is the above architecture possible? If yes, then how? If not, can anyone please suggest anything?
When using Azure Functions with the Storage Queue trigger, Azure Functions will scale out based on the load on the queue. By default, batchSize is set to 16. The setting can be configured via host.json:
The number of queue messages that the Functions runtime retrieves simultaneously and processes in parallel. When the number being processed gets down to the newBatchThreshold, the runtime gets another batch and starts processing those messages. So the maximum number of concurrent messages being processed per function is batchSize plus newBatchThreshold. This limit applies separately to each queue-triggered function.
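For illustration, assuming the Functions runtime v2+ host.json layout, forcing strictly serial processing per instance would look roughly like this (with batchSize 1 and newBatchThreshold 0, each instance handles one message at a time):

{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}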
Tuning batchSize alone might not be sufficient when the number of messages is substantial. In that case, you will want to restrict the scale-out behaviour, i.e. the number of VMs used to execute the Function App. That is controlled by an app setting, WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT. Setting it to 1 would prevent any scale-out to new VMs, but according to the documentation:
This setting is a preview feature - and only reliable if set to a value <= 5
While your focus is on the cost of processing, take time into consideration as well. Unless it's OK for the messages to wait a long time to be processed, you are likely to need other alternatives to Functions. But the trade-off between the cost and the time to process will always be there.
As stated in my previous post, I was trying to pass a single file's name from a Cloud Function to Dataflow. What if I upload multiple files at a time to a GCS bucket? Is it possible to have a single Cloud Function capture and send all the filenames by using event.data? If not, is there any other way I could get those file names in my Dataflow program?
Thank You
To run this in a single pipeline you would need to create a custom source that takes a list of file names (or a single string of concatenated file names, etc.) and then use that source with an appropriate runtime PipelineOption.
The challenge with this approach is that only the client (presumably) knows how many files there are and when they have all finished uploading. Events sent to Cloud Functions are delivered at-least-once (meaning you may occasionally get duplicates) and potentially out of order. Even if the Cloud Function somehow knew how many files it was expecting, you may find it difficult to guarantee that only one Cloud Function triggers Dataflow, due to a race condition when checking Cloud Storage (e.g. more than one function might "think" it is the last one). There is no "batch" semantic in Cloud Storage (AFAIK) that would lead to a single function invocation (there IS a batch API, but events are emitted from single "object" changes, so even a batch write of N files would result in at least N events).
It may be better to have the client manually trigger either a Cloud Function, or Dataflow directly, once all files have been uploaded. You could trigger a Cloud Function either directly via HTTP, or you could just write a sentinel value to Cloud Storage to trigger a function.
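A rough sketch of the sentinel idea, assuming a Node.js background function and the googleapis client; the project ID, template path, sentinel name (_DONE) and pipeline parameter are placeholders. The client writes the _DONE object only after all data files have finished uploading:

const { google } = require('googleapis');

exports.onSentinel = async (file) => {
  // Ignore the ordinary data files; only the sentinel object kicks off the pipeline
  if (!file.name.endsWith('/_DONE')) return;

  const prefix = file.name.replace(/\/_DONE$/, '/');
  const auth = await google.auth.getClient({
    scopes: ['https://www.googleapis.com/auth/cloud-platform'],
  });
  const dataflow = google.dataflow('v1b3');
  await dataflow.projects.templates.create({
    auth,
    projectId: 'my-project',
    requestBody: {
      jobName: `process-upload-${Date.now()}`,
      gcsPath: 'gs://my-bucket/templates/my-template',
      // The pipeline reads everything under the prefix rather than individual file names
      parameters: { inputPrefix: `gs://${file.bucket}/${prefix}` },
    },
  });
};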
An alternative could be to package up the files into a single upload from the client (e.g. tar them), but I'm sure there may be reasons why this doesn't make sense for your use case.
I am using the Java GAPI client to work with Google Cloud Dataflow (v1b3-rev197-1.22.0). I am running a pipeline from a template, and the method for doing that (com.google.api.services.dataflow.Dataflow.Projects.Templates#create) does not allow me to set labels for the job. However, I get the Job object back when I execute the pipeline, so I updated the labels and tried to call com.google.api.services.dataflow.Dataflow.Projects.Jobs#update to persist that information in Dataflow. But the labels do not get updated.
I also tried updating labels on finished jobs (which I also need to do), which didn't work either, so I thought it was because the job was in a terminal state. But updating labels seems to do nothing regardless of the state.
The documentation does not say anything about labels not being mutable on running or terminated pipelines, so I would expect this to work. Am I doing something wrong, and if not, what is the rationale behind the decision not to allow label updates? (And how are template users supposed to set the initial label set when executing the template?)
Background: I want to mark terminated pipelines that have been "processed", i.e. those that our automated infrastructure has already sent notifications about to the appropriate places. Labels seemed like a good approach that would shield me from having to use some kind of local persistence to track this (a big complexity jump). Any suggestions on how to approach this if labels are not the right tool? Sadly, Stackdriver cannot monitor finished pipelines, only failed ones. And sending a notification from within the pipeline code doesn't seem like a good idea to me (wrong?).
I am using Parse.com as the backend for my iOS app. Parse has a big Export Data button for backing up your database that will send an email with a zip containing each table and its data in JSON format. That's great, but is there any way to automate this task? I want to be able to do this every night, and I know you can use Background Jobs for automated tasks, but is it possible to hook into this particular feature? I couldn't find an answer on Parse's forums, and searching didn't turn up anything except old threads talking about how this feature was on the horizon.
The best I can work out, without Parse providing a true way of achieving this, is to have a background job create File objects in a "backup" table, and then use an external service (via the REST API) to pull them out into S3 or similar.
It's not ideal, but it would work. Also, it will count against your API requests, so you may want to optimise with the updated flag.
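A rough sketch of the background-job half, using the classic Parse.com Cloud Code job API; "GameScore" and "Backup" are illustrative class names, and you would repeat the export per table. An external scheduled service can then query the Backup class via the REST API and copy the files to S3:

Parse.Cloud.job("nightlyBackup", function(request, status) {
  var rows = [];
  var query = new Parse.Query("GameScore");
  query.each(function(object) {
    rows.push(object.toJSON());
  }).then(function() {
    // Store the JSON dump as a Parse.File referenced from a "Backup" row.
    // The byte-array form assumes ASCII-only data; use a base64 encoding for arbitrary text.
    var json = JSON.stringify(rows);
    var bytes = [];
    for (var i = 0; i < json.length; i++) bytes.push(json.charCodeAt(i));
    var file = new Parse.File("GameScore.json", bytes);
    return file.save();
  }).then(function(file) {
    var backup = new Parse.Object("Backup");
    backup.set("table", "GameScore");
    backup.set("dump", file);
    return backup.save();
  }).then(function() {
    status.success("Backup complete");
  }, function(error) {
    status.error("Backup failed: " + error.message);
  });
});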
What I do for this is run a simple Windows Server on AWS EC2 with a scheduled task:
Create a simple .bat file that runs the command node parse-backup.js
Create a basic scheduled task using the Windows-provided scheduler and run the .bat file
You can use this Node code: https://github.com/mkim871/parse-node-backup