How to import DAG with specific Parameters - google-cloud-composer

I have a DAG that I want to launch in Composer with some parameters.
I can't use Airflow global variables because I want to launch the same DAG with a different "context" each time; my parameters represent that context.
Thanks for the help.

What you can do is create the DAG as a function in a separate file and then reuse your DAG code with different configurations. Below is the approach in pseudo code:
Dag Folder
# e.g. machine_export.py, car_export.py, plane_export.py
from airflow import DAG
from logic.push_data_template_dag import export_pub_template

# The "context" for this particular DAG
config = {
    'param1': ...,
    'param2': ...,
}

with DAG(dag_id='',
         ...
         ) as dag:
    export_pub_template(config)
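A second DAG file in the same folder can reuse the exact same template with its own configuration; a minimal illustrative sketch (file name, schedule, and parameter values are placeholders):
# car_export.py
from datetime import datetime
from airflow import DAG
from logic.push_data_template_dag import export_pub_template

# A different "context" for the same DAG logic (placeholder values)
config = {
    'param1': 'car',
    'param2': 'daily',
}

with DAG(dag_id='car_export',
         start_date=datetime(2021, 1, 1),
         schedule_interval=None,
         ) as dag:
    export_pub_template(config)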
Logic Folder
# push_data_template_dag.py
def export_pub_template(config):
    # Build the shared task graph from the config dict supplied by each DAG file
    get_data = get_data_custom_operator(config)
    export_data = send_data_to_db(config)
    get_data >> export_data
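For illustration, the same template can be sketched with a stock operator in place of the custom operators; the BashOperator commands and task ids here are just placeholders:
# logic/push_data_template_dag.py -- illustrative variant
# Airflow 1.10-style import; in Airflow 2 use airflow.operators.bash
from airflow.operators.bash_operator import BashOperator

def export_pub_template(config):
    # Placeholder tasks standing in for the custom get/export operators
    get_data = BashOperator(
        task_id='get_data',
        bash_command='echo "fetching {}"'.format(config['param1']),
    )
    export_data = BashOperator(
        task_id='export_data',
        bash_command='echo "exporting {}"'.format(config['param2']),
    )
    get_data >> export_data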
There is also a similar approach here, which follows the same pattern: break up the DAG code so you can reuse it across similar DAGs.
For details about writing DAGs you can check this link.

Why is this route test failing?

I've been following along with the testdriven.io tutorial for setting up a FastAPI with Docker. The first test I've written using PyTest errored out with the following message:
TypeError: Settings(environment='dev', testing=True, database_url=AnyUrl('postgres://postgres:postgres@web-db:5432/web_test', scheme='postgres', user='*****', password='*****', host='web-db', host_type='int_domain', port='5432', path='/web_test')) is not a callable object.
Looking at the error, you'll notice that the Settings object has a strange form; in particular, its database_url parameter seems to be wrapping a bunch of other parameters like password, port, and path. However, as shown below, my Settings class takes a different form.
From config.py:
# ...imports
class Settings(BaseSettings):
    environment: str = os.getenv("ENVIRONMENT", "dev")
    testing: bool = os.getenv("TESTING", 0)
    database_url: AnyUrl = os.environ.get("DATABASE_URL")

@lru_cache()
def get_settings() -> BaseSettings:
    log.info("Loading config settings from the environment...")
    return Settings()
Then, in the conftest.py module, I've overridden the settings above with the following:
import os
import pytest
from fastapi.testclient import TestClient
from app.main import create_application
from app.config import get_settings, Settings

def get_settings_override():
    return Settings(testing=1, database_url=os.environ.get("DATABASE_TEST_URL"))

@pytest.fixture(scope="module")
def test_app():
    app = create_application()
    app.dependency_overrides[get_settings] = get_settings_override()
    with TestClient(app) as test_client:
        yield test_client
As for the offending test itself, that looks like the following:
def test_ping(test_app):
    response = test_app.get("/ping")
    assert response.status_code == 200
    assert response.json() == {"environment": "dev", "ping": "pong", "testing": True}
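For context, the /ping route in that tutorial presumably reads the settings through FastAPI's dependency injection, along these lines (a sketch of the assumed route):
from fastapi import APIRouter, Depends
from app.config import get_settings, Settings

router = APIRouter()

@router.get("/ping")
async def pong(settings: Settings = Depends(get_settings)):
    # Whatever get_settings (or its test override) returns is injected here
    return {
        "ping": "pong",
        "environment": settings.environment,
        "testing": settings.testing,
    }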
The container is successfully running on my localhost without issue; this leads me to believe that the issue is wholly related to how I've set up the test and its associated config. However, the structure of the error and how database_url is wrapping up all these key-value pairs from docker-compose.yml gives me the sense that my syntax error could be elsewhere.
At this juncture, I'm not sure whether the issue has something to do with how I set up test_ping.py, my construction of the settings override, the format of my docker-compose.yml file, or something else altogether.
So far, I've tried to fix this issue by reading up on the use of dependency overrides in FastAPI, noodling with my indentation in the docker-compose file, changing the TestClient from the one provided by Starlette to the one provided by FastAPI, and manually entering testing mode.
Something I noticed when attempting to manually go into testing mode was that the container doesn't want to follow suit. I've tried setting testing to 1 in docker-compose.yml, and testing: bool = True in config.Settings.
I'm new to all of the relevant tech here and bamboozled. What is causing this discrepancy with my test? Any and all insight would be greatly appreciated. If you need to see any other files, or are interested in the package structure, just let me know. Many thanks.
Any dependency override through app.dependency_overrides should map the function being overridden (the key) to the function that should be used instead (the value). In your case you're using the correct key, but you're assigning the result of calling the override function rather than the function itself:
app.dependency_overrides[get_settings] = get_settings_override()
.. this should be:
app.dependency_overrides[get_settings] = get_settings_override
The error message shows that FastAPI tried to call the Settings object you placed in the dictionary as if it were a function, which hints that it expected a callable (the override function itself) instead.
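Putting it together, the corrected conftest.py would read (based on the code above):
import os
import pytest
from fastapi.testclient import TestClient
from app.main import create_application
from app.config import get_settings, Settings

def get_settings_override():
    return Settings(testing=1, database_url=os.environ.get("DATABASE_TEST_URL"))

@pytest.fixture(scope="module")
def test_app():
    app = create_application()
    # Register the function itself, not its return value
    app.dependency_overrides[get_settings] = get_settings_override
    with TestClient(app) as test_client:
        yield test_client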

How do I include a custom Lambda Layer into a pipeline stack? (AWS-CDK)

I have two stacks,
Application stack
Pipeline stack
What is the official way of including a custom lambda layer into the pipeline stack so that it relays code location information back to my application stack?
I have followed the documentation to make regular lambdas work... found here:
https://docs.aws.amazon.com/cdk/latest/guide/codepipeline_example.html?shortFooter=true
Some documentation and/or example code would be greatly helpful.
Can anyone point me in the right direction?
Thx
Sucks that no one has answered this question until now but I was trying to solve the same problem and came up with this, so I hope it helps someone like us :)
There are zillions of ways to skin this cat, but I am trying to go for clean CI/CD with easy developer work. The route I chose was to build my Lambda Layer with a Code.from_docker_build() object. I supply a Dockerfile I wrote, which packages my code into whatever artifact I need, and CDK knows how to handle it from there. That becomes my Lambda Layer, which I can then consume in other stacks/lambdas.
So here's what you need to do:
Create a Dockerfile in your repo which can build your code into an artifact.
The Dockerfile should finish by putting a single code artifact file into the /asset directory: one tarball, zip, or similar file that contains your code in "artifact" form and can run in Lambda.
Use Code.from_docker_build() as the code object in your function.
# Imports assume the CDK v1 module layout, to match the cdk.Construct signature used below
import os
from aws_cdk import core as cdk
from aws_cdk.aws_lambda import Code, LayerVersion, Runtime

class YourLambdaLayer(cdk.Stack):
    def __init__(self, scope: cdk.Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Create the Lambda Layer from the artifact the Dockerfile drops into /asset
        new_version = LayerVersion(self, construct_id,
            layer_version_name=construct_id,
            code=Code.from_docker_build(
                path=os.path.abspath("./"),
                file='Dockerfile'
            ),
            compatible_runtimes=[Runtime.PYTHON_3_8],
        )

        # Create an export to use from other stacks
        cdk.CfnOutput(self, f"{construct_id}-Arn-Export",
            value=new_version.layer_version_arn,
            export_name='your-cool-layer-Arn'
        )
Everything else from the example in the link you posted should work as you expect it. The only difference is that you're using a docker image to bundle your artifacts instead of supplying a zip or something like that.
This is what a pipeline resource would look like for a Python function, built using a Dockerfile in the pipeline...
# Imports assume the CDK v1 module layout
from aws_cdk import core as cdk
from aws_cdk.aws_codecommit import Repository
from aws_cdk.pipelines import CodePipeline, CodePipelineSource, ManualApprovalStep, ShellStep

class YourCICDStack(cdk.Stack):
    def __init__(self, scope, id, env=None, ..., **kwargs):
        super().__init__(scope, id, ..., env=env, **kwargs)

        code_repo = Repository(self, 'a-git-repo-in-code-commit',
            repository_name='a-cool-python-package-from-u',
            description='you can be long winded whilst describing, if you like :)'
        )

        pipeline = CodePipeline(self, 'resource-name-goes-here',
            pipeline_name='pipeline-name-goes-here',
            docker_enabled_for_synth=True,  # !!! important !!!
            synth=ShellStep("Synth",
                input=CodePipelineSource.code_commit(
                    repository=code_repo,
                    branch='development',
                ),
                commands=[
                    "pip install -r requirements.txt",
                    "npm install -g aws-cdk",
                    f"cdk synth ..."
                ]
            )
        )

        # Add the stages for deploying
        your_stage = pipeline.add_stage(YourLayerStage(self, ..., env=env))
        your_stage.add_post(ManualApprovalStep('approval'))
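The YourLayerStage used above would be a cdk.Stage that wraps the layer stack, roughly like this (a hypothetical sketch; names and structure assumed):
# Hypothetical stage wrapping the layer stack from earlier; CDK v1 module layout assumed
from aws_cdk import core as cdk

class YourLayerStage(cdk.Stage):
    def __init__(self, scope, id, env=None, **kwargs):
        super().__init__(scope, id, env=env, **kwargs)
        # Deploying this stage deploys the layer stack (and its CfnOutput export)
        YourLambdaLayer(self, 'your-lambda-layer-stack')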
So now that you've got your pipeline publishing your Lambda Layer, you would use it from other stacks using either the from_layer_version_arn() or from_layer_version_attributes().
Those are both class methods, so you use them in your other stacks by doing something like:
# from_layer_version_arn() also needs a scope and a construct id;
# 'imported-cool-layer' is just a placeholder id
my_cool_layer_ref = LayerVersion.from_layer_version_arn(
    self, 'imported-cool-layer',
    cdk.Fn.import_value('your-cool-layer-Arn')
)

# Include it in your other stacks/functions
some_other_func = Function(...,
    layers=[my_cool_layer_ref],
    ...
)
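If you also want to declare which runtimes the imported layer supports, from_layer_version_attributes() is the alternative; a sketch under the same assumptions (placeholder construct id, CDK v1 imports, called from inside a stack):
from aws_cdk import core as cdk
from aws_cdk.aws_lambda import LayerVersion, Runtime

my_cool_layer_ref = LayerVersion.from_layer_version_attributes(
    self, 'imported-cool-layer-attrs',  # placeholder construct id
    layer_version_arn=cdk.Fn.import_value('your-cool-layer-Arn'),
    compatible_runtimes=[Runtime.PYTHON_3_8],
)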

How can I keep a PBSCluster running?

I have access to a cluster running PBS Pro and would like to keep a PBSCluster instance running on the headnode. My current (obviously broken) script is:
import dask_jobqueue
from paths import get_temp_dir

def main():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)

if __name__ == '__main__':
    main()
This script is obviously broken because after the cluster is created the main() function exits and the cluster is destroyed.
I imagine I must call some sort of execute_io_loop function, but I can't find anything in the API.
So, how can I keep my PBSCluster alive?
I think the Python API (advanced) section of the docs might be a good way to solve this issue.
Mind you, that section shows how to create Schedulers and Workers directly, but I'm assuming the same logic can be applied to your case.
import asyncio
import dask_jobqueue
from paths import get_temp_dir

async def create_cluster():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1, scheduler_options=scheduler_options)

if __name__ == "__main__":
    asyncio.get_event_loop().run_until_complete(create_cluster())
You might have to change the code a bit, but it should keep create_cluster running until it finishes.
Let me know if this works for you.
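If the asyncio route does not pan out, a simpler alternative is to block the main thread so the cluster object is never garbage-collected; a minimal sketch based on the script from the question:
import time
import dask_jobqueue
from paths import get_temp_dir  # helper from the question

def main():
    temp_dir = get_temp_dir()
    scheduler_options = {'scheduler_file': temp_dir / 'scheduler.json'}
    cluster = dask_jobqueue.PBSCluster(cores=24, memory='100GB', processes=1,
                                       scheduler_options=scheduler_options)
    try:
        # Keep the process (and therefore the cluster) alive until interrupted
        while True:
            time.sleep(60)
    except KeyboardInterrupt:
        pass
    finally:
        cluster.close()  # shut down the PBS jobs on exit

if __name__ == '__main__':
    main()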

Failures in init.groovy.d scripts: null values returned

I'm trying to get Jenkins set up, with configuration, within a Docker environment. Per a variety of sources, it appears the suggested method is to insert scripts into JENKINS_HOME/init.groovy.d. I've taken scripts from places like the Jenkins wiki and made slight changes. They're only partially working. Here is one of them:
import java.util.logging.ConsoleHandler
import java.util.logging.FileHandler
import java.util.logging.SimpleFormatter
import java.util.logging.LogManager
import jenkins.model.Jenkins
// Log into a file
println("extralogging.groovy")
def RunLogger = LogManager.getLogManager().getLogger("hudson.model.Run")
def logsDir = new File("/var/log/jenkins")
if (!logsDir.exists()) { logsDir.mkdirs() }
FileHandler handler = new FileHandler(logsDir.absolutePath+"/jenkins-%g.log", 1024 * 1024, 10, true);
handler.setFormatter(new SimpleFormatter());
RunLogger.addHandler(handler)
This script fails on the last line, RunLogger.addHandler(handler).
2019-12-20 19:25:18.231+0000 [id=30] WARNING j.util.groovy.GroovyHookScript#execute: Failed to run script file:/var/lib/jenkins/init.groovy.d/02-extralogging.groovy
java.lang.NullPointerException: Cannot invoke method addHandler() on null object
I've had a number of other scripts return null from getters similar to this one:
def RunLogger = LogManager.getLogManager().getLogger("hudson.model.Run")
My goal is to be able to develop (locally) a Jenkins implementation and then hand it to our sysops guys. Later, as I add pipelines and what not, I'd like to be able to also work on them in a local Jenkins configuration and then hand something for import into production Jenkins.
I'm not sure how to find the relevant API documentation so I can chase this down myself. Maybe I need to stop doing it this way, make the changes via the GUI instead, grab the files that get modified, and just put those files into the right place.
Suggestions?

Using the ez-template plugin for Jenkins through the Jenkins Job DSL doesn't apply the template after creation

I am working on automating the creation of Jenkins jobs using the Jenkins Job DSL (Groovy). Right now, I am trying to automate the creation of a job that uses the ez-templates plugin to apply an already existing template to my newly created job. However, after I am done writing the necessary configuration:
job('foo') {
    properties {
        templateImplementationProperty {
            exclusions(['ez-templates', 'job-params', 'disabled', 'description'])
            syncAssignedLabel(true)
            syncBuildTriggers(true)
            syncDescription(false)
            syncDisabled(false)
            syncMatrixAxis(true)
            syncOwnership(true)
            syncScm(true)
            syncSecurity(true)
            templateJobName('template')
        }
    }
}
the job gets created alright... except the template is never applied until AFTER I manually hit the save button in the UI of the newly created job. Checking the config.xml of the created job, I can see that the XML contains the configuration I specified, but it was never applied.
Looking at the ez-template code, I can see that this is due to the silentSave feature that was implemented in that plugin - it writes configuration to disk without triggering any save events.
I've tried methods available to the Jenkins API but I've had no success there. Any ideas on how I can apply my configuration?
Full disclosure: I'm a co-worker, and was able to help shredmasteryjm solve this. I figured it'd be best to put this out on the net for others.
The Groovy code needed to trigger template implementation contents to be updated is:
import hudson.model.*;
import jenkins.model.*;
import com.joelj.jenkins.eztemplates.utils.TemplateUtils;
import com.joelj.jenkins.eztemplates.TemplateImplementationProperty;
Jenkins j = Jenkins.getInstance()
Item job = j.getItemByFullName('foo')
TemplateImplementationProperty template = TemplateUtils.getTemplateImplementationProperty(job)
TemplateUtils.handleTemplateImplementationSaved(job, template)
This utilizes the EZ-Templates TemplateUtils class to trigger the actual save event, using the template that the job uses. Of note, if job 'foo' doesn't implement a template, then the 'template' variable will be null, causing this code to error. YMMV
In our case, we also needed to add in some useful information from another question, Access to build environment variables from a groovy script in a Jenkins build step (Windows), in order to utilize a parameterized job name. As such, our completed script looks like this:
import hudson.model.*;
import jenkins.model.*;
import com.joelj.jenkins.eztemplates.utils.TemplateUtils;
import com.joelj.jenkins.eztemplates.TemplateImplementationProperty;
// get current thread / Executor
def thr = Thread.currentThread()
// get current build
def build = thr?.executable
def hardcoded_param = "parameter_job_name"
def resolver = build.buildVariableResolver
def hardcoded_param_value = resolver.resolve(hardcoded_param)
Jenkins j = Jenkins.getInstance()
Item job = j.getItemByFullName(hardcoded_param_value)
TemplateImplementationProperty template = TemplateUtils.getTemplateImplementationProperty(job)
TemplateUtils.handleTemplateImplementationSaved(job, template)
FYI: ez-templates 1.3.0 now triggers on additional save events, so you no longer need the above trick.
