Parsing environment variables into the configuration YAML file - dask

I am using dask on the GFDL analysis cluster to analyze large climate model output.
I am trying to point the temporary-directory configuration option at a scratch directory that changes depending on the node I am logged into (it is always identified by the environment variable $TMPDIR).
Is there a way to parse environment variables in the dask configuration files?
Cheers.

Not as of today, but that could be done. I recommend raising an issue on the Dask issue tracker.
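In the meantime, a possible workaround (a minimal sketch, assuming a reasonably recent Dask) is to skip the YAML file for this one option and set it programmatically at the top of the analysis script, pulling the path from $TMPDIR:

```python
# Workaround sketch: set the temporary-directory option at runtime from the
# $TMPDIR environment variable instead of hard-coding it in the YAML file.
import os
import dask

dask.config.set({"temporary-directory": os.environ.get("TMPDIR", "/tmp")})
```

Depending on the installed version, exporting DASK_TEMPORARY_DIRECTORY="$TMPDIR" in the shell may also work, since Dask's configuration system picks up DASK_-prefixed environment variables.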

Related

How to set mode and time in Dynamic Agents?

I am referring to this page:
https://www.instana.com/docs/setup_and_manage/host_agent/updates/#update-interval
Is there a way to pass mode and time from outside, as environment variables or in any other way, besides logging into the pod and manually editing the etc/instana/com.instana.agent.main.config.UpdateManager.cfg file?
To whoever removed his/her answer: it was a correct answer, and I don't know why you deleted it. Anyhow, I am posting it again in case someone stumbles on this.
You can control frequency and time by using the INSTANA_AGENT_UPDATES_FREQUENCY and INSTANA_AGENT_UPDATES_TIME environment variables.
Whether the mode can be updated via an environment variable is still unknown at this point.
Look at this page for more info: https://www.instana.com/docs/setup_and_manage/host_agent/on/docker/#updates-and-version-pinning
Most agent settings that one may want to change quickly are available as environment variables, see https://www.instana.com/docs/setup_and_manage/host_agent/on/docker. For example, setting the mode via environment variable is supported as well with INSTANA_AGENT_MODE, see e.g., https://hub.docker.com/r/instana/agent. The valid values are:
APM: the default, the agent monitors everything
INFRASTRUCTURE: the agent will collect metrics and entities but not traces
OFF: agent runs but collects no telemetry
AWS: agent will collect data about AWS managed services in a region and an account, supported on EC2 and Fargate, and with some extra configurations, on hosts outside AWS
On Kubernetes, it is also of course possible to use a ConfigMap to override files in the agent container.

BigQueryIO creates one file per input line, is this correct?

I'm new to Apache Beam and I'm developing a pipeline that reads rows with JDBCIO and sends them to BigQueryIO. I'm converting the rows to Avro files with withAvroFormatFunction, but it creates a new file for each row returned by JDBCIO. The same happens with withFormatFunction and JSON files.
It is very slow to run locally with the DirectRunner because it uploads a lot of files to Google Cloud Storage. Is this approach good for scaling on Google Dataflow? Is there a better way to deal with it?
Thanks
In BigQueryIO there is an option, withNumFileShards, which controls the number of files generated when using BigQuery load jobs.
From the documentation:
Control how many file shards are written when using BigQuery load jobs. Applicable only when also setting withTriggeringFrequency(org.joda.time.Duration).
You can test your process by setting the value to 1 to see if only one large file gets created.
BigQueryIO will commit results to BigQuery for each bundle. The DirectRunner is known to be a bit inefficient about bundling. It never combines bundles. So whatever bundling is provided by a source is propagated to the sink. You can try using other runners such as Flink, Spark, or Dataflow. The in-process open source runners are about as easy to use as the direct runner. Just change --runner=DirectRunner to --runner=FlinkRunner and the default settings will run in local embedded mode.
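A minimal sketch of that runner switch, using the Python SDK for illustration (the question appears to use the Java SDK, where the change is the same --runner flag; running the Flink runner locally assumes its job-server prerequisites are available):

```python
# Sketch: the pipeline code stays the same; only the --runner option changes.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# was: PipelineOptions(["--runner=DirectRunner"])
options = PipelineOptions(["--runner=FlinkRunner"])  # local embedded Flink by default

with beam.Pipeline(options=options) as p:
    # Trivial stand-in pipeline; the real JDBCIO-to-BigQueryIO pipeline is unchanged.
    p | beam.Create(["smoke test"]) | beam.Map(print)
```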

XDT transformation for JSON in VSTS?

Is there any tool that can transform JSON files, similar to XDT transformations for XML files?
thanks
The Azure App Service Deploy task has a variable substitution option that supports JSON files. You provide the JSON file paths (wildcards are allowed in the paths), and the substitution variables and values you define in the task are applied during deployment. However, for on-premises IIS deployments there is currently no task that supports JSON variable substitution. More information on JSON variable substitution is here.
The source code of this VSTS task can be found here; you can inspect the implementation logic and define your own component to do the JSON variable substitution.
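If you go the do-it-yourself route, a minimal sketch of such a component (all file names, keys, and environment variables below are hypothetical) could look like this:

```python
# DIY JSON variable substitution sketch: override selected (dotted) keys in a
# JSON config file with new values before deploying.
import json
import os

def substitute(path, overrides):
    """Rewrite dotted keys in a JSON file, e.g. "ConnectionStrings.Default"
    updates cfg["ConnectionStrings"]["Default"]."""
    with open(path) as f:
        cfg = json.load(f)
    for dotted_key, value in overrides.items():
        node = cfg
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)

# Example: pull the value to substitute from a release variable exposed as an
# environment variable (hypothetical names).
substitute("appsettings.json",
           {"ConnectionStrings.Default": os.environ.get("CONN_STRING", "")})
```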

Jenkins job to read data from SQL DB

I'm new to Jenkins. I have a task where I need to create a Jenkins job to automate builds of certain projects. The build job parameters are going to be stored in a SQL database, so the job has to keep querying the database, load the data from it, and perform the build.
Examples would be greatly appreciated.
How can this be done?
You have to transform the data from the available source into the format expected by the destination.
Here your source data is available in a database and you want to use it in Jenkins.
There might be numerous ways, but an efficient way of reading the data is the EnvInject plugin.
If you can provide the data in properties-file format to the EnvInject plugin, it becomes available as environment variables, and you can use those variables in the job configuration.
The EnvInject plugin can read this properties file from the Jenkins job workspace; you provide the file path in its Properties File Path input.
To read the data from the source and make it available as a properties file, you can either write a small executable (see the sketch below) or, if your application provides an API, download the properties data from it.
Either way, it has to be executed before the SCM step; for this you have to use Pre-SCM-Step.
Get the data and inject it in the pre-SCM step, so that it is available as environment variables.
This is one thought to give you a gist to start from; while implementing, you may come up with other ideas that fit your requirements.
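For the "write an executable" option, a minimal sketch (table, column, and file names are hypothetical; swap sqlite3 for the driver of your actual SQL database) that runs in the pre-SCM step and produces the properties file for EnvInject:

```python
# Sketch: read build parameters from the database and write them out as a
# key=value properties file that the EnvInject plugin can load.
import sqlite3

conn = sqlite3.connect("build_params.db")  # stand-in for the real SQL database
rows = conn.execute("SELECT name, value FROM build_parameters")

# The path of this file goes into EnvInject's "Properties File Path" input.
with open("build.properties", "w") as f:
    for name, value in rows:
        f.write(f"{name}={value}\n")

conn.close()
```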

Update the sys.config file without restarting node

I would like to know whether the sys.config file of an Erlang node can be updated without restarting the node itself.
My use case is to have variables in the env part of the sys.config configuration and to constantly poll them to see whether certain variables are true or false, for example to turn features of a program on or off on the fly.
The sys.config is only read at start time (and new release installation time). Are you aware of application:set_env/3? If you are, in what way does it not meet your needs?
