I'm trying to use a Google Dataflow template to export data from Bigtable to Google Cloud Storage (GCS). I'm following the gcloud command details here. However, when running it I get a warning and an associated error whose suggested fix is to add workers (--numWorkers) or increase the attached disk size (--diskSizeGb). I see no way to pass those parameters when executing the Google-provided template, though. Am I missing something?
Reviewing a separate question, it seems like there is a way to do this. Can someone explain how?
Parameters like numWorkers and diskSizeGb are Dataflow-wide pipeline options. You should be able to specify them like so:
gcloud dataflow jobs run JOB_NAME \
--gcs-location LOCATION --num-workers=$NUM_WORKERS --diskSizeGb=$DISK_SIZE
Let me know if you have further questions.
We would like to prevent certain parameters (namely filesToStage) of our Dataflow template from populating in the Dataflow Job page. Is there a recommended way to achieve this? We've found that simply specifying "filesToStage=" when launching the template via gcloud suffices, but we're not sure if this is robust/stable behavior.
For context, we are hosting this Dataflow template for customer usage and would like to hide as much of the implementation as possible (including classpaths).
Specifically, filesToStage can be sent as blank, and the files will be inferred based on the Java classpath:
If filesToStage is blank, Dataflow will infer the files to stage based on the Java classpath.
More information on the considerations for this and other fields can be found here.
For other parameters, the recommendation is to use Cloud KMS to keep the parameters hidden.
Is there a way to get the Gatling requests into the Gatling Jenkins trend graph? Our build with the Jenkins Gatling plugin only shows the trend for the global information in the graph, and we want to see the trend per request type, as this gives us much more information. Is this possible?
I was looking at the description on their site and it mentions that you can configure assertions, but it wasn't clear to me whether that covers this use case, and I'm not finding the assertion files when I run the build with the flag -Dgatling.useOldJenkinsJUnitSupport=true.
To clarify: I want the transactions highlighted below in the blue square to appear in the Jenkins graph that shows the trend.
Unfortunately, this isn't possible. However, Gatling has a feature for live monitoring where you can set up all the metrics you need for each request:
https://gatling.io/docs/current/realtime_monitoring
No, this feature is not available in Gatling OSS Jenkins plugin.
It's available in Gatling FrontLine though.
I got it working using a workaround. The Gatling plugin graph will show a trend per simulation. It is looking for /{simulation-name}/global_stats.json in the /build folder.
I wrote a Groovy script to parse the JSON data from stats.json. The structure in stats.json is the same as in global_stats.json, so simply parse stats.json and copy json.contents[scenario].stats to a separate file in the build folder (a sketch of this is shown after the structure below):
stats.json structure:
{
  ...
  "contents": {
    "scenarioName": {
      "stats": { ... }  // copy this part
    }
  }
}
Target file: {scenario}-report/global_stats.json (under the build folder)
Note that the dash ("-") in the folder name is required, as the plugin searches for this dash to determine the simulation name; it will throw a NullPointerException without it.
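A minimal sketch of that workaround (shown here in Python rather than Groovy; the report path, key names, and output folder are assumptions based on the description above):

import json
import os

# Hypothetical paths: adjust to your Gatling report and Jenkins build layout.
report_stats = "target/gatling/mysimulation-20240101/js/stats.json"
build_dir = "build"

with open(report_stats) as f:
    stats = json.load(f)

# Copy contents[scenario].stats into <scenario>-report/global_stats.json;
# the dash in the folder name is what the plugin keys on.
for scenario, data in stats["contents"].items():
    out_dir = os.path.join(build_dir, scenario + "-report")
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "global_stats.json"), "w") as out:
        json.dump(data["stats"], out)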
I'd like to use some configs for a library that's used both on Dataflow and in a normal environment.
Is there a way for the code to check whether it's running on Dataflow? I couldn't see an environment variable to check, for example.
Quasi-follow-up to Google Dataflow non-python dependencies - separate setup.py?
One option is to use PipelineOptions, which contains the pipeline runner information. As mentioned in the beam documentation: "When you run the pipeline on a runner of your choice, a copy of the PipelineOptions will be available to your code. For example, you can read PipelineOptions from a DoFn’s Context."
More about PipelineOptions: https://beam.apache.org/documentation/programming-guide/#configuring-pipeline-options
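For example, in the Python SDK you can read the configured runner from the options at pipeline-construction time; a minimal sketch (the substring check for "Dataflow" is an assumption, not an official API):

from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()  # normally parsed from sys.argv / your launch flags
runner = options.view_as(StandardOptions).runner  # e.g. "DataflowRunner", or None locally

# Heuristic check (assumption): any Dataflow* runner means the pipeline
# was configured to run on Dataflow.
running_on_dataflow = runner is not None and "Dataflow" in runner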
This is not a good answer, but it may be the best we can do at the moment:
import os
if 'harness' in os.environ.get('HOSTNAME', ''):  # Dataflow worker hostnames contain "harness"
    ...  # running on Dataflow
This is a broad question, so any answers are deeply appreciated. I need to continually log the size of several build files (in this case some CSS and JS files), preserve this log and ideally show it as a dashboard in Jenkins.
I know that I can set up a cron job and execute a bash script to grab the files and log their sizes, but I'm not sure where this file would live or how to display it. Ideally the result would be a dashboard plot or bar graph over time.
Thanks.
P.S. I'm open to other logging suggestions, but Jenkins seems like the appropriate system to do this in.
Update: this isn't perfect but it works. Google Spreadsheets has a simple API for posting data, so this can work as an endpoint for any script you want to write that logs your data.
It's not a Jenkins solution, but gets the job done.
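As a rough illustration of that approach, here is a Python sketch that collects file sizes and POSTs them to a web endpoint backed by a spreadsheet (the URL, file list, and payload shape are placeholders, not anything from the original setup):

import json
import os
import time
import urllib.request

ENDPOINT = "https://script.google.com/macros/s/REPLACE_ME/exec"  # hypothetical endpoint
FILES = ["dist/app.js", "dist/app.css"]  # build artifacts to track

payload = {
    "timestamp": int(time.time()),
    "sizes": {path: os.path.getsize(path) for path in FILES},
}
req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
urllib.request.urlopen(req)  # the endpoint appends a row per build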
In my search leading up to this, I did come across JMeter and the Performance Plugin for Jenkins, which were contenders for a possible solution.
First of all, my apologies that the question is on Stack Overflow rather than the appropriate Stack Exchange site; I don't have enough points to ask it there.
I've created a Packer template which creates my image (the image includes the code for my application, nginx, php-fpm and ...).
If you have used Packer before, you will know that at the end of the process it gives you the image_id. I need to use this image id to update the template for my CloudFormation stack on AWS.
The CloudFormation template will create a launch configuration based on the image_id from Packer. Later on, the launch configuration will be used to create an auto scaling group, which is connected to an ELB (the ELB is not managed by CloudFormation).
Here are my questions:
1. What's the best way to automate the process of getting the id from Packer and updating the CloudFormation template? (To elaborate, I need to get the id somehow; for now the only thing I can think of is a bash command, but this causes an issue if I want to use Jenkins later on. What are the alternatives?)
2. Let's say I managed to get the id; what's the best way to update the CloudFormation template? (Currently the AWS CLI is my only option; is there a better solution?)
3. How do I automate this whole process using Jenkins?
I would write a wrapper Python/Ruby script that runs Packer, then calls CloudFormation, reading the image id from the Packer output.
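A minimal sketch of that wrapper in Python (the template name, stack name, parameter key, and the regex over Packer's machine-readable output are assumptions):

import re
import subprocess
import boto3

# Run Packer and capture its machine-readable output.
result = subprocess.run(
    ["packer", "build", "-machine-readable", "template.json"],
    check=True, capture_output=True, text=True,
)

# Artifact lines look roughly like: ...,artifact,0,id,us-east-1:ami-0abc123
match = re.search(r"artifact,0,id,[\w-]+:(ami-\w+)", result.stdout)
ami_id = match.group(1)

# Feed the new AMI id into the CloudFormation stack as a parameter.
cloudformation = boto3.client("cloudformation")
cloudformation.update_stack(
    StackName="my-stack",  # hypothetical stack name
    UsePreviousTemplate=True,
    Parameters=[{"ParameterKey": "ImageId", "ParameterValue": ami_id}],
)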