TLDR: I would like to run a pipeline with a different query every month using the Dataflow API and templates. If that is not possible, can I pass the query in at runtime while still using the Dataflow API and templates?
I have a dataflow 'batch' data pipeline which reads a BigQuery table like below
def run(argv=None):
parser = argparse.ArgumentParser()
help='project id')
help='bigquery dataset to read data from')
args, pipeline_args = parser.parse_known_args(argv)
project_id = args.pro_id
dataset_id = args.dataset
pipeline_options = PipelineOptions(pipeline_args)
pipeline_options.view_as(SetupOptions).save_main_session = True
with beam.Pipeline(argv=pipeline_args) as p:
companies = (
| "Read from BigQuery" >>, dataset_id),
And the query parameter for BigQuerySource is computed by a function like this
from datetime import datetime
def query_bq(project, dataset):
month ="%Y_%m_%d")
query = (
f'SELECT * FROM `{project}.{dataset}.data_{month}_json` '
f'LIMIT 10'
return query
Couple of things to note here
I want to run this data pipeline once a day
The table id changes from month to month. So for example, the table id for this month would be data_2020_06_01_json and for next month the table id would be data_2020_07_01_json and all this is calculated by def query_bq(project, dataset) above
I would like to automate the running of this batch pipeline with the Dataflow API, using a Cloud Function, a Pub/Sub event, and Cloud Scheduler.
Here is the Cloud Function that gets triggered when Cloud Scheduler publishes an event to Pub/Sub every day
def run_dataflow(event, context):
if 'data' in event:
pubsub_message = base64.b64decode(event['data']).decode('utf-8')
pubsub_message_dict = ast.literal_eval(pubsub_message)
event = pubsub_message_dict.get("eventName")
now ="%Y-%m-%d-%H-%M-%S")
project = 'xxx-xxx-xxx'
region = 'europe-west2'
dataflow = build('dataflow', 'v1b3', cache_discovery=False)
if event == "run_dataflow":
job = f'dataflow-{now}'
template = 'gs://xxxxx/templates/xxxxx'
request = dataflow.projects().locations().templates().launch(
'jobName': job,
response = request.execute()
Here is the command I use to launch this data pipeline on dataflow
python \
--setup_file ./ \
--project xxx-xx-xxxx \
--pro_id xxx-xx-xxxx \
--dataset 'xx-xxx-xxx' \
--machine_type=n1-standard-4 \
--max_num_workers=5 \
--num_workers=1 \
--region europe-west2 \
--serviceAccount= xxx-xxx-xxx \
--runner DataflowRunner \
--staging_location gs://xx/xx \
--temp_location gs://xx/temp \
--subnetwork="xxxxxxxxxx" \
--template_location gs://xxxxx/templates/xxxxx
The problem I'm facing:
My query_bq function is called when the Dataflow template is compiled and created, before the template is uploaded to GCS. It is not called at runtime. So whenever my Cloud Function launches the Dataflow job, the job always reads from the data_2020_06_01_json table, and the table in the query stays the same even as we move into July, August and so on. What I really want is for the query to change dynamically based on the query_bq function, so that in future I can read from data_2020_07_01_json, data_2020_08_01_json and so on.
I have also looked into the template file generated and it looks like the query is hard-coded into the template after compilation. Here's a snippet
"name": "beamapp-xxxxx-0629014535-344920",
"steps": [
"kind": "ParallelRead",
"name": "s1",
"properties": {
"bigquery_export_format": "FORMAT_AVRO",
"bigquery_flatten_results": true,
"bigquery_query": "SELECT * FROM `xxxx.xxxx.data_2020_06_01_json` LIMIT 10",
"bigquery_use_legacy_sql": false,
"display_data": [
"key": "source",
"label": "Read Source",
"namespace": "apache_beam.runners.dataflow.ptransform_overrides.Read",
"shortValue": "BigQuerySource",
"type": "STRING",
"value": ""
"key": "query",
"label": "Query",
"namespace": "",
"type": "STRING",
"value": "SELECT * FROM `xxxx.xxxx.data_2020_06_01_json` LIMIT 10"
"key": "validation",
"label": "Validation Enabled",
"namespace": "",
"type": "BOOLEAN",
"value": false
"format": "bigquery",
"output_info": [
An alternative I've tried
I also tried the ValueProvider as defined here
and I added this to my code
class UserOptions(PipelineOptions):
def _add_argparse_args(cls, parser):
parser.add_value_provider_argument('--query_bq', type=str)
user_options = pipeline_options.view_as(UserOptions)
p | "Read from BigQuery" >>,
And when I run this I get this error
WARNING:apache_beam.utils.retry:Retry with exponential backoff: waiting for 3.9023594566785924 seconds before retrying get_query_location because we caught exception: apitools.base.protorpclite.messages.ValidationError: Expected type <class 'str'> for field query, found SELECT * FROM `xxxx.xxxx.data_2020_06_01_json` LIMIT 10 (type <class 'apache_beam.options.value_provider.StaticValueProvider'>)
So I'm guessing BigQuerySource does not accept ValueProviders
You cannot use ValueProviders in BigQuerySource, but in more recent versions of Beam you can use beam.io.ReadFromBigQuery, which supports them well.
You would do:
result = (p
You can pass value providers, and it has a lot of other utilities. Check out its documentation
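Once the query is a runtime parameter, the Cloud Function can compute the month-stamped query at launch time and hand it to the template. The following is a minimal sketch, assuming the template declares a runtime parameter named `query_bq` (the `--query_bq` value provider argument from the question) and that the table suffix is the first day of the current month. `build_launch_body` is a hypothetical helper, and the project and dataset names are placeholders.

```python
from datetime import date, datetime


def query_bq(project, dataset, today=None):
    """Build the query for the table stamped with the first day of the month."""
    today = today or date.today()
    month = today.replace(day=1).strftime("%Y_%m_%d")
    return f"SELECT * FROM `{project}.{dataset}.data_{month}_json` LIMIT 10"


def build_launch_body(project, dataset, now=None):
    """Return a request body for a templates().launch() call.

    The query is computed here, at launch time, so each run targets the
    table for the current month rather than the one baked into the template.
    """
    now = now or datetime.now()
    return {
        "jobName": f"dataflow-{now:%Y-%m-%d-%H-%M-%S}",
        # Runtime parameters are passed as strings keyed by option name.
        "parameters": {"query_bq": query_bq(project, dataset, now.date())},
    }


# Placeholder project/dataset and a fixed timestamp, for illustration only.
body = build_launch_body("my-project", "my_dataset", datetime(2020, 7, 15, 9, 30, 0))
print(body["parameters"]["query_bq"])
```

In the Cloud Function from the question, a dictionary like this would be passed as the `body` argument of `launch(...)`, next to the project, location and template path.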
I have a problem with a REST API endpoint in FastAPI that accepts a list of strings via a single query parameter. An example of this endpoint's usage is:
Here, the parameter named 'response' accepts a list of strings as documented in FastAPI tutorial, section on Query Parameters and String Validation. The endpoint works as expected in the browser.
However, it does not work in Swagger docs. The button labeled 'Add string item' shakes upon clicking 'Execute' to test the endpoint. Swagger UI seems unable to create the expected URL with the embedded query parameters (as shown in Fig 1.).
The code for the endpoint is as follows. I have tried with and without validation.
async def getQuestion_byID(item_ID: int = Path(
title = "Numeric ID of the question",
description = "Specify a number between 1 and 999",
ge = 1,
le = 999
), response: Optional[List[str]] = Query(
title="Furnish an answer",
description="Answer can only have letters of the alphabet and is case-insensitive",
), short: bool = Query(
title="Set flag for short result",
description="Acceptable values are 1, True, true, on, yes"
Returns the quiz question or the result.
Accepts item ID as path parameter and
optionally response as query parameter.
Returns result when the response is passed with the item ID.
Otherwise, returns the quiz question.
item = question_bank.get(item_ID, None)
if not item:
return {"question": None}
if response:
return evaluate_response(item_ID, response, short)
return {"question": item["question"]}
Grateful for any help.
As described here, this happens because OpenAPI applies the pattern (as well as the minLength and maxLength constraints) to the schema of the array itself, not just to the individual items in the array. If you check the generated OpenAPI schema (served at /openapi.json by default), you will see that the schema for the response parameter appears as shown below (i.e., validations are being applied to the array itself as well):
{
  "description": "Answer can only have letters of the alphabet and is case-insensitive",
  "required": false,
  "schema": {
    "title": "Furnish an answer",
    "maxLength": 99,
    "minLength": 3,
    "pattern": "^[a-zA-Z]+$",
    "type": "array",
    "items": {
      "maxLength": 99,
      "minLength": 3,
      "pattern": "^[a-zA-Z]+$",
      "type": "string"
    },
    "description": "Answer can only have letters of the alphabet and is case-insensitive",
    "default": []
  },
  "name": "response",
  "in": "query"
}
Solution 1
As mentioned here, you could use a Pydantic constr instead to apply that constraint to the individual items:
my_constr = constr(regex="^[a-zA-Z]+$", min_length=3, max_length=99)
response: Optional[List[my_constr]] = Query([], title="Furnish an...", description="Answer can...")
Solution 2
Keep your response parameter as is. Copy the generated OpenAPI schema, remove the pattern (as well as the minLength and maxLength attributes) from the response parameter's (array) schema, and save the OpenAPI schema to a new file (e.g., my_openapi.json). It should look like this:
{
  "description": "Answer can only have letters of the alphabet and is case-insensitive",
  "required": false,
  "schema": {
    "title": "Furnish an answer",
    "type": "array",
    "items": {
      "maxLength": 99,
      "minLength": 3,
      "pattern": "^[a-zA-Z]+$",
      "type": "string"
    },
    "description": "Answer can only have letters of the alphabet and is case-insensitive",
    "default": []
  },
  "name": "response",
  "in": "query"
}
Then, in your app, instruct FastAPI to use that schema instead:
import json
app.openapi_schema = json.load(open("my_openapi.json"))
Solution 3
Since the above solution would require you to copy and edit the schema every time you make a change or add new endpoints/parameters, you may prefer to modify the OpenAPI schema programmatically, as described here. This saves you from copying and editing the schema file. Make sure to add the code below at the end of your code (after defining all the routes).
from fastapi.openapi.utils import get_openapi
def custom_openapi():
if app.openapi_schema:
return app.openapi_schema
openapi_schema = get_openapi(
description="This is a very custom OpenAPI schema",
del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["maxLength"]
del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["minLength"]
del openapi_schema["paths"]["/items/{item_ID}"]["get"]["parameters"][1]["schema"]["pattern"]
app.openapi_schema = openapi_schema
return app.openapi_schema
app.openapi = custom_openapi
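Deleting by position (`["parameters"][1]`) stops working if the parameter order changes or if other endpoints have the same problem. A possible variation, sketched below, walks the whole schema and strips the string constraints from every array-typed parameter. It assumes the schema layout shown earlier in this answer; `strip_array_constraints` and the sample dictionary are hypothetical, and the function would be applied to the result of `get_openapi(...)` inside `custom_openapi`.

```python
ARRAY_LEVEL_KEYS = ("maxLength", "minLength", "pattern")


def strip_array_constraints(openapi_schema):
    """Remove string constraints that were copied onto array schemas.

    Walks every operation's parameters and, for parameters whose schema
    is of type "array", deletes the keys that only apply to the items.
    """
    for path_item in openapi_schema.get("paths", {}).values():
        for operation in path_item.values():
            if not isinstance(operation, dict):
                continue  # skip non-operation entries such as "summary"
            for param in operation.get("parameters", []):
                schema = param.get("schema", {})
                if schema.get("type") == "array":
                    for key in ARRAY_LEVEL_KEYS:
                        schema.pop(key, None)
    return openapi_schema


# Hypothetical, trimmed-down schema mirroring the one in the question.
item_rules = {"type": "string", "maxLength": 99, "minLength": 3, "pattern": "^[a-zA-Z]+$"}
sample = {
    "paths": {
        "/items/{item_ID}": {
            "get": {
                "parameters": [
                    {"name": "item_ID", "in": "path", "schema": {"type": "integer"}},
                    {
                        "name": "response",
                        "in": "query",
                        "schema": {
                            "type": "array",
                            "maxLength": 99,
                            "minLength": 3,
                            "pattern": "^[a-zA-Z]+$",
                            "items": dict(item_rules),
                        },
                    },
                ]
            }
        }
    }
}
cleaned = strip_array_constraints(sample)
```

The constraints on `items` are left untouched, so validation of the individual strings is still documented.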
In all the above solutions, the constraints annotation that would normally be shown in OpenAPI under response (i.e., (query) maxLength: 99 minLength: 3 pattern: ^[a-zA-Z]+$) won't appear, since Swagger creates that annotation from the constraints applied to the array, not the items, and there doesn't seem to be a way to preserve it. In Solutions 2 and 3, however, you could modify the "in" attribute, shown in the JSON code snippet above, to manually add the annotation. But, as HTML elements, etc., are controlled by Swagger, the whole annotation would appear inside parentheses and without line breaks between the constraints. Nevertheless, you could still inform users about the constraints applied to items by specifying them in the description of your Query parameter.
I am trying to create a service endpoint (aka service connection) in Azure DevOps. I first attempted to use the DevOps CLI but this method hangs. Using az devops as shown below.
az devops service-endpoint azurerm create --name "Azure subscription 1 endpoint" --azure-rm-service-principal-id $serviceprincipleid --azure-rm-subscription-id $subscriptionid --azure-rm-tenant-id $tenantid --azure-rm-subscription-name $serviceprinciplename --organization $organization --project $project
It hangs until I restart PowerShell.
I suspect the logged in account doesn't have access?? IDK. And there's no way to specify a personal access token which is what I need anyway.
I then turned my attention towards calling the DevOps REST method using a Personal Access Token (PAT) to authenticate. I'm using the documentation from this sample
Here is the basic code in PowerShell
$body = '{
    "data": {
        "subscriptionId": "1272a66f-e2e8-4e88-ab43-487409186c3f",
        "subscriptionName": "subscriptionName",
        "environment": "AzureCloud",
        "scopeLevel": "Subscription",
        "creationMode": "Manual"
    },
    "name": "MyNewARMServiceEndpoint",
    "type": "AzureRM",
    "url": "",
    "authorization": {
        "parameters": {
            "tenantid": "1272a66f-e2e8-4e88-ab43-487409186c3f",
            "serviceprincipalid": "1272a66f-e2e8-4e88-ab43-487409186c3f",
            "authenticationType": "spnKey",
            "serviceprincipalkey": "SomePassword"
        },
        "scheme": "ServicePrincipal"
    },
    "isShared": false,
    "isReady": true,
    "serviceEndpointProjectReferences": [
        {
            "projectReference": {
                "id": "c7e5f0b3-71fa-4429-9fb3-3321963a7c06",
                "name": "TestProject"
            },
            "name": "MyNewARMServiceEndpoint"
        }
    ]
}' | convertto-json | convertfrom-json
$bo = $body | convertfrom-json
$bo.data.subscriptionId = $subscriptionid
$bo.data.subscriptionName = "subscription name"
$bo.name = $serviceprinciplename
$bo.authorization.parameters.tenantid = $tenantid
$bo.authorization.parameters.serviceprincipalid = $serviceprincipalid
$bo.authorization.parameters.serviceprincipalkey = $serviceprincipalkey
$bo.serviceEndpointProjectReferences = @{}
$readybody = $bo | convertto-json -Depth 100
function createazurermserviceendpoint($body, $pat, $org, $project)
$requestpath = "/_apis/serviceendpoint/endpoints?api-version=6.0-preview.4"
$token = [System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes(":$pat"))
$uribase = "" + $org
$uri = $uribase+$requestpath
$authheader = "Authorization=Basic " + $token
$result = az rest --method post --uri $uri --headers "Content-Type=application/json" $authheader --body $body | convertfrom-json
return $result
$result = createazurermserviceendpoint $readybody $pat $org $project
The method throws a Bad Request exception as shown below
az : Bad Request({"$id":"1","innerException":null,"message":"TF400898: An Internal Error Occurred. Activity Id:
So I went into the UI with Fiddler and captured both an automatic and a manual "create service endpoint" request, believing the contract would be the same. I'm not certain it is. The resulting JSON bodies from the API are shown below. When I pass either of them through the script, I get exactly the same error as above. None of the JSON bodies resembles the others; I started with the sample JSON structure from the article mentioned above. Now I'm not certain what the issue is at all.
#hack a version from fiddler to try it
#fiddler body capture from automated service connection
$readybody = '{"authorization":{"parameters":{"tenantid":"xxxxxxxx-34e9-4306-ac1a-5f28c1d08fb1","serviceprincipalid":"","serviceprincipalkey":"","authenticationType":"spnKey"},"scheme":"ServicePrincipal"},"createdBy":{},"data":{"environment":"AzureCloud","scopeLevel":"Subscription","subscriptionId":"yyyyyyyy-75c4-4dfd-bdd5-c8c42d1a5dd0","subscriptionName":"Azure subscription 1.1","creationMode":"Automatic","appObjectId":"","azureSpnPermissions":"","azureSpnRoleAssignmentId":"","spnObjectId":""},"isShared":false,"name":"Azure sub 1.1 test","owner":"library","type":"azurerm","url":"","administratorsGroup":null,"description":"","groupScopeId":null,"operationStatus":null,"readersGroup":null,"serviceEndpointProjectReferences":[{"description":"","name":"Azure sub 1 test","projectReference":{"id":"zzzzzzzz-fad9-427f-ad6c-21f4ae2d311f","name":"Connected2someone"}}]}'
$result = createazurermserviceendpoint $readybody $pat $org $project
Fails the same way
#fiddler body capture from manual service connection
$readybody = '{"dataSourceDetails":{"dataSourceName":"TestConnection","dataSourceUrl":"","headers":null,"resourceUrl":"","requestContent":null,"requestVerb":null,"parameters":null,"resultSelector":"","initialContextTemplate":""},"resultTransformationDetails":{"callbackContextTemplate":"","callbackRequiredTemplate":"","resultTemplate":""},"serviceEndpointDetails":{"administratorsGroup":null,"authorization":{"scheme":"ServicePrincipal","parameters":{"serviceprincipalid":"xxxxxxxx-65b2-470d-adc7-c811fc993014","authenticationType":"spnKey","serviceprincipalkey":"{a key}","tenantid":"yyyyyyy-34e9-4306-ac1a-5f28c1d08fb1"}},"createdBy":null,"data":{"environment":"AzureCloud","scopeLevel":"Subscription","subscriptionId":"zzzzzzzz-75c4-4dfd-bdd5-c8c42d1a5dd3","subscriptionName":"azure test 2 ","creationMode":"Manual"},"description":"","groupScopeId":null,"name":"azure test 2 connection","operationStatus":null,"readersGroup":null,"serviceEndpointProjectReferences":null,"type":"azurerm","url":"","isShared":false,"owner":"library"}}'
$result = createazurermserviceendpoint $readybody $pat $org $project
Fails the same way.
Can someone confirm that the REST API works? Which version of the API should be specified, and does the JSON body look like what I posted?
I did a test with your PowerShell script and got the same error you did.
Then I switched to another PowerShell script with the same body, and it worked.
Here is my script:
$pat = "{PAT}"
$pat = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes(":$($pat)"))
$body = @'
$head = @{ Authorization = "Basic $pat" }
Invoke-RestMethod -Uri $url -Method Post -Headers $head -Body $body -ContentType application/json
So the cause of the error may be your PowerShell script (probably az rest) and not the REST API request body. You can try out the PowerShell script I've provided.
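The part of the script that is easiest to get wrong is the Authorization header: the PAT is sent as HTTP Basic credentials with an empty user name, so the string that gets Base64-encoded is `:<PAT>`. A small sketch of just that step follows, written in Python for illustration; `pat_auth_header` is a hypothetical helper and the token is a placeholder.

```python
import base64
import json


def pat_auth_header(pat):
    """Return HTTP headers for Basic auth with an empty user name and a PAT."""
    token = base64.b64encode(f":{pat}".encode("ascii")).decode("ascii")
    return {
        "Authorization": f"Basic {token}",
        "Content-Type": "application/json",
    }


headers = pat_auth_header("example-pat")  # placeholder, not a real token

# The body is sent as one JSON string; building it from a dictionary avoids
# the quoting problems that can occur when JSON is passed on a command line.
payload = json.dumps({"name": "MyNewARMServiceEndpoint", "type": "AzureRM"})
```

Passing the body as a single, already serialized string mirrors what `Invoke-RestMethod -Body $body` does in the script above.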
By the way:
You can sign in to the Azure DevOps CLI with a PAT. Please see this document for detailed information.
I am using InfluxDB and I want to enforce some kind of schema validation.
I had a problem where InfluxDB learned a field with the wrong type due to a developer mistake. As a result, once we sent the right type, InfluxDB wouldn't persist it because it had already registered the field with another type.
Can I force field types such as String, Integer and Double?
I use Java
We have waited a long time, but this feature is finally available in newer releases.
Starting from InfluxDB v2.4, you can create a bucket (the new name for a database in InfluxDB v2.x) with an explicit schema. That is:
Create a bucket with an explicit schema (see more details here)
influx bucket create \
--name my_schema_bucket \
--schema-type explicit
Adding measurement schemas to your bucket (see more details here)
influx bucket-schema create \
--bucket my_schema_bucket \
--name temperature \
--columns-file columns.csv
where the columns file (columns.csv in the command above) is similar to DDL; the entries below are shown as one JSON object per line:
{"name": "time", "type": "timestamp"}
{"name": "alert", "type": "field", "dataType": "string"}
{"name": "cdi", "type": "field", "dataType": "float"}
You could refer to this blog as well.
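If the field types are already known in code, the column definitions can be generated rather than typed by hand. This is a minimal sketch, assuming the one-object-per-line layout shown above is what the columns file should contain; `columns_ndjson` is a hypothetical helper, and the field names are taken from the example.

```python
import json


def columns_ndjson(fields):
    """Render column definitions as one JSON object per line.

    `fields` maps a field name to its data type, e.g. {"cdi": "float"}.
    A timestamp column named "time" is always emitted first.
    """
    rows = [{"name": "time", "type": "timestamp"}]
    rows += [
        {"name": name, "type": "field", "dataType": data_type}
        for name, data_type in fields.items()
    ]
    return "\n".join(json.dumps(row) for row in rows)


definition = columns_ndjson({"alert": "string", "cdi": "float"})
print(definition)
```

Keeping the name-to-type mapping in one place also gives the application a single source of truth to check values against before writing.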
I know this is probably nothing new, but I'm stuck on an issue that is probably my own fault, and I can't work out the solution.
I'm using a standalone installation of the Confluent platform (4.0.0 open source version) in order to demonstrate how to adopt the platform for a specific use case.
Trying to demonstrate the value of using the schema registry I'm facing the following issue posting a new schema with Postman.
The request is:
, method POST
, Header: Accept:application/vnd.schemaregistry.v1+json, application/vnd.schemaregistry+json, application/json
, Body:
{"schema":"{{\"namespace\":\"com.testlab\",\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"resourcepath\",\"type\":\"string\"},{\"name\":\"resource\",\"type\":\"string\"}]}}" }
The response is: {"error_code":42201,"message":"Input schema is an invalid Avro schema"}
After reading the docs and googling a lot, I'm out of options.
Any suggestions?
Thanks for your time
You have extra {} around the schema field.
One way to test this is with jq
$ echo '{"schema":"{{\"namespace\":\"com.testlab\",\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"resourcepath\",\"type\":\"string\"},{\"name\":\"resource\",\"type\":\"string\"}]}}" }' | jq '.schema|fromjson'
jq: error (at <stdin>:1): Objects must consist of key:value pairs at line 1, column 146 (while parsing '{{"namespace":"com.testlab","name":"test","type":"record","fields":[{"name":"resourcepath","type":"string"},{"name":"resource","type":"string"}]}}')
$ echo '{"schema":"{\"namespace\":\"com.testlab\",\"name\":\"test\",\"type\":\"record\",\"fields\":[{\"name\":\"resourcepath\",\"type\":\"string\"},{\"name\":\"resource\",\"type\":\"string\"}]}" }' | jq '.schema|fromjson'
{
  "namespace": "com.testlab",
  "name": "test",
  "type": "record",
  "fields": [
    {
      "name": "resourcepath",
      "type": "string"
    },
    {
      "name": "resource",
      "type": "string"
    }
  ]
}
See my comment here about importing AVSC files so that you don't need to type out the JSON on the CLI
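Hand-escaping the schema is where the stray braces crept in. One way to avoid that, sketched below, is to build the schema as a normal data structure and serialize it twice: once to get the schema string and once for the surrounding request object. The variable names are illustrative only.

```python
import json

avro_schema = {
    "namespace": "com.testlab",
    "name": "test",
    "type": "record",
    "fields": [
        {"name": "resourcepath", "type": "string"},
        {"name": "resource", "type": "string"},
    ],
}

# The registry expects the schema as a JSON *string* inside a JSON object,
# so serialize twice: first the schema, then the envelope around it.
request_body = json.dumps({"schema": json.dumps(avro_schema)})

# Round-trip check, equivalent to `jq '.schema|fromjson'` above.
decoded = json.loads(json.loads(request_body)["schema"])
print(decoded == avro_schema)
```

The resulting `request_body` string can be pasted into Postman as the raw body.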
I'm trying to pull a list of users from our Atlassian Confluence/Jira instance. However I'm struggling to find good documentation on what REST services are available, and it seems the SOAP services are deprecated.
The following code does get results, but we have over 100 users, and this returns 0.
if(-not ($credentials)) { #put this here so I can rerun the same script in the same IDE session without having to reinput credentials each time
$credentials = get-credential 'myAtlassianUsername'
$tenant = 'myCompany'
invoke-restmethod -Method Get -Uri ('https://{0}' -f $tenant) -Credential $credentials | ConvertTo-Json -Depth 5
(The ConvertTo-Json is just to make it simpler to see the expanded result set).
"users": {
"users": [
"total": 0,
"header": "Showing 0 of 0 matching users"
"groups": {
"header": "Showing 2 of 2 matching groups",
"total": 2,
"groups": [
"name": "confluence-users",
"html": "confluence-\u003cb\u003eusers\u003c/b\u003e",
"labels": [
"name": "jira-users",
"html": "jira-\u003cb\u003eusers\u003c/b\u003e",
"labels": [
I think the result's trying to give me the URLs for the JIRA and Confluence User APIs; but I can't figure out how those relative URLs map to the root URL (I've tried appending at various positions in the URL, all of which give me a 404 or dead link error).
The query parameter in your following call is a search query on the Name or E-mail address.
You can use the maxResults parameter to get more than 50 results.
Sadly, this REST API call will not give you all users in one call.
The only way I know of to get all users from Jira is to make one call per starting letter (iterating over each letter):
GET .../rest/api/2/user/search?username=a&maxResults=1000
GET .../rest/api/2/user/search?username=b&maxResults=1000
GET .../rest/api/2/user/search?username=c&maxResults=1000
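The per-letter iteration can be sketched as follows. This only builds the URLs and merges the result pages; the HTTP calls themselves are left out. It assumes each returned user object has a `name` key, and `https://jira.example.com` is a placeholder base URL. Both helper names are hypothetical.

```python
import string
from urllib.parse import urlencode


def user_search_urls(base_url, max_results=1000):
    """Yield one user-search URL per starting letter a..z."""
    for letter in string.ascii_lowercase:
        query = urlencode({"username": letter, "maxResults": max_results})
        yield f"{base_url}/rest/api/2/user/search?{query}"


def merge_users(pages):
    """Combine result pages, dropping users returned more than once."""
    seen = {}
    for page in pages:
        for user in page:
            # A user can match several letters; keep the first occurrence.
            seen.setdefault(user["name"], user)
    return list(seen.values())


urls = list(user_search_urls("https://jira.example.com"))
users = merge_users([
    [{"name": "alice"}, {"name": "bob"}],
    [{"name": "bob"}, {"name": "carol"}],
])
```

De-duplication matters because the search may match on more than the start of the user name, so the same user can come back for several letters. User names that start with a digit or symbol would need extra queries.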
Sample Code
function Get-AtlassianCloudUsers {
param (
[Parameter(Mandatory=$false)][string]$UserFilter = '%'
[Parameter(Mandatory=$false)][int]$MaxResults = 9999
process {
#refer to for additional notes
[string]$uri = 'https://{0}{1}&maxResults={2}' -f $Tenant, $UserFilter, $MaxResults
Invoke-RestMethod -Method Get -Uri $Uri -Credential $credential | select -Expand syncRoot | Select-Object name, displayName, active, self
#| ConvertTo-Json -Depth 5
Get-AtlassianCloudUsers -Tenant 'MyCompany' -credential (Get-Credential 'MyUsername') | ft -AutoSize
As an alternate answer, I recently discovered the PSJira project on GitHub:
This library provides a nice set of wrapper functions around the JIRA services, and seems well documented and maintained.
To achieve the above requirement follow the steps below:
Installing the package:
Documented here:
Upgrade to PS5 (optional; but required for this method of installation):
Install NuGet Package Manager: Install-PackageProvider -Name NuGet -MinimumVersion -Force
Install PSJira Module: Install-Module PSJira
Configuring PSJira:
Set-JiraConfigServer -Server "https://$" (assigning $Tenant to your instance's name)
Create a credential for your Jira/Atlassian account: $cred = get-credential $JiraUsername
Get list of users: Get-JiraUser -UserName '%' -IncludeInactive -Credential $cred | select Name, DisplayName, Active, EmailAddress