I've created a custom seqio task and added it to the TaskRegistry following the instructions in the documentation. When I set the gin parameters to point at the new task, I get an error saying my task does not exist:
No Task or Mixture found with name [my task name]. Available:
Am I using the correct Mixture/Task module that needs to be imported? If not, what is the correct statement that would allow me to use my custom task?
--gin.MIXTURE_OR_TASK_MODULE=\"t5.data.tasks\"
Here is the full eval script I am using.
python3 t5x/eval.py \
--gin_file=t5x/examples/t5/t5_1_0/11B.gin \
--gin_file=t5x/configs/runs/eval.gin \
--gin.MIXTURE_OR_TASK_NAME=\"task_name\" \
--gin.MIXTURE_OR_TASK_MODULE=\"t5.data.tasks\" \
--gin.partitioning.PjitPartitioner.num_partitions=8 \
--gin.utils.DatasetConfig.split=\"test\" \
--gin.DROPOUT_RATE=0.0 \
--gin.CHECKPOINT_PATH=\"${CHECKPOINT_PATH}\" \
--gin.EVAL_OUTPUT_DIR=\"${EVAL_OUTPUT_DIR}\"
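For reference, the custom task is registered in my own tasks module roughly like the sketch below (the module path, data path, and field names here are placeholders, not the real ones):

# my_project/tasks.py -- placeholder module name; roughly how the custom task
# is added to the seqio TaskRegistry.
import functools

import seqio
import t5.data
from t5.evaluation import metrics

vocabulary = t5.data.get_default_vocabulary()

seqio.TaskRegistry.add(
    "task_name",  # must match --gin.MIXTURE_OR_TASK_NAME
    source=seqio.TextLineDataSource(
        split_to_filepattern={"test": "gs://my-bucket/data/test.tsv"},  # placeholder path
    ),
    preprocessors=[
        functools.partial(
            t5.data.preprocessors.parse_tsv, field_names=["inputs", "targets"]),
        seqio.preprocessors.tokenize,
        seqio.preprocessors.append_eos_after_trim,
    ],
    output_features={
        "inputs": seqio.Feature(vocabulary=vocabulary),
        "targets": seqio.Feature(vocabulary=vocabulary),
    },
    metric_fns=[metrics.accuracy],
)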
I have created a Cloud Run service. My Eventarc trigger is not firing on the cross-project job that reads the data. How do I specify the event filter for the resource name in Eventarc so that a BigQuery insert job / job-completed event on the BQ table triggers the service?
gcloud eventarc triggers create ${SERVICE}-test1 \
--location=${REGION} --service-account ${SVC_ACCOUNT} \
--destination-run-service ${SERVICE} \
--destination-run-region=${REGION} \
--event-filters type=google.cloud.audit.log.v1.written \
--event-filters methodName=google.cloud.bigquery.v2.JobService.InsertJob \
--event-filters serviceName=bigquery.googleapis.com \
--event-filters-path-pattern resourceName="/projects/destinationproject/locations/us-central1/jobs/*"
I have tried multiple options for the resource name, such as:
"projects/projectname/datasets/outputdataset/tables/outputtable"
So I want to run this Python script to end all Flow executions that are currently active:
executions = client.studio \
    .v1 \
    .flows('FWXXXXXX') \
    .executions \
    .list(limit=20)

for record in executions:
    if record.status == 'active':
        execution = client.studio \
            .flows('FWXXXXXX') \
            .executions(record.sid) \
            .update(status='ended')
        print('ending', execution)
I get this error: "'ExecutionContext' object has no attribute 'update'"
I want to end a Twilio Studio Flow execution, but the documentation on the Twilio website does not work for me in this case.
Twilio developer evangelist here.
Since you have already collected the executions when you list them from the API, you should be able to use that object to update the status. Try this:
executions = client.studio \
    .v1 \
    .flows('FWXXXXXX') \
    .executions \
    .list(limit=20)

for record in executions:
    if record.status == 'active':
        record.update(status='ended')
        print('ending', record)
I want to migrate an ant build to Bazel 4.2.1.
The ant build uses an Eclipse compiler (ecj-3.27.0).
The way to declare a Java compiler in Bazel is java_toolchain.
So I had a look at the output of bazel query @bazel_tools//tools/jdk:all and tried to use the vanilla toolchain as an inspiration (bazelisk query --output=build @bazel_tools//tools/jdk:toolchain_vanilla).
java_toolchain(
    # modified settings
    name = "my-ecj-toolchain",
    javabuilder = ["@ecj//:ecj.jar"],  # my Eclipse compiler jar
    target_version = "8",
    # prevent ecj.jar's error message 'unknown option --persistent_workers'
    javac_supports_workers = False,
    javac_supports_multiplex_workers = False,
    # keeping most vanilla-toolchain settings
    bootclasspath = ["@bazel_tools//tools/jdk:platformclasspath"],
    misc = [
        "-XDskipDuplicateBridges=true",
        "-XDcompilePolicy=simple",
        "-g",
        "-parameters",
    ],
    jvm_opts = [],
    tools = [
        "@bazel_tools//tools/jdk:javac_jar",
        "@bazel_tools//tools/jdk:java_compiler_jar",
        "@bazel_tools//tools/jdk:jdk_compiler_jar",
    ],
    singlejar = ["@bazel_tools//tools/jdk:singlejar"],
    forcibly_disable_header_compilation = True,
    genclass = ["@bazel_tools//tools/jdk:genclass"],
    ijar = ["@bazel_tools//tools/jdk:ijar"],
    header_compiler = ["@bazel_tools//tools/jdk:turbine_direct"],
    header_compiler_direct = ["@bazel_tools//tools/jdk:turbine_direct"],
    jacocorunner = "@bazel_tools//tools/jdk:JacocoCoverageFilegroup",
)
However, it still does not work: ecj.jar complains about the unknown option --output.
bazel aquery MYTARGET displays the whole compiler command line (plus some more build steps):
Command Line: (exec external/remotejdk11_linux/bin/java \
-jar \
external/ecj/ecj.jar \
--output \
bazel-out/k8-opt/bin/[...].jar \
--native_header_output \
bazel-out/k8-opt/bin/[...]-native-header.jar \
--output_manifest_proto \
bazel-out/k8-opt/bin/[...].jar_manifest_proto \
--compress_jar \
--output_deps_proto \
bazel-out/k8-opt/bin/[...].jdeps \
--bootclasspath \
bazel-out/k8-opt/bin/external/bazel_tools/tools/jdk/platformclasspath.jar \
--sources \
bazel-out/k8-opt/bin/[...].java \
--javacopts \
-target \
8 \
'-XDskipDuplicateBridges=true' \
'-XDcompilePolicy=simple' \
-g \
-parameters \
-g \
-- \
--target_label \
bazel-out/k8-opt/bin/[...]:[...] \
--strict_java_deps \
ERROR \
--direct_dependencies \
[...]
I don't know any Java compiler that accepts --output. Do I have to pass ecj.jar in a different way?
I don't know any Java compiler that accepts --output.
Indeed, Bazel does not invoke OpenJDK's javac executable directly but rather its own wrapper over javac's APIs called buildjar. There is a little about this in the Bazel documentation; buildjar enables some of Bazel's fancier Java features.
So, you would have to provide a buildjar-compatible wrapper for ECJ. The closest starting point for such an implementation is likely VanillaJavaBuilder, which wraps OpenJDK's javac without any of the invasive API usage in the normal buildjar implementation.
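To give a feel for what such a wrapper has to do, here is a rough, purely illustrative Python sketch that translates a few of the buildjar-style flags visible in the aquery output above into an ecj invocation. It is not a real JavaBuilder replacement: it ignores the manifest/jdeps protos, strict deps, workers, and most other options, and the ecj.jar path is a placeholder.

#!/usr/bin/env python3
# Illustrative only: map a handful of buildjar-style flags onto an ecj call.
import os
import subprocess
import sys
import tempfile
import zipfile


def parse(argv):
    opts = {"sources": [], "javacopts": []}
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--output":
            opts["output"] = argv[i + 1]
            i += 2
        elif arg == "--bootclasspath":
            opts["bootclasspath"] = argv[i + 1]
            i += 2
        elif arg == "--sources":
            i += 1
            while i < len(argv) and not argv[i].startswith("--"):
                opts["sources"].append(argv[i])
                i += 1
        elif arg == "--javacopts":
            i += 1
            while i < len(argv) and argv[i] != "--":
                opts["javacopts"].append(argv[i])
                i += 1
        else:
            i += 1  # everything else is ignored in this sketch
    return opts


def main():
    opts = parse(sys.argv[1:])
    classes = tempfile.mkdtemp()
    cmd = ["java", "-jar", "ecj.jar", "-d", classes]  # placeholder ecj.jar path
    if "bootclasspath" in opts:
        cmd += ["-bootclasspath", opts["bootclasspath"]]
    # ecj does not understand javac's internal -XD options, so drop them
    cmd += [o for o in opts["javacopts"] if not o.startswith("-XD")]
    cmd += opts["sources"]
    subprocess.check_call(cmd)
    # buildjar is expected to produce a jar at --output, so pack the classes
    with zipfile.ZipFile(opts["output"], "w") as jar:
        for root, _, files in os.walk(classes):
            for name in files:
                path = os.path.join(root, name)
                jar.write(path, os.path.relpath(path, classes))


if __name__ == "__main__":
    main()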
I am trying to deploy a Dataflow job using one of Google's predefined templates via the Python API.
I do not want my Dataflow compute instances to have public IPs, so I use something like this:
GCSPATH="gs://dataflow-templates/latest/Cloud_PubSub_to_GCS_Text"
BODY = {
    "jobName": "{jobname}".format(jobname=JOBNAME),
    "parameters": {
        "inputTopic": "projects/{project}/topics/{topic}".format(project=PROJECT, topic=TOPIC),
        "outputDirectory": "gs://{bucket}/pubsub-backup-v2/{topic}/".format(bucket=BUCKET, topic=TOPIC),
        "outputFilenamePrefix": "{topic}-".format(topic=TOPIC),
        "outputFilenameSuffix": ".txt"
    },
    "environment": {
        "machineType": "n1-standard-1",
        "usePublicIps": False,
        "subnetwork": SUBNETWORK,
    }
}
request = service.projects().templates().launch(projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
response = request.execute()
but I get this error:
raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://dataflow.googleapis.com/v1b3/projects/ABC/templates:launch?alt=json&gcsPath=gs%3A%2F%2Fdataflow-templates%2Flatest%2FCloud_PubSub_to_GCS_Text returned "Invalid JSON payload received. Unknown name "use_public_ips" at 'launch_parameters.environment': Cannot find field.">
If I remove usePublicIps, the request goes through, but my compute instances get deployed with public IPs.
The parameter usePublicIps cannot be overridden at runtime. You need to pass this parameter with the value false to the Dataflow template generation command:
mvn compile exec:java -Dexec.mainClass=class -Dexec.args="--project=$PROJECT \
--runner=DataflowRunner --stagingLocation=bucket --templateLocation=bucket \
--usePublicIps=false"
This adds an ipConfiguration entry to the template's JSON, indicating that the workers need only private IPs.
The links are screenshots of the template JSON with and without the ipConfiguration entry:
Template with usePublicIps=false
Template without usePublicIps=false
It seems you are using the JSON body from projects.locations.templates.create.
The environment block documented there needs to look like this:
"environment": {
"machineType": "n1-standard-1",
"ipConfiguration": "WORKER_IP_PRIVATE",
"subnetwork": SUBNETWORK // sample: regions/${REGION}/subnetworks/${SUBNET}
}
The value for ipConfiguration is an enum documented at Job.WorkerIPAddressConfiguration
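For example, the launch request from the question could be adjusted like this; the only change is replacing usePublicIps with ipConfiguration (same placeholder variables as in the question):

# Same launch body as in the question, but with ipConfiguration instead of
# usePublicIps; WORKER_IP_PRIVATE asks Dataflow not to assign public IPs.
BODY = {
    "jobName": "{jobname}".format(jobname=JOBNAME),
    "parameters": {
        "inputTopic": "projects/{project}/topics/{topic}".format(project=PROJECT, topic=TOPIC),
        "outputDirectory": "gs://{bucket}/pubsub-backup-v2/{topic}/".format(bucket=BUCKET, topic=TOPIC),
        "outputFilenamePrefix": "{topic}-".format(topic=TOPIC),
        "outputFilenameSuffix": ".txt"
    },
    "environment": {
        "machineType": "n1-standard-1",
        "ipConfiguration": "WORKER_IP_PRIVATE",
        "subnetwork": SUBNETWORK,  # e.g. regions/${REGION}/subnetworks/${SUBNET}
    }
}

request = service.projects().templates().launch(
    projectId=PROJECT, gcsPath=GCSPATH, body=BODY)
response = request.execute()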
Reading the docs for Specifying your Network and Subnetwork on Dataflow, I see that Python uses use_public_ips=false instead of the usePublicIps=false used by Java. Try changing that parameter; a minimal sketch follows after the quoted note below.
Also, keep in mind that:
When you turn off public IP addresses, the Cloud Dataflow pipeline can
access resources only in the following places:
another instance in the same VPC network
a Shared VPC network
a network with VPC Network Peering enabled
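As a minimal sketch, assuming the Beam Python SDK's worker options (where the flag pair is --use_public_ips / --no_use_public_ips), pipeline options for a job without public IPs might look like this; the project, bucket, region, and subnetwork values are placeholders:

# Hypothetical sketch: Beam Python pipeline options requesting workers
# without public IPs; all values below are placeholders.
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--temp_location=gs://my-bucket/temp",
    "--subnetwork=regions/us-central1/subnetworks/my-subnet",
    "--no_use_public_ips",  # Python counterpart of Java's --usePublicIps=false
])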
I found one way to make this work: clone the Google-provided Dataflow templates, then run the template with custom parameters:
mvn compile exec:java \
-Dexec.mainClass=com.google.cloud.teleport.templates.PubsubToText \
-Dexec.cleanupDaemonThreads=false \
-Dexec.args=" \
--project=${PROJECT_ID} \
--stagingLocation=gs://${BUCKET}/dataflow/pipelines/${PIPELINE_FOLDER}/staging \
--tempLocation=gs://${BUCKET}/dataflow/pipelines/${PIPELINE_FOLDER}/temp \
--runner=DataflowRunner \
--windowDuration=2m \
--numShards=1 \
--inputTopic=projects/${PROJECT_ID}/topics/$TOPIC \
--outputDirectory=gs://${BUCKET}/temp/ \
--outputFilenamePrefix=windowed-file \
--outputFilenameSuffix=.txt \
--workerMachineType=n1-standard-1 \
--subnetwork=${SUBNET} \
--usePublicIps=false"
Besides all the other methods mentioned so far, gcloud dataflow jobs run and gcloud dataflow flex-template run define the optional flag --disable-public-ips.
I am new to neo4j and I am trying to construct a bitcoin transaction graph with it. I am following behas/bitcoingraph to do so, and came across the neo4j-import command to create a database:
$NEO4J_HOME/bin/neo4j-import --into $NEO4J_HOME/data/graph.db \
--nodes:Block blocks_header.csv,blocks.csv \
--nodes:Transaction transactions_header.csv,transactions.csv \
--nodes:Output outputs_header.csv,outputs.csv \ .......
After executing the above command I encountered an error
Exception in thread "Thread-1" org.neo4j.unsafe.impl.batchimport.cache.idmapping.string.DuplicateInputIdException: Id '00000000f079868ed92cd4e7b7f50a5f8a2bb459ab957dd5402af7be7bd8ea6b' is defined more than once in Block, at least at /home/nikhil/Desktop/Thesis/bitcoingraph/blocks_0_1000/blocks.csv:409 and /home/nikhil/Desktop/Thesis/bitcoingraph/blocks_0_1000/blocks.csv:1410
Here is blocks_header.csv:
hash:ID(Block),height:int,timestamp:int
Does anyone know how to fix this? I read there is a solution using ID spaces but I am not quite sure how to use it. Thanks in advance for any help.
The --skip-duplicate-nodes flag will skip import of nodes with the same ID instead of aborting the import.
For example:
$NEO4J_HOME/bin/neo4j-import --into $NEO4J_HOME/data/graph.db \
--nodes:Block blocks_header.csv,blocks.csv --skip-duplicate-nodes \
--nodes:Transaction transactions_header.csv,transactions.csv \
--nodes:Output outputs_header.csv,outputs.csv \ .......