Google Cloud Dataflow jobs failing (IO errors) - google-cloud-dataflow

Some of our Dataflow jobs randomly crash while reading source data files.
The following error is written in the job logs (there is nothing in the worker logs):
11 Feb 2016 at 08:30:54
(33b59f945cff28ab): Workflow failed.
Causes: (fecf7537c059fece): S02:read-edn-file2/TextIO.Read+read-edn-file2
/ParDo(ff19274a)+ParDo(ff19274a)5+ParDo(ff19274a)6+RemoveDuplicates
/CreateIndex+RemoveDuplicates/Combine.PerKey
/GroupByKey+RemoveDuplicates/Combine.PerKey/Combine.GroupedValues
/Partial+RemoveDuplicates/Combine.PerKey/GroupByKey/Reify+RemoveDuplicates
/Combine.PerKey/GroupByKey/Write failed.
We also sometimes get this kind of error (logged in the worker logs):
2016-02-15T10:27:41.024Z: Basic: S18: (43c8777b75bc373e): Executing operation group-by2/GroupByKey/Read+group-by2/GroupByKey/GroupByWindow+ParDo(ff19274a)19+ParDo(ff19274a)20+ParDo(ff19274a)21+write-edn-file3/ParDo(ff19274a)+write-bq-table-from-clj3/ParDo(ff19274a)+write-bq-table-from-clj3/BigQueryIO.Write+write-edn-file3/TextIO.Write
2016-02-15T10:28:03.994Z: Error: (af73c53187b7243a): java.io.IOException: com.google.api.client.googleapis.json.GoogleJsonResponseException: 410 Gone
{
"code" : 503,
"errors" : [ {
"domain" : "global",
"message" : "Backend Error",
"reason" : "backendError"
} ],
"message" : "Backend Error"
}
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.waitForCompletionAndThrowIfUploadFailed(AbstractGoogleAsyncWriteChannel.java:431)
at com.google.cloud.hadoop.util.AbstractGoogleAsyncWriteChannel.close(AbstractGoogleAsyncWriteChannel.java:289)
at com.google.cloud.dataflow.sdk.runners.worker.TextSink$TextFileWriter.close(TextSink.java:243)
at com.google.cloud.dataflow.sdk.util.common.worker.WriteOperation.finish(WriteOperation.java:100)
at com.google.cloud.dataflow.sdk.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:77)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.executeWork(DataflowWorker.java:254)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:191)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:144)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.doWork(DataflowWorkerHarness.java:180)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:161)
at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:148)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The source data files are stored in Google Cloud Storage.
Data paths are correct and the job generally works after relaunching it.
We didn't experience this issue until the end of January.
Jobs are launched with these parameters:
--tempLocation='gstoragelocation' --stagingLocation='another gstorage location' --runner=BlockingDataflowPipelineRunner --numWorkers='a few dozen' --zone=europe-west1-d
SDK version: 1.3.0
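For reference, expanded into a full launch command those options look roughly like the sketch below; the jar name, main class, bucket paths and worker count are placeholders, since the real values are elided above:
# Hypothetical launch command for a Dataflow SDK 1.x pipeline with the options listed above.
java -cp target/my-pipeline-bundled.jar com.example.MyPipeline \
  --tempLocation=gs://my-bucket/temp \
  --stagingLocation=gs://my-bucket/staging \
  --runner=BlockingDataflowPipelineRunner \
  --numWorkers=24 \
  --zone=europe-west1-d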
Thanks

Since this is a clearly marked "backend error", it should be reported on either the Cloud Dataflow Public Issue Tracker or the more general Cloud Platform Public Issue Tracker; there is relatively little anybody on Stack Overflow can do to help you debug it.
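In the meantime, since the question notes the job generally works after relaunching, a crude stopgap is to retry the blocking launch from a wrapper script. A sketch reusing the placeholder command from the question, assuming the pipeline's main method lets a job failure propagate so the JVM exits non-zero:
# Retry the Dataflow launch up to three times; with BlockingDataflowPipelineRunner a
# failed job surfaces as an exception, which (if uncaught) makes java exit non-zero.
for attempt in 1 2 3; do
  java -cp target/my-pipeline-bundled.jar com.example.MyPipeline \
    --tempLocation=gs://my-bucket/temp \
    --stagingLocation=gs://my-bucket/staging \
    --runner=BlockingDataflowPipelineRunner \
    --numWorkers=24 \
    --zone=europe-west1-d && break
  echo "Dataflow launch attempt $attempt failed, retrying..." >&2
done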

Related

Spinnaker Kayenta service is failing to identify the spinnaker details

I am testing Spinnaker for a pipeline implementation. During the canary analysis process, Spinnaker is unable to read the metrics from Datadog and throws the error below.
Request GET:https://app.datadoghq.com/api/v1/query?from=&to=&query=avg%3Asystem.mem.total%7B is missing [X-SPINNAKER-USER, X-SPINNAKER-ACCOUNTS] authentication headers and will be treated as anonymous.
Request from: com.netflix.spinnaker.okhttp.MetricsInterceptor.doIntercept(MetricsInterceptor.java:98)
2022-10-10 06:02:41.583 INFO 1 --- [ handlers-20] c.n.k.d.service.DatadogRemoteService : <--- HTTP 403 https://app.datadoghq.com/api/v1/query?from=&to=&query=avg%3Asystem.mem.total%7 (1106ms)
2022-10-10 06:02:41.587 ERROR 1 --- [ handlers-20] c.n.s.orca.q.handler.RunTaskHandler : Error running DatadogFetchTask for pipeline[01GF07SZ02KCXHCD2SWZ69XY38]
It would be really helpful if someone could help me with this.
The request is missing the X-SPINNAKER-USER and X-SPINNAKER-ACCOUNTS authentication headers, so it is being treated as anonymous.
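As an illustration only, the headers the MetricsInterceptor warning refers to look like this when attached to a request; the user, account and time-range values below are placeholders, and note that Datadog itself authenticates with DD-API-KEY / DD-APPLICATION-KEY headers, which is likely what the 403 is really about:
# Replay the metrics query with the Spinnaker identity headers set (placeholder values).
curl -G 'https://app.datadoghq.com/api/v1/query' \
  --data-urlencode 'from=1665381600' \
  --data-urlencode 'to=1665385200' \
  --data-urlencode 'query=avg:system.mem.total{*}' \
  -H 'X-SPINNAKER-USER: someuser' \
  -H 'X-SPINNAKER-ACCOUNTS: my-account' \
  -H "DD-API-KEY: $DATADOG_API_KEY" \
  -H "DD-APPLICATION-KEY: $DATADOG_APP_KEY"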

(GCP) ERROR: (gcloud.builds.submit) INVALID_ARGUMENT: could not resolve source: googleapi: Error 403: 354778943856@cloudbuild.gserviceaccount.com

I'm trying to upload a Docker image to GCR (Google Container Registry) with Cloud Shell, but I got this error message:
ERROR: (gcloud.builds.submit) INVALID_ARGUMENT:
could not resolve source: googleapi: Error 403:
354778943856@cloudbuild.gserviceaccount.com does not have
storage.objects.get access to the Google Cloud Storage object.,
forbidden
These are the steps I've taken.
(1st step) I ran this command:
gcloud builds submit --tag gcr.io/my_project_id/hello_world
(2nd step) Then, I was asked whether to enable "API [cloudbuild.googleapis.com]" and retry, so I entered "y":
Creating temporary tarball archive of 2 file(s) totalling 478 bytes
before compression. Uploading tarball of [.] to
[gs://my_project_id_cloudbuild/source/1642137449.192753-983fc894e2f24fa086f55fa3b56d58aa.tgz]
API [cloudbuild.googleapis.com] not enabled on project [354778943856].
Would you like to enable and retry (this will take a few minutes)?
(y/N)? y
(3rd step) Finally, I got this output, which ends with the error message:
Enabling service [cloudbuild.googleapis.com] on project
[354778943856]... Operation
"operations/acf.p2-354778943856-e99f6fd8-78ec-4cbd-94a2-07e0697d5455"
finished successfully. ERROR: (gcloud.builds.submit) INVALID_ARGUMENT:
could not resolve source: googleapi: Error 403:
354778943856@cloudbuild.gserviceaccount.com does not have
storage.objects.get access to the Google Cloud Storage object.,
forbidden
Even though I enabled "API [cloudbuild.googleapis.com]" by entering "y", I still got the error message:
ERROR: (gcloud.builds.submit) INVALID_ARGUMENT:
could not resolve source: googleapi: Error 403:
354778943856@cloudbuild.gserviceaccount.com does not have
storage.objects.get access to the Google Cloud Storage object.,
forbidden
Are there any ways to solve this error?
Did you notice the message below at the end of the 2nd step's output?
Would you like to enable and retry (this will take a few minutes)?
(y/N)? y
As the message says, it takes a few minutes for "API [cloudbuild.googleapis.com]" to become active after you enable it by entering "y".
So wait a few minutes and then run the command again:
gcloud builds submit --tag gcr.io/my_project_id/hello_world
Then, it will be successful:
ID: f4478e51-557b-407d-9c30-c379ef707258
CREATE_TIME: 2022-01-14T05:22:29+00:00
DURATION: 19S
SOURCE: gs://my_project_id_cloudbuild/source/1642137748.745566-d75b61b6c6bc4acb9aba900650f201b2.tgz
IMAGES: gcr.io/my_project_id/hello_world (+1 more)
STATUS: SUCCESS
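If the same 403 still appears well after the API has finished enabling, the Cloud Build service account may genuinely be missing read access to the staging bucket. A hedged follow-up sketch, using the service account and bucket name shown in the output above:
# Grant the Cloud Build service account read access to the Cloud Build staging bucket.
gsutil iam ch \
  serviceAccount:354778943856@cloudbuild.gserviceaccount.com:objectViewer \
  gs://my_project_id_cloudbuild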

JFrog Container Registry 7.3.2 won't work with Active Directory

I was hoping someone here could help me out. We are currently evaluating JFrog's Artifactory Container Registry running as a Docker service, and for the life of me I cannot get it to work properly with our Active Directory instance. I had it working fine in version 6, but with the release of version 7 I decided to start fresh with the new version.
So I have artifactory-jcr:7.3.2 up and running in our swarm. I go into Administration -> Security -> LDAP and create a new LDAP settings profile with the following fields:
LDAP URL: ldap://mydc.company.net:389/DC=company,DC=net
User DN Pattern: blank
Email Attribute: mail
Search Filter: (sAMAccountName={0})
Search Base: OU=Company Users
Search Sub-Tree: checked
Manager DN: CN=_svcAccount,OU=Service Accounts,OU=Company Users,DC=company,DC=net
Manager Password: Correct Password
The Manager DN is correct and the password has been verified and tested. I can log in with the service account from any machine, and using ADExplorer I can successfully query the directory for my account with only my sAMAccountName, which returns my user object. So I know the service account's password is correct, its permissions are correct, and it can successfully issue queries.
But when trying to test an account from the LDAP settings profile page, I get a generic error message popup stating "Error connecting to the LDAP server:"
For the log, I am looking at the /var/opt/artifactory/artifactory-service.log file.
Here's the entry immediately following a failed 'test account' attempt:
2020-04-03T17:16:46.714Z [jfrt ] [ERROR] [7faa71d56a50ef2b] [o.a.s.l.AbstractLdapService:67] [http-nio-8081-exec-4] - Error connecting to the LDAP server:
org.springframework.security.authentication.AuthenticationServiceException: User myuseraccount failed to authenticate
at org.artifactory.security.ldap.ArtifactoryBindAuthenticator.authenticate(ArtifactoryBindAuthenticator.java:166)
at org.artifactory.security.ldap.LdapServiceImpl.testLdapConnection(LdapServiceImpl.java:77)
at org.artifactory.security.SecurityServiceImpl.testLdapConnection(SecurityServiceImpl.java:3193)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:343)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:205)
at com.sun.proxy.$Proxy156.testLdapConnection(Unknown Source)
at org.artifactory.ui.rest.service.admin.security.ldap.ldapsettings.TestLdapSettingsService.testLdapConnection(TestLdapSettingsService.java:76)
at org.artifactory.ui.rest.service.admin.security.ldap.ldapsettings.TestLdapSettingsService.execute(TestLdapSettingsService.java:63)
at org.artifactory.rest.common.service.ServiceExecutor.process(ServiceExecutor.java:38)
at org.artifactory.rest.common.resource.BaseResource.runService(BaseResource.java:92)
at org.artifactory.ui.rest.resource.admin.security.ldap.LdapSettingResource.testLdapSetting(LdapSettingResource.java:90)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
2020-04-03T17:16:46.732Z [jfrt ] [ERROR] [7faa71d56a50ef2b] [o.a.s.l.AbstractLdapService:68] [http-nio-8081-exec-4] - Error connecting to the LDAP server:
2020-04-03T17:17:57.524Z [jfrt ] [WARN ] [81a5689d90762c9 ] [o.a.s.l.LdapServiceImpl:179 ] [http-nio-8081-exec-8] - Unexpected exception in LDAP query:for user myuseraccount vid LDAP: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C090446, comment: AcceptSecurityContext error, data 52e, v2580]; nested exception is javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C090446, comment: AcceptSecurityContext error, data 52e, v2580]
2020-04-03T17:17:57.547Z [jfrt ] [INFO ] [81a5689d90762c9 ] [o.a.s.l.LdapServiceImpl:129 ] [http-nio-8081-exec-8] - Couldn't find user named "myuseraccount" in ADsettings
From the login UI, when I try to use my SAM account name only, I get a message above the login form stating: "Username or password is incorrect".
Here's the log entry that's generated at the time:
2020-04-03T17:05:12.060Z [jfrt ] [WARN ] [77c816e57e51530 ] [o.a.s.l.LdapServiceImpl:179 ] [http-nio-8081-exec-8] - Unexpected exception in LDAP query:for user admin vid LDAP: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C090446, comment: AcceptSecurityContext error, data 52e, v2580]; nested exception is javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C090446, comment: AcceptSecurityContext error, data 52e, v2580]
I am only using the SAM account name for the login, not the user principal name. I am also leaving off the NetBIOS domain name portion of the login. If I try to use the full SAM account name including the domain, "companyname\myuseraccount", I get a Status 500 error page.
Can anyone tell me what I'm doing wrong here?
Thanks for any help!
Shortly after posting this question, I decided to shell into the running JCR container and install the necessary RPM packages to get OpenLDAP working. I then used ldapsearch from the container to query our domain controller using the settings I had provided in the Artifactory UI. And voilà! The issue was the bind DN. I thought the Manager DN form field was supposed to be the full distinguished name of the binding user account used to query the directory, but ldapsearch was returning object-not-found errors.
I changed the binding account to the service account's SAM account name ("_svcAccount") and got a result back. I've since gone back into the Artifactory settings, updated the Manager DN to "_svcAccount", and everything is working.
JFrog should change the description of the Manager DN field. A distinguished name consists of the full LDAP path to the object, which doesn't work, at least not in my particular situation. Other Java-based products we use, like SonarQube, use the classic full distinguished name for the bind account; JFrog Container Registry apparently does not.
Update: I ended up having to use the NetBIOS domain as part of the Manager DN account to get it to authenticate. So, instead of "_svcAccount" as the Manager DN, I had to use "mycompany\_svcAccount". However, Active Directory users do not use the NetBIOS domain when logging into the Container Registry, just the SAM account name (i.e. "myAccount" rather than "mycompany\myAccount").
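For reference, the ldapsearch check described above can be reproduced with something like the following sketch; the host, search base and account names come from the question, and the password is a placeholder:
# Bind as the service account and look up a user by sAMAccountName, mirroring
# the Search Filter and Search Base configured in Artifactory.
ldapsearch -x -H ldap://mydc.company.net:389 \
  -D 'mycompany\_svcAccount' -w 'ServiceAccountPassword' \
  -b 'OU=Company Users,DC=company,DC=net' \
  '(sAMAccountName=myuseraccount)' sAMAccountName mail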

Azure Functions Dependency Injection Failing Again

I have an Azure Functions V2 application running in Azure. For about the third time now, across various releases, my custom dependency injection is failing again!
This time it is worse than ever, as the issue only happens when the app is deployed to Azure, and the logging made it a nightmare to find out what was happening.
Azure Functions is currently logging the following, with the usual error that it is unable to index my functions:
2018-11-10T10:41:56.091 [Information] Host Status: {
"id": "ad-api-dev",
"state": "Default",
"version": "2.0.12165.0",
"versionDetails": "2.0.12165.0 Commit hash: f9d6c271296eaae48f933c67d1df796fd4ff2a94"
}
2018-11-10T10:41:57.607 [Information] Initializing Host.
2018-11-10T10:41:57.607 [Information] Host initialization: ConsecutiveErrors=0, StartupCount=1
2018-11-10T10:41:57.644 [Information] Starting JobHost
2018-11-10T10:41:57.655 [Information] Starting Host (HostId=ad-api-dev, InstanceId=387865be-cfdf-45ca-98ba-2e0091827461, Version=2.0.12165.0, ProcessId=5140, AppDomainId=1, InDebugMode=True, InDiagnosticMode=False, FunctionsExtensionVersion=~2)
2018-11-10T10:41:57.674 [Information] Loading functions metadata
2018-11-10T10:41:57.723 [Information] 15 functions loaded
2018-11-10T10:41:58.158 [Information] Generating 15 job function(s)
2018-11-10T10:41:58.237 [Error] Error indexing method 'CompleteSimulation.Run'
Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexingException : Error indexing method 'CompleteSimulation.Run' ---> System.InvalidOperationException : Cannot bind parameter 'executionService' to type IExecutionService. Make sure the parameter Type is supported by the binding. If you're using binding extensions (e.g. Azure Storage, ServiceBus, Timers, etc.) make sure you've called the registration method for the extension(s) in your startup code (e.g. builder.AddAzureStorage(), builder.AddServiceBus(), builder.AddTimers(), etc.).
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsyncCore(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 277
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsync(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 167
End of inner exception
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexMethodAsync(MethodInfo method,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 175
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Host.Indexers.FunctionIndexer.IndexTypeAsync(Type type,IFunctionIndexCollector index,CancellationToken cancellationToken) at C:\projects\azure-webjobs-sdk-rqm4t\src\Microsoft.Azure.WebJobs.Host\Indexers\FunctionIndexer.cs : 103
2018-11-10T10:41:58.574 [Warning] Function 'CompleteSimulation.Run' failed indexing and will be disabled.
My local version runs fine, as shown below:
Azure Functions Core Tools (2.2.32 Commit hash: c5476ae629a0a438d6850e58eae1f5203c896cd6)
Function Runtime Version: 2.0.12165.0
[10/11/2018 13:35:15] Building host: startup suppressed:False, configuration suppressed: False
[10/11/2018 13:35:15] Reading host configuration file 'Y:\src\modo.solutions\Modo.Esg.Api\ModoSol.Functions.Api\bin\Debug\netstandard2.0\host.json'
[10/11/2018 13:35:15] Host configuration file read:
[10/11/2018 13:35:15] {
[10/11/2018 13:35:15] "version": "2.0"
[10/11/2018 13:35:15] }
[10/11/2018 13:35:17] Initializing extension with the following settings: Initializing extension with the following settings:
You can see I am using the same runtime version locally as in Azure, so if anyone knows why my dependency injection works locally but not when deployed, it would be appreciated.
I am using Autofac 4.8.1 as my dependency injection container.
Thanks in advance.
BTW, if anyone else has the issue of not being able to see why their Functions host is not starting, follow these instructions.
Go into the portal for your Azure Function app, then find and enable Live Log Streaming.
While this window is open, re-deploy/publish your app.
Once that's done, go into Cloud Explorer (Visual Studio on Windows only :/) and find the logs.
If you don't have the live log streaming window open while you deploy, you don't get any logs!
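If the portal's live log stream is awkward, the Azure Functions Core Tools (the same tool shown in the local run above) can also stream a deployed app's host logs from the command line; a sketch with a placeholder function app name:
# Stream the live host log of the deployed function app ("my-function-app" is a placeholder).
func azure functionapp logstream my-function-app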

Bug with Spring Cloud Dataflow Container Task Deployment?

Version: spring-cloud-dataflow-server-yarn-1.2.2.RELEASE
Issue: All OOTB / custom task apps seem to be NOT working with the YARN deployer (I specifically tested with timestamp-task-1.3.0.RELEASE and a hello-world custom task built per the reference doc).
We have a YARN cluster where all the streams we have deployed are running fine, which rules out any issue with the Hadoop/YARN cluster. The moment we try to deploy a task, the task exits with code 0, with the message below logged in the YARN Container/AppMaster stdout:
2018-09-19 18:04:20.782 DEBUG 22625 --- [ask-scheduler-2] o.s.yarn.am.allocate.AbstractAllocator : completed container: container_1536919363436_0805_01_000002 with status=ContainerStatus: [ContainerId: container_1536919363436_0805_01_000002, State: COMPLETE, Diagnostics: Exception from container-launch.
Container id: container_1536919363436_0805_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
, ExitStatus: 1, ]
The full AppMaster log can be found here and the corresponding servers.yml can be found here.
Any help is appreciated.
I am answering my own question: our YARN cluster had log aggregation enabled, so container logs weren't displayed immediately, and I had to grep through the aggregated logs to find out why custom tasks weren't launching. Once we (temporarily) disabled log aggregation in YARN, the custom task's Container.stdout and Container.stderr were visible under the log directory configured in yarn-site.xml.
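For reference, instead of disabling aggregation you can usually pull the aggregated logs directly once the application has finished; a sketch using the application id inferred from the container id in the log above:
# Fetch aggregated container logs for the failed task's YARN application.
# (Aggregation itself is toggled by yarn.log-aggregation-enable in yarn-site.xml.)
yarn logs -applicationId application_1536919363436_0805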
