Trace Propagation on Google Cloud Run with OpenTelemetry

I have a Flask app talking to a Python gRPC service, both deployed on Google Cloud Run. After instrumenting both apps I can see traces in Google Trace, but they all appear to have different Trace IDs, which means the traces from the two services are not being linked together. This is my tracing setup code, used on both services, with the gRPC/Flask instrumentors set up on each side:
import logging

from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleExportSpanProcessor
from opentelemetry.propagators import set_global_textmap
from opentelemetry.tools.cloud_trace_propagator import CloudTraceFormatPropagator
from google.auth.exceptions import DefaultCredentialsError

logger = logging.getLogger(__name__)


def setup_tracing():
    """
    Setup Tracing on Google Cloud. The Service Account Roles must have `Cloud Trace Agent`
    Role added for traces to be ingested.
    """
    trace.set_tracer_provider(TracerProvider())
    try:
        # If running on Google Cloud, will use instance metadata service account credentials to initialize
        trace.get_tracer_provider().add_span_processor(
            SimpleExportSpanProcessor(CloudTraceSpanExporter())
        )
        # Using the X-Cloud-Trace-Context header
        set_global_textmap(CloudTraceFormatPropagator())
        logger.info("Tracing Setup. Exporting Traces to Google Cloud.")
    except DefaultCredentialsError:
        # Not running on Google Cloud so will use console exporter
        from opentelemetry.sdk.trace.export import ConsoleSpanExporter

        trace.get_tracer_provider().add_span_processor(
            SimpleExportSpanProcessor(ConsoleSpanExporter())
        )
        logger.info("Tracing Setup. Exporting Traces to Console.")
Locally, using the ConsoleSpanExporter, I can see that the Trace IDs on both services match; on Google Cloud Run they clearly don't, which results in separate traces in Google Trace. Does the networking between the services strip the headers, or is something else preventing the Trace ID from being propagated?
As an extra note, I've also noticed that the Trace/Span IDs from the load balancer in front of Cloud Run aren't being propagated by CloudTraceFormatPropagator(), which also makes my logs messy because log entries for the same request aren't nested together.

After hours of debugging, it turned out to be misleading documentation for the Python gRPC client instrumentation. For insecure (localhost) channels the documented setup works and the client is instrumented, but for secure channels (as required on Google Cloud Run) you need to pass channel_type='secure'. I'm not sure why it was designed this way and have raised an issue on the module: https://github.com/open-telemetry/opentelemetry-python-contrib/issues/365
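In code that looks roughly like this (a minimal sketch; the Cloud Run hostname below is a placeholder):

from opentelemetry.instrumentation.grpc import GrpcInstrumentorClient
import grpc

# Without channel_type="secure" only insecure_channel() calls get wrapped,
# so no trace context is injected into the metadata of secure channels.
GrpcInstrumentorClient().instrument(channel_type="secure")

# Cloud Run requires TLS, so the client opens a secure channel.
channel = grpc.secure_channel(
    "my-grpc-service-xyz-uc.a.run.app:443", grpc.ssl_channel_credentials()
)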
In addition, you need to use the X-Cloud-Trace-Context header so that your traces share the same Trace ID as the load balancer and app server on Google Cloud Run and all link up in Google Trace. However, the default implementation of Google's propagator uses upper-case letters in the header key, which can't be used in gRPC metadata keys, so it throws a validation error. I took the class below, made the header key all lowercase, and it all works perfectly now:
https://github.com/GoogleCloudPlatform/opentelemetry-operations-python/blob/master/opentelemetry-tools-google-cloud/src/opentelemetry/tools/cloud_trace_propagator.py
Finally, I had a long-standing issue with linking my logs to traces in Google Cloud Logging. The documentation says to use a hex Trace ID and hex Span ID, but that didn't work because I was using the wrong OpenTelemetry functions to format them. The code below works, however, and I can now see my logs alongside my traces in Google Trace's Trace List view!
from opentelemetry import trace
from opentelemetry.trace.span import get_hexadecimal_trace_id, get_hexadecimal_span_id

current_span = trace.get_current_span()
if current_span:
    trace_id = current_span.get_span_context().trace_id
    span_id = current_span.get_span_context().span_id
    if trace_id and span_id:
        logging_fields['logging.googleapis.com/trace'] = f"projects/{self.gce_project}/traces/{get_hexadecimal_trace_id(trace_id)}"
        logging_fields['logging.googleapis.com/spanId'] = f"{get_hexadecimal_span_id(span_id)}"
        logging_fields['logging.googleapis.com/trace_sampled'] = True
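For completeness, those fields are then emitted as part of a structured (JSON) log line so Cloud Logging can associate the entry with the trace. A minimal sketch of that emission (the surrounding log handler is omitted and the message/severity values are only examples):

import json

# Cloud Run's logging agent parses one JSON object per stdout line and honours the
# special logging.googleapis.com/* keys, linking the entry to the trace in Google Trace.
log_entry = {
    "severity": "INFO",
    "message": "handled request",
    **logging_fields,
}
print(json.dumps(log_entry))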
It took a while, but I guess that's what I get for picking an alpha (just turned beta) framework (OpenTelemetry) on a new and, in this area, not very well documented Google Cloud service. With those fixes it all works now, and it's much easier to debug issues and see the whole end-to-end request!

Related

Is it possible to execute some code like logging and writing result metrics to GCS at the end of a batch Dataflow job?

I am using Apache Beam 2.22.0 (Java SDK) and want to log metrics and write them to a GCS bucket after a batch pipeline finishes execution.
I have tried using result.waitUntilFinish() followed by the intended code:
DirectRunner: the GCS object is created as expected and the logs appear on the console.
DataflowRunner: the GCS object is created, but the post-pipeline logs don't appear in Stackdriver.
Problem: when a GCS template is created for the same pipeline, neither the GCS object is created nor do the logs appear when running from the template.
What you are doing is the correct way of getting a signal for when the pipeline is done. There is no direct API in Apache Beam for getting that signal from within the running pipeline other than waitUntilFinish() (wait_until_finish() in the Python SDK).
For your logging problem, you need to use the Cloud Logging API in your code. This is because the pipeline is submitted to the Dataflow service and runs on GCE VMs, which log to Cloud Logging, whereas the code outside of your pipeline runs locally.
See Perform action after Dataflow pipeline has processed all data for a little more information.
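A rough sketch of that pattern (shown here with the Python SDK and the google-cloud-logging client purely for illustration; the same idea applies to waitUntilFinish() in the Java SDK, and the pipeline and logger names are placeholders):

from google.cloud import logging as cloud_logging

# `pipeline` is an already-built Beam pipeline; this code runs on the launcher,
# not on the Dataflow workers.
result = pipeline.run()
result.wait_until_finish()

# Post-run code runs locally, so write to Cloud Logging explicitly instead of
# relying on the Dataflow workers' log capture.
client = cloud_logging.Client()
logger = client.logger("post-pipeline-metrics")
logger.log_struct({"status": "pipeline_done"})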
It is possible to export the logs from your Dataflow job to Google Cloud Storage, BigQuery or Pub/Sub. To do that, you can use the Cloud Logging Console, the Cloud Logging API or gcloud logging to export the desired log entries to a specific sink.
In summary, to use the log export:
Create a sink, selecting Google Cloud Storage as the sink service (or one of the other options).
Within the sink, create a query to filter your logs (optional).
Set the export destination.
Afterwards, every time Cloud Logging receives new entries it will add them to the sink; only new entries are exported.
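If you prefer to create the sink programmatically rather than in the console, the google-cloud-logging client can do it as well; a rough sketch (the sink name, filter and destination bucket below are placeholders):

from google.cloud import logging as cloud_logging

client = cloud_logging.Client()
# The destination bucket must already exist and grant write access to the
# Cloud Logging service account.
sink = client.sink(
    "dataflow-job-logs",
    filter_='resource.type="dataflow_step"',
    destination="storage.googleapis.com/my-log-export-bucket",
)
sink.create()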
While you did not mention whether you are using custom metrics, I should point out that you need to follow the metric naming rules; otherwise they won't show up in Stackdriver.

Does Cloud Run add location-aware request headers similar to App Engine?

App Engine requests automatically have location-aware headers added (X-AppEngine-Country, X-AppEngine-Region, X-AppEngine-City). Does Cloud Run have something similar?
This is (will be) possible with Google Cloud HTTP(S) Load Balancer via user-defined headers.
However, putting your Cloud Run service behind the load balancer is currently in alpha, so you cannot try this out today. You can wait for a while, or if you're willing to try the alpha out and give feedback, please contact me. #ahmetbtodo
AFAIK, these Google custom header values don't exist today. However, in the current headers you can find the IP of the originating requester (here in IPv6):
forwarded: for="2a01:cb14:af0:b500:ccf6:1a91:1713:b48";proto=https
x-forwarded-for: 2a01:cb14:af0:b500:ccf6:1a91:1713:b48
You can then use an external service to resolve the exact location.
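So a workaround is to read the forwarded client IP in your service and resolve the location yourself; a minimal Flask sketch of reading the header (the geolocation lookup itself is left out):

from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def index():
    # Cloud Run populates X-Forwarded-For; the left-most entry is typically the
    # original client IP.
    forwarded_for = request.headers.get("X-Forwarded-For", "")
    client_ip = forwarded_for.split(",")[0].strip() or request.remote_addr
    # Hand client_ip to an external geolocation service to get country/region/city.
    return f"client ip: {client_ip}"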

Send Docker Entrypoint logs to APP in realtime

I'm looking for ideas on how to stream the Docker logs for each run to my application in real time. I want to build a feature similar to Netlify or Vercel, where all the build logs are shown in the UI in real time, for my Node application. Please let me know if you have done this already or know how it can be achieved.
You can achieve this with Vercel and Log Drains.
Log Drains make it easy to collect logs from your deployments and forward them to archival, search, and alerting services by sending them via HTTPS, HTTP, TLS, and TCP once a new log line is created.
At the time of writing, we currently support 3 types of Log Drains:
JSON
NDJSON
Syslog
Along with Log Drains, we are introducing two new open-source integrations with logging services for you to start using them today: LogDNA and Datadog.
Install the integration: https://vercel.com/integrations?category=logging
See the announcement blog post: https://vercel.com/blog/log-drains
Note that Vercel does not allow Docker deployments, but does support Serverless Functions.
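If your application itself is the thing that should display the logs, the receiving end of an NDJSON drain is just an HTTPS endpoint that accepts newline-delimited JSON; a rough sketch of such a receiver (the exact payload fields are an assumption here, check the Log Drains docs for the real schema):

import json
from flask import Flask, request

app = Flask(__name__)

@app.route("/log-drain", methods=["POST"])
def log_drain():
    # Each non-empty line of the request body is expected to be one JSON log entry.
    for line in request.get_data(as_text=True).splitlines():
        if line.strip():
            entry = json.loads(line)
            # Forward the entry to the UI in real time (websocket, SSE, queue, ...).
            print(entry.get("message", entry))
    return "", 200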

Google Cloud BigTable - slow response time from outside of Google Project

When all the code is running inside a Google project, performance is as expected.
However, during development I connect my laptop to a Bigtable instance in a test Google project, and each query takes 2-4 seconds to run.
The response time is similar when I trigger commands using the cbt CLI.
Is there a known reason for this overhead? Perhaps it's how auth needs to be done for external connections?
On startup I see the following logs:
Opening connection for projectId ..., instanceId ..., on data host bigtable.googleapis.com, admin host bigtableadmin.googleapis.com.
Bigtable options: BigtableOptions{dataHost=bigtable.googleapis.com, adminHost=bigtableadmin.googleapis.com, ..., appProfileId=, userAgent=hbase-1.4.1, credentialType=DefaultCredentials, port=443, dataChannelCount=32, retryOptions=RetryOptions{retriesEnabled=true, allowRetriesWithoutTimestamp=false, statusToRetryOn=[DEADLINE_EXCEEDED, UNAVAILABLE, UNAUTHENTICATED, ABORTED], initialBackoffMillis=5, maxElapsedBackoffMillis=60000, backoffMultiplier=2.0, streamingBufferSize=60, readPartialRowTimeoutMillis=60000, maxScanTimeoutRetries=3}, bulkOptions=BulkOptions{asyncMutatorCount=2, useBulkApi=true, bulkMaxKeyCount=125, bulkMaxRequestSize=1048576, autoflushMs=0, maxInflightRpcs=320, maxMemory=143183052, enableBulkMutationThrottling=false, bulkMutationRpcTargetMs=100}, callOptionsConfig=CallOptionsConfig{useTimeout=false, shortRpcTimeoutMs=60000, longRpcTimeoutMs=600000}, usePlaintextNegotiation=false}.
Refreshing the OAuth token
Are there any options I can consider, other than using the Bigtable emulator? I had some trouble getting that running a while back, so I must try again.
Thanks,
Brent
As Solomon said above, please open a Google Cloud support ticket to resolve this.

Is it possible to have Centralised Logging for ElasticBeanstalk Docker apps?

We have a custom Docker web app running in an Elastic Beanstalk Docker container environment.
We would like the application logs to be available for viewing externally, without downloading them through the instances or the AWS console.
So far none of the solutions below has been acceptable. Has anyone achieved centralised logging for Elastic Beanstalk Dockerized apps?
Solution 1: AWS Console log download
Not acceptable: it requires downloading and extracting the logs every time, and it isn't real-time.
Solution 2: S3 + Elasticsearch + Fluentd
Fluentd does not have a plugin to retrieve logs from S3.
There is an excellent S3 plugin, but it is only for outputting logs to S3, not for reading logs from S3.
Solution 3: S3 + Elasticsearch + Logstash
Cons: you can only pull all logs from the entire bucket, or nothing.
The problem lies in Elastic Beanstalk's S3 log storage structure: you cannot specify a file name pattern, so it's either all logs or nothing.
Elastic Beanstalk saves logs to S3 in a path containing random instance and environment IDs:
s3.bucket/resources/environments/logs/publish/e-<random environment id>/i-<random instance id>/my.log#
The Logstash S3 plugin can only be pointed at resources/environments/logs/publish/. When you try to point it at environments/logs/publish/*/my.log it does not work.
This means you cannot pull a particular log and tag/type it so it can be found in Elasticsearch. Since AWS saves logs from all your environments and instances in the same folder structure, you cannot even choose the instance.
Solution 4: AWS CloudWatch Console log viewer
It is possible to forward your custom logs to the CloudWatch console. To achieve that, put configuration files in the .ebextensions path of your app bundle:
http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/AWSHowTo.cloudwatchlogs.html
There's a file called cwl-webrequest-metrics.config which allows you to specify log files along with alerts, etc.
Great!? Except that the configuration file format is neither YAML, XML, nor JSON, and it isn't documented. There are zero mentions of that file or its format on the AWS documentation website or anywhere else on the net.
And getting one log file to appear in CloudWatch is not simply a matter of adding a configuration line.
The only possible way to get this working seems to be trial and error. Great!? Except that for every attempt you need to redeploy your environment.
There's only one reference to how to make this work with a custom log (http://qiita.com/kozayupapa/items/2bb7a6b1f17f4e799a22), and I have no idea how that person reverse-engineered the file format.
cons:
CloudWatch does not seem to be able to split logs into columns when displaying them, so you can't easily filter by priority, etc.
The AWS Console log viewer does not have auto-refresh to follow logs.
The configuration file format is an undocumented nightmare with no way of testing; trial and error requires redeploying the whole instance.
Perhaps an AWS Lambda function is applicable?
Write some JavaScript that dumps all notifications, then see what you can do with those.
After an object is written, you could rename it within the same bucket?
Or notify your own log-management service about the creation of a new object?
Lots of possibilities there...
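As a rough sketch of that idea (in Python rather than JavaScript, and with the forwarding step left as a comment), an S3-triggered Lambda can react to every new log object Beanstalk publishes:

import urllib.parse

def handler(event, context):
    # Invoked by an S3 "object created" notification on the Beanstalk log bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # key looks like: resources/environments/logs/publish/e-<env id>/i-<instance id>/my.log
        # Copy/rename the object here, or notify your own log-management service.
        print(f"new log object: s3://{bucket}/{key}")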
I've started using Sumologic for the moment. There's a free trial and then a free tier (500 MB/day, 7-day retention). I'm not out of the trial period yet and my EB app does practically nothing (it's just a few HTML pages served by Nginx in a Docker container). It looks like it could get expensive once you hit any serious volume of logs, though.
It works OK so far. You need to create an IAM user that has access to the S3 bucket you want to read from, and then it pulls the logs over to Sumologic's servers and does all the processing and searching there. It's a bit fiddly to set up, but I don't really see how it could be simpler, and it's reasonably well documented.
It lets you provide different path expressions with wildcards, then assign a "sourceCategory" to those different paths. You then use those sourceCategories to filter your log searching to a specific type of logging.
My plan long-term is to use something like your solution 3, but this got me going in very short order so I can move on to other things.
You can use a multicontainer environment, sharing the log folder with another Docker container that runs the tool of your preference to centralize the logs. In our case we connected Apache Flume to move the files to HDFS. Hope this helps.
The easiest method I found was using Papertrail via rsyslog and .ebextensions; however, it is very expensive if you log everything.
The good part is that with rsyslog you can essentially send your logs anywhere, so you are not tied to Papertrail.
example ebextension
I've found loggly to be the most convenient.
It is a hosted service, which might not be what you want. However, if you check out their setup page you can see a number of ways your situation is supported (Docker-specific solutions, as well as around 10 Amazon-specific options). Even if Loggly isn't to your taste, you can look at those solutions and easily see how some of them could be applied to almost any centralized logging solution you might use or write.