Schedulers.boundedElastic() and Logging Issue - project-reactor

I have a fairly generic question. I'm using Schedulers.boundedElastic() in .subscribeOn in my reactive pipeline, and I'm seeing a strange problem: the Logger does not work as expected, apparently because the scheduler is interfering with the reactive context. For example, when I run a load test on the function, the logger aggregates the latency across all requests. If I use .subscribeOn(Schedulers.elastic()) instead, the Logger works as expected.
Let me post a skeleton of my code here for reference
Mono.just(stream)
    .transform(/* my application logic */)
    .subscribeOn(Schedulers.elastic())
    .subscriberContext(reactiveContext)
    .subscribe(
        s -> { /* logging happens here */ },
        error -> { /* error handling logic */ },
        () -> { /* inspect */ });


Dart Functions Framework usage

I'm new to the Dart functions framework. My goal is to use this package to create several functions and deploy them to Cloud Run (in combination with Firebase, but I guess that's irrelevant to this question).
I've run the quick starts and I've read all of the contents in the docs.
The quick start mentions just one function at a time (e.g. Hello World, Cloud Events, etc..), like this:
import 'package:functions_framework/functions_framework.dart';
import 'package:shelf/shelf.dart';
@CloudFunction()
Response function(Request request) {
  return Response.ok('Hello, World!');
}
But as you can see in the quickstarts only one function is handled in a project at a time. How about me wanting to deploy several functions? Should I:
Write several functions in the same project / file, so that the functions framework generates 'server.dart' by itself
OR
Create a different functions_framework for each function?
Let me be more specific. Should I do the following (option 1 - which makes more sense to me):
import 'dart:math';
import 'package:functions_framework/functions_framework.dart';
import 'package:shelf/shelf.dart';
@CloudFunction()
Response function(Request request) {
  return Response.ok('Hello, World!');
}

@CloudFunction()
Response function2(Request request) {
  if (Random().nextBool()) {
    return Response.ok('Hello, World!');
  } else {
    return Response.internalServerError();
  }
}
Or should I build a different folder by running a build_runner for each function I need in my project?
Is there a difference and/or a best practice?
Thanks in advance.
EDIT. This question is related to the deployment on Cloud Run itself, and not just testing on my own PC. To test my own functions I did the following:
Run dart run build_runner build, so that it updates the server.dart file correctly (I can see that the framework does a lot behind the scenes and that the _nameToFunctionTarget is basically a router);
Run the server in two different terminals, like this: dart run bin/server.dart --port MYPORT --target MYFUNCTION (where MYPORT and MYFUNCTION are either 8080/8081 or function/function2 respectively).
I guess I'm just confused on how to correctly manage this framework once deployed on Cloud Run.
EDIT 2. I just gave up using Dart as a serverless language or even a backend language. There's just too much jargon even for the basic things. Every backend framework is either dead or maintained by a single enthusiast (props to him!). The language has not yet received enough love from the Google team / the community, and at this moment it is basically not possible to go full-stack on Dart alone. It's a dream, but it can't be realized now. Furthermore, Dart lacks proper SDKs for Firestore, etc., so Firebase isn't an option. I find it easier to just learn NodeJS and use the Firebase support for Firebase Functions written in NodeJS, and I'll wait for more support there in the future, if there ever is any.
The documentation is a bit sparse right now (and I'm new to it also! I couldn't find any good examples, so here goes...)
You can only have a single function that is served. It should be named 'function' (the type and name can be overridden; see the cloudevent example: dartfn generate cloudevent).
You 'could' have many of these deployed so that each does a specific thing, such as processing cloudevents as above, but most people want something more REST-like (see next).
You need to attach a Router() so that you can have the single entry point (function) handled by specific logic in your code.
Example for REST
Add shelf_router: ^1.1.2 to pubspec.yaml (under dependencies:).
Delegate the @CloudFunction to the Router().
functions.dart
import 'package:functions_framework/functions_framework.dart';
import 'package:shelf/shelf.dart';
import 'package:shelf_router/shelf_router.dart';
Router app = Router()
  ..get('/health', (Request request) {
    return Response.ok('healthy');
  })
  ..get('/user/<user>', (Request request, String user) {
    // fetch the user... (probably return as json)
    return Response.ok('hello $user');
  })
  ..post('/user', (Request request) {
    // convert request body to json and persist... (probably return as json)
    return Response.ok('saved the user');
  });

@CloudFunction()
Future<Response> function(Request request) => app.call(request);

Beam/Dataflow: No session file found: /var/opt/google/dataflow/pickled_main_session

When using Apache Beam (GCP Dataflow) I see the following warning in worker logs:
No session file found: /var/opt/google/dataflow/pickled_main_session.
Functions defined in __main__ (interactive session) may fail.
My Dataflow job seems to be fine regardless, but I'm wondering what this warning is all about.
I have seen the following in some sample code (which I am NOT currently doing):
pipeline_options.view_as(SetupOptions).save_main_session = True
where pipeline_options is the main way of specifying options for the Beam/Dataflow pipeline, as in the following later in the code:
with beam.Pipeline(options=pipeline_options) as p:
    # actual pipeline code here
I am curious whether the two are related. Does the presence of the warning mean I should always be saving the main session?
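You should be able to safely ignore this warning. save_main_session pickles the state of the __main__ module so that globals defined there (imports, functions, variables) are available on the workers, so it only matters when your pipeline code relies on such globals. There is no need to set save_main_session if it's not required for your pipeline.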

Reactor Flux throws IllegalArgumentException - suspecting due to bufferTimeout

I have a Spring application which builds a reactive pipeline as follows:
buildPipeline() // returns a Flux based on change stream events or Kafka receives
    .bufferTimeout(capacity, Duration.ofSeconds(1))
    .flatMap(r -> {
        element x = r.get(r.size() - 1);
        // some processing on the element and the batch obtained
    })
    .doOnError(e -> log.info("error occurred: " + e.toString()))
    .subscribe();
However, I see my application intermittently throwing the error below:
java.lang.IllegalArgumentException: 3.9 While the Subscription is not cancelled, Subscription.request(long n) MUST throw a java.lang.IllegalArgumentException if argument <= 0
at com.mongodb.reactivestreams.client.internal.ObservableToPublisher$1$1.request(ObservableToPublisher.java:43)
at reactor.core.publisher.FluxMap$MapSubscriber.request(FluxMap.java:155)
at reactor.core.publisher.FluxBufferTimeout$BufferTimeoutSubscriber.requestMore(FluxBufferTimeout.java:317)
I'm not able to determine what is wrong, and why the stream is terminating with this error.
Any help would be highly appreciated.
The application started throwing this error after I added "bufferTimeout" to add a feature of batching. Before that, I had never encountered this exception.
I'm also not sure how to reproduce the issue, as it occurs only in the production environment of the application, not locally or in UAT.
Any leads would be helpful.
Thanks!
Try adding onBackpressureBuffer(), so that when downstream demand is low this operator buffers the items and emits them in a controlled way.

How does one inject dependencies like a logger, database connection, or SHA256 generator in Iron? [duplicate]

In writing my tests, I'd like to be able to inject a connection into the request so that I can wrap the entire test case in a transaction (even if there is more than one request in the test case).
I've attempted to do this using a BeforeMiddleware which I can link in my test cases to insert a connection, as such:
pub type DatabaseConnection = PooledConnection<ConnectionManager<PgConnection>>;

pub struct DatabaseOverride {
    conn: DatabaseConnection,
}

impl BeforeMiddleware for DatabaseOverride {
    fn before(&self, req: &mut Request) -> IronResult<()> {
        req.extensions_mut().entry::<DatabaseOverride>().or_insert(self.conn);
        Ok(())
    }
}
However, I'm encountering a compile error in trying to do this:
error: the trait bound `std::rc::Rc<diesel::pg::connection::raw::RawConnection>: std::marker::Sync` is not satisfied [E0277]
impl BeforeMiddleware for DatabaseOverride {
^~~~~~~~~~~~~~~~
help: run `rustc --explain E0277` to see a detailed explanation
note: `std::rc::Rc<diesel::pg::connection::raw::RawConnection>` cannot be shared between threads safely
note: required because it appears within the type `diesel::pg::PgConnection`
note: required because it appears within the type `r2d2::Conn<diesel::pg::PgConnection>`
note: required because it appears within the type `std::option::Option<r2d2::Conn<diesel::pg::PgConnection>>`
note: required because it appears within the type `r2d2::PooledConnection<r2d2_diesel::ConnectionManager<diesel::pg::PgConnection>>`
note: required because it appears within the type `utility::db::DatabaseOverride`
note: required by `iron::BeforeMiddleware`
error: the trait bound `std::cell::Cell<i32>: std::marker::Sync` is not satisfied [E0277]
impl BeforeMiddleware for DatabaseOverride {
^~~~~~~~~~~~~~~~
help: run `rustc --explain E0277` to see a detailed explanation
note: `std::cell::Cell<i32>` cannot be shared between threads safely
note: required because it appears within the type `diesel::pg::PgConnection`
note: required because it appears within the type `r2d2::Conn<diesel::pg::PgConnection>`
note: required because it appears within the type `std::option::Option<r2d2::Conn<diesel::pg::PgConnection>>`
note: required because it appears within the type `r2d2::PooledConnection<r2d2_diesel::ConnectionManager<diesel::pg::PgConnection>>`
note: required because it appears within the type `utility::db::DatabaseOverride`
note: required by `iron::BeforeMiddleware`
Is there a way around this with diesel's connections? I've found several examples on Github to do this using the pg crate, but I'd like to keep using diesel.
This answer will certainly solve the problem, but it's not optimal. As mentioned, you can't share a single connection as it's not thread safe. However, while wrapping it in a Mutex makes it thread-safe, it would force all the server threads to use a single connection. Instead, you want to use a connection pool.
You can accomplish this with the r2d2 and r2d2-diesel crates. This will establish multiple connections as needed, and reuse them when possible in a thread safe manner.
Since there isn't enough code provided for me to reproduce your issue, I've made this:
use std::cell::Cell;
trait Middleware: Sync {}
struct Unsharable(Cell<bool>);
impl Middleware for Unsharable {}
fn main() {}
which has the same error:
error: the trait bound `std::cell::Cell<bool>: std::marker::Sync` is not satisfied [E0277]
impl Middleware for Unsharable {}
^~~~~~~~~~
help: run `rustc --explain E0277` to see a detailed explanation
note: `std::cell::Cell<bool>` cannot be shared between threads safely
note: required because it appears within the type `Unsharable`
note: required by `Middleware`
You can solve the problem by changing the type to make it cross-thread compatible:
use std::sync::Mutex;
struct Sharable(Mutex<Unsharable>);
impl Middleware for Sharable {}
Note that Rust has done a very good thing for you: it prevented you from using a type that is unsafe to be called in multiple threads.
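To make the Mutex approach a bit more concrete, here is a minimal sketch using only the standard library. The Unsharable type and its bump method are hypothetical stand-ins, not Diesel or Iron types, and the sketch assumes the wrapped type is Send (which Mutex requires before it can be Sync). It shows several threads sharing one wrapped value, with the lock serializing access:

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for a type that is Send but not Sync.
struct Unsharable(Cell<u32>);

impl Unsharable {
    fn bump(&self) {
        self.0.set(self.0.get() + 1);
    }
}

// Wrapping the value in a Mutex makes the wrapper Sync,
// because every access has to go through the lock.
struct Sharable(Mutex<Unsharable>);

// Spawns `threads` workers that each bump the shared counter once,
// then returns the final count.
fn run_workers(threads: u32) -> u32 {
    let shared = Arc::new(Sharable(Mutex::new(Unsharable(Cell::new(0)))));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Only one thread at a time can hold the lock.
                shared.0.lock().unwrap().bump();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = shared.0.lock().unwrap().0.get();
    total
}

fn main() {
    println!("final count: {}", run_workers(4));
}
```

The same serialization that makes this safe is why a single Mutex-wrapped connection becomes a bottleneck in a multi-threaded server: every request waits on the one lock.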
"In writing my tests, I'd like to be able to inject a connection into the request so that I can wrap the entire test case in a transaction (even if there is more than one request in the test case)."
I'd suggest that it's possible an architectural change would be even better. Separate the domains of "web framework" from your "database". The authors of Growing Object-Oriented Software, Guided by Tests (a highly recommended book) advocate for this style.
Pull apart your code such that there is a method that simply accepts some type that can start / end a transaction, write the interesting stuff there, and test it thoroughly. Then have just enough glue code in the web layer to create a transaction object, then call the next layer down.
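As a rough sketch of that separation, using only the standard library: the Transaction trait, register_user, FakeTransaction and handle_request below are all hypothetical names made up for illustration, not part of Iron or Diesel. The domain function only knows about the trait, so a test can drive it with an in-memory fake, while the web layer would supply a real database-backed implementation.

```rust
// Hypothetical abstraction over "something that can start / end a transaction".
trait Transaction {
    fn execute(&mut self, statement: &str) -> Result<(), String>;
    fn commit(&mut self);
    fn rollback(&mut self);
}

// Domain logic: knows nothing about the web framework or the real database.
fn register_user<T: Transaction>(tx: &mut T, name: &str) -> Result<(), String> {
    if name.is_empty() {
        return Err("name must not be empty".to_string());
    }
    tx.execute(&format!("insert user {}", name))
}

// In-memory test double that records what was done to it.
#[derive(Default)]
struct FakeTransaction {
    statements: Vec<String>,
    committed: bool,
    rolled_back: bool,
}

impl Transaction for FakeTransaction {
    fn execute(&mut self, statement: &str) -> Result<(), String> {
        self.statements.push(statement.to_string());
        Ok(())
    }
    fn commit(&mut self) {
        self.committed = true;
    }
    fn rollback(&mut self) {
        self.rolled_back = true;
    }
}

// Thin glue a web handler would call: run the domain logic,
// then commit on success or roll back on failure.
fn handle_request<T: Transaction>(tx: &mut T, name: &str) -> bool {
    let ok = register_user(tx, name).is_ok();
    if ok {
        tx.commit();
    } else {
        tx.rollback();
    }
    ok
}

fn main() {
    let mut tx = FakeTransaction::default();
    let ok = handle_request(&mut tx, "alice");
    println!("ok = {}, statements = {:?}", ok, tx.statements);
}
```

With this shape, a test that spans several "requests" can pass the same transaction object to each call and roll it back at the end, without going through the middleware at all.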

Cloud Dataflow: java.lang.IllegalStateException: no evaluator registered for GroupedValues

I'm getting the following exception when running the pipeline locally. There is no exception when submitting for cloud execution.
Thanks,
Genady
INFO: Executing pipeline using the DirectPipelineRunner.
Exception in thread "main" java.lang.IllegalStateException: no evaluator registered for GroupedValues [GroupedValues]
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.visitTransform(DirectPipelineRunner.java:606)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:200)
at com.google.cloud.dataflow.sdk.runners.TransformTreeNode.visit(TransformTreeNode.java:196)
at com.google.cloud.dataflow.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:109)
at com.google.cloud.dataflow.sdk.Pipeline.traverseTopologically(Pipeline.java:204)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner$Evaluator.run(DirectPipelineRunner.java:583)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:327)
at com.google.cloud.dataflow.sdk.runners.DirectPipelineRunner.run(DirectPipelineRunner.java:70)
at app.Main.main(Main.java:124)
The code outline is basically this:
PCollection<KV<MyKey, Iterable<MyValue>>> groupedByMyKey = ...
PCollection<KV<MyKey, MyAggregated>> aggregated = groupedByMyKey.apply(
    Combine.<MyKey, MyValue, MyAggregated>groupedValues(new Aggregator()));
The Aggregator class extends CombineFn<MyValue, List<MyValue>, MyAggregated>.
Can you share a code snippet that triggers this? GroupedValues is a PTransform that is often used within various combining transforms, so it might be from using something like Min, Max, etc.
The error means that the DirectPipelineRunner doesn't know how to evaluate a GroupedValues. However, that's unexpected, since that should have been expanded into a ParDo before execution.
I found the reason for this behaviour.
I was passing a command-line argument to run in remote mode (--runner=BlockingDataflowPipelineRunner) and then forcing the pipeline to run locally with:
PipelineRunner<?> runner = DirectPipelineRunner.fromOptions(options);
runner.run(p);
After removing these lines and just using the --runner=DirectPipelineRunner argument it worked as expected.
