I need to implement a retry feature in a Kafka consumer. I am using the Spring Reactor API.
Flux<ReceiverRecord<K, V>> inboundFlux = KafkaReceiver.create(receiverOptions).receive();
inboundFlux.subscribe(r -> {
    if (accept(r)) {
        r.receiverOffset().acknowledge();
    }
}, this::errorOut);
The retry option, inboundFlux.retry(), is not working. Please suggest how I can make this work.
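Is moving the handling into the pipeline and using retryWhen the right direction? A sketch of what I mean (the key/value types are placeholders, and it assumes reactor-kafka with Reactor's reactor.util.retry.Retry):
Flux<ReceiverRecord<String, String>> inboundFlux = KafkaReceiver.create(receiverOptions).receive();
inboundFlux
    // do the processing inside the pipeline so a failure becomes an error signal
    .doOnNext(r -> {
        if (accept(r)) {
            r.receiverOffset().acknowledge();
        }
    })
    // resubscribe to receive() with backoff when an error reaches this operator
    .retryWhen(Retry.backoff(3, Duration.ofSeconds(1)))
    .subscribe(r -> { }, this::errorOut);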
There is an official example of MassTransit with SQS. The "bus" is configured to use SQS (x.UsingAmazonSqs). The receive endpoint is an SQS queue which is in turn subscribed to an SNS topic. However, there is no example of how to publish to SNS.
How do I publish to an SNS topic?
How do I configure SQS/SNS to use HTTP, since I develop against localstack?
For reference, the AWS SDK version of this configuration:
var cfg = new AmazonSimpleNotificationServiceConfig { ServiceURL = "http://localhost:4566", UseHttp = true };
Update:
After Chris's reference and some experiments with the configuration, I came up with the following for the localstack SQS/SNS. This configuration executes without errors, and the Worker gets called and publishes a message to the bus. However, the consumer class is not triggered, and messages don't seem to end up in the queue (or rather the topic).
public static readonly AmazonSQSConfig AmazonSQSConfig = new AmazonSQSConfig { ServiceURL = "http://localhost:4566" };
public static AmazonSimpleNotificationServiceConfig AmazonSnsConfig = new AmazonSimpleNotificationServiceConfig { ServiceURL = "http://localhost:4566" };
...
services.AddMassTransit(x =>
{
    x.AddConsumer<MessageConsumer>();
    x.UsingAmazonSqs((context, cfg) =>
    {
        cfg.Host(new Uri("amazonsqs://localhost:4566"), h =>
        {
            h.Config(AmazonSQSConfig);
            h.Config(AmazonSnsConfig);
            h.EnableScopedTopics();
        });
        cfg.ReceiveEndpoint(queueName: "deal_queue", e =>
        {
            e.Subscribe("deal-topic", s =>
            {
            });
        });
    });
});
services.AddMassTransitHostedService(waitUntilStarted: true);
services.AddHostedService<Worker>();
Update 2:
When I look at the SNS subscriptions, I see that the first one, which was created and subscribed manually through the AWS CLI, has a correct Endpoint, while the second one, which was created by the MassTransit library, has an incorrect one. How do I configure the Endpoint for the SQS queue?
$ aws --endpoint-url=http://localhost:4566 sns list-subscriptions-by-topic --topic-arn "arn:aws:sns:us-east-1:000000000000:deal-topic"
{
    "Subscriptions": [
        {
            "SubscriptionArn": "arn:aws:sns:us-east-1:000000000000:deal-topic:c804da4a-b12c-4203-83ec-78492a77b262",
            "Owner": "",
            "Protocol": "sqs",
            "Endpoint": "http://localhost:4566/000000000000/deal_queue",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:deal-topic"
        },
        {
            "SubscriptionArn": "arn:aws:sns:us-east-1:000000000000:deal-topic:b47d8361-0717-413a-92ee-738d14043a87",
            "Owner": "",
            "Protocol": "sqs",
            "Endpoint": "arn:aws:sqs:us-east-1:000000000000:deal_queue",
            "TopicArn": "arn:aws:sns:us-east-1:000000000000:deal-topic"
        }
    ]
}
Update 3:
I've cloned the project and run some of its unit tests for the AmazonSQS bus configuration; the consumers don't seem to work.
When I list the subscriptions after the test run, I can tell that the Endpoints are incorrect.
...
{
    "SubscriptionArn": "arn:aws:sns:us-east-1:000000000000:MassTransit_TestFramework_Messages-PongMessage:e16799c2-9dd3-458d-bc28-52a16d646de3",
    "Owner": "",
    "Protocol": "sqs",
    "Endpoint": "arn:aws:sqs:us-east-1:000000000000:input_queue",
    "TopicArn": "arn:aws:sns:us-east-1:000000000000:MassTransit_TestFramework_Messages-PongMessage"
},
...
Could it be that the AmazonSQS transport has a major bug with localstack?
It's not clear how to use the library with localstack SQS, or how to point it at the actual endpoint (QueueUrl) of an SQS queue.
Whenever Publish is called in MassTransit, messages are published to SNS. Those messages are then routed to receive endpoints as configured. There is no need to understand SQS or SNS when using MassTransit with Amazon SQS/SNS.
In MassTransit, you create consumers, those consumers consume message types, and MassTransit configures topics/queues as needed. Any of the samples using RabbitMQ, Azure Service Bus, etc. are easily converted to SQS by changing UsingRabbitMq to UsingAmazonSqs (and adding the appropriate NuGet package).
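For illustration, a minimal consumer sketch (the DealSubmitted contract is a placeholder, not something from your code):
using System;
using System.Threading.Tasks;
using MassTransit;

public class MessageConsumer : IConsumer<DealSubmitted>
{
    public Task Consume(ConsumeContext<DealSubmitted> context)
    {
        // MassTransit derives the SNS topic from the DealSubmitted message type
        // and routes published messages to the queues whose endpoints consume it
        Console.WriteLine($"Deal received: {context.Message}");
        return Task.CompletedTask;
    }
}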
It looks like your configuration is set up properly to publish, but there are a few reasons I can think of why you are not receiving messages:
An issue with the current version of localstack. I had to use 0.11.2 - see Localstack with MassTransit not getting messages
You are publishing to a different topic. MassTransit will create the topic using the name of the message type, which may not match the topic you configured on the receive endpoint. You can change the topic name by configuring the topology (a small sketch of this follows the configuration example below) - see How can I configure the topic name when using MassTransit SQS?
Your consumer is not configured on the receive endpoint - see the example below
public static readonly AmazonSQSConfig AmazonSQSConfig = new AmazonSQSConfig { ServiceURL = "http://localhost:4566" };
public static AmazonSimpleNotificationServiceConfig AmazonSnsConfig = new AmazonSimpleNotificationServiceConfig { ServiceURL = "http://localhost:4566" };
...
services.AddMassTransit(x =>
{
    x.UsingAmazonSqs((context, cfg) =>
    {
        cfg.Host(new Uri("amazonsqs://localhost:4566"), h =>
        {
            h.Config(AmazonSQSConfig);
            h.Config(AmazonSnsConfig);
        });
        cfg.ReceiveEndpoint(queueName: "deal_queue", e =>
        {
            e.Subscribe("deal-topic", s => {});
            e.Consumer<MessageConsumer>();
        });
    });
});
services.AddMassTransitHostedService(waitUntilStarted: true);
services.AddHostedService<Worker>();
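And if the topic name is the mismatch (the second reason above), a hedged sketch of overriding it through the message topology, inside UsingAmazonSqs (DealSubmitted again being a placeholder message type):
// publish DealSubmitted to "deal-topic" instead of the name derived from the type
cfg.Message<DealSubmitted>(m => m.SetEntityName("deal-topic"));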
From what I see in the docs about Consumers, you should be able to add your consumer to the AddMassTransit configuration like in your original sample, but it didn't work for me.
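If you do keep the container registration (x.AddConsumer<MessageConsumer>()), the documented wiring is ConfigureConsumer on the receive endpoint; a sketch of that variant (I haven't verified it against localstack):
cfg.ReceiveEndpoint(queueName: "deal_queue", e =>
{
    e.Subscribe("deal-topic", s => { });
    // binds the consumer registered via AddConsumer to this endpoint
    e.ConfigureConsumer<MessageConsumer>(context);
});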
I need to implement WebSocket synchronization in our Rails project. The MetaApi project uses Socket.IO by default. I only found two Ruby client projects (such as websocket-client-simple), and the ones with native Socket.IO support are outdated. We tried to implement this with Faye-WebSocket and socketcluster-client-ruby, but without success.
Code Example
import ioClient from 'socket.io-client';

const socket = ioClient('https://mt-client-api-v1.agiliumtrade.agiliumtrade.ai', {
    path: '/ws',
    reconnection: false,
    query: {
        'auth-token': 'token'
    }
});

const request = {
    accountId: '865d3a4d-3803-486d-bdf3-a85679d9fad2',
    type: 'subscribe',
    requestId: '57bfbc9f-108d-4131-a300-5f7d9e69c11b'
};

socket.on('connect', () => {
    socket.emit('request', request);
});

socket.on('synchronization', data => {
    console.log(data);
    if (data.type === 'authenticated') {
        console.log('authenticated event received, you can send synchronize now');
    }
});

socket.on('processingError', err => {
    console.error(err);
});
The Socket.IO protocol is a bit more complicated than a plain WebSocket connection, the latter being only one of the transports it uses; see the description in the official repository. WebSockets are used only after an initial HTTP handshake, so you need a fairly complete client.
I'd start by trying to consume events with a JS client stub from a browser, just to be sure the API is working as you expect, and to determine which Socket.IO versions are in use and compatible (the current one is v4; the stale Ruby clients are mostly for v1). You can also peek at the protocol in the browser developer tools.
Once you have a successful session example and have read the protocol spec above, it will be easier to craft a minimal client.
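To see that handshake for yourself, you can hit the polling transport directly; a rough example against the question's endpoint (EIO=4 assumes a current Socket.IO server, and the path and auth-token come from the question's config):
$ curl "https://mt-client-api-v1.agiliumtrade.agiliumtrade.ai/ws/?EIO=4&transport=polling&auth-token=token"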
I just need to run a Dataflow pipeline on a daily basis, but the suggested solutions, like the App Engine Cron Service, which requires building a whole web app, seem a bit too much.
I was thinking about just running the pipeline from a cron job in a Compute Engine Linux VM, but maybe that's far too simple :). What's the problem with doing it that way? Why isn't anybody (besides me, I guess) suggesting it?
This is how I did it using Cloud Functions, PubSub, and Cloud Scheduler
(this assumes you've already created a Dataflow template and it exists in your GCS bucket somewhere)
Create a new topic in PubSub. This will be used to trigger the Cloud Function.
Create a Cloud Function that launches a Dataflow job from a template. I find it easiest to just create this from the CF Console. Make sure the service account you choose has permission to create a Dataflow job. The function's index.js looks something like:
const google = require('googleapis');

exports.triggerTemplate = (event, context) => {
    // in this case the PubSub message payload and attributes are not used
    // but can be used to pass parameters needed by the Dataflow template
    const pubsubMessage = event.data;
    console.log(Buffer.from(pubsubMessage, 'base64').toString());
    console.log(event.attributes);

    google.google.auth.getApplicationDefault(function (err, authClient, projectId) {
        if (err) {
            console.error('Error occurred: ' + err.toString());
            throw new Error(err);
        }
        const dataflow = google.google.dataflow({ version: 'v1b3', auth: authClient });
        dataflow.projects.templates.create({
            projectId: projectId,
            resource: {
                parameters: {},
                jobName: 'SOME-DATAFLOW-JOB-NAME',
                gcsPath: 'gs://PATH-TO-YOUR-TEMPLATE'
            }
        }, function (err, response) {
            if (err) {
                console.error("Problem running dataflow template, error was: ", err);
            }
            console.log("Dataflow template response: ", response);
        });
    });
};
The package.json looks like:
{
    "name": "pubsub-trigger-template",
    "version": "0.0.1",
    "dependencies": {
        "googleapis": "37.1.0",
        "@google-cloud/pubsub": "^0.18.0"
    }
}
Go to PubSub and the topic you created, and manually publish a message. This should trigger the Cloud Function and start a Dataflow job.
Use Cloud Scheduler to publish a PubSub message on schedule
https://cloud.google.com/scheduler/docs/tut-pub-sub
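A scheduler job along these lines publishes to the trigger topic once a day (the job name, schedule, topic, and message body are all placeholders):
$ gcloud scheduler jobs create pubsub trigger-dataflow-daily \
    --schedule="0 6 * * *" \
    --topic=YOUR-TRIGGER-TOPIC \
    --message-body="run"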
There's absolutely nothing wrong with using a cron job to kick off your Dataflow pipelines. We do it all the time for our production systems, whether they're our Java- or Python-developed pipelines.
That said, however, we are trying to wean ourselves off cron jobs and move more toward using either AWS Lambdas (we run multi-cloud) or Cloud Functions. Unfortunately, Cloud Functions don't have scheduling yet; AWS Lambdas do.
There is a FAQ answer to that question:
https://cloud.google.com/dataflow/docs/resources/faq#is_there_a_built-in_scheduling_mechanism_to_execute_pipelines_at_given_time_or_interval
You can automate pipeline execution by using Google App Engine (Flexible Environment only) or Cloud Functions.
You can use Apache Airflow's Dataflow Operator, one of several Google Cloud Platform Operators in a Cloud Composer workflow.
You can use custom (cron) job processes on Compute Engine.
The Cloud Functions approach is described as "Alpha", and it's still true that they don't have scheduling (no equivalent to an AWS CloudWatch scheduled event), only Pub/Sub messages, Cloud Storage changes, and HTTP invocations.
Cloud Composer looks like a good option: effectively a re-badged Apache Airflow, which is itself a great orchestration tool. Definitely not "too simple" like cron :)
You can use Cloud Scheduler to schedule your job as well. See my post:
https://medium.com/@zhongchen/schedule-your-dataflow-batch-jobs-with-cloud-scheduler-8390e0e958eb
Terraform script
data "google_project" "project" {}
resource "google_cloud_scheduler_job" "scheduler" {
name = "scheduler-demo"
schedule = "0 0 * * *"
# This needs to be us-central1 even if the app engine is in us-central.
# You will get a resource not found error if just using us-central.
region = "us-central1"
http_target {
http_method = "POST"
uri = "https://dataflow.googleapis.com/v1b3/projects/${var.project_id}/locations/${var.region}/templates:launch?gcsPath=gs://zhong-gcp/templates/dataflow-demo-template"
oauth_token {
service_account_email = google_service_account.cloud-scheduler-demo.email
}
# need to encode the string
body = base64encode(<<-EOT
{
"jobName": "test-cloud-scheduler",
"parameters": {
"region": "${var.region}",
"autoscalingAlgorithm": "THROUGHPUT_BASED",
},
"environment": {
"maxWorkers": "10",
"tempLocation": "gs://zhong-gcp/temp",
"zone": "us-west1-a"
}
}
EOT
)
}
}
I'm migrating an application from Spray to Akka HTTP. In the config, I have:
spray {
  can {
    client.request-timeout = infinite
  }
}
What is the equivalent config for Akka HTTP? It appears that request-timeout is now only available for the server, and not the client.
See https://github.com/akka/akka-http/blob/master/akka-http-core/src/main/resources/reference.conf
From the Akka HTTP docs (http://doc.akka.io/docs/akka-http/current/scala/http/client-side/connection-level.html#timeouts):
Currently Akka HTTP doesn’t implement client-side request timeout
checking itself as this functionality can be regarded as a more
general purpose streaming infrastructure feature.
It should be noted that Akka Streams provide various timeout
functionality so any API that uses streams can benefit from the stream
stages such as idleTimeout, backpressureTimeout, completionTimeout,
initialTimeout and throttle. To learn more about these refer to their
documentation in Akka Streams (and Scala Doc).
Essentially, the choice is left to the user to add timeout control to their client-side stream. For instance, in the example shown in the docs, you could add a completionTimeout stage to achieve this:
val responseFuture: Future[HttpResponse] =
  Source.single(HttpRequest(uri = "/"))
    .via(connectionFlow)
    .completionTimeout(5.seconds)
    .runWith(Sink.head)
And note that if you're after an infinite timeout (as per your Spray config), that comes for free by not adding any timeout stage.
akka.http {
  server {
    idle-timeout = infinite
  }
  client {
    idle-timeout = infinite
  }
  host-connection-pool {
    idle-timeout = infinite
  }
}
For your case, you can use the client section.
I have set up Firehose to collect data through an agent and push it to Elasticsearch. It works for a single record using Python code, but I am not able to send data using the Kinesis Agent.
As per the documentation, there should be firehose and kinesis endpoints, but there is no such endpoint available.
https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html
{
    "cloudwatch.emitMetrics": true,
    "kinesis.endpoint": "https://your/kinesis/endpoint",
    "firehose.endpoint": "https://your/firehose/endpoint",
    "flows": [
        {
            "filePattern": "/tmp/app1.log*",
            "kinesisStream": "yourkinesisstream"
        },
        {
            "filePattern": "/tmp/app2.log*",
            "deliveryStream": "yourfirehosedeliverystream"
        }
    ]
}
I cannot find the Firehose endpoint. All I have is the delivery stream name.
The documentation link you referenced has the value for the Firehose endpoint, but that wouldn't help you for your Kinesis endpoint.
The endpoints depend on the region you're writing to. The default for the Amazon Kinesis Agent is firehose.us-east-1.amazonaws.com.
https://docs.aws.amazon.com/firehose/latest/dev/writing-with-agents.html#agent-config-settings
Your best bet is to refer to the AWS Regions and Endpoints doc:
http://docs.aws.amazon.com/general/latest/gr/rande.html
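For example, a hedged agent.json for us-east-1 using those defaults (the delivery stream name is a placeholder):
{
    "cloudwatch.emitMetrics": true,
    "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
    "flows": [
        {
            "filePattern": "/tmp/app2.log*",
            "deliveryStream": "yourfirehosedeliverystream"
        }
    ]
}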