How do I separate out SQS queues for different environments in MassTransit?

I have pointed MassTransit at my AWS account and given it a "scope". This creates one SNS topic per scope. Here's my config with the scope set to "development":
container.AddMassTransit(config =>
{
    config.AddConsumer<FooConsumer>(typeof(FooConsumerDefinition));
    config.UsingAmazonSqs((amazonContext, amazonConfig) =>
    {
        amazonConfig.Host(
            new UriBuilder("amazonsqs://host")
            {
                Host = "eu-north-1"
            }.Uri, h =>
            {
                h.AccessKey("my-access-key-is-very-secret");
                h.SecretKey("my-secret-key-also-secret");
                h.Scope("development", true);
            });
        amazonConfig.ConfigureEndpoints(amazonContext);
    });
});
// Somewhere else:
public class FooConsumerDefinition : ConsumerDefinition<FooConsumer>
{
    public FooConsumerDefinition()
    {
        ConcurrentMessageLimit = 1;
        // I used to set EndpointName, but I stopped doing that to test scopes
    }
}
If I change the scope and run it again, I get more SNS topics and subscriptions, which are prefixed with my scope. Something like:
development_Namespace_ObjectThatFooRecieves
However, the SQS queues aren't prefixed and don't increase in number.
Foo
The more scopes I run, the more SNS subscriptions the 'Foo' queue accumulates. On top of that, if I start a consuming application configured with, say, "development", it starts consuming the messages for all of the scopes. As a consequence, I'm not getting any environment separation.
Is there any way to have these different topics feed different queues? Is there any way to neatly prefix my queues alongside my topics?
In fact, what's the point of the "Scope" configuration if it only separates out topics, which then all go to the same queue to be processed indiscriminately?
NB. I don't think the solution here is to just use a separate subscription. That's significant overhead, and I feel like "Scope" should just work.

MassTransit 7.0.4 introduced new options on the endpoint name formatter, including the option to include the namespace, as well as to add a prefix in front of the queue name. This should cover most of your requirements.
In your example above, you could use:
services.AddSingleton<IEndpointNameFormatter>(provider =>
    new DefaultEndpointNameFormatter("development", true));
That should give you what you want. You will still need to specify the scope as you have already done, since that is what is applied to SNS topic names.
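If you prefer to keep everything in one place, the formatter can also be passed straight to ConfigureEndpoints; a minimal sketch reusing the configuration from the question (the ConfigureEndpoints overload taking an IEndpointNameFormatter exists in recent MassTransit versions, though the exact queue-name format it produces is worth verifying in your environment):

container.AddMassTransit(config =>
{
    config.AddConsumer<FooConsumer>(typeof(FooConsumerDefinition));
    config.UsingAmazonSqs((amazonContext, amazonConfig) =>
    {
        amazonConfig.Host(new UriBuilder("amazonsqs://host") { Host = "eu-north-1" }.Uri, h =>
        {
            h.AccessKey("my-access-key-is-very-secret");
            h.SecretKey("my-secret-key-also-secret");
            h.Scope("development", true); // still prefixes the SNS topics
        });

        // Passing the formatter here prefixes the SQS queue names as well
        amazonConfig.ConfigureEndpoints(amazonContext,
            new DefaultEndpointNameFormatter("development", true));
    });
});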
SQS support in MassTransit has a long history, and teams hate breaking changes, which is why these options aren't the defaults: they'd break existing applications.


Creating different types of workers that are accessed using a single client

EDIT:
My question was horrifically put, so I deleted it and rephrased it entirely here.
I'll give a tl;dr:
I'm trying to assign each computation to a designated worker that fits the computation type.
In long:
I'm trying to run a simulation, so I represent it using a class of the form:
class Simulation:
    def __init__(self, first_client: Client, second_client: Client):
        self.first_client = first_client
        self.second_client = second_client

    def first_calculation(self, input):
        with self.first_client.as_current():
            return output

    def second_calculation(self, input):
        with self.second_client.as_current():
            return output

    def run(self, input):
        return self.second_calculation(self.first_calculation(input))
This format has downsides like the fact that this simulation object is not pickleable.
I could edit the Simulation object to contain only addresses and not clients, for example, but I feel as if there must be a better solution. For instance, I would like the simulation object to work the following way:
class Simulation:
    def first_calculation(self, input):
        client = dask.distributed.get_client()
        with client.as_current():
            return output
    ...
Thing is, the Dask workers best suited for the first calculation are different from the Dask workers best suited for the second calculation, which is why my Simulation object has two clients that connect to two different schedulers in the first place. Is there any way to have only one client but two kinds of schedulers, with the client knowing to send first_calculation to the first scheduler and second_calculation to the second one?
Dask will chop up large computations into smaller tasks that can run in parallel. Those tasks are then submitted by the client to the scheduler, which in turn schedules them on the available workers.
Sending the client object to a Dask scheduler will likely not work due to the serialization issue you mention.
You could try one of two approaches:
Depending on how you actually run those worker machines, you could specify different types of workers for different tasks. If you run on Kubernetes, for example, you could try to leverage the node pool functionality to make different worker types available.
An easier approach using your existing infrastructure would be to bring the result of the first computation back to the machine your client runs on, using something like .compute(), and then use that data as input for the second computation. In this case you're sending the actual data over the network instead of the client. If the size of that data becomes an issue, you can always write the intermediary results to something like S3.
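A minimal sketch of that second approach, assuming hypothetical scheduler addresses and that first_calculation and second_calculation are plain module-level functions:

from dask.distributed import Client

# Hypothetical addresses for the two differently-equipped clusters
first_client = Client("tcp://scheduler-a:8786")
second_client = Client("tcp://scheduler-b:8786")

# Stage 1 runs on the first cluster; .result() pulls the data back locally
intermediate = first_client.submit(first_calculation, input_data).result()

# The materialized data (not a client) is shipped to the second cluster
final = second_client.submit(second_calculation, intermediate).result()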
Dask does support pinning specific tasks to specific workers with annotate. Here's an example snippet, where the delayed_sum task is assigned to one worker and the doubled task to the other. The assert statements check that those workers really were restricted to only those tasks. With annotate you shouldn't need separate clusters. You'll also need recent versions of Dask and Distributed for this to work, because of a recent bug fix.
import distributed
import dask
from dask import delayed

local_cluster = distributed.LocalCluster(n_workers=2)
client = distributed.Client(local_cluster)

workers = list(client.scheduler_info()['workers'].keys())

with dask.annotate(workers=workers[0]):
    delayed_sum = delayed(sum)([1, 2])

with dask.annotate(workers=workers[1]):
    doubled = delayed_sum * 2

# Use persist so the scheduler doesn't clean up the keys;
# wrap in distributed.wait to make sure they're there when we check the scheduler.
distributed.wait([doubled.persist(), delayed_sum.persist()])

worker_restrictions = local_cluster.scheduler.worker_restrictions
assert worker_restrictions[delayed_sum.key] == {workers[0]}
assert worker_restrictions[doubled.key] == {workers[1]}

Rails 6 with multiple databases, auto-changing connection based on read or create query

The question might be silly, and it may not be how things are done in the real world. Anyway, kindly share your thoughts on the pros and cons.
Let's say I have two databases: a read replica and a master.
Scenario 1:
Model.all # It should query from read replica database
Scenario 2:
Model.create(attributes) # It should create data in master database
Scenario 3:
Model.where(condition: :some_condition).update(attributes) # It should read data from replica database and update the data in master database
Note: at runtime, the database layer should detect the query type and handle the three scenarios above.
Questions:
Is this a valid expectation?
If yes, how can I achieve this, completely or partially?
If no, what is wrong with this approach and what issues would we face?
Rails 6 provides a framework for auto-routing incoming requests to either the primary database connection, or a read replica.
By default, this new functionality allows your app to automatically route read requests (GET, HEAD) to a read replica if it has been at least 2 seconds since the last write request (any request that is not a GET or HEAD request) was made.
The logic that specifies when a read request should be routed to a replica is specified in a resolver class, ActiveRecord::Middleware::DatabaseSelector::Resolver by default, which you would override if you wanted custom behavior.
The middleware also provides a session class, ActiveRecord::Middleware::DatabaseSelector::Resolver::Session that is tasked with keeping track of when the last write request was made. Like the resolver, this class can also be overridden.
To enable the default behavior, you would add the following configuration options to one of your app's environment files - config/environments/production.rb for example:
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_operations =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
If you decide to override the default functionality, you can use these configuration options to specify the delay you'd like to use, the name of your custom resolver class, and the name of your custom session class, both of which should be descendants of the default classes.
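Note that the middleware only routes between connections the model layer declares; a minimal sketch of that declaration, assuming your database.yml defines primary and primary_replica entries:

# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  # Writes (Model.create, updates) use :primary; reads that the middleware
  # routes (Model.all) can use :primary_replica.
  connects_to database: { writing: :primary, reading: :primary_replica }
end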

Pass dynamic data to an exported resource

For my work, we are trying to spin up a docker swarm cluster with Puppet. We use puppetlabs-docker for this, which has a module docker::swarm. This module allows you to instantiate a docker swarm manager on your master node. This works so far.
On the docker workers, you can join the docker swarm manager with exported resources:
node 'manager' {
  @@docker::swarm { 'cluster_worker':
    join           => true,
    advertise_addr => '192.168.1.2',
    listen_addr    => '192.168.1.2',
    manager_ip     => '192.168.1.1',
    token          => 'your_join_token',
    tag            => 'docker-join',
  }
}
However, your_join_token needs to be retrieved from the docker swarm manager with docker swarm join-token worker -q. This is possible with Exec.
My question is: is there a way (without breaking Puppet philosophy on idempotence and convergence) to get the output from the join-token Exec and pass this along to the exported resource, so that my workers can join master?
My question is: is there a way (without breaking Puppet philosophy on idempotence and convergence) to get the output from the join-token Exec and pass this along to the exported resource, so that my workers can join master?
No, because the properties of resource declarations, exported or otherwise, are determined when the target node's catalog is built (on the master), whereas the command of an Exec resource is run only later, when the fully-built catalog is applied to the target node.
I'm uncertain about the detailed requirements for token generation, but possibly you could use Puppet's generate() function to obtain one as needed, during catalog building on the master.
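A hypothetical sketch of that idea; it assumes the master can reach the swarm manager over non-interactive SSH, and note that generate() requires a fully qualified command path and runs during catalog compilation:

# site.pp on the master (hypothetical)
$worker_join_token = generate('/usr/bin/ssh', 'swarm-manager',
  'docker', 'swarm', 'join-token', 'worker', '-q')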
Update
Another alternative would be an external (or custom) fact. This is the conventional mechanism for gathering information from a node to be used during catalog building for that node, and as such, it might be more suited to your particular needs. There are some potential issues with this, but I'm unsure how many actually apply:
The fact has to know for which nodes to generate join tokens. This might be easier / more robust or trickier / more brittle depending on factors including
whether join tokens are node-specific (harder if they are)
whether it is important to avoid generating multiple join tokens for the same node (over multiple Puppet runs; harder if this is important)
notwithstanding the preceding, whether there is ever a need to generate a new join token for a node for which one was already generated (harder if this is a requirement)
If implemented as a dynamically-generated external fact -- which seems a reasonable option -- then when a new node is added to the list, the fact implementation will be updated on the manager's next puppet run, but the data will not be available until the following one. This is not necessarily a big deal, however, as it is a one-time delay with respect to each new node, and you can always manually perform a catalog run on the manager node to move things along more quickly.
It has more moving parts, with more complex relationships among them, hence there is a larger cross-section for bugs and unexpected behavior.
Thanks to @John Bollinger I seem to have fixed my issue. In the end, it was a bit more work than I envisioned, but this is the idea:
My puppet setup now uses PuppetDB for storing facts and sharing exported resources.
I have added an additional custom fact to the code base of the Docker module (in ./lib/facter/docker.rb); a sketch of what such a fact might look like follows at the end of this answer.
The bare minimum in the site.pp file, now contains:
node 'manager' {
  docker::swarm { 'cluster_manager':
    init           => true,
    advertise_addr => "${::ipaddress}",
    listen_addr    => "${::ipaddress}",
    require        => Class['docker'],
  }

  @@docker::swarm { 'cluster_worker':
    join       => true,
    manager_ip => "${::ipaddress}",
    token      => "${worker_join_token}",
    tag        => "cluster_join_command",
    require    => Class['docker'],
  }
}

node 'worker' {
  Docker::Swarm <<| tag == 'cluster_join_command' |>> {
    advertise_addr => "${::ipaddress}",
    listen_addr    => "${::ipaddress}",
  }
}
Do keep in mind that for this to work, puppet agent -t has to be run twice on the manager node, and once (after this) on the worker node. The first run on the manager will start the cluster_manager, while the second one will fetch the worker_join_token and upload it to PuppetDB. After this fact is set, the manifest for the worker can be properly compiled and run.
In the case of a different module, you would have to add a custom fact yourself. When I was researching how to do this, I added the custom fact to Ruby's LOAD_PATH, but was unable to find it in my PuppetDB. After some browsing I found that facts from a module are uploaded to PuppetDB, which is the reason I tweaked the upstream Docker module.
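For reference, a hypothetical sketch of what such a fact could look like; the fact name and guards here are assumptions, not the actual upstream code:

# lib/facter/docker.rb (sketch)
Facter.add(:worker_join_token) do
  confine :kernel => 'Linux'
  setcode do
    # Only resolves on a node that is already a swarm manager;
    # execute returns nil (on_fail) when the command fails.
    token = Facter::Core::Execution.execute(
      'docker swarm join-token worker -q', on_fail: nil)
    token.strip unless token.nil?
  end
end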

How can RabbitMQ Shovel be configured to overwrite the timestamp property with the current time?

For instance:
{myshovel, [
    {sources, ...}
  , {destinations, ...}
  , {queue, <<>>}
  , {ack_mode, on_confirm}
  , {publish_properties, [
        {delivery_mode, 2}
      , {timestamp, now}  % this is the line I need to understand how to write
    ]}
  , {publish_fields, [{exchange, <<"">>}, {routing_key, <<"">>}]}
  , {reconnect_delay, 5}
]}
I'm curious how to write publish_properties so that RabbitMQ Shovel overwrites the timestamp with the current time (as in, the time when the shovel receives the message and shovels it onto the destination queue).
Unfortunately it is not possible to configure the shovel this way at the time of writing. The shovel configuration, including the publish_properties for forwarded messages, is read when the shovel workers start, and can only contain static content. So whatever value you put into {publish_properties, [{timestamp, TimeStamp}]} will be passed directly to the erlang-client, which in turn will attempt to serialise it (using the amqp_ framing layer).
We are currently planning some improvements to the shovel plugin (such as cluster-wide fail-over and dynamic reconfiguration), and you're not the first person to ask for this feature, so we will consider whether it makes sense to support something specific here (such as setting new timestamps for each processed message) or a general-purpose approach to configuring a shovel worker's runtime behaviour.

ejabberd structures and roster

I'm new to ejabberd, but the first thing I noticed is the complete absence of documentation and code comments.
I have many doubts, but the main ones are:
Inside the jid record, what is the difference between user and luser, server and lserver, resource and lresource?
-record(jid, {user, server, resource,
              luser, lserver, lresource}).
What is the iq record useful for?
-record(iq, {id = "",
             type,
             xmlns = "",
             lang = "",
             sub_el}).
What is a subscription inside ejabberd? A relation between two users?
What is the jid inside the roster?
I know these questions may be quite basic, but I don't really know how to understand this without asking. Thanks.
what is the difference between user and luser?
luser, lserver and lresource are the corresponding parts of the JID after being processed with the appropriate stringprep profile. See https://www.rfc-editor.org/rfc/rfc3920#section-3. In short, inside ejabberd you will most likely always use the processed versions, and the raw ones only when serializing the JID back to the wire.
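As a hypothetical illustration using the classic jlib API of that era, parsing a JID fills both the raw fields and their stringprepped counterparts:

%% Sketch: raw fields keep the original casing; the l-fields hold the
%% stringprepped (canonical) versions used for comparisons.
JID = jlib:string_to_jid("Alice@Example.COM/Home"),
%% JID#jid.user     = "Alice"        JID#jid.luser     = "alice"
%% JID#jid.server   = "Example.COM"  JID#jid.lserver   = "example.com"
%% JID#jid.resource = "Home"         JID#jid.lresource = "Home"  (resourceprep keeps case)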
what is the iq record useful for?
It makes it easier to match on the IQ namespace, id, or type (get|set|error) than to retrieve that info from the underlying XML each time.
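For example, an IQ handler can pattern-match directly on the record fields instead of walking the XML; a sketch following ejabberd 2.x conventions (?NS_VERSION and the xmlelement tuple shape are assumptions from that era):

%% Answer a version query by matching on type and namespace in the head
process_local_iq(_From, _To, #iq{type = get, xmlns = ?NS_VERSION} = IQ) ->
    IQ#iq{type = result,
          sub_el = [{xmlelement, "query", [{"xmlns", ?NS_VERSION}],
                     [{xmlelement, "name", [], [{xmlcdata, "ejabberd"}]}]}]}.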
what is a subscription inside ejabberd? a relation between two users?
Basically, yes. A subscription from user A to user B means A is interested in B's presence. But the subscription can be in different states (the other user has to accept it, etc.). See http://xmpp.org/rfcs/rfc3921.html#sub.
what is the jid inside the roster?
Sorry, I didn't understand you on that one. What exactly do you want to know?
