I'm curious about how GCP's Pub/Sub is implemented. Although the name suggests a publish-subscribe design pattern, it seems closer to AWS SQS (a queue) than to AWS SNS (which uses the publish-subscribe model). I think this because GCP's Pub/Sub:
Allows up to 10,000 subscriptions per project.
Allows filtering on subscriptions.
Even allows ordering (beta), which should involve a FIFO queue somewhere.
Exposes a synchronous API for a request/response pattern.
It makes me wonder whether subscriptions in Pub/Sub are essentially SQS queues.
I'd like your opinions on this comparison. My confusion comes from the lack of implementation details for Pub/Sub and a name that clearly indicates a particular design pattern.
The division for messaging in GCP is along slightly different lines than what you may see in AWS. GCP breaks down messaging into three categories:
Torrents: Messaging pipelines that are designed to handle large amounts of throughput on pipes that are persistent. In other words, one creates a new pipeline rarely and sends messages over it for long periods of time. The scaling pattern for torrents is a relatively small number of pipelines transmitting a lot of data. For this category, Cloud Pub/Sub is the right product.
Trickles: Messaging pipelines that are largely ephemeral or require broadcast to a very large number of end-user devices. These pipelines have a low throughput but the number of pipelines can be extremely large. Firebase Cloud Messaging is the product that fits into this category.
Queues: Messaging pipelines where one has more control over the end-to-end message delivery. These pipelines are not really high throughput nor is the number of pipelines large, but more advanced properties are supported, e.g., the ability to delay or cancel the delivery of a message. Cloud Tasks fits in this category, though Cloud Pub/Sub is also adopting features that make it more and more viable for this use case.
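The "delay or cancel the delivery of a message" properties of the Queues category can be sketched with a toy priority queue keyed on scheduled delivery time. This is a hypothetical, in-memory illustration with a simulated clock (`now`); the class and method names are assumptions and say nothing about how Cloud Tasks is actually implemented.

```python
import heapq
import itertools
from typing import List, Set, Tuple


class DelayedQueue:
    """Toy queue supporting per-message delivery delay and cancellation."""

    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []  # (due_time, task_id, payload)
        self._ids = itertools.count(1)
        self._cancelled: Set[int] = set()

    def enqueue(self, payload: str, now: float, delay_s: float = 0.0) -> int:
        task_id = next(self._ids)
        heapq.heappush(self._heap, (now + delay_s, task_id, payload))
        return task_id

    def cancel(self, task_id: int) -> None:
        # Lazy cancellation: the entry is skipped when it comes due.
        self._cancelled.add(task_id)

    def due(self, now: float) -> List[str]:
        """Pop every payload whose scheduled time has arrived, skipping cancelled ones."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            _, task_id, payload = heapq.heappop(self._heap)
            if task_id not in self._cancelled:
                ready.append(payload)
            self._cancelled.discard(task_id)
        return ready
```

A heap keeps the earliest-due task at the front, so `due()` only touches tasks that are actually ready; cancelled tasks are simply dropped when they surface.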
So Cloud Pub/Sub covers the publish/subscribe aspects of SQS+SNS, where SNS is used as a means to distribute messages to different SQS queues. It also serves as the big-data ingestion mechanism, à la Kinesis. Firebase Cloud Messaging covers the portions of SNS designed to reach end-user devices. Cloud Tasks (and, increasingly, Cloud Pub/Sub) provides the functionality of a single queue in SQS.
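To make the "SNS distributing messages to SQS queues" analogy concrete, here is a toy in-memory model built on that assumption: a topic copies each published message into every subscription whose optional filter matches, and each subscription is then drained independently, like its own queue. The names (`Topic`, `Subscription`, `pull`) are hypothetical and do not describe Google's actual implementation; real Pub/Sub filters are expressions on message attributes, stood in for here by a Python predicate.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Message:
    data: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Subscription:
    name: str
    # Optional predicate standing in for a subscription filter.
    filter_fn: Optional[Callable[[Message], bool]] = None
    backlog: deque = field(default_factory=deque)

    def pull(self, max_messages: int = 10):
        # Each subscription drains only its own backlog, like an independent queue.
        batch = []
        while self.backlog and len(batch) < max_messages:
            batch.append(self.backlog.popleft())
        return batch


class Topic:
    def __init__(self, name: str):
        self.name = name
        self.subscriptions: Dict[str, Subscription] = {}

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions[sub.name] = sub

    def publish(self, msg: Message) -> int:
        # Fan-out: every matching subscription gets its own copy.
        delivered = 0
        for sub in self.subscriptions.values():
            if sub.filter_fn is None or sub.filter_fn(msg):
                sub.backlog.append(msg)
                delivered += 1
        return delivered


orders = Topic("orders")
billing = Subscription("billing")
eu_shipping = Subscription("eu-shipping", lambda m: m.attributes.get("region") == "eu")
orders.subscribe(billing)
orders.subscribe(eu_shipping)
orders.publish(Message("order-1", {"region": "eu"}))
orders.publish(Message("order-2", {"region": "us"}))
print([m.data for m in billing.pull()])      # ['order-1', 'order-2']
print([m.data for m in eu_shipping.pull()])  # ['order-1']
```

Seen this way, the SQS-like behaviour the question noticed lives in each subscription, while the topic supplies the publish-subscribe fan-out.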
You are correct that GCP Pub/Sub is close to AWS SQS. As far as I know, there is no exact SNS equivalent in GCP, but I think the closest tool is GCM (Google Cloud Messaging). You are not the only one who has had this question:
AWS SNS equivalent in GCP stack
Can anyone explain the benefit of adopting the Google Cloud Pub/Sub service in a streaming pipeline?
I saw an event-streaming pipeline example that used Pub/Sub to ingest the event data before passing it to the Google Cloud Dataflow service for transformation. Why doesn't Dataflow read the event data directly?
Dataflow needs a source to read data from. A streaming pipeline can use several different sources, each with its own characteristics that may or may not fit your scenario.
With Pub/Sub you can easily publish events to a topic using a client library or the API directly, and it guarantees at-least-once delivery of each message.
When you connect it to a Dataflow streaming pipeline, you get a resilient architecture (Pub/Sub keeps redelivering a message until Dataflow acknowledges that it has processed it) and near real-time processing. In addition, Dataflow can use Pub/Sub metrics to scale up or down depending on the number of messages in the backlog.
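The resilience described above comes from at-least-once delivery: a message stays in the subscription until it is acknowledged, and it is handed out again if the ack does not arrive before its deadline. The toy model below illustrates that behaviour with a simulated clock. The class name, the `ack_deadline_s` parameter and the `backlog()` method are hypothetical; the real service and the Dataflow connector are considerably more involved.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class _Lease:
    data: str
    deadline: float  # time after which the message becomes deliverable again


class AtLeastOnceSubscription:
    """Toy model: a message is kept until acked and redelivered if the lease expires."""

    def __init__(self, ack_deadline_s: float = 10.0):
        self.ack_deadline_s = ack_deadline_s
        self._ids = itertools.count(1)
        self._pending: Dict[int, str] = {}    # published, not currently leased
        self._leased: Dict[int, _Lease] = {}  # handed out, awaiting ack

    def publish(self, data: str) -> int:
        msg_id = next(self._ids)
        self._pending[msg_id] = data
        return msg_id

    def pull(self, now: float) -> List[Tuple[int, str]]:
        # Expired leases go back to pending, which is what causes redelivery.
        for msg_id, lease in list(self._leased.items()):
            if lease.deadline <= now:
                del self._leased[msg_id]
                self._pending[msg_id] = lease.data
        batch = sorted(self._pending.items())
        for msg_id, data in batch:
            self._leased[msg_id] = _Lease(data, now + self.ack_deadline_s)
        self._pending.clear()
        return batch

    def ack(self, msg_id: int) -> None:
        self._leased.pop(msg_id, None)

    def backlog(self) -> int:
        # Unacked messages: roughly the kind of number an autoscaler could watch.
        return len(self._pending) + len(self._leased)
```

Because an unacked message is eventually delivered again, a consumer such as a Dataflow pipeline that crashes mid-processing does not lose data, but it must tolerate duplicates.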
Finally, the Dataflow runner uses an optimized version of the PubSubIO connector, which provides additional features. I suggest checking this documentation, which describes some of these features.
From the documentation at https://cloud.google.com/monitoring/api/v3/metrics#time-series:
Metric data is collected on schedules that vary across monitored resources. Some data is regularly "pulled" by Stackdriver Monitoring from the monitored resources, and some data is "pushed" by applications, services, or the Stackdriver Monitoring agent.
I'd like to know how Stackdriver collects data from Google Cloud Pub/Sub, and what latency bound is promised. I've tried creating a topic and subscription, publishing messages, and timing how long it takes for the metrics to appear in Stackdriver. On average it's about 1-2 minutes, but it is sometimes much slower, up to 5-8 minutes.
We don't currently document what the expectations are for this, in part because there's not a single answer and it depends on different factors. But we are aware that this is important to have, and are working on a clear way to communicate it. Stay tuned.
I'm also seeing high latency in Pub/Sub monitoring with KEDA: it takes 2+ minutes for KEDA to start scaling up the pods based on the Pub/Sub undelivered-messages count provided by GCP's monitoring.
I'm currently working on a project that has a large number of IAM users, each of whom needs limited access to particular SQS queues.
For instance, let's say I have an IAM user named 'Bob' and an SQS queue named 'BobsQueue'. What I'd like to do is grant Bob full permission to manage his queue (BobsQueue), but I'd like to restrict his usage such that:
Bob can make only 10 SQS requests per second to BobsQueue.
Bob cannot make more than 1,000,000 SQS requests per month.
I'd essentially like to apply arbitrary usage restrictions to this SQS queue.
Any ideas?
Off the top of my head, none of the available AWS services offers resource usage limits at all, except where they are built into the service's basic modus operandi (e.g. the Provisioned Throughput in Amazon DynamoDB), and Amazon SQS is no exception: the Available Keys supported by all AWS services that adopt the access policy language for access control currently lack any such resource-limit constraints.
While I can see your use case, I think it's actually more likely that something like this will see the light as an accounting/billing feature, since it would make sense to allow cost control by setting (possibly fine-grained) limits on AWS resource usage; this isn't available yet either, though.
Please note that this feature is frequently requested (see e.g. How to limit AWS resource consumption?), and its absence actually makes it possible to launch what Christofer Hoff aptly termed an Economic Denial of Sustainability attack (see The Google attack: How I attacked myself using Google Spreadsheets and I ramped up a $1000 bandwidth bill for a somewhat ironic and actually non-malicious example).
Workaround
You might be able to approximate your specification by using Shared Queues with an IAM policy granting access to user Bob, as outlined in Example AWS IAM Policies for Amazon SQS, and monitoring this queue with Amazon CloudWatch by Creating Amazon CloudWatch Alarms for one or more of the Amazon SQS Dimensions and Metrics you want to limit, e.g. NumberOfMessagesSent. Once the limit is reached, you could revoke the IAM grant for user Bob on this shared queue until he is in compliance again.
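A sketch of the two pieces of this workaround, expressed as plain Python data. The account ID, region, ARNs, alarm name and threshold are hypothetical placeholders. The queue policy toggles between an Allow and an explicit Deny for Bob (an explicit Deny overrides any Allow). The alarm approximates "10 requests per second" as 3,000 messages sent per 5-minute period; note that NumberOfMessagesSent counts messages, not every SQS API request, so this is only an approximation. Wiring the alarm to whatever flips the policy is left out.

```python
import json

# Hypothetical account ID and region; substitute your own.
ACCOUNT_ID = "123456789012"
QUEUE_ARN = f"arn:aws:sqs:us-east-1:{ACCOUNT_ID}:BobsQueue"
BOB_ARN = f"arn:aws:iam::{ACCOUNT_ID}:user/Bob"


def queue_policy(grant: bool) -> dict:
    """SQS queue (resource-based) policy for BobsQueue.

    grant=True allows Bob every SQS action on the queue; grant=False swaps in
    an explicit Deny, which overrides any Allow he may have elsewhere.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "BobAccess",
            "Effect": "Allow" if grant else "Deny",
            "Principal": {"AWS": BOB_ARN},
            "Action": "sqs:*",
            "Resource": QUEUE_ARN,
        }],
    }


# CloudWatch alarm parameters: 10 messages/second over a 300-second period.
ALARM_PARAMS = {
    "AlarmName": "BobsQueue-send-rate",  # hypothetical name
    "Namespace": "AWS/SQS",
    "MetricName": "NumberOfMessagesSent",
    "Dimensions": [{"Name": "QueueName", "Value": "BobsQueue"}],
    "Statistic": "Sum",
    "Period": 300,
    "EvaluationPeriods": 1,
    "Threshold": 10.0 * 300,
    "ComparisonOperator": "GreaterThanOrEqualToThreshold",
}

# With boto3 (not executed here) these could be applied roughly as:
#   sqs.set_queue_attributes(QueueUrl=queue_url,
#                            Attributes={"Policy": json.dumps(queue_policy(False))})
#   cloudwatch.put_metric_alarm(**ALARM_PARAMS)
policy_json = json.dumps(queue_policy(True), indent=2)
```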
Obviously it is not trivial to implement the 'per-second'/'per-month' specification based on this metric alone without some thorough bookkeeping, nor will you be able to 'pull the plug' precisely when the limit is reached; you will need to account for processing time and API delays.
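A minimal sketch of that bookkeeping, assuming every one of Bob's requests passes through some component you control (a hypothetical proxy in front of SQS) that can call `allow()` before forwarding it. It keeps a sliding one-second window and a calendar-month counter (UTC). This is not an AWS feature; `QuotaTracker` and its parameters are illustrative names only.

```python
from collections import deque
from datetime import datetime, timezone


class QuotaTracker:
    """Bookkeeping for two limits: per_second requests/second and per_month requests/month."""

    def __init__(self, per_second: int = 10, per_month: int = 1_000_000):
        self.per_second = per_second
        self.per_month = per_month
        self._recent = deque()   # timestamps of accepted requests in the last second
        self._month = None       # (year, month) currently being counted
        self._month_count = 0

    def allow(self, ts: float) -> bool:
        """Record a request at POSIX time ts if it fits both limits; rejected ones are not counted."""
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        month = (dt.year, dt.month)
        if month != self._month:  # new calendar month: reset the monthly counter
            self._month = month
            self._month_count = 0
        # Sliding one-second window: drop timestamps at least 1 s old.
        while self._recent and self._recent[0] <= ts - 1.0:
            self._recent.popleft()
        if len(self._recent) >= self.per_second or self._month_count >= self.per_month:
            return False
        self._recent.append(ts)
        self._month_count += 1
        return True
```

Sitting in the request path is what makes precise limits possible here; a CloudWatch-based setup can only react after the metric catches up.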
Good luck!
The type of content isn't really important for this question, but let's just say I wanted to implement a (native mobile) shopping list app that allowed multiple users to collaborate on a shared list.
How are automatic sync features like this (working without explicit user interaction) usually implemented? Is the preferred approach to poll every few seconds for newer versions and update if necessary, or is it possible to push changes?
I guess a polling solution would be (relatively) easy to implement using something like AWS, Google App Engine, or even from scratch on a LAMP stack with REST. But I'm worried about the traffic resulting from continuous polling.
Would it be practical to implement this using push updates? If so, what technologies, services or design principles should I look into? Is something like this possible with AWS or Google App Engine? Or is polling (and reducing traffic as much as possible) the way to go?
On App Engine, you should look into the Channel API. From the overview:
The Channel API creates a persistent connection between your application and Google servers, allowing your application to send messages to JavaScript clients in real time without the use of polling. This is useful for applications that are designed to update the user about new information immediately or where user input is immediately broadcast to other users. Some examples include collaborative applications, multi-player games, and chat rooms. In general, using Channel API is a better choice than polling in situations where updates can't be predicted or scripted, such as when relaying information between human users or from events not generated systematically.
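To contrast the two options from the question, here is a toy in-process model of a shared shopping list that supports both: versioned polling (`changes_since`), where every poll is a request even when nothing changed, and push (`subscribe`), where the server calls the client only when a change happens, which is what a persistent connection makes possible. All names are hypothetical and the "network" is just function calls.

```python
from typing import Callable, Dict, List, Tuple

Change = Tuple[int, str]  # (version, item)


class SharedList:
    """Hypothetical server-side shared list supporting both sync styles."""

    def __init__(self) -> None:
        self.version = 0
        self._log: List[Change] = []
        self._subscribers: Dict[str, Callable[[Change], None]] = {}

    def add_item(self, item: str, author: str) -> None:
        self.version += 1
        change = (self.version, item)
        self._log.append(change)
        # Push: notify every connected client except the author.
        for client_id, callback in self._subscribers.items():
            if client_id != author:
                callback(change)

    def changes_since(self, version: int) -> List[Change]:
        # Polling: the client sends the last version it has seen and
        # receives only newer changes (usually an empty list).
        return [c for c in self._log if c[0] > version]

    def subscribe(self, client_id: str, callback: Callable[[Change], None]) -> None:
        # Push: the client registers once, over a persistent connection.
        self._subscribers[client_id] = callback


shopping = SharedList()
alice_inbox: List[Change] = []
shopping.subscribe("alice", alice_inbox.append)
shopping.add_item("milk", author="bob")
print(alice_inbox)                 # [(1, 'milk')]
print(shopping.changes_since(1))   # []
```

Versioned polling at least keeps each response small, but push avoids the requests themselves, which matters most when updates are rare and unpredictable.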
You can combine a few Amazon Web Services to create an effective and responsive service.
If you download the iOS SDK from the AWS site, you'll find a sample that uses these services: S3_SimpleDB_SNS_SQS_Demo
First, you can use SQS, the queueing service, which supports long polling to help you lower the number of requests.
Second, you can use SNS, the notification (pub/sub) service. It is integrated with SQS, and you can subscribe queues to receive notifications.
These services (and others) are accessible through the iOS SDK, as well as with other SDKs (Java, .NET, Android...) and REST and SOAP APIs.
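To see why long polling lowers the number of requests, here is a simplified, deterministic simulation (no AWS calls) that counts receive requests for a given set of message arrival times. It assumes short polling issues one request per fixed interval, and that a long-polling request waits up to `wait_s` seconds (SQS allows up to 20 via `WaitTimeSeconds`) and returns as soon as a message is available. Batch sizes, multiple consumers and network time are ignored; the arrival times are made up.

```python
from typing import List


def short_poll_requests(duration_s: int, interval_s: int = 1) -> int:
    """One receive call per interval, whether or not anything arrived."""
    return duration_s // interval_s


def long_poll_requests(arrivals: List[float], duration_s: float, wait_s: float = 20) -> int:
    """Count receive calls when each call waits up to wait_s for a message.

    A call returns early as soon as a message is available (taking every
    message available at that moment); otherwise it returns empty after wait_s.
    """
    pending = sorted(arrivals)
    t, i, requests = 0.0, 0, 0
    while t < duration_s:
        requests += 1
        if i < len(pending) and pending[i] <= t + wait_s:
            t = max(t, pending[i])          # returns when the next message shows up
            while i < len(pending) and pending[i] <= t:
                i += 1                      # delivers everything available now
        else:
            t += wait_s                     # empty response after the full wait
    return requests


arrivals = [3, 47, 48, 120]                 # hypothetical arrival times in seconds
print(short_poll_requests(180))             # 180
print(long_poll_requests(arrivals, 180))    # 12
```

With real SQS the long-polling call would be something like `receive_message(QueueUrl=..., WaitTimeSeconds=20)` in boto3; the saving comes from not paying for empty responses.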
I read in a forum that when implementing an application with AMQP you should use as few queues as possible. So would I be completely wrong to assume that, if I were cloning Twitter, I would create a unique, durable queue for each user who signs up? It just seems the most natural approach, and if not a unique queue per user, how would one design something like that?
What is the most common approach for web messaging? I see RabbitHub and Rabbit WebHooks, but WebHooks doesn't seem to be a scalable solution. I am working with Rails, and my AMQP server is running as a daemon.
In RabbitMQ, queues are quite cheap. They're effectively lightweight Erlang processes, and you can run tens to hundreds of thousands of queues on a single commodity machine (i.e. my laptop). Of course, each will consume a bit of RAM, but queues that haven't been used recently will hibernate, so they consume as little memory as possible. In addition, if Rabbit runs low on memory for messages, it will page old messages to disk.
The above only applies to a single machine. RabbitMQ supports a form of lightweight clustering. When you join several Rabbit nodes into a cluster, each can see the queues and exchanges on the other nodes, but each runs only its own queues. So you can have even more queues (up to the limit of Erlang clusters, which is usually a few hundred nodes). A cluster therefore forms a logical broker distributed over several machines; clients connect to it and use it transparently through any of the nodes.
That said, having a single durable queue for each user seems a bit strange: in AMQP, you cannot browse messages while they're on the queue; you can only get/consume messages, which takes them off the queue, and publish, which adds them to the end of the queue. So you can use AMQP as a message router, but you can't use it as a sort of message database.
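The router-versus-database point can be illustrated with a toy model (plain Python, not RabbitMQ or any AMQP client library): each author gets a fanout-style exchange, each follower's queue is bound to it, and publishing routes a copy to every bound queue. Because consuming removes messages, the application keeps the timeline in its own store. The exchange and queue names (`tweets.alice`, `timeline.bob`) and the `timelines` dict are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set


class ToyBroker:
    """Mimics only two AMQP properties: fan-out routing to bound queues,
    and messages leaving a queue once they are consumed."""

    def __init__(self) -> None:
        self.bindings: Dict[str, Set[str]] = defaultdict(set)  # exchange -> queues
        self.queues: Dict[str, deque] = defaultdict(deque)

    def bind(self, exchange: str, queue: str) -> None:
        self.bindings[exchange].add(queue)

    def publish(self, exchange: str, body: str) -> None:
        for queue in self.bindings[exchange]:
            self.queues[queue].append(body)

    def consume_all(self, queue: str) -> List[str]:
        # Like get/consume in AMQP: delivered messages are removed from the queue.
        q = self.queues[queue]
        items = list(q)
        q.clear()
        return items


broker = ToyBroker()
# Followers' queues are bound to the author's exchange ("fan-out on write").
broker.bind("tweets.alice", "timeline.bob")
broker.bind("tweets.alice", "timeline.carol")
broker.publish("tweets.alice", "hello from alice")

# Consuming drains the queue, so the app persists the timeline itself.
timelines: Dict[str, List[str]] = defaultdict(list)
timelines["bob"].extend(broker.consume_all("timeline.bob"))
```

After the last line Bob's queue is empty, and the only place his timeline can still be read from is `timelines`, which in a real application would be a database.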
Here is a thread that discusses exactly that: http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2009-February/003041.html