Cost of continuous replications vs one-shot replications (using TouchDB and Cloudant) - ios

We have an app that uses Cloudant as a remote server. Nevertheless, Cloudant is not completely compatible with TouchDB's continuous replications from previous experience. So our alternative for now is to trigger manually one-shot replications at a fixed frequency. Nevertheless, we would like to know if that approach is going to cost us more money than continuous replications, since continuous replications use longpoll and doesn't need to query the server often. In other words, does one-shot pull replications with Cloudant as the target cost us a GET request?
Thank you,
Paul

I think the issue you refer to is [1].
Cloudant's replication is 100% compatible with CouchDB. In this
instance, TouchDB's logs indicate the iOS network stack passed
on incomplete JSON to TouchDB. It's not clear who was to blame
in this case for the replication failure.
[1] https://github.com/couchbaselabs/TouchDB-iOS/issues/241
For the cost question, a one-shot pull replication will result in a GET to the _changes
feed each time it happens, plus the other requests required to
replicate. This _changes request will be counted as a light
HTTP request against your Cloudant account.
However, whether this works out as more or fewer requests overall
depends on the number of changes coming down from the remote server.
It's also important to remember that the number of _changes calls are very small
relative to the number of other calls involved (e.g., getting the
content of the changes themselves and particularly if there are many
attachments).
While this question is specific to TouchDB, and I mention specific
behaviours of that codebase, this answer deals with the requests involved
in replication between any two systems speaking the CouchDB replication
protocol[2].
[2] http://www.dataprotocols.org/en/latest/couchdb_replication.html
Let's take a contrived example: 1 update per 10 second window to
the source database for the replication, where a TouchDB database
is the target. Let's take a 5 minute poll vs. a continuous replication.
For simplicity of call-counting, let's also take attachments out of the
picture. We'll also assume the device has a constant network connection.
For the continuous case, every 10s TouchDB will receive an update in
the _changes feed. This causes the longpoll connection to close.
TouchDB then runs through the changes, requesting the updates from the
source database; one or more GET requests on the remote server. While
this is happening, TouchDB has to open up another longpoll request
to _changes. So in a five minute period, you'd end up with perhaps
30 calls to _changes, plus all the calls to get documents and record
checkpoints.
Compare this with a one-shot replication every five minutes. You'd
receive notification of the 30 updates in one _changes feed call.
TouchDB implements an optimisation[3] whereby it will call _all_docs
to get updated documents for 1- revs, so you might end up with a single
call to get all 30 documents (not possible in the continuous case as
you've received a single change). Then you've the checkpoint documents
to record. At best fewer than 5 HTTP calls, at most about a third of
the continuous case as you've avoided extra _changes requests.
[3] https://github.com/couchbaselabs/TouchDB-iOS/wiki/Replication-Algorithm#performance
It comes down to the frequency of updates you expect to the source
database. One-shot replication is likely to provide a smoother price
curve as you're in better control of the number of requests you make.
A further question is how often connections will drop because of the
network disconnects which happen regularly with mobile devices.
TouchDB's continuous replications will fire back up each time the
user comes on line (if added via the _replicator database). This is a
further source of unpredictable costs.
However, the benefits from more immediate visibility of changes may
certainly be worth the uncertainty.

Related

FreeRadius accounting altering/updating sessions start times after a day, weeks and in some cases months

This might be a very specific problem or just ignorance from my side, but I don't seem to figure it out.
Within our organization, we have a FreeRadius Accounting system logging sessions from Wi-Fi usage. Our team is responsible for the data analysis of this accounting data.
Recently, we had to dump the Radius Accounting Database and made a freeze frame of it. While doing so we found a weird behavior.
Running the same query before and after the dump (a query that retrieves the total amount of sessions for a single day) gave a different amount. Around a difference of 5-10%.
Looking a bit deeper we discovered that several updates were being issued that altered the start time of sessions after they had been first registered in the accounting database.
We then found that previous data we collected had disparity after weeks or months even (with the discrepancy being around 2-10%).
TLDR:
Does FreeRadius adjust the start times of sessions based on some maintenance? Are WiFi controllers allowed to do this? Is it a bug?
Overal we just want to understand the rationale so we can justify the data and adjust our processing correctly, as currently, we cannot trust the values we collect daily or even weekly on these stats!
Any help or insight would be great!!!
FreeRADIUS only updates the database as a result of data in an incoming RADIUS packet, using the SQL queries in the local configuration. The only real way to understand this is to look at your SQL queries, and incoming requests (via radiusd -X) and see what is making changes to the data. It is possible that the NAS is broken and sending invalid or changing data, or possibly re-using session IDs which overwrite existing records.
It is also possible to configure FreeRADIUS to create a "fake" accounting start entry in the database in post-auth, which will then be updated when the real Start packet arrives. If you are doing this then you should check the values that are being written, and also if the session never starts up (or the Start is lost) then bad things might happen.
But in all circumstances the only solution you really have is to look at the debug output and see what is happening and why data is being written in the way that it is. There is nothing in FreeRADIUS that randomly updates the database without being sent that data from the NAS.

iOS Network Connection Failure Policy suggestions

I'm looking for suggestions to the best way to handle network connection issues for an iPhone app (iOS9/Swift2/Xcode7), to give the best user experience since we know that mobile data networks are unreliable. I have my coding options in place but I'd like to know what's worked well for other experienced techs. There's lots of info out there but nothing I could find specific to a strategy to occur when there is a connection failure.
Here is my basic strategy dealing with failed connections I'd like to implement (along with questions):
App sends request to api.myserver.com and the request fails
Wait X second(s) and try request to api.myserver.com again (how many tries and at what time interval would you suggest?)
Try pinging some other server (i.e. google.com) to see if we can access a resource other than api.myserver.com
If we can successfully ping google.com then we know our internet is working, so we try once again to ping api.myserver.com
If this last ping fails then we alert the user that we can't communicate for some reason and to try again later
I'm using the philosophy outlined in this SO answer recommended by an Apple tech, which in general means you always check the connection to your server first, using Reachability as a separate check to ensure phone hardware is available.
At any time during this process if Reachability is false then we would put our request in a queue to be tried again when the phone hardware connection was restored.
I think I've got a handle on the code involved, but looking for insights like "this is what worked for our app and gives a good user experience during connection issues...and was approved for use in the Apple app store...". I understand the concepts of trying/retrying connections in the case of failure and alerting the user (currently my code already does this successfully), but still not solid on a good policy to use for how many times should I try to reconnect and at what intervals?
For most of the apps I have worked on it was useful to define a couple of categories of requests which have different rules. For each category consider if retries are appropriate and how long you can really afford to wait before considering the request(s) a failure.
At the most sensitive are blocking requests, things which the user must allow to complete before they can proceed. Sign in, checkout, some editing actions, etc. For these it is often not worth retrying(1) and failures need to be communicated to the user immediately: if the device is offline let the user decide when to try again, if the request fails you've probably already made the user wait too long. Since failures tend to block the user they usually also need to be communicated prominently.
Less sensitive are usually use initiated but non-blocking actions: pull-to-refresh, loading details of a selected collection item, or performing a search. Your user might be waiting to see the results but is probably free to give up or navigate elsewhere in the app and check back later. Failures still need to be communicated so users can choose to try again or at least know to stop waiting but the notification of those failures can be less prominent. Here retries start to make sense. I usually start by trying to define a time limit from the user's perspective, how long will they wait before the app feels broken, and then let that be your limit for how long a request can wait for connectivity or make any number of retries in response to failed connections.
Even less sensitive are requests triggered only indirectly by your user; polling for updates, loading non-essential images, warming caches. These you might retry but the impact of failure is often so low that it may not matter.
Of all of those requests your retry policy really only impacts #2 so I would make sure you actually have requests of that type before worrying about it. Assuming those do actually apply to your app...
Wait X second(s) and try request to api.myserver.com again (how many tries and at what time interval would you suggest?)
I would set some interval here (in the tens to hundreds of milliseconds depending on your normal api performance) to avoid an accidental flood of requests. I don't want to suggest a precise number when I don't have a solid justification for it.
My experience has been that optimizing this value is unlikely to make a perceptible difference to your users because requests often take hundreds of milliseconds to fail and users are only willing to wait for a few thousand milliseconds so making 1 or 5 or 10 requests in that time doesn't really change the final outcome. If you are able to set different expectations with your users then your results may vary.
Try pinging some other server (i.e. google.com) to see if we can access a resource other than api.myserver.com
If we can successfully ping google.com then we know our internet is working, so we try once again to ping api.myserver.com
I would not assume that this is true nor do I think that making an extra request to a third party will help you make useful predictions about when to attempt to reach your own systems. This seems like extra work to build and maintain and likely to be a source of misleading results more than valuable information. In what scenario do you imagine this provides useful information to your app or its user?
Maybe not the answer you're looking for, hopefully it's still useful.
Disclaimer: my experience is biased toward apps with a fairly simple set of REST or RPC style network requests. If you're working on a problem which calls for streaming data, P2P connections, or some other scenario then don't start with these assumptions.
(1) One end note here because I see it as a source of failures so often: These requests should really be idempotent. Yes, even those POSTs creating new resources, checking out your cart, or whatever. When you cannot safely repeat a request you'll eventually see cases where the request completed but the client never got the acknowledgement so it looked like a failure. It's much easier to recover through a retry (automatic or user triggered) of the same request than to detect and recover from duplicate requests.
For better network performance. In my application I use to ping Google server for before every API request if its reachable then I called my server API else no network alert.
If you are on wifi network then also you have to do the same, because wifi reachability only checks for wifi connectivity not for network access.

How do I spread out load on my rails app from webhook responses?

I have a rails app that easily handles the traffic we currently experience, except once a day when we receive a large number of pings within a few seconds from an external service's webhook that is reporting on past transactions. Currently this causes the app to time out due to lack of db connection availability, meaning we lose some of the webhooks as well as bringing the site down for a few seconds. It's not important that the data contained in these webhooks be processed instantaneously, so I am looking for a good way to spread out the responses, rather than do an expensive upgrade just to handle these bursts with additional db connection capability.
Is it okay to just have the relevant controller method sleep for a small, random number of seconds before doing anything that would open a db connection to spread things out? Or is there a better way to do this?
Setup a background/async processing system like Sidekiq (or whatever Heroku offers). Modify your controller action to do nothing but shove the parameters into a background job and return "ok". Then process the job in the background.

Ruby on Rails performance on lots of requests and DB updates per second

I'm developing a polling application that will deal with an average of 1000-2000 votes per second coming from different users. In other words, it'll receive 1k to 2k requests per second with each request making a DB insert into the table that stores the voting data.
I'm using RoR 4 with MySQL and planning to push it to Heroku or AWS.
What performance issues related to database and the application itself should I be aware of?
How can I address this amount of inserts per second into the database?
EDIT
I was thinking in not inserting into the DB for each request, but instead writing to a memory stream the insert data. So I would have a scheduled job running every second that would read from this memory stream and generate a bulk insert, avoiding each insert to be made atomically. But i cannot think in a nice way to implement this.
While you can certainly do what you need to do in AWS, that high level of I/O will probably cost you. RDS can support up to 30,000 IOPS; you can also use multiple EBS volumes in different configurations to support high IO if you want to run the database yourself.
Depending on your planned usage patterns, I would probably look at pushing into an in-memory data store, something like memcached or redis, and then processing the requests from there. You could also look at DynamoDB, which might work depending on how your data is structured.
Are you going to have that level of sustained throughput consistently, or will it be in bursts? Do you absolutely have to preserve every single vote, or do you just need summary data? How much will you need to scale - i.e. will you ever get to 20,000 votes per second? 200,000?
These type of questions will help determine the proper architecture.

Executing large numbers of asynchronous IO-bound operations in Rails

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time. So, the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running poller.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kickoff large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine's periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistance/reliable delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller also, though not perhaps at the same scale you guys are. I'll give you my thoughts here, but am also happy to discuss further o the active messaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. STOMP subscriptions, if you do client -> ack in order to prevent losing messages on interrupt, will only get sent a new message on a given connection when the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole rails env per poller, and one poller is one connection. But there is nothing magical about the current poller, I could imagine writing a poller as an event machine client that is implemented to create new connections to the broker and get many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master thread that forks many poller worker threads so as to get the benefit of the reduced memory footprint (much like passenger does), but I think the EM trick could work as well.
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.
http://github.com/defunkt/resque
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging, which although it seems like an awesome package, I suspect may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then spins off the message into the thousands of small requests to SalesForce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already performed, I imagine this should be fine.
If you're using (C)Ruby, try to never combine synchronous and asynchronous stuff in a single process. A process should either do everything via Eventmachine, with no code blocking, or only talk to an Eventmachine process via a message queue.
Also, writing asynchronous code is incredibly useful, but also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
also checkout "cramp" and "beanstalk"
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.

Resources