I'm looking into using Heroku as our platform instead of managing our own systems. We have a Ruby/Rails stack and use Resque as our background job processor. I'm evaluating addons such as RedisToGo and RedisGreen, but it looks like there's no secure transport layer for all the services. However, according to RedisGreen's FAQ it doesn't matter:
Do you offer an encrypted connection to your servers?
No. Most organizations working in EC2 or Heroku treat Amazon’s internal network as a “trusted” one, so transport-level security doesn’t make much sense. We recommend against transferring data that should be secure over the open Internet.
As an ops guy, it makes me feel a bit queasy to have unencrypted data transfers. On the other hand, they make a good point. If Amazon is considered a trusted internal network, then we wouldn't have to worry about third parties trying to sniff our traffic.
So my question: is it safe to use these add-ons if I'm on the Heroku/EC2 ecosystem?
I've used AWS for years without any problems and most AWS users don't seem to be malicious. Also, Amazon has a comprehensive monitoring solution for their infrastructure. For example, they would be able to tell if another customer is trying to hack into your server in a few minutes if not seconds. I believe AWS also doesn't allow promiscuous mode on their Virtual/Physical networking infrastructure.
However, you have to also see how secure you want to be about your data. If you want 100% security that no other user is going to sniff your data then encrypt your connections/data transfers. Although unlikely, other AWS users could potentially sniff the data if they are sharing the same ethernet segment.
The current recommendation is to use a secure proxy with Redis if you want SSL encryption of your traffic (see the debate at https://code.google.com/p/redis/issues/detail?id=71 for example). AFAIK, only Redis Cloud offers that functionality among Heroku's existing Redis providers.
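A minimal sketch of what the client side of such a setup could look like, using only the Python standard library. It assumes a TLS-terminating proxy (stunnel, for instance) sits in front of Redis and presents a certificate the client can verify; the host name and port in the usage comment are hypothetical placeholders.

```python
import socket
import ssl


def encode_command(*parts: str) -> bytes:
    """Encode a Redis command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def ping_over_tls(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Send PING through the TLS proxy and return the raw reply bytes."""
    context = ssl.create_default_context()  # verifies the proxy's certificate
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(encode_command("PING"))
            return tls.recv(64)


# Hypothetical usage (replace with the address of your own proxy):
#   ping_over_tls("redis-proxy.example.com", 6380)
```

Redis itself never sees TLS here; the proxy decrypts and forwards plain traffic to it, so the proxy has to run somewhere you trust the last hop.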
As for whether security is a requirement for Heroku apps and their add-ons over AWS, that really has to do with the nature of your data and the risk of it being read by a potentially malicious party. Just remember that even a very low risk is still a risk and no security mechanism is unbreakable, so it's basically a matter of how much you're willing to invest to make it harder for someone to mess with your stuff.
(Full disclosure - I work at Garantia Data, the company operating Redis Cloud and Memcached Cloud.)
I consider this highly dangerous. Heroku, for example, suggests running apps locally for development and copying the config to do so:
https://devcenter.heroku.com/articles/heroku-local
What that implies is that your laptop will then do an unencrypted connection - potentially through public wifis and definitely through the public internet - to your Redis To Go instance. As such, whether AWS will allow sniffing on their network or not is then completely irrelevant.
I'm designing some OSX/iOS apps that I'd like to share a resource to be hosted on a webserver. I would like to have some sort of web app or script that can store a list of subscribers, and to notify them when the resource is updated. (The obvious goal here is to avoid having every app poll the webserver for updates.)
The only trick here is that I'd like a significant number of clients (say, a dozen) to be subscribed for updates on a 24/7 basis. I'm not sure it's a good idea for all of the clients to maintain a live connection... I imagine that not many web service providers will be happy about their webserver maintaining a dozen persistent connections (especially if they're virtually always idle).
(Edit) I looked into the Apple Push Notification service (APNs), but it's not the right solution for my problem. APNs requires an Entrust SSL certificate and some heavy interaction with Apple's push service. My project is much simpler and more lightweight: I just need a script that says, "Upon receiving data from Device A, push it out to Devices B/C/D" (presuming those devices are somehow accessible... either through a persistent connection or some other technique).
What's the absolute simplest way of providing this mechanism?
The "simplest way" probably means different things to different people. If you're not a fan of locking yourself into third party services then there's a veritable plethora of app frameworks and open source tools you could use to build something yourself. But this is hardly 'simple' if web app development isn't your strong point.
There are several 'off the shelf' services available to do real-time messaging on iOS. Bear in mind I'm just listing the ones I know from memory; there are other alternatives. Pusher and PubNub both offer real-time messaging services for mobile apps, along with ready-to-go SDKs. You can interface with them to send messages bi-directionally via sockets (so similar to how APNs works, but with considerably more control).
You could use these services with your own device/user management system, or you could use a 'backend as a service' provider such as Parse or Stackmob - you may not need this step, it depends how complex your intended app/integration is.
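To make the core of the question concrete ("upon receiving data from Device A, push it out to Devices B/C/D"), here is a minimal in-process sketch of that fan-out logic. The `Hub` class and its method names are made up for illustration and aren't part of Pusher, PubNub, or any other service mentioned; a hosted service does roughly the same thing with network connections in place of callbacks.

```python
from typing import Callable, Dict

# A subscriber callback receives (sender_id, payload).
Callback = Callable[[str, bytes], None]


class Hub:
    """Relays each update to every subscriber except the one that sent it."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, Callback] = {}

    def subscribe(self, device_id: str, callback: Callback) -> None:
        self._subscribers[device_id] = callback

    def unsubscribe(self, device_id: str) -> None:
        self._subscribers.pop(device_id, None)

    def publish(self, sender_id: str, payload: bytes) -> int:
        """Deliver payload to all other devices; return how many received it."""
        delivered = 0
        # Iterate over a copy so callbacks may subscribe/unsubscribe safely.
        for device_id, callback in list(self._subscribers.items()):
            if device_id != sender_id:
                callback(sender_id, payload)
                delivered += 1
        return delivered
```

The hard part in practice is not this loop but keeping a reachable channel to each device, which is what the services above (or a persistent connection you manage yourself) provide.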
XMPPFramework has a publish–subscribe module (for XEP-0060) which works with most XMPP servers. I've even adapted it to work with Chat Server which comes with Snow Leopard.
If you already have an XMPP server this might be worth doing; otherwise it's kind of a heavyweight solution.
I'm developing an application in erlang/elixir. I'd like to access Couchbase 2.0 from erlang. I found the erlmc project (https://github.com/JacobVorreuter/erlmc ) which is a binary protocol memcached client. The notes say "you must have a version 1.3 or greater of memcached."
I understand that Couchbase 2.0 uses memcached binary protocol for accessing data, and I'm looking for the best way to do this from erlang.
The manual talks about a "Couchbase API Port" on 8092, and calls the 11210 (close to the 11211 memcached normal port) as "internal cluster port".
http://www.couchbase.com/docs/couchbase-manual-2.0/couchbase-network-ports.html
So, the question is this:
Is setting up erlmc to talk to Couchbase 2.0 on port 8092 the correct way to go about it?
Erlmc talks about how it hashes keys to find the right server, which makes me think that it might be too old of a version of the memcached protocol (or is there a built in MOXI on couchbase 2.0 that I should be connecting to? If so which port?)
Which is the port for the erlang views? And presumably the REST interface for views does not support straight key lookups, so I'll need to write code to access that as well, right?
I'm keen to use a pure erlang solution since NIFs are not concurrent and I'll have some unknown number of processes wanting to access Couchbase 2.0 at the same time.
The last time I worked with Couch was CouchDB, and so I'm trying to piece things together after the merger of Couch and Membase.
If I'm off on the wrong track, please advise on the best way to access Couchbase 2.0 from erlang in a highly concurrent manner. The memcached protocol should be pretty solid, so libraries that are a couple of years old should still work, right?
Thanks!
The short answer is: yes, Couchbase is compatible with memcached text protocol.
But the key point here is "memcached text protocol". Since memcached is using two different protocol types (text and binary), you should use those clients that are using text protocol.
At Mochi, we are using merle for memcached, and it looks like it should work for you. Recently, one of my colleagues forked it and made some minor corrections: https://github.com/twonds/merle
Also, consider taking a look at https://github.com/EchoTeam/mcd. This client could use some refactoring, but is also production proven and even allows simple sharding.
Thanks to Xavier's contributions, I refactored the whole thing and added pooling; it now builds and performs okay. I also included a basho_bench driver so you can benchmark it yourself. You can find the code here. I am pretty sure this would perform better than the text protocol.
I had to create my own vbucket-aware, erlmc-based Erlang Couchbase client.
The differences:
- http connection to retrieve vbucket map from couchbase
- fill two "reserved" bytes with vbucket id (see python client for example)
- active once async tcp connection for performance reason
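For the second point, a rough sketch (in Python rather than Erlang, purely for illustration) of a memcached binary-protocol GET request with the vbucket id written into the two header bytes that plain memcached leaves reserved. The function name is made up, and the vbucket id is taken as a parameter here; in a real client it would be looked up from the vbucket map fetched over HTTP.

```python
import struct

# Memcached binary protocol request header: 24 bytes, network byte order.
# Fields: magic, opcode, key length, extras length, data type,
# vbucket id (the "reserved" field in plain memcached),
# total body length, opaque, CAS.
HEADER = struct.Struct(">BBHBBHIIQ")
MAGIC_REQUEST = 0x80
OPCODE_GET = 0x00


def build_get_request(key: bytes, vbucket_id: int, opaque: int = 0) -> bytes:
    """Build a binary GET request carrying the given vbucket id."""
    header = HEADER.pack(
        MAGIC_REQUEST,
        OPCODE_GET,
        len(key),    # key length
        0,           # extras length (GET has no extras)
        0,           # data type (raw bytes)
        vbucket_id,  # fills the two "reserved" bytes
        len(key),    # total body length = extras + key + value
        opaque,
        0,           # CAS is unused for GET
    )
    return header + key
```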
The only answer I have so far is:
https://github.com/chitika/cberl
This project is based on the C++ "official" couchbase client.
It seems to have two possible problems:
1) it might be abandoned (last activity was 3 months ago)
2) it uses an NIF, which as I understand it, cannot be accessed concurrently.
We don't use Couchbase with Erlang, but with Python, which also needs to connect with a memcache client. I can't speak to the Erlang libraries specifically, but hopefully the lessons apply in both situations.
Memcache Client Limitations
Memcache clients can only access memcache functionality. You won't be able to use views or any other features not specified in the memcache protocol. If you want access to the views, you will need to use the REST protocol separately on port 8092 (docs).
Connecting to Couchbase with Vanilla Memcache Clients
The ports mentioned on that page are used either internally or by "smart" clients written for Couchbase specifically. By default, memcache clients can connect to the normal memcache port 11211 on any of the nodes in your Couchbase cluster. Do not use the memcache cluster features of any memcache client not written specifically for Couchbase; the usual methods of distribution for vanilla memcached are incompatible with Couchbase.
Explanation
In order to connect with the memcached client, you need to connect to the port for the Couchbase bucket directly. When you set up a new bucket, you specify the port you want the bucket to be accessible on. The default bucket is set up on port 11211. Each bucket acts like an independent memcached instance, but is internally distributed to all nodes in the cluster. You can connect to the bucket port on any of the Couchbase servers, and you will be accessing the same data set.
This means that you should not try to use the distributed memcache features of your memcache client. Those features are designed for ad-hoc memcached clusters. Just connect to the appropriate port on the Couchbase server as if it was a single memcached server.
The reason this is possible is because there is a Moxi instance which finds the appropriate Couchbase server to process the request. This Moxi instance automatically runs for each bucket on every Couchbase server. Even though you may not be connected to the node which has your specific key, Moxi will transparently direct your request to the appropriate server.
In this way, you can use a vanilla Memcache client to talk to Couchbase, without needing any additional logic to keep track of cluster topology. Moxi takes care of that piece for you.
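As a rough illustration of "treat it as a single memcached server", here is a minimal sketch that speaks the memcached text protocol over a plain socket to one node. The helper names are made up for this example, and it only handles a single-key `get`; a real client library does the same thing with more care.

```python
import socket
from typing import Optional


def build_get(key: str) -> bytes:
    """Format a text-protocol `get` command for one key."""
    return b"get " + key.encode("ascii") + b"\r\n"


def parse_get_response(data: bytes) -> Optional[bytes]:
    """Return the value from a single-key `get` reply, or None on a miss."""
    if data.startswith(b"END\r\n"):
        return None
    header, _, rest = data.partition(b"\r\n")
    # Header looks like: VALUE <key> <flags> <bytes>
    fields = header.split()
    if len(fields) < 4 or fields[0] != b"VALUE":
        raise ValueError("unexpected reply: %r" % header)
    length = int(fields[3])
    return rest[:length]


def fetch(host: str, key: str, port: int = 11211) -> Optional[bytes]:
    """Connect to one node's bucket port and fetch a key; no client-side sharding."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_get(key))
        reply = b""
        while not reply.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            reply += chunk
        return parse_get_response(reply)
```

Note there is exactly one host in play: any node in the cluster will do, and the key-to-node routing is left to the server side.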
Binary protocol
We did have the binary protocol working at one point, but there were problems when we tried to use the flush_all command. That was a while ago, though. I suggest experimenting yourself to see if the level of support meets your needs.
I have a TCP/IP-based component which communicates with a C++-based system. It reads raw bytes from that system, marshals those raw bytes into objects, and stores them in the DB. This multi-threaded TCP/IP component is written in Java and could be deployed on a dual-core or quad-core processor (not sure if it's important for my question, but it's a detail I'm giving nevertheless). Now I have a few questions:
How can I scale this tcp/ip based component. This component is deployed on a server and is listening to a port. In future if there's more data that is envisaged at this point that comes from the C++ system we should be able to scale this java component.
What about security. One thing which I can probably do is employ this communication on secure sockets or probably get encrypted data (any particular encryption that I could use here??). Any other way to take care of security?
There is also a requirement of high availability to be satisfied. How do I handle that? How could I possibly have redundancy here?
Yes, we are working on the system architecture of a product and therefore, I was wondering if some experienced architect or designer could help me.
How can I scale this tcp/ip based component. This component is deployed on a server and is listening to a port. In future if there's more data that is envisaged at this point that comes from the C++ system we should be able to scale this java component.
You normally use a network load-balancer to scale these kinds of services across multiple servers. That load-balancer can distribute load using a variety of algorithms, such as:
CPU load (usually measured with snmp)
Client ip address (if you need persistence when mapping clients to your services)
Number of active sockets
etc
Look at HAProxy for a popular open-source load-balancer. F5 has the most popular commercial load-balancer solution.
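To give a feel for two of the strategies listed above, here is a small sketch of the selection logic only. The function names and backend addresses are made up for illustration; this is not how HAProxy or F5 implement it internally.

```python
import hashlib
from typing import Dict, List


def pick_by_client_ip(client_ip: str, backends: List[str]) -> str:
    """Map a client IP to a fixed backend so the same client keeps the same server."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def pick_least_connections(active_sockets: Dict[str, int]) -> str:
    """Pick the backend that currently has the fewest active sockets."""
    return min(active_sockets, key=lambda name: active_sockets[name])
```

The first gives persistence as long as the backend list doesn't change; the second adapts to load but gives no persistence, which matters if your TCP sessions carry state.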
What about security. One thing which I can probably do is employ this communication on secure sockets or probably get encrypted data (any particular encryption that I could use here??). Any other way to take care of security?
As mentioned, SSL is an option, but understand that is a big performance hit on your services if you encrypt on the same hardware that is performing your customer services. One option along these lines is using a commercial load-balancer that implements SSL in hardware; that load-balancer would then forward unencrypted sockets to your TCP services farm.
Under some circumstances you can use IPSec network-level encryption; often, this is another network hardware solution. Typically your clients will download an IPSec application that resides on their PC; they then make a connection to your IPSec server, which encrypts traffic between the client and your IPSec termination point.
SSH Tunneling with port-forwarding (low-tech solution)
tcpcrypt looks interesting as a future technology, but I'm not sure how mature it is right now.
There is also a requirement of high availability to be satisfied. How do I handle that? How could I possibly have redundancy here?
A lot depends on what you mean by high availability, and what kind of recovery timing you need. At a high level, you have a few options:
DNS-based HA works if you don't need client to socket mapping persistence; if you use DNS, you need to be willing to accept typical DNS A-record timeouts (usually people don't go lower than ~5 minutes / 300 seconds). This also assumes you find a way to synchronize your databases across multiple sites.
Load-balancer solutions. Same issue with synchronizing back-end databases
To do any kind of HA, you probably want to hire a consultant that has a proven track record of implementing these services (if you don't have this kind of resource in-house).
I am looking for a dedicated server because shared webhosting solutions have some limitations.
I am going to start with one application (web server + db), but in the future I will need more resources for more applications. I am starting small, so the price is very important right now; the quality is more important, though.
The requirements are like (not sure what I forgot)
scalable hw resources (memory, hdd, bandwidth)
linux/unix based
able to install programs
ssh
ssl/https
backup solution?
unlimited number of outgoing emails
'simple scripts' ?
server user management
Update
Does the location of the server matter, given that I want to target 'visitors' worldwide?
Well, I don't know where you are from or whether it matters to you where the server is. But I am very happy with Swiss-based hostfactory (I host some e-commerce solutions there). The support team reacts very quickly and you get full control of the server (RDP access on Windows, shell access on Linux).
Check it out here: hostfactory
Hardware resources are scalable via the web interface.
Yes - location matters. If you are going with just one server location, you need to make your best guess as to where most of your visitors are going to come from.
The plumbing of the internet tends to be US centric, so if you are not sure, and have no legal restrictions on where your data can live, that may be your best (and often cheapest) option.
I went for linode
I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? Implementation is Rails with mysql.
This is all done with the outer web server, the one that listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx can limit requests to, e.g., 50 req/minute per IP; anything over the limit gets a 503 page, which you can customize.
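To show the idea behind a per-IP limit, here is a simplified fixed-window counter in Python. This is only an illustration of the concept, not how nginx does it internally (its limiter uses a different algorithm), and the class and method names are made up.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per client IP in each time window."""

    def __init__(
        self,
        limit: int = 50,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        # client IP -> (window index, requests seen in that window)
        self._windows: Dict[str, Tuple[int, int]] = {}

    def status_for(self, client_ip: str) -> int:
        """Count this request and return the HTTP status to answer with."""
        window = int(self.clock() // self.window_seconds)
        seen_window, count = self._windows.get(client_ip, (window, 0))
        if seen_window != window:
            count = 0  # a new window has started, so reset the counter
        count += 1
        self._windows[client_ip] = (window, count)
        return 200 if count <= self.limit else 503
```

The point of doing this in the front web server rather than in Rails is that rejected requests never reach the app at all.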
Regarding an expected temporary outage, in the Rails world this is done via a special maintenance.html page. There is some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on the file's presence, but on the actual availability of the app server.
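A tiny sketch of "check actual availability instead of a marker file": try to open a TCP connection to the app server and answer 503 if that fails. The function names are hypothetical, and in practice the balancer's own health checks do this for you.

```python
import socket


def app_server_is_up(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_for_request(host: str, port: int) -> int:
    """Serve normally when the app server is reachable, else report 503."""
    return 200 if app_server_is_up(host, port) else 503
```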
But really, you can start and stop services without losing any connections at all. For example, you can run a separate instance of the app server on a different UNIX socket or IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance, and all clients are served by the new one only. No connections are lost. Of course, this scenario is not always possible; it depends on the type of change you introduced in the new version.
haproxy is a balancer-only solution. It can extremely efficiently balance requests to app servers in your farm.
For a fairly big service you end up with something like:
api.domain resolving round-robin to N balancers
each balancer proxies requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For a fairly small service (under 2K rps), all balancing is done inside one or two web servers.
Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise high traffic will still have the effect of killing your app. One good solution is to implement it as part of your Apache proxy, with something like mod_evasive.