How to shard existing key-value storage? - scalability

Let's suppose that we have a 3 GB key-value store on server A. I'm starting to feel that we need another server (server B), so I have to split server A's data across shards (server A, server B). However, all keys on server A are currently stored as is (for example, comment_ids:user_id:10).
Question #1: What is the best practice for hashing the current key names and distributing all the data across shards?
Question #2: What is the best practice for adding more servers to the set of shards?
PS: Sorry for my English, but I hope that my question is clear.
Thank you.
PS: I've tagged this question with redis, but it's really not about Redis specifically; it applies to all key-value stores.

Consistent hashing tends to be a good choice: http://en.wikipedia.org/wiki/Consistent_hashing
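To make the idea concrete, here is a minimal sketch of a consistent-hash ring in Ruby. It is only an illustration under stated assumptions: the class name `HashRing`, the `replicas` parameter, and the choice of MD5 for ring positions are all made up for this example and are not taken from any particular key-value store or client library. Existing key names such as comment_ids:user_id:10 do not need to change; the key string is simply hashed to decide which server owns it.

```ruby
require "digest"

# Illustrative consistent-hash ring (hypothetical class, not a library API).
# Each server is placed on the ring at several points ("virtual nodes")
# so that keys spread more evenly across servers.
class HashRing
  def initialize(servers = [], replicas: 100)
    @replicas = replicas
    @ring = {}     # ring position (Integer) => server name
    @sorted = []   # ring positions in ascending order
    servers.each { |server| add(server) }
  end

  # Adds a server by inserting its virtual nodes into the ring.
  def add(server)
    @replicas.times do |i|
      @ring[position("#{server}:#{i}")] = server
    end
    @sorted = @ring.keys.sort
  end

  # Returns the server that owns the key: the first ring point at or
  # after the key's position, wrapping around to the first point.
  def server_for(key)
    return nil if @sorted.empty?
    pos = position(key)
    idx = @sorted.bsearch_index { |point| point >= pos } || 0
    @ring[@sorted[idx]]
  end

  private

  # Maps a string to a 32-bit ring position (MD5 is an assumption here;
  # any well-distributed hash would do).
  def position(str)
    Digest::MD5.hexdigest(str)[0, 8].to_i(16)
  end
end

ring = HashRing.new(["server_a", "server_b"])
ring.server_for("comment_ids:user_id:10") # one of the two servers
```

With this layout, adding a third server only takes over the keys that fall just before its ring points, so only those keys would need to be copied to the new machine; every other key stays where it already is.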

Related

Rails+React/Next.js: how to store code blocks so formatting persists?

tl;dr How should I approach storing code blocks in a React + Rails application? If I were to store the code block data in the Rails backend, which datatype should I store it as? And if on the frontend, would MDX files be the best solution?
I’m building a programming quiz application where a question has many answers and each answer (only one is correct) has an explanation. The question consists of the question itself and a code block, similar to what’s circled in orange in this wireframe.
As I want to practice building Rails+React (Next.js) applications, I thought that the questions would be stored on the backend. However, is that a good idea? If so, I’m wondering about what would be a possible way to store the code snippets given the Rails datatypes?
Alternatively, I was also considering storing all the questions on the frontend. If I choose to do so, would mdx files be the best solution here?
So, to sum up, which of the following solutions would be best here:
Storing code block as markdown files in the frontend
Storing code block data in the backend
Different solution altogether?
I thought that the questions would be stored on the backend. However, is that a good idea?
This depends on whether you want the questions and answers to be user editable. In that case you need to actually store them somewhere. That somewhere would typically be a database which your Rails app communicates with.
If you're using Markdown you can use the :text type in ActiveRecord. The database adapter will map this to a suitable type for that database for storing long strings; the exact details vary per adapter.
Alternatively, I was also considering storing all the questions on the
frontend. If I choose to do so, would mdx files be the best solution
here?
If by "storing the questions on the frontend" you mean putting them into your React project and serving them through that server (or cloud storage), then that is definitely an option if you're building a very simple application where the questions and answers are a developer concern.
TL;DR
You really need to write use cases and goals for your application. If it's a learning application, do you really want to learn more about the server side and writing database applications, or do you want to focus on writing client-side code? The answer to those questions will determine your choices.

Advantages of using key-value store for URL shortener?

I researched a lot of URL shorteners for Rails on the web, and a majority of them delegate persistence to Redis.
Can anyone explain to me the benefits of using a key-value store, like Redis, instead of the database for persisting and accessing short URLs?
Databases like Redis are optimized for storing lots of small values (such as links and their short URLs) because the data is loaded into memory (RAM). This means that when a call is made to Redis, it reads data from RAM (faster) instead of the hard drive (slower).
EDIT:
If you would like to learn more, this is a great writeup of the advantages and shortcomings of the top NoSQL databases. Definitely a great reference.
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
It is much simpler to shard, scale and replicate a key-value store than a SQL database, so it makes sense to use one when the data suits it.

Rails: Caching a Tree in memory on the server

I have a PostgreSQL database which contains multidimensional data. I wrote a data structure that sorts all database rows into a tree format. The database is large, so I don't want to generate the tree every time a request comes in from a browser. What I'd like to do is construct the tree once per time period and keep it in memory on the server.
The tree is read-only, by the way. So each time a request comes in, the tree need not be generated again; it's already there.
How can I make this happen? I'm not an expert programmer, just a beginner and definitely new to web programming, so some of these concepts are new to me.
But if you could please point me in the right direction in terms of the concepts involved here, I can google the rest.
Or if you have actual links or examples that would be fantastic.
Thanks
There are several ways to approach this problem. It depends on just how close to the application you want the variables. If you're really looking to have them right "on top" of the application, for fastest possible use, then you could look at using a global variable "$tree" and hooking in to the application flow. Other options might include memcached, which is still pretty darn close to the application. Redis would be a good option for an in-memory database that could be shared between instances of an application, as it is a NoSQL database that you query. Not quite as close to the application though.
Generally, those are your primary options: in-application variables that survive requests; application frameworks that help variables survive requests and provide you a querying mechanism; or an in-memory database that allows you to store and query rapidly from multiple instances. Each is a viable option, though I'm pretty sure you'd get a lot of 'community' flak for using a straight-up global variable (such practices are considered unclean for their lack of thread-safety and other such concerns).
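As a rough sketch of the first option (an in-application value that survives requests), the following Ruby class keeps a built value in process memory and rebuilds it only after a time limit has passed. The class name `TimedCache`, the `ttl` parameter, and the `build_tree` placeholder are assumptions made up for this illustration; a Mutex guards the rebuild to address the thread-safety concern mentioned above. Note that each server process would hold its own copy of the tree, which is where memcached or Redis become attractive if several instances must share it.

```ruby
# Illustrative in-process cache (hypothetical class, not a Rails API).
# The block passed to `new` builds the value; `fetch` returns the cached
# value and rebuilds it only once `ttl` seconds have elapsed.
class TimedCache
  def initialize(ttl:, &builder)
    @ttl = ttl
    @builder = builder
    @mutex = Mutex.new
    @value = nil
    @built_at = nil
  end

  # `now` is a parameter only so the expiry logic is easy to exercise.
  def fetch(now = Time.now)
    @mutex.synchronize do
      if @built_at.nil? || now - @built_at >= @ttl
        # Shallow freeze as a reminder that the cached tree is read-only.
        @value = @builder.call.freeze
        @built_at = now
      end
      @value
    end
  end
end

# Hypothetical usage; `build_tree` stands in for the code that reads the
# rows from PostgreSQL and assembles the tree.
# TREE = TimedCache.new(ttl: 600) { build_tree }
# TREE.fetch  # builds on the first call, then reuses the result for 10 minutes
```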

Riak vs Amazon SimpleDB

I am looking for an eventually consistent key-value data store and have narrowed the choice down to Amazon SimpleDB and Riak. Can anyone share their experience comparing the two?
Thanks in advance
Fedrick
Riak is a key-value store. The data values you store are opaque to the database, so you have no secondary indexes. But you do have the ability to run map-reduce if your data is JSON (or XML, I think). You can run map-reduce over all data, or just a subset ("seed keys"). It also has a "link walking" feature where documents can refer to other documents, which can be auto-fetched. They don't currently have an incremental map-reduce like CouchDB, which means any secondary (non-key) queries are quite expensive. They have plans to fix this.
SimpleDB is actually halfway between a docstore and a keystore: Each key->item supports multiple attributes, but it only goes one level deep. You can query on your key or your attribute values.
In production, Riak should be pretty "hands-off". If it's slow or getting full, just spin up a new server and tell it to join the cluster (unlike CouchDB or MongoDB, where you have to futz with multiple config files).
SimpleDB can take a pounding (tens of thousands of requests per second I've heard), but you are responsible for data scaling (i.e. don't violate their domain size limits or it will slow down).
I have used SimpleDB for about 6 months now. I am going into production with it. It works well, but I wish it were faster. I perform %like% queries for searching, and I can't seem to get it to dive through more than a few MB a second worth of values. But non-%like% searches are much faster. I get the feeling that it could be sped up if someone at Amazon wrote a few algorithms in good old C, rather than Erlang, but then again I am a C coder.
Also the first few queries on a recently opened Domain will take longer, as the system gets it all read in.
Overall it worked for me, but if I want to scale higher I will have to go with something else.
Also, I think that almost all my use of it will be free - there is a generous allocation of space, etc.
Make sure you plan on the fact that SimpleDB currently has no 'read only' access modes, etc. Any user that can use it can edit it.
--Tom

Rails: Storing binary files in database [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Using Rails, is there a reason why I should store attachments (which could be files of any type) in the filesystem instead of in the database? The database seems simpler to me: no need to worry about filesystem paths, structure, etc., you just look in your blob field. But so many people seem to use the filesystem that I'm left guessing there must be some benefits to doing so that I'm not getting, or some disadvantages to using the database for such storage. (In this case, I'm using Postgres.)
This is a pretty standard design question, and there isn't really a "one true answer".
The rule of thumb I typically follow is "data goes in databases, files go in files".
Some of the considerations to keep in mind:
1. If a file is stored in the database, how are you going to serve it out via HTTP? Remember, you need to set the content type, filename, etc. If it's a file on the filesystem, the web server takes care of all that stuff for you. Very quickly and efficiently (perhaps even in kernel space), no interpreted code needed.
2. Files are typically big. Big databases are certainly viable, but they are slow and inconvenient to back up etc. Why make your database huge when you don't have to?
3. Much like 2., it's really easy to copy files to multiple machines. Say you're running a cluster, you can just periodically rsync the filesystem from your master machine to your slaves and use standard static HTTP serving. Obviously databases can be clustered as well, it's just not necessarily as intuitive.
4. On the flip side of 3, if you're already clustering your database, then having to deal with clustered files in addition is administrative complexity. This would be a reason to consider storing files in the DB, I'd say.
5. Blob data in databases is typically opaque. You can't filter it, sort by it, or group by it. That lessens the value of storing it in the database.
6. On the flip side, databases understand concurrency. You can use your standard model of transaction isolation to ensure that two clients don't try to edit the same file at the same time. This might be nice. Not to say you couldn't use lockfiles, but now you've got two things to understand instead of one.
7. Accessibility. Files in a filesystem can be opened with regular tools. Vi, Photoshop, Word, whatever you need. This can be convenient. How are you gonna open that Word document out of a blob field?
8. Permissions. Filesystems have permissions, and they can be a pain in the rear. Conversely, they might be useful to your application. Permissions will really bite you if you're taking advantage of 7, because it's almost guaranteed that your web server runs with different permissions than your applications.
9. Caching (from sarah mei below). This plays into the HTTP question above on the client side (are you going to remember to set lifetimes correctly?). On the server side, files on a filesystem are a very well-understood and optimized access pattern. Large blob fields may or may not be optimized well by your database, and you're almost guaranteed to have an additional network trip from the database to the web server as well.
In short, people tend to use filesystems for files because they support file-like idioms the best. There's no reason you have to do it though, and filesystems are becoming more and more like databases so it wouldn't surprise me at all to see a complete convergence eventually.
There's some good advice about using the filesystem for files, but here's something else to think about. If you are storing sensitive or secure files/attachments, using the DB really is the only way to go. I have built apps where the data can't be put out on a file. It has to be put into the DB for security reasons. You can't leave it in a file system for a user on the server/machine to look at or take with them without proper security. Using a high-class DB like Oracle, you can lock that data down very tightly and ensure that only appropriate users have access to that data.
But the other points made are very valid. If you're simply doing things like avatar images or non-sensitive info, the filesystem is generally faster and more convenient for most plugin systems.
The DB is pretty easy to setup for sending files back; it's a little bit more work, but just a few minutes if you know what you're doing. So yes, the filesystem is the better way to go overall, IMO, but the DB is the only viable choice when security or sensitive data is a major concern.
I don't see what the problem with blobstores is. You can always reconstruct a file system store from it, e.g. by caching the stuff to the local web server while the system is being used.
But the authoritative store should always be the database. Which means you can deploy your application by tossing in the database and exporting the code from source control. Done.
And adding a web server is no issue at all.
Erik's answer is great. I will also add that if you want to do any caching, it's much easier and more straightforward to cache static files than to cache database contents.
If you use a plugin such as Paperclip, you don't have to worry about anything either. There's this thing called the filesystem, which is where files should go. Just because it is a bit harder doesn't mean you should put your files in the wrong place. And with Paperclip (or other similar plugins) it isn't hard. So, gogo filesystem!
Unable to find an up-to-date answer to this question, I have implemented a database service for Active Storage (available since Rails 5.2) that works just like any other Active Storage service, but stores file content in a special database column instead of a cloud service.
The implementation is based on a standard Rails Active Storage service, adding a migration with a new model: an extra table that stores blob contents in a binary field. The service creates and destroys records in this table as requested by Active Storage.
Therefore, this service, once installed, can be consumed via a standard Rails Active Storage API.
https://github.com/TitovDigital/activestorage-database-service
Please be aware of all pros and cons of using a database for storing files.
With the right database it will provide full ACID support and can wrap file storage and deletion into transactions. It is also much easier in DevOps as there is one less service to configure.
Large files or large traffic are the risky cases. Either will put an unnecessary strain on the app and database servers.
