How do large websites like Facebook distribute load? [closed] - scalability

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
This may be better suited to Server Fault, but it seems more of a programming challenge to me. I could be wrong.
I was thinking about how Facebook does what it does. It has over 500 million active users. How do they manage to serve all of those users? Is there one gigantic database holding a record for every single user so that whenever someone logs in, authentication is checked against that central machine? I'm pretty ignorant about this topic, but I can see that a solution like that is simply not scalable. There will come a point where that central server just can't handle everything.
Instead, say that central database is split up into 100 databases so that the load is split across all of them evenly. That must be what Facebook does, but how do they know which user record to store on what machine? Is there a record stored in every single machine and when you log in, a random user machine is used for authentication? That would mean every time someone registers or changes their password, the changes have to be propagated across all 100 servers.
One other solution comes to mind. Maybe they have some way of hashing a user's email address to a specific user database. Then all that would have to be known by the web servers is that hashing algorithm. But this solution brings up its own problem I think. What if you want to add more user database machines? Do you change the hashing algorithm to take into account 101 user databases instead of 100? Would you start moving user records around so the 101 user databases have the same number of user records? No, that seems ridiculous as well.
Anyway, as you can see, I don't know much about how to solve these problems. Does anyone have some recommended reading about this topic?

A good starting point might be to take a look at Cassandra (lecture notes), the distributed database that powers FB's inbox search.
Here's more about FB's nuts and bolts.
You might also find some gems in the FB developer news.
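On the resharding worry in the question (going from 100 to 101 user databases): a common answer is consistent hashing, the partitioning idea that Dynamo-style stores such as Cassandra build on. Here is a rough Ruby sketch, not Facebook's actual code. `HashRing`, the `userdb-N` node names, and the choice of MD5 with 100 virtual points per node are all assumptions made up for illustration.

```ruby
require "digest"

# Illustrative consistent-hash ring. HashRing and the node names are
# hypothetical; this is not how any particular site implements sharding.
class HashRing
  def initialize(nodes, points_per_node: 100)
    @points_per_node = points_per_node
    @ring = {}        # ring position (Integer) => node name
    @positions = []   # sorted ring positions, used for binary search
    nodes.each { |node| add(node) }
  end

  # Give the node several virtual points on the ring so load spreads evenly.
  def add(node)
    @points_per_node.times { |i| @ring[position("#{node}:#{i}")] = node }
    @positions = @ring.keys.sort
  end

  # A key belongs to the first point at or after its own position,
  # wrapping around to the start of the ring.
  def node_for(key)
    pos = position(key)
    index = @positions.bsearch_index { |point| point >= pos } || 0
    @ring[@positions[index]]
  end

  private

  # Map any string to a 32-bit position on the ring.
  def position(string)
    Digest::MD5.hexdigest(string)[0, 8].to_i(16)
  end
end

emails = (1..10_000).map { |i| "user#{i}@example.com" }
ring = HashRing.new((1..100).map { |i| "userdb-#{i}" })
before = emails.to_h { |email| [email, ring.node_for(email)] }

ring.add("userdb-101")
moved = emails.reject { |email| ring.node_for(email) == before[email] }
puts "records that changed database: #{moved.size} of #{emails.size}"
```

Only keys that now fall on the new node's points move, roughly 1/101 of them on average, and the web servers still only need to know the hashing scheme and the node list. With plain `hash % 101` in place of `hash % 100`, almost every record would map to a different database.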

Related

What is the benefit of storing data in databases like SQL? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
This is a very elementary question, but why does a framework like Rails use ActiveRecord to run SQL commands to get data from a DB? I heard that you can cache data on the Rails server itself, so why not just store all data on the server instead of in the DB? Is it because space on the server is a lot more expensive/valuable than on the DB? If so, why is that? Or could the reason be that you want an ORM in the DB and that just takes too much code to set up on the Rails server? Sorry if this question sounds dumb, but I don't know where else I can go for an answer.
What if some other program or person wants to access this data and for some reason cannot use your Rails application? What if, in the future, you decide to stop using Rails and go with some other technology for the front end but want to keep the data? In these cases having a separate database helps. Also, could you run complex join queries on data cached on the Rails server?
Centralized databases hold substantial advantages over other types of databases. Some of them are listed below:
Data integrity is maximised and data redundancy is minimised, as the single storing place of all the data also implies that a given set of data only has one primary record. This aids in keeping data as accurate and as consistent as possible and enhances data reliability.
Generally stronger data security, as the single data storage location implies only one possible place from which the database can be attacked and sets of data can be stolen or tampered with.
Better data preservation than other types of databases due to an often-included fault-tolerant setup.
Easier for end users to use due to the simplicity of having a single database design.
Generally easier data portability and database administration.
More cost effective than other types of database systems, as labour, power supply and maintenance costs are all minimised.
Data kept in the same location is easier to change, re-organise, mirror, or analyse.
All the information can be accessed at the same time from the same location.
Updates to any given set of data are immediately received by every end-user.

How to secure Rails app with several companies sharing application and databases [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Environment: Ruby 2.0.0, Rails 4.1, Windows 8.1, Devise, CanCan, Rolify all on PostgreSQL.
I am building an app that will have multiple companies sharing it. Each company will have Devise admins who manage its users, and each company will have its own data. All of this is planned to share tables, isolated by company id within those tables. The app is currently working with user management with no problems. Each admin sees and interacts with only their company's users. I am about to build the MVC for the main application.
I want to take a reality check at this point. How exposed will one company be to another? What exposures will exist and how do I mitigate them? Is there another gem out there that will help me implement this? Or is this just a bad enough idea that I should isolate each company to its own image?
Properly isolating customers from each other is harder than it seems. It's not just a one-time event; you will have to keep it in mind and continue to deal with it as you grow. And data segregation is just one part of the problem. All of your resources (servers, databases, caches, background workers, etc.) are contended for by your customers, and the actions of one customer can have an impact on your app's performance for others.
Definitely do your research on multi-tenancy techniques, but I would suggest you ultimately settle on wrapping a simple solution in an abstraction that is seamless to the rest of the app. Something like:
for_customer(1) do
  # This should return only the models visible to customer 1
  # regardless of where they live or however they are partitioned.
  MyModel.all
end
For the web case, that code can wrap controller actions via an around filter. Don't worry about implementing it crudely now; that's why you have an abstraction and partitioning code that lives in one place. As things change and you encounter problems and/or deficiencies, improve the implementation and deploy.
I work at a SaaS company with several hundred customers all getting real traffic, and there was no way we could have foreseen all of the issues we'd eventually run into in keeping customers isolated from one another. Things like Passenger not correctly clearing memcached connections across process forks at startup during a seamless deployment. Or code that would correctly ensure db connections weren't shared across Resque worker process forks suddenly becoming inadequate after an ActiveRecord upgrade.
Don't try to figure everything out now, just make sure this code lives in one spot and that if it changes, it's not going to have a cascade effect on the rest of your app. Because you know it's going to need to change.
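To make the `for_customer` idea concrete, here is a minimal sketch in plain Ruby. Everything in it is an assumption for illustration: the `Tenant` module, the `:tenant_customer_id` key, and the in-memory `ROWS` standing in for a real table. In a Rails app the filtering would live in your model scopes instead, and the block would wrap controller actions from an around filter.

```ruby
# Hypothetical tenant-scoping sketch; Tenant and the storage are made up.
module Tenant
  KEY = :tenant_customer_id

  # The customer the current thread is acting for, or nil outside a scope.
  def self.current
    Thread.current[KEY]
  end

  # Run the block scoped to one customer, restoring the previous scope
  # afterwards even if the block raises.
  def self.for_customer(customer_id)
    previous = current
    Thread.current[KEY] = customer_id
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

class MyModel
  Record = Struct.new(:id, :customer_id, :name)

  # Stand-in for a shared table partitioned by customer id.
  ROWS = [
    Record.new(1, 1, "invoice A"),
    Record.new(2, 2, "invoice B"),
    Record.new(3, 1, "invoice C")
  ].freeze

  # Fail closed: refuse to return anything when no customer scope is set.
  def self.all
    customer_id = Tenant.current
    raise "no customer scope set" if customer_id.nil?
    ROWS.select { |row| row.customer_id == customer_id }
  end
end

# Convenience wrapper so call sites read like the snippet in the answer.
def for_customer(customer_id, &block)
  Tenant.for_customer(customer_id, &block)
end

visible = for_customer(1) { MyModel.all }
puts visible.map(&:name).inspect
```

The one design choice worth keeping even in a crude first version is that `MyModel.all` raises when no scope is set, so a forgotten `for_customer` shows up as an error, not as a data leak between customers.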

is there any optimized way to avoid loops? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
Hi, I am implementing an email client application. It is a web-based application, and I have mail credentials for xxxxxx users. I need to sync mail for all of them without using any looping concept: anything that changes on the mail server should be reflected immediately. IMAP IDLE does not seem helpful, since it appears to report only new email arrival, and I want to get each and every change from the mail server. Is there an alternative way to do this without any looping? I need a generalized approach, because some things are supported in Gmail but not in Yahoo.
The optimized way to avoid loops is called IDLE. That's why it was specified and implemented. If there had been a way to do what you want without IDLE, then IDLE would not have been necessary in the first place.
Your questions show a lack of familiarity with a protocol, and a weird entitlement to features that do not exist. I suspect very much you have bitten off way more than you can chew.
Read RFC 3501 cover to cover. This is the IMAP4rev1 specification. This is all you can expect, in general, from general servers on the internet, and no more. And sometimes, not even that much*. IMAP4 is fundamentally a single-folder protocol. You can only get information and messages for one folder at a time. It is up to you to interrogate the server for what information you want; it is not obligated to tell you anything you did not specifically ask for.
Writing real, full-featured, broadly-compatible email clients is difficult. Start by writing a client that can synchronize one user and one folder, on baseline IMAP. Implement extensions to make this more efficient when available: IDLE, CONDSTORE, etc. Then move on to many folders, and many users.
Also understand that this dream of thousands of users is going to run into serious logistical issues. No server on the internet is going to allow you to log in hundreds of times without having gotten an agreement ahead of time. You will cause your users extreme headaches when they get mysteriously locked out of their accounts for logging in from 'weird' places. In the case of Google, this can be mitigated with OAuth, but that will only take you so far before they become interested in what you're doing.
Also, good reading is RFC 2683: IMAP Implementation Recommendations.
*: There are many servers that do not properly support SEARCH, several that do not send UIDNEXT on SELECT, Yahoo won't allow you to keep a persistent connection or give any updates at all without reconnecting, etc.
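As a starting point for the "one user, one folder" client, here is a hedged sketch using Ruby's `net/imap` library. The host, credentials, and the `sync_action_for` mapping with its action symbols are placeholders made up for this example. During IDLE a server may push untagged responses such as EXISTS and EXPUNGE, and often FETCH for flag changes, for the selected folder only, though as the footnote says, not every server behaves.

```ruby
# Map the name of an untagged response pushed during IDLE to a sync step.
# The action symbols are hypothetical names for this sketch.
def sync_action_for(response_name)
  case response_name
  when "EXISTS"  then :fetch_new_messages    # message count changed
  when "EXPUNGE" then :drop_removed_message  # a message was removed
  when "FETCH"   then :refresh_flags         # flags changed on a message
  else :ignore
  end
end

# Block until the server reports one relevant change in the selected
# folder, then return the action to take.
def wait_for_change(imap)
  action = :ignore
  imap.idle do |response|
    next unless response.is_a?(Net::IMAP::UntaggedResponse)
    action = sync_action_for(response.name)
    imap.idle_done unless action == :ignore
  end
  action
end

# Watch one user's INBOX and yield each action to the caller's block.
def watch_inbox(host:, user:, password:)
  require "net/imap" # loaded here so the helpers above need no dependencies
  imap = Net::IMAP.new(host, port: 993, ssl: true)
  imap.login(user, password)
  raise "server does not advertise IDLE" unless imap.capability.include?("IDLE")
  imap.select("INBOX")
  loop { yield wait_for_change(imap) }
ensure
  imap&.disconnect
end
```

There is still a loop, but it blocks on the server instead of polling. A real client would also re-issue IDLE periodically (RFC 2177 suggests at least every 29 minutes), reconnect on errors, and fall back to polling where IDLE is missing.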

When creating an end to end iOS app, should the front end or back end be created first? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've been a developer for 10 years (non iOS) and, working for a large company, have never created applications end to end. I've just worked on pieces of very large applications.
I'm starting to get into iOS for fun, and have an app in my head that I want to create. I've wireframed the entire thing using the iOS app 'interface'. Since then, I've begun to start coding. I have about 15 scenes in storyboard (the total app will probably be 100+), and right now I'm just using hardcoded 'fake' data.
However, I've recently begun to think that maybe I should be creating the database and some initial data there instead of using all this hardcoded fake data.
Does anyone have any suggestions and reasons why one way is better than another?
Should I create the back end before the front end? If I do, then each new scene I add I can work the real data in from the beginning instead of having to replace fake hard coded data.
Also, I know little about creating back ends. The application I'm creating is nothing like Twitter, but for data access, and for this example, let's say it is. Its main view is something like Twitter's: the user can hit refresh and get many new data points ('tweets' in Twitter) from the server. So the application could be very data intensive. Am I best off using something like Parse and paying for their services, creating something in LAMP, or something else? I've worked with SQL and databases a lot in my last 10 years and am very comfortable with that aspect of the back end.
Thoughts? Suggestions?
Thanks!
I'd say you have three options here:
Front-end first, back-end afterwards
A good thing is that while developing your front-end, you may come to understand what's really relevant and what isn't, so you probably won't do anything unnecessary on the back-end. A bad thing, though, is that connecting your back-end to your front-end may go badly and force some refactoring on the front-end side if you haven't made sure they can at least work together.
Back-end first, front-end afterwards
Here you may not really see where you're going while developing the back-end. You'll see (you may even know it already) that what you create for the client side may not turn out as it looked in your head. You'll probably have to rework a lot on the back-end.
Front-and-back-end together
This is how I usually work. Start the front-end just as you did, with hard-coded data, and start working on the back-end as soon as possible. Move your boilerplate data onto it, just so you can make sure the two communicate well. Then try to work on both simultaneously. That way, if you change your mind about something on one side, you won't have to redo much code on the other side.
Regarding the back-end solution, pretty much all I can say is that I used Parse.com services, and they're really good. In my case, I was not ready to create an entire back-end by myself. If you can, maybe you don't need them. But (and it's a big one) Parse's SDK can take care of the whole communication between your back-end and your front-end. You don't have to manage network availability, caching, and everything else you have to think about when you develop for a mobile OS. This is very nice.
Their free plan lets you run 1M queries every month, which is quite a lot. But if you want to go further and reduce the number of requests to Parse, you can combine your own back-end and theirs. It may not work for your specific case, but you can have the user access your server to check if new data is available, and only then query Parse. For example, for a news app, keep the news on Parse, store the most recent news date on your server, save the last update date on the client device, and before accessing Parse, compare the dates with your server. If needed, query Parse; if not, go to the cache (handled by Parse's SDK). That way you can limit the number of queries and stay in the free plan.
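The check-before-you-query flow above, sketched in Ruby for brevity (the same logic applies in an iOS client). `NewsFeed` and the two lambdas are hypothetical stand-ins: one for the cheap call to your own server, one for the metered query to the hosted back end. None of this is Parse's SDK API.

```ruby
require "time"

# Hypothetical client-side wrapper that only spends a metered query when
# the cheap date check says newer data exists.
class NewsFeed
  attr_reader :metered_queries

  def initialize(latest_news_date:, fetch_news:)
    @latest_news_date = latest_news_date  # cheap call to your own server
    @fetch_news = fetch_news              # metered call to the hosted back end
    @last_update = nil                    # date of the newest data we hold
    @cached_news = []
    @metered_queries = 0
  end

  def news
    latest = @latest_news_date.call
    if @last_update.nil? || latest > @last_update
      @cached_news = @fetch_news.call
      @metered_queries += 1
      @last_update = latest
    end
    @cached_news
  end
end

server_date = Time.parse("2014-01-01 09:00:00 UTC")
feed = NewsFeed.new(
  latest_news_date: -> { server_date },
  fetch_news: -> { ["headline as of #{server_date.iso8601}"] }
)

feed.news                                   # first call: metered query
feed.news                                   # nothing new: served from cache
server_date = Time.parse("2014-01-02 09:00:00 UTC")
feed.news                                   # newer data: one more metered query
puts feed.metered_queries                   # prints 2
```

The trade-off is one extra round trip to your own server per refresh, in exchange for spending metered queries only when something has actually changed.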
You should probably try to estimate the number of queries you'll have per month and the monetary impact before choosing.
Just my own opinion :]
I would suggest you add new features to your app with the smallest possible complexity, e.g. "The user can see a list of all registered users." This example might not fit your case perfectly, but I hope you get the point: build one small thing at a time.
But for these small things, make the full trip, front and back. Since it shouldn't take you too long to complete such a feature, it doesn't really matter whether you complete the frontend or the backend first. So for this part: basically what @rdurand already said ;)
Regarding the backend I see two options:
Either you create some REST services yourself. The choice of technology should depend on what you already know. I am a big fan of JAX-RS, but if you don't already have some Java experience you might have a hard time with it.
Use some kind of SAAS-API. I've heard some good things about http://www.apiomat.com/, but never used it myself...
Good luck ;)

Legal considerations when proposing business idea to employer [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
Ok, so you've got an idea for a potentially successful online start-up. You know that you'll never set the thing up yourself, you prefer the stability of your permanent job. Then you think, "Hey maybe my boss would be interested in backing this as an internal project. Obviously I'd want X% of the profits for coming up with the concept in the first place and I want to be able to retire when we get bought by BigCorp."
Can you follow any legal procedures to prevent anybody you tell about a business idea from exploiting it independently?
What kind of deal should you be looking at (e.g. profit share, shares etc)?
IANAL, but I've been around the block a few times.
First, find a competent attorney that specializes in startups and get some professional advice and follow it. Preventive lawyering is really a whole lot cheaper than waiting until it is too late.
Second, protect yourself and make sure you do everything in writing. If your employer doesn't bite and you decide to do it yourself then you'll be protected when you're successful and your company comes back later and tries to claim ownership over the idea. I've seen this happen more than once.
Third, try to get the people you pitch to to sign a non-disclosure agreement. If they have half a brain they won't sign one, but it's an important sign of your intent. Without an NDA, do your best not to pitch your idea to anyone who can take the idea and run with it based on your elevator pitch. That generally means really competent technical people, or an expert in the business area.
If you get to the point of having to worry about a deal go back to that competent attorney you retained.
To answer your first question: no. There's nothing you can do that will stop someone from exploiting an idea you have, apart from patenting it (which you aren't really supposed to be able to do unless you've already at least partly implemented it in the first place).
My advice would be: if you know someone you trust not to steal your idea, tell them. If not, don't.
Also, if you have an idea and you tell your boss, you're not really entitled to anything in terms of ownership of X% of profits. That's just supposed to be part of your job.
Sorry: this probably seems like a negative answer, but I'm trying to be realistic.
Ben
