I'm in the middle of a fictional scenario project where I have allowed multiple users for a company to log in, create records, and so on, who all connect to the one database. They can all create absence records, attendance records, and so on.
What I want to do, however, is use this same schema but expand it to allow several companies to have their own databases using the same schema. So each company will have its own data, but all companies use the same data model. In other words, all companies can create absence records, but they each only have access to the absence records that they created themselves.
How can I achieve this?
All I need is two or three files for this, I'm not going commercial with it in case you guys think I'm cutting corners at someone else's expense!
Something as simple as an if-else that decides which file to use would be very useful to me, so if such a line of code exists please let me know.

I think you are doing it wrong (unless you have a really good reason to have a database for each company), because it seems like you are repeating your data model over and over while introducing unnecessary complexity to your code.
Try to keep all the companies in one database, in the same tables, with their rows separated by a company_id column.
For example, the data structure would be as follows:
companies table
users table
However if you really want to connect to multiple databases, check this SO question.
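As a rough illustration of the single-database approach, here is a minimal plain-Ruby sketch. The structs, the sample rows and the `absences_visible_to` helper are all hypothetical stand-ins for real tables and queries; the only idea taken from the answer is that every row carries a company_id and every read filters on it.

```ruby
# Hypothetical in-memory stand-ins for the users and absences tables.
# Each row carries the company_id of the company that owns it.
User    = Struct.new(:id, :company_id, :name)
Absence = Struct.new(:id, :company_id, :user_id, :reason)

USERS = [User.new(1, 1, "alice"), User.new(2, 2, "bob")]
ABSENCES = [
  Absence.new(1, 1, 1, "sick"),
  Absence.new(2, 2, 2, "holiday"),
  Absence.new(3, 1, 1, "training")
]

# Every read goes through the current user's company_id, so one company
# never sees another company's rows.
def absences_visible_to(user, absences = ABSENCES)
  absences.select { |absence| absence.company_id == user.company_id }
end

alice = USERS.first
puts absences_visible_to(alice).map(&:reason).inspect
```

In a real application the same filter would normally live in the database query rather than in Ruby, so that rows from other companies are never loaded at all.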


How should I be storing expiring stats?

Let's say I have 2 models in my app: User and Survey.
I'm trying to plot the number of paid surveys over time. A paid survey is one that has been created by a user that has an active subscription. For simplicity, let's assume the User model has subscription_start_date and subscription_end_date.
So a survey becomes "paid" the moment it is created (provided the user has an active subscription) and loses its "paid" status when the subscription_end_date has passed. Essentially, the "paid survey" is really a state with a defined start and end date.
I can generate the data fine. What I'm curious about is the recommended way of storing this kind of stat. Basically, what should that table look like?
Another thing I'm concerned about is whether there are any disadvantages of having a daily task that adds the data point for the past day.
For more context, this app is written in Rails and we're thinking of using this stat architecture for other models too.
If I am understanding you correctly, I do not think you need an additional model or daily task to generate data points. To generate your report you just need to come up with the right SQL/ActiveRecord query. When you aggregate the information, be careful not to introduce nested queries. For simplicity's sake we could pull all the information you need using:
surveys = Survey.all.includes(:user)
Based on your description, an instance of survey has a start date that is just created_at.to_date. And since Survey belongs_to :user, its end date is user.subscription_end_date.
When plotting the information you may need to transform surveys into some data structure that groups the information by date. Alternatively you could probably achieve that with a more complex SQL statement.
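The grouping by date described above might look something like the following plain-Ruby sketch. `PaidSpan` and `paid_surveys_per_day` are hypothetical names, the end date is assumed to be inclusive, and the spans are assumed to contain only surveys created while the subscription was active.

```ruby
require "date"

# Hypothetical stand-in for a Survey joined to its User:
# paid_from is created_at.to_date, paid_until is user.subscription_end_date.
PaidSpan = Struct.new(:paid_from, :paid_until)

# Counts, for each day in the range, the surveys whose paid interval
# covers that day (both ends inclusive, which is an assumption).
def paid_surveys_per_day(spans, range)
  range.each_with_object({}) do |day, counts|
    counts[day] = spans.count { |s| s.paid_from <= day && day <= s.paid_until }
  end
end

spans = [
  PaidSpan.new(Date.new(2024, 1, 1), Date.new(2024, 1, 3)),
  PaidSpan.new(Date.new(2024, 1, 2), Date.new(2024, 1, 5))
]
counts = paid_surveys_per_day(spans, Date.new(2024, 1, 1)..Date.new(2024, 1, 4))
counts.each { |day, n| puts "#{day}: #{n}" }
```

This recomputes the series on demand from the existing columns, which is the no-extra-table option; it scans every span for every day, so it is only a starting point for small data sets.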
You could of course introduce a new table that stores the data points by date to avoid a complex query or data aggregation via ruby. The downside of this is that you are storing redundant information and assume the burden of maintaining data integrity. That doesn't mean you shouldn't do it because there may be an upside in regards to performance and reporting convenience.
I would need more information about your project before saying exactly what I would do, but it sounds like you already have the information you need in your database and it's just a matter of querying it properly.

What is the best way to store a user's Facebook friends list in my database?

I'm creating a Ruby on Rails website which uses Facebook to login.
For each user I have a database entry which stores their Facebook User ID along with other basic information.
I'm also using the Koala gem in order to retrieve a user's friendlist from Facebook, but I'm unsure as to how I should store this data...
Option 1
I could store the user's friends as a serialized hash in the User table, then if I wanted to display a list of all the current user's friends, I could grab this hash and do something along the lines of SELECT * FROM Users WHERE facebook_user_id IN (hash)
Each time the user logs in I could update this field to store the latest friends list.
Option 2
I could create a Friend table and store friendship information in here, where a User has many Friends. So there would be a row for each friendship, (User1 and User2 columns). Then to display a list of the current user's friends I could do something like SELECT User2 FROM Friends WHERE User1 = current_user
This seems like the better option to me, but...
It has the disadvantage that there would be many rows... If there were 100,000 users, each with 100 friends, that's now 10,000,000 rows in the Friends table.
It also means each time the user logs in, I'd need to loop over their Facebook friends list returned using Koala and create a Friend record if someone on their friendlist is in my User table and there isn't a corresponding entry in the Friends table. This seems like it'd be slow if a user has 1000 Facebook friends?
I'd appreciate any guidance on how it would be best to achieve this.
Apologies for the badly worded question, I'll try and reword/organise it shortly.
Thanks for any help in advance.
If you need to store a lot of data, then you need to store a lot of data. If you are like most, you probably won't run into that problem sooner than you have the cash to solve it. In other words, you are probably assuming you'll have more traffic and data than you'll get, at least in the short-term. So I doubt this is an issue, even though it is a good sign that you are thinking about it now rather than later.
As I mentioned in my comment below, the easiest solution is to have a tie table with a row for each side of the friend relationship (a has_many :friends, through: :facebook_friend_relationships, class_name: 'FacebookFriend' on FacebookFriend, per the design mentioned below). But your question seemed to be about how to reduce the number of records, so that is what the remainder of the answer will address.
If you have to store in the DB and you know for sure that you will absolutely have every FB user on the planet hitting your site because it is so awesome, but they won't all hit at once, then if you are limited in storage, you may want to use an LRU algorithm (remove the least recently used records), possibly with timed expiration as well. You could just have a cron job that queries the DB and then deletes old/unused records to do this. It wouldn't be perfect, but it would be a simple solution.
You could also archive older data rather than throw it away. So, frequently used data could stay in the table of active users, and then you might offload older data to another table or even another database (and you might see the apartment and second_base gems for that). However, once you get to that size, you're probably looking at a number of other architectural solutions that have much less to do with ActiveRecord models/associations or schema design. Though it pays to plan ahead, I wouldn't worry about that excessively until you are sure that the application will get enough users to justify investing the time in that.
Even though ActiveRecord has some caching, you could just avoid the DB and cache friends in memory yourself in the beginning for speed, especially if you don't yet have many users, which you probably don't. If you think you'll run out of memory because of the high number of users, LRU might be a good option here also, and lru_redux looks interesting. Again, you might also want to time the cache so that it expires and the friends are fetched again. Even just storing the results for the current request may be adequate, i.e. in the controller action method, just do @friends ||= Something.find_friends(fb_user_id), and the latter is what most might do as a first shot at it while getting started.
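To make the "LRU plus timed expiration" idea concrete, here is a minimal sketch in plain Ruby. It is not the lru_redux API; `FriendsCache` and its methods are hypothetical, and it relies only on Ruby's Hash preserving insertion order.

```ruby
# A small least-recently-used cache with a time-to-live.
# All names here are hypothetical.
class FriendsCache
  def initialize(max_size:, ttl_seconds:)
    @max_size = max_size
    @ttl = ttl_seconds
    @entries = {} # key => [value, stored_at], oldest use first
  end

  # Returns the cached value, or runs the block to (re)load it when the
  # key is missing or its entry is older than the time-to-live.
  def fetch(key, now: Time.now)
    value, stored_at = @entries.delete(key)
    if stored_at.nil? || now - stored_at > @ttl
      value = yield
      stored_at = now
    end
    @entries[key] = [value, stored_at] # re-insert as most recently used
    @entries.shift while @entries.size > @max_size # evict least recently used
    value
  end

  def keys
    @entries.keys
  end
end

cache = FriendsCache.new(max_size: 2, ttl_seconds: 60)
loads = 0
cache.fetch(1) { loads += 1; [10, 11] }
cache.fetch(2) { loads += 1; [20] }
cache.fetch(1) { loads += 1; [] }   # cache hit, block is not run
cache.fetch(3) { loads += 1; [30] } # evicts key 2, the least recently used
puts cache.keys.inspect
```

The block passed to `fetch` is where a call such as the Koala friends lookup would go; this sketch is not thread-safe, so a multi-threaded server would need a lock around `fetch` or a library that provides one.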
If you use ActiveRecord, in your query in the controller (or on the association in the model) consider using includes (the include: option in older versions of Rails) to avoid N+1 queries. That will speed things up.
For the schema design, maybe:
User - users table with email and authN info. Look at the Devise gem.
FacebookUser - info about the Facebook user.
FacebookFriendRelationship - a tie model with (id and) two columns, one for one FacebookUser id and one for the other.
By separating the authN info (User) from the FB data (FacebookUser and FacebookFriendRelationship), you make it easier to have other social media accounts, etc. each with information specific to those accounts in other tables.
The complexity comes in FacebookUser's relationship with friends if the goal is to minimize rows in the relationship table. To halve the number of rows, you'd have a single row for a relationship where the id of FacebookUser could be in either foreign key column. Either the user has a friend or is a friend, so you could have two has_many :through associations on FacebookFriend that each use a different foreign key in FacebookFriendRelationship. Or you could do HABTM without the model and use the foreign_key and association_foreign_key options in each association. Either way, you could add a method that adds both associations together (because they are arrays). Alternatively, you could use custom SQL in a single has_many if you didn't care about having to use ActiveRecord to remove associations the normal way. However, per your comments, I think you want to avoid this complexity, and I agree with you, unless you really must limit the number of relationship rows. However, it isn't the number of tie table rows that will eat the data, it is going to be all of the user info you keep in the FacebookFriends table.
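The single-row-per-friendship idea can be sketched without ActiveRecord. In the plain-Ruby example below, `FriendshipTable` is a hypothetical stand-in for the tie table; storing each pair with the smaller id first is one possible convention, not something the design above prescribes.

```ruby
require "set"

# One row per friendship: the pair is stored with the smaller id first, so
# (a, b) and (b, a) collapse into the same row. Names are hypothetical.
class FriendshipTable
  def initialize
    @rows = Set.new
  end

  def add(id_a, id_b)
    @rows << [id_a, id_b].minmax
  end

  # A user's friends are found by looking in both columns, which mirrors
  # adding together two associations that use different foreign keys.
  def friends_of(id)
    @rows.each_with_object([]) do |(first, second), friends|
      friends << second if first == id
      friends << first if second == id
    end.sort
  end

  def row_count
    @rows.size
  end
end

table = FriendshipTable.new
table.add(1, 2)
table.add(2, 1) # same friendship, no new row
table.add(3, 1)
puts table.friends_of(1).inspect
puts table.row_count
```

The price of halving the rows is visible in `friends_of`: every lookup has to check both columns, which is the extra complexity the answer warns about.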

Dimensional Modeling - Queries without facts

I'm creating a dimensional model about a "calls recording system", for a VoIP service.
I'll give just a small example to illustrate my question.
Suppose I have a fact that represents a single call. And I have a dimension called Client, and another one called Provider. (pretend that there are other dimensions, like Date of course, and etc...)
(Dimension)Client ---> (Fact)Call <--- (Dimension)Provider
With this, I'll be able to see how many calls a client made, or how many calls were sent through a provider, and answer other questions.
And let's suppose that one client is associated with one provider, and one provider can have many clients.
So, here comes the question. How can I create a query like: which clients does each provider have?
It seems to be a query that is just between both dimensions. I can't involve the fact in that, because if a client never used the service, he won't be in the calls fact table, and he won't appear in this "Clients per Provider" query.
I was thinking that one way to do that would be to create a role-playing dimension, a view of the Client dimension, and add it directly to the Provider dimension, just to do queries like this. It would be something like this:
(Dimension)Client ---> (Fact)Call <--- (Dimension)Provider <--- (Dimension)View Client
Of course, with this approach the user must be very careful not to use this View Client dimension with the fact table, because it would duplicate fact rows.
So, is this one of the situations where I need to use the famous factless fact tables?
What's the right way to do this?
Role-playing dimensions should be used when you are "recycling" a dimension to be used multiple times in the same fact table (e.g. Date of Call, Date of Service, etc.).
It doesn't sound like that's what you're looking for. Instead, if the relationship is truly one to many, then I would just add the provider ID directly on the client dimension (no need for a view or anything), with the recognition that this relationship has nothing to do with the facts.
Essentially, think of the "provider" as just an attribute snowflaked off of client, when it comes to this sort of query.
However, it sounds like you might want to be sure that you don't have a many to many relationship between Clients and Providers (a client can use multiple providers, and a provider can have multiple clients). A many-to-many relationship is modeled dimensionally as a fact table. Your fact table could be a snapshot of the current point in time, with or without history. Just two columns are needed, Client and Provider. If you wanted to keep a record of the client/provider relationship by some timeframe, you'd just add a date stamp.
Note that a factless fact will work to model the one-to-many relationship as well (and if the model changes on the back end, your ETL is already done).
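To show why the "clients per provider" question can be answered from the dimensions alone, here is a small plain-Ruby sketch. The structs and sample rows are hypothetical; provider_id on the client row is the attribute suggested above, and CALLS stands in for the fact table.

```ruby
# Hypothetical dimension and fact rows.
Provider = Struct.new(:id, :name)
Client   = Struct.new(:id, :name, :provider_id)
Call     = Struct.new(:client_id, :provider_id)

PROVIDERS = [Provider.new(1, "P1"), Provider.new(2, "P2")]
CLIENTS = [
  Client.new(10, "Ann", 1),
  Client.new(11, "Ben", 1), # has never placed a call
  Client.new(12, "Cy", 2)
]
CALLS = [Call.new(10, 1), Call.new(12, 2)]

# Answers "which clients does each provider have?" from the dimensions
# alone, so clients with no calls are still listed.
def clients_per_provider(providers, clients)
  by_provider = clients.group_by(&:provider_id)
  providers.to_h { |p| [p.name, by_provider.fetch(p.id, []).map(&:name)] }
end

puts clients_per_provider(PROVIDERS, CLIENTS).inspect
```

Ben appears under P1 even though no row in CALLS mentions him, which is exactly the case that a query through the fact table would miss.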

MongoDB and embedded documents, good use cases

I am using embedded documents in MongoDB for a Rails 3 app. I like that I can use embedded documents and the values are all returned with one query and there is less load on the database server. But what happens if I want my users to be able to update properties that really should be shared across documents? Is this sort of operation feasible with MongoDB, or would I be better off using normal ID-based relations? If ID-based relations are the way to go, would it affect performance to a great degree?
If you need to know anything else about the application or data I would be happy to let you know what I am working with.
Document that has many properties that all documents share.
name: string
description: string
Document that wants to use these properties:
(references many people)
body: string
This all depends on what you are going to do with your Person model later. I know of at least one working example (a blog using MongoDB) where the developer keeps user data inside the comments users make and uses one collection for the entire blog. Well, OK, he uses a second one for his "tag cloud" :) He just doesn't need to keep a centralized list of all commenters; he doesn't care. His blog contains consolidated data from all his previous sites/blogs, almost 6000 posts in total. Posts contain comments, comments contain users, and users have emails. He has a "subscribe to comments" option for every user who comments on a post, authorization is handled by an external OpenID service aggregator (Loginza), and he keeps the user email from the Loginza response and their "login token" in their cookies. So the functionality is pretty good.
So, the real question is: what are you going to do with your Users later? If you really feel like you need a separate collection (you're going to let users have centralized control panels, have site-based registration, you're going to make user-centric features and so on), make it separate. If not, keep it simple and have fun :)
It depends on what user info you want to share across documents. Let's say you have a user and the user has emails. It does not make sense to move the emails into a separate collection, since there will be no more than 10, 20, or 100 emails per user. But if the user has some big related information that is always growing, like blog posts, then it makes sense to move it into a separate collection.
So the answer depends on the user document structure. If you show your user document structure and what you are planning to move into a separate collection, I will help you make a decision.
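The trade-off behind the question about updating shared properties can be illustrated with plain Ruby hashes standing in for MongoDB documents. Everything below is a hypothetical sketch, not MongoDB or Mongoid API; it only counts how many documents an update has to touch in each layout.

```ruby
# Embedded: each post carries its own copy of the person's shared properties.
embedded_posts = [
  { body: "first",  people: [{ name: "Ann", description: "author" }] },
  { body: "second", people: [{ name: "Ann", description: "author" }] }
]

# Referenced: posts store only ids; the shared properties live in one place.
people = { 1 => { name: "Ann", description: "author" } }
referenced_posts = [
  { body: "first",  person_ids: [1] },
  { body: "second", person_ids: [1] }
]

# Updating a shared property has to touch every embedded copy.
def rename_embedded(posts, old_name, new_name)
  touched = 0
  posts.each do |post|
    post[:people].each do |person|
      next unless person[:name] == old_name
      person[:name] = new_name
      touched += 1
    end
  end
  touched
end

# With references, only the one shared document changes.
def rename_referenced(people, id, new_name)
  people[id][:name] = new_name
  1
end

puts rename_embedded(embedded_posts, "Ann", "Anna")
puts rename_referenced(people, 1, "Anna")
```

The flip side is on the read path: the referenced layout needs an extra lookup to turn `person_ids` into names, while the embedded layout returns everything with the post.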

Should a user's profile be a separate model?

I'm learning Rails by building a simple site where users can create articles and comment on those articles. I have a view which lists a user's most recent articles and comments. Now I'd like to add user 'profiles' where users can enter information like their location, age and a short biography. I'm wondering if this profile should be a separate model/resource (I already have quite a lot of fields in my user model because I'm using Authlogic and most of its optional fields).
What are the pros and cons of using a separate resource?
I'd recommend keeping profile columns in the User model for clarity and simplicity. If you find that you're only using certain fields, only select the columns you need using :select.
If you later find that you need a separate table for some reason (e.g. one user can have multiple profiles) it shouldn't be a lot of work to split them out.
I've made the mistake of having two tables and it didn't buy me anything but additional complexity.
Pros: It simplifies each model
Cons: Managing 2 at once is slightly harder
It basically comes down to how big the user and profile are. If the user is 5 fields, and the profile 3, there is no point. But if the user is 12 fields, and the profile 20, then you definitely should.
I think you'd be best served putting in a separate model. Think about how the models correspond to database tables, and then how you read those for the various use cases your app supports.
If a user only dips in to his actual profile once in a while but the User model is accessed frequently, you should definitely make it a separate object with a one-to-one relationship. If the profile data is needed every time the User data is needed, you might want to stick them in the same table.
Maybe the location is needed every time you display the user (say on a comment they left), but the biography should be a different model? You'll have to figure out the right breakdown, but the general rule is to structure things so you don't have to pull data that isn't being used right away.
A user "owns" various resources on your site, such as comments, etc. If you separate the profile from the user then it's just one more resource. The user is static, while the profile will change from time to time.
Separating it out would also allow you to easily maintain a profile history.
I would keep it separate. Not all your users would want to fill out a profile, so those would be empty fields sitting in your user table. It also means you can change the profile fields without changing any of the logic of your user model.
Depends on the width of the existing user table. Databases typically have a limit on the number of bytes a record can contain. If you are close to the limit (or over it, which is usually possible if you have lots of fields with null values), I would add a table with a one-to-one relationship for better performance and less likelihood of a record that suddenly can't be inserted because there is too much data for the row size. If you are nowhere near the limit, then add to the existing table.
