Separate object over multiple models or encode in JSON? - ruby-on-rails

Sorry if the question sounds weird, but I don't really know how else to put it.
Essentially, my application will have a bunch of objects. Each object has some kind of post/comment structure; the unique thing, though, is that it is more or less static, so I figured it would make no sense to put every single post and comment into my database, because that would cause more database load. Instead, I was thinking about storing the JSON representation of each post together with its comments, thus only causing one database access per object. I would then render the JSON object in the controller or view or something. Is this a valid solution?

No!
You lose all ability to query that data, at no benefit unless you are at massive scale. The database's job is to pull that stuff out for you efficiently, and if you create the proper indexes and implement the proper caching strategies, you shouldn't have any issues with database load. You would be replacing all the goodness of the Rails ORM with your own decidedly less useful version in the interest of a speed gain, way before you need it.
What if later you want to do a most popular comments sidebar widget? Or you want to page through the comments, regardless of the post they are associated with, in a table for moderation? What if you want your data to be searchable?
Don't sacrifice your ability to easily query and manipulate the data for premature optimization.
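For a concrete idea of what "proper indexes and caching" can look like in Rails, here is a minimal sketch (the Post/Comment model names are assumed from the question, not given in it):

    # Migration: index the foreign key so fetching a post's comments stays fast.
    class AddIndexToComments < ActiveRecord::Migration
      def change
        add_index :comments, :post_id
      end
    end

    # View (ERB): fragment-cache the rendered comment list; Rails re-renders it
    # only when the post's cache key (its updated_at) changes.
    <% cache post do %>
      <%= render post.comments %>
    <% end %>

With that in place the common read path hits an index, and repeat views skip most of the rendering work entirely.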

It sounds like a good idea, but I don't think it will work in the long run; think about what happens when you have many comments on your posts. You will have to fetch the long string from the database, append the new comment to it, and then write the whole thing back. That is very inefficient compared to just inserting one more row into a comments table.
Also, consider what happens if, at some point, you have to give users the option to edit a comment. Digging the particular comment out of that long string and updating it will be a nightmare, don't you think?
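For contrast, a minimal sketch (assuming conventional Post and Comment ActiveRecord models on a reasonably modern Rails) of how cheap those operations are when comments are ordinary rows:

    # Adding a comment is a single INSERT, however many comments already exist.
    post.comments.create!(body: "New comment")

    # Editing one comment is a single UPDATE on a single row
    # (comment_id is a placeholder for whatever id you have at hand).
    post.comments.find(comment_id).update!(body: "Edited text")

    # With a JSON blob you would instead read the whole string, parse it,
    # mutate it in Ruby, and write the whole thing back. comments_json is a
    # hypothetical text column holding the serialized comment array.
    data = JSON.parse(post.comments_json)
    data << { "body" => "New comment" }
    post.update!(comments_json: data.to_json)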

In general you want to use JSON and the like as a bit of a last resort. Storing JSON in the DB makes sense when your information isn't necessarily known ahead of time. It is not a substitute for proper data modelling, and it is not a win performance-wise.
To give you an idea of where I am looking at using it in a project: in LedgerSMB we want to let consultants track additional information on some DB objects. Because we don't know in advance what that information will be, JSON makes a lot of sense. We don't expect to search on that data, or to support searches on it, but if we did, that could be arranged using plv8js.
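If you do reach for JSON for genuinely dynamic attributes, a minimal sketch of the Rails side might look like this (assuming Rails 4+ on PostgreSQL; the table and key names are made up for illustration):

    # Migration: a json column for attributes we cannot model in advance.
    class AddExtraDataToLedgerItems < ActiveRecord::Migration
      def change
        add_column :ledger_items, :extra_data, :json, default: {}
      end
    end

    class LedgerItem < ActiveRecord::Base
      # store_accessor exposes individual JSON keys as ordinary attributes.
      store_accessor :extra_data, :consultant_note
    end

    item = LedgerItem.create!(extra_data: { "region" => "EU" })
    item.consultant_note = "follow up next quarter"
    item.save!

The dynamic part stays confined to one column while everything you do know in advance remains properly modelled.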

Related

Store objects with CoreData

I'm refactoring my app. Currently, I store objects in a .plist for further processing. It works fine, but I thought it was about time to dive into Core Data.
My app fetches data from a web service. I parse this data into individual objects.
I use the properties of these objects to fill table views.
While refactoring, I could just bluntly store the whole object as a transformable with Core Data, as far as I understand.
I could also define an entity with attributes similar to the properties of my object.
Is there any best practice here? I think the first approach makes the refactoring easier, but I somehow think I'm missing out on advantages of Core Data in that case. Like maybe performance?
Do not store objects as transformable. You will just end up with a DB from which it is not possible to fetch individual objects based on some criteria; you will need to fetch the whole DB into memory and then work with it. That would be the same as the plist file, and you would have wasted the effort. Just use entities with proper attributes. Core Data is fast; you don't need to worry about performance.
Transformables are generally a good idea only for attributes that Core Data doesn't know how to represent. They let you use a binary data blob as a fallback, but they're never ideal. They can also be used if you absolutely, definitely, will never ever need to filter or sort a fetch request based on the attribute value. In that case they're still not great, because there's extra unnecessary work.
If you need to (or might possibly someday need to) filter or sort a fetch request based on attribute values, don't use a transformable. They can't be used for either purpose beyond extremely basic stuff like checking to see if the value is nil.
OK, I owe you.
I asked a question without investigating things properly.
The real answer is to understand NSManagedObjects.
Sorry for bothering you.

What shouldn't I store in Core Data?

So this is more of an application design question. But I think it can be 'answered' and not just discussed. :)
I'm using RestKit for an application we're building. It obviously makes it super easy to put stuff into either plain objects or Core Data objects.
In the specific instance I'm dealing with, we have comments, much like comments on a facebook post.
Now, the nicest thing about storing these comments in Core Data is that with an NSFetchedResultsController I can sort them super easily and deal with updating/inserting automatically into the right spots in the timeline. But there are a couple of sticking points as well.
For instance, with infinite loading I now have to manage loading the comments in between the newest comments and the old stored ones. (Maybe the first time I grabbed 25, but there have been 100 new comments since then. So I retrieve the latest 25 first, then have to show an auto-load cell between the new comments and the old ones until I run into those, and then paginate anything after that.)
Aside from that, you are also potentially storing thousands of comments in Core Data. Maybe it's not a big deal for quite a long time, but eventually you might want to start cleaning up old comments with a GCD task.
So what are the leading thoughts on what to store in Core Data, and what to keep as transient objects? (Maybe storing those in a cache like NSCache or the new Tumblr cache, https://github.com/tumblr/TMCache.)
Edit
OK, maybe I should clarify a little here. I get the purpose of Core Data... for persisting across app restarts and having an object graph with relationships. I make plenty of use of it. I guess what I'm wondering about here are the grey areas where I would like things to persist for the sake of not always having to wait for a network call, and for offline availability.
But much like stories and comments on Facebook, there is always going to be a constant stream of new ones coming in, and you don't necessarily care about 300 comments on an old post. Someone could come back to view comments on their 'post' quite a few times, or someone may just be browsing 'posts' and comments casually, never to come back to them.
So I'm just trying to work out a strategy for something like this, where you potentially have lots of entities (comments) coming down from a service. Sometimes people will want to view them several times (their own 'post') and sometimes they are just browsing through. When trying to see how others do this, it seems some stuff it all into Core Data, while some (like Facebook) seem to store only the 25-50 most recent in the DB and keep anything beyond that transient (they are probably clearing out older stories and comments regularly too).
Core Data is not designed to be used as "dumb data storage", but rather object persistence. So, anything that you want to persist between uses of your app should go into Core Data.
If you are using Core Data properly, it will take care of all of your caching for you as well.
EDIT:
For anything that is going to change too often for your taste, or that you just don't want to store permanently, NSCache may be a better option. If you don't think your user will look at it again tomorrow, leave those bits out of your persistence. (IMHO)
Create a second repository. Either select a time period that counts as 'recent' or provide a preference for one. Periodically look at the primary repository, find objects now older than the recent range, and move those objects to the second repository.
Then provide users the means to search in just recent or all.
If all they want are recent values, the searches should be faster, and nothing is lost.

What database should I use in an app where my models don't represent different ideas, but instead different types with overlapping fields?

I'm building an application where I will be gathering statistics from a game. Essentially, I will be parsing logs where each line is a game event. There are around 50 different kinds of events, but a lot of them are related. Each event has a specific set of values associated with it, and related events share a lot of these attributes. Overall there are around 50 attributes, but any given event only has around 5-10 attributes.
I would like to use Rails for the backend. Most of the queries will be event type related, meaning that I don't especially care about how two event types relate with each other in any given round, as much as I care about data from a single event type across many rounds. What kind of schema should I be building and what kind of database should I be using?
Given a relational database, I have thought of the following:
Have a flat structure, where there are only a couple of tables, but the events table has as many columns as there are overall event attributes. This would result in a lot of nulls in every row, but it would let me easily access what I need.
Have a table for each event type, among other things. This would let me save space and improve performance, but it seems excessive to have that many tables given that events aren't really separate 'ideas'.
Group related events together, minimizing both the number of tables and the number of attributes per table. The problem then becomes the grouping. It is far from clear cut, and it could take a long time to properly establish event supertypes. Also, it doesn't completely solve the problem of there being a fair number of nulls.
It was also suggested that I look into using a NoSQL database, such as MongoDB. It seems very applicable in this case, but I've never used a non-relational database before. It seems like I would still need a lot of different models, even though I wouldn't have tables for each one.
Any ideas?
This feels like a great use case for MongoDB and a very awkward fit for a relational database.
The types of queries you will be making against this data are key to the best schema design, but imagine that your documents (in a single collection, similar to the first option above) look something like this:
{ "round" : 1,
"eventType": "et1",
"attributeName": "attributeValue",
...
}
You can easily query by round, by eventType, getting back all attributes or just a specified subset, etc.
You don't have to know up front how many attributes you might have, which ones belong with which event types, or even how many event types you have. As you build your prototype/application you will be able to evolve your model as needed.
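To make that concrete, here is a quick sketch using the official Ruby driver (the "mongo" gem); the collection and field names simply follow the example document above:

    require "mongo"

    client = Mongo::Client.new(["127.0.0.1:27017"], database: "game_stats")
    events = client[:events]

    # All events of one type across every round, whatever attributes each
    # individual document happens to carry.
    events.find(eventType: "et1").each { |doc| puts doc }

    # Only a chosen subset of attributes, for a single round.
    events.find(round: 1).projection(eventType: 1, attributeName: 1).each do |doc|
      puts doc
    end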
There is a very large, active community of Rails/MongoDB folks, so there's a good chance you can find a lot of developers to ask questions of and a lot of code to look at as examples.
I would encourage you to try it out, and see if it feels like a good fit. I was going to add some links to help you get started but there are too many of them to choose from!
Since you might have a question about whether or not to use an object mapper, here's a good answer to that.
A good write-up of dealing with dynamic attributes with Ruby and MongoDB is here.

Should is_paranoid be built into Rails?

Or, put differently, is there any reason not to use it on all of my models?
Some background: is_paranoid is a gem that makes calls to destroy set a deleted_at timestamp rather than deleting the row (and calls to find exclude rows with non-null deleted_ats).
I've found this feature so useful that I'm starting to include it in every model -- hard deleting records is just too scary. Is there any reason this is a bad thing? Should this feature be turned on by default in Rails?
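For readers who haven't used it, the heart of the pattern can be sketched in a few lines. This is a simplified illustration of the soft-delete idea, not the gem's actual implementation (the block form of default_scope assumes a modern Rails):

    class Post < ActiveRecord::Base
      # Hide soft-deleted rows from normal finds.
      default_scope { where(deleted_at: nil) }

      # Replace the hard delete with a timestamp update.
      def destroy
        update_column(:deleted_at, Time.current)
      end
    end

    post.destroy       # sets deleted_at instead of issuing a DELETE
    Post.all           # excludes soft-deleted rows
    Post.unscoped      # includes them, e.g. for restores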
Ruby is not for cowards who are scared of their own code!
In most cases you really do want to delete the record completely. Consider a table that contains relationships between two other models; that is an obvious case in which you would not want to use deleted_at.
Another thing is that this approach to database design is kinda rubyish. You will suffer from having to handle all this deleted_at stuff once you need to write more complex queries against your tables than mere finds. And you surely will, once your application's DB takes up lots of space and you have to replace nice and shiny Ruby code with hacky SQL queries. You may want to discard the column then, but--oops--you have already relied on the deleted_at logic somewhere, and you'll have to rewrite larger pieces of your app. Gotcha.
And lastly, it actually seems natural for things to disappear upon deletion. The whole point of modelling is that the models express, in machine-readable terms, what's going on. By default, you delete a record and it is gone forever. The only situations where deleted_at feels natural are when a record is meant to be restored later, or when it should prevent a similar record from being confused with the original one (a Users table is the most likely place where you would want it). But in most models it's just paranoia.
What I'm trying to say is that the ability to restore deleted records should be an explicitly expressed intent, because it's not what people normally expect, and because there are cases where using it implicitly is error prone and adds more than a small overhead (unlike maintaining a created_at column).
Of course, there are a number of cases where you would like to revert the deletion of records (especially when accidental deletion of valuable data leads to unnecessary expense). But to make use of that you'll have to modify your application, add forms and so on, so it won't be a problem to add just one more line to your model class. And there are certainly other ways you could implement storing deleted data.
So IMHO it's an unnecessary feature for every model, and it should only be turned on when it's needed and when this way of adding safety is applicable to the particular model. And that means: not by default.
(This post was influenced by railsninja's valuable remarks.)
#Pavel Shved
I'm sorry but what? Ruby is not for cowards scared of code? This could be one of the most ridiculous things I have ever heard. Sure in a join table you want to delete records, but what about the join model of a has many through, maybe not.
In Business applications it often makes good sense to not hard delete things, Users make mistakes, A LOT.
A lot of your response, Pavel, is kind of dribble. There is no shame in using SQL where you need to, and how does using deleted_at cause this massive refactor, I'm at a loss about that.
#Horace I don't think is_paranoid should be in core, not everyone needs it. It's a fantastic gem though, I use it in my work apps and it works great. I am also fairly certain it hasn't forced me to resort to sql when I wouldn't need to, and I can't see a big refactor in my future due to it's presence. Use it when you need it, but it should be a gem.

Pros and cons of using DB id in the URL?

For example: http://stackoverflow.com/questions/396164/exposing-database-ids-security-risk and http://stackoverflow.com/questions/396164/blah-blah load the same question.
(I guess this is the DB id of the Questions table? Is this standard in ASP.NET?)
What are the pros and cons of using this type of scheme in your web app?
Well, for one, simple IDs are usually sequential, so it's quite easy to guess at them and retrieve other data from your application.
For example, this link still resolves to the question "Load JSON at runtime rather than dynamically via AJAX" even though everything after the numeric ID is made up: https://stackoverflow.com/questions/395858/doesnt-matter-what-I-type-here
Now, having said that, it might also be seen as a bonus, because nobody in their right mind would make their whole security hinge on having to click on a link to get to secure data, so easy discoverability of the data might be good.
However, one point is that you may at some point need to reindex your database, and having that invalidate the old URLs would be bad, if for no other reason than that search engines would still hold the old links.
Also, here on SO it's quite normal to use links like this to other questions, so if they ever want to reindex and thus renumber things (or move to GUIDs), they will still have to keep the old structure and IDs.
Now, is this ever likely to happen or be needed? Probably not.
I wouldn't worry too much about it; just build your security as though every entry point to your application is known, and there should be no problems.
The database ID is used to look up the question in the database. It's numerical, which means: fast. If you left it out, you would have to look the question up by its title, which is a lot slower.
The question title itself is part of the URL to make it "search engine friendly". It'll be ranked higher by g**gle etc.
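In Rails terms this is the classic to_param idiom; a minimal sketch (the Question model is assumed for illustration; SO itself puts the slug in a separate path segment, but the idea is the same):

    class Question < ActiveRecord::Base
      # Produces params like "396164-exposing-database-ids-security-risk".
      def to_param
        "#{id}-#{title.parameterize}"
      end
    end

    # ActiveRecord casts the param with to_i, which stops at the first
    # non-digit, so the slug is ignored during the lookup.
    Question.find(params[:id])   # "396164-anything".to_i => 396164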
Pro:
Super easy to retrieve the page information. Take the ID, call the database, voila. Your table will (should) be indexed to make this lookup super fast.
Guaranteed unique URL.
Con:
IDs in your system are being publicly displayed. Not a problem in a publicly available system like SO. However, proper security measures on the back end can make this not a problem even on sensitive systems.
Ugly URLs. 6+ digit numbers are just hard to remember, and it is more difficult to distinguish pages if the number is all that identifies them. This can also have SEO consequences, as URLs with more relevant and well-structured information generally rank better. SO compensates by providing the post name in the URL as well. While I still can't rattle off a particular post to my buddy at lunch, I can find it more easily in my browser history.
Slower lookups. Doing text searches on a database is generally slower.
But remember, in a community like this there is a higher (although still minimal) chance of the same question title being posted at the same time, which would break things, so some kind of unique identification needs to be applied; IDs are probably quite logical in the context in which this particular web application was developed.
I don't think it's bad practice, and it's fairly common, to do it in ASP.NET and other frameworks. As #lassevk said, if your security depends on it, then you need some more checks in there (can user X get to record Y?), but it mostly comes down to the SEO-friendliness of the URLs for public sites.
For example, SO's URLs are fairly friendly:
Pros and cons of using DB id in the URL?
google rates information at the START of the URL higher than at the end, so having it look like:
https://stackoverflow.com/pros-and-cons-of-using-db-id-in-the-url/q/407120
should get a higher ranking for "pros and cons of using db id in the url". It's not the only factor, but it is quite a major one - look at Amazon's format; they do it for a very good reason:
http://www.amazon.com/Maverick-Ricardo-Semler/dp/0712678867
http://server/book-name/dp/book-id
Wordpress does it like this:
http://server/yyyy/mm/dd/name-of-the-post
however, if you post two posts on the same day called "foo", you get:
http://server/yyyy/mm/dd/foo
http://server/yyyy/mm/dd/foo2
the slug (foo/foo2) isn't a PK, but it IS maintained as unique over the posts table.
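A minimal sketch of how that kind of slug uniqueness can be maintained (the model and column names are made up; WordPress does something equivalent internally):

    class BlogPost < ActiveRecord::Base
      before_validation :generate_unique_slug, on: :create

      private

      # "foo" stays "foo"; a clashing "foo" becomes "foo2", then "foo3", etc.
      def generate_unique_slug
        base = title.parameterize
        candidate = base
        n = 1
        while self.class.exists?(slug: candidate)
          n += 1
          candidate = "#{base}#{n}"
        end
        self.slug = candidate
      end
    end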
I think putting the ID in the URL isn't a problem, unless your URL is a GUID! Way too long, and hard to type. If it's an int, or some kind of short GUID (e.g. 6-8 chars), then it shouldn't be a problem.
