Using yml as a data reference. Bad idea? - ruby-on-rails

I need about 1,000 words to be able to be accessed constantly by my app. The reason I want it detached is so that down the line, I can dynamically change what those 1,000 words will be.
But I don't necessarily feel this requires a database injection. I feel a simple .yml file could work.
Is this a good practice? And if it is, what would be the best way to do this?
Is it ok to load the 1,000 words in a before_filter call ?

Don't think that architecturally this is the right call.
If those are to be changed rarely, then you can put them inside a constants class. If you plan to change them often, what you will do is write to file system and read from it which is exactly what every database does, but in a much much faster and optimal way then you could possibly write.
My two cents. Let us know how it worked out.

Do not use relational database unless you really need to. If your "words" are relatively immutable, can be accessed simultaneously without multithreadig concerns, you do not need querying etc - they are better off in some hash loaded for entire applications on startup
loading them for every reuest is not a good idea.
( this advice is not related to ROR )

Related

Rails: Caching a Tree in memory on the server

I have a postgresql database which contains multidimensional data. What I did was I wrote a data structure that sorts all database rows into a tree format. Now the database is large and so I dont want to generate the tree every time a request comes in from a browser. What Id like to do is construct the tree once in a certain time period and persist it in memory on the server.
The tree is read only by the way. So now each time a request comes in the tree need not be generated new, its already there.
How can I make this happen. Im not an expert programmer, just a beginner and definitely new to web programming. So some of these concepts are new to me.
But if you could please point me in the right direction in terms of the concepts involved here, I can google the rest.
Or if you have actual links or examples that would be fantastic.
Thanks
There are several ways to approach this problem. It depends on just how close to the application you want the variables. If you're really looking to have them right "on top" of the application, for fastest possible use, then you could look at using a global variable "$tree" and hooking in to the application flow. Other options might include memcached, which is still pretty darn close to the application. Redis would be a good option for an in-memory database that could be shared between instances of an application, as it is a NoSQL database that you query. Not quite as close to the application though.
Generally, those are your primary options. In-application variables that survive requests. Application frameworks that will help variables survive requests and provide you a querying mechanism. Or, an In-Memory databases that will allow you to store and query rapidly from multiple instances. Each is a viable option, though I'm pretty sure you'd get a lot of 'community' flack for using a straight up global variable (such practices are considered unclean for their lack of thread-safety and other such concerns).

what is the best way for caching frequently changing status information?

my projects deals with a client / server structure where the clients provide status information via a soap interface in a periodically way. every request (1 per minuete) contains a complex stucture of stat us data.
status information is used by many views and instead fetching the information each time from database i store the data in a sychronized list.
are there better caching techniques in grails? are sychronized lists a good solution?
This seems more like a generalized question so I'll provide some generalized thoughts form my own experience.
Are there better caching techniques in grails? are sychronized lists a good solution?
There may be several layers of cache depending on what your dealing with. I don't believe bare-bones grails itself caches anything with regard to your question however; there are configurable options and plugins that allow you to cache everything from queries, domain classes, service calls, page fragment, images, css and just about everything else. Not to mention your database and other layers may have their own cache options.
Having said that I would avoid using your own caching techniques unless your dealing with a very specific issue where you know you can perform better than a more generic approach like a second level cache (ie EHCache).
If you do roll your own cache you'll want to be aware of everything else that might be caching the same content as well. Caching a cached object form a cached query is a tough one to debug.
If performance is your concern you should always do some bench marking before you change anything. To truly get the best performance out of anything you'll need to understand how it works. Grails, hibernate and spring work together on performance and this isn't anything I can put in few sentences but there are plugins that can help you understand what is going on beyond the scenes like JavaMelody.
Lastly, if you already built something that works and everyone's happy don't break it. :)
Probably a properly scoped service may help:
http://grails.org/doc/2.0.x/guide/services.html#scopedServices
Maybe a "session"-scoped service may be the thing you're looking for.
You may want to take a look at the built-in caching techniques: http://grails.org/doc/2.0.x/ref/Database%20Mapping/cache.html
A more detailed way is described here: http://grails.org/doc/2.0.x/guide/single.html#caching
Depending on what you want to cache, you may want to use Caching instances (to cache everything of that instance) or Caching Queries (where you only cache the result of one query)
As you can see in the second link, the config lets you use EhCache as cache manager.

database/datamart best practice

I have developed a grails application that stores a great deal of information. Currently, when I wish to run analysis on the large sets of data, it can be pretty time consuming. To help speed things up, I have decided to move all the calculated and aggregated data off into a "data mart". This way, a process can be run, maybe by a cron job, to work with all the records, pull out all the requested information, and store the calculated and requested data in separate tables.
My questions are: First, does this seem like the best way to tackle the problem? If so, I'm trying to figure out the best way to manage the new domain classes. Should I keep them in the same domain project folder or is it possible to create a new folder? My domain classes just seem to be getting very cluttered and I would rather a way to separate the relational tables from the data-mart tables. Any suggestions for best structuring would be great.
I am using groovy on grails and MySQL database
thanks
jason
It sounds like what you are doing is a pretty good idea. You will notice that the stack set of apps also have data aggregation processes the run every day (rankings), every few days (badge calculations), etc.
You could create a new package for all the 'datamart' classes required, to keep it separate.
If you don't need anything in your current app, you can even create a new project. Keep in mind that if you need to pull all of the data out of your tables, hibernate might not be the best solution. If possible, leverage your db to do calculations for you.

Seperate Object over multiple models or encode in JSON?

sorry if the question sounds so weird, but I don' really know how else to put it.
Essentially, my application will a bunch of objects. Each objects has somekind of post/comment structure, the unique thing though is, that it is more or less static, so i figure out it would make no sense to put in every single post and comment into my database, because that would cause more database load? Instead of this, I was thinking about putting the JSON representation of the post with its comments, thus only causing one database access per object. I would then render the JSON object in the controller or view or something. Is this a valid solution?
No!
You loose all ability to query that data at no benefit unless you are at massive scale. The database's job is to pull that stuff out for you efficiently, and if you create the proper indexes and implement the proper caching strategies, you shouldn't have any issues with database load. You want to replace all the goodness of the Rails ORM with your own decidedly less useful version in the interest of a speed gain, waaay before you need it.
What if later you want to do a most popular comments sidebar widget? Or you want to page through the comments, regardless of the post they are associated with, in a table for moderation? What if you want your data to be searchable?
Don't sacrifice your ability to easily query and manipulate the data for premature optimization.
Though it sounds a good idea but I don't think that it will work in the long run thinking of what is going to happen when you have many comments on your posts. You will have to get the long string from the database and then add the new comment to it and then update it in the data. This will be very inefficient compared to just inserting one more comment in the table.
Also, just think what is going to happen, if at some point, you will have to give user the option to update the comment. Getting the particular comment from that long string and then update it will be a nightmare, don't you think?
In general you want to use JSON and the like as a bit of a last resort. Storing JSON in the db makes sense if your information isn't necessarily known ahead of time. It is not a substitute for proper data modelling and is not a win performance-wise.
To give you an idea where I am looking at using it in a project, in LedgerSMB we want to be able to have consultants track additional information on some db objects. Because we don't know what it will be in advance JSON makes a lot of sense. We don't expect to be searching on the data or support searches on the data but if we did that could be arranged using plv8js.

Should i keep a file as text or import to a database?

I am constructing an anagram generator that was a coding exercise, and uses a word list thats about 633,000 lines long (one word per line). I wrote the program just in Ruby originally, and I would like to modify this to deploy it online.
My hosting service supports Ruby on Rails as about the only Ruby-based solution. I thought of hosting on my own machine, and using a smaller framework, but I don't want to deal with the security issues at this moment.
I have only used RoR for database-driven (CRUD) apps. However, I have never populated a sqlite database this way, so this is a two-part question:
1) Should I import this to a database? If so, what's the best method to do so? I would like to stick with sqlite to keep things simple if that's the case.
2) Is a 'flat file' better? I wont be doing any creating or updating, just checking against the list of words.
Thank you.
How about keeping it in memory? Storing that many words would take just a few megabytes of RAM, and otherwise you'd be accessing the file frequently so it'd probably be cached anyway. The advantage of keeping the word list in memory is that you can organize it in whatever data structure suits your needs best (I'm thinking a trie). If you can't spare that much memory, it might be to your advantage to use a database so you can efficiently load only the parts of the word list you need for any given query - of course, in that case you'd want to create some index columns (well at least one) so you can take advantage of the indexing capabilities of SQL.
Assuming that what you're doing is looking up whether a word exists in your list, I would say that SQLite with an indexed column will likely be faster than scanning through the word list linearly. Now, if your current approach is fast enough for your purposes, then I see no reason to bother porting it over to a database; it's just an added headache for no gain as far as you're concerned. If you're seeing the search times become a burden, then dumping it into an indexed database would be a good idea.
You can create the table with the following schema:
CREATE TABLE words (
word text primary key
);
CREATE INDEX word_idx ON words(word);
And import your data with:
sqlite words.db < schema.sql
while read word
do
sqlite3 words.db "INSERT INTO words values('$word');"
done < words.txt
I would skip the database for reasons listed above. A simple hash in memory will perform about as fast a lookup in the database.
Even if the database was a bit faster for the lookup, you're still wasting time with the DB having to parse the query and create a plan for the lookup, then assemble the results and send them back to your program. Plus you can save yourself a dependency.
If you plan on moving other parts of your program to a persistent store, then go for it. But a hashmap should be sufficient for your use.

Resources