Creating a Ruby gem that uses a Postgres database

I have a Ruby script that downloads web pages containing financial statements for publicly traded companies, scrapes the pages for essential financial data, processes the financial data, and writes the results to a Postgres database.
I looked at the procedure for creating a Ruby gem at http://guides.rubygems.org/make-your-own-gem/, and I'm considering making my Ruby web-scraping script a Ruby gem. Unlike the Hello World exercise in the example, my script needs a Postgres database ready to go.
I am working on a Rails app (Doppler Value Investing) that displays the stock parameters. Having a Ruby gem that nicely integrates into my app would be smoother and more elegant than the setup I would otherwise use. (At the moment, I have a separate Ruby app that does the scraping work and writes the results to the Postgres database.)
The one hitch I can think of is the need to manually create a Postgres database first. Is there a way to do this programmatically, or do I simply need to include in the README a statement like "You MUST create a Postgres database with the name *db_name*, or this gem will not work"?

Just include the instruction in the README. Apart from anything else, you can't know ahead of time what privileges the user of your gem is going to have, so you'd have to deal with not being able to create the database programmatically anyway. It's a one-time task, so automating it doesn't make a huge amount of sense.
Once the database is set up, creating the schema automatically does make sense.
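For what that could look like, here is a minimal sketch using the pg gem, assuming the user has already created the database named in the README (the table and columns are made up for illustration):

require 'pg'

conn = PG.connect(dbname: 'db_name') # the database the README asks the user to create
conn.exec(<<-SQL)
  CREATE TABLE IF NOT EXISTS financial_statements (
    id      serial PRIMARY KEY,
    ticker  text NOT NULL,
    revenue numeric
  );
SQL
conn.close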

Related

Multitenant Rails app - Strategies for pulling customer data

I am developing a multitenant Rails app using PostgreSQL schemas. All is going great, but my situation is a little different from the conventional multitenant apps out there; namely, my app requires that I pull customer data for each tenant from their database into mine.
Here is where it gets tricky. I wrote a JRuby gem that connects to each customer's database and pulls data to my server, then processes that data and loads it into my Rails app (each customer's set of data ends up in the appropriate tenant schema). Therefore this gem is the only place that is aware of all my tenants and their configuration (database info, which tables to pull, and so on).
My question is: what do you think of this design choice? One problem I am already seeing is that it forces the app to straddle two Ruby runtimes: it normally runs on plain Ruby, but when I need to do a pull I have to switch to JRuby. Furthermore, it is hard to inspect the tenant configuration without resorting to this gem.
Any comments or feedback on this? Is there another path I could have taken with this?
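One possible refinement, sketched here with a hypothetical config file, keys, and table list: pull the tenant configuration out of the gem into plain YAML, so the Rails side can inspect it without loading the JRuby-only code.

require 'yaml'
require 'active_record'

# config/tenants.yml (hypothetical): one entry per tenant schema, each with
# connection settings for the customer database and the tables to pull.
YAML.load_file('config/tenants.yml').each do |schema, cfg|
  ActiveRecord::Base.establish_connection(cfg['customer_db']) # customer side
  cfg['tables'].each do |table|
    rows = ActiveRecord::Base.connection.select_all("SELECT * FROM #{table}")
    # ...reconnect to the local database and load rows into the tenant schema...
  end
end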

Periodically edit a Rails database from a gem

I have been working with and learning Rails for a few months and have recently taken up a project that calls for a gem that will periodically pull data from another site and compile it into a database-friendly format.
I have the Ruby code to do the pulling, editing, and formatting. My question is how to get the gem to insert that data into the database of the Rails app that will be built around it. What I want is to have a few models based on the data being mined by the gem.
Some background info on the app. The app will be a stats reporting app for sports. So the models that are based on the data mined will be Stats, Players, Teams, and Games. There will be other models in the application as well, such as Devise users and others.
Once again, I have already written the Ruby code that pulls the data, using the 'json' and 'nokogiri' gems, and puts it into (a lot of) hashes. I just have no idea how to store them in a database usable by a Rails app, or any database for that matter. The only information I could turn up had to do with Engines and Railties, but there were no thorough explanations.
Thanks for the help.
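One common direction, sketched here with made-up connection settings and a made-up Player model: a standalone gem can write into a Rails app's database by establishing its own ActiveRecord connection and defining lightweight models for the tables it fills.

require 'active_record'

# Hypothetical settings; in practice read the Rails app's config/database.yml.
ActiveRecord::Base.establish_connection(
  adapter:  'postgresql',
  database: 'stats_development'
)

class Player < ActiveRecord::Base; end # maps to the players table by convention

# The scraper's output: an array of hashes, one per record.
scraped = [{ 'name' => 'Jane Doe', 'team' => 'Example FC' }]
scraped.each { |attrs| Player.create!(attrs) }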

How to gracefully handle "Mysql2::Error: Invalid date" in ActiveRecord?

I'm building a Rails 3.2 app on top of a legacy database which also has some broken records in various tables. The issue causing the most headache is invalid dates.
I've set up a sandbox, which I manually fixed once to get my code working. Now it's time for deployment. For this reason, the sandbox is reset every night and copied from the live database, ferret indexes are rebuilt, and migrations are re-applied. We are going to deploy to the sandbox often to get in the last fixes before deploying to the live setup.
As the legacy PHP app and this new Rails app need to run in parallel for a few weeks to months, we cannot simply one-time-fix the dates (Update: just for clarification, that means they run on the same database at the same time). I need a way to automate this, maybe with a migration or rake task (I'd go for the latter).
But the problem is: ActiveRecord chokes on loading such records, so I have no way to investigate a record and fix its dates with some hardcoded assumptions in Ruby code.
A second problem is that the legacy database has inconsistencies because the PHP code did not use transactions, and some broken code paths left orphans and broken table constraints behind. I will deal with those as they occur; most of them are already taken care of in the models. The first problem remains the dates.
How would you usually fix this? Maybe there's even some magic gem out there which supports migrating legacy databases with broken records by intercepting exceptions and running some try-to-fix code...
The migration path uses MySQL and three production environments (stable with the live database, staging with the same database, and sandbox with a database clone reset every night). We decided against doing a one-time data mapping / migration because we cannot replace the complete legacy application in one step (it consists of a CMS with about 50000 articles, hundreds of topics, a huge file database with images and downloads supporting about 10 websites, about 12 years of data and work, messy PHP code from different programming skill levels, duplicated code from different migration stages, RSS content pulled in from partner sites to mix articles/posts into the article timelines of our own application's topics, and a lot more fun stuff...).
First step is to migrate the backend application to get a consistent admin and publishing interface. The legacy frontend applications still need to write to the database (comments and other content created by visitors). So the process of fixing the database must be able to run unattended on a regular basis.
We already have fixes in place that gracefully handle broken model dependencies in belongs_to and has_many. Paperclip integration has been designed to work with all the fantastic filename mappings invented. And the airbrake gem reports all application crashes to our Redmine installation, so we get a quick overview of all the remaining quirks.
The legacy applications have already been modified to work with the latest MySQL version and have been migrated to a current MySQL database server.
I had the same problem. The solution was to tell mysql2 not to perform casting, like this:
client.query(sql, cast: false).each do |row|
  # With cast: false the driver returns raw strings, so parse the date
  # ourselves and fall back to nil when it is invalid.
  row['some_date'] = Date.parse(row['some_date']) rescue nil
end
See the mysql2 documentation for details on how to build the client object. If required, access the Rails db config via ActiveRecord::Base.configurations.
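For completeness, a sketch of building that client from the Rails database config (Rails 3.2-era API; the exact keys you need depend on your setup):

require 'mysql2'

db = ActiveRecord::Base.configurations[Rails.env]
client = Mysql2::Client.new(
  host:     db['host'],
  username: db['username'],
  password: db['password'],
  database: db['database']
)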
Create a data import rake task that does all the conversions and fixes you need (including the date parsing and fixing), and run it every time you get a fresh update from the legacy app. The task can use raw SQL (look up the "execute" and "exec_query" methods); it doesn't have to work with models. This will be the magical "gem" you were looking for. Obviously, you cannot have a one-size-fits-all tool for this, as every case of broken data is unique.
But just don't create kludges in your new code base.
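As a concrete illustration, a minimal task along these lines; the table, column, and the '0000-00-00' zero-date check are assumptions about what the broken records look like:

# lib/tasks/legacy.rake
namespace :legacy do
  desc 'Null out invalid dates copied over from the legacy database'
  task fix_dates: :environment do
    conn = ActiveRecord::Base.connection
    # MySQL happily stores zero dates that Ruby's Date class cannot represent;
    # raw SQL sidesteps ActiveRecord's casting entirely.
    conn.execute("UPDATE articles SET published_on = NULL WHERE published_on = '0000-00-00'")
  end
end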
Similar to: Rails: How to handle existing invalid dates in database?, which is also without a correct answer, so I repost my solution below.
I think the simplest solution that worked for me was to set cast: false in the database.yml file, e.g. for the development section:
development:
  <<: *default
  adapter: mysql2
  # (... some other settings ...)
  cast: false
I believe it will solve your problem: with casting disabled, date columns come back as plain strings, which you can parse yourself with Date.parse(), e.g. Date.parse(foo.created_at).

How to manage migrations when multiple apps share the same database in Ruby?

I have a Rails app and a Sinatra app, sharing the same database. The Sinatra app uses ActiveRecord.
Can I run migrations from within each app, as if they were in the same app? Will this cause any problems?
The schema.rb file in the Rails app tracks the current migration via
ActiveRecord::Schema.define(:version => 20121108154656) do
but how does the Sinatra app know the current version of the database?
Rails 3.2.2, Ruby 1.9.3.
The version column in the schema_migrations table equates to the timestamp at the front of the Ruby migration file, for example: 20130322151805_create_customers.rb. So if two or more applications are contributing to the schema_migrations table, rollbacks will not be possible when Rails can't find the down() method (because it will not find a migration file contained in another app, i.e. db/migrate/...).
I currently have exactly this situation, and I have opted to have a master ActiveRecord app that manages migrations and data conversions as our database evolves. Keep in mind that part of the deal is to keep models up to date as well. This has been time consuming, so we are considering breaking the DB apart into business domains and providing JSON APIs to query supporting data from another application. This way each application manages its own domain and is responsible for exposing its data via API.
Regards.
If you connect both applications to the same database you should be able to run migrations on it, but I strongly suggest you use another option, since you will almost surely hit a wall at one time or another:
- Split the database in two if possible, with each application responsible for its own database and migrations.
- Have one application own the "master" database and use another database for the data specific to the second application, but make it connect to both databases (each application still only applies migrations to one database).
If you need to share data between multiple applications, another option is to implement a REST service in one and consume it from the other; have a look at the grape gem for a simple way of doing so.
Edit: I realize I forgot to speak about the ActiveRecord migrations. There is no longer any single "version" of the schema; what ActiveRecord does is read all your migration filenames, extract their identifiers (the leading timestamp) and check whether each has already been applied. So in theory you can run migrations from two applications on the same database, provided they don't interfere.
But if both sets of migrations act on the same tables, you will almost certainly run into big trouble at some point.
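A taste of the grape suggestion above, with a made-up resource and model; one app mounts this endpoint and the other consumes the JSON:

require 'grape'

class InternalAPI < Grape::API
  format :json

  resource :customers do
    # GET /customers: expose one app's data to the other without
    # sharing the database or its migrations. Customer is an assumed
    # ActiveRecord model.
    get do
      Customer.all
    end
  end
end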
I disagree with Schmurfy: even if his presented options are valid, it's a bit of an overkill to share data through REST (granted, it's pretty easy to implement with Ruby / Rails).
If your case is simple you could just use one database from both apps, and since you use ActiveRecord in both of them you have no problems with versioning; ActiveRecord takes care of that.
Also, I don't know what happens if you run db:migrate from both apps simultaneously while using an inferior DBMS like MySQL, which does not allow DDL in a transaction; certainly nothing good.
Also, it would bother me to have to look up which app needs which column and not have the migrations in one place. You could use a shared repository to manage the migrations from both apps.
Rails migrations store the current database version in the schema_migrations table in the database, so both of your apps will be able to check the current version.
The version numbers are timestamps, so there shouldn't be any problem with duplicate values, as it's almost impossible to generate two migrations at the exact same second. So you should be fine here.
The only problem I see is that when you rollback a migration in one app, it'll set the db to the previous known version and I'm not sure if it will pick the previous one from the db (which could be from the other app), or the number from the previous migration file. You may want to test that scenario to make sure.
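For what it's worth, either app can ask ActiveRecord directly; with the Rails 3.2-era API in the question, a quick check from the Sinatra side might look like this (connection settings are placeholders):

require 'active_record'

ActiveRecord::Base.establish_connection(
  adapter:  'postgresql',
  database: 'shared_db' # placeholder: point this at the shared database
)

# Reads the highest applied version from the schema_migrations table.
puts ActiveRecord::Migrator.current_version
# => 20121108154656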
I decided to put all migrations in the Rails app, because there is only one database and Rails already manages migrations. This has worked well.
It simplifies the system because all migrations are stored in one place, and the Sinatra app doesn't need to know about them anyway.

Freezing database with Rails application

So for a class I have to turn in my Rails application to my professor. What is the best way to make sure everything goes smoothly when he tries to start it up? Also, is there any way I can freeze the database and send that with it, so he has all of the data I have been using in the application?
Thanks a lot.
Depending on your needs, the SQLite3 database (used by default in Rails) is stored on the file system in the db directory of your Rails app. So, assuming your professor has the requirements to run Ruby on Rails, the application will start up with the data you've used.
My guess is you have hard-coded connection strings in your Rails application. Ask your professor what server he will be running it off of. At that point, either change the strings to match, or create a config file that is read in and can be edited (which is the better choice of the two). Most databases have export functionality which will allow you to export the current information within the database.
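A minimal sketch of the config-file approach, assuming a hand-rolled YAML file (the file name and keys are illustrative):

require 'yaml'
require 'active_record'

# A database.yml-style file the professor can edit without touching code.
settings = YAML.load_file('config/database.yml')['development']
ActiveRecord::Base.establish_connection(settings)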
