Bulk upsert with Ruby on Rails - ruby-on-rails

I have a Rails 3 application where I need to ingest an XML file provided by an external system into a Postgres database. I would like to use something like ActiveRecord-Import but this does not appear to handle upsert capabilities for Postgres, and some of the records I will be ingesting will already exist, but will need to be updated.
Most of what I'm reading recommends writing SQL on the fly, but this seems like a problem that may have been solved already. I just can't find it.
Thanks.

You can do upserting on MySQL and PostgreSQL with upsert.
If you're looking for raw speed, you could use nokogiri and upsert.
It might be easier to import the data using data_miner, which uses nokogiri and upsert internally.

If you are on PostgreSQL 9.1 or later you should use writable common table expressions. Something like:
WITH updates (id) AS (
UPDATE mytable SET .....
WHERE ....
RETURNING id
)
INSERT INTO mytable (....)
SELECT ...
FROM mytemptable
WHERE id NOT IN (select id from updates);
In this case you bulk-load the incoming rows into a temp table first; the statement then updates the existing records from the temp table according to your logic and inserts the rest.
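As a rough sketch of how the statement above could be generated from Ruby, assuming a staging table that has the same columns as the target: the helper name build_cte_upsert_sql and the table and column names (products, products_staging, id, name, price) are made up for illustration. The identifiers are interpolated straight into the SQL, so they must come from your own code and never from user input.

```ruby
# Builds an update-then-insert statement using a writable CTE (PostgreSQL 9.1+).
# Hypothetical helper: identifiers are interpolated as-is, so pass trusted names only.
def build_cte_upsert_sql(target:, staging:, key:, columns:)
  assignments = columns.map { |c| "#{c} = s.#{c}" }.join(", ")
  all_columns = ([key] + columns).join(", ")
  selected    = ([key] + columns).map { |c| "s.#{c}" }.join(", ")

  <<~SQL
    WITH updates AS (
      UPDATE #{target} AS t
      SET #{assignments}
      FROM #{staging} AS s
      WHERE t.#{key} = s.#{key}
      RETURNING t.#{key}
    )
    INSERT INTO #{target} (#{all_columns})
    SELECT #{selected}
    FROM #{staging} AS s
    WHERE s.#{key} NOT IN (SELECT #{key} FROM updates);
  SQL
end

sql = build_cte_upsert_sql(
  target:  "products",          # assumed target table
  staging: "products_staging",  # assumed temp/staging table
  key:     "id",
  columns: %w[name price]
)
puts sql
```

In a Rails app the resulting string could be handed to ActiveRecord::Base.connection.execute once the staging table has been filled. Note that this pattern is not safe against concurrent writers inserting the same key, so it fits best when a single import job runs at a time.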

It's a two-step thing. First you need to fetch the XML file. If it's provided by a user via a form, lucky you; otherwise you need to fetch it using Ruby's standard HTTP library or a gem like mechanize (which is actually really great).
The second step is really easy. You read all the XML into a string and then convert it into a hash with this piece of code:
Hash.from_xml(xml_string)
Then you can parse and work with the data...
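One thing to watch when working with the result: Hash.from_xml gives you a single Hash when an element occurs once and an Array of Hashes when it repeats, so it helps to normalize before looping. Below is a small sketch of that step. The literal hashes stand in for Hash.from_xml output so the snippet runs without Rails, and the element names products and product as well as the helper records_from are assumptions.

```ruby
# Returns the repeated child elements as an Array, whether the parsed XML
# contained zero, one, or many of them. Hypothetical helper.
def records_from(parsed, root:, item:)
  node = parsed[root]
  return [] unless node.is_a?(Hash) # root missing or empty

  items = node[item]
  case items
  when nil   then []
  when Array then items
  else            [items]           # a single element arrives as a bare Hash
  end
end

# Stand-ins shaped like Hash.from_xml output (values arrive as strings).
one  = { "products" => { "product" => { "id" => "1", "name" => "Widget" } } }
many = { "products" => { "product" => [
  { "id" => "1", "name" => "Widget" },
  { "id" => "2", "name" => "Gadget" }
] } }

ids_one  = records_from(one,  root: "products", item: "product").map { |r| r["id"] }
ids_many = records_from(many, root: "products", item: "product").map { |r| r["id"] }
p ids_one
p ids_many
```

Avoid Array(items) for this purpose: called on a Hash it returns an array of key/value pairs instead of wrapping the Hash.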

Related

How to retrieve data from a DSN using Rails

I have a DSN (data source name) of the following format:
<driver>://<username>:<password>@<host>:<port>/<database>
and I am asked to retrieve rows from the corresponding database, which has a single table in this specific example and it is on AWS. I would like to do it using an endpoint in a Rails app.
I did some research online to look for an example about DSN, but couldn't find any help.
I am looking for some high level explanation of how to work with DSN, and ideally how to use Rails to communicate with the database
I am not sure if it is going to be useful for anybody, but this is as much info as I could gather.
The DSN format is something like:
<driver>://<username>:<password>@<host>:<port>/<database>
Within Rails it can be used like this, assuming the database has a users table:
require 'pg'
# the pg gem expects the <driver> part to be postgres or postgresql
conn = PG.connect('<driver>://<username>:<password>@<host>:<port>/<database>')
puts conn.exec("SELECT count(*) FROM users").to_a
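If you need the individual pieces of the DSN (for example to build a database.yml entry), Ruby's standard URI library can split it, assuming the DSN follows the usual URL form with the credentials separated from the host by @. The DSN below is invented for illustration.

```ruby
require "uri"

# Hypothetical DSN; real credentials would come from configuration.
dsn = "postgres://app_user:s3cret@db.example.com:5432/inventory"

uri = URI.parse(dsn)
settings = {
  adapter:  uri.scheme,
  username: uri.user,
  password: uri.password,
  host:     uri.host,
  port:     uri.port,
  database: uri.path.sub(%r{\A/}, "") # drop the leading slash
}
p settings
```

These values map onto the keyword form of PG.connect (host:, port:, dbname:, user:, password:). A password containing reserved characters is percent-encoded inside a DSN and would need decoding first.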

How can I point a Rails model to a JSON file instead of creating a table

I have a Rails application and a JSON file which contains the data. Instead of creating a table for a model, I want to point my model at that JSON file and be able to treat the file like a table. Please suggest how.
If I understand you correctly, you want a model that uses a JSON file as its "backend DB" instead of a normal DB?
In order to get a Rails model to point to a JSON file, you would need to use a JSON DB adapter, and I'm not sure if there is one for Rails.
Rails uses what are called "adapters", where the same code:
Model.find(1)
Will work on any DB (PostgreSQL, MySQL, SQLite3, etc...) because the Model.find() method has been implemented by each adapter.
This allows the interface for the developer to remain the same as long as the adapter implements it.
It avoids the problem where every DB creator implements a different interface, and now everyone has to learn those particular methods (convention for the win!).
All that said, I can't find a JSON DB adapter, so if you want that functionality you'll have to read a JSON file and search against it.
However, if you're talking about using client-side storage with a JSON file, this isn't possible because the client side only understands JavaScript and a Ruby model (class) lives on the backend server. They don't talk to each other directly. In that case, you'll have to implement a JavaScript model that maps to the JSON data.
MySQL as in-memory-database
Rails with in memory database has a way to use MySQL in-memory; then you load your data from the JSON file at the start and dump it out at the end (or after commits).
In-memory DB adapter
https://github.com/maccman/supermodel exists but looks dead (4 years old). Maybe you find others?
Rolling it yourself with Nulldb
Use https://github.com/nulldb/nulldb to throw away all SQL statements and register some hooks (after_save etc.) to store the records in some hash. You then load that hash into memory at the start and dump it out to JSON later.
Separating concerns
My favourite approach, maybe too late if you have lots of working code already:
Separate your active-record code from your actual domain model. That means, if you today have a model class Stuff < ActiveRecord::Base, then separate that into class StuffAR < ActiveRecord::Base and class Stuff.
Stuff then contains an instance of StuffAR.
Using the proper Ruby mechanisms (i.e., BasicObject#method_missing), you can delegate all calls from Stuff to StuffAR by default.
You now have complete control over your data and can do whatever with it.
(This is only one way to do it, depending on how dynamic/flexible you want to be, and if you need a real DB part-time, you can do it different; i.e. class Stuff < StuffAR at the one extreme, or not using a generic method_missing but explicitly coded methods which call StuffAR etc. - Stuff is PORO (plain old ruby objects) now and you use StuffAR just for the DB contact)
In this approach, be careful not to use Stuff like an AR object. I.e., do not use Stuff.where(name: 'xyz') from outside, but create domain methods for that (e.g., in this example, Stuff.find_by_name(...)).
Yes, this is coding overhead, but it does wonders to improve your code when your models become big and unwieldy after a time.
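A minimal sketch of the delegation idea described above. To keep it runnable without Rails, StuffAR is a plain Struct standing in for the ActiveRecord class, and the attributes name and price plus the domain method expensive? are invented for the example.

```ruby
# Stand-in for `class StuffAR < ActiveRecord::Base` so the sketch runs without Rails.
StuffAR = Struct.new(:name, :price)

# Plain Ruby domain object that wraps the record and forwards unknown calls to it.
class Stuff
  def initialize(record)
    @record = record
  end

  # Explicit domain method; callers never touch the record directly.
  def expensive?
    @record.price > 100
  end

  private

  # Forward anything Stuff does not define itself to the wrapped record.
  def method_missing(name, *args, &block)
    if @record.respond_to?(name)
      @record.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keeps respond_to? consistent with what method_missing will accept.
  def respond_to_missing?(name, include_private = false)
    @record.respond_to?(name, include_private) || super
  end
end

stuff = Stuff.new(StuffAR.new("lamp", 120))
puts stuff.name
puts stuff.expensive?
```

If you would rather expose only a known set of methods, Ruby's Forwardable module (def_delegators) is a stricter alternative to the catch-all method_missing.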
Don't need AR at all?
If you do only want to use JSON ever, and never use a real DB, then do the same as before, just leave StuffAR out. It's just PORO then.
I think you have to import your JSON file to the database (e.g. sqlite3) to handle it as a table.
The other workaround would be:
Create a JSON importer for your model which fills the Users from the JSON into an Array of users.
If you do that, you'll have to write the whole searching/ordering by yourself in plain ruby.
I don't know what your current circumstances are, but if you would like to change some data or add data, I suggest using a simple and lightweight database like sqlite3
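To give an idea of what "writing the searching/ordering yourself" looks like, here is a small plain-Ruby sketch. The class names JsonUser and JsonUserStore and the fields id, name and age are assumptions; the JSON is inlined so the snippet runs as-is, whereas a real app would read it with File.read.

```ruby
require "json"

# Plain Ruby object built from one JSON record.
class JsonUser
  attr_reader :id, :name, :age

  def initialize(attrs)
    @id, @name, @age = attrs.values_at("id", "name", "age")
  end

  # Parses a JSON array of objects into JsonUser instances.
  def self.from_json(json_text)
    JSON.parse(json_text).map { |attrs| new(attrs) }
  end
end

# Tiny in-memory "table" with hand-written lookup, filtering and ordering.
class JsonUserStore
  def initialize(users)
    @users = users
  end

  def find(id)
    @users.find { |u| u.id == id }
  end

  def where(conditions)
    @users.select do |u|
      conditions.all? { |attr, value| u.public_send(attr) == value }
    end
  end

  def order(attr)
    @users.sort_by { |u| u.public_send(attr) }
  end
end

json = '[{"id":1,"name":"Alice","age":30},{"id":2,"name":"Bob","age":25},{"id":3,"name":"Carol","age":30}]'
store = JsonUserStore.new(JsonUser.from_json(json))

puts store.find(2).name
puts store.where(age: 30).map(&:name).inspect
puts store.order(:age).map(&:name).inspect
```

Everything lives in memory, so this only suits data sets small enough to load in full, and writes would need their own dump-back-to-file step.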

Create Ruby on Rails project from an sql file

I have a previously separately managed sql file containing rather simple but large database. Would there be a way to import this sql file and generate ruby code as models using this data as a starting point for my future development?
Thank you for your help!
Yes!
It will take some work!
And you'll need to post a WHOLE HELL OF A LOT more detail to get more than that. ;-)
Taking a stab:
Rails can use legacy databases, with a lot of effort spent manually specifying foreign key columns, table names, etc. It can be done. My suggestion, though, would be to convert the data in-place in whatever database you have by using a lot of ALTER TABLE RENAME... work (and the same for columns) to make the old DB conform to Rails' conventions (primary key == 'id', table name is the plural underscore'd version of the model name, all that) before doing the import, and then you can just use plain vanilla ActiveRecord and all will be easy.

Can I perform a INSERT-SELECT operation with the Rails API?

I have to duplicate a BLOB field from one table into another and I want to use an INSERT-SELECT query to achieve this.
INSERT INTO target_table (key, data, comment)
SELECT 'my key', data, 'some comment' FROM source_table
Can this be done with the Rails API?
Of course I could always use ActiveRecord::Base.connection to send a native query to the database, but I'm hoping to find a "Rails way" to do this. (One which doesn't involve actually loading the data in my Rails application)
This is a typical scenario where sending the SQL directly through ActiveRecord::Base.connection makes sense. There can't really be a Rails way to do it as you described. Even if there were one, it would have to load the data into memory and insert it into the target table through two models; that would be insanity.
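A sketch of what that could look like, using the table and column names from the question. The helper copy_blob is hypothetical; it expects an object that responds to quote and execute the way ActiveRecord::Base.connection does. FakeConnection is only a stand-in that records the SQL so the snippet runs outside Rails.

```ruby
# Builds and runs the INSERT-SELECT so the BLOB never passes through Ruby.
# `connection` is assumed to respond to #quote and #execute,
# like ActiveRecord::Base.connection does.
def copy_blob(connection, key:, comment:)
  sql = <<~SQL
    INSERT INTO target_table (key, data, comment)
    SELECT #{connection.quote(key)}, data, #{connection.quote(comment)}
    FROM source_table
  SQL
  connection.execute(sql)
end

# Minimal stand-in for a database connection; it only records statements.
class FakeConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  # Simplified quoting for the demo; the real adapter handles this properly.
  def quote(value)
    "'#{value.to_s.gsub("'", "''")}'"
  end

  def execute(sql)
    @statements << sql
  end
end

conn = FakeConnection.new
copy_blob(conn, key: "my key", comment: "it's a copy")
puts conn.statements.last
```

In the application itself the call would be copy_blob(ActiveRecord::Base.connection, key: ..., comment: ...), leaving the quoting of the literal values to the adapter.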

Rails: Accessing a database not meant for Rails?

I have a standard rails application, that uses a mysql database through Active Record, with data loaded through a separate parsing process from a rather large XML file.
This was all well and good, but now I need to load data from an Oracle database, rather than the XML file.
I have no control how the database looks, and only really need a fraction of the data it contains (maybe one or two columns out of a few tables). As such, what I really want to do is make a call to the database, get data back, and put the data in the appropriate locations in my existing, Rails friendly mysql database.
How would I go about doing this? I've heard you can (on a model-by-model basis) specify different databases for Rails models to use, but that sounds like they use them in their entirety (that is, the database is Rails friendly). Can I make direct Oracle calls? Is there a process that makes this easier? Can Active Record itself handle this?
A toy example:
If I need to know color, price, and location for an Object, then normally I would parse a huge XML file to get this information. Now, with oracle, color, price, and location are all in different tables, indexed by some ID (there isn't actually an "Object" table). I want to pull all this information together into my Rails model.
Edit: Sounds like what I'd heard about was ActiveRecord's "establish_connection" method...and it does indeed seem to assume one model is mapped to one table in the target database, which isn't true in my case.
Edit Edit: Ah, looks like I might be wrong there. "establish_connection" might handle my situation just fine (just gotta get ORACLE working in the first place, and I'll know for sure... If anyone can help, the question is here)
You can create a connection to Oracle directly and then have ActiveRecord execute a raw SQL statement to query your tables (plural). Off the top of my head, something like this:
class OracleModel < ActiveRecord::Base
  establish_connection(:oracle_development)

  def self.get_objects
    self.find_by_sql("SELECT...")
  end
end
With this model you can do OracleModel.get_objects which will return a set of records whereby the columns specified in the SELECT SQL statement are attributes of each OracleModel. Obviously you can probably come up with a more meaningful model name than I have!
Create an entry named :oracle_development in your config/database.yml file with your Oracle database connection details.
This may not be exactly what you are looking for, but it seems to cover your situation pretty well: http://pullmonkey.com/2008/4/21/ruby-on-rails-multiple-database-connections/
It looks like you can make an arbitrarily-named database configuration in the database.yml file, and then have certain models connect to it like so:
class SomeModel < ActiveRecord::Base
  establish_connection :arbitrary_database
  # other stuff for your model
end
So, the solution would be to make ActiveRecord models for just the tables you want data out of from this other database. Then, if you really want to get into some SQL, use ActiveRecord::Base.connection.execute(sql). If you need the result as actual ActiveRecord objects, do SomeModel.find_by_sql(sql).
Hope this helps!
I don't have points enough to edit your question, but it sounds like what you really need is to have another "connection pool" available to the second DB -- I don't think Oracle itself will be a problem.
Then, you need to use these alternate connections to "simply" execute a custom query within the appropriate controller method.
If you only need to pull data from your Oracle database, and if you have any ability to add objects to a schema that can see the data you require . . . .
I would simplify things by creating a view on the Oracle table that projects the data you require in a nice friendly shape for ActiveRecord.
This would mean maintaining code to two layers of the application, but I think the gain in clarity on the client-side would outweigh the cost.
You could also use the CREATE OR REPLACE VIEW Object AS SELECT tab1.*, tab2.* FROM tab1, tab2 syntax so the view returns every column in each table.
If you need to Insert or Update changes to your Rails model, then you need to read up on the restrictions for doing Updates through a view.
(Also, you may need to search on getting Oracle to work with Rails as you will potentially need to install the Oracle client software and additional Ruby modules).
Are you talking about a one-time data conversion or some permanent data exchange between your application and the Oracle database? I think you shouldn't involve Rails in it. You could just run a SQL query against the Oracle database, extract the data, and then insert it into the MySQL database.

Resources