Can a Drupal based website's schema be imported to Rails?

Can a Drupal based website's schema be imported to Rails? - ruby-on-rails

I'm working on a web site that needs to be re-written in Rails. The website was before in Drupal, and there are almost 100,000 records in the database. Now, in Drupal there are tables that do not make any place in Rails in my opinion. For example,
Table name: node_type
It stores information regarding modules in Drupal.
Table name: node
It stores information for node(s) in Drupal.
Table name: semaphore # I've no idea what it is!
Table name: rdf_mapping # No idea
I've not been working with Drupal, so all I want to ask: Is it possible to have a schema for Rails, in which the existing 100,000 records can be imported from Drupal? If so, how? If not so, what are the other options that I'm left with? Or I have to design an entirely new database schema?

Drupal's database schema is not extensively documented for a reason... it's considered implementation details, is not a public API and should not be accessed directly, especially by outside application.
It is also very hard to document because for a given site, any enabled module can add its own tables and alter existing ones (usually adding columns). Plus you have module like Fields (part of Drupal core) that create tables dynamically depending on defined content types.
For a RoR developer, the Drupal schema will probably look weird and be uncomfortable to work with. I would follow suggestion from others, create a new schema for your new application and create a migration script to get the data from the old Drupal database to your new database. I don't knwon about RoR, but try to find a good data migration that allows replay, updates and rollback, etc. You will probably have to migrate the data multiple times to fixes bugs in the process.

Well, I don't have straight forward answers, but I have some ideas what I would do simply to not make so much changes in the database, or as per the comment you can write down an sql script to migrate the data according to the rails schema like types for each tables. Now, I am just intended here to share my thoughts, but I believe there might be more explicit solutions and this is do-able in many ways, may be you need some customizations(?) overriding the default conventions. According to my thoughts, you can try the following things.
Generated Related model skipping migrations
Define tables explicitly to each models like the following snippets:
class Semaphore < ActiveRecord::Base
self.table_name = "semaphore"
end
You have to define foreign keys and primary keys explicitly for both record id and associations.
You have generate time stamp or you can explicitly avoid that like the following ways
ActiveRecord::Base.record_timestamps = false
These are basic things I can see is important.

Related

How to create a user customizable database (like Zoho creator) in Rails?

I'm learning Rails, and the target of my experiments is to realize something similar to Zoho Creator, Flexlist or Mytaskhelper, i.e. an app where the user can create his own database schema and views. What's the best strategy to pursue this?
I saw something about the Entity-Attribute-Value (EAV) but I'm not sure whether it's the best strategy or if there is some support in Rails for it.
If there was any tutorial in Rails about a similar project it would be great.
Probably it's not the easiest star for learning a new language and framework, but it would be something I really plan to do since a long time.

Your best bet will be MongoDB. It is easy to learn (because the query language is JavaScript) and it provides a schema-less data store. I would create a document for each form that defines the structure of the form. Then, whenever a user submits the data, you can put the data into a generic structure and store it in a collection based on the name of the form. In MongoDB collections are like tables, but you can create them on the fly. You can also create indexes on the fly to speed searches.
The problem you are trying to solve is one of the primary use cases for document oriented databases which MongoDB is. There are several other document oriented databases out there, but in my opinion MongoDB has the best API at the moment.
Give the MongoDB Ruby tutorial a read and I am sure you will want to give it a try.
Do NOT use a relational database to do this. Creating tables on the fly will be miserable and is a security hazard, not just for your system, but for the data of your users as well. You can avoid creating tables on the fly by creating a complex schema that tracks the form structures and each field type would require its own table. Rails makes this less painful with polymorphic associations, but it definitely is not pretty.

I think it's not exactly what you want, but this http://github.com/LeonB/has_magic_columns_fork but apparently this does something similar and you may get some idea to get started.

Using a document store like mongodb or couchdb would be the best way forward, as they are schema-less.

It should be possible to generate database tables by sending DDL-statements directly to the server or by dynamical generating a migration. Then you can generate the corresponding ActiveRecord models using Class.new(ActiveRecord::Base) do ... end. In principle this should work, but it has to be done with some care. But this definitely no job for a beginner.
A second solution could be to use MongoMapper and MongoDB. My idea is to use a collection to store the rows of your table and since MongoDB is schema less you can simply add attributes.

Using EntryAttributeValue allows you to store any schema data in a set amount of tables, however the performance implications and maintenance issues this creates may very well not be worth it.
Alternately you could store your data in XML and generate an XML schema to validate against.
All "generic" solutions will have issues with foreign keys or other constraints, uless you do all of that validation in memory before storage.

Should I use multiple databases?

I am about to create an application with Ruby on Rails and I would like to use multiple databases, basically is an accounting app that will have multiple companies for each user. I would like to create a database for each company
I found this post http://programmerassist.com/article/302
But I would like to read more thoughts about this issue.
I have to decide between MySQL and PosgreSQL, which database might fit better my problem.

There are several options for handling a multi-tenant app.
Firstly, you can add a scope to your tables (as suggested by Chad Birch - using a company_id). For most use-cases this is fine. If you are handling data that is secure/private (such as accounting information) you need to be very careful about your testing to ensure data remains private.
You can run your system using multiple databases. You can have a single app that uses a database for each client, or you can have actually have a seperate app for each client. Running a database for each client cuts a little against the grain in rails, but it is doable. Depending on the number of clients you have, and the load expectations, I would actually suggest having a look at running individual apps. With some work on your deployment setup (capistrano, chef, puppet, etc) you can make this a very streamlined process. Each client runs in a completely unique environment, and if a particular client has high loads you can spin them out to their own server.
If using PostgreSQL, you can do something similar using schemas.
PostgresQL schemas provide a very handy way of islolating your data from different clients. A database contains one or more named schemas, which in turn contain tables. You need to add some smarts to your migrations and deployments, but it works really well.
Inside your Rails application, you attach filters to the request that switch the current user's schema on or off.
Something like:
before_filter :set_app
def set_app
current_app = App.find_by_subdomain(...)
schema = current_app.schema
set_schema_path(schema)
end
def set_schema_path(schema)
connection = ActiveRecord::Base.connection
connection.execute("SET search_path TO #{schema}, #{connection.schema_search_path}")
end
def reset_schema_path
connection = ActiveRecord::Base.connection
connection.execute("SET search_path TO #{connection.schema_search_path}")
end

The problem with answers about multiple databases is when they come from people who don't have a need or experience with multiple databases. The second problem is that some databases just don't allow for switching between multiple databases, including allowing users to do their own backup and recovery and including scaling to point some users to a different data server. Here is a link to a useful video
http://aac2009.confreaks.com/06-feb-2009-14-30-writing-multi-tenant-applications-in-rails-guy-naor.html
This link will help with Ruby on Rails with Postgresql.
I currently have a multi-tenant, multi-database, multi-user (many logons to the same tenant with different levels of access), and being an online SaaS application. There are actually two applications one is in the accounting category and the other is banking. Both Apps are built on the same structure and methods. A client-user (tenant) can switch databases under that user's logon. An agent-user such as a tax accountant can switch between databases for his clients only. A super-user can switch to any database. There is one data dictionary i.e. only one place where tables and columns are defined. There is global data and local data. Global data such as a master chart-of-accounts which is available to everyone (read only). Local data is the user's database. A new user can get a clone of a master database. There are multiple clones to choose from. A super-user can maintain the clone databases.
The problem is that it is in COBOL and uses ISAM files and uses the CGI method. The problem with this is a) there is a perception that COBOL is outdated, b) getting trained people, c) price and d) online help. Otherwise it works and I'm happy with it.
So I'm researching what to replace it with and what a minefield that is.

It has past time and the decission for this has been to use PostgreSQL schemas, making multitenant applications, I have a schema called common where related data is stored.
# app/models/organisation.rb
class Organisation < ActiveRecord::Base
self.table_name = 'common.organisations'
# set relationships as usual
end
# app/models/user.rb
class User < ActiveRecord::Base
self.table_name = 'common.users'
# set relationships as usual
end
Then for migrations I have done that with this excellent tutorial. http://timnew.github.com/blog/2012/07/17/use-postgres-multiple-schema-database-in-rails/ use this, this is way better than what I saw in other places even the way Ryan Bates did on railscasts.
When a new organisation is created then a new schema is created with the name of the subdomain the organisation. I have read in the past that it's not a good idea to use different schemas but it depends on the job you are doing, this app has almost no soccial component so it's a good fit.

No, you shouldn't use multiple databases.
I'm not really sure what advice to give you though, it seems like you have some very basic misunderstandings about database design, you may want to educate yourself on the basics of databases first, before going further.
You most likely just want to add a "company id" type column to your tables to identify which company a particular record belongs to.

Rails with DB2 and multiple schemas

I have a 'legacy' DB2 database that has many other applications and users. Trying to experiment with a rails app. Got everything working great with the ibm_db driver.
Problem is that I have some tables like schema1.products, schema1.sales and other tables like schema2.employees and schema2.payroll.
In the ibm_db adapter connection, I specify a schema, like schema1 or schema2, and I can work within that one schema, but I need to be able to easily (and transparently) reference both schemas basically interchangeably. I don't want to break the other apps, and the SQL I would normally write against DB2 doesn't have any of these restrictions (schemas can be mixed in SQL against DB2 without any trouble at all).
I would like to just specify table names as "schema1.products" for example and be done with it, but that doesn't seem to jive with the "rails way" of going about it.
Suggestions?

Schemas in DB2 are a very handy way to provide separate namespace to different applications. For example, you can separate all database objects for an application called "recruiting" from say application called "payroll". You can have objects (tables, views, procedures etc.) with the same name reside in multiple schemas and not colide with one another. Having your application set a schema is a handy way for it to say "hey, I am a payroll and I only want to work with my objects". So, what happens when you want to work with objects owned by another application? Well, in traditional procedural programming languages your application code would explicitly specify the schema when referencing an object in another schema or you would just do a SET CURRENT SCHEMA to switch to another schema. The problem with ORMs like ActiveRecord in Ruby on Rails is that you are not supposed to use SQL i.e. you don't have a good "Rails way" to specify schema when referencing an object. You can use find_by_sql and qualify your objects in the SQL statement but this is not what RoR people will consider to be good Rails.
You can fix things on the DB2 side. You can define a view per table in the "foreign" schema but you will have to take care to name the view so that it does not colide with what you already have in your primary schema. And, when you do that, you will undoubtedly create names that are not true Rails names.
Rails people are very proud of the "Rails way". It makes it very easy to create new applications. Rails is really awesome in this space. However, when it comes to integration with what is already out there Rails ... how do we say it ... sucks. I suggest you will have to accept to do things that are not the best examples of the Rails Way if you want to work with existing database structures.

How do i work with two different databases in rails with active records?

Generate new models and schema at runtime

Let's say your app enables users to create their own tables in the database to hold their own, custom data. Each table would have it's own schema. What are some good approaches?
My first stab involved dynamically creating migration files and model files bu I'd like to run this on heroku where you can't write to the filesystem.
I'm thinking eval may be the way to go to create and run the migration class and the model class. But I want to make sure the model class exists when a new process of the app is spawned. Can probably do this by storing these class definition with each user as they create new tables and then run through them all at startup. But now it's convulted enough that I may be missing something obvious.

It's probably a better idea not to generate new classes on runtime. Besides all of the security risks, each thread's startup time will be abominable if you ever get a significant number of users.
I would suggest rethinking your app design and aim at generic tables to hold the user's custom data. If you have examples of data structures that users can create we might be able to help.

Have you thought about a non-sql database for those tables? Look at CouchDB - there are several plugins on Github that integrate it with rails. Records in the database are JSON documents, with arbitrary key-value structure. May be perfect for a user-defined schema.

There is (was?) a cool Wiki project, called Informl. It was a Wiki, not just for web pages but for web applications. (Get it? It's informal because it's a Wiki, it's got forms because it is an application, and it's user-generated, thus Web 2.0, which means that according to an official UN resolution it is legally required to have a name which is missing a vwl.)
So, in other words, it was not just about user-generated content, but also user-generated structured data.
They did this by generating PostgreSQL-specific SQL at runtime to create new tables and then have ActiveRecord reload the schemas.
The code is up on RubyForge. It's based on Rails 1.2.3. I guess you could do much better than that today, especially with the upcoming extensibility interfaces in Rails 3.

Considering a new Rails app with existing data (not a db, actual data) -- what is the best way to proceed?

I have been tasked with developing a new retail e-commerce storefront for my current job, and I am considering tackling it with RoR to A) Build a "real" project with my limited Rails knowledge, and B) Give management quick turnaround and feedback (they are wanting to get this done ASAP and their deadlines are rather unrealistic - I'm talking a couple of weeks to go from nothing to working model so they can start to market it with SEO/SEM and, I kid you not, "video blogging" because my boss heard that's the future).
We do have a database structure in place but it's absolutely terrible and was thrown together without rhyme nor reason, so I'm going to largely ignore it and create a new database from scratch; however, I have existing data that I need to load into the application (like I said, it's an e-commerce app and we have the product data). I need to massage this data into a usable format because our supplier provides it to us with cryptic, abbreviated column names and it's highly denormalized, especially in the categories (I've posted a question regarding it before - basically the categories table has six fields, one for each category/subcategory, with some of them being blank if that category doesn't apply).
There are two main issues that are giving me second thoughts:
As I said the data needs to be put into a "proper" database schema; I can't just load it as-is. I have some thoughts as to a good data model for it, but my analysis is not completed yet. There would end up being a large amount of joins tables to link various things together (e.g. products_categories, products_attributes, products_prices) etc and these tables would link products not via an ID but by their SKU (see below).
Everything already has an ID that's generated for it, but anything new I add needs to have one autogenerated; I doubt this will be a problem with any mature RDBMS, but I know Rails likes to generate IDs itself. Also, almost all of the product-related tables are linked by SKUs (and in the data provided by the supplier are actually a composite key consisting of the prefix and stock number, which combined make up the full SKU), not by IDs and I'm not sure if this will be a performance issue (of course, I could always manually create indexes on these columns to speed it up). It does mean that I'll need to break away from the Rails conventions, however.
In short, I think that Rails might be a good choice as far as time-to-market and ease-of-development, but having to work with the existing data content might turn into a pain because the application will need to be developed around that, instead of the "traditional" Rails app, and that factor is giving me major doubts about using Rails. There are also some other issues (having to set up a Linux server, and the fact that the area I live in has very few Rails developers so if I left the company I'd basically be holding them hostage as far as updates/modifications). I'm really unsure as to the best path to proceed.

I would develop the app as if you didn't have the data. Use the ORM and make your database the best it can be, but of course keep in mind what data you have to populate it with (eg: don't make crazy new constraints for things that will leave you going through old data record by record).
When you're done and tested, write an import script that pulls your real data onto your new database.
It's not that different from the conventional design/development model... Apart from you can do your data-input in a semi-automated fashion.

I was in the same situation not too long ago — a crappy PHP app that held ten years worth of all company data.
What I did was simply create a Migration model and added methods to import each resource.
class Migration
def migration_all
self.jobs
end
def self.jobs
...
end
end
The cool thing about this is that you can arrange which order resources are imported as one will likely reference another. I also added methods that directly modified the db schema. One nice trick if you have to keep an existing primary key is to create a field named 'legacy_id', copy over your existing primary key, and when you're done, simply remove the 'id' field, rename the 'legacy_id' field to 'id', then add the primary_key constraint on the new 'id' field.

Don't use the SKU as the unique key for each product - use the standard Rails incremented id.
SKU could change as it may be misentered, etc and that would make it a nightmare to change all of the references from other tables. Put your current id in a sku column, index it and update the references in your other tables to the Rails ids.
You'd be able to do Product.find_by_sku(params[:sku]) in your controllers, set up a /products/:sku route, etc. I don't see what you'd gain (other than a headache) by using your non generated ids as the database primary keys.

I'd also suggest running your old data through your app's validations to make sure you are not loading up a bunch of inconsistencies and erros. It will help your app run smoothly and highlight existing data errors at a point where you can fix them.
Don't assume the existing data is valid just because it is already there.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart