Friday, December 14, 2007

Why create database migrations, when you don't need to?

Rails supports database migrations out of the box. Migrations let Rails programmers be more Agile: do less big design up front (BDUF) on the database and instead change the schema as requirements change and the business asks for it. Writing a migration is as easy as running script/generate migration from the command line and then executing it with rake db:migrate.

Database migrations are useful for two reasons. First, they let developers change the database schema; this can be as simple as a single ALTER TABLE statement to add a new column. Second, and more importantly, they migrate the all-important data that already exists in the database. But if your application is still under development for its initial release, migrations may not be buying you much, because chances are no one cares how your table structure got to where it is now, and you may not have a whole lot of data.

We all use migrations feverishly, probably because most Rails books, references, and tutorials begin with: "Let's start by creating a database migration, in it we insert some data into the newly created tables, and learn how to do XYZ." Thus, your db/migrate folder is stuffed with migrations that do everything: create new tables; alter existing tables; create data for those tables; update data created by previous migrations; and perhaps all of the above, often with no justifiable reason for what they are trying to migrate. In practice, it is quite time-consuming for a developer to run 200+ migrations every time he blows away a database, which is not uncommon. Not only that, sometimes you have multiple migrations that basically cancel out each other's changes as customers change their minds back and forth. As a result, you could be creating migrations left and right when they have no real beneficiary, namely a database loaded with data.
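To make the cancel-out problem concrete, here is a toy sketch in plain Ruby (not Rails code). It assumes a schema can be modeled as a Hash of table name to column names, and that each migration is a lambda that mutates it; the table name, column names, and migration numbers are made up for illustration.

```ruby
# Toy model, not the Rails migration API: a schema is a Hash that maps a
# table name to its list of column names, and a "migration" is a lambda
# that mutates that Hash.

# Replays every migration, in order, against an empty schema.
def replay(migrations)
  migrations.each_with_object({}) { |migration, schema| migration.call(schema) }
end

# Four hypothetical migrations; 003 and 004 undo and redo 002.
def full_history
  [
    ->(schema) { schema["foos"] = ["id"] },            # 001: create the table
    ->(schema) { schema["foos"] << "nickname" },       # 002: add a column
    ->(schema) { schema["foos"].delete("nickname") },  # 003: customer changed their mind
    ->(schema) { schema["foos"] << "nickname" }        # 004: ...and changed it back
  ]
end

# One consolidated step that describes only the final state.
def consolidated
  [->(schema) { schema["foos"] = ["id", "nickname"] }]
end

# Both routes end at the same schema; the history just takes more steps.
puts replay(full_history) == replay(consolidated)
```

Every step in the longer history still has to be executed on each rebuild, even though two of the four contribute nothing to the final schema.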

Sometimes your QA team has its own data set for testing your app, and thus their loaded database is a beneficiary. While that is true, I prefer their data to be scripted and freshly generated by ActiveRecord models (using create!) every time instead of being migrated over and over, because as my application domain model expands, I want not just QA data but all datasets to be cleansed and validated by my model validations. There is no guarantee that after running a migration that drops a column the QA dataset does not violate any business logic. Keeping all this data valid while database migrations are being rapidly created is very hard.
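As a rough illustration of why scripting data through create! keeps it valid, here is a plain-Ruby stand-in. The Currency class, its blank-name rule, and the RecordInvalid error below are hypothetical substitutes for an ActiveRecord model and its validations, so the sketch runs without Rails.

```ruby
# Hypothetical stand-in for an ActiveRecord model, so that the sketch runs
# without Rails. Only the shape of create! is meant to match.
class RecordInvalid < StandardError; end

class Currency
  attr_reader :name

  def initialize(attrs)
    @name = attrs[:name]
  end

  # Assumed business rule: a currency must have a non-blank name.
  def valid?
    !name.to_s.strip.empty?
  end

  # Like create!, raise instead of quietly keeping bad data.
  def self.create!(attrs)
    record = new(attrs)
    raise RecordInvalid, "Name can't be blank" unless record.valid?
    record
  end
end

Currency.create!(:name => "Yen")   # passes validation

begin
  Currency.create!(:name => "")    # a stale row that breaks the rule
rescue RecordInvalid => e
  puts "dataset script stops here: #{e.message}"
end
```

Because the data is regenerated through the model on every load, a row that no longer satisfies the rules fails loudly at load time instead of lingering in the database.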

To take advantage of the fact that your development environment has nothing to lose until your initial release, while keeping all your data valid (in all environments) the whole time your app is under development, follow these three simple steps:


Step 1:

Have one migration file, 001_release_one_schema.rb, that captures all database object creation: for example, all your create_table and add_index calls, views, triggers (*yikes*), etc. After this migration runs, your database should contain all database objects for your Rails app but in a "blank", data-less state.

$ cat db/migrate/001_release_one_schema.rb

class ReleaseOneSchema < ActiveRecord::Migration
  def self.up
    create_table "foos" do |t|
    end

    # (... and many others ...)
  end

  def self.down
    # (... ... ...)
  end
end


Step 2:

Create a rake task to populate all the reference data that your application requires to run. Reference data is data that your application cannot change through its screens but that is essential for your app to run: for example, the currencies used to populate a drop-down in your app. A lot of drop-down list data is reference data.

$ cat lib/tasks/data.rake

namespace :data do
  desc "Loads a default dataset of both reference and user data into database."
  task :load => [ :environment,
                  :configuration,
                  :reference_data,
                  :user_data ]

  # The helper tasks below have no desc, so `rake -T` does not list them.
  # (A bare `private` here would not hide rake tasks.)

  task :configuration do
    ENV['DATASET'] ||= 'slim'
  end

  task :reference_data do
    require "#{RAILS_ROOT}/db/data/reference_data/#{ENV['DATASET']}"
  end
end

$ cat db/data/reference_data/slim.rb

@us_dollar = Currency.create! :name => "US Dollar"
@yen = Currency.create! :name => "Yen"
@euro = Currency.create! :name => "EURO"
... ... ...
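
The dataset lookup performed by the :configuration and :reference_data tasks can be sketched in plain Ruby as follows. The dataset_path helper and its parameters are hypothetical names introduced here; the 'slim' default and the db/data/<kind>/<dataset> layout come from the tasks above. The "/app" root and the "loaded" dataset name are only sample inputs.

```ruby
# Hypothetical helper, not part of Rails or rake. It builds the path of a
# dataset script the same way the tasks above do, but takes the environment
# and the application root as arguments so it can be tried outside rake.
def dataset_path(kind, env = ENV, root = ".")
  dataset = env["DATASET"] || "slim"   # same default as the :configuration task
  File.join(root, "db", "data", kind, "#{dataset}.rb")
end

# No DATASET given: falls back to the slim dataset.
puts dataset_path("reference_data", {}, "/app")

# DATASET given: picks the matching script for that kind of data.
puts dataset_path("user_data", { "DATASET" => "loaded" }, "/app")
```

Keeping one script per dataset name under each kind of data is what lets a single DATASET variable switch both the reference data and the user data at once.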


Step 3:

Create a rake task to populate all the user data that your app requires to run. User data is data that a user can create or update within your application, but that is also required for the Rails app to function properly on the first day it launches. For example, for your flashy PayPal application, this would be the fee structure that determines how it charges its users: an application administrator is allowed to raise or drop fees in your app.


namespace :data do

  # No desc, so `rake -T` does not list this helper task.

  task :user_data do
    require "#{RAILS_ROOT}/db/data/user_data/#{ENV['DATASET']}"
  end
end

$ cat db/data/user_data/slim.rb

Fee.create! :amount => 1_000, :currency => @us_dollar
Fee.create! :amount => 1_500, :currency => @yen
Fee.create! :amount => 2_500, :currency => @euro
... ... ...
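
The user data script above can refer to @us_dollar only because both scripts are required into the same process, with reference data first: instance variables set at the top level of a required file belong to Ruby's top-level main object, which every required file shares. A minimal sketch, using throwaway files and a made-up global ($fee_currency) purely so the result can be inspected:

```ruby
require "tmpdir"

# Writes two tiny stand-in dataset scripts to a temp directory and requires
# them in the same order as the data:load prerequisites.
Dir.mktmpdir do |dir|
  reference = File.join(dir, "reference_data_sketch.rb")
  user      = File.join(dir, "user_data_sketch.rb")

  File.write(reference, "@us_dollar = 'US Dollar'\n")
  # $fee_currency is a made-up global, used only to expose the value.
  File.write(user, "$fee_currency = @us_dollar\n")

  require reference   # sets @us_dollar on the top-level object
  require user        # reads it back from that same object
end

puts $fee_currency.inspect
```

If the two require calls were reversed, @us_dollar would still be nil when the user data script ran, which is why :reference_data comes before :user_data in the prerequisite list.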


There are several benefits of managing your database schema and data this way:
  • It is easier and faster to re-populate your entire database, with data, to the latest schema from scratch, since there are no extraneous migrations.

  • It is faster to locate and update the data your application needs. It is always in your dataset generation scripts.

  • No need to worry about outdated or removed ActiveRecord classes, or to redeclare them inside the migration file itself. There are no "legacy" ActiveRecord models.

  • All data is valid all the time because it is created through Model.create! and checked by your model validations.

  • It is easy to choose which dataset to load for DEV (slim, loaded), BA (story sign-off), QA (scenario-based), or demo (full) environments, e.g. rake data:load DATASET=slim RAILS_ENV=qa

  • No worries about broken or incomplete migrations. Less code, less trouble.

Now, after your Rails app goes to a 1.0 production release, you should switch this back to the normal Rails database migration style. I suspect your application users won't be too happy if you blow away their data every time you roll out a minor update or a major release... (or not?)

5 comments:

Curtis Summers said...

Stephen,

I personally have found it a joy to use migrations during development for teams larger than one. As soon as you have more than one developer, you have the potential to step on each other's toes.

If I'm working on a feature, putting in and taking out data as I develop, I hate having to blow away my database when developer #2 adds a column to a table. This data may not be reference data or even required--it's just data that I'm working with at that very moment, and I want it to stick around until I'm finished with it.

If migrations get big and messy, then we'll collapse and clean them up before we roll to production, but I still like having them during development. And, shoot, if we're going to use migrations as soon as we roll to production, we might as well train ourselves well with them during development.

Having the reference data as a rake task is a cool idea, though.

Stephen Chu said...

Curtis:

First, thank you for sharing your thoughts. I love it when I get honest feedback :-)

Just to clarify, the strategy above is drawn from experience on a 16-developer Rails project. It works out quite well for us.

Both the reference data and the user data that I mentioned above are data required for the application to go into production. If, while developing a feature, you have temporary data that you are unwilling to blow away, you can solve this problem by having a third set of data (perhaps called development data) that is loaded in your data:load after :reference_data and :user_data. My advice is: script it. It shouldn't be too hard to script such data; if it is, that indicates to me the model/controller is hairy and needs some refactoring love. After you are done with your feature, your development data hopefully either becomes legit reference data or user data, or becomes part of another dataset's (like QA/Testing) reference data or user data. I would still advise making such data part of one of your development datasets. The fact that you can selectively load any dataset for any environment by not using migrations gives you a lot more control over what gets loaded and what doesn't.

Don't hate blowing away your database during development. It happens, and I do it on every check-in. The point is, find ways to re-generate all your data using ActiveRecord models, and keep all data valid *at all times*. If you have data in your database that you cannot generate through AR models because it violates validations, you will want to blow away your database.

Collapsing migrations is a good and efficient way to reduce the number of migrations. But IMHO I don't need to use this trick yet when I don't even need migrations.

Once your application goes into production, not just database migrations but the whole development becomes quite a different ball game. Release 2.0, bug fixes, app monitoring, testing, branching, etc., all become important at a different level. But, since you need them all once you go into production, does it make sense to do them all in pre-release 1.0 just to get some training? From 1.0 and on, 'planning' becomes the name of the game before your every move.

Anonymous said...

Hello,

I'm quite new to Ruby so I don't know that much about it.
What you describe is very useful for me but I need more details. From your example it seems that you only get the data from a table imported and exported into the new structure of the database, is this correct?
Is it possible to have more examples and tutorials on Ruby?

Regards,
Pedro
pedro_garrinhas@hotmail.com

Danimal said...

Stephen,

I love this idea but I'm wondering how you'd handle prepopulating binary data? I.e. I use attachment_fu for logos for some models. I can see having a "reference_images" directory to hold the source files for the logos. But I don't know how to simulate the data read that happens from a form using a file_field

Thoughts?

Danimal said...

Actually, I ended up figuring it out. Turns out the testing framework of Rails includes a class to handle testing file uploads. So I just grab that and I'm good to go! It's quite cool as I can keep images in a reference_data/images directory and then do stuff like this:


require 'action_controller/test_process'

Logo.create! :uploaded_data => ActionController::TestUploadedFile.new(
  'test/reference_data/images/logo.jpg',
  'image/jpg')


So now I can load images through attachment_fu and it does all the resizing, thumbnailing, directory creation and such.

I ended up creating a custom rake task to delete the images store (the attachment_fu processed images, not the ones in reference_data!), wipe the DB, recreate it, run the migrations and load my data (via your idea).

So voila! single command and I'm ready to go!

Thanks a million!

-Dan