Monday, December 01, 2008

Speed up your MySQL data load

Perhaps you have some large datasets you would like to load into MySQL, such as a big text file generated from mysqldump that contains many sql INSERT statements. You can obviously populate your empty database by redirecting the content of this file into the mysql command line like such:


$ mysql -u root -p your_database < /tmp/table.dump


Another way to load data into your database or table is to use the sql syntax LOAD DATA INFILE. It requires you to provide a file that is in some delimiter format such as csv or tab-delimited values. One can use the mysqldump command line switches --fields-*-by and --tab to dump the content of the database to such format.

After you started either of these data load, you may want to kick off your shoes and take a nap, cos this will take a while if your MySQL server is not tuned and your dump file is pretty big (in this example, an InnoDB table of 790,000 rows [~500 MB dump] took 45 mins on my MBP using a Macports [mysql5 @5.0.67_1] installation with default settings). If you have to reload your database/tables often, this is unbearably slow.

Fortunately, there are a few MySQL tuning settings that you can specify at mysql server start-up time to tremendously speed up your data load time. Keep in mind, it is advisable to flush your MySQL logs (like the following) prior to tinkering with any log file size settings to avoid any log file corruptions:

(Before you proceed, I recommend you backup your data first)


$ sudo mysqladmin flush-logs
$ ... then shutdown your MySQL server ...
$ sudo rm /path/to/mysql/datadir/ib_logfile*
$ ... then start your MySQL server ...


So, now it's time to put in our magic sauce. On my Macports MySQL installation, the settings were meagerly defaulted to:


+---------------------------------+---------+
| Variable_name | Value |
+---------------------------------+---------+
| innodb_additional_mem_pool_size | 1048576 |
| innodb_buffer_pool_size | 8388608 |
| innodb_log_buffer_size | 1048576 |
| innodb_log_file_size | 5242880 |
| bulk_insert_buffer_size | 8388608 |
+---------------------------------+---------+


Add the following settings in your appropriate my.cnf file. Your settings may vary with the kind of machine you have (mine is a MBP 2GB RAM):


[mysqld]
innodb_additional_mem_pool_size=256M
innodb_buffer_pool_size=512M
innodb_log_buffer_size=256M
innodb_log_file_size=512M
bulk_insert_buffer_size=256M


Our settings will now look like:


+---------------------------------+-----------+
| Variable_name | Value |
+---------------------------------+-----------+
| innodb_additional_mem_pool_size | 268435456 |
| innodb_buffer_pool_size | 536870912 |
| innodb_log_buffer_size | 268435456 |
| innodb_log_file_size | 536870912 |
| bulk_insert_buffer_size | 268435456 |
+---------------------------------+-----------+


Then, as I reload my table, it only takes just 4' 55", a whopping 900% performance boost.

Bonus Tip:

You might notice that you cannot use SELECT COUNT(*) to track your data load progress while LOAD DATA INFILE is being executed (it always returns a count of zero). Instead, you can use the SHOW INNODB STATUS command. Find the line that says "undo log entries" under section TRANSACTIONS, and that number is the number of rows inserted so far. You can also look at how many inserts were performed per second by finding the line "inserts/s" under the ROW OPERATIONS section.

Friday, September 26, 2008

Go to Ruby ! and ? method definition with TextMate CTags

If you are not yet using the TextMate CTags bundle in your Ruby/Rails development, you are missing out big time. This bundle allows you to go to a method definition as easy as pointing your cursor at the method and hit a keyboard shortcut. It has served me well and I highly recommend it.

But soon you might realize that this "go to definition" shortcut only works most of the time. Specifically, whenever you try to go to the definition of a Ruby method that ends with a bang (!) or question-mark (?), CTags will fail open to the line that defines that method for you.

Here's a fix for you. The perl script that got executed underneath this little utility actually uses a TextMate variable TM_CURRENT_WORD. This variable points at the "word" your cursor is currently laying on top of in TextMate. But since by default, TextMate does not consider ! or ? to be part of a word, your CTags fails on you. All you need to do, is go to Preferences / Text Editing, and change Word Characters to include ! and ?.

Sunday, June 22, 2008

Object-Oriented Programming with Rails ActiveRecord

Have you seen this before?

@posts = Post.find :all, :conditions => { :user_id => @user.id }

This happens to new Rails developers quite a bit. This code retrieves correctly an array of @posts and send them to the view for rendering. But what about it?

The key thing here is that this can also be done in a better way:

@posts = @user.posts

When you use the former method to retrieve @posts, you are actually thinking in database terms along the lines of "in the database table 'posts' and therefore ActiveRecord model 'Post', retrieve the rows whose foreign key column 'user_id' has the value @user.id." ActiveRecord is a pattern for accessing your data from objects, but you also have to combine it with the power of the object-oriented-ness of Ruby to create the chimes of beautiful code. Using Rails and Ruby does not make anyone a better programmer automagically. One can still write code in Java/C# in the same procedural style as if you were writing C. It is how to leverage the best of all worlds makes you a better problem solver.

Therefore, think like an object bigot. Think in objects, and not in database foreign key column values. Whenever you see any Rails code in the pattern of this:

:model_id => @model.id

Just stop. "Objects on Rails" sounds a lot better than "Ruby on Foreign Keys."

Wednesday, June 18, 2008

Learning encapsulation should come before learning Rails

Do you see anything wrong in this one line of code?

puts 'Drink up your milk!' if @milk.expiration_date < 2.days.from_now


I think there is. In particular, the programmer can do better. It is about a programming concept that we have all heard and should be familiar with: encapsulation. If you are into testing, which you ought to be, you can probably identify such encapsulation violations by how smelly your tests are, as I believe in Test Driven Design. Going back to the line of code. What's wrong with it?

The pattern of this code is, you have code somewhere that yanks states (expiration_state) out of an object (@milk) and then interrogate against those states (< 2.days.from_now). By no means this is a rule, as exceptions do exist. But, when this happens, your programming siren in your head should go off as if the CI build is broken: can this be a method on the object @milk itself?

Ruby is a powerful programming language. It allows you to do all sorts of fancy programming: dynamically altering classes, add methods to only the selected few object instances, while being duck-typed and object-oriented all at the same time. There are examples galore in Rails itself. But to enable all such magic, all of your ActiveRecord domain models have getters and setters on all of their states (i.e. data). While that is convenient to access their states, sometimes you have to be careful. With great power comes great responsibility, and that responsibility comes down to you.

You might not think this applies to you. But have you ever written code like this in Ruby?

MyMailer.deliver_product_returned_notification if @product.state == State::Returned

total = @line_items.sum { |line_item| line_item.price * line_item.quantity }

if [:admin, :superuser].include?(@user.role)
...
end


Now, let's look at these rewritten examples:

puts 'Drink up your milk!' if @milk.expiring?

MyMailer.deliver_product_returned_notification if @product.returned?

total = @line_items.sum(&:subtotal)

if @user.superuser?
...
end


Forget about the loops, the blocks, the hype, and everything about Ruby for a sec. Code is code. Not only does the code become more readable, when you try to enrich your domain by naming things correctly, you could also very well be opening up new business concepts that wasn't previously clear or accurate. If you have a domain model, you are modeling it against domain concepts so that your app can interact with. You'd better be sure it is right, or else lots of time will go wasted on code that solves only half of the business problems all the time.

Monday, June 09, 2008

Launching apps in command line (Mac)

In most cases, to launch an application on a Mac is just a matter of using the awesome QuickSilver keyboard shortcut and type in what you want to open. But, sometimes it is useful to be able to launch/open an application in command line or terminal in a scripting context.

So, instead of going through all the typing $ /Application/MyFavouriteApp/Contents/MacOS/MyFavouriteApp in your terminal, you can do:

$ open -a MyFavouriteApp

This will open any applications in your /Applications folder by name.

Wednesday, May 21, 2008

Rails composed_of validation

ActiveRecord allows you to declaratively write validations (e.g. validates_presence_of) in any ActiveRecord::Base models. In addition, the errors will be stored in the @model#errors collection and be used by various view helpers and methods like FormBuilder, error_messages_for, and error_message_on, wrapping a nice error CSS around the offending html form elements. But when you have a normal domain model object that is non-ActiveRecord::Base such as a composed_of type Value Object, you will not have access to these declarative validation magic methods. Today let me try to elaborate on how to use ActiveRecord::Validation methods in your composed_of objects in Rails 2.0.


(The following entry revolves around the assumption that you are using f.fields_for to create and assign value object domain models onto your ActiveRecord::Base models. Using this approach eliminates most value object creation code in your controllers, achieving "Skinny Controllers". To learn about, visit, check out the Rails composed_of &block conversion blog entry.)


Using the same example in the previous entry, let's include our ActiveRecord::Validation module into our Money class like such:

class Money
include ActiveRecord:Validations

attr_reader :balance, :currency

validates_presence_of :currency
validates_inclusion_of :currency, :in => ['USD', 'EUR'], :if => :currency?
validates_presence_of :balance
validates_numericality_of :balance, :if => :balance?

def initialize(balance, currency)
@balance, @currency = balance, currency
@errors = ActiveRecord::Errors.new self
end

def new_record?
true
end

def currency?
!@currency.blank?
end

def balance?
!@balance.blank?
end

def self.human_attribute_name(attr)
attr.humanize
end

def balance_before_type_cast
@balance
end
end


You will notice a couple things. One, I have to define the #new_record? method. This method is defined on all ActiveRecord::Base objects, but since our PORO object is not a record per se, we just stub it out. Also, we need to store a collection @errors of type ActiveRecord::Errors.

Depending on what validation routine you will end up using in Money, you may have to stub out different methods. For example, I am showing error messages with error_messages_for (more on this later), and it requires stubbing out self.human_attribute_name (as of Rails 2.0, but no longer needed in future Rails). Using validates_numericality_of requires me to stub out balance_before_type_cast. Also, the validation :if conditions requires me to add the question-mark methods balance? and currency?. Remember, this approach does not give you all the validation magic. For example, validates_uniqueness_of will not work because it assumes too much about your object being a normal AR model and needs access to a database. But in practice, your Value Objects should not need such validations, and in most cases they contain only simple one-off validations, and provide simple functionalities such as formatting like this and this.

After all these, let's see our ActiveRecord Book class.

class Book < ActiveRecord::Bases
composed_of :balance, :class_name => 'Money', :mapping => [%w(balance balance), %w(currency currency)] do |params|
Money.new params[:balance], params[:currency]
end
validates_presence_of :name
validates_associated :balance
end


The composed_of conversion block remains. You will notice the validates_associated :balance line as well. This tells your book instances that they should not be persisted should there be any balance validations failing, just like any normal validations you would write. By default, any failing balance validation will add an error message 'Balance is invalid' in your @book#errors collection. If you want to suppress that message from showing up, you can pass in option :message => nil.

So, to put it all together, here is the view and the controller code:

<h1>New book</h1>

<%= error_messages_for :object => [@book, @book.balance] %>

<% form_for(@book) do |f| %>
<p>Name: <%= f.text_field :name %></p>

<% f.fields_for :balance, @book.balance do |ff| %>
<p>Balance: <%= ff.text_field :balance %></p>
<p>Currency: <%= ff.text_field :currency %></p>
<% end %>

<p>
<%= f.submit "Create" %>
</p>
<% end %>

<%= link_to 'Back', books_path %>


To show error messages from multiple objects on the view, I am using the view helper method error_messages_for(*args). The :object option actually allows you to pass an array of objects (c'mon, you should know this trick about ActiveRecord by now. If not, check it out here and here).

def create
@book = Book.new params[:book]

if @book.save
flash[:notice] = 'Book was successfully created.'
redirect_to @book
else
render :action => "new"
end
end


Again, a skinny, thin, sexy-looking controller action.

Tuesday, May 06, 2008

A normal Rails un-RESTful page as a resource... or can it?

An index page comes with Rails scaffolding. It is used to show a list of the "thing" you are CRUD-ing on. However, all too often we are tasked to show some pages that we aren't doing any CRUD operations on. Just a few examples:

  • Dashboard page after user logs in

  • The "Forgot your password?" page

  • Tabs or subtabs of multiple lists

  • List page showing multiple lists of different entities

A normal dashboard page isn't something you would normally CRUD on. It is more like a place holder page that a user sees the first thing after s/he logs in, showing many items of all sorts valuable to a user. In many Rails app, depending on what those items are, the dashboard page will end up being rendered by one of the items' index action. Worse yet, next time when you actually need that index page by that model, you have to call it something else, maybe 'list', and then play with the routes to get it wired up correctly.

class CustomersController < ApplicationController

# The dashboard page
def index
@customers = Customer.find :all
@products = Product.find :all
@tasks_of_the_day = Task.find :all
end

# A list of customers
def list
@customers = Customer.find :all
end

end

ActionController::Routing::Routes.draw do |map|

map.resources :customers, :products, :tasks
map.with_options :controller => 'customer' do |r|
r.dashboard '/customers/list', :action => 'list'
end

end


Here's a suggestion: how about put it in views/dashboards/show.html.erb, and, while you are at it, give it a DashboardsController? Then, put it in your routes.rb as a (singular) map.resource :dashboard, like such:

class DashboardsController < ApplicationController

def show
@customers = Customer.find :all
@products = Product.find :all
@tasks_of_the_day = Task.find :all
end

end

ActionController::Routing::Routes.draw do |map|

map.resources :customers, :products, tasks
map.resource :dashboard

end


By rendering the dashboard page in a completely different controller, you now have a very readable GET dashboard_path named route (GET: http://localhost:3000/dashboards), and you will not contaminate the index action of your other models' controllers with instance variables of all kinds. You also have a more readable routes.rb file.

One of the examples above for non-RESTful pages is the "Forgot My Password" page. Can you think of a good way to do it the Rails REST-ful way? Please go to Seeing Rails Resources Clearly to share some of your thoughts.