Showing posts with label agile. Show all posts

Friday, December 14, 2007

Why create database migrations, when you don't need to?

Rails supports database migrations out of the box. They allow Rails programmers to be more Agile: do less big design up front (BDUF) on the database and instead change the schema as the requirements change and the business wants it. Writing a migration is as easy as running script/generate migration from the command line and then executing it with rake db:migrate.

Database migration is useful for two reasons. Number one is to allow developers to change the database schema. This can be as simple as a single ALTER TABLE statement to add a new column. Number two, more importantly, is to migrate the all-important data that already exists in the database. But if your application is still under development for its initial release, migrations may not be buying you much, because chances are no one cares how your table structure got to where it is now, and you may not have a whole lot of data.

We all use migrations feverishly, probably because most of the Rails books/references/tutorials begin with: "Let's start by creating a database migration, in it we insert some data into the newly created tables, and learn how to do XYZ." Thus, your db/migrate folder is stuffed with migrations that do everything: create new tables; alter existing tables; create data for those tables; update data created by previous migrations; and perhaps all of the above, without any justifiable reason for what those migrations are trying to migrate. In practice, it is quite time-consuming for a developer to run 200+ migrations every time he blows away a database, which is not uncommon. Not only that, sometimes you have multiple migrations that basically cancel out each other's changes as your customers change their minds back and forth. As a result, you could be creating migrations left and right when there may not be any real beneficiary: a database loaded with data.

Sometimes your QA team may have their own data set that tests your app, and thus their loaded database is a beneficiary. While that is true, I prefer their data to be scripted and freshly generated by ActiveRecord models (using create!) every time instead of being migrated over and over, because as my application domain model expands, I want not just QA data but all datasets to be cleansed and validated by my model validations. There is no guarantee that after running a remove_column migration the QA dataset does not violate any business logic. Keeping all this data valid while database migrations are being rapidly created is very hard.

To take advantage of the fact that your development environment has nothing to lose until you have an initial release, while keeping all your data valid (in all environments) the whole time your app is under development, follow three simple steps:


Step 1:

Have one migration file, 001_release_one_schema.rb, that captures all database object creations: all your create_table and add_index calls, views, triggers (*yikes*), etc. After this migration runs, your database should contain all database objects for your Rails app but in a "blank", data-less state.

$ cat db/migrate/001_release_one_schema.rb

class ReleaseOneSchema < ActiveRecord::Migration
  def self.up
    create_table "foos" do |t|
    end

    # (... and many others ...)
  end

  def self.down
    # (... ... ...)
  end
end


Step 2:

Create a rake task to populate all reference data that your application requires to run. Reference data means all data that your application cannot change through its screens but that is essential for your app to run. For example, all currencies that are used to populate a drop-down in your app. A lot of drop-down list data is reference data.
$ cat lib/tasks/data.rake

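namespace :data do
  desc "Loads a default dataset of both reference and user data into database."
  task :load => [ :environment,
                  :configuration,
                  :reference_data,
                  :user_data ]

  # Note: `private` does not hide Rake tasks. The helper tasks below stay
  # out of `rake -T` simply because they have no desc.
  private

  # Fall back to the "slim" dataset unless DATASET is given on the command line.
  task :configuration do
    ENV['DATASET'] ||= 'slim'
  end

  task :reference_data do
    require "#{RAILS_ROOT}/db/data/reference_data/#{ENV['DATASET']}"
  end
end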

$ cat db/data/reference_data/slim.rb

@us_dollar = Currency.create! :name => "US Dollar"
@yen = Currency.create! :name => "Yen"
@euro = Currency.create! :name => "EURO"
... ... ...


Step 3:

Create a rake task to populate all user data that your app requires to run. User data is data that a user can create/update within your application. It is also required for the Rails app to function properly on the first day it launches. For example, for your flashy PayPal-like application, the fee structure that determines how it charges its users. An application administrator is allowed to raise or drop fees in your app.


namespace :data do

  private

  task :user_data do
    require "#{RAILS_ROOT}/db/data/user_data/#{ENV['DATASET']}"
  end
end

$ cat db/data/user_data/slim.rb

Fee.create! :amount => 1_000, :currency => @us_dollar
Fee.create! :amount => 1_500, :currency => @yen
Fee.create! :amount => 2_500, :currency => @euro
... ... ...


There are several benefits of managing your database schema and data this way:
  • It is easier and faster to re-populate your entire database to the latest schema from scratch with data, since there are no extraneous migrations.

  • It is faster to locate and update the data needed by the application. It is always in your dataset generation scripts.

  • No need to worry about outdated/removed ActiveRecord classes or to declare them inside the migration file itself. There are no "legacy" ActiveRecord models.

  • All data is valid all the time because it is created through Model.create! and checked by your model validations.

  • It is easy to specify which dataset to load for DEV (slim, loaded), BA (story sign-off), QA (scenario-based), or demo (full) environments, e.g. rake data:load DATASET=slim RAILS_ENV=qa

  • No worries about broken/incomplete migrations. Less code, less trouble.
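To make the create! point concrete, here is a minimal plain-Ruby sketch. It deliberately does not use ActiveRecord: the Currency class below is a hypothetical stand-in that only mimics the contract of create! (raise instead of quietly keeping an invalid record), and InvalidRecord and load_dataset are names invented for this illustration.

```ruby
# Plain-Ruby stand-in for a model; NOT ActiveRecord, illustration only.
class InvalidRecord < StandardError; end

class Currency
  attr_reader :name

  def initialize(name:)
    @name = name
  end

  # A single hypothetical validation: the name must not be blank.
  def valid?
    !name.to_s.strip.empty?
  end

  # Mimics the create! contract: raise rather than keep invalid data.
  def self.create!(attrs)
    record = new(**attrs)
    raise InvalidRecord, "Name can't be blank" unless record.valid?
    record
  end
end

# A dataset script is just a sequence of create! calls, so one bad row
# stops the whole load instead of slipping into the database.
def load_dataset(rows)
  rows.map { |attrs| Currency.create!(attrs) }
end

loaded = load_dataset([{ name: "US Dollar" }, { name: "Yen" }])
puts loaded.map(&:name).inspect

begin
  load_dataset([{ name: "Euro" }, { name: "" }])
rescue InvalidRecord => e
  puts "dataset rejected: #{e.message}"
end
```

With real models the same thing happens through your actual validations, which is why a dataset script that finishes without raising leaves only valid rows behind.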

Now, after your Rails app goes to a 1.0 production release, you should switch this back to the normal Rails database migration style. I suspect your application users won't be too happy if you blow away their data every time you roll out a minor update or a major release... (or not?)

Tuesday, December 04, 2007

The last "D" in TDD means more than just "Development"

When asked "Do you write tests?", a lot of developers these days will answer "of course". However, not everyone can claim to be doing TDD (Test Driven Development) correctly. Test Driven Development says a developer writes a failing test first, then writes code to make the test pass, refactors when possible, and repeats. This is most people's TDD rhythm. For the most part this is fairly easy to do. But to reach the next level, one has to understand TDD as a tool: TDD means more than just testing your own code. Here are a couple of tips on what the last "D" means:

Discipline

It takes a great deal of discipline to even write a failing test before writing the actual code. Sometimes we write a little pseudo-code here, or move a method definition there, or change code elsewhere to see if more new code needs to be written after it, and sooner than you think you are writing the actual implementation of the methods you wanted to test (Test Afterwards Development anyone?). Other times you write the test, but you are too anxious to even run it and see it fail. And other times you want to jump into the actual code immediately when you see your new test fail, even though it is failing for an unexpected reason.

Don't fall into these traps. If anything is true, testing is hard, but it is at the same time rewarding and fun. What's also true is, it will pay off. Write the failing test, draw up your list of tests you will need to write, and satisfy them one by one. Having discipline is the cornerstone of becoming a better programmer.

Design

Does it take too long to write a test? Are your tests running too slowly? Are they difficult to read? Are they too brittle, failing all the time? Hang in there! Have you ever come across code in the codebase, written by someone else on your team, that irks the living hell out of you? Well, it is time for you to get some of that feedback about your own code. Yay, your code sucks! Your tests are telling you that! Let's address each of these one by one.

Slow running tests? You shouldn't be hitting a database or web service in your unit tests, because you can mock/stub them out. Difficult to mock/stub it out? There probably is a better way to design your classes your tests are hitting. Ever heard of Inversion of Control (or Dependency Injection)? Master them. True TDD masters use them extensively.
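As a minimal sketch of what that can look like in plain Ruby (all class names here are hypothetical and exist only for this illustration): the converter is handed its collaborator through the constructor, so a unit test can pass in a hand-rolled stub instead of hitting a web service.

```ruby
# Hypothetical collaborator that would be slow or flaky in a unit test.
class RemoteExchangeRates
  def rate(_from, _to)
    raise NotImplementedError, "would call a remote web service"
  end
end

# The converter receives its rate source instead of constructing one
# itself, so the dependency can be swapped out in tests.
class PriceConverter
  def initialize(rate_source)
    @rate_source = rate_source
  end

  def convert(amount, from, to)
    (amount * @rate_source.rate(from, to)).round(2)
  end
end

# Hand-rolled stub: same interface, canned answer, no network.
class StubExchangeRates
  def rate(_from, _to)
    1.25
  end
end

converter = PriceConverter.new(StubExchangeRates.new)
puts converter.convert(10, "EUR", "USD")
```

In production code the same PriceConverter would be built with the real rate source; the class itself does not know or care which one it was given.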

Unreadable tests? Is it because of too many mocks/stubs? Or is it because the code under test is 500 lines long and doing high octane 720-double-backflip logic? Either way, you have to learn to like small objects. Check this blog post of mine out.

Hard to test something? Tests too brittle? Perhaps you have encapsulation issues in your class design. If your classes are referencing 15 other neighbors, of course they are hard to mock/stub. Chances are, you have to spend time to debug your tests to find out what's wrong! Heard of Law of Demeter? Even if you have, take a look at this highly entertaining yet informative post. It might change your perspective a little.
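A small sketch of the idea, using made-up Wallet, Customer and Cashier classes (none of them come from the post linked above): the cashier talks only to its direct collaborator, so a test needs exactly one trivial stub instead of a chain of them.

```ruby
class Wallet
  attr_reader :balance

  def initialize(balance)
    @balance = balance
  end

  def debit(amount)
    raise ArgumentError, "insufficient funds" if amount > @balance
    @balance -= amount
  end
end

class Customer
  def initialize(wallet)
    @wallet = wallet
  end

  # The customer handles its own wallet; callers never reach through to it.
  def pay(amount)
    @wallet.debit(amount)
    amount
  end
end

class Cashier
  attr_reader :collected

  def initialize
    @collected = 0
  end

  # Talks only to its immediate neighbor, never to customer.wallet.
  def charge(customer, amount)
    @collected += customer.pay(amount)
  end
end

# In a test, the cashier's only neighbor is trivial to stub.
class StubCustomer
  def pay(amount)
    amount
  end
end

cashier = Cashier.new
cashier.charge(StubCustomer.new, 15)
puts cashier.collected
```

Had Cashier reached into the customer's wallet directly, the test would need a stub customer that returns a stub wallet, and any change to Wallet would ripple into Cashier's tests.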

The bottom line is, TDD is a way to guide you to writing good code, but only if you know how to use it as a tool. Now that you know, hopefully you will have a new perspective next time you write a test.

Sunday, November 18, 2007

Numbers don't lie, but your test coverage numbers might

We have learned all through our education that our decisions should be based on numbers. Can you support your family given however much you are making? What is the acceleration due to gravity? How many roses do you give your girlfriend on Valentine's Day (no satisfying answers ever...)

The same happens in software. On all software projects, various flavors of metrics are gathered and read by various types of people on the team. Code coverage is one of the most commonly mentioned.

But there is a misconception about code coverage: When the percentage is high, it is good; otherwise, it is not good.

Then people try to find a definition of "high": some say 80% is a good number, others say 90%. Some even strive for 100%.

But let's decode this message a little more thoroughly. If your number is low, this means your code is not very well tested. Clearly this is not desirable in a code base that you have to go in every day and make changes here and there - good luck in not breaking stuff. So, a low coverage number is bad.

Now, a high coverage number means your code is well tested. But this number does not tell you a few things:

1) The quality of your code. It does not tell you whether your objects are coupled like spaghetti; it does not tell you whether your code is doing crazily unreadable nested iterations mixed with multiple levels of recursion; it does not tell you whether you are violating encapsulation and separation of concerns; it does not tell you whether you are copying & pasting code everywhere. If your code exhibits all of these symptoms, 99.9% code coverage still means future changes to the code base are going to be a nightmare.

2) The quality of your tests. Tests are also code that needs to be maintained. If it takes someone 30 seconds to make a single line of code change, but it breaks hundreds of tests and takes that person an additional hour to duplicate that fix in all failed tests, then the burden of maintaining those tests outweighs the feedback on whether the system still works or not. Further, if tests are hard to read because they are long and have mocks/stubs everywhere, and the test method names do not reflect what the test method bodies are doing, then those poorly maintained tests just add test maintenance time.
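Here is a tiny, contrived Ruby illustration of the gap between coverage and test quality. The method and both "tests" are invented for this sketch, and no coverage tool is actually run: the point is only that the first test executes every line of the method yet can never fail.

```ruby
# Deliberately buggy: a 10% discount should reduce the price, not raise it.
def discounted_price(price, percent)
  price + price * percent / 100.0
end

# Executes every line of discounted_price, so a line-coverage tool would
# count the method as fully covered, yet it checks nothing.
def weak_test
  discounted_price(200, 10)
  true
end

# Covers exactly the same lines, but checks the result.
def strong_test
  discounted_price(200, 10) == 180.0
end

puts "weak test passes:   #{weak_test}"
puts "strong test passes: #{strong_test}"
```

Both tests would contribute the same coverage for this method; only the second one notices the bug.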

So how should you read the coverage number? You should read this number in combination with other metrics to gauge the health of the code base. For example, do defects constantly come from a feature area of the code when changes are being made? Are story estimates higher in certain areas of the code base but not others? Are stories in certain functional areas always being under-estimated during Planning Meeting?

Code coverage is just a number. It does not tell you whether your code is flexible enough to keep pace with changing business requirements. True productivity comes from good code solving a particular problem well. Well tested code can still be bad.

Thursday, August 24, 2006

Size does matter

When it comes to managing an Agile project team, one question that usually arises for planners and managers is how big a story should really be. For most people, a story should at a minimum be sized as something that can be completed within an iteration by someone (or a programming pair). Each story is tagged with a difficulty level, whether you measure using 1-2-3 ideal development days or gummy bears, as a best-guess estimate of how long it will take to implement the solution for the story.

While it looks trivial, I think story size is very important for an Agile team in managing the business's wishes for the application and development's delivery of business value.

On some projects, a story could be one small thing on one screen (e.g. enter customer address). On others, it spans multiple screens (e.g. enter, edit, and delete customer info). For me, I would much prefer them to be as small as they can be. For something like entering, editing, and deleting customer info, strive to break things out into something meaningfully small, like:
- enter customer name
- enter customer address
- enter customer billing info
- edit customer name
- edit customer address
- edit customer billing info
- delete a customer

Small stories have a few advantages:

More tailored to business needs
Suppose you have a specific need for a desktop computer, say you are a professional graphics designer or gamer. Would you be more satisfied with your purchase if you spent your budget at an online store on one of their 20 brand-name computers, or if you custom built it with hardware parts you get to choose one by one to address your specific needs? Is 1GB RAM gonna be enough? How about 2GB? Or is 1GB plus more video card memory good? If you purchase your computer part by part, chances are you will be buying something much more tailored to what you need now, while at the same time paying for something that can accommodate tomorrow's change better (e.g. a motherboard with more RAM slots than normal). The smaller your application's features are, the more flexibility you have. But don't go overboard and purchase your parts at a microscopic level (buying transistors), which will cause you nightmares.

You can need it later, or YAGNI at all
You are building your dream vehicle from scratch. You have a tight budget, and a tight deadline. Do these constraints sound familiar? We are all working in this competitive business environment every day. So if you are building this car, can it satisfy your need with only 2 wheels, like a motorbike? Does it still address your needs if it has three wheels? Do you need this car to have a 5th backup tire? Of course all of this depends on what you use your car for. You might only need 2 wheels and as light a weight as possible for a marathon or endurance race, or you might absolutely need a 5th wheel if you use it in the African desert where there are no paved roads. But the tight budget and deadline do not change; what changes is the delivery and whether the users' needs are satisfied. So, between one story that says "build 5 wheels for the vehicle" and five stories that say "add 1 additional wheel to the vehicle", what would you pick? Prioritization is key.

More consistent and better measured velocity
As previously mentioned, stories should be sized as something that can be completed in one iteration. Suppose that, based on last iteration's performance, you plan 3 stories for the following iteration: a 1-point, a 2-point, and a 3-point story. If the team got the 3-point story 99% complete but was unable to finish it, your team's velocity dropped from 6 down to 3. This is a pretty big change in velocity. Now imagine you redo your iteration with 8 stories, same scope, a total of 20 points based on last iteration's velocity. If at the end of the iteration your team cannot deliver one 2-point story and one 3-point story (note: a much smaller share of the plan), your team's velocity is 15. Your development team's progress can be measured more accurately simply by splitting your stories up.
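The arithmetic can be sketched in a few lines of Ruby. The individual sizes of the 8 stories are an assumption made for this example (the text only gives their total of 20 points), and the helper names are invented here.

```ruby
# Velocity = points of the stories that were actually completed.
def velocity(planned_points, unfinished_points)
  planned_points.sum - unfinished_points.sum
end

# How much of the planned points were lost, as a percentage.
def drop_percent(planned_points, unfinished_points)
  planned = planned_points.sum
  100.0 * (planned - velocity(planned_points, unfinished_points)) / planned
end

coarse_plan = [1, 2, 3]                  # three stories, 6 points
fine_plan   = [2, 3, 2, 3, 2, 3, 2, 3]   # eight stories, 20 points (assumed split)

puts velocity(coarse_plan, [3])          # the 3-point story slipped
puts drop_percent(coarse_plan, [3])
puts velocity(fine_plan, [2, 3])         # one 2-point and one 3-point story slipped
puts drop_percent(fine_plan, [2, 3])
```

Under these assumptions the coarse plan loses half of its planned points to a single unfinished story, while the finer-grained plan loses a quarter even though two stories slipped.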

Code is better, analysis is easier
Smaller stories encourage team members to complete something faster, meaning they get a chance to go back and refactor code more easily and more often. It also allows programming pairs to switch partners more often and gain a better understanding of the code base. Smaller stories can also be defined in a much more granular fashion. Things that the business does not understand yet can be split out into another story in order to get things going. Better tested code and a better understanding of both the code and the stories obviously make your application more robust and easier to change.

With smaller stories, one must also understand there are drawbacks too. Here are a few I can think of:

Story explosion
Most people use an Excel spreadsheet to manage their day-to-day story progress. Having a spreadsheet of 60 stories is alright; now consider representing the same story list with 180 stories. If you are organizationally challenged, you will quickly run into issues with your tracking. How do you categorize such information? Well, in today's blogosphere there is something called tagging. You can tag a story with multiple tags, and later click on a tag topic (e.g. UI) and a list of things you previously tagged will show up. This allows someone to quickly sort and filter the obviously larger story list. I don't know if there is a product out there that supports such information sorting and filtering yet, but I would start looking there if I were to solve this issue. Del.icio.us is a good place to explore tagging.

Story selections
When there are more stories to choose from, playing the right stories becomes trickier. It is common to have groups of stories that are tied to an area under heavy business analysis and subject to complete change, and groups of stories that are fairly stable and not prone to change. But only picking stories that are stable and easy to complete is not the correct way to play stories. Which stories to play should depend on the complexity of the technical solution and the business value that can be recouped from that solution. The application is meaningless to the business if, out of the 180 stories, 90 are complete, 60 are still under intense business analysis and subject to change, and 30 are ready to go, while the business value generated by those 90 completed stories is trivial. Get down to the dirty aspects of the problem the application is trying to solve, and start flushing them out. Without a defined problem to solve, any application is way too expensive to build.

So should you go with diet size story or super-sized story? This question is like asking how diverse your investment portfolio should be. It depends. Now that you know why size does matter, apply your thinker's head to see whether it applies to the project you are on!

Thursday, May 25, 2006

Some unspoken tips on managing an Agile project

Want to have an easier time managing an Agile project, avoid common pitfalls, and have some fun along the way? Read on...

Food
Believe it or not, food is not a luxury item on an Agile project. Food is *essential*. Agile practices encourage as much open-air communication as quickly as possible, be it computer-to-human (through tests) or human-to-human (short iterative cycles). Yet it is sometimes hard for a Project Manager to force people to communicate more effectively. With good food/snacks around, people who are not co-located are going to come to the room much more often than usual, and a casual "how are you guys doing" while shoveling popcorn into his/her mouth is usually a conversation starter on many things the development team needs. And food, of course, improves morale immeasurably =)

Team co-location
This is key to being able to communicate more effectively, which is one of the tenets of a successful Agile project. By more effectively, I mean the project can execute in less time because people can make more informed decisions based on an abundant amount of open-air information. For whatever reason, people are generally good at filtering out useless information but not good at carrying information to the people who need it. By bringing the team together in a relatively open area with maximum face-to-face communication, you are encouraging team members to make more informed decisions before they waste time working on the wrong thing based on false assumptions about the continually ongoing changes within the development team and the business.

CI build sound
If you use CruiseControl.NET, try setting a sound clip that reminds everyone of an embarrassing moment of some development team member as the build success/failure sound in the cctray settings. This can get quite amusing sometimes. Another morale booster.

Quote sheet
During the course of development, someone is bound to inadvertently say something hilarious about someone or something. Keep a list of these quotes on a wiki on the build server where the team members have access to it. This is a must as part of the team-building process. At ThoughtWorks's annual national gathering day, we collect great and funny quotes from ThoughtWorkers on projects everywhere, print them on T-shirts, then auction them off. The money collected is donated. Funny and meaningful.

My funny quote of my current project? Nah. You have to ask me personally for that =P

Second story wall for issues after 1.0 release
An XP story wall, which represents the progress of stories planned for the current iteration, usually contains columns ("DEV Ready", "DEV In Progress", "QA Ready", "QA Complete") that mark the stages of a story from ready for development to QA complete. Now how does the wall change when your application goes to its 1.0 release?

Generally, after a 1.0 release, the team will create a new branch of the source code for urgent production defect fixes. The problem arises when you have a large team of developers split across urgent production defect fixes and release 1.1 development, or there are many small and frequent 1.x releases coming after your initial release, or for some uncommon reason you have more than one branch of source code (after you have determined it is ultimately unavoidable). Then managing the merging of code from your release branch to your active development branch becomes tricky, because if developers forget to merge, your active development branch still has the bugs that were already fixed on the release branch.

How about, at the 1.0 release, creating a new story/issue wall next to the original story wall that only contains the columns "Issue Verified", "Issue In Progress", "Issue Resolved", and having the developers merge their source code before they can move any issue card from the "Issue Resolved" column of that wall to the other wall's "QA Ready"? This physical movement of an issue card from one wall to another always reminds developers to merge their changes. If that doesn't help the busy and forgetful developers, have them move issue cards to a new column called "Issue Merged".

Stand-up token
A good stand-up meeting is not an avenue for identifying "solutions" to the problems participants face. It should be a quick meeting to identify individuals who need to hold face-to-face conversations after the stand-up, and then move on. But people being people, sometimes conversations get out of hand during a stand-up. Have a stand-up token, like a football, so that only the person who holds the token can talk. When you see people exchange the token too many times in a conversation, perhaps it's time to ask them to take it offline until after the stand-up.

Saturday, January 14, 2006

Agile story tips

Stories in an Agile project tend to be less talked about than Test Driven Development, Continuous Integration, and Food :-). But in fact, when they are done well they can profoundly affect the project team's productivity and the end users' satisfaction with the software. The following are a few findings from my experience with story management that will drive better software. Their importance is rated on a scale of 1 to 5 asterisks:

  • Stories should be a thin-thread from the frontend all the way to the backend. (*****)

    This helps the features being developed each iteration to be completed more consistently, and consistency drives predictability, and thus increases visibility and helps business to prioritize and plan given the available time and resources.

  • Business analysts (or the sponsor/end user) should own the stories. During IPM (Iteration Planning Meeting), they should be signed-off, meaning no changes should be made when developers start developing them. (*****)

    By freezing the requirements, developers can be much more productive. Compare two runners in a 100m race: one can go all the way from start to finish without stopping, while the other has to stop and go every 10m because he has to worry that the next 10m of track will be changed. Also, with someone owning the stories, after the stories are developed the owner of each story has the responsibility to verify the correctness of the solution. This requires better-written story specifications (tying back to freezing requirements), and in the end the developed story becomes exactly what the business wants.

  • Stories should be measured in terms of difficulties or story points, instead of ideal development days (IDD). (****)

    There are two camps of people when it comes to this bullet. One camp uses IDD to estimate the difficulty of a story, then uses Load Factor to measure how far off they were in their original estimates. The other camp uses Level of Difficulty such as small/medium/large, or Story Points such as 1-5, so they can measure purely based on yesterday's weather, and not on a number that someone was uncomfortably forced to make up.

    IMHO, the first camp's vocabulary was created at "post-mortem," literally. Consider the following conversation after a project failure:
    Business: Why did you guys fail to deliver? You only delivered half of what I want.
    Developers: Because we were not productive.
    Business: Why?
    Developers: Too many damn meetings.
    Business: Hm... so in "ideal time" if you had no meetings you would be able to deliver?
    Developers: Oh yea...
    Business: So in hindsight, the original estimates you guys gave me were off by a "load factor" of 1/2. Next time I want new software I'd better take that into account in my budget...

    The problem is that all these numbers are only meaningful within the context of that one project. But people being people, they like to carry these numbers with them to any projects they walk into wherever they go... because apparently they have been burned before. Next time when business asks for budget for a brand new project, guess what. They will bump up the budget by half, because of that "load factor."

    By using the terms of the second camp, one is implicitly forced to think of these measurements in the context of the current project. When I say this story's difficulty is a Large, one has to ask: Relative to what? Of course, the answer is relative to the other stories of this project. When I say, Story A is a 1 and Story B is a 3, again you are forced to think in the context of the current project.

    You might ask: if you don't bump up the budget by half, doesn't the business end up with less than everything it asked for? The answer is yes. But that's the beauty of short, iterative releases. Without bumping up the original budget, the business gets exactly what it asked for in order of its own priorities, perhaps 6 out of its 10 features. Since at least some of those features were delivered in early iterations, the business has not only saved money thanks to the features already rolled out, it is also in better shape to reposition itself to face more real-world challenges, and will thus pump in more budget to continue developing the software and get the rest of what it wants.

  • If stories are small enough, then there is no need to task them out during an IPM (Iteration Planning Meeting). If stories are to be tasked, the tasks should be estimated. As each task is completed, actuals should be measured.

    Admit it. Small estimates are much more accurate than big estimates. Therefore, if each story is tasked out into small chunks with time estimates, and after story completion we have the corresponding actual time spent, we can then find out how much work it really takes to complete, say, Story A, a medium-difficulty story or one that has 3 Story Points.

    So what are these estimates and actuals for? They are actual proof (or tracked history) of how we complete our Stories. Let's say in Iteration One we have a story "Public user login" that has 3 Story Points. In that iteration (two weeks), at the IPM the development team estimates that there are a total of 10 tasks to be done to complete that story, and they bust their ass to complete that and only that story. Then, in Iteration Seven, a similar story "Restricted user login" shows up. Relatively speaking, it also has 3 Story Points, since the two are about equally difficult. However, since most of the one-time tasks have already been completed, the actual number of tasks to complete this story in Iteration Seven might be just 3. Now the team can use the rest of the time to build other stories, and thereby achieve more Story Points. If it turns out the team achieved a total of 7 points in the end, then we say the team is kicking some ass and is more productive than it was in Iteration One. You will notice the total time spent on all tasks in the two iterations is about the same (assuming no change in resources), but completed Story Points increased. From the point of view of the business, it rocks, because they are seeing more stuff being churned out by the team, in exactly the manner they want it to be.

    This brings up another very important point, if you notice...

  • For story difficulty measurements, whatever you use (IDD, Small/Medium/Large, Story Points), you should always estimate it using the same scale as you estimate the entire story list. (*****)

    This is the only way to measure whether the team is improving over the course of the development.

    Using the example in the last bullet: in Iteration One the team decides that "User login story" is a 3 Story Points story, and in Iteration Seven the team completes a similar, also 3 Story Points story, "Restricted login story". If in Iteration Ten the business comes back and says they want a brand new but similarly difficult "CEO only login story", then even though, after doing those two login stories, this new story requires very little work to complete, we must again give this story 3 Story Points.

    This way, the measurements will tell the business the following:
    In Iteration One, the development team completed 3 Story Points.
    In Iteration Seven, the development team completed 7 Story Points (because the tasks required to do the "Restricted login story" were reduced).
    In Iteration Ten, the development team completed 14 Story Points (because even fewer tasks were required to complete the new 3-Point "CEO only login" story).
    In terms of the business people, Story Points = functionalities = business value. They know what they are getting on a consistent basis.

    Should the number of Story Points drop somewhere, say in Iteration Eight, one has to figure out why. Here is where the task actuals come in handy. In Iteration Seven, 7 Story Points were completed and a total of, say, 100 actual hours were needed to complete all tasks. In Iteration Eight, only 6.5 Story Points were completed. But if we look at the actuals, only 90 hours were recorded across all tasks. Now we know the team delivered about the same number of Story Points per hour of task work in both iterations. The drop probably comes from fewer available hours, for example because of vacations or public holidays.
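    The comparison above can be checked with a quick calculation. This is only a sketch using the example figures from the text; the points_per_hour helper is a name invented here, and rounding to three decimals is an arbitrary choice.

```ruby
# Story Points delivered per recorded hour of task work.
def points_per_hour(points, hours)
  (points.to_f / hours).round(3)
end

iterations = {
  "Iteration Seven" => { points: 7,   hours: 100 },
  "Iteration Eight" => { points: 6.5, hours: 90 }
}

iterations.each do |name, data|
  rate = points_per_hour(data[:points], data[:hours])
  puts "#{name}: #{data[:points]} points in #{data[:hours]} hours, #{rate} points per hour"
end
```

    The two rates come out within a few percent of each other, which supports reading the lower point total as a matter of fewer hours rather than a less productive team.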

Thursday, April 28, 2005

Songs of the Extremos

My co-worker Shaun Jayaraj today showed me the Songs of the Extremos. This is funny stuff... Check this one out first:

Hey Dude (sing to the tune of - "Hey Jude" by the Beatles)

Hey dude
Your code smells bad
Go refactor and make it better
Remember
That tests are requirements
Then you can begin
To make it smell better

While we are on the funny side of Agile, here is another good one, The Agile Manifesto (Hip-Hop Remix).

Individuals and interactions over processes and tools

translates into:

Peeps and tradin' rhymes ova' fake moves and bling-bling

Do you dig it?