Writing code that's correct is the most basic requirement for a software engineer. While this might sound obvious and silly, it's easier said than done and actually quite hard to accomplish in real life.
Let me use one of my favorite interview questions as an example: Many popular spreadsheet products organize their columns from A to Z, followed by AA, AB, etc. Write a function to convert a column name to its corresponding column number. That is, A becomes 1, B becomes 2, C becomes 3, and so on.
Nearly everyone I've interviewed with this question quickly points out that the function is a base-26 conversion, and immediately starts to code it up. However, the percentage of people who can actually code the function correctly is alarmingly low. Don't believe me? Let's take a look at the following implementation, written in C++.
Can you identify any bugs in this code? If you can't, try harder. :)
There are several issues in this code:
const char* is C-style string. It can be dangerous to use C-style string in C++ code, because C-style string is assumed to be NULL terminated, but C++ string isn't necessarily so. Try using
- Pointers/references can be NULL, and this code will crash when a NULL pointer is passed in. Good luck debugging a server crash due to a NULL pointer at 3 a.m. while your users can't access your website.
- You might say, "but I will validate the input to make sure I don't call it with NULL pointers".
- Sorry, you can't in real life. Once your code is written, you won't know who's calling it and under what conditions.
- Does this code work if the input is lowercase string?
- What about non-alphabet characters in the input?
- What happens if the input is a Unicode string?
- What would happen if the input string has seven As in it? That's right, integer overflow - your code needs to deal with that too…
When it comes to code, it's important to remember Murphy's Law: anything that can go wrong, will go wrong
. That means it's imperative to write code that can handle any input, no matter what it is. If you can do that, you will avoid giving the caller of your code any unwanted surprises.
To achieve this level of coding "correctness", you need to have a good understanding of code testability and have extensive unit tests for your code.
Okay, now you've completed Level I, and let's say that your code is mostly correct. (Btw, there's no such thing as bug-free code, so "mostly correct" is already a very high bar.)
Imagine you push your code into production, and ta-da! Your website goes down. You trace through lots of server logs (while panicking and sweating), and finally narrow the problem down to a function call that was overwhelmed by the requests, couldn't finish, and timed out.
Welcome to the Efficiency Land!
What is "efficiency" you ask? "Is efficiency the same as Big-O notation?"
The answer is: efficiency and Big-O are close, but not the same.
Here's an example to illustrate what I'm talking about: what is the fastest sorting algorithm? You might say that quicksort is the fastest sorting algorithm and it's
Before we dig in, let's have a quick review. "Big-O notation" describes the maximum time an algorithm will take to complete a function. So if an algorithm is
, it means for whatever value
, the run time is
for some constant
is large enough, we know that
will be more efficient than
In reality, however, there are two conditions that need consideration when it comes to algorithm speed:
- The size
n is important. For example if we are sorting a 6-entry array, a
O(n^2) bubble sort is likely much faster than
O(n*log n) quicksort.
- The constant
k is also very important. In CS classes, we always ignore
k in algorithm complexity analysis, but in real life,
k is super critical. A
O(n^2) solution with
k=1 will be faster than a
O(n) solution with
k=1000 for any
n<1000. To put it a different way, an in-memory bubble sort might be faster than an on-disk mergesort.
So far we've only covered efficiency as it relates to time. But there are also other types of efficiencies to consider: space efficiency, power efficiency, algorithm convergence speed, user interaction efficiency, etc. I won't go into details in all these dimensions, but suffice it to say that code efficiency is very context-dependent, so the goal when writing any code is to be as efficient as possible under the given constraints
One last note about efficiency: overdoing it is bad too. This is known as "premature optimization" and should be avoided. Only optimize what you need to optimize. When unsure, use the simplest solution first, and then do performance testing to decide if you need to improve the code further.
I've only covered the first two parts. Found it interesting? Please continue to Code Quality Ladder (part 2)
next, when I'll address Readability and Extensibility.