I teach a lot of beginner-targeted programming courses, and something that I've experimented with recently is trying to introduce the idea of self-documenting code early on in the learning process. I usually start off by talking about the difference between good and bad names for things (mostly functions and variables, though many of the same arguments apply to class names), and I've noticed a few common patterns that tend to crop up in beginners' code. I thought it might be useful to lay out these common errors in one place.

Single-letter names

OK, we're writing a program and we need to create a new variable, but we can't think of a good name... let's just start with `a` and work our way through the alphabet. Later on we find that we have a bit of code like this:

a = 'acgatagc'
b = len(a) - 2
d = ""
for e in range(0,b,3):
    g = a[e:e+3]
    h = i.get(g.upper(), 'X')
    d = d + h

Which is well on its way to becoming completely incomprehensible. Sometimes single-letter names are the result of a desire to avoid typing or a worry about running out of space in your text editor. Neither of these is a good enough reason to write code as unreadable as the example above! Rest assured that using longer, more descriptive names will not make your program slow, or make any appreciable difference to the time it takes to type a line. Here's a sane version:

dna = 'acgatagc'
last_codon_start = len(dna) - 2
protein = ""
for codon_start in range(0,last_codon_start,3):
    codon = dna[codon_start:codon_start+3]
    amino_acid = genetic_code.get(codon.upper(), 'X')
    protein = protein + amino_acid

which might now be interpretable as part of a program that translates DNA sequences.
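To actually run the readable version we need a genetic_code dictionary, which the snippet above leaves undefined. Here's a minimal sketch, assuming a deliberately tiny lookup table (a real one would cover all 64 codons) and a hypothetical translate function wrapped around the loop:

```python
# Hypothetical, deliberately incomplete codon table for illustration only;
# a real genetic code table has 64 entries.
genetic_code = {
    'ACG': 'T',  # threonine
    'ATA': 'I',  # isoleucine
    'ATG': 'M',  # methionine
}

def translate(dna):
    """Translate a DNA string codon by codon, using 'X' for unknown codons."""
    last_codon_start = len(dna) - 2
    protein = ""
    for codon_start in range(0, last_codon_start, 3):
        codon = dna[codon_start:codon_start + 3]
        amino_acid = genetic_code.get(codon.upper(), 'X')
        protein = protein + amino_acid
    return protein

# The trailing 'gc' is too short to form a codon, so it is ignored.
print(translate('acgatagc'))
```

Notice that wrapping the loop in a function didn't require renaming anything: the descriptive names already say what each step is for.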

Sometimes we might want to use non-meaningful variable names to illustrate a very generic bit of code: for example, if we want to demonstrate how to append a value to a list:

a = []
a.append(b)

but for these purposes it's better to use the well-known set of metasyntactic variables:

foo = []
foo.append(bar)

There are a couple of situations where single-letter variables do make sense; mostly where there are strong conventions for their use. For example, if we're writing a program to deal with Cartesian co-ordinates then I won't be too upset to see variables called x and y (though I might make a case for x_pos and y_pos). Similarly, i and j are the traditional names for variables used as counters in a loop:

for i in range(10):
    for j in range(20):
        pass  # do something

but remember that the most common use for these variables - to hold an index when iterating over a list - doesn't often occur in Python because we generally iterate over the elements of lists directly, in which case there's no excuse for not picking a meaningful variable name.
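To make the contrast concrete, here's a small sketch comparing the two styles. The list species_names and its contents are made up for illustration:

```python
species_names = ['Homo sapiens', 'Pan troglodytes', 'Mus musculus']

# Index-based loop: i is only there to look up the element.
for i in range(len(species_names)):
    print(species_names[i])

# Direct iteration: the loop variable describes what it holds.
for species_name in species_names:
    print(species_name)

# If the position is needed as well, enumerate supplies both values.
numbered_names = []
for position, species_name in enumerate(species_names, start=1):
    numbered_names.append(str(position) + ': ' + species_name)
print(numbered_names)
```

Even in the last loop, where we do need a counter, calling it position rather than i costs nothing and tells the reader what it's for.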

Naming things after their type

This is a habit that people are most likely to fall into shortly after having learned of the existence of types, or shortly after having learned about a new type. The logic goes something like this: I've just been told that it's important to remember whether a variable is a string or a number, so I'll make that fact part of the name. This is not necessarily a terrible idea - in fact there is an entire system of variable naming (Hungarian notation) based on it. Where it becomes a problem is when the type becomes the most important part of the name:

my_number = 20
my_string = "Homo sapiens"
the_list = [1,2,3]
a_file = open('foo.txt')
def my_function(scores):
    ...

This is obviously problematic: it's generally much more important to know what values are stored in a variable:

minimum_name_length = 20
species_name = "Homo sapiens"
reading_frames = [1,2,3]
input_file = open('foo.txt')
def calculate_average(scores):
    ...

There's a more subtle problem with types-as-variable-names - the dynamic nature of Python means that it tends to work best when we worry about the various ways that a variable can be used, rather than its type. It's this magic that allows us, for example, to iterate over lists, strings and files using a single syntax.
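Here's a rough sketch of that single syntax in action. The function count_items is a made-up helper, and io.StringIO stands in for a real open file so that the example runs without needing anything on disk:

```python
import io

def count_items(items):
    """Count whatever we get by iterating; works for any iterable."""
    count = 0
    for item in items:
        count = count + 1
    return count

reading_frames = [1, 2, 3]
species_name = "Homo sapiens"
# io.StringIO behaves like an open text file for the purposes of iteration.
input_file = io.StringIO("first line\nsecond line\n")

print(count_items(reading_frames))  # counts list elements
print(count_items(species_name))    # counts characters
print(count_items(input_file))      # counts lines
```

The function never asks what type it has been given, which is exactly why a name like the_list would be misleading for its argument.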

Extremely vague names

Often when we create a variable, or start writing a function, we're not exactly sure what its job is going to be in our program. This is especially true when we first start writing code. Unfortunately, this can lead to some very unhelpful variable names - examples I have seen in the wild include data, input, output, do_stuff(), process_files(), params, object, and true_or_false. If you find yourself using a "placeholder" name like these during the process of coding, it's a good idea to go back and change it once you've figured out what the function or variable is actually doing.

Sequential names

This is a perennial problem in the world of naming, whether we are talking about Python variables or word processing files (how many people have a folder containing files with names like final_draft.doc, final_draft2.doc, final_draft2.1.doc, final_draft2_update.doc?) Thought of the perfect variable name, but then realized that you've already used it? No problem, just stick a "2" on the end of that bad boy:

word = 'albatross'
word2 = word.upper()
word3 = word2 + "!"

Hopefully it's not necessary to point out why this can be confusing when you come back to read the code. We can rescue the above example in a couple of ways. One is to use more descriptive names:

word = 'albatross'
uppercase_word = word.upper()
punctuated_word = uppercase_word + "!"

Another way is to recognize that we're probably not going to use word or uppercase_word in our program, and just do the whole thing in one step:

final_word = 'albatross'.upper() + "!"

When we find ourselves needing to create a new variable name by sticking a number onto the end of an existing one it's often a good indication that the code in question should be turned into a function. One of the great things about encapsulation using functions is that they provide a way for multiple variables with the same name to happily co-exist in a program without interfering with each other.
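As a small illustration of names co-existing, here's a sketch with two made-up functions, shout and whisper. Each has its own word and its own result, and neither touches the word defined outside:

```python
def shout(word):
    # This 'result' is local to shout.
    result = word.upper() + "!"
    return result

def whisper(word):
    # A separate 'result', local to whisper.
    result = word.lower() + "..."
    return result

word = 'albatross'
print(shout(word))
print(whisper(word))
print(word)  # the original value is unchanged
```

No word2 or word3 in sight: the function names now carry the information that the numbers were failing to.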

An even worse version of sequential names is....

Re-using names

Thought of the perfect variable name but you've already used it? Never mind, just overwrite it! Often this is a symptom of variable names that are too general to begin with:

# store a name
name = 'Martin'
...
# now we need to store a last name
name = 'Jones'

Re-using variables to store different things can make a program extremely difficult to understand. Solutions involve either using more descriptive names, which fixes the above example:

first_name = 'Martin'
...
last_name = 'Jones'

or splitting up the code into functions where the same variable name can be re-used without fear of confusion.

Don't confuse the idea of re-using variables for a different type of data, as in the firstname/lastname example above, with the idea of changing the value that's stored in a variable, but keeping the type the same. For example, we often want to update a count, or append a character to a string:

count = count + 1
letters = letters + 'q'

This is fine, and doesn't count as re-use, because the variables are still storing the same thing.

Names that only vary by case or punctuation

How many different ways can we write essentially the same variable name?

mp3filename = "song.mp3"
mp3FileName = "song.mp3"
MP3FileName = "song.mp3"
Mp3Filename = "song.mp3"
MP3FILENAME = "song.mp3"
mp3_file_name = "song.mp3"
mp3_FileName = "song.mp3"
MP3_Filename = "song.mp3"
Mp3_FileName = "song.mp3"
MP3_FILE_Name = "song.mp3"

Quite apart from the Python style guidelines (all-caps names should be used only for constants), using more than one of the above in a program will lead to madness... don't do it!
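The reason for the madness is that Python treats names as case-sensitive, so each spelling is a completely separate variable. A quick sketch (the file names are made up):

```python
mp3_file_name = "song.mp3"
# This looks like an update to the variable above, but it creates a new one.
mp3_File_Name = "other.mp3"

print(mp3_file_name)  # still holds the first value
print(mp3_File_Name)
```

Python won't complain about any of this, which is what makes the resulting bugs so hard to spot.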

Names copied from example code

This is a trap that's easy to fall into when working from examples in a course or a textbook. Imagine we are looking at a piece of example code that prints the number of each of the five vowels in a word:

word = 'perspicacious'
for vowel in ['a', 'e', 'i', 'o', 'u']:
    print("count for " + vowel + " is " + str(word.count(vowel)))

Later on, we want to implement the same idea for counting the number of times each common word occurs in a sentence, so we copy and paste the example code and modify it. We replace the word with a sentence, and replace the vowels with words:

word = 'I think it was "Blessed are the cheesemakers"'
for vowel in ['it', 'I', 'the', 'and']:
    print("count for " + vowel + " is " + str(word.count(vowel)))

The program works, but the variable names are now very misleading - word doesn't contain a word and vowel doesn't contain a vowel. Using a piece of example code as a starting point for your own programs is an excellent way to learn - but be sure to go back once you've finished modifying it and check that the variable names still make sense.
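For comparison, here's one way the modified code might look after a renaming pass. The names sentence and common_word are just my suggestions, and the behaviour is identical to the copied version:

```python
sentence = 'I think it was "Blessed are the cheesemakers"'
for common_word in ['it', 'I', 'the', 'and']:
    # As in the copied version, str.count counts substring occurrences.
    print("count for " + common_word + " is " + str(sentence.count(common_word)))
```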

Anything I've missed, or that you disagree with? Leave a comment!

