Code smells for beginners
Code smells? Seriously? What on earth is a “code smell”?
Have you ever opened some mystery container in your refrigerator and gotten a whiff of something nasty? That’s a fridge smell: an indicator that something isn’t right.
A code smell is an indicator that there’s something wrong with your code. A code smell isn’t the same as a bug. Your code might still run, even if it’s a little smelly. But a code smell is often a sign that code is brittle or in need of refactoring. Accordingly, you should be aware of common code smells so when you catch a whiff, you’ll take a second look at your code to see what might be improved.
Some code smells
- Duplicated code
- Long function
- Magic numbers
- Shotgun surgery
There are many other well-known code smells but these will suffice to get you started.
Duplicated code
Whenever you have identical or near identical code, that’s often a sign that the duplicated code could be turned into a function and called as needed. Here’s an example:
= []
player_a for _ in range(5):
# simulate rolls of five dice
= random.choice([1, 2, 3, 4, 5, 6])
roll
player_a.append(roll)
= []
player_b for _ in range(5):
# simulate rolls of five dice
= random.choice([1, 2, 3, 4, 5, 6])
roll player_b.append(roll)
Here we have a serious code smell, but there’s an easy fix. All we need to do is extract the duplicated code and turn it into a function.
def get_rolls():
# simulate rolls of five dice
= []
lst for _ in range(5):
= random.choice([1, 2, 3, 4, 5, 6])
roll
lst.append(roll)return lst
= get_rolls()
player_a = get_rolls() player_b
Much better, but now, with this code duplication eliminated and a new function created, we can see how easy it would be to make the function more flexible, to accommodate dice with a number of faces other than six and to accommodate a different number of rolls. Where before we’d have to make changes in two places, now we can do it all in one place!
def get_rolls(faces, rolls):
"""
Simulate rolls of a number of dice
with variable number of faces
"""
= []
lst for _ in range(rolls):
= random.choice(range(1, faces + 1))
roll
lst.append(roll)return lst
= get_rolls(6, 5)
player_a = get_rolls(6, 5) player_b
Better still. Now, if we want three rolls of dodecahedral (12-face) dice we need only change the last two lines.
= get_rolls(12, 3)
player_a = get_rolls(12, 3) player_b
There’s actually an acronym for a sound principle of software design—DRY: don’t repeat yourself.
Long function
This one’s not quite so clear cut. What exactly is too long? This will vary from case to case, but a good general rule is that if a function is longer than, say, 25 lines then it’s likely ripe for refactoring.
Why is this so? It’s often the case that long functions are long because they’re doing too much, or are performing a computation which would best be implemented in a number of steps. Good functions have clear and limited responsibilities—a function should do one thing and do it well.
Here’s one example of a long function.
import math
def compute_entropy():
while True:
= input("Enter the number of positive samples: ")
positive try:
= int(positive)
positive if positive >= 0:
break
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {positive} to an integer")
while True:
= input("Enter the number of negative samples: ")
negative try:
= int(negative)
negative if negative >= 0:
break
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {negative} to an integer")
= positive + negative
total if total == 0:
print("Cannot compute entropy with zero samples!")
return None
= - (positive / total) * math.log2(positive / total)
term_1 = - (negative / total) * math.log2(negative / total)
term_2 print(f"Total entropy is {term_1 + term_2:.3f}.")
return term_1 + term_2
compute_entropy()
This code works in most instances. There is a bug, but it’s hard to see because this function is too large. This function is trying to do too much. Yes, there’s some duplicated code, but this function is unfocused as well. Let’s refactor.
First, we’ll extract another function to get valid input from the user.
import math
def get_num(label):
"""
This function is responsible for getting a good value
from the user.
"""
while True:
= input(f"Enter the number of {label} samples: ")
n try:
= int(n)
n if n >= 0:
return n
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {n} to an integer")
def compute_entropy():
= get_num('positive')
positive = get_num('negative')
negative = positive + negative
total if total == 0:
print("Cannot compute entropy with zero samples!")
return None
= - (positive / total) * math.log2(positive / total)
term_1 = - (negative / total) * math.log2(negative / total)
term_2 print(f"Total entropy is {term_1 + term_2:.3f}.")
return term_1 + term_2
compute_entropy()
That’s a step in the right direction.
Let’s continue. compute_entropy()
is still trying to do too much. It computes a value and prints the result and returns the result. Computation and display are two very different things, and they should be kept separate. Separation of concerns leads to good program design.
Let’s separate concerns: computation and display.
import math
def get_num(label):
"""
This function is responsible for getting a good value
from the user.
"""
while True:
= input(f"Enter the number of {label} samples: ")
n try:
= int(n)
n if n >= 0:
return n
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {n} to an integer")
def compute_entropy(pos, neg):
"""
Compute entropy, given number of positive and negative
samples.
"""
= pos + neg
total if total == 0:
raise ValueError("Cannot compute entropy with zero samples!")
= - (pos / total) * math.log2(pos / total)
p_term = - (neg / total) * math.log2(neg / total)
n_term return p_term + n_term
= get_num('positive')
positive_samples = get_num('negative')
negative_samples = compute_entropy(positive_samples, negative_samples)
entropy print(f"Total entropy is {entropy:.3f}.")
Much better. Now we have get_num()
which is called twice to get the necessary values from the user, and we have a pure function for computing entropy. Pure functions are easy to test—so let’s do just that (and we’ll find our bug).
import math
def get_num(label):
"""
This function is responsible for getting a good value
from the user.
"""
while True:
= input(f"Enter the number of {label} samples: ")
n try:
= int(n)
n if n >= 0:
return n
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {n} to an integer")
def compute_entropy(pos, neg):
"""
Compute entropy, given number of positive and negative
samples.
"""
= pos + neg
total if total == 0:
raise ValueError("Cannot compute entropy with zero samples!")
= - (pos / total) * math.log2(pos / total)
p_term = - (neg / total) * math.log2(neg / total)
n_term return abs(p_term + n_term)
def test_compute_entropy():
assert math.isclose(compute_entropy(4, 5), 0.991076060)
assert math.isclose(compute_entropy(100, 100), 1.0)
assert math.isclose(compute_entropy(0, 10), 0.0)
# will be silent if all OK
test_compute_entropy()
= get_num('positive')
positive_samples = get_num('negative')
negative_samples = compute_entropy(positive_samples, negative_samples)
entropy print(f"Total entropy is {entropy:.3f}.")
If all is well, tests will pass silently. However, on account of our bug we get the following error.
Traceback (most recent call last):
File "<string>", line 35, in <module>
File "<string>", line 33, in test_compute_entropy
File "<string>", line 26, in compute_entropy
ValueError: math domain error
On line 33, we have this test:
assert math.isclose(compute_entropy(0, 10), 0.0)
(The .isclose()
function provided with the math
module is used to compare numbers where we aren’t concerned with tiny representation or rounding errors. For example, 1000.0000000000000000 and 1000.0000000000000001 are close enough and in most cases we wouldn’t want a comparison to fail for a such tiny difference.)
The problem, now exposed, is that we can’t take math.log2(0)
. That’s what gives us the math domain error. Logarithms are only defined for positive values. So what can be done? We could do one of three things:
- We can check the value before calling
math.log2()
. - We can wrap the call to
math.log2()
and supply a default value in the event of an exception. - We can add a wee tiny epsilon to guarantee that we don’t pass zero to
math.log2()
.
Let’s roll with the first option. Because this call to math.log2()
is so intimately connected with the entropy calculation, let’s create a helper function inside compute_entropy()
to implement the fix.
import math
def get_num(label):
"""
This function is responsible for getting a good value
from the user.
"""
while True:
= input(f"Enter the number of {label} samples: ")
n try:
= int(n)
n if n >= 0:
return n
print("Value must be >= 0!")
except ValueError:
print(f"Cannot convert {n} to an integer")
def compute_entropy(pos, neg):
"""
Compute entropy, given number of positive and negative
samples.
"""
def get_log(x):
"""Guarded call to math.log2() """
if x == 0.0:
return 0.0
return math.log2(x)
= pos + neg
total if total == 0:
raise ValueError("Cannot compute entropy with zero samples!")
= - (pos / total) * get_log(pos / total)
p_term = - (neg / total) * get_log(neg / total)
n_term return abs(p_term + n_term)
def test_compute_entropy():
assert math.isclose(compute_entropy(4, 5), 0.991076060)
assert math.isclose(compute_entropy(100, 100), 1.0)
assert math.isclose(compute_entropy(0, 10), 0.0)
# will be silent if all OK
test_compute_entropy()
= get_num('positive')
positive_samples = get_num('negative')
negative_samples = compute_entropy(positive_samples, negative_samples)
entropy print(f"Total entropy is {entropy:.3f}.")
Now we have nicely structured code that’s easy to read and easy to test—and as a result, it’s more robust.
Magic numbers
Magic numbers are numbers that… poof!… appear in your code seemingly from nowhere. In the previous example, 0 and 0.0 aren’t magic numbers because it’s so abundantly clear from context why we need to compare with zero. However, consider this function:
def kinetic_energy_per_df(temp):
= temp + 273.15
temp return (1 / 2) * 1.380649E-23 * temp
Here we have two magic numbers. The first is used to convert degrees Celsius to degrees Kelvin. The second is Boltzmann’s constant. Both are constants, and either could be used elsewhere. It’s best to define them as Python constants.
= 273.15
ZERO_CELSIUS_IN_KELVIN = 1.380649E-23
BOLTZMANN_CONSTANT
def kinetic_energy_per_df(temp):
= temp + ZERO_CELSIUS_IN_KELVIN
temp return (1 / 2) * BOLTZMANN_CONSTANT * temp
This makes the code more readable. We can see what’s happening and why at each step. Also, when we write this way, it becomes clear that this function is trying to do two things at once. Since average kinetic energy per degree of freedom is always expressed with respect to degrees Kelvin, it’s best if we factor out the first conversion, and leave handling temperature units to some other portion of the code.
= -273.15
ABSOLUTE_ZERO_C = 1.380649E-23
BOLTZMANN_CONSTANT
def celsius_to_kelvin(deg_c):
if deg_c < ABSOLUTE_ZERO_C:
raise ValueError(f"{deg_c} is below absolute zero!")
return deg_c - ABSOLUTE_ZERO_C
def kinetic_energy_per_df(deg_k):
return (1 / 2) * BOLTZMANN_CONSTANT * deg_k
Better.
Sometimes we see the same constant peppered throughout some code. Defining a constant and then referring to it by name makes code more readable and helps avoid defects introduced when typing. For example, every time we type 8.617333262E-5
(Boltzmann’s constant in electronvolts per kelvin), we could introduce a typo. Better to define it once and then use the named constant.
= 8.617333262E-5 BOLTZMANN_CONSTANT_EV
Shotgun surgery
Shotgun surgery isn’t something we can always see immediately when looking at our code. It’s more something that we notice when we start to make changes to our code. If we want to change this little thing here, and that breaks something else over there, and in order to fix it we need to change code in three other places, that’s shotgun surgery—fixes scattered all over the place.
If you find yourself performing shotgun surgery, that’s a clear sign your code is in need of refactoring.
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p["name"].split()[0]
def format_contact(p):
return f"{p['name']} <{p['email']}>"
def construct_greeting(p):
return f"Dear {p['name'].split()[0]}:"
= {
person "name": "Mephista Garply",
"email": "mxgarply@uvm.edu"
}
# Test
"Hello there...")
send_email(person, print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
When we run this code, everything works as expected.
Sending to mxgarply@uvm.edu: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista:
Now let’s say the requirement for structure of the person
dictionary changes: instead of name as a single string, we want given name and surname instead.
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p["name"].split()[0]
def format_contact(p):
return f"{p['name']} <{p['email']}>"
def construct_greeting(p):
return f"Dear {p['name'].split()[0]}:"
person = {
"given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
If we run this code, the call to send_email()
works OK, but extract_given_name()
fails with a KeyError
. There is no key name
anymore. So let’s fix that.
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p['given_name']
def format_contact(p):
return f"{p['name']} <{p['email']}>"
def construct_greeting(p):
return f"Dear {p['name'].split()[0]}:"
= {
person "given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
# Test
"Hello there...")
send_email(person, print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
Now we have another KeyError
from format_contact()
!
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p['given_name']
def format_contact(p):
return f"{p['name']} <{p['email']}>"
def construct_greeting(p):
return f"Dear {p['name'].split()[0]}:"
= {
person "given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
# Test
"Hello there...")
send_email(person, print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
Here’s a fix.
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p['given_name']
def extract_full_name(p):
return f"{p['given_name']} {p['surname']}"
def format_contact(p):
return f"{extract_full_name(p)} <{p['email']}>"
def construct_greeting(p):
return f"Dear {p['name'].split()[0]}:"
= {
person "given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
# Test
"Hello there...")
send_email(person, print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
Again we have a KeyError
. This time it’s from construct greeting. Here’s a fix:
def send_email(p, message):
print(f"Sending to {p['email']}: {message}")
# ...more here
def extract_given_name(p):
return p['given_name']
def extract_full_name(p):
return f"{p['given_name']} {p['surname']}"
def format_contact(p):
return f"{extract_full_name(p)} <{p['email']}>"
def construct_greeting(p):
return f"Dear {extract_given_name(p)}:"
= {
person "given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
# Test
"Hello there...")
send_email(person, print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))
If we run this code, all is well again:
Sending to mxgarply@uvm.edu: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista:
Many would say that the best approach would be to use object-oriented programming (OOP) and to construct a person class (new kind of object). For this, I might be inclined to use Python’s dataclass
to facilitate.
from dataclasses import dataclass
@dataclass
class Person:
str
given_name: str
surname: str
email:
@property
def full_name(self):
return f"{self.given_name} {self.surname}"
@property
def greeting(self):
return f"Dear {self.given_name}:"
@property
def contact(self):
return f"{self.full_name} <{self.email}>"
def send_email(p, message):
print(f"Sending to {p.contact}: {message}")
= {
data "given_name": "Mephista",
"surname": "Garply",
"email": "mxgarply@uvm.edu"
}
= Person(**data)
person
# Test
"Hello there...")
send_email(person, print(person.given_name)
print(person.contact)
print(person.greeting)
The lines starting with @
are decorators which “wrap” a function to give it special features or functionality. When we run this, everything works OK.
Sending to Mephista Garply <mxgarply@uvm.edu>: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista
Constructing classes like this is one way of encapsulating information and providing a uniform interface for getting information out. The class itself is responsible for its own data. For more on classes and Python’s dataclass
, see: