Code smells for beginners

Author

Clayton Cafiero

Published

2025-06-24

Code smells? Seriously? What on earth is a “code smell”?

Have you ever opened some mystery container in your refrigerator and gotten a whiff of something nasty? That’s a fridge smell: an indicator that something isn’t right.

A code smell is an indicator that there’s something wrong with your code. A code smell isn’t the same as a bug. Your code might still run, even if it’s a little smelly. But a code smell is often a sign that code is brittle or in need of refactoring. Accordingly, you should be aware of common code smells so when you catch a whiff, you’ll take a second look at your code to see what might be improved.

Some code smells

Duplicated code
Long function
Magic numbers
Shotgun surgery

There are many other well-known code smells but these will suffice to get you started.

Duplicated code

Whenever you have identical or near identical code, that’s often a sign that the duplicated code could be turned into a function and called as needed. Here’s an example:

player_a = []
for _ in range(5):
    # simulate rolls of five dice
    roll = random.choice([1, 2, 3, 4, 5, 6])
    player_a.append(roll)

player_b = []
for _ in range(5):
    # simulate rolls of five dice
    roll = random.choice([1, 2, 3, 4, 5, 6])
    player_b.append(roll)

Here we have a serious code smell, but there’s an easy fix. All we need to do is extract the duplicated code and turn it into a function.

def get_rolls():
    # simulate rolls of five dice
    lst = []
    for _ in range(5):
        roll = random.choice([1, 2, 3, 4, 5, 6])
        lst.append(roll)
    return lst
        
player_a = get_rolls()
player_b = get_rolls()

Much better, but now, with this code duplication eliminated and a new function created, we can see how easy it would be to make the function more flexible, to accommodate dice with a number of faces other than six and to accommodate a different number of rolls. Where before we’d have to make changes in two places, now we can do it all in one place!

def get_rolls(faces, rolls):
    """
    Simulate rolls of a number of dice 
    with variable number of faces
    """
    lst = []
    for _ in range(rolls):
        roll = random.choice(range(1, faces + 1))
        lst.append(roll)
    return lst
        
player_a = get_rolls(6, 5)
player_b = get_rolls(6, 5)

Better still. Now, if we want three rolls of dodecahedral (12-face) dice we need only change the last two lines.

player_a = get_rolls(12, 3)
player_b = get_rolls(12, 3)

There’s actually an acronym for a sound principle of software design—DRY: don’t repeat yourself.

Long function

This one’s not quite so clear cut. What exactly is too long? This will vary from case to case, but a good general rule is that if a function is longer than, say, 25 lines then it’s likely ripe for refactoring.

Why is this so? It’s often the case that long functions are long because they’re doing too much, or are performing a computation which would best be implemented in a number of steps. Good functions have clear and limited responsibilities—a function should do one thing and do it well.

Here’s one example of a long function.

import math

def compute_entropy():
    while True:
        positive = input("Enter the number of positive samples: ")
        try:
            positive = int(positive)
            if positive >= 0:
                break
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {positive} to an integer")
    while True:
        negative = input("Enter the number of negative samples: ")
        try:
            negative = int(negative)
            if negative >= 0:
                break
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {negative} to an integer")
    total = positive + negative
    if total == 0:
        print("Cannot compute entropy with zero samples!")
        return None
    term_1 = - (positive / total) * math.log2(positive / total)
    term_2 = - (negative / total) * math.log2(negative / total)
    print(f"Total entropy is {term_1 + term_2:.3f}.")
    return term_1 + term_2

compute_entropy()

This code works in most instances. There is a bug, but it’s hard to see because this function is too large. This function is trying to do too much. Yes, there’s some duplicated code, but this function is unfocused as well. Let’s refactor.

First, we’ll extract another function to get valid input from the user.

import math

def get_num(label):
    """
    This function is responsible for getting a good value
    from the user.
    """
    while True:
        n = input(f"Enter the number of {label} samples: ")
        try:
            n = int(n)
            if n >= 0:
                return n
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {n} to an integer")

def compute_entropy():
    positive = get_num('positive')
    negative = get_num('negative')
    total = positive + negative
    if total == 0:
        print("Cannot compute entropy with zero samples!")
        return None
    term_1 = - (positive / total) * math.log2(positive / total)
    term_2 = - (negative / total) * math.log2(negative / total)
    print(f"Total entropy is {term_1 + term_2:.3f}.")
    return term_1 + term_2

compute_entropy()

That’s a step in the right direction.

Let’s continue. compute_entropy() is still trying to do too much. It computes a value and prints the result and returns the result. Computation and display are two very different things, and they should be kept separate. Separation of concerns leads to good program design.

Let’s separate concerns: computation and display.

import math

def get_num(label):
    """
    This function is responsible for getting a good value
    from the user.
    """
    while True:
        n = input(f"Enter the number of {label} samples: ")
        try:
            n = int(n)
            if n >= 0:
                return n
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {n} to an integer")

def compute_entropy(pos, neg):
    """
    Compute entropy, given number of positive and negative
    samples.
    """
    total = pos + neg
    if total == 0:
        raise ValueError("Cannot compute entropy with zero samples!")
    p_term = - (pos / total) * math.log2(pos / total)
    n_term = - (neg / total) * math.log2(neg / total)
    return p_term + n_term 

positive_samples = get_num('positive')
negative_samples = get_num('negative')
entropy = compute_entropy(positive_samples, negative_samples)
print(f"Total entropy is {entropy:.3f}.")

Much better. Now we have get_num() which is called twice to get the necessary values from the user, and we have a pure function for computing entropy. Pure functions are easy to test—so let’s do just that (and we’ll find our bug).

import math

def get_num(label):
    """
    This function is responsible for getting a good value
    from the user.
    """
    while True:
        n = input(f"Enter the number of {label} samples: ")
        try:
            n = int(n)
            if n >= 0:
                return n
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {n} to an integer")

def compute_entropy(pos, neg):
    """
    Compute entropy, given number of positive and negative
    samples.
    """
    total = pos + neg
    if total == 0:
        raise ValueError("Cannot compute entropy with zero samples!")
    p_term = - (pos / total) * math.log2(pos / total)
    n_term = - (neg / total) * math.log2(neg / total)
    return abs(p_term + n_term)

def test_compute_entropy():
    assert math.isclose(compute_entropy(4, 5), 0.991076060)
    assert math.isclose(compute_entropy(100, 100), 1.0)
    assert math.isclose(compute_entropy(0, 10), 0.0)


test_compute_entropy()   # will be silent if all OK

positive_samples = get_num('positive')
negative_samples = get_num('negative')
entropy = compute_entropy(positive_samples, negative_samples)
print(f"Total entropy is {entropy:.3f}.")

If all is well, tests will pass silently. However, on account of our bug we get the following error.

Traceback (most recent call last):
  File "<string>", line 35, in <module>
  File "<string>", line 33, in test_compute_entropy
  File "<string>", line 26, in compute_entropy
ValueError: math domain error

On line 33, we have this test:

assert math.isclose(compute_entropy(0, 10), 0.0)

(The .isclose() function provided with the math module is used to compare numbers where we aren’t concerned with tiny representation or rounding errors. For example, 1000.0000000000000000 and 1000.0000000000000001 are close enough and in most cases we wouldn’t want a comparison to fail for a such tiny difference.)

The problem, now exposed, is that we can’t take math.log2(0). That’s what gives us the math domain error. Logarithms are only defined for positive values. So what can be done? We could do one of three things:

We can check the value before calling math.log2().
We can wrap the call to math.log2() and supply a default value in the event of an exception.
We can add a wee tiny epsilon to guarantee that we don’t pass zero to math.log2().

Let’s roll with the first option. Because this call to math.log2() is so intimately connected with the entropy calculation, let’s create a helper function inside compute_entropy() to implement the fix.

import math

def get_num(label):
    """
    This function is responsible for getting a good value
    from the user.
    """
    while True:
        n = input(f"Enter the number of {label} samples: ")
        try:
            n = int(n)
            if n >= 0:
                return n
            print("Value must be >= 0!")
        except ValueError:
            print(f"Cannot convert {n} to an integer")

def compute_entropy(pos, neg):
    """
    Compute entropy, given number of positive and negative
    samples.
    """
    def get_log(x):
        """Guarded call to math.log2() """
        if x == 0.0:
            return 0.0
        return math.log2(x)
        
    total = pos + neg
    if total == 0:
        raise ValueError("Cannot compute entropy with zero samples!")
    p_term = - (pos / total) * get_log(pos / total)
    n_term = - (neg / total) * get_log(neg / total)
    return abs(p_term + n_term)

def test_compute_entropy():
    assert math.isclose(compute_entropy(4, 5), 0.991076060)
    assert math.isclose(compute_entropy(100, 100), 1.0)
    assert math.isclose(compute_entropy(0, 10), 0.0)


test_compute_entropy()   # will be silent if all OK

positive_samples = get_num('positive')
negative_samples = get_num('negative')
entropy = compute_entropy(positive_samples, negative_samples)
print(f"Total entropy is {entropy:.3f}.")

Now we have nicely structured code that’s easy to read and easy to test—and as a result, it’s more robust.

Magic numbers

Magic numbers are numbers that… poof!… appear in your code seemingly from nowhere. In the previous example, 0 and 0.0 aren’t magic numbers because it’s so abundantly clear from context why we need to compare with zero. However, consider this function:

def kinetic_energy_per_df(temp):
    temp = temp + 273.15
    return (1 / 2) * 1.380649E-23 * temp

Here we have two magic numbers. The first is used to convert degrees Celsius to degrees Kelvin. The second is Boltzmann’s constant. Both are constants, and either could be used elsewhere. It’s best to define them as Python constants.

ZERO_CELSIUS_IN_KELVIN = 273.15
BOLTZMANN_CONSTANT = 1.380649E-23

def kinetic_energy_per_df(temp):
    temp = temp + ZERO_CELSIUS_IN_KELVIN
    return (1 / 2) * BOLTZMANN_CONSTANT * temp

This makes the code more readable. We can see what’s happening and why at each step. Also, when we write this way, it becomes clear that this function is trying to do two things at once. Since average kinetic energy per degree of freedom is always expressed with respect to degrees Kelvin, it’s best if we factor out the first conversion, and leave handling temperature units to some other portion of the code.

ABSOLUTE_ZERO_C = -273.15
BOLTZMANN_CONSTANT = 1.380649E-23

def celsius_to_kelvin(deg_c):
    if deg_c < ABSOLUTE_ZERO_C:
        raise ValueError(f"{deg_c} is below absolute zero!")
    return deg_c - ABSOLUTE_ZERO_C

def kinetic_energy_per_df(deg_k):
    return (1 / 2) * BOLTZMANN_CONSTANT * deg_k

Better.

Sometimes we see the same constant peppered throughout some code. Defining a constant and then referring to it by name makes code more readable and helps avoid defects introduced when typing. For example, every time we type 8.617333262E-5 (Boltzmann’s constant in electronvolts per kelvin), we could introduce a typo. Better to define it once and then use the named constant.

BOLTZMANN_CONSTANT_EV = 8.617333262E-5

Shotgun surgery

Shotgun surgery isn’t something we can always see immediately when looking at our code. It’s more something that we notice when we start to make changes to our code. If we want to change this little thing here, and that breaks something else over there, and in order to fix it we need to change code in three other places, that’s shotgun surgery—fixes scattered all over the place.

If you find yourself performing shotgun surgery, that’s a clear sign your code is in need of refactoring.

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p["name"].split()[0]

def format_contact(p):
    return f"{p['name']} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {p['name'].split()[0]}:"
    
person = {
    "name": "Mephista Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

When we run this code, everything works as expected.

Sending to mxgarply@uvm.edu: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista:

Now let’s say the requirement for structure of the person dictionary changes: instead of name as a single string, we want given name and surname instead.

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p["name"].split()[0]

def format_contact(p):
    return f"{p['name']} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {p['name'].split()[0]}:"

person = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

If we run this code, the call to send_email() works OK, but extract_given_name() fails with a KeyError. There is no key name anymore. So let’s fix that.

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p['given_name']

def format_contact(p):
    return f"{p['name']} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {p['name'].split()[0]}:"
    
person = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

Now we have another KeyError from format_contact()!

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p['given_name']

def format_contact(p):
    return f"{p['name']} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {p['name'].split()[0]}:"
    
person = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

Here’s a fix.

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p['given_name']

def extract_full_name(p):
    return f"{p['given_name']} {p['surname']}"
    
def format_contact(p):
    return f"{extract_full_name(p)} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {p['name'].split()[0]}:"
    
person = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

Again we have a KeyError. This time it’s from construct greeting. Here’s a fix:

def send_email(p, message):
    print(f"Sending to {p['email']}: {message}")
    # ...more here

def extract_given_name(p):
    return p['given_name']

def extract_full_name(p):
    return f"{p['given_name']} {p['surname']}"
    
def format_contact(p):
    return f"{extract_full_name(p)} <{p['email']}>"
    
def construct_greeting(p):
    return f"Dear {extract_given_name(p)}:"
    
person = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

# Test
send_email(person, "Hello there...")
print(extract_given_name(person))
print(format_contact(person))
print(construct_greeting(person))

If we run this code, all is well again:

Sending to mxgarply@uvm.edu: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista:

Many would say that the best approach would be to use object-oriented programming (OOP) and to construct a person class (new kind of object). For this, I might be inclined to use Python’s dataclass to facilitate.

from dataclasses import dataclass

@dataclass
class Person:
    given_name: str
    surname: str
    email: str

    @property
    def full_name(self):
        return f"{self.given_name} {self.surname}"
        
    @property
    def greeting(self):
        return f"Dear {self.given_name}:"
        
    @property    
    def contact(self):
        return f"{self.full_name} <{self.email}>"
        
def send_email(p, message):
    print(f"Sending to {p.contact}: {message}")
    
data = {
    "given_name": "Mephista",
    "surname": "Garply",
    "email": "mxgarply@uvm.edu"
}

person = Person(**data)

# Test
send_email(person, "Hello there...")
print(person.given_name)
print(person.contact)
print(person.greeting)

The lines starting with @ are decorators which “wrap” a function to give it special features or functionality. When we run this, everything works OK.

Sending to Mephista Garply <mxgarply@uvm.edu>: Hello there...
Mephista
Mephista Garply <mxgarply@uvm.edu>
Dear Mephista

Constructing classes like this is one way of encapsulating information and providing a uniform interface for getting information out. The class itself is responsible for its own data. For more on classes and Python’s dataclass, see:

Reuse

CC BY-NC-SA 4.0