Fix Test Code Smells
A test smell, or anti-pattern, is a poor code pattern that is written repeatedly and has documented possible improvements.
Improve your test code by learning these common anti-patterns
A poorly built automated test suite quickly becomes a maintenance burden.
Test code can suffer from the same issues as production code. For the most part, this page doesn't cover those. Instead, it focuses on major, frequently occurring, test-specific anti-patterns, with examples and suggested solutions.
For an arguably exhaustive list, see the Test Smells Catalog [1].
Anti-patterns are grouped into the four areas covered below: test name, test body, assertions, and setup/cleanup. Assertions are part of the test body, but they deserve a separate section.
1. Poor Test Name
Test names are extremely important. They are quick to write but hard to write well. Poor test names hinder understanding, troubleshooting, and refactoring.
Invest time and thought into informative yet succinct names.
Improve a test name when you encounter a poor one.
Useless Test prefix/suffix
Some testing frameworks require the test function (method, class, file) to include "test" in its name to be discoverable. If that is not mandatory, or the framework provides @Test annotations, remove "test": it carries no informational value.
Name that lies
Test name claims to do one thing, but actually does another
Test name claims to do one thing, but actually does multiple checks
Unit tests should almost always test one thing
Complex integration and E2E tests may include several checks along the way for fail-fast purposes, but their last, main assertion must match the test name.
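A minimal sketch of the mismatch, using hypothetical `make_user` and `deactivate` helpers:

```python
# Smell: the name promises one check, but the body verifies something else
def test_user_is_deactivated():
    user = deactivate(make_user())
    assert user.email is not None  # says nothing about deactivation

# Better: the main assertion matches the name
def test_user_is_inactive_after_deactivation():
    user = deactivate(make_user())
    assert user.active is False
```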
Ambiguous
When we cannot tell from the test name what exactly is being tested. This is subjective and open to discussion, but it's worth making an initial effort.
test_for_bug_xyz
If a bug is discovered in later stages of development (say, in QA or even production) and then fixed, a test should be written for it. But just like all other tests, its name must describe the behavior, and it must live in the relevant group of tests, not be named "test_for_bug_ticket_1234" and placed into a "bugs" folder.
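A sketch of the rename; the behavior-describing name is a hypothetical example:

```python
# Smell: the name references the ticket, not the behavior,
# and the test lives in a "bugs" folder
def test_for_bug_ticket_1234():
    ...

# Better: the name describes the fixed behavior, and the test
# sits next to the other discount tests (hypothetical example)
def test_discount_is_not_applied_twice_on_retry():
    ...
```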
Inconsistent
Names within a test suite follow inconsistent styles (e.g., mixing should_return_error, testReturnsError, and returns_error_when_invalid).
2. Poor Test Body
Prod logic duplication
A test provides input and checks output. Transforming input into output is the job of production code. Tests must not duplicate production logic; otherwise, the same logic exists in two places, and the test adds no value.
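A minimal sketch, assuming a hypothetical `price_with_vat` function and a 20% VAT rate:

```python
# Smell: the test re-implements the production formula,
# so a bug in the formula would pass in both places
def test_price_with_vat_duplicated_logic():
    price = 100.0
    assert price_with_vat(price) == price * 1.20  # duplicated production logic

# Better: compare against an independently precomputed value
def test_price_with_vat():
    assert price_with_vat(100.0) == 120.0
```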
Non-parameterized tests
Two or more separate tests have been written, but they could have been a single parameterized test.
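A sketch using pytest's built-in parameterization, assuming a hypothetical `sign_label` function:

```python
import pytest

# One parameterized test instead of three near-identical test functions
@pytest.mark.parametrize(
    ("value", "expected"),
    [
        (1, "positive"),
        (-1, "negative"),
        (0, "zero"),
    ],
)
def test_sign_label(value, expected):
    assert sign_label(value) == expected
```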
Over-parameterized tests
The opposite of non-parameterized tests. Logically different tests merged into one for convenience or conciseness, reducing clarity.
Signs that tests may be over-parameterized:
Inputs and outputs are both parameters (e.g., valid and invalid scenarios)
Test covers logically distinct scenarios in a single function
Difficulty in reasoning about what the test actually verifies (it verifies this AND that AND the other)
if-else branching in the test body (see the sketch below)
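A sketch of the smell and one possible split, assuming hypothetical `validate_email` and `ValidationError` names:

```python
import pytest

# Smell: valid and invalid inputs share one test, forcing if-else branching
@pytest.mark.parametrize(
    ("email", "is_valid"),
    [("a@b.com", True), ("not-an-email", False)],
)
def test_email(email, is_valid):
    if is_valid:
        assert validate_email(email) is True
    else:
        with pytest.raises(ValidationError):
            validate_email(email)

# Better: two focused tests, each verifying one scenario
@pytest.mark.parametrize("email", ["a@b.com", "team@example.org"])
def test_valid_email_is_accepted(email):
    assert validate_email(email) is True

@pytest.mark.parametrize("email", ["not-an-email", ""])
def test_invalid_email_is_rejected(email):
    with pytest.raises(ValidationError):
        validate_email(email)
```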
Complex / Large Test
If a test fails, we want to understand and troubleshoot it very quickly. That is why test code should be very simple, mostly non-nested.
Loops, branching, and similar constructs generally make tests unnecessarily complex.
Possible solutions: improve variable names, break up a large test into several smaller tests, and parameterize where appropriate.
3. Assertions
Commented out or no assertions
No assertions: the test "passes" without verifying anything. Even worse, it gives false confidence.
Commented-out assertions: suppose a test fails, and the assertion is "temporarily" commented out or removed. The test now runs and shows green. This may be OK for troubleshooting on a local machine.
NEVER COMMIT SUCH CODE INTO THE SHARED REPOSITORY
Alternative actions:
Disable (skip) the test and add a link to a ticket created to resolve the issue (see the pytest sketch below)
If a failed test doesn't block the build (e.g., separately run E2E tests), keep the test failing so the problem stays visible until it is resolved. Ultimately, it may be OK to simply delete the test.
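With pytest, for example, a test can be skipped while staying visible in every report; the ticket ID here is a hypothetical placeholder:

```python
import pytest

# The skip reason keeps the problem visible in the test report
@pytest.mark.skip(reason="Fails after pricing change; see TICKET-1234")
def test_discount_applies_once():
    ...
```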
Too Many Assertions
With most frameworks, a test stops at the first failed assertion: if the first one fails, the rest will not run, and potentially important information will be missed.
Possible solution:
If assertions are logically similar - parameterize the test
If assertions test different scenarios - split into several tests (see the sketch below)
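A sketch of the split, with hypothetical `create_user` and `send_welcome_email` helpers:

```python
# Smell: two behaviors in one test; if the first assertion fails,
# we learn nothing about the second behavior
def test_user():
    user = create_user("alice")
    assert user.name == "alice"
    assert send_welcome_email(user) is True

# Better: one behavior per test
def test_created_user_has_given_name():
    assert create_user("alice").name == "alice"

def test_welcome_email_is_sent_to_new_user():
    assert send_welcome_email(create_user("alice")) is True
```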
Wrong assertion used
There is more than one way to assert the same thing. The problem shows up when the test fails: an assertion mismatched to the situation diverts you from the actual problem. Ask yourself: is the error message as meaningful as it could be?
Example 1: Asserting a boolean when comparing two values is possible
Example 2: Using boolean assertions on a list, when a more specific check is possible
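Sketches of both examples in pytest, with hypothetical `cart_total` and `load_roles` helpers; the gain is in the failure message:

```python
# Example 1: comparing two values
def test_total_boolean_style():
    total = cart_total([10, 20])
    assert (total == 30) is True  # on failure: "assert False is True" - values are lost

def test_total_equality_style():
    total = cart_total([10, 20])
    assert total == 30            # on failure pytest shows both values, e.g. "assert 25 == 30"

# Example 2: checking a list
def test_roles_boolean_style():
    roles = load_roles("alice")
    assert any(r == "admin" for r in roles)  # on failure: just "assert False"

def test_roles_specific_style():
    roles = load_roles("alice")
    assert "admin" in roles       # on failure pytest prints the full list contents
```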
4. Setup / Cleanup (Teardown)
Interdependent tests
In rare, very complex scenarios, it may be justifiable to construct a chain of interdependent tests.
But far more often, tests should be independent of each other: you should be able to run them in any order, and they must produce the same result.
Example:
Test 1 creates {Object} and asserts creation was successful. Leaves it for Test 2.
Test 2 updates {Object} and asserts update was successful.
Test 2 depends on Test 1. If the first one fails, it causes a cascade of failures, usually false positives: the "update" functionality works fine, but it is flagged as "failed".
Refactored example:
Test 1 creates {Object}, asserts something, and deletes the {Object} as part of cleanup (teardown).
Test 2 creates an independent {Object} as part of setup, asserts something, and again deletes the {Object} as part of cleanup (teardown).
The refactored example generally runs slower (creating and deleting independent test data takes time), but stability and deterministic behavior are worth far more than the time lost to troubleshooting (and rerunning) large cascading failures.
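A sketch of the refactored shape in pytest, assuming a hypothetical `client` module for the system under test; the fixture gives each test its own object and guarantees cleanup even when the test fails:

```python
import pytest
import client  # hypothetical API client for the system under test

@pytest.fixture
def obj():
    created = client.create_object(name="sample")  # setup: independent test data
    yield created
    client.delete_object(created.id)               # teardown: runs even on failure

def test_object_creation():
    created = client.create_object(name="sample")
    assert created.id is not None
    client.delete_object(created.id)

def test_object_update(obj):
    updated = client.update_object(obj.id, name="renamed")
    assert updated.name == "renamed"
```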
Non-extracted Setup/Cleanup
A general rule of thumb says that tests should be "self-contained", including their setup and teardown. Such tests are generally easier to understand and troubleshoot.
But dozens or even hundreds of tests may share the same setup/teardown. This then leads to large code duplications and an increasing maintenance cost.
In such cases, applying DRY (Don't Repeat Yourself) and refactoring setup/teardown into reusable components is beneficial.
After all, most mature test runners already provide fixtures to centralize setup and teardown code.
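In pytest, for instance, shared fixtures can live in conftest.py; `connect_test_db` is a hypothetical setup helper:

```python
# conftest.py - fixtures defined here are available to every test
# in the directory, so shared setup/teardown is written once
import pytest

@pytest.fixture
def db_session():
    session = connect_test_db()  # hypothetical: open a connection to a test DB
    yield session
    session.rollback()           # teardown shared by all tests using this fixture
    session.close()
```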
Other
Life is complex and messy. So is software development, including test automation. A golden rule in one context can be inadequate in another.
This is why giving prescriptive advice can be hard and, at times, misleading.
Nevertheless, here are some other high-level occurrences in test automation that are likely to be considered anti-patterns.
Tests that depend on an environment/configuration they do not set up themselves.
"It works on my machine because I manually configured something, but it won't work on your machine."
"You must wait for hour X to launch these tests for them to work."
Hidden assertions and/or test data
"The test failed, I'm trying to look at the data and the assertion, but they're hidden deep in the test framework behind multiple layers of abstraction."
References
[1] Open Catalog of Test Smells: https://test-smell-catalog.readthedocs.io/en/latest/
The downsides of this catalog:
Overwhelming: it is very large.
Hard to remember: test smells are often given custom names by their authors (e.g., Parasite, Local Hero), which is fun but hard to remember.
Contains many duplicates and overlaps: a non-coordinated effort results in a list where items overlap partially or fully. E.g., the catalog contains "The Flickering Test", "Flaky test", "Flaky locator", "Having Flaky or Slow Tests", "Erratic Test", "Nondeterministic Test".
Few code examples.
Contains general anti-patterns that also apply to production code (subjective): messy naming, unused code, useless comments, basic code duplication, poor abstractions, inheritance used instead of composition, etc. All of these issues are relevant to both production and test code, but their inclusion makes it hard to discern the test-specific smells for those who seek just them.