The What, Why, and How of Flaky Tests

Flaky tests are the bane of every developer and quality assurance engineer's existence. One day, they pass, the next they fail—without any changes to the code. They are unreliable and hinder trust in automation.

Arpita Goala, Content Marketing Manager

September 13th, 2024

A study by Mabl found that as much as 50% of test failures are caused by flakiness. So what exactly is a flaky test, what causes it, and how can you fix it? Our article has the answers to these questions and more. Read on to find out.

What Are Flaky Tests?

A flaky test is unpredictable and often produces inconsistent results. It both passes and fails in separate runs despite no new changes being introduced to the code or test.

While several things can cause flakiness, its non-deterministic nature can make it challenging for teams to pinpoint the exact cause of failure and reproduce the issue. However, in general, the most common drivers of flakiness are:

Timing and Synchronization: One of the most common causes of flakiness is not accounting for timing discrepancies between when test scripts are executed and the time needed for your application to respond.
– For example, say you’re testing your website and your test doesn’t allow for any load time. The test fails whenever the page doesn’t load immediately, because it ignores network conditions, such as a slow 4G or 5G connection, that can lengthen load times.
Concurrency and Race Conditions: When two tests run simultaneously using the same resources, they can compete for those resources and cause one another to fail unpredictably.
Environment Instability: If your testing environment vastly differs from production or live environments, where network speed, server response times, and third-party services can impact performance, your tests will become flaky and unreliable.
Dependencies on External Services: Tests that rely on external APIs or databases can fail if those services are slow, down, or behave unexpectedly. These failures are often mistaken for issues in the codebase. An excellent example of this is the web server that hosts your website. If that server is down while you’re testing your website, your test will fail even though your website isn’t to blame; a third-party service is.
Test Data Issues: Sometimes tests rely on specific information, like a username or password, and inconsistent data can cause failures. For example, if a user login requires captcha verification, any issues with the captcha can cause the test to fail. If test data isn’t properly isolated or if datasets are not reset between tests, this can create flakiness.

Why Are Flaky Tests Frustrating?

Flaky tests don’t just impact software quality. They also impact customer trust and team morale. Flaky tests can:

Erode Trust: When tests fail for no apparent reason, it can cast doubt on the efficacy of your entire testing process. Your developers and testing teams could question if all test results are accurate or simply a consequence of flakiness. As a result, you could miss real issues that go into production and impact your reputation. So not only do you lose your team’s trust, but you could also potentially lose your customers’ trust.
Waste Time and Resources: Because flaky tests are hard to diagnose and identify, your team could spend more time fixing flakiness than necessary. According to Capgemini’s World Quality Report, 50% of the time spent on automation is used to repair broken tests, many of which are flaky.
Delay Releases: Flaky tests often cause delays in CI/CD pipelines, making it harder to ensure reliable automated tests before pushing changes to production. Research suggests unreliable test automation can delay product release cycles by 20-25%.

Is It Really a Flaky Test?

Not all test failures are created equal; just because a test fails doesn’t mean it’s flaky. Sometimes, the failure is due to an actual bug in the software, or it could be your application itself.

Branding every failure as flaky without investigating and analyzing your test outcomes is detrimental to your software quality. So it’s important to distinguish between real failures and flaky tests so you don’t waste time chasing ghosts.

Before You Ring the Alarms

Okay, if not all tests are flaky, how do you determine which ones are? You can confirm flakiness by following the steps below:

Retesting: One way to confirm if a test is flaky is by running it multiple times. If it sometimes passes and sometimes fails, it’s probably flaky. If it fails every time, the software might have a real problem.
Test Across Devices: You’ll also want to confirm whether the issue persists across multiple device, platform, and OS combinations. Going as granular as software versions can also help identify whether the test is, in fact, flaky or whether the failure is a consequence of an OS update or device specifications.
Parallel Testing: You can also run the test alongside other tests to see if they’re interfering with each other. This helps identify situations where tests are “fighting” over shared resources, like files or data.
Test in Different Environments: Running the test in different setups—such as a local machine, staging, or production environment—can reveal if the failure is specific to one environment. If it works fine in other setups, it could be an issue with the test environment.
Analyze Tests and Results: Digging into the logs can help identify failure patterns. For example, if the test always fails at a particular point, this could indicate a possible timing or dependency issue.
Tools and Frameworks: Utilize test analysis tools that help flag and report flaky tests, such as the Flaky Test Handler plugin for Jenkins, Google Test’s repeat option (--gtest_repeat), or the flaky-test reporting built into CI pipeline tools. Some of these tools can even provide detailed insights on how often and under which conditions the test fails. You can also use tags and labels for known and common flaky issues to monitor their frequency over time.

If your test is inconsistent through all these steps, it’s safe to flag it as flaky and disable it until you identify and fix the root cause.
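The retesting step above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a real test runner: classify_test and its labels are hypothetical names, and the run count of 20 is an arbitrary assumption.

```python
from collections import Counter
from typing import Callable


def classify_test(test_fn: Callable[[], None], runs: int = 20) -> str:
    """Run test_fn repeatedly and label it by its pass/fail pattern."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            test_fn()
            outcomes["pass"] += 1
        except Exception:
            # Any raised exception (including AssertionError) counts as a failure.
            outcomes["fail"] += 1
    if outcomes["fail"] == 0:
        return "stable-pass"
    if outcomes["pass"] == 0:
        return "consistent-fail"  # fails every time: likely a real defect
    return "flaky"  # mixed results with no code change in between
```

A test that never passes is reported as a consistent failure rather than as flaky, which mirrors the distinction between real bugs and flakiness drawn above.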

Best Practices for Mitigating Flaky Tests

Mitigating flaky tests is essential for maintaining a reliable and efficient testing process. By following best practices, developers can reduce the likelihood of flaky tests and ensure smoother workflows. Here are four key strategies to consider:

Enhance Test Design

Improving test design is one of the most effective ways to prevent flaky tests. A good test should be deterministic, meaning it always produces the same result with the same input.

To achieve this, each test should have a clear setup and teardown process. Isolate tests so they don’t depend on each other.

Use proper mocking for external services, databases, or APIs. This ensures that tests don’t rely on unstable systems, making them more predictable.

Additionally, writing independent tests that do not share state between runs is crucial for consistency.
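As a rough illustration of mocking an external service, the sketch below uses Python’s standard unittest.mock module. The get_display_price function and the rate_client object with its get_rate method are hypothetical; they stand in for whatever code and client your application actually uses.

```python
from unittest.mock import Mock


def get_display_price(rate_client, amount_usd: float, currency: str) -> str:
    """Convert a USD amount using a rate fetched from an external service."""
    rate = rate_client.get_rate("USD", currency)
    return f"{amount_usd * rate:.2f} {currency}"


def test_display_price_uses_mocked_rate():
    # The mock replaces the real client, so the test makes no network call
    # and always sees the same exchange rate.
    rate_client = Mock()
    rate_client.get_rate.return_value = 0.5
    assert get_display_price(rate_client, 10.0, "EUR") == "5.00 EUR"
    rate_client.get_rate.assert_called_once_with("USD", "EUR")
```

Because the rate is fixed by the test, the result no longer depends on whether the real service is slow, down, or returning different data.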

Improve Environment Stability

A stable environment helps reduce flakiness. Ensure that the testing environment remains consistent throughout each stage.

One way to do this is by using containerization tools like Docker to create reproducible environments. This guarantees that tests always run under the same conditions.

Avoid environmental dependencies, such as network reliance, which can lead to inconsistent results. If external systems are necessary, use mock servers or stubs to simulate their behavior.

This keeps tests stable and independent of external factors.

Address Timing & Concurrency Issues

Timing and concurrency issues are common causes of flaky tests. To fix this, avoid hard-coded waits or sleep commands in tests. Instead, use dynamic waits that wait for specific conditions to be met.
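A dynamic wait can be as simple as a polling helper. The sketch below is one possible shape, assuming a hypothetical wait_until function with illustrative timeout and interval defaults; UI frameworks such as Selenium provide their own explicit-wait utilities that serve the same purpose.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.05) -> bool:
    """Poll condition until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True  # stop as soon as the condition holds
        time.sleep(interval)
    return condition()  # one final check at the deadline
```

Unlike a fixed sleep, this returns as soon as the condition is met, so fast runs stay fast and slow runs still get the full timeout.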

Also, manage asynchronous code carefully. Synchronize threads properly to prevent race conditions, which can cause unpredictable results.
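One common way to synchronize threads is to guard shared state with a lock. The following sketch is illustrative only; SafeCounter and run_workers are hypothetical names, and the worker and increment counts are arbitrary.

```python
import threading


class SafeCounter:
    """A counter whose increments are serialized with a lock."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        with self._lock:  # only one thread at a time may read-modify-write
            self.value += 1


def run_workers(counter: SafeCounter, workers: int = 8,
                increments: int = 1000) -> int:
    def work() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker to finish before reading the result
    return counter.value
```

Joining every thread before asserting matters as much as the lock: a test that checks the result while workers are still running has a race of its own.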

Handling timing correctly ensures that tests run reliably regardless of external factors.

Continually Monitor & Improve

Mitigating flaky tests is an ongoing process. Continuously monitor tests for flaky behavior using automated tools.

These tools can re-run tests multiple times to detect inconsistencies. Review and update test cases regularly as the application changes.

Implement continuous feedback loops to catch flaky tests early and maintain a robust, reliable test suite over time.
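One way to monitor for flakiness is to scan recorded results for tests that both passed and failed against the same code revision. The sketch below assumes a hypothetical Run record and find_flaky helper; real CI systems store and expose this history in their own formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Run(NamedTuple):
    test: str     # test name
    commit: str   # code revision the test ran against
    passed: bool


def find_flaky(runs: Iterable[Run]) -> Dict[str, float]:
    """Map each suspected flaky test to its overall failure rate."""
    by_test_and_commit: Dict[Tuple[str, str], List[bool]] = defaultdict(list)
    for run in runs:
        by_test_and_commit[(run.test, run.commit)].append(run.passed)

    failures: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    flaky = set()
    for (test, _commit), results in by_test_and_commit.items():
        failures[test] += results.count(False)
        totals[test] += len(results)
        if True in results and False in results:
            flaky.add(test)  # mixed outcomes on identical code
    return {test: failures[test] / totals[test] for test in sorted(flaky)}
```

Tracking the failure rate over time shows whether a fix actually reduced flakiness or only made it rarer.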

How To Fix Flaky Tests

Once you’ve determined that a test is flaky and disabled it, it’s time to fix the issue at the root. We recommend a four-pronged approach: Isolate, Eliminate, Strengthen, and Simplify. Let’s explore each of these.

Isolate

Since external dependencies sometimes cause flaky tests, isolate the suspect test from other tests and from those dependencies to identify the root cause. Use mocks, fakes, and stubs to replace dependencies and ensure controllable and predictable behavior. This ensures that the test results depend only on the logic being tested, not on the surrounding components.

You’ll also want to ensure that you create dedicated resources for each test, such as temporary databases, files, or environments, so the test can modify them without impacting others. The environment you choose should remain stable and unmodified while testing is performed.

Finally, you should reset the system before and after each test to ensure previous tests don’t influence the current one. This includes cleaning up databases, clearing caches, or rolling back files to their original state.
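Dedicated resources and a reset around each test might look like the following sketch, which uses Python’s standard unittest and tempfile modules. The ReportWriterTest class and its file names are made up for illustration.

```python
import os
import shutil
import tempfile
import unittest


class ReportWriterTest(unittest.TestCase):
    def setUp(self):
        # Each test gets its own fresh directory, so no files are shared.
        self.workdir = tempfile.mkdtemp(prefix="flaky-demo-")

    def tearDown(self):
        # Remove everything the test created, whether it passed or failed.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def test_writes_report(self):
        path = os.path.join(self.workdir, "report.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("ok")
        self.assertEqual(os.listdir(self.workdir), ["report.txt"])

    def test_starts_empty(self):
        # Holds in any execution order because nothing leaks between tests.
        self.assertEqual(os.listdir(self.workdir), [])
```

The second test would become order-dependent, and therefore flaky, if both tests wrote to one shared directory instead.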

Just because tests are isolated doesn’t mean that they are issue-free. So, your team should still manually test impacted functionality for critical features to ensure no issues escape into production. The last thing you want is to risk your quality by skipping necessary quality assurance.

Eliminate

Tests that rely on random or unpredictable values often lead to inconsistent results. If a test passes one time but fails the next because it uses a different random value, it’s flaky. So it’s important you eliminate any randomness from your tests.

Replace random or time-dependent values, such as dates, times, UUIDs, or user inputs, with fixed values. This removes any uncertainty and ensures the same conditions apply every time. Another strategy is to use deterministic algorithms, for example a random number generator with a fixed seed, to ensure that the same inputs always lead to the same outputs. This ensures that even if the code generates some data, it does so predictably.
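A common way to apply this is to pass the clock and the random source into the code under test, so the test can pin both. The sketch below is an assumption-laden example: make_order_id, its ID format, and the seed value 42 are all hypothetical.

```python
import random
from datetime import datetime, timezone
from typing import Callable


def make_order_id(now: Callable[[], datetime], rng: random.Random) -> str:
    """Build an ID from an injected clock and random source."""
    stamp = now().strftime("%Y%m%d")
    suffix = rng.randrange(10_000)
    return f"ORD-{stamp}-{suffix:04d}"


def test_order_id_is_reproducible():
    # A fixed clock and a seeded generator replace the real ones.
    fixed_now = lambda: datetime(2024, 9, 13, tzinfo=timezone.utc)
    first = make_order_id(fixed_now, random.Random(42))
    second = make_order_id(fixed_now, random.Random(42))
    assert first == second  # same seed and clock give the same ID
    assert first.startswith("ORD-20240913-")
```

In production the same function can be called with datetime.now and an unseeded random.Random(), so only the test is deterministic.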

It’s important to note that a manual fallback is sometimes the easiest way to eliminate flakiness. Suppose a test constantly fails and can be performed manually relatively easily and quickly. In that case, eliminating automation from it altogether is more efficient than spending countless hours fixing a flaky test.

Strengthen

Flaky tests often fail because they can’t handle slight environmental changes like network latency or system load. Strengthening your tests and making them more robust can help make them more resilient to fluctuations.

Implement retries, timeouts, and waits to reduce flakiness. Instead of checking for exact values, use assertions that account for slight variations. This makes your tests more forgiving of minor performance fluctuations while still catching significant issues.
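The sketch below shows one possible shape for a bounded retry and a tolerant assertion. Both helpers are hypothetical, and the defaults (three attempts, 5% tolerance) are arbitrary assumptions. Retrying only on transient error types is a deliberate choice here, since retrying on every failure can hide real defects.

```python
import functools
import math
import time


def retry(attempts: int = 3, delay: float = 0.1,
          exceptions=(ConnectionError, TimeoutError)):
    """Re-run the wrapped call, but only on the listed transient errors."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of attempts: surface the failure
                    time.sleep(delay)
        return wrapper
    return decorator


def assert_close(actual: float, expected: float, rel_tol: float = 0.05) -> None:
    """Pass when the values differ by at most rel_tol, relative to the larger one."""
    assert math.isclose(actual, expected, rel_tol=rel_tol), (actual, expected)
```

An AssertionError is not in the retried exception list, so a wrong result still fails on the first attempt.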

You could also use comprehensive assertions to ensure that your tests verify all relevant aspects of the expected outcome. For example, if you’re testing user login, don’t just check if the “Welcome” message appears—also check for session creation, correct redirect URLs, and other expected post-login behaviors.

Simplify

Complex tests are more prone to errors and flakiness because they include more variables and points of failure. By simplifying your tests, you reduce the likelihood of failures from coding errors, unclear logic, or dependencies.

Use clear, descriptive names for every part of your test, whether it’s a function, variable, or assertion. This helps you understand what the test is doing at a glance and reduces the risk of confusion when modifying or debugging the test in the future.

You can also simplify your tests by breaking down complex tests into smaller, reusable functions. This makes the test easier to maintain and debug. If a test fails, it’s much easier to pinpoint the issue if the logic is broken down into modules.

Finally, simplify your test script creation and maintenance by implementing standards and governance processes. Standardizing your scripting ensures reliability and ease of maintenance. You could also use static code analysis tools, linters, and regular, thorough code reviews to catch scripting errors early and reduce unnecessary flakiness.

Flakiness is Fixable

Flaky tests can be frustrating and costly, but they are not unfixable. Understanding what makes a test flaky allows you to diagnose issues more effectively and take proactive steps to fix them. While it may be overwhelming to address flaky tests on your own, a knowledgeable test automation partner like Testlio can help.

Testlio’s unmatched flexibility and global network of the industry’s leading automation experts ensure that your test suites are reliable, stable, scalable, and efficient. No matter where you are in your automation testing journey, our customizable solutions ensure you get the right level of guidance and support when and where you need it.

Talk to a member of our team to learn how Testlio can help you maximize the benefits of test automation without the burden of flaky tests.