Pairwise Testing
Quick Summary
Coupled with domain partitioning, Pairwise Testing is an effective technique for reducing a combinatorial explosion to a manageable set of tests.
The greater the exhaustive test set, the greater the savings with Pairwise (e.g., reduce from 40 tests to 20, or 2400 to 80).
Use PICT or another tool to generate the pairs.
Pairwise can be used for both Input Values and Configuration or Environment parameters.
Pairwise output is not perfect - review it, remove non-meaningful or low-value combinations, and add high-value ones. Use domain knowledge and expertise to do so.
Testing in Isolation vs. in Combination
Equivalence Partitioning (EP) and Boundary Value Analysis (BVA) typically apply to a single input. In other words, each input (or configuration setting, or other variable) is tested in isolation. This must be done, but it is far from enough.
Even relatively simple software has multiple variables that interact. These variables may come from the UI, the operating system, configuration files, a database, the network, or other sources. Many bugs arise from such interactions. In combinatorial testing, the task is to verify that the software functions correctly with such variable combinations.
Calculating Combinatorial Explosion
The first task is to rein in the number of test cases, which is the product of the value counts of all variables and therefore grows exponentially as variables are added.
Q: How do I tell how many combinations I am facing?
The basic formula for counting total combinations is:
Total combinations = v1 × v2 × ... × vn
Where:
v1, v2, ..., vn = the number of possible values for each variable.
You multiply them together.
Example 1: Given 3 checkboxes (binary on/off values only), the total is 2 × 2 × 2 = 2^3 = 8. This is the same growth as a Decision Table (2^n for n binary conditions).
Example 2: Some variables have 3+ values
Browser: 3 values (Chrome, Firefox, Safari)
OS: 2 values (Windows, macOS)
Language: 4 values (EN, FR, ES, DE)
Total: 3 × 2 × 4 = 24 combinations
Example 3: Booking a hotel, multi-select possible, one input field accepts a range of integers
Hotel: 3, 4, or 5 stars
Who is staying: Adults only, adults with kids
Add-ons (multi-select, 4 options, so 2^4 = 16 combinations): unlimited buffet (yum!), late checkout, airport transfer, parking
Nights (integer): 1-10 allowed, so 10 values
Total: 3 × 2 × 16 × 10 = 960 combinations
For non-mission-critical systems, 960 tests are too many. How can we reduce the number intelligently while retaining good coverage?
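The three counts above can be checked with a few lines of Python. This is only a sketch of the multiplication rule; the function and variable names are illustrative and not part of any tool.

```python
from math import prod


def total_combinations(value_counts):
    """Multiply the number of possible values of every variable."""
    return prod(value_counts)


checkboxes = [2, 2, 2]               # Example 1: three on/off checkboxes
browser_matrix = [3, 2, 4]           # Example 2: browser, OS, language
hotel_booking = [3, 2, 2 ** 4, 10]   # Example 3: stars, guests, add-on subsets, nights

print(total_combinations(checkboxes))      # 8
print(total_combinations(browser_matrix))  # 24
print(total_combinations(hotel_booking))   # 960
```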
Step 1: Start with Domain Partitioning
Apply EP & BVA techniques.
The primary candidate in the hotel booking is Nights. We can reduce the 1-10 values to:
1 and 10 (min, max)
1, 5, and 10 (min, mid, max)
Suppose we find out that:
7 nights (a week) is the most common length of stay
Almost no one ever chooses "Late Checkout"
Thus, we may decide to:
Choose 1, 7, and 10 (min, common, max) values for "Nights"
Leave out "Late Checkout" from the combinations. We can test it separately later
We are down to 3 × 2 × 2^3 × 3 = 144 cases (stars × guests × add-on combinations × nights) thanks to EP, BVA, and domain knowledge. We may now proceed to combinatorial testing.
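As a sanity check on the reduced model, the sketch below enumerates it explicitly. It assumes the choices made above (three Nights values, "Late Checkout" left out) and treats the multi-select as "any subset of the remaining three add-ons, including none"; the variable names are illustrative.

```python
from itertools import combinations, product

stars = [3, 4, 5]
guests = ["Adults only", "Adults with kids"]
addons = ["Buffet", "Airport transfer", "Parking"]  # "Late checkout" left out
nights = [1, 7, 10]                                  # min, common, max

# Every subset of the multi-select, from "nothing selected" to "everything".
addon_subsets = [
    subset
    for size in range(len(addons) + 1)
    for subset in combinations(addons, size)
]

reduced = list(product(stars, guests, addon_subsets, nights))
print(len(addon_subsets), len(reduced))  # 8 144
```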
Step 2: Understand All Singles (Optional)
"All Singles" is rarely used in practice, as it reduces the number of cases (and coverage) too much. It serves as a stepping stone to the Pairwise technique.
"Full coverage" in All Singles means that every value of every variable appears in at least one test, but we don’t care about combinations with other variables.
To achieve All Singles, you need only as many tests as the largest number of values in any single variable.
Normally, we just pick the highest number.
In our previous formula of 3 × 2 × 2^3 × 3, that would normally be 2^3 = 8, resulting in 8 cases. But since add-ons are a multi-select, 8 counts combinations of selections, not distinct values.
There are only 3 distinct add-on values. Thus, 3 is our highest number.
Test | Hotel (stars) | Who is staying | Add-on | Nights
1 | 3 | Adults only | Buffet | 1
2 | 4 | Adult + Child | Airport Transfer | 7
3 | 5 | Adults only | Parking | 10
This table uses every possible value of every variable, but as you might guess, there is a high chance of missing a combinatorial bug with just 3 out of 144 cases.
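An All Singles table like this one can be produced mechanically. The sketch below is one simple way to do it (cycling through each variable's values); the function name and the dictionary keys are illustrative assumptions.

```python
def all_singles(parameters):
    """Each-choice coverage: every value of every parameter appears at least once."""
    size = max(len(values) for values in parameters.values())
    return [
        # Shorter value lists wrap around so every row is complete.
        {name: values[i % len(values)] for name, values in parameters.items()}
        for i in range(size)
    ]


hotel = {
    "Stars": [3, 4, 5],
    "Guests": ["Adults only", "Adult + Child"],
    "Add-on": ["Buffet", "Airport Transfer", "Parking"],
    "Nights": [1, 7, 10],
}

tests = all_singles(hotel)
for row in tests:
    print(row)
```

The number of rows equals the longest value list (3 here), which is why this technique is cheap but weak.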
Step 3: Achieve All Pairs (Pairwise)
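All Pairs, aka the Pairwise technique, is much more robust than All Singles, yet keeps the number of tests manageable: every pair of values from any two variables appears in at least one test. There is no universal formula for the resulting number of tests, but a rule of thumb is to multiply the two largest value counts. That product is also the minimum possible, since every pair of values from the two largest variables needs its own test; tools typically produce that many tests or somewhat more.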
Examples:
Variables | All Combinations | All Pairs (rule of thumb) | All Singles
3 browsers x 2 operating systems x 4 languages | 3 x 2 x 4 = 24 | 4 x 3 = 12 (saves 50%) | 4
5 car brands x 4 colors x 2 engine types | 5 x 4 x 2 = 40 | 5 x 4 = 20 (saves 50%) | 5
3 subscription plans x 5 payment methods x 5 regions | 5 x 5 x 3 = 75 | 5 x 5 = 25 (saves 67%) | 5
Var 1 with 10 values x Var 2 with 8 values x Var 3 with 6 values x Var 4 with 5 values | 10 x 8 x 6 x 5 = 2400 | 10 x 8 = 80 (saves 97%) | 10
The greater the number of All Combinations, the greater the savings from Pairwise.
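The rule of thumb is easy to script when comparing candidate models. The sketch below only estimates the size of a pairwise suite; the function name is illustrative, and a real tool may return slightly more tests than the estimate.

```python
def pairwise_estimate(value_counts):
    """Rule of thumb: multiply the two largest value counts."""
    largest, second = sorted(value_counts, reverse=True)[:2]
    return largest * second


examples = {
    "browsers x OS x languages": [3, 2, 4],
    "car brands x colors x engines": [5, 4, 2],
    "plans x payment methods x regions": [3, 5, 5],
    "four generic variables": [10, 8, 6, 5],
}

for name, counts in examples.items():
    print(name, pairwise_estimate(counts))
# browsers x OS x languages 12
# car brands x colors x engines 20
# plans x payment methods x regions 25
# four generic variables 80
```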
Step 4: Create Pairwise Table
Available Tools
It is possible to create the table manually, but it's a tedious and error-prone process. Instead, use a tool. Both commercial and open-source alternatives exist.
PICT is a widely used open-source tool:
GitHub by Microsoft: https://github.com/microsoft/pict
Online: a third-party web service powered by Microsoft PICT: https://pairwise.yuuniworks.com/
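To make concrete what such a tool does, here is a minimal greedy generator. It is a teaching sketch, not PICT's algorithm: it enumerates every full combination and repeatedly keeps the one that covers the most still-uncovered pairs, so it is only practical for small models. All names are illustrative.

```python
from itertools import combinations, product


def pairs_of(test):
    """All ((position, value), (position, value)) pairs that one test covers."""
    return set(combinations(enumerate(test), 2))


def greedy_pairwise(domains):
    """Pick full combinations greedily until every value pair is covered."""
    candidates = list(product(*domains))
    uncovered = set().union(*(pairs_of(c) for c in candidates))
    suite = []
    while uncovered:
        # Keep the candidate that covers the most pairs not yet covered.
        best = max(candidates, key=lambda c: len(pairs_of(c) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


domains = [
    ["Chrome", "Firefox", "Safari"],
    ["Windows", "macOS"],
    ["EN", "FR", "ES", "DE"],
]

suite = greedy_pairwise(domains)
print(len(suite), "tests instead of", len(list(product(*domains))))
for test in suite:
    print(test)
```

Dedicated tools use far more efficient algorithms and support constraints, seeds, and higher strengths, which is why using one is preferable to hand-rolled code.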
Quick PICT Tutorial for Windows Users
Non-Windows users: see instructions on how to build the tool from source.
For Windows users:
Go to the releases page and download the latest .exe binary
PICT is a CLI tool, so double-clicking the .exe achieves nothing. Instead, open cmd and run it from there.
PICT is a sophisticated tool that allows a lot of behavior fine-tuning, and the documentation is very well written. Below is just a basic example of how to use it.
The only mandatory argument that the CLI expects is the model - a plain text file in a specific format.
Assuming the model file is named input.txt, is located in the same directory as the .exe file, and contains:
When you run:
Then the output.txt should contain:
You can now use the file to test software manually, or feed it to an automation script.
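For automation, the model can be generated and the result parsed with a short script. The sketch below assumes the basic PICT model format (one "Name: value, value" line per parameter) and PICT's tab-separated output with a header row, as described in its documentation. PICT prints to standard output, so a typical invocation would be something like pict input.txt > output.txt. The helper names and the two sample output rows are made up for illustration and are not real PICT output.

```python
import csv
import io


def to_pict_model(parameters):
    """Render parameters as model lines of the form 'Name: value1, value2'."""
    lines = [
        f"{name}: {', '.join(map(str, values))}"
        for name, values in parameters.items()
    ]
    return "\n".join(lines) + "\n"


def parse_pict_output(text):
    """Read a tab-separated table whose first row holds the parameter names."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


model = to_pict_model({
    "Browser": ["Chrome", "Firefox", "Safari"],
    "OS": ["Windows", "macOS"],
    "Language": ["EN", "FR", "ES", "DE"],
})
print(model)  # this text would be saved as input.txt

# Hypothetical two-row excerpt of an output file, for illustration only.
sample_output = "Browser\tOS\tLanguage\nChrome\tWindows\tEN\nSafari\tmacOS\tFR\n"
rows = parse_pict_output(sample_output)
print(rows[1]["OS"])  # macOS
```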
Step 5: Review and Amend the Table
Changing non-meaningful pairs
Using the Pairwise technique is like throwing everything into a mixer. As such, the output is not guaranteed to be meaningful.
Non-meaningful examples:
*Example is from PICT tool docs
Review the output table and either manually adjust it or use the tool to prevent undesirable combinations. PICT allows you to do this with special syntax:
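When reviewing a generated table by hand, a small script can flag suspicious rows. The sketch below uses a hypothetical rule based on this page's browser example (Safari is only tested on macOS); the rule, the function name, and the sample rows are assumptions for illustration. In PICT's constraint syntax, the same rule would look roughly like: IF [Browser] = "Safari" THEN [OS] = "macOS";

```python
def violates_safari_rule(row):
    """Hypothetical rule: Safari is only meaningful on macOS."""
    return row["Browser"] == "Safari" and row["OS"] != "macOS"


# Made-up rows standing in for a generated pairwise table.
rows = [
    {"Browser": "Chrome", "OS": "Windows", "Language": "EN"},
    {"Browser": "Safari", "OS": "Windows", "Language": "FR"},
    {"Browser": "Safari", "OS": "macOS", "Language": "DE"},
]

flagged = [row for row in rows if violates_safari_rule(row)]
print(flagged)
# [{'Browser': 'Safari', 'OS': 'Windows', 'Language': 'FR'}]
```

Simply deleting flagged rows can remove the only test that covered some other valid pair (here, Windows with FR). Expressing the rule as a constraint lets the tool regenerate a table that avoids the bad combination while keeping pair coverage.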
Adding special or valuable pairs
Step 1 emphasizes the importance of utilizing domain knowledge to generate more effective test values for domain partitioning. The same principle applies when reviewing the Pairwise Table.
Use your domain knowledge to:
remove meaningful but low-priority combinations,
add high-value, frequent, commonly used, critically important combinations
Again, the tool you're using should allow you to fine-tune the output without having to amend it later.
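If the table is nevertheless edited by hand, it is worth checking that all pairs are still covered afterwards. The sketch below is one way to list missing pairs; the function name and the tiny two-variable model are illustrative assumptions.

```python
from itertools import combinations, product


def missing_pairs(parameters, rows):
    """Return the value pairs that no row in the table covers."""
    names = list(parameters)
    required = {
        ((a, va), (b, vb))
        for a, b in combinations(names, 2)
        for va, vb in product(parameters[a], parameters[b])
    }
    covered = {
        ((a, row[a]), (b, row[b]))
        for row in rows
        for a, b in combinations(names, 2)
    }
    return required - covered


parameters = {"OS": ["Windows", "macOS"], "Language": ["EN", "FR"]}
rows = [
    {"OS": "Windows", "Language": "EN"},
    {"OS": "Windows", "Language": "FR"},
    {"OS": "macOS", "Language": "EN"},
    {"OS": "macOS", "Language": "FR"},
]

print(missing_pairs(parameters, rows))       # set()
print(missing_pairs(parameters, rows[:-1]))  # {(('OS', 'macOS'), ('Language', 'FR'))}
```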
Use Pairwise for Both Input Values and Configurations
The examples on this page include hotel booking inputs as well as combinations of browsers x operating systems x languages. That is no coincidence.
Pairwise testing can be applied to:
Business or technical input values (coming from UI, API, or other)
Environmental or configuration parameters: hardware, OS, browser, tech stack version, protocol, encoding, encryption, etc.
FAQ
TLDR: any question you may have in relation to the practicalities of using a combinatorial tool should be covered in the documentation of your tool. Nevertheless, here are the more important ones:
Q: How do I prevent the tool from creating invalid or ineffective combinations?
A: The tool should support constraint creation. PICT allows both simple and non-trivial constraints.
Q: I want the test suite to always contain certain important combinations.
A: This is called a "seed". PICT allows this. See the Seeding section of the PICT documentation.
Q: I want to have "negative" scenarios with invalid values, but I don't want to have any scenarios with multiple invalid values because of "input masking", i.e. when the 1st invalid value is handled, the effect of the 2nd invalid value (or all other values) is typically "masked". How do I go about this?
A: Two solutions:
Use a separate input model and create a separate set with negative scenarios only
If the tool allows, use special syntax to mark invalid values. The tool can then ensure that only one invalid value per row is used. See PICT's Negative Testing section.
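The first solution (a separate negative set) can be sketched in a few lines: start from a known-good baseline and replace exactly one value per test with an invalid one, so no invalid value can mask another. The baseline, the invalid values, and the function name below are illustrative assumptions based on the hotel example.

```python
def negative_tests(valid_baseline, invalid_values):
    """Build tests that each contain exactly one invalid value."""
    tests = []
    for name, bad_values in invalid_values.items():
        for bad in bad_values:
            # Copy the valid baseline and override a single field.
            tests.append({**valid_baseline, name: bad})
    return tests


baseline = {"Stars": 4, "Guests": "Adults only", "Nights": 7}
invalid = {"Nights": [0, 11], "Stars": [6]}  # just outside the allowed ranges

suite = negative_tests(baseline, invalid)
for test in suite:
    print(test)
# {'Stars': 4, 'Guests': 'Adults only', 'Nights': 0}
# {'Stars': 4, 'Guests': 'Adults only', 'Nights': 11}
# {'Stars': 6, 'Guests': 'Adults only', 'Nights': 7}
```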
Q: I have determined that some parameter groups must be covered at a higher interaction strength t (3-way or 4-way instead of 2-way, i.e. pairwise). But if I increase t for the entire input model (i.e. ALL parameters), I incur the penalty of many more test cases. Is there a balanced approach?
Example (plain):
A: Yes, this is called a mixed-strength array. If the tool allows it, you may specify a higher interaction strength just for the selected parameters. See the Sub-Models section of the PICT documentation.
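To see why raising t across the whole model is expensive, note that any t-way suite needs at least as many tests as the product of the t largest value counts. The sketch below computes that lower bound for the four-variable example used earlier on this page; the function name is illustrative, and real suites are usually somewhat larger than the bound.

```python
from math import prod


def t_way_lower_bound(value_counts, t):
    """Minimum number of tests any t-way suite needs for this model."""
    return prod(sorted(value_counts, reverse=True)[:t])


counts = [10, 8, 6, 5]
for t in (2, 3, 4):
    print(t, t_way_lower_bound(counts, t))
# 2 80
# 3 480
# 4 2400
```

Applying 3-way coverage only to a small group of parameters keeps the overall suite closer to the pairwise size, which is the point of mixed-strength arrays.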
Criticisms
In their paper "Pairwise Testing: A Best Practice That Isn’t", Bach and Schroeder provide important criticisms and caveats on how and why Pairwise Testing may fail. The proposed solutions below are not part of the original paper.
Pairwise testing fails when you don’t select the right values to test with.
If prior Domain (Equivalence) Partitioning was done poorly, then the weakness of such selected values is carried over into pairwise testing.
Solution: carry out high-quality EP&BVA, construct your own scenarios of highest importance, and provide them as seed to the Pairwise tool
Pairwise testing fails when you don’t have a good enough oracle.
The chosen value pairs (or triples, or quadruples) DO expose a fault, but you or the automated test script fail to notice it. The fault may be unforeseen and hidden: gradual data or state corruption, or an error in a downstream service that is not observed as part of the scoped test.
Solution: There is no single solution or "best practice" for this other than using one's best judgment.
Pairwise testing fails when highly probable combinations get too little attention.
This is similar to Point 1 - in addition to EP&BVA, domain knowledge may be used to select highly used or highly probable value combinations (usage statistics, "factory" default configurations, etc.)
Pairwise testing fails when you don’t know how the variables interact.
There is plenty of software that produces output based on 3, 4, or even 10 inputs. In such cases, Pairwise (2-way) testing is unlikely to find bugs hiding in such high-strength interactions.
Solution: analyze the software (from both black-box and, if possible, white-box perspectives) to determine the most appropriate strength of combinatorial testing and produce mixed-strength arrays. For example:
Area A of the SUT uses variables A-D, max. interaction strength is 2: apply Pairwise testing
Area B of the SUT uses variables E-M, max. interaction strength is 5: consider applying 4- or 5-way testing.
References
PICT documentation
Pairwise Testing in the Real World
Pairwise Testing: A Best Practice That Isn’t