Reward

Coleman supports reward formulations drawn from the test case prioritization (TCP), information-retrieval ranking, and cost-aware prioritization literature:

  • RNFail (binary fault signal)
  • TimeRank (failure-aware order-sensitive reward)
  • ReciprocalRank (inverse-rank gain)
  • TopKRNFail (prefix-constrained binary reward; precision-style)
  • DiscountedFailure (DCG-like logarithmic discount)
  • APFDc (cost-aware reward using execution time and detected failure positions)

Literature references:

  • Spieker, H.; Gotlieb, A.; Marijan, D.; Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. ISSTA.
  • Jarvelin, K.; Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM TOIS, 20(4), 422-446.
  • Rothermel, G.; Untch, R. H.; Chu, C.; Harrold, M. J. (2001). Prioritizing test cases for regression testing. IEEE TSE.
  • Elbaum, S.; Malishevsky, A. G.; Rothermel, G. (2002). Test case prioritization: a family of empirical studies. IEEE TSE.
  • Manning, C. D.; Raghavan, P.; Schutze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

coleman.reward

Reward functions for bandit-based test case prioritization.

Defines reward functions for agents in a multi-armed bandit framework in the context of software testing. These reward functions help agents to prioritize software test cases based on various strategies.

The module provides an abstract base class Reward that serves as a blueprint for all reward functions. Derived classes implement specific reward strategies based on the number of failures and the order of test cases.
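As an illustration of the interface described above, here is a minimal sketch of a custom reward strategy. It does not import Coleman: `MetricStub` and `RewardSketch` are hypothetical stand-ins for `EvaluationMetric` and `Reward`, and the `detection_ranks` attribute (1-indexed ranks of failing tests) is an assumption based on the descriptions on this page. `FirstFailureOnlyReward` is a toy strategy invented for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class MetricStub:
    """Hypothetical stand-in for EvaluationMetric (not the real class)."""
    # Assumed: 1-indexed ranks at which failing tests were detected.
    detection_ranks: list = field(default_factory=list)


class RewardSketch(ABC):
    """Stand-in that mirrors the documented Reward interface."""

    @abstractmethod
    def evaluate(self, reward, last_prioritization):
        """Return one reward per test case in last_prioritization."""

    def get_name(self):
        # Assumption: default identifier; real subclasses return their own.
        return type(self).__name__

    def __str__(self):
        return self.get_name()


class FirstFailureOnlyReward(RewardSketch):
    """Toy strategy: reward only the earliest failing test."""

    def get_name(self):
        return "FirstFailureOnly"

    def evaluate(self, reward, last_prioritization):
        rewards = [0.0] * len(last_prioritization)
        # Ignore ranks that fall outside the prioritized list.
        ranks = [r for r in reward.detection_ranks if 1 <= r <= len(rewards)]
        if ranks:
            rewards[min(ranks) - 1] = 1.0
        return rewards
```

The returned list has one entry per name in `last_prioritization`, which is the alignment every `evaluate` method on this page promises.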

Classes:

  • Reward: An abstract base class that defines the structure and interface of a reward function.
  • TimeRankReward: A reward function that considers the order of test cases and the number of failures.
  • RNFailReward: A binary reward function that gives 1 to each failing test case and 0 to every other test case.
  • ReciprocalRankReward: A reward that gives inverse-rank gain to failing tests.
  • TopKRNFailReward: A binary top-k reward that estimates the failure rate among the first k tests.
  • DiscountedFailureReward: A logarithmically discounted rank gain for failing tests (DCG-like).

Notes

Reward functions are essential components of the bandit-based test case prioritization framework. They guide agents to make better decisions about which test cases to prioritize. The evaluation metric passed to evaluate must provide the details each reward function needs, such as detection ranks, for the reward functions to work correctly.

References
  • Spieker, H.; Gotlieb, A.; Marijan, D.; Mossige, M. (2017). Reinforcement Learning for Automatic Test Case Prioritization and Selection in Continuous Integration. ISSTA.
  • Jarvelin, K.; Kekalainen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM TOIS, 20(4), 422-446.

APFDcReward

Bases: Reward

Reward failing tests by their APFDc contribution.

The reward for each failing test at rank r is its APFDc contribution, normalized by total execution cost and the total number of failing tests. Non-failing tests receive 0.
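One way to read this description is through the standard APFDc formula with unit fault severities, where a failing test at rank r contributes the cost of all tests from rank r onward, minus half its own cost. The sketch below follows that reading; it is an assumption, not Coleman's confirmed implementation, and the function name and arguments are hypothetical.

```python
def apfdc_rewards(detection_ranks, costs):
    """Per-test APFDc contributions (assumed formula, unit severities).

    detection_ranks: 1-indexed ranks of failing tests (assumed input shape).
    costs: execution cost of each test, in prioritization order.
    """
    n = len(costs)
    m = len(detection_ranks)
    total_cost = sum(costs)
    rewards = [0.0] * n
    if m == 0 or total_cost <= 0:
        return rewards

    # suffix[i] = total cost of the tests at 0-indexed positions i..n-1.
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + costs[i]

    for rank in detection_ranks:
        i = rank - 1
        # Remaining cost from this test onward, minus half its own cost,
        # normalized by total cost and number of failing tests.
        rewards[i] = (suffix[i] - 0.5 * costs[i]) / (total_cost * m)
    return rewards
```

Under this reading, the rewards sum to the APFDc value of the schedule, so a failure detected by an early, cheap test earns more than the same failure detected late.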

__str__

__str__()

Return a string representation of the reward function.

evaluate

evaluate(reward, last_prioritization)

Evaluate APFDc-style rewards using stored test execution costs.

Parameters:

  • reward (EvaluationMetric, required): Evaluation metric containing detection ranks and testcase costs.
  • last_prioritization (list of str, required): Test case names in prioritization order.

Returns:

  • list of float: Reward values aligned with last_prioritization.

get_name

get_name()

Return the identifier of the reward function.

DiscountedFailureReward

Bases: Reward

Reward failures with logarithmic discount by rank.

For each failing test at rank r (1-indexed), the reward is:

$$\mathrm{gain}(r) = \frac{1}{\log_2(r + 1)}$$

Non-failing tests receive 0.
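The gain formula can be sketched in a few lines. The function name and the `detection_ranks` / `n_tests` arguments are hypothetical and chosen for illustration; the real class reads its inputs from an `EvaluationMetric`.

```python
import math


def discounted_failure_rewards(detection_ranks, n_tests):
    """Return 1 / log2(r + 1) at each failing rank r (1-indexed), else 0."""
    failing = set(detection_ranks)
    return [
        1.0 / math.log2(rank + 1) if rank in failing else 0.0
        for rank in range(1, n_tests + 1)
    ]
```

A failure at rank 1 earns the full gain of 1, while a failure at rank 3 earns 1 / log2(4) = 0.5, so the discount is gentler than a pure inverse-rank reward.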

__str__

__str__()

Return a string representation of the reward function.

evaluate

evaluate(reward, last_prioritization)

Evaluate discounted rewards for failing positions.

Parameters:

  • reward (EvaluationMetric, required): Evaluation metric containing detection ranks.
  • last_prioritization (list of str, required): Test case names in prioritization order.

Returns:

  • list of float: Discounted reward values aligned with last_prioritization.

get_name

get_name()

Return the identifier of the reward function.

RNFailReward

Bases: Reward

Reward Based on Failures (RNFail).

This reward function assigns a binary reward to each test case t' in T': 1 if t' failed, 0 otherwise.
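As a minimal sketch of this binary signal, assuming failures are given as 1-indexed ranks in the prioritized order (the function name and arguments are hypothetical, not Coleman's API):

```python
def rnfail_rewards(detection_ranks, n_tests):
    """Return 1.0 for each failing rank (1-indexed) and 0.0 otherwise."""
    failing = set(detection_ranks)
    return [1.0 if rank in failing else 0.0 for rank in range(1, n_tests + 1)]
```

Because the reward ignores position, two schedules with the same failing tests receive the same per-test rewards regardless of order.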

__str__

__str__()

Return a string representation of the reward function.

Returns:

  • str: The reward function name.

evaluate

evaluate(reward, last_prioritization)

Evaluate rewards based on failures.

Parameters:

  • reward (EvaluationMetric, required): Evaluation metric containing detection ranks and scheduled test cases.
  • last_prioritization (list of str, required): Test case names in prioritization order.

Returns:

  • list of float: List of rewards for each test case in the prioritization.

get_name

get_name()

Return the identifier of the reward function.

Returns:

  • str: The reward function identifier.

ReciprocalRankReward

Bases: Reward

Reciprocal-Rank reward.

Rewards failing tests by the inverse of their rank, following a classic information-retrieval signal that strongly favors earlier detections.
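The inverse-rank idea can be sketched as follows, assuming 1-indexed failing ranks as input. The function name and arguments are hypothetical and only illustrate the formula 1 / r.

```python
def reciprocal_rank_rewards(detection_ranks, n_tests):
    """Return 1 / r at each failing rank r (1-indexed), else 0."""
    failing = set(detection_ranks)
    return [
        1.0 / rank if rank in failing else 0.0
        for rank in range(1, n_tests + 1)
    ]
```

A failure at rank 2 earns 0.5 and one at rank 4 earns 0.25, which shows how quickly the reward decays for later detections.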

__str__

__str__()

Return a string representation of the reward function.

evaluate

evaluate(reward, last_prioritization)

Evaluate rewards based on reciprocal failing ranks.

Parameters:

  • reward (EvaluationMetric, required): Evaluation metric containing detection ranks.
  • last_prioritization (list of str, required): Test case names in prioritization order.

Returns:

  • list of float: Reciprocal-rank reward values aligned with last_prioritization.

get_name

get_name()

Return the identifier of the reward function.

Reward

Bases: ABC

Abstract base class for reward functions.

A reward function is used by the agent in the observe method to evaluate bandit results and return a reward.

evaluate abstractmethod

evaluate(reward, last_prioritization)

Evaluate a bandit result and return a reward.

Parameters:

  • reward (EvaluationMetric, required): The evaluation metric result.
  • last_prioritization (list of str, required): The last prioritized test suite list.

Returns:

  • list of float: The computed rewards for each test case.

get_name

get_name()

Retrieve the name or identifier of the reward function.

Returns:

  • str: The name or identifier of the reward function.

TimeRankReward

Bases: Reward

Time-ranked Reward (TimeRank).

This reward function explicitly accounts for the order of test cases: each test case is rewarded based on its rank in the test schedule and whether it failed. A good schedule executes failing test cases early, so every passed test case that precedes a failing test case reduces the schedule's quality.

Each test case is rewarded by the total number of failed test cases; for failed test cases it is the same as reward function 'RNFailReward'. For passed test cases, the reward is further decreased by the number of failed test cases ranked after the passed test case, which penalizes scheduling passing test cases early.
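The rule described above can be sketched as follows. This is an illustration of the prose, not Coleman's confirmed implementation: the function name, its arguments, and the `normalize` flag are hypothetical. With `normalize=True` the rewards are divided by the number of failures, so failing tests receive 1.0; whether the library normalizes is an assumption left open here.

```python
def timerank_rewards(detection_ranks, n_tests, normalize=False):
    """Time-ranked rewards from 1-indexed failing ranks.

    Failing tests get the total failure count; passing tests get that
    count minus the number of failures ranked after them.
    """
    failing = set(detection_ranks)
    total = len(failing)
    if total == 0:
        return [0.0] * n_tests

    rewards = []
    for rank in range(1, n_tests + 1):
        if rank in failing:
            value = float(total)
        else:
            later_failures = sum(1 for f in failing if f > rank)
            value = float(total - later_failures)
        rewards.append(value / total if normalize else value)
    return rewards
```

For failures at ranks 2 and 4 in a four-test schedule, the passing test at rank 1 precedes both failures and earns 0, while the passing test at rank 3 precedes only one and earns 1.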

__str__

__str__()

Return a string representation of the reward function.

Returns:

  • str: The reward function name.

evaluate

evaluate(reward, last_prioritization)

Evaluate rewards based on the prioritization rank of test cases.

Parameters:

  • reward (EvaluationMetric, required): The evaluation metric containing detection ranks and scheduled test cases.
  • last_prioritization (list of str, required): The list of test case names in the prioritization order.

Returns:

  • list of float: A list of rewards for each test case in the prioritization.

get_name

get_name()

Return the identifier of the reward function.

Returns:

  • str: The reward function identifier.

TopKRNFailReward

Bases: Reward

Top-k binary failure reward.

Considers only whether a test failed (binary signal) inside the first top_k positions. Each failing test within the top-k receives 1 / k_eff, where k_eff = min(top_k, len(last_prioritization)). The rewards over the selected prefix therefore sum to the fraction of failing tests in the top-k, a value between 0 and 1.

When use_time_budget is enabled, the prefix is also capped by the number of tests scheduled by the metric under the active time budget.
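A minimal sketch of the top-k rule, assuming 1-indexed failing ranks as input. The function name and arguments are hypothetical; `scheduled_count` stands in for the number of tests scheduled under the time budget. The sketch keeps the denominator at k_eff and only shortens the rewarded prefix when a budget cap applies, which is an assumption because the description does not say whether the denominator also shrinks.

```python
def topk_rnfail_rewards(detection_ranks, n_tests, top_k=6, scheduled_count=None):
    """Give 1 / k_eff to each failing test inside the top-k prefix."""
    k_eff = min(top_k, n_tests)
    if k_eff <= 0:
        return [0.0] * n_tests

    # Optional time-budget cap on the rewarded prefix (assumed behavior).
    prefix = k_eff if scheduled_count is None else min(k_eff, scheduled_count)

    failing = set(detection_ranks)
    return [
        1.0 / k_eff if rank <= prefix and rank in failing else 0.0
        for rank in range(1, n_tests + 1)
    ]
```

With ten tests, top_k=4, and failures at ranks 1, 3, and 8, the two failures inside the prefix each earn 0.25 and the failure at rank 8 earns nothing.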

__init__

__init__(top_k=6, use_time_budget=False)

Initialize the reward with a top-k cutoff.

__str__

__str__()

Return a string representation of the reward function.

evaluate

evaluate(reward, last_prioritization)

Evaluate top-k binary rewards.

Parameters:

  • reward (EvaluationMetric, required): Evaluation metric containing detection ranks.
  • last_prioritization (list of str, required): Test case names in prioritization order.

Returns:

  • list of float: Reward vector aligned to last_prioritization.

get_name

get_name()

Return the identifier of the reward function.