This article provides a comprehensive guide for researchers and drug development professionals on the critical process of identifying and justifying common comparators for Indirect Treatment Comparisons (ITCs). With head-to-head clinical trial data often unavailable, ITCs are essential for demonstrating relative treatment efficacy and safety to regulatory and Health Technology Assessment (HTA) bodies. The content covers foundational ITC methodologies, strategic application and selection of comparators, solutions for common challenges like cross-trial heterogeneity, and the validation of ITC findings against current HTA and regulatory standards. This guide synthesizes recent trends and methodological advancements to support robust evidence generation for healthcare decision-making.
In the realm of evidence-based medicine and health technology assessment (HTA), Indirect Treatment Comparisons (ITCs) have emerged as a critical methodological approach for evaluating the relative efficacy and safety of therapeutic interventions when direct head-to-head evidence is unavailable. ITCs are defined as statistical techniques that estimate relative treatment effects between interventions that have not been studied directly within a single randomized controlled trial (RCT) [1]. These methods utilize a common comparator as an analytical anchor (typically a standard treatment, placebo, or active control) to facilitate comparisons between treatments that lack direct trial evidence [2] [3]. The fundamental premise of ITC is built upon the principle of transitivity: if Treatment A has been compared to Treatment C in one trial, and Treatment B has been compared to the same Treatment C in another trial, then Treatments A and B can be indirectly compared through their common relationship with Treatment C [2] [4].
The importance of ITCs in contemporary drug development continues to grow substantially, particularly in therapeutic areas such as oncology and rare diseases where conducting comprehensive head-to-head trials against all relevant comparators is often impractical, ethically challenging, or economically unviable [1] [5]. A recent targeted review of global oncology drug submissions found that among 185 assessment documents, there were 188 unique submissions supported by a total of 306 ITCs, demonstrating the extensive adoption of these methodologies in regulatory and reimbursement decision-making [1] [6]. Furthermore, ITCs in orphan drug submissions were associated with a higher likelihood of positive decisions compared to non-orphan submissions, highlighting their particular value in addressing evidence challenges for rare diseases [1].
The gold standard for establishing comparative treatment efficacy remains the randomized controlled trial (RCT) with direct head-to-head comparison [5]. However, numerous practical constraints limit the feasibility of direct comparisons in modern drug development. Ethical considerations often prevent researchers from comparing patients directly to inferior treatments or placebo, especially in oncology and rare diseases with life-threatening conditions [1] [5]. Economic and logistical challenges further complicate direct evidence generation, as conducting RCTs against every potential comparator across multiple jurisdictions proves impractical [1]. The selection of appropriate comparators varies significantly across different healthcare systems and countries, making it economically unviable for manufacturers to conduct head-to-head trials for each potential comparator in every market [2]. Additionally, statistical feasibility diminishes as the required effective sample size increases with each additional intervention compared [1].
Table 1: Prevalence of ITCs in Recent Oncology Drug Submissions (2021-2023)
| Authority | Documents with ITCs | Unique Submissions | Supporting ITCs |
|---|---|---|---|
| EMA (Regulatory) | 33 | 33 | 42 |
| CDA-AMC (Canada) | 56 | 56 | Not specified |
| PBAC (Australia) | 46 | 46 | Not specified |
| G-BA (Germany) | 40 | 40 | Not specified |
| HAS (France) | 10 | 10 | Not specified |
| Total | 185 | 188 | 306 |
The use of ITCs has increased significantly in recent years, with health technology assessment bodies worldwide accepting evidence from ITCs to inform reimbursement recommendations and pricing decisions [1] [7]. A comprehensive review identified 68 ITC guidelines from 10 authorities worldwide, with many updated within the last five years to incorporate more complex ITC techniques, reflecting the rapidly evolving methodology and growing acceptance of these approaches [8]. The guidelines commonly cite the absence of direct comparative studies as the primary justification for using ITCs, with most jurisdictions favoring population-adjusted or anchored ITC techniques over naïve comparisons [8].
ITC methodologies can be broadly categorized into several distinct techniques, each with specific applications, strengths, and limitations. A systematic literature review identified seven primary ITC techniques reported in the literature, with network meta-analysis (NMA) being the most frequently described method (79.5% of included articles) [5].
Table 2: Overview of Primary ITC Methodologies
| ITC Method | Description | Key Assumptions | Applications | Strength | Limitations |
|---|---|---|---|---|---|
| Bucher Method | Pairwise comparisons through common comparator | Constancy of relative effects (homogeneity, similarity) | Pairwise indirect comparisons | Simple approach for connected networks | Limited to comparisons with common comparator |
| Network Meta-Analysis (NMA) | Multiple interventions compared simultaneously | Constancy of relative effects (homogeneity, similarity, consistency) | Multiple indirect comparisons or ranking | Simultaneous comparison of multiple treatments | Complex, with challenging assumptions to verify |
| Matching-Adjusted Indirect Comparison (MAIC) | Propensity score weighting IPD to match aggregate data | Constancy of relative or absolute effects | Studies with population heterogeneity, single-arm studies | Adjusts for population imbalances | Limited to pairwise ITC, requires IPD |
| Simulated Treatment Comparison (STC) | Predicts outcomes using regression models based on IPD | Constancy of relative or absolute effects | Considerable population heterogeneity, single-arm studies | Adjusts for covariate differences | Limited to pairwise ITC, complex modeling |
| Network Meta-Regression (NMR) | Regression techniques to explore covariate impact | Conditional constancy of relative effects with shared effect modifier | Multiple ITC with connected network to investigate effect modifiers | Explores covariate effects on treatment outcomes | Not suitable for multiarm trials |
The validity of ITC findings depends on several critical assumptions that must be carefully evaluated in any indirect comparison. The homogeneity assumption refers to the equivalence of trials within each pairwise comparison in the network, which can be assessed quantitatively using statistics like the I-squared statistic [4]. Transitivity (or similarity) concerns the validity of making indirect comparisons and requires that trials are sufficiently similar with respect to potential effect modifiers [4]. This assumption must be evaluated qualitatively by carefully reviewing trial characteristics, including study design, patient populations, and outcome measurements [2] [4]. Consistency refers to the agreement between direct and indirect evidence when both are available, which can be assessed quantitatively in connected networks [4]. Violations of these assumptions can introduce bias and uncertainty into ITC results, potentially compromising their validity for decision-making [2].
MAIC has become increasingly prominent, particularly for comparisons involving single-arm trials, which are common in oncology and rare diseases [5]. The methodology is applied in both anchored scenarios (with a common comparator) and unanchored scenarios (without a common comparator) [9].
Objective: To estimate the relative treatment effect between Treatment A and Treatment B when IPD is available for Treatment A but only aggregate data (AgD) is available for Treatment B, adjusting for imbalances in effect modifiers between studies.
Materials and Requirements:
Procedure:
Methodological Considerations:
NMA represents the most comprehensive ITC approach, enabling simultaneous comparison of multiple treatments while combining direct and indirect evidence [4].
Objective: To synthesize evidence from a network of randomized trials comparing multiple interventions and provide estimates of all pairwise relative treatment effects.
Materials and Requirements:
Procedure:
Analytical Considerations:
Basic ITC Structure: This diagram illustrates the fundamental concept of indirect treatment comparison, where Treatments A and B are compared indirectly through their common relationship with Comparator C.
Complex NMA Network: This expanded network demonstrates how multiple treatments can be connected through both direct comparisons (solid lines) and indirect comparisons (dashed red lines), forming the basis for network meta-analysis.
Table 3: Essential Research Reagents and Tools for ITC Implementation
| Tool Category | Specific Tools/Techniques | Function/Purpose |
|---|---|---|
| Data Requirements | Individual Patient Data (IPD) | Enables patient-level adjustment methods like MAIC and STC |
| | Aggregate Data (AgD) | Essential for all ITC methods; typically extracted from publications |
| Statistical Software | R Statistical Environment | Implementation of various ITC packages (e.g., gemtc, netmeta) |
| | Bayesian Analysis Tools (WinBUGS, Stan) | Essential for complex Bayesian NMA models |
| | Python with relevant libraries | Alternative environment for statistical analysis |
| Methodological Frameworks | PRISMA Extension for NMA | Reporting guidelines for network meta-analyses |
| | ISPOR ITC Good Research Practices | Methodological guidance for conducting ITCs |
| | Cochrane Risk of Bias Tool | Quality assessment of included studies |
| Analytical Techniques | Propensity Score Weighting | Core method for MAIC implementation |
| | Network Meta-Regression | Exploring impact of covariates on treatment effects |
| | Consistency Assessment Methods | Evaluating agreement between direct and indirect evidence |
Despite the growing acceptance and application of ITCs in drug development and health technology assessment, several methodological challenges persist. The "MAIC paradox" recently described in the literature highlights how different sponsors analyzing the same data can reach conflicting conclusions due to implicitly targeting different populations [9]. This paradox emerges when there are imbalances in effect modifiers with different magnitudes of modification across treatments, leading to contradictory conclusions if MAIC is performed with the IPD and AgD swapped between trials [9].
To address these challenges, researchers are developing innovative approaches such as arbitrated indirect treatment comparisons that focus on estimating treatment effects in a common target population, specifically chosen to be the overlap population between trials [9]. This approach requires the involvement of a third-party arbitrator (such as an HTA body) to ensure that MAIC is conducted by both sponsors targeting a common population, thereby resolving the inconsistency in findings [9].
Additionally, assessment of similarity between trials remains a significant challenge in ITC implementation. A review of National Institute for Health and Care Excellence (NICE) technology appraisals found that none incorporated formal methods to determine similarity, instead relying on narrative summaries to assert similarity, often based on a lack of significant differences [10]. This approach leads to uncertainty in appraisals, which is typically resolved through clinical expert input alone [10]. The most promising methods identified include estimation of noninferiority ITCs in a Bayesian framework followed by probabilistic comparison of the indirectly estimated treatment effect against a prespecified noninferiority margin [10].
Indirect Treatment Comparisons have evolved from niche statistical methods to essential components of drug development and health technology assessment, particularly in therapeutic areas where direct head-to-head trials are impractical or unethical. The growing prevalence of ITCs in submissions to regulatory and HTA agencies worldwide demonstrates their increasing importance in contemporary healthcare decision-making [1] [6] [8]. As drug development continues to face challenges of increasing complexity, cost constraints, and ethical considerations, the strategic application of robust ITC methodologies will remain crucial for generating comparative evidence and facilitating patient access to innovative therapies. Future methodological developments will likely focus on addressing current limitations such as the MAIC paradox and establishing more formal approaches for assessing similarity and equivalence in indirect comparisons [9] [10].
In the realm of evidence synthesis for health technology assessment, indirect treatment comparisons (ITCs) have become indispensable tools when head-to-head randomized controlled trials are unavailable or infeasible [5] [6]. Among various ITC methodologies, anchored indirect comparisons stand apart as the most methodologically robust approach, with their validity critically dependent on the presence of a common comparator [11] [12]. This technical guide examines the foundational role of common comparators as the linchpin of valid anchored ITCs, framing this discussion within broader research on identifying common comparators for indirect drug comparisons.
A common comparator (typically standard of care, placebo, or an active control treatment) serves as the statistical anchor that connects otherwise disconnected evidence from separate clinical trials [2]. By providing a bridge between studies that would otherwise remain isolated islands of evidence, the common comparator enables analysts to respect the randomization within trials while making comparisons across them [12]. This anchoring function is not merely a statistical convenience but a fundamental requirement for minimizing bias and producing reliable estimates of relative treatment effects in connected evidence networks [7] [11].
An anchored indirect treatment comparison is a statistical methodology that enables the estimation of relative treatment effects between two interventions that have not been compared directly within the same randomized trial, but that have each been studied against a common comparator in separate trials [11] [2]. The conceptual framework is elegantly simple: if Treatment A has been compared to Common Comparator C in one trial, and Treatment B has been compared to the same Common Comparator C in another trial, then the relative effect of A versus B can be indirectly estimated through their respective effects versus C [2].
This approach stands in direct contrast to unanchored comparisons, which attempt to compare treatments across studies without a common reference point [11] [12]. The critical distinction lies in the strength of the underlying assumptions: anchored comparisons require only the conditional constancy of relative effects, whereas unanchored comparisons require the much stronger and often untenable assumption of conditional constancy of absolute effects [11] [12].
The statistical basis for anchored ITCs rests on the preservation of within-trial randomization [12]. In a direct randomized trial comparing A versus C, randomization ensures that both measured and unmeasured confounding factors are balanced between treatment arms, providing an unbiased estimate of the A-C treatment effect. Similarly, in a separate trial comparing B versus C, randomization ensures valid estimation of the B-C effect. The anchored ITC preserves this randomization benefit by using only the within-trial relative effects (A-C and B-C) to derive the indirect comparison (A-B), rather than comparing absolute outcomes across studies [12].
The fundamental algebra of the standard anchored ITC (often called the Bucher method) for a simple three-treatment network is straightforward [7] [12]. For a chosen outcome measure on an appropriate scale (e.g., log odds ratio, mean difference), the indirect estimate of the A versus B effect is derived as:
d_AB = d_AC - d_BC
Where d_AC represents the relative effect of A versus C, and d_BC represents the relative effect of B versus C [12]. This calculation can be visualized as removing the common comparator C from the comparison, leaving the indirect A-B effect.
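For illustration with purely hypothetical numbers: if the log hazard ratio for A versus C is -0.50 and for B versus C is -0.20, the indirect log hazard ratio for A versus B is -0.50 - (-0.20) = -0.30, corresponding to a hazard ratio of approximately exp(-0.30) ≈ 0.74; its variance is the sum of the two component variances.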
The validity of anchored indirect comparisons depends on three critical assumptions that must be rigorously assessed during the analysis [7] [12]:
Homogeneity: This assumption requires that the relative treatment effect between each intervention and the common comparator is similar across different studies of the same comparison. Significant heterogeneity suggests effect modification that may bias the indirect comparison.
Similarity (Transitivity): This fundamental assumption requires that the studies included in the evidence network are sufficiently similar in their methodological characteristics (e.g., patient populations, outcome definitions, treatment protocols) that comparing their results is clinically meaningful [2]. Violations of similarity threaten the validity of any cross-study comparison.
Consistency: This assumption requires that the direct and indirect evidence are in agreement where they exist. In a network where both direct comparisons of A versus B and indirect comparisons through C are available, consistency means these estimates agree within random error.
The following table summarizes how these assumptions differ between standard and population-adjusted anchored ITCs:
Table 1: Key Assumptions for Anchored Indirect Treatment Comparisons
| Method Type | Constancy Assumption | Valid Only If | Data Requirements |
|---|---|---|---|
| Standard Anchored ITC | Constancy of relative effects | No effect modifiers are imbalanced between studies | Aggregate data from all studies |
| Population-Adjusted Anchored ITC | Conditional constancy of relative effects | All effect modifiers are known and adjusted for | Individual patient data (IPD) from at least one trial plus aggregate data from others |
The common comparator creates what methodologists term a connected evidence network [11] [12]. In a connected network, all treatments can be linked through a pathway of direct comparisons, enabling the estimation of relative effects between any two treatments in the network. The common comparator serves as the anchor point that connects different segments of the evidence base.
More complex networks may include multiple common comparators and both direct and indirect evidence, leading to network meta-analysis (NMA), which extends the principles of simple anchored ITCs to larger evidence networks [5] [7]. In such networks, the common comparators become the connecting nodes that enable simultaneous comparison of multiple treatments.
Table 2: Prevalence of Different ITC Methods in Recent Submissions
| ITC Method | Description | Prevalence in Recent Submissions | Key Features |
|---|---|---|---|
| Network Meta-Analysis | Extension of anchored ITC to multiple treatments | 79.5% of methodological articles [5] | Most frequently described technique; allows multiple treatment comparisons |
| Bucher Method | Standard anchored indirect comparison | 23.3% of methodological articles [5] | Foundation of all anchored ITCs; limited to pairwise comparisons through common comparator |
| Matching-Adjusted Indirect Comparison | Population-adjusted anchored method | 30.1% of methodological articles [5]; 69.2% of recent articles focus on population-adjusted methods [5] | Uses IPD to match aggregate data population; requires common comparator for anchoring |
| Simulated Treatment Comparison | Regression-based population adjustment | 21.9% of methodological articles [5] | Uses outcome models to predict outcomes in target population; requires common comparator |
The standard protocol for conducting an anchored ITC using the Bucher method involves these critical steps [7] [12]:
Define the Research Question: Precisely specify the target population, interventions of interest, common comparator, and outcomes. This definition drives all subsequent methodological choices.
Systematic Literature Review: Identify all relevant studies comparing the interventions of interest with the common comparator using comprehensive, reproducible search strategies.
Assess Similarity and Transitivity: Evaluate whether the included studies are sufficiently similar in their patient characteristics, methodologies, and definitions to permit meaningful comparison.
Extract or Estimate Relative Effects: For each study, extract the relative effect of each intervention versus the common comparator on an appropriate scale (e.g., log odds ratios, hazard ratios, mean differences).
Check for Heterogeneity: Assess statistical heterogeneity within each comparison (e.g., using I² statistic) and investigate potential sources of heterogeneity when present.
Combine Evidence Using the Bucher Formula: Calculate the indirect estimate using the algebraic approach described previously, with appropriate attention to variance estimation.
Assess Consistency (if possible): If both direct and indirect evidence are available, statistically assess their consistency using node-splitting or other appropriate methods.
The following diagram illustrates the complete analytical workflow for conducting a valid anchored indirect treatment comparison:
Anchored ITCs have gained significant traction in regulatory and health technology assessment (HTA) submissions worldwide [6]. A recent review of oncology drug submissions from 2021-2023 found that authorities more frequently favored anchored or population-adjusted ITC techniques for their effectiveness in data adjustment and bias mitigation compared to unadjusted methods [6]. This preference reflects the recognized methodological rigor that common comparators bring to indirect comparisons.
The impact of anchored ITCs extends to orphan drug submissions, where these methods more frequently led to positive decisions compared to non-orphan submissions [6]. This is particularly significant given the ethical and practical challenges of conducting direct comparative trials in rare diseases.
Despite their utility, anchored ITCs face several important limitations that researchers must acknowledge:
Limited Common Comparators: In rapidly evolving therapeutic areas, standard of care changes quickly, making historical common comparators less relevant to current decision contexts [11].
Effect Modifier Imbalance: Even with a common comparator, imbalances in effect modifiers across studies can bias results, necessitating population adjustment methods like MAIC or network meta-regression [7] [11].
Complexity in Larger Networks: While this guide focuses on simple three-treatment networks, real-world applications often involve multiple comparators and complex connections, requiring sophisticated network meta-analysis approaches [5] [7].
Table 3: Key Methodological Tools for Anchored Indirect Comparisons
| Tool/Resource | Function | Application Context |
|---|---|---|
| PRISMA Guidelines | Standardized reporting of systematic reviews | Ensuring comprehensive literature identification and study selection |
| Bucher Method | Statistical foundation for indirect comparison | Calculating indirect treatment effects through common comparator |
| I² Statistic | Quantifying statistical heterogeneity | Assessing consistency of treatment effects across studies |
| Node-Split Analysis | Testing consistency assumption | Evaluating agreement between direct and indirect evidence |
| Network Meta-Regression | Adjusting for effect modifiers | Addressing heterogeneity when IPD is unavailable |
| Matching-Adjusted Indirect Comparison | Population adjustment using IPD | Balancing covariate distributions across studies when IPD is available for one trial |
The common comparator remains the indispensable foundation for valid anchored indirect treatment comparisons, providing the statistical and methodological anchor that enables reliable estimation of relative treatment effects when direct evidence is unavailable [11] [12]. Through the preservation of within-trial randomization and the enabling of connected evidence networks, common comparators allow researchers to extend inference beyond the confines of individual studies while maintaining methodological rigor.
As therapeutic landscapes continue to evolve and decision-makers demand increasingly sophisticated evidence, the strategic identification and application of common comparators will remain central to comparative effectiveness research [6]. Future methodological developments will likely focus on enhancing population adjustment methods, handling complex treatment networks, and developing robust approaches for dynamic evidence ecosystems where common comparators may change over time. Through continued refinement of these methodologies, anchored ITCs will maintain their critical role in informing healthcare decisions across the drug development lifecycle.
Indirect Treatment Comparisons (ITCs) are statistical methodologies used to compare the effects of two or more treatments when direct, head-to-head evidence from randomized controlled trials (RCTs) is unavailable or limited [5] [13]. These methods have become increasingly important in health technology assessment (HTA) and drug development, providing crucial evidence for decision-makers when direct comparisons are unethical, unfeasible, or impractical to conduct [5]. The fundamental principle underlying ITCs is the use of a common comparator to facilitate indirect inferences about the relative efficacy and safety of interventions that have not been studied directly against each other [13].
The growing importance of ITCs is reflected in their adoption by HTA agencies worldwide, though acceptability remains contingent on appropriate methodology and transparent reporting [5]. In therapeutic areas such as oncology and rare diseases, where single-arm trials are increasingly common, ITCs provide valuable comparative evidence that would otherwise be unavailable [5]. This technical guide provides a comprehensive overview of core ITC methods, from established approaches like the Bucher method to advanced population-adjusted techniques, with particular emphasis on their application in identifying and utilizing common comparators for indirect drug comparisons research.
The method of adjusted indirect comparison as described by Bucher et al. represents the foundational approach for simple indirect comparisons involving three interventions [13]. This method is applicable when there is no direct evidence comparing interventions A and B, but both have been studied against a common comparator C. The relative effect of B versus A is estimated indirectly using the direct estimators for the effects of C versus A and C versus B [13].
For absolute effect measures (e.g., mean differences, risk differences), the indirect estimate is calculated as:
effect_AB = effect_AC - effect_BC
The variance of this indirect estimator is the sum of the variances of the two direct estimators:
variance_AB = variance_AC + variance_BC
The corresponding 95% confidence interval can then be calculated using the standard formula:
effect_AB ± Z_0.975 * √(variance_AB)
where Z_0.975 refers to the 97.5% quantile of the standard normal distribution (approximately 1.96) [13]. For relative effect measures (e.g., odds ratios, relative risks), this additive relationship holds true only on a logarithmic scale, requiring appropriate transformation before analysis [13].
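As a minimal illustration of these formulas, the Python sketch below computes a Bucher indirect estimate and its 95% confidence interval on the log odds ratio scale; all input values are hypothetical.

```python
import math

def bucher_indirect(effect_ac, se_ac, effect_bc, se_bc, z=1.96):
    """Bucher adjusted indirect comparison on an additive scale
    (e.g., log odds ratio, log hazard ratio, mean difference)."""
    effect_ab = effect_ac - effect_bc      # indirect point estimate
    var_ab = se_ac**2 + se_bc**2           # variances of the direct estimates add
    se_ab = math.sqrt(var_ab)
    ci = (effect_ab - z * se_ab, effect_ab + z * se_ab)
    return effect_ab, se_ab, ci

# Hypothetical log odds ratios versus the common comparator C
log_or_ac, se_ac = -0.45, 0.15   # A vs C
log_or_bc, se_bc = -0.20, 0.18   # B vs C

log_or_ab, se_ab, ci = bucher_indirect(log_or_ac, se_ac, log_or_bc, se_bc)
print(f"Indirect log OR (A vs B): {log_or_ab:.2f}, SE {se_ab:.2f}")
print(f"OR {math.exp(log_or_ab):.2f}, 95% CI "
      f"({math.exp(ci[0]):.2f}, {math.exp(ci[1]):.2f})")
```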
Network meta-analysis (NMA) extends the principles of the Bucher method to more complex networks involving multiple treatments and comparisons [5] [13]. As the most frequently described ITC technique (covered in 79.5% of included articles in a recent systematic review), NMA allows for the simultaneous comparison of multiple interventions by combining direct and indirect evidence across a connected network of trials [5]. This approach provides effect estimates for all possible pairwise comparisons within the network, even when some pairs have never been compared directly in primary studies [13].
NMA can be conducted using either frequentist or Bayesian frameworks, with Bayesian approaches often employing Markov Chain Monte Carlo methods for model estimation [13]. The validity of NMA depends on three key assumptions: similarity (trials must be comparable in terms of potential effect modifiers), homogeneity (no relevant heterogeneity between trial results in pairwise comparisons), and consistency (no relevant discrepancy between direct and indirect evidence) [13].
Table 1: Core Assumptions for Valid Indirect Treatment Comparisons
| Assumption | Description | Evaluation Methods |
|---|---|---|
| Similarity | Trials must be comparable in terms of potential effect modifiers (e.g., trial or patient characteristics) | Comparison of study design, patient characteristics, outcome definitions, and other potential effect modifiers across trials |
| Homogeneity | No relevant heterogeneity between trial results in pairwise comparisons | Statistical tests for heterogeneity (I² statistic, Q statistic), visual inspection of forest plots |
| Consistency | No relevant discrepancy between direct and indirect evidence | Statistical tests for inconsistency (node-splitting, design-by-treatment interaction model), comparison of direct and indirect estimates |
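A full NMA is normally fitted with dedicated software (for example, the netmeta or gemtc packages in R), but the consistency assumption can be illustrated concretely for a single closed loop. The following Python sketch, using hypothetical log odds ratios, pools a direct A-versus-B estimate with the Bucher indirect estimate obtained through the common comparator C under a simple fixed-effect consistency model; it is an illustration of the principle, not a substitute for a full network analysis.

```python
import math

# Hypothetical log odds ratios and standard errors
direct_ab, se_direct = -0.30, 0.20   # from head-to-head A vs B trials
d_ac, se_ac = -0.45, 0.15            # A vs C
d_bc, se_bc = -0.20, 0.18            # B vs C

# Indirect A vs B estimate via the common comparator C (Bucher)
indirect_ab = d_ac - d_bc
se_indirect = math.sqrt(se_ac**2 + se_bc**2)

# Fixed-effect inverse-variance pooling of direct and indirect evidence
w_dir, w_ind = 1 / se_direct**2, 1 / se_indirect**2
pooled = (w_dir * direct_ab + w_ind * indirect_ab) / (w_dir + w_ind)
se_pooled = math.sqrt(1 / (w_dir + w_ind))

print(f"Indirect estimate: {indirect_ab:.2f} (SE {se_indirect:.2f})")
print(f"Pooled mixed estimate: {pooled:.2f} (SE {se_pooled:.2f})")
```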
Matching-Adjusted Indirect Comparison is a population adjustment method that uses individual patient data (IPD) from one trial to create a weighted sample that matches the aggregate baseline characteristics of another trial [12] [5]. MAIC employs a method of moments approach to estimate weights for each patient in the IPD trial such that the weighted sample matches the aggregate moments (e.g., means and proportions) of the comparator trial's baseline characteristics [12]. The premise is to create a pseudo-population from the IPD trial that is similar to the comparator trial population with respect to observed effect modifiers, thus reducing bias due to cross-trial differences in these characteristics [12].
The methodology involves identifying a set of effect-modifying variables, then using propensity score-based weighting techniques to balance these variables across studies [12]. Specifically, the method uses logistic regression to estimate weights that achieve balance on the selected baseline characteristics between the IPD population and the aggregate population of the comparator trial [12]. Once the weights are applied, the outcomes of different treatments can be compared across the balanced trial populations [14].
MAIC is particularly useful in anchored comparison scenarios where both treatments have been compared against a common comparator, but there are imbalances in effect modifiers between trials [12]. The method can only adjust for observed effect modifiers and cannot account for differences in unobserved variables [12].
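A minimal sketch of the method-of-moments weighting step is shown below, assuming IPD with two covariates and published aggregate means from the comparator trial; all data, variable names, and values are illustrative, and a real analysis would also report the effective sample size and diagnostics for extreme weights.

```python
import numpy as np
from scipy.optimize import minimize

def maic_weights(ipd_covariates, target_means):
    """Method-of-moments MAIC weights: after weighting, the IPD covariate
    means match the aggregate means reported for the comparator trial."""
    x = np.asarray(ipd_covariates, dtype=float)
    x_centered = x - np.asarray(target_means, dtype=float)  # centre at target means

    # Q(alpha) = sum_i exp(x_i' alpha) is convex; at its minimum the weighted
    # means of the centred covariates are zero, i.e., equal to the target means.
    def objective(alpha):
        return np.exp(x_centered @ alpha).sum()

    def gradient(alpha):
        return x_centered.T @ np.exp(x_centered @ alpha)

    res = minimize(objective, np.zeros(x.shape[1]), jac=gradient, method="BFGS")
    weights = np.exp(x_centered @ res.x)
    return weights / weights.mean()  # rescale to mean 1 for readability

# Hypothetical IPD: columns = [age, male indicator], five patients
ipd = np.array([[55, 1], [62, 0], [70, 1], [48, 0], [66, 1]])
target = [60.0, 0.5]   # aggregate means reported by the comparator trial
w = maic_weights(ipd, target)
print("Weighted covariate means:", np.average(ipd, axis=0, weights=w))  # ~ target
```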
Simulated Treatment Comparison is another population adjustment method that uses IPD from one trial to model the outcome of interest as a function of baseline characteristics and treatment [12] [5]. Unlike MAIC, which focuses on reweighting, STC uses regression adjustment to account for differences in effect modifiers between trials [12].
The STC methodology involves developing a regression model using the IPD trial that includes treatment, effect-modifying covariates, and treatment-covariate interactions [12]. This model is then applied to the aggregate baseline characteristics of the comparator trial to predict what the outcome would have been if the patients in the comparator trial had received the treatment from the IPD trial [12]. The predicted outcomes are subsequently used to generate an adjusted treatment effect comparison [12].
STC relies on the "shared effect modifier" assumption, which posits that the relationship between effect modifiers and treatment effect is consistent across studies [12]. This assumption is necessary to transport the interaction effects estimated from the IPD trial to the population of the comparator trial.
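The sketch below illustrates the basic STC idea under these assumptions: an outcome regression with a treatment-by-covariate interaction is fitted to hypothetical IPD, and the effect modifier is centred at the comparator trial's published mean so that the treatment coefficient approximates the relative effect in the comparator population. All data and parameter values are simulated for illustration only; the resulting adjusted estimate would then enter the anchored (Bucher) calculation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical IPD from the trial of A versus the common comparator C
n = 200
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),        # 1 = A, 0 = C
    "age": rng.normal(62, 8, n),
})
df["y"] = (1.0 - 0.5 * df["treat"] + 0.03 * df["age"]
           - 0.02 * df["treat"] * df["age"] + rng.normal(0, 1, n))

# Centre the effect modifier at the comparator (B vs C) trial's reported mean age,
# so the 'treat' coefficient estimates the A-vs-C effect at that population mean.
bc_mean_age = 55.0                          # aggregate value from the publication
df["age_c"] = df["age"] - bc_mean_age

model = smf.ols("y ~ treat * age_c", data=df).fit()
print(f"Adjusted A vs C effect in the comparator population: "
      f"{model.params['treat']:.2f}")
```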
A critical distinction in population-adjusted ITCs is between anchored and unanchored comparisons [12]. Anchored comparisons utilize a common comparator arm shared between studies, thus respecting the within-trial randomization and providing a more reliable basis for inference [12]. In contrast, unanchored comparisons lack a common comparator and therefore require much stronger assumptions that are often difficult to justify [12].
Unanchored comparisons essentially assume that all prognostic variables and effect modifiers have been identified and adequately adjusted for, an assumption that is widely regarded as infeasible in most practical scenarios [12]. Consequently, anchored comparisons should always be preferred when the evidence network contains a common comparator [12]. Unanchored comparisons are generally reserved for situations where the treatment network is disconnected or contains single-arm studies, eliminating the possibility of using a common comparator [12].
Table 2: Comparison of Population-Adjusted Indirect Comparison Methods
| Feature | MAIC | STC |
|---|---|---|
| Methodological Foundation | Propensity score reweighting | Regression adjustment |
| Data Requirements | IPD from one trial, aggregate data from another | IPD from one trial, aggregate data from another |
| Adjustment Approach | Reweighting IPD to match aggregate baseline characteristics of comparator trial | Modeling outcome as function of baseline characteristics and treatment |
| Key Assumption | Adequate balance on observed effect modifiers eliminates bias | Consistent relationship between effect modifiers and treatment effect across studies |
| Strengths | Does not require explicit outcome model; relatively straightforward implementation | More efficient use of data when model is correctly specified |
| Limitations | Can only adjust for observed effect modifiers; may increase variance due to extreme weights | Relies on correct model specification; susceptible to extrapolation |
The following diagram illustrates the general workflow for conducting population-adjusted indirect comparisons, highlighting key decision points and methodological considerations:
Selecting appropriate comparators is a critical element in designing valid indirect comparisons. The following workflow outlines a systematic approach to comparator selection, emphasizing empirical assessment of candidate comparators:
Table 3: Essential Methodological Tools for Indirect Treatment Comparisons
| Tool Category | Specific Methods/Techniques | Function/Purpose |
|---|---|---|
| Statistical Software | R, Python, SAS, WinBUGS/OpenBUGS | Implementation of statistical models for ITC, NMA, and population-adjusted methods |
| Specialized Packages | gemtc, netmeta, pcnetmeta (R); NetworkMetaAnalysis (Python) | Bayesian and frequentist implementation of network meta-analysis models |
| Data Requirements | Individual Patient Data (IPD), Aggregate Data (AD) | IPD enables population-adjusted methods; AD sufficient for standard ITC/NMA |
| Similarity Metrics | Cosine similarity, Standardized Mean Differences (SMD), Mahalanobis distance | Quantification of cohort similarity and covariate balance between studies |
| Model Diagnostics | Leverage plots, residual analysis, inconsistency tests | Evaluation of model fit, identification of outliers, assessment of consistency assumptions |
| Visualization Tools | Network diagrams, forest plots, rankograms | Communication of network structure, treatment effects, and uncertainty |
The use of population-adjusted indirect comparisons has increased substantially in recent years, with approximately half of all published articles on this topic appearing since May 2020 [15]. This growth has been particularly prominent in oncologic and hematologic pathologies, which account for 53% of publications [15]. The pharmaceutical industry is involved in the vast majority (98%) of published PAIC studies, reflecting the importance of these methods in market access applications [15].
Despite their increasing adoption, methodological and reporting standards for PAICs remain inconsistent [15]. A comprehensive methodological review found that key methodological aspects were inadequately reported in most publications, with only three articles adequately reporting all prespecified methodological aspects [15]. This reporting gap threatens the reliability and interpretability of PAIC results and represents a significant challenge for the field.
Recent methodological advances have introduced empirical approaches to comparator selection that leverage large-scale healthcare data to identify optimal comparators based on covariate similarity [16]. These methods generate new user cohorts for drug ingredients or classes, extract aggregated pre-treatment covariate data across clinically relevant domains (demographics, medical history, presentation, prior medications, and visit context), and compute similarity scores between candidate comparators [16].
The cosine similarity metric, calculated as the dot product of two vectors containing target and comparator cohorts' covariate prevalences divided by the product of their Euclidean norms (vector lengths), provides a computationally efficient measure of multivariable similarity [16]. When computed separately for each covariate domain and averaged across domains, this approach yields a cohort similarity score that correlates well with established metrics like standardized mean differences and aligns with clinical knowledge and drug classification hierarchies [16].
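A minimal sketch of this calculation is shown below, with hypothetical covariate-prevalence vectors for three illustrative domains; the domain names and values are not drawn from any specific data source.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two covariate-prevalence vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical covariate prevalences per domain for a target cohort and a
# candidate comparator cohort (values are illustrative proportions).
domains = {
    "demographics":     ([0.48, 0.30, 0.22], [0.50, 0.28, 0.22]),
    "medical_history":  ([0.12, 0.35, 0.08], [0.10, 0.40, 0.05]),
    "prior_medication": ([0.55, 0.20],        [0.45, 0.25]),
}

domain_scores = {d: cosine_similarity(t, c) for d, (t, c) in domains.items()}
cohort_similarity = sum(domain_scores.values()) / len(domain_scores)
print(domain_scores)
print(f"Cohort similarity score: {cohort_similarity:.3f}")
```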
The methodological review of PAICs revealed strong evidence of reporting bias, with 56% of analyses reporting statistically significant benefits for the treatment evaluated using IPD, while only one PAIC significantly favored the treatment evaluated using aggregate data [15]. This striking asymmetry highlights the need for enhanced methodological rigor and transparent reporting in PAIC applications.
To strengthen confidence in PAIC results, researchers should prioritize comprehensive assessment and reporting of key methodological elements, including clear justification of effect modifier selection, detailed description of weighting or modeling approaches, evaluation of underlying assumptions, and thorough sensitivity analyses [12] [15]. Additionally, the development of standardized guidelines for the conduct and reporting of PAICs would represent a significant step toward improving the reliability and interpretability of these methods [15].
Within the framework of evidence-based medicine, indirect treatment comparisons (ITCs) and network meta-analyses (NMAs) have become indispensable tools for evaluating the relative efficacy and safety of multiple interventions, especially when head-to-head randomized controlled trials (RCTs) are unavailable [17] [7]. These methods are central to health technology assessment (HTA) and inform critical healthcare decisions [7]. The validity of any indirect comparison or NMA hinges on fulfilling three fundamental, interrelated assumptions: similarity, homogeneity, and consistency [17] [18]. A thorough understanding of these assumptions is paramount for researchers, scientists, and drug development professionals conducting robust and defensible analyses for drug comparison research.
Indirect comparisons and NMA rely on a connected network of evidence. The "common comparator" or "anchor" (often a placebo or standard of care) enables indirect estimation of the relative effect between two interventions that have not been directly compared in a trial [17]. For example, if Treatment B and Treatment C have both been compared to Treatment A, their relative effect can be indirectly estimated through the common comparator A [17].
The validity of these indirect estimations depends on a triad of assumptions: similarity (transitivity), homogeneity, and consistency [18].
The following diagram illustrates the logical relationships between these three core assumptions and the resulting evidence in a network meta-analysis.
The similarity, or transitivity, assumption is the foundational principle that justifies the validity of combining direct and indirect evidence [18]. It posits that the studies forming the network are sufficiently similar in their clinical and methodological characteristics. This extends beyond the PICO (Population, Intervention, Comparator, Outcome) elements to include other potential effect modifiers.
Assessing similarity is a qualitative and structured process that should occur before statistical synthesis.
Step 1: Identify Potential Effect Modifiers A potential effect modifier is a variable that influences the magnitude of the relative treatment effect [18]. The following table lists common categories of effect modifiers that must be considered.
Table 1: Key Categories of Potential Effect Modifiers for Similarity Assessment
| Category | Examples | Rationale for Assessment |
|---|---|---|
| Population | Disease severity, comorbidities, age, gender, prior treatments, genetic markers [18] | Differences in baseline risk can modify the absolute and relative benefit of an intervention. |
| Intervention | Dosage, formulation (e.g., instant vs. espresso), treatment duration, administration route [18] | Variations in the intervention itself can lead to different treatment responses. |
| Comparator | Type of control (e.g., placebo vs. active), specific agent used, dosage of comparator | The effect of a new drug may appear different when compared to a strong vs. a weak active control. |
| Study Design | Trial setting (primary vs. tertiary care, geographic location), blinding, outcome definition and measurement timepoint, risk of bias [18] | Methodological differences can introduce systematic bias or variation in effect estimates. |
Step 2: Collect and Tabulate Study Characteristics Systematically extract data on the potential effect modifiers identified in Step 1 from all studies included in the network. Present this data in a structured table to allow for visual comparison across studies and treatment comparisons.
Step 3: Evaluate the Plausibility of Transitivity Critically appraise the compiled data. If the distribution of potential effect modifiers is balanced across the different treatment comparisons, the transitivity assumption is more plausible [18]. For example, one must assess if a common comparator (like "decaf") used in different branches of the network is truly equivalent (e.g., decaffeinated coffee vs. decaffeinated tea) [18].
Homogeneity is a specific form of the similarity assumption that applies to a single pairwise comparison. It requires that the true underlying treatment effect is the same across all studies directly comparing the same two interventions (e.g., all A vs. B studies) [18]. When this assumption holds, the observed effects from different studies vary only due to random (sampling) error.
The assessment of homogeneity involves both statistical and clinical evaluation.
Step 1: Clinical Assessment of Heterogeneity Examine the clinical and methodological characteristics of the studies within the same pairwise comparison (using the table from Similarity Assessment). If studies are clinically diverse, statistical heterogeneity is likely.
Step 2: Statistical Assessment of Heterogeneity Calculate statistical measures of heterogeneity for each pairwise comparison with multiple studies.
Table 2: Interpretation of the I² Statistic for Heterogeneity
| I² Value | Interpretation of Heterogeneity |
|---|---|
| 0% to 40% | Might not be important |
| 30% to 60% | May represent moderate heterogeneity |
| 50% to 90% | May represent substantial heterogeneity |
| 75% to 100% | Considerable heterogeneity |
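Following the interpretation thresholds in Table 2, the sketch below computes Cochran's Q and the I² statistic for a single pairwise comparison from hypothetical log odds ratios and standard errors.

```python
import numpy as np

def q_and_i_squared(effects, std_errors):
    """Cochran's Q and I-squared for one pairwise comparison."""
    effects = np.asarray(effects, float)
    w = 1.0 / np.asarray(std_errors, float) ** 2     # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)         # fixed-effect pooled estimate
    q = float(np.sum(w * (effects - pooled) ** 2))
    dof = len(effects) - 1
    i2 = max(0.0, (q - dof) / q) * 100 if q > 0 else 0.0
    return q, i2

# Hypothetical log odds ratios from four trials of the same A vs C comparison
log_ors = [-0.40, -0.15, -0.55, -0.30]
ses = [0.20, 0.25, 0.18, 0.22]
q, i2 = q_and_i_squared(log_ors, ses)
print(f"Q = {q:.2f} on {len(log_ors) - 1} df, I² = {i2:.0f}%")
```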
Step 3: Investigate and Address Heterogeneity If substantial heterogeneity is detected, investigate potential sources (for example, differences in populations, interventions, or study design identified during the similarity assessment) and consider whether quantitative pooling remains appropriate.
The consistency assumption requires that the estimates of treatment effect from direct evidence (e.g., from head-to-head trials of B vs. C) and indirect evidence (e.g., from trials of B vs. A and C vs. A) are in agreement for the same comparison [17] [18]. This is the ultimate check on the validity of the transitivity assumption in a closed network.
Several statistical methods can be used to evaluate consistency.
Step 1: Design-by-Treatment Interaction Test This is a global test for inconsistency across the entire network. It assesses whether the treatment effects estimated from the network are consistent regardless of the design (set of comparisons) used.
Step 2: Local Tests for Inconsistency: Node-Splitting The node-splitting method is a powerful and widely used technique [18]. It separates the evidence for a particular comparison (the "split node") into its direct and indirect components. It then statistically tests for a difference between the direct estimate and the indirect estimate for that same comparison.
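For a single closed loop, the node-split contrast reduces to a simple z-test of the difference between the direct and indirect estimates. The following Python sketch illustrates this with hypothetical log odds ratios; node-splitting across a full network is normally performed within dedicated NMA software.

```python
import math
from scipy.stats import norm

# Hypothetical log odds ratios for the B vs C comparison
direct, se_direct = -0.35, 0.16   # from head-to-head B vs C trials
d_ba, se_ba = -0.10, 0.20         # B vs A
d_ca, se_ca = 0.30, 0.18          # C vs A
indirect = d_ba - d_ca             # indirect B vs C through the anchor A
se_indirect = math.sqrt(se_ba**2 + se_ca**2)

# Inconsistency factor: difference between direct and indirect estimates
diff = direct - indirect
se_diff = math.sqrt(se_direct**2 + se_indirect**2)
z = diff / se_diff
p = 2 * (1 - norm.cdf(abs(z)))
print(f"Direct {direct:.2f}, indirect {indirect:.2f}, "
      f"difference {diff:.2f} (SE {se_diff:.2f}), p = {p:.2f}")
```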
The following diagram illustrates the workflow for assessing inconsistency using the node-splitting method.
Step 3: Investigate and Resolve Inconsistency If significant inconsistency is found, re-examine the similarity assessment for the studies contributing to the affected comparisons, since a violation of transitivity is the most likely explanation.
To conduct a rigorous NMA, researchers require a suite of methodological "reagents": the essential tools and concepts that facilitate the analysis. The following table details these core components.
Table 3: Essential Methodological Reagents for Network Meta-Analysis
| Tool/Concept | Function/Purpose | Key Considerations |
|---|---|---|
| Systematic Review | Provides the unbiased and comprehensive evidence base for the NMA [17] [18]. | Must be conducted a priori with a pre-specified PICO and search strategy to minimize selection bias. |
| Risk of Bias Tool (e.g., Cochrane RoB 2.0) | Assesses the internal validity (quality) of individual RCTs [18]. | Studies with a high risk of bias can distort network findings; sensitivity analyses are recommended. |
| Network Geometry Plot | A visual representation of the evidence network, showing treatments (nodes) and direct comparisons (edges) [17]. | Allows for quick assessment of the connectedness and completeness of the network. The thickness of edges can represent the number of trials or precision. |
| Frequentist Framework | A statistical approach for NMA based on p-values and confidence intervals. Implemented in Stata or R (e.g., netmeta package) [17]. | Well-established and widely understood. Can be less flexible than Bayesian methods with sparse data. |
| Bayesian Framework | A statistical approach for NMA that uses Markov Chain Monte Carlo (MCMC) simulation. Implemented in OpenBUGS, WinBUGS, or R (e.g., gemtc package) [17]. | Highly flexible, allows for ranking probabilities, and can handle complex models. Requires careful check of model convergence. |
| Node-Splitting Method | A statistical technique to test for local inconsistency between direct and indirect evidence for a specific comparison [18]. | A crucial diagnostic tool. A significant p-value suggests a violation of the consistency assumption for that loop. |
The assumptions of similarity, homogeneity, and consistency form the bedrock of valid and reliable network meta-analysis and indirect treatment comparisons. These assumptions are not merely statistical formalities but are deeply rooted in clinical and methodological reasoning. A robust analysis demands a proactive, multi-faceted approach: a thorough qualitative assessment of study similarities during the protocol stage, followed by rigorous quantitative evaluations of homogeneity and consistency. For drug development professionals and HTA bodies, a transparent and well-documented evaluation of these assumptions is not optional; it is essential for generating credible evidence to inform high-stakes healthcare decisions.
In the contemporary drug development landscape, Indirect Treatment Comparisons (ITCs) have become indispensable tools for demonstrating the relative clinical and economic value of new health technologies. As head-to-head randomized controlled trials (RCTs) are often ethically challenging, economically unviable, or practically impossible, particularly in oncology and rare diseases, healthcare decision-makers increasingly rely on robust ITC methodologies to inform reimbursement and regulatory decisions [5] [6]. The recent implementation of the European Union Health Technology Assessment Regulation (EU HTAR), with its mandatory Joint Clinical Assessments (JCAs), has further amplified the strategic importance of these methodologies by establishing a standardized framework for evaluating comparative clinical effectiveness across member states [19] [20]. This whitepaper provides an in-depth technical examination of the ITC landscape, detailing methodological approaches, implementation protocols, and strategic considerations for successfully navigating evolving evidence requirements within global regulatory and HTA submission pathways.
Implemented in January 2025, the EU HTAR establishes a mandatory, unified framework for Joint Clinical Assessments across all member states [19] [21]. This transformative regulation aims to harmonize HTA processes, reduce duplication, and improve patient access to innovative treatments. The JCA process requires health technology developers to submit comprehensive dossiers containing a standardized assessment of relative clinical effectiveness, using the PICO framework (Population, Intervention, Comparator, Outcomes) to structure evidence submissions [20]. For medicinal products, the regulation is being implemented in phases, starting with oncology drugs and advanced therapy medicinal products (ATMPs) in 2025, expanding to orphan medicinal products by 2028, and incorporating all medicinal products by 2030 [19].
A critical challenge within this new framework is the variation in standards of care across EU member states, which leads to diverse comparator choices and population definitions in national PICO frameworks [19]. This variability creates significant evidence generation challenges for manufacturers, who must often rely on ITCs to demonstrate comparative effectiveness against multiple relevant comparators. However, the acceptance of ITC evidence varies considerably among HTA bodies; for example, German HTA bodies have historically rejected approximately 84% of submitted ITCs, while a sample analysis in oncology found an overall acceptance rate of only 30% across five major European markets [2] [19]. This highlights the critical importance of selecting and implementing methodologically robust ITC approaches that can withstand rigorous regulatory scrutiny.
Beyond the EU, ITCs play an increasingly crucial role in healthcare decision-making worldwide. A recent targeted review of oncology drug submissions from 2021-2023 found that ITCs supported 188 unique recommendations across regulatory and HTA bodies, with 306 distinct ITCs referenced in the decision documents [6]. The analysis revealed that authorities more frequently favored anchored or population-adjusted ITC techniques, such as Network Meta-Analysis (NMA) and Matching-Adjusted Indirect Comparison (MAIC), for their effectiveness in data adjustment and bias mitigation compared to naïve or unadjusted comparisons [8] [6]. Furthermore, submissions for orphan drugs incorporating ITCs were more frequently associated with positive decisions compared to non-orphan submissions, underscoring the particular value of these methodologies in disease areas where direct comparative evidence is most scarce [6].
ITC methodologies can be classified into four primary categories based on their underlying assumptions and the number of comparisons involved. The fundamental assumption of constancy of relative treatment effects (homogeneity and similarity) underpins simpler methods, while more complex approaches accommodate a conditional constancy of effects when effect modifiers are present [7].
Table 1: Fundamental Classification of ITC Methodologies
| Method Class | Key Assumptions | Number of Comparisons | Representative Methods |
|---|---|---|---|
| Unadjusted ITCs | Constancy of relative effects | Pairwise | Naïve ITC |
| Adjusted ITCs | Constancy of relative effects (homogeneity, similarity) | Pairwise | Bucher method |
| Network Meta-Analyses | Constancy of relative effects (homogeneity, similarity, consistency) | Multiple | Frequentist NMA, Bayesian NMA, Mixed Treatment Comparison |
| Population-Adjusted ITCs | Conditional constancy of relative effects | Pairwise or Multiple | MAIC, STC, NMR, ML-NMR |
NMA extends standard pairwise meta-analysis to simultaneously compare multiple interventions within a connected network of trials, enabling estimation of relative treatment effects even between interventions that have never been directly compared in clinical trials [7] [5]. The methodology relies on the critical assumption of consistency (also referred to as transitivity), which requires that the direct and indirect evidence estimating the same treatment effect are in agreement [7]. The framework can be implemented through either frequentist or Bayesian approaches, with the latter often preferred when source data are sparse [7]. A 2024 systematic literature review identified NMA as the most frequently described ITC technique, featured in 79.5% of included methodological articles [5].
When heterogeneity exists between trial populations that acts as an effect modifier, population-adjusted ITC methods are necessary to minimize bias. Matching-Adjusted Indirect Comparison (MAIC) utilizes propensity score weighting on individual patient data (IPD) from one trial to match aggregate data from a comparator trial, effectively rebalancing patient characteristics to create a more comparable population [7] [5]. In contrast, Simulated Treatment Comparison (STC) develops an outcome regression model based on IPD from one trial and applies it to the population characteristics of a comparator trial to predict outcomes in the target population [5]. These methods are particularly valuable for single-arm trials in rare disease settings or when substantial population heterogeneity exists across studies [7].
The Bucher method (also referred to as adjusted or standard ITC) facilitates pairwise comparisons through a common comparator and represents one of the earliest developed ITC approaches [7] [5]. This frequentist method is limited to simple networks with single common comparators and cannot incorporate evidence from multi-arm trials [7]. Despite these limitations, it remains a widely applied technique, described in 23.3% of methodological articles on ITCs [5].
The following diagram illustrates the strategic decision pathway for selecting an appropriate ITC methodology based on evidence network structure and data availability:
Implementing a robust ITC requires a structured, systematic approach to ensure methodological rigor and reproducible results. The following protocol outlines key stages in the ITC development process:
Phase 1: Systematic Literature Review and Feasibility Assessment
Phase 2: Data Extraction and Quality Assessment
Phase 3: Statistical Analysis and Model Implementation
Phase 4: Validation and Sensitivity Analysis
Successful implementation of ITCs requires specialized methodological expertise and analytical resources. The following table details key components of the ITC research toolkit:
Table 2: Essential Components of the ITC Research Toolkit
| Component | Function | Implementation Considerations |
|---|---|---|
| Systematic Review Protocol | Identifies all relevant evidence for inclusion | Follow PRISMA guidelines; pre-specify inclusion/exclusion criteria; assess transitivity [5] |
| Statistical Software Packages | Implements complex statistical models for ITC | R (gemtc, pcnetmeta), SAS, WinBUGS/OpenBUGS, Python; selection depends on frequentist vs. Bayesian approach [7] |
| Individual Patient Data (IPD) | Enables population-adjusted methods (MAIC, STC) | Often required by HTA bodies for unbiased adjustment; availability may be limited for competitor trials [5] |
| PICO Framework | Structures clinical questions and evidence assessment | Mandatory for EU JCA submissions; defines populations, interventions, comparators, and outcomes [20] |
| Consistency Assessment Methods | Evaluates agreement between direct and indirect evidence | Node-splitting approaches; design-by-treatment interaction test; essential for NMA validity [7] |
Recent research provides compelling quantitative evidence of ITCs' growing role in healthcare decision-making. A comprehensive analysis of oncology drug submissions from 2021-2023 revealed significant patterns in ITC utilization and acceptance across global regulatory and HTA bodies [6]:
Table 3: Quantitative Analysis of ITC Application in Oncology Drug Submissions (2021-2023)
| Authority | Documents with ITCs | Positive Decisions | Most Frequent ITC Methods | Orphan Drug Advantage |
|---|---|---|---|---|
| EMA (Regulatory) | 33 documents | 100% (21 full, 12 conditional approvals) | Unspecified methods (61.9%), PSM (16.7%), MAIC (14.3%) | ITCs in orphan submissions more frequently led to positive decisions |
| CDA-AMC (Canada) | 56 reimbursement reviews | Information missing | Analysis focused on acceptance rather than specific methods | ITCs in orphan submissions more frequently led to positive decisions |
| PBAC (Australia) | 46 public summary documents | Information missing | Analysis focused on acceptance rather than specific methods | ITCs in orphan submissions more frequently led to positive decisions |
| G-BA (Germany) | 40 benefit assessments | Information missing | Analysis focused on acceptance rather than specific methods | ITCs in orphan submissions more frequently led to positive decisions |
| HAS (France) | 10 transparency summaries | Information missing | Analysis focused on acceptance rather than specific methods | ITCs in orphan submissions more frequently led to positive decisions |
The data demonstrates that ITCs have become pervasive in oncology drug assessments, with 188 unique recommendations supported by 306 distinct ITCs across the included authorities [6]. This quantitative evidence underscores the critical importance of selecting and implementing methodologically robust ITC approaches to maximize regulatory and HTA success.
The strategic importance of robust Indirect Treatment Comparisons continues to grow within global regulatory and HTA decision-making frameworks, particularly with the implementation of the EU HTA Regulation and its standardized evidence requirements. Success in this evolving landscape requires methodological rigor, strategic evidence planning, and cross-functional collaboration between health economics outcomes research scientists and clinical experts. By selecting appropriate ITC methodologies based on connected network structures, effect modifier considerations, and data availability, and by implementing them through systematic, transparent protocols, health technology developers can generate the high-quality comparative evidence necessary to demonstrate product value across diverse healthcare systems. As ITC techniques continue to evolve rapidly in sophistication, their strategic application will remain fundamental to securing patient access to innovative therapies in an increasingly complex global market.
In the realm of drug development and health technology assessment (HTA), direct head-to-head randomized controlled trials (RCTs) are considered the gold standard for comparing treatments. However, direct comparative evidence is frequently unavailable due to ethical constraints, feasibility issues, impracticality when multiple comparators exist, or the rapid evolution of treatment landscapes [7] [5]. Indirect treatment comparisons (ITCs) provide a statistical methodology to estimate the relative effects of interventions when no direct trial data exists, by using a common comparator to link treatments across different studies [7] [8]. The fundamental premise of ITC is to preserve the integrity of randomization from the source trials as much as possible, thereby minimizing bias [22]. The selection of an appropriate ITC method is a critical decision that depends heavily on the available data and the structure of the evidence network. This framework guides researchers through this selection process to ensure robust and defensible comparative evidence for HTA submissions.
Researchers have developed numerous ITC methods, leading to varied and sometimes inconsistent terminologies [7]. These methods can be categorized based on underlying assumptions and the number of comparisons involved. Adjusted ITC methods are preferred over naïve comparisons (which compare study arms from different trials as if they were from the same RCT) because the latter are highly susceptible to bias and their outcomes are difficult to interpret [5] [8].
Table 1: Core Classes of Indirect Treatment Comparison Methods
| ITC Method Class | Key Assumption | Number of Comparisons | Common Techniques |
|---|---|---|---|
| Adjusted Indirect Comparison | Constancy of relative effects (Homogeneity, Similarity) [7] | Pairwise (two interventions) [7] | Bucher Method [7] |
| Network Meta-Analysis | Constancy of relative effects (Homogeneity, Similarity, Consistency) [7] | Multiple (three or more interventions) [7] | Network Meta-Analysis (NMA), Mixed Treatment Comparisons (MTC) [7] |
| Population-Adjusted Indirect Comparison (PAIC) | Conditional constancy of relative or absolute effects [7] | Pairwise or Multiple [7] | Matching-Adjusted Indirect Comparison (MAIC), Simulated Treatment Comparison (STC) [7] |
| Network Meta-Regression | Conditional constancy of relative effects with shared effect modifier [7] | Multiple [7] | Network Meta-Regression (NMR), Multilevel Network Meta-Regression (ML-NMR) [7] |
The Bucher method, also known as adjusted or standard ITC, is a frequentist approach for simple pairwise comparisons through a common comparator but is not suitable for complex networks from multi-arm trials [7]. Network meta-analysis (NMA), including indirect NMA and mixed treatment comparisons (MTC), allows for the simultaneous comparison of multiple interventions using both direct and indirect evidence within a frequentist or Bayesian framework [7] [5]. Population-adjusted methods like MAIC and STC adjust for imbalances in patient-level characteristics across studies when individual patient data (IPD) is available for at least one trial [7] [23]. Meta-regression techniques such as NMR and ML-NMR use regression to explore the impact of study-level or patient-level covariates on treatment effects, relaxing the assumption of constant effects [7].
The structure of the available evidence is the primary determinant in selecting an ITC method. The initial step involves mapping all relevant studies into a connected evidence network, where interventions are linked through one or more common comparators [23]. A shared common comparator, such as placebo or a standard of care, is essential for "anchored" ITCs, which preserve the benefit of randomization and are generally preferred by HTA bodies [22]. "Unanchored" comparisons, which lack this common anchor and rely on absolute treatment effects, are considered more prone to bias and should only be used when anchored methods are unfeasible [22]. The availability of individual patient data (IPD) versus only aggregate data (AgD) further narrows the choice of methods. PAIC methods like MAIC and STC require IPD from at least one study to adjust for population differences [23].
Even with a connected network, differences in the baseline characteristics of patients across trials can introduce bias. The critical assessment of patient population similarity is required to determine if a simple unadjusted method is sufficient or if a population-adjusted method is necessary [7] [23]. Key considerations include identifying known effect modifiers: baseline characteristics that influence the relative treatment effect (e.g., disease severity, age, biomarker status). If important effect modifiers are unbalanced across trials, methods that can adjust for them, such as MAIC, STC, or meta-regression, are essential to produce valid comparisons [23]. The sufficiency of overlap between patient populations in different studies is a key criterion for the acceptability of an ITC; too little overlap makes any comparison unreliable [23].
HTA agencies worldwide have developed guidelines and preferences for ITC methods. A clear trend favors population-adjusted or anchored ITC techniques over naïve comparisons [8]. Recent data from Canadian and US reimbursement submissions in oncology shows consistent use of NMA and unanchored PAIC, while naïve comparisons and Bucher analyses have decreased [24]. The new EU HTA regulation, effective from 2025, emphasizes methodological flexibility, recommending tailoring the method to the specific evidence context without endorsing a single approach [23]. Pre-specification of the ITC analysis plan is paramount to avoid accusations of selective reporting and to ensure scientific rigor [23].
The following decision framework synthesizes the key factors into a step-by-step process for selecting the most appropriate ITC method. This workflow starts with a fundamental question about the evidence base and guides the user to a recommended method based on their specific data context.
ITC Method Selection Framework
Table 2: Detailed ITC Methods and Their Data Requirements
| Recommended Method | Data Requirements | Key Assumptions | Strengths | Common Applications |
|---|---|---|---|---|
| Bucher Method [7] | Aggregate data (AgD) from at least two RCTs sharing a common comparator. | Constancy of relative effects (homogeneity, similarity). [7] | Simple, intuitive pairwise comparison. [7] | Pairwise indirect comparisons with a shared comparator and similar populations. |
| Network Meta-Analysis (NMA) [7] [5] | AgD from a connected network of RCTs (three or more interventions). | Homogeneity, similarity, and consistency between direct and indirect evidence. [7] | Simultaneously compares multiple treatments; can rank interventions. [7] | Multiple treatment comparisons with a connected evidence network. |
| Matching-Adjusted Indirect Comparison (MAIC) [7] [23] | IPD for the index intervention and AgD for the comparator. | Conditional constancy of effects; all effect modifiers are measured and adjusted for. [7] | Adjusts for population imbalances using propensity score weighting. [7] [23] | Single-arm trials, or RCTs with considerable population heterogeneity. |
| Simulated Treatment Comparison (STC) [23] [5] | IPD for one treatment and AgD for the other. | Conditional constancy of effects; correct outcome model specification. [23] | Uses outcome regression to predict results in the AgD population. [7] | Pairwise ITC with population heterogeneity; single-arm studies. |
| Network Meta-Regression (NMR) [7] | AgD for all studies in the network, with study-level covariates. | Conditional constancy with shared effect modifiers at the study level. [7] | Explores impact of study-level covariates on treatment effects. [7] | Investigating how distinct factors (e.g., year, baseline risk) affect relative treatment effects. |
Adherence to pre-specified protocols is critical for the credibility of an ITC. Key steps include:
Table 3: Essential Tools and Reagents for Conducting ITCs
| Tool/Reagent | Function in ITC Analysis | Application Notes |
|---|---|---|
| Individual Patient Data (IPD) [7] [23] | Enables population-adjusted methods (MAIC, STC) by allowing direct re-weighting or modeling of patient-level characteristics. | Often sourced from the sponsor's own clinical trials. Essential for adjusting for cross-trial imbalances. |
| Aggregate Data (AgD) [7] | The foundation for unadjusted ITCs (Bucher, NMA). Typically extracted from published literature or clinical study reports. | Must be sufficiently detailed (e.g., means, counts, standard deviations) for meta-analysis. |
| Statistical Software (R, Python) [26] | Provides the computational environment for performing complex statistical models like Bayesian NMA, MAIC, and meta-regression. | Offers greater flexibility and customization for advanced methodologies compared to some commercial tools. |
| Specialized ITC Software (e.g., OpenBUGS, GeMTC) | Facilitates the implementation of specific ITC models, particularly Bayesian NMA. | Can simplify the process for researchers less familiar with hand-coding complex statistical models. |
| Systematic Review Software (e.g., DistillerSR, Covidence) | Supports the management and screening of large volumes of literature during the evidence identification phase. | Ensures the SLR process is reproducible, efficient, and minimizes human error. |
Selecting the correct indirect treatment comparison method is a nuanced decision pivotal to generating valid and reliable evidence for healthcare decision-making. This framework demonstrates that the choice is not arbitrary but is systematically guided by the connectivity of the evidence network, the number of comparators, the similarity of patient populations, and the type of data available (IPD vs. AgD). As the therapeutic landscape evolves and new complex therapies emerge, the role of sophisticated ITC methods like MAIC and ML-NMR is expected to grow. Adherence to recent HTA guidelines, rigorous pre-specification, and transparent reporting are non-negotiable elements for the acceptance of ITC evidence. By applying this structured decision framework, researchers and drug developers can navigate the complexities of ITC selection, ensuring that their comparative analyses are both robust and defensible, ultimately informing better healthcare decisions.
Within the critical discipline of comparative effectiveness research, the identification and use of common comparators forms the foundational pillar for robust indirect analyses. In the context of drug development, head-to-head randomized controlled trials (RCTs) are not always ethically or logistically feasible, creating a critical evidence gap for healthcare decision-makers [5] [6]. Indirect Treatment Comparisons (ITCs) have emerged as a vital statistical methodology to bridge this gap, and among these, the Bucher method holds a fundamental position as a pioneering technique for pairwise comparisons [7] [5].
Also known as adjusted or standard indirect comparison, the Bucher method enables the estimation of the relative treatment effect between two interventions, Treatment A and Treatment B, that have not been directly compared in a clinical trial but have both been studied against a common comparator, Treatment C [7] [2]. This method is a cornerstone in the ITC landscape, providing a relatively simple and transparent framework for evidence synthesis where direct evidence is absent [23]. Its role is particularly crucial for Health Technology Assessment (HTA) bodies worldwide, which must make informed recommendations on the adoption of new health interventions despite frequent limitations in available direct evidence [7] [6]. This guide provides an in-depth technical examination of the Bucher method, detailing its applications, foundational assumptions, methodological protocols, and inherent limitations for an audience of researchers, scientists, and drug development professionals.
The Bucher method is an anchored indirect comparison, meaning it preserves the integrity of randomization within the original trials by using a common reference or "anchor" [23] [27]. This technique constructs an indirect estimate of the relative effect of Treatment A versus Treatment B by leveraging the direct evidence from trials comparing A vs. C and B vs. C [7]. The fundamental principle involves combining these two direct comparisons mathematically to derive the desired indirect comparison.
As illustrated in the network diagram below, the Bucher method operates on a simple, connected evidence network where interventions are linked via a shared comparator.
Figure 1: Basic Star Network for Bucher Method
This method is categorized under a class of ITCs that rely on the constancy of relative treatment effects, an assumption encompassing homogeneity and similarity across studies [7]. It is distinct from more complex techniques like Network Meta-Analysis (NMA), which can simultaneously compare multiple interventions, and population-adjusted methods like Matching-Adjusted Indirect Comparison (MAIC), which adjust for patient-level differences when individual patient data (IPD) is available [7] [23]. A recent systematic literature review found that the Bucher method was described in 23.3% of included methodological articles on ITC techniques, establishing it as a well-recognized approach in the field [5].
The decision to employ the Bucher method is governed by the specific clinical question and the structure of the available evidence. It is a strategically appropriate choice in several scenarios:
The validity of any indirect comparison, including the Bucher method, is contingent upon satisfying several core assumptions. Violations of these assumptions can introduce bias and invalidate the results.
Table 1: Fundamental Assumptions of the Bucher Method
| Assumption | Description | Method of Assessment |
|---|---|---|
| Homogeneity | The relative treatment effect (e.g., hazard ratio) for A vs. C is consistent across all studies included for that comparison. Similarly for B vs. C. | Compare the study designs, patient populations, and interventions of the A vs. C trials and the B vs. C trials. Statistical tests for heterogeneity (e.g., I², Cochran's Q) can be used in each set of studies. |
| Similarity (Transitivity) | The trials used for the A vs. C and B vs. C comparisons are sufficiently similar with respect to factors that can modify the treatment effect (effect modifiers), such as patient baseline characteristics, trial design, and definitions of outcomes. | Qualitative review of the distribution of known and unknown effect modifiers across the trials. This involves careful evaluation of the PICO (Population, Intervention, Comparator, Outcome) elements of each trial. |
| Consistency | This assumption is inherently satisfied in the simple two-way Bucher comparison. It implies that the indirect estimate of A vs. B is consistent with the direct estimate that would have been obtained from a head-to-head trial (if it existed). | In a simple A-B-C network, this cannot be tested statistically. It relies on the validity of the homogeneity and similarity assumptions [7]. |
The following workflow outlines the step-by-step methodology for conducting a Bucher indirect comparison, from evidence identification to result interpretation.
Figure 2: Bucher Method Implementation Workflow
Step 1: Define the Research Question. Clearly specify the PICO elements:
Step 2: Conduct a Systematic Literature Review. Identify all relevant RCTs comparing A vs. C and B vs. C. The common comparator C must be the same in both sets of trials (e.g., the same drug, dose, and background therapy).
Step 3: Assess Studies for Similarity and Homogeneity. Critically appraise the selected trials to evaluate the key assumptions from Table 1. This involves comparing patient baseline characteristics, study designs, and outcome measurements across the A vs. C and B vs. C trials.
Step 4: Extract Aggregate Data. For each trial, extract the relative effect estimate (e.g., log hazard ratio, log odds ratio) and its variance for the outcome of interest. The analysis is typically performed on the log scale to normalize the distribution of ratio-based measures.
Step 5: Perform the Bucher Calculation. The indirect estimate is obtained by subtracting the two direct estimates on the analysis scale, d_AB = d_AC - d_BC, and its variance is the sum of the two direct variances, Var(d_AB) = Var(d_AC) + Var(d_BC); a confidence interval is then constructed from the resulting standard error before back-transforming to the original scale (a worked numerical sketch is provided after the step list below).
Step 6: Validate Results and Conduct Sensitivity Analysis. Assess the robustness of the findings through sensitivity analyses. This may include using different sets of trials for the comparisons or applying different statistical models (e.g., fixed-effect vs. random-effects for each pairwise meta-analysis) if multiple trials are available for A vs. C or B vs. C.
Step 7: Report Findings. Transparently report all steps, assumptions, extracted data, and results, including any limitations identified during the assessment of similarity and homogeneity.
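As an illustration of Step 5, the short Python sketch below carries out the Bucher calculation for a time-to-event outcome on the log hazard ratio scale; the two direct estimates and their standard errors are hypothetical values chosen for illustration only.

```python
import math

# Hypothetical inputs: log hazard ratios and standard errors from the two
# pairwise comparisons against the common comparator C (illustrative values).
log_hr_ac, se_ac = math.log(0.75), 0.12   # Treatment A vs. C
log_hr_bc, se_bc = math.log(0.90), 0.15   # Treatment B vs. C

# Bucher indirect estimate: d_AB = d_AC - d_BC on the log scale.
log_hr_ab = log_hr_ac - log_hr_bc

# Variance of the indirect estimate is the sum of the two direct variances,
# which is why indirect estimates are less precise than a direct trial.
se_ab = math.sqrt(se_ac**2 + se_bc**2)

# Back-transform to the hazard ratio scale with a 95% confidence interval.
hr_ab = math.exp(log_hr_ab)
ci_low = math.exp(log_hr_ab - 1.96 * se_ab)
ci_high = math.exp(log_hr_ab + 1.96 * se_ab)
print(f"Indirect HR A vs B: {hr_ab:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The widening of the confidence interval relative to either direct comparison reflects the additive variance noted among the limitations in Table 3.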
Table 2: Key Methodological Tools for a Bucher Analysis
| Tool / Resource | Category | Function in the Analysis |
|---|---|---|
| PICO Framework | Methodological Protocol | Provides a structured approach to defining the clinical question and inclusion/exclusion criteria for the systematic review. |
| PRISMA Guidelines | Reporting Guideline | Ensures the systematic literature review is conducted and reported thoroughly and transparently [5]. |
| Aggregate Data | Research Reagent | The essential input for the analysis, comprising effect estimates (e.g., log(HR)) and measures of precision (variance or standard error) extracted from published studies or trial reports. |
| Statistical Software | Analytical Tool | Software like R, Stata, or Python is used to perform the meta-analyses (if needed) and the final Bucher calculation, including confidence interval estimation. |
| Cochrane Risk of Bias Tool | Assessment Tool | Used to evaluate the methodological quality and potential biases within the individual RCTs included in the analysis. |
The enduring relevance of the Bucher method in the statistician's arsenal is due to several key strengths:
Despite its utility, the Bucher method carries significant limitations that researchers must acknowledge and address.
Table 3: Limitations of the Bucher Method and Mitigation Strategies
| Limitation | Description | Potential Mitigation Strategy |
|---|---|---|
| Requires Common Comparator | The analysis is impossible without a single common comparator (C) that is identical across all trials. | Carefully define the comparator to ensure clinical and methodological consistency. If no single common comparator exists, more complex methods like NMA may be needed. |
| Inability to Adjust for Cross-Trial Differences | The method cannot adjust for imbalances in patient-level characteristics (effect modifiers) between the A vs. C and B vs. C trials. This is its most significant constraint. | Conduct a thorough assessment of similarity. If important imbalances exist, consider population-adjusted methods like MAIC (if IPD is available) or discuss the limitation transparently. |
| Limited to Simple Networks | It cannot incorporate evidence from multi-arm trials or more complex, interconnected evidence networks. | For complex networks with multiple treatments and connections, NMA is the required and more efficient approach [7]. |
| Assumptions are Untestable | In its basic form, the critical similarity assumption is qualitative and cannot be statistically verified, introducing potential bias. | Use meta-regression (if multiple trials are available per comparison) to explore the impact of study-level covariates on the treatment effect [7]. |
| Increased Variance | The variance of the indirect estimate is the sum of the variances of the two direct estimates, leading to wider confidence intervals and less precision compared to a direct trial of the same size. | This is an inherent statistical trade-off for indirect evidence and should be considered when interpreting the results. |
The Bucher method is generally accepted by major HTA bodies worldwide, including NICE (UK), CADTH (Canada), and PBAC (Australia), when its use is appropriately justified and its assumptions are met [2]. However, its acceptability is not universal for all contexts. For instance, in Germany, the Institute for Quality and Efficiency in Health Care (IQWiG) has been known to reject a high percentage of ITC submissions, often due to a lack of adjusted comparisons or insufficient data to support the underlying assumptions [2] [28]. A critical letter regarding a recent review noted that the Bucher method, which maintains randomization, has been accepted by the Federal Joint Committee (G-BA), whereas more complex population-adjusted methods are not always favored [28]. Trends show that while the use of naïve comparisons and the Bucher method is decreasing in some jurisdictions like Canada, methods like NMA and unanchored population-adjusted comparisons remain consistently used [24].
The Bucher method remains a fundamental technique in the methodological toolkit for indirect treatment comparisons. Its value is most apparent in well-defined scenarios involving pairwise comparisons through a robust common comparator, where its simplicity and transparency are paramount. It provides a statistically valid and accessible means to address critical evidence gaps in drug development and reimbursement.
However, the modern researcher must be acutely aware of its profound limitations, chief among them the inability to adjust for cross-trial differences in patient populations. The assumption of similarity is a heavy burden of proof, and its violation can severely compromise the validity of the results. Therefore, the choice to use the Bucher method must be guided by a rigorous assessment of the available evidence against its core assumptions. In an evolving HTA landscape, such as the new EU HTA framework, which emphasizes rigorous methodological standards, researchers must be prepared to justify their analytical choices transparently [23] [27]. For more complex evidence structures or when faced with significant effect modifier imbalance, advancing to more sophisticated methods like Network Meta-Analysis or population-adjusted indirect comparisons is not just an option but a necessity for generating credible and influential comparative evidence.
Network meta-analysis (NMA) represents an advanced statistical methodology that enables the simultaneous comparison of multiple interventions within a single, coherent analysis. As an extension of traditional pairwise meta-analysis, NMA integrates both direct evidence (from head-to-head comparisons) and indirect evidence (estimated through common comparators) to generate comprehensive treatment effect estimates across all competing interventions [29] [30]. This approach is particularly valuable in drug development and comparative effectiveness research, where clinicians and decision-makers often face multiple treatment options that have not been directly compared in randomized controlled trials (RCTs) [31] [32].
The fundamental principle underlying NMA is the ability to leverage connected networks of trials to make inferences about treatment comparisons that lack direct evidence. For example, if Treatment A has been compared to Treatment C in trials, and Treatment B has also been compared to Treatment C, but A and B have never been directly compared, NMA allows for an indirect comparison of A versus B through their common comparator C [29] [33]. This capacity to fill evidence gaps makes NMA an indispensable tool for informing clinical practice guidelines and health technology assessments [34].
In NMA, three types of evidence contribute to the treatment effect estimates:
The validity of NMA depends on two fundamental assumptions:
Transitivity refers to the methodological and clinical similarity across studies included in the network [33] [32]. This assumption requires that the different sets of randomized trials are similar, on average, in all important factors other than the intervention comparisons being made [33]. Violations of transitivity (intransitivity) occur when studies comparing different interventions differ systematically in effect modifiers (characteristics that influence the treatment effect size), such as patient population characteristics, intervention dosage, or study design [29] [32]. For example, in a network comparing glaucoma treatments, if all trials of prostaglandin analogues enrolled patients with higher baseline intraocular pressure while beta-blocker trials enrolled patients with lower pressures, and baseline pressure is an effect modifier, the transitivity assumption would be violated [32].
Coherence (also called consistency) represents the statistical manifestation of transitivity and refers to the agreement between direct and indirect evidence when both are available for the same comparison [29] [33]. The presence of significant incoherence suggests violation of the transitivity assumption or methodological issues in the included studies [30]. Statistical tests are available to detect incoherence, both globally (across the entire network) and locally (in specific closed loops where both direct and indirect evidence exist) [29].
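To make the coherence concept concrete, the sketch below contrasts a hypothetical direct estimate with the corresponding indirect estimate obtained through a common comparator in a single closed loop. The inputs are illustrative log odds ratios, and the z-statistic is only an informal check in the spirit of node-splitting, not a substitute for formal incoherence tests.

```python
import math

# Hypothetical log odds ratios and standard errors in a closed A-B-C loop.
d_ab_direct, se_ab_direct = -0.40, 0.15   # A vs B from head-to-head trials
d_ac, se_ac = -0.70, 0.12                 # A vs C
d_bc, se_bc = -0.35, 0.14                 # B vs C

# Indirect estimate of A vs B through the common comparator C.
d_ab_indirect = d_ac - d_bc
se_ab_indirect = math.sqrt(se_ac**2 + se_bc**2)

# Incoherence factor: disagreement between direct and indirect evidence.
diff = d_ab_direct - d_ab_indirect
se_diff = math.sqrt(se_ab_direct**2 + se_ab_indirect**2)
z = diff / se_diff
print(f"Incoherence factor: {diff:.2f} (z = {z:.2f})")
```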
Table 1: Core Assumptions of Network Meta-Analysis
| Assumption | Definition | Implication if Violated | Assessment Methods |
|---|---|---|---|
| Transitivity | Clinical and methodological similarity across different direct comparisons in the network | Biased indirect and mixed treatment effect estimates | Evaluation of distribution of potential effect modifiers across comparisons |
| Coherence | Statistical agreement between direct and indirect evidence for the same comparison | Reduced confidence in network estimates | Statistical tests for disagreement between direct and indirect evidence |
The conduct of a systematic review with NMA follows the same fundamental steps as a traditional systematic review but requires additional considerations at each stage [29]:
Question Formulation and Eligibility Criteria: The research question should be developed using the PICO (Participants, Interventions, Comparators, Outcomes) framework, with particular attention to defining the treatment network [32]. Researchers must decide which interventions to include and whether to "split" or "lump" interventions into nodes [29]. For example, decisions must be made about whether to consider all doses of a drug within a single node or separate them into different nodes based on expected differential effects [29]. The network should comprehensively include all relevant interventions, including common comparators like placebo or standard care, even if they are not of primary interest, as they provide crucial indirect evidence [29] [32].
Literature Search and Study Selection: Due to the broader scope of NMAs, literature searches must be comprehensive to capture all relevant interventions and comparators [29] [32]. This typically results in screening a larger number of references and including more studies than traditional pairwise meta-analyses, requiring additional time and resources [29].
Data Collection and Risk of Bias Assessment: When abstracting data, it is essential to collect information on potential effect modifiers to enable evaluation of the transitivity assumption [32]. These effect modifiers should be pre-specified in the protocol based on clinical expertise or prior literature and typically include study eligibility criteria, population characteristics, study design features, and risk of bias items [32].
Network Geometry Evaluation: Before statistical analysis, researchers should visualize and understand the network geometry using network diagrams [33] [32]. These diagrams represent interventions as nodes and direct comparisons as lines connecting them, with the thickness of lines and size of nodes often proportional to the amount of evidence available [30] [32]. Understanding the network structure helps identify which interventions have been directly compared and which comparisons rely solely on indirect evidence [32].
Statistical Analysis: NMA can be conducted within both frequentist and Bayesian statistical frameworks [29]. The analysis generates estimates of the relative effects between all pairs of interventions in the network, typically reported as odds ratios, risk ratios, or mean differences with confidence or credible intervals [29] [31]. The complexity of NMA requires involvement of a statistician or methodologist with expertise in these techniques [29].
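As a minimal illustration of the frequentist approach, the sketch below fits a fixed-effect NMA to a small hypothetical three-treatment network by inverse-variance weighted least squares. The trial results, the restriction to two-arm contrasts, and the choice of reference treatment are all assumptions made for the example; a real analysis would typically use dedicated packages and consider random effects and multi-arm trials.

```python
import numpy as np

# Hypothetical connected network of treatments A (reference), B and C.
# Each row is one two-arm trial: (arm_1, arm_2, log odds ratio of arm_2
# vs arm_1, variance of that estimate). All numbers are illustrative.
trials = [
    ("A", "B", -0.50, 0.04),
    ("A", "B", -0.40, 0.05),
    ("A", "C", -0.80, 0.06),
    ("B", "C", -0.25, 0.07),
]

basic = {"B": 0, "C": 1}  # basic parameters d_AB and d_AC; A is the reference

# Design matrix mapping each trial contrast onto the basic parameters.
X = np.zeros((len(trials), len(basic)))
y = np.array([t[2] for t in trials])
W = np.diag([1.0 / t[3] for t in trials])  # inverse-variance (fixed-effect) weights
for row, (arm1, arm2, _, _) in enumerate(trials):
    if arm1 in basic:
        X[row, basic[arm1]] = -1.0
    if arm2 in basic:
        X[row, basic[arm2]] = 1.0

# Weighted least squares pools direct and indirect evidence across the network.
cov = np.linalg.inv(X.T @ W @ X)
d_AB, d_AC = cov @ X.T @ W @ y
d_BC = d_AC - d_AB  # consistency relation gives the remaining contrast
print(f"d_AB={d_AB:.3f}, d_AC={d_AC:.3f}, d_BC={d_BC:.3f}")
```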
Component Network Meta-Analysis: For complex interventions consisting of multiple components, component NMA (CNMA) offers an alternative approach that models the effect of individual intervention components rather than treating each unique combination as a separate node [35]. This method can reduce uncertainty around estimates and predict effectiveness for component combinations not previously evaluated in trials [35].
NMA Workflow: Diagram illustrating the key stages in conducting a network meta-analysis
The identification of appropriate common comparators is fundamental to establishing a connected network and generating valid indirect treatment comparisons [29] [33]. Common comparators serve as bridges that allow indirect evidence to flow through the network, enabling comparisons between interventions that lack direct head-to-head evidence [33].
When planning an NMA, researchers should consider including all relevant common comparators, even those not of primary clinical interest, as they contribute important indirect evidence [29]. For example, in a network of active drugs, including placebo or no treatment groups can provide crucial connecting evidence, though caution is needed if placebo-controlled trials differ systematically from head-to-head trials in ways that might modify treatment effects [32].
Evaluating Comparator Suitability: The suitability of common comparators depends on their position and connectivity within the network. Ideal common comparators:
Table 2: Classification of Common Comparators in NMA
| Comparator Type | Characteristics | Advantages | Limitations |
|---|---|---|---|
| Placebo/No Treatment | Inert intervention or natural disease course | Provides absolute effect benchmarks; commonly studied | May differ from active comparator trials in design and bias risk |
| Standard of Care | Established conventional treatment | Clinically relevant comparisons; often well-studied | Definition may vary across settings and time periods |
| Network Hub | Connected to multiple interventions | Maximizes indirect evidence flow | Potential for effect modifier imbalances across comparisons |
The contribution of individual studies to NMA estimates can be quantified using statistical importance measures, which generalize the concept of weights from pairwise meta-analysis [36]. The importance of a study for a particular comparison is defined as the reduction in variance of the NMA estimate when that study is added to the network [36]. This approach helps identify which studiesâand consequently which common comparatorsâare most influential in the network [36].
Studies that serve as the only link between different parts of the network have particular importance, as their removal would disconnect the network and prevent certain indirect comparisons [36]. In such cases, these studies have an importance of 1 for the affected comparisons, meaning they are essential for the estimation [36].
For complex interventions consisting of multiple components, CNMA offers a sophisticated approach that models the effects of individual components rather than treating each unique combination as a separate intervention [35]. This method addresses key clinical questions such as:
CNMA models range from simple additive models (where combination effects equal the sum of component effects) to full interaction models (equivalent to standard NMA) [35]. The additive model assumes no interaction between components, while more complex models can incorporate two-way or higher-order interactions [35].
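The sketch below illustrates the additive CNMA idea on a small hypothetical set of trials, estimating component effects by inverse-variance weighted least squares and predicting a combination that was never studied directly; the component names, effects, and variances are invented for illustration.

```python
import numpy as np

# Hypothetical trials of component combinations versus a common control arm.
# Columns of the design matrix indicate which components (X, Y, Z) are present;
# y holds the observed relative effects and v their variances (illustrative).
components = ["X", "Y", "Z"]
design = np.array([
    [1, 0, 0],   # X vs control
    [0, 1, 0],   # Y vs control
    [1, 1, 0],   # X + Y vs control
    [1, 0, 1],   # X + Z vs control
], dtype=float)
y = np.array([-0.30, -0.20, -0.55, -0.70])
v = np.array([0.04, 0.05, 0.06, 0.06])

# Additive CNMA: the effect of a combination is the sum of its component
# effects, estimated here by inverse-variance weighted least squares.
W = np.diag(1.0 / v)
beta = np.linalg.solve(design.T @ W @ design, design.T @ W @ y)

# Predict a combination never studied directly, e.g. Y + Z.
pred_yz = beta[1] + beta[2]
print(dict(zip(components, np.round(beta, 3))), "predicted Y+Z:", round(pred_yz, 3))
```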
Visualization Approaches for CNMA: Traditional network diagrams become inadequate for CNMA due to the complexity of representing multiple component combinations [35]. Novel visualization approaches have been developed specifically for CNMA, including:
NMA enables estimation of the relative ranking of interventions, which can inform clinical decision-making [33]. Several ranking metrics are available, including:
However, ranking methodologies have important limitations. SUCRA values and similar metrics consider only the point estimates of effects and not their precision or the certainty of evidence [30]. Consequently, interventions supported by small, low-quality trials reporting large effects may be ranked highly despite limited evidence [30]. More recent minimally or partially contextualized approaches consider both the magnitude of effect in the context of patient importance and the certainty of evidence [30].
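The following sketch shows how SUCRA values are computed from a matrix of rank probabilities (which, in practice, would be derived from posterior samples of a Bayesian NMA); the probabilities here are hypothetical, and, as noted above, the resulting ranks say nothing about the certainty of the underlying evidence.

```python
import numpy as np

# Hypothetical rank-probability matrix from an NMA: rows are treatments,
# columns are ranks (1 = best). Each row sums to 1. Values are illustrative.
rank_probs = np.array([
    [0.60, 0.30, 0.10],   # Treatment A
    [0.30, 0.50, 0.20],   # Treatment B
    [0.10, 0.20, 0.70],   # Treatment C
])

n_treat = rank_probs.shape[0]
# SUCRA: mean of the cumulative ranking probabilities over the first (a - 1) ranks.
cumulative = np.cumsum(rank_probs, axis=1)[:, : n_treat - 1]
sucra = cumulative.mean(axis=1)
for name, s in zip(["A", "B", "C"], sucra):
    print(f"SUCRA {name}: {s:.2f}")
```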
Evidence Network: Example network showing direct (solid) and indirect (dashed) comparisons
The Grading of Recommendations, Assessment, Development and Evaluation (GRADE) framework provides a systematic approach for rating the certainty of evidence in NMAs [29] [30]. The process begins by rating the certainty of evidence for each direct comparison, considering:
For NMA specifically, the GRADE approach additionally addresses:
The presence of incoherence between direct and indirect evidence typically leads to downgrading the certainty of evidence by one level [30]. If serious intransitivity is suspected, the certainty of indirect and mixed evidence may also be downgraded [29].
Comprehensive reporting of NMA is essential for transparency and critical appraisal. Key reporting items include:
The PRISMA extension for NMA provides detailed guidance on reporting standards, and protocols should ideally be registered in platforms like PROSPERO before commencing the review [37] [30].
Table 3: Protocol Requirements for NMA on Common Comparators
| Protocol Section | Specific Considerations for Common Comparator Research |
|---|---|
| Eligibility Criteria | Explicit rationale for inclusion of specific common comparators; decision rules for "lumping" or "splitting" comparator definitions |
| Search Strategy | Targeted search methods to identify all trials using specified common comparators |
| Data Extraction | Standardized extraction of comparator characteristics (dose, formulation, administration) and potential effect modifiers |
| Transitivity Assessment | A priori hypotheses about effect modifiers and planned analytical approaches to address intransitivity |
| Statistical Analysis | Pre-specified methods for evaluating comparator connectivity and statistical importance |
Network meta-analysis represents a powerful methodological advancement for comparing multiple treatments simultaneously by leveraging both direct and indirect evidence. The identification and appropriate use of common comparators is fundamental to constructing valid networks and generating reliable estimates of comparative treatment effects. As NMA methodology continues to evolve, recent innovations in component NMA, statistical importance measures, and evidence grading systems offer enhanced tools for addressing complex clinical questions in drug development and comparative effectiveness research. When rigorously conducted and transparently reported, NMA provides invaluable evidence to inform clinical decision-making, treatment guidelines, and healthcare policy.
In the evaluation of new health technologies, head-to-head randomized controlled trials (RCTs) are considered the gold standard for providing comparative evidence. However, direct comparisons are often ethically problematic, unfeasible, or impractical to conduct, particularly in oncology and rare diseases [5]. In such cases, indirect treatment comparisons (ITCs) provide valuable evidence for health technology assessment (HTA) bodies by enabling comparative effectiveness research between interventions that have not been tested directly against each other in RCTs [7]. Standard methods for indirect comparisons and network meta-analysis (NMA) traditionally rely on aggregate data and operate under the key assumption that no differences exist between trials in the distribution of effect-modifying variables [12]. When this assumption is violated due to cross-trial heterogeneity in patient populations, these standard methods may produce biased estimates, potentially leading to incorrect clinical and reimbursement decisions [38].
Population-adjusted indirect comparisons (PAICs) have emerged as a critical methodological advancement to address cross-trial heterogeneity by relaxing the assumption of perfectly similar trial populations [12] [7]. These methods use individual patient data (IPD) from a subset of trials to adjust for between-trial imbalances in the distribution of observed covariates, enabling more valid comparisons between treatments in a specific target population [12]. The growing importance of PAICs is evidenced by their increasing application in submissions to reimbursement agencies worldwide, including the National Institute for Health and Care Excellence (NICE) [12] [5]. This technical guide focuses on two prominent population-adjusted methods: Matching-Adjusted Indirect Comparison (MAIC) and Simulated Treatment Comparison (STC), providing researchers with a comprehensive framework for their application within the broader context of identifying common comparators for indirect drug comparisons research.
Table 1: Key Terminology in Population-Adjusted Indirect Comparisons
| Term | Definition |
|---|---|
| Effect Modifiers | Covariates that alter the effect of treatment as measured on a given scale [12] |
| Prognostic Variables | Covariates that affect the outcome regardless of treatment received [39] |
| Anchored Comparison | Indirect comparison with a common comparator arm connecting the evidence [12] |
| Unanchored Comparison | Indirect comparison without a common comparator, requiring stronger assumptions [12] |
| Individual Patient Data (IPD) | Raw data for each participant in a clinical trial [12] |
| Aggregate-Level Data (ALD) | Published summary data from clinical trials [40] |
The core challenge addressed by population adjustment methods arises from between-trial differences in the distribution of patient characteristics that function as effect modifiers [12]. Effect modifiers are covariates that specifically influence the magnitude of relative treatment effects on a given scale, distinct from prognostic variables that affect outcomes regardless of treatment [12]. When trials have different distributions of these effect modifiers, the conditional relative effects vary across trial populations, making standard indirect comparisons invalid [12]. This problem is particularly acute in unanchored comparisons where no common comparator exists, as these scenarios require much stronger assumptions that are widely regarded as difficult to meet in practice [12].
The theoretical basis for population adjustment rests on distinguishing between population-specific relative treatment effects and developing methods to transport these effects across different populations [12]. Formally, if we consider two trials (AB and AC) comparing treatments A vs. B and A vs. C respectively, with a target population P, the standard indirect comparison estimator assumes that population-specific relative treatment effects are equal across populations: d_AB^(AB) = d_AB^(AC) = d_AB^(P). When effect modifiers are differentially distributed, this assumption fails, and the premise of MAIC and STC is to "adjust for" these between-trial differences to identify a coherent set of estimates [12]. Both methods use IPD from one trial to form predictions of the summary outcomes that would be observed in another trial's population if that population had the same characteristics as the target population [12].
A critical distinction in population-adjusted methods lies between anchored and unanchored comparisons, which dictates the strength of assumptions required and the validity of resulting estimates [12]. In anchored comparisons, the evidence network is connected through a common comparator arm (e.g., both trials share a placebo or standard of care arm), allowing the analysis to respect the within-trial randomization [12]. This connection provides a crucial anchor for estimating relative effects while adjusting for population differences. In contrast, unanchored comparisons occur when the evidence is disconnected due to a lack of a common comparator, as often happens with single-arm studies [12] [39]. Unanchored comparisons require the much stronger assumption that differences in absolute outcomes between studies are entirely explainable by imbalances in observed prognostic variables and effect modifiers [39].
The limitations of unanchored comparisons are significant and well-documented. These analyses assume that all prognostic covariates and treatment effect modifiers imbalanced between the studies have been identified and adjusted for, an assumption generally considered very difficult to meet in practice [39]. Consequently, anchored comparisons should always be preferred when available, as they rely on more plausible assumptions by preserving the benefit of within-trial randomization [12]. For HTA submissions, the choice between these approaches is often dictated by the available evidence base, with unanchored analyses increasingly common in oncology where single-arm trials are frequent [41] [5].
MAIC is a propensity score weighting method that uses IPD from one trial to create a "pseudo-sample" balanced with respect to the aggregate baseline characteristics of another trial [40] [42]. The method is based on method of moments to estimate weights that, when applied to the IPD, create a weighted sample where the means of the selected effect modifiers match those reported for the comparator trial [40] [39]. The core implementation involves estimating a logistic regression model for the trial assignment mechanism, with weights derived as the odds of assignment to the comparator trial conditional on selected baseline covariates [40].
The mathematical foundation of MAIC involves finding a vector β such that the re-weighted baseline characteristics of the IPD (x_i,IPD) exactly match the mean baseline characteristics reported for the comparator trial (x̄_AGG) [39]. Writing the centered covariates as x̃_i = x_i,IPD - x̄_AGG, the weights are given by ŵ_i = exp(x̃_i · β), with β estimated by solving the equation 0 = Σ_i x̃_i · exp(x̃_i · β) [39]. This estimator is equivalent to minimizing the convex function Q(β) = Σ_i exp(x̃_i · β), ensuring that any finite solution is unique and corresponds to the global minimum [39]. In practice, this is why the baseline characteristics of the IPD are centered on the mean baseline characteristics from the comparator data before the weights are estimated [39].
Table 2: Comparison of Population-Adjusted Indirect Comparison Methods
| Characteristic | MAIC | STC |
|---|---|---|
| Methodological Foundation | Propensity score weighting [12] | Regression adjustment [12] |
| Data Requirements | IPD from index trial, ALD from competitor trial [40] | IPD from index trial, ALD from competitor trial [12] |
| Weight Estimation | Method of moments or entropy balancing [40] | Not applicable |
| Outcome Modeling | Not required in standard implementation | Required for outcome prediction [12] |
| Key Assumptions | All effect modifiers observed and balanced [12] | Correct specification of outcome model [12] |
| Primary Applications | Both anchored and unanchored scenarios [39] | Both anchored and unanchored scenarios [12] |
STC takes a regression-based approach to population adjustment, using outcome models developed from IPD to predict treatment effects in a target population [12]. Unlike MAIC, which focuses on balancing covariates through weighting, STC develops models of the relationship between covariates, treatment, and outcomes, then uses these models to simulate what outcomes would have been observed under different treatment conditions in the target population [12]. This approach relies on correct specification of the outcome model, including appropriate functional forms and interactions between treatment and effect modifiers [12].
The STC methodology involves constructing a regression model using the IPD from the index trial, typically including main effects for treatment and covariates, as well as treatment-covariate interactions for suspected effect modifiers [12]. This model is then applied to the aggregate data from the competitor trial to predict the outcomes that would have been observed if patients in the competitor trial had received the index treatment [12]. The adjusted treatment effect is calculated by comparing these predicted outcomes with the observed outcomes from the competitor trial [12]. While STC can be more efficient than MAIC when the outcome model is correctly specified, it is vulnerable to model extrapolation and may produce severely biased estimates under model misspecification [40].
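A minimal sketch of the STC logic for a continuous outcome is given below, assuming simulated IPD, a single prognostic variable, a single effect modifier, and a linear outcome model; because the covariates are centered on the comparator trial means, the treatment coefficient can be read as the population-adjusted A vs. C effect in the target population. Real applications would need appropriate link functions for binary or time-to-event outcomes and careful justification of the model specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical IPD from the index (AC) trial: continuous outcome, one
# prognostic covariate and one effect modifier (illustrative simulated data).
n = 300
treat = rng.integers(0, 2, n)          # 1 = Treatment A, 0 = common comparator C
age = rng.normal(60, 8, n)             # prognostic variable
severity = rng.normal(5, 2, n)         # effect modifier
outcome = (10 - 0.05 * age + 1.0 * severity
           + treat * (2.0 - 0.3 * severity) + rng.normal(0, 1, n))

# Aggregate characteristics reported for the comparator (BC) trial population.
comparator_means = {"age": 64.0, "severity": 6.5}

# STC step 1: centre covariates at the comparator trial means, so the
# treatment coefficient is interpreted in the comparator population.
age_c = age - comparator_means["age"]
sev_c = severity - comparator_means["severity"]

# STC step 2: outcome regression with a treatment-by-effect-modifier interaction.
X = np.column_stack([np.ones(n), treat, age_c, sev_c, treat * sev_c])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)

# With centred covariates, beta[1] is the predicted A vs C effect in the
# comparator population; anchoring through C then gives A vs B.
print(f"Population-adjusted A vs C effect: {beta[1]:.2f}")
```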
Recent methodological research has developed several extensions to address limitations in standard MAIC and STC implementations. The two-stage MAIC (2SMAIC) incorporates an additional weighting step to control for chance imbalances in prognostic baseline covariates within the IPD trial [40]. This approach uses two parametric models: one estimating the treatment assignment mechanism in the index study, and another estimating the trial assignment mechanism [40]. The resulting combined weights simultaneously balance covariates between treatment arms within the IPD trial and across studies, leading to improved precision and efficiency while maintaining similarly low bias levels compared to standard MAIC [40].
For time-to-event outcomes, recent developments include doubly robust methods that combine elements of both weighting and regression adjustment [41]. These approaches provide protection against model misspecification by requiring only one of the two models (either the treatment allocation model or the outcome model) to be correctly specified to obtain consistent estimates [41]. Simulation studies have demonstrated that doubly robust methods can provide more reliable estimates for unanchored comparisons with time-to-event endpoints, which are common in oncology applications [41]. Additionally, variance estimation techniques have been refined, with evidence suggesting that conventional estimators with effective sample size-scaled weights produce accurate confidence intervals across various scenarios, including those with poor population overlap [43].
Implementing MAIC requires a structured process to ensure appropriate methodology and reproducible results. The following protocol outlines the key steps for conducting an MAIC analysis:
Data Preparation and Covariate Selection: Identify and prepare IPD from the index trial, including baseline covariates, treatment assignments, and outcomes. Simultaneously, extract aggregate baseline characteristics from the competitor trial publications or reports. Select covariates for adjustment based on clinical expertise, published literature, and statistical analyses identifying prognostic factors and effect modifiers [39]. Ensure consistent variable definitions and coding across data sources.
Covariate Centering: Center the baseline characteristics of the IPD using the mean baseline characteristics from the comparator data. This involves subtracting the aggregate comparator means from the corresponding IPD covariates [39]. Create an object containing the names of the centered matching variables for use in subsequent analyses.
Weight Estimation: Estimate weights using the method of moments approach, solving for the parameters that balance the covariate means between the weighted IPD and the comparator population. The MAIC package in R provides implementation functions for this step [39]. Evaluate the resulting weights for extreme values that might indicate poor overlap between trial populations.
Assessment of Covariate Balance and Effective Sample Size: Examine the covariate balance after weighting by comparing the weighted means of the IPD covariates with the aggregate means from the comparator trial. Calculate the effective sample size (ESS) after weighting as ESS = (Σ_i ŵ_i)^2 / Σ_i ŵ_i^2 [43]. A substantial reduction in ESS indicates poor population overlap and may signal potential precision issues in the analysis [43] [40] (a worked sketch of the weight estimation, balance check, and ESS calculation follows this protocol).
Outcome Analysis: Apply the estimated weights to the outcome data from the IPD and compare the weighted outcomes with those from the competitor trial. For anchored comparisons, estimate the relative effect as Δ̂_BC^(AC) = [g(Ȳ_C^(AC)) - g(Ȳ_A^(AC))] - [g(Ȳ_B^(AC)) - g(Ȳ_A^(AC))] [12]. For unanchored comparisons, use Δ̂_BC^(AC) = g(Ȳ_C^(AC)) - g(Ȳ_B^(AC)) [12].
Variance Estimation and Uncertainty Quantification: Estimate uncertainty using appropriate methods. Recent evidence suggests that conventional estimators with ESS-scaled weights provide accurate coverage across various scenarios, including those with poor population overlap [43]. Alternative approaches include robust sandwich estimators or bootstrapping, though these may underestimate variance in scenarios with moderate to poor overlap [43].
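A compact sketch of the centering, weight-estimation, and balance/ESS steps above is shown below, using simulated IPD, hypothetical comparator means, and the method-of-moments objective described earlier (implemented here with scipy's general-purpose optimizer rather than a dedicated MAIC package); it estimates the weights, checks the weighted covariate balance, and reports the effective sample size.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical IPD covariates from the index trial (age, male indicator) and
# the aggregate means reported for the comparator trial (illustrative values).
ipd = np.column_stack([rng.normal(58, 9, 400), rng.integers(0, 2, 400)])
comparator_means = np.array([62.0, 0.55])

# Centre the IPD covariates at the comparator trial means.
X_centred = ipd - comparator_means

# Method of moments: minimise Q(beta) = sum(exp(Xc @ beta)); a zero gradient
# forces the weighted covariate means to match the comparator means.
def q(beta):
    return np.exp(X_centred @ beta).sum()

def grad(beta):
    return X_centred.T @ np.exp(X_centred @ beta)

res = minimize(q, x0=np.zeros(X_centred.shape[1]), jac=grad, method="BFGS")
weights = np.exp(X_centred @ res.x)

# Check balance after weighting and compute the effective sample size (ESS).
weighted_means = (weights[:, None] * ipd).sum(axis=0) / weights.sum()
ess = weights.sum() ** 2 / (weights ** 2).sum()
print("Weighted means:", np.round(weighted_means, 2), "ESS:", round(ess, 1))
```

A sharp drop in ESS relative to the original sample size would flag the poor-overlap scenarios discussed in the variance-estimation literature cited above.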
Diagram 1: MAIC Analysis Workflow
Table 3: Essential Methodological Components for Population-Adjusted Indirect Comparisons
| Component | Function | Implementation Considerations |
|---|---|---|
| Individual Patient Data | Provides detailed covariate and outcome information for weighting or modeling [12] | Data quality assessment, variable harmonization, missing data handling |
| Aggregate Comparator Data | Supplies target population characteristics for adjustment [40] | Extraction of appropriate summary statistics (means, proportions) |
| Statistical Software | Enables implementation of weighting and modeling procedures [39] | R packages (e.g., MAIC), Bayesian software (e.g., WinBUGS, JAGS) |
| Weight Estimation Algorithm | Calculates balancing weights to match covariate distributions [39] | Method of moments, entropy balancing, convergence assessment |
| Outcome Model Specification | Predicts counterfactual outcomes in target population (for STC) [12] | Selection of functional form, treatment-covariate interactions |
| Variance Estimation Method | Quantifies uncertainty in adjusted treatment effects [43] | Conventional estimators with ESS scaling, bootstrap, robust sandwich |
An illustrative application of population adjustment methods comes from a case study of biologic therapies for moderate-to-severe plaque psoriasis [38]. This research demonstrated the importance of adjusting for cross-study heterogeneity when conducting network meta-analyses, comparing unadjusted analyses with various covariate-adjusted approaches. Investigators considered multiple covariates to account for cross-trial differences, including baseline risk (placebo response), prior biologic use, body weight, psoriasis duration, age, race, and baseline Psoriasis Area and Severity Index score [38].
The analysis revealed that failure to adjust for cross-trial differences led to meaningfully different clinical interpretations of findings [38]. Specifically, the baseline risk-adjusted NMA, which adjusted for multiple observed and unobserved effect modifiers, was associated with the best model fit [38]. This case highlights how neglecting cross-trial heterogeneity in NMA can have important implications for clinical interpretations when studying the comparative efficacy of healthcare interventions, reinforcing the value of appropriate population adjustment methods [38].
A recent study applied multiple PAIC methods to compare nivolumab with standard of care in third-line small cell lung cancer using data from a single-arm phase II trial (CheckMate 032) and a real-world study (Flatiron) in terms of overall survival [41]. This research compared several PAIC methods, including IPD-IPD analyses using inverse odds weighting, regression adjustment, and a doubly robust method, along with IPD-AD analyses using MAIC, STC, and a doubly robust method [41].
The results demonstrated that nivolumab extended survival versus standard of care with hazard ratios ranging from 0.63 (95% CI 0.44-0.90) in naive comparisons to 0.69 (95% CI 0.44-0.98) in the IPD-IPD analyses using regression adjustment [41]. Notably, regression-based and doubly robust estimates yielded slightly wider confidence intervals versus the propensity score-based analyses, highlighting the efficiency-precision trade-offs between different approaches [41]. The authors recommended the doubly robust approach for time-to-event outcomes to minimize bias due to model misspecification, while noting that all methods for unanchored PAIC rely on the strong assumption that all prognostic covariates have been included [41].
Recent research has developed empirical approaches to rank candidate comparators based on their similarity to target drugs in high-dimensional covariate space, providing valuable methodological support for comparator selection in indirect comparisons [16]. This method involves generating new user cohorts for drug ingredients and classes, extracting aggregated pre-treatment covariate data across clinically oriented domains (demographics, medical history, presentation, prior medications, visit context), and computing similarity scores for cohort pairs [16].
Evaluation of this approach demonstrated that drugs with closer relationships in the Anatomic Therapeutic Chemical hierarchy had higher cohort similarity scores, and the most similar candidate comparators for example drugs corresponded to alternative treatments used in the target drug's indication(s) [16]. This methodology provides a systematic approach to comparator selection that aligns with clinical knowledge and published literature, addressing a fundamental challenge in designing valid indirect treatment comparisons [16].
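To make the ranking step concrete, the following minimal Python sketch scores candidate comparator cohorts against a target cohort using cosine similarity over aggregated covariate prevalences. The covariate names and prevalence values are hypothetical, and cosine similarity is used purely for illustration; the published approach computes similarity over large numbers of pre-treatment covariates and may use a different metric [16].

```python
import numpy as np

def cohort_similarity(prev_a: dict, prev_b: dict) -> float:
    """Cosine similarity between two cohorts' aggregated covariate prevalences.

    Each dict maps a covariate name to the proportion of the cohort with that
    covariate; covariates missing from one cohort are treated as 0.
    """
    keys = sorted(set(prev_a) | set(prev_b))
    a = np.array([prev_a.get(k, 0.0) for k in keys])
    b = np.array([prev_b.get(k, 0.0) for k in keys])
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Hypothetical aggregated pre-treatment covariate prevalences.
target = {"age_65_plus": 0.42, "prior_chemo": 0.80, "ecog_2_plus": 0.15, "male": 0.55}
candidates = {
    "candidate_1": {"age_65_plus": 0.45, "prior_chemo": 0.75, "ecog_2_plus": 0.18, "male": 0.52},
    "candidate_2": {"age_65_plus": 0.20, "prior_chemo": 0.10, "ecog_2_plus": 0.02, "male": 0.60},
}

# Rank candidate comparators by similarity to the target drug's cohort.
for name, prev in sorted(candidates.items(),
                         key=lambda kv: cohort_similarity(target, kv[1]),
                         reverse=True):
    print(name, round(cohort_similarity(target, prev), 3))
```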
Simulation studies have provided valuable insights into the statistical performance of population adjustment methods under various conditions. MAIC has generally produced unbiased treatment effect estimation when assumptions are met, but concerns remain about its inefficiency and instability, particularly when covariate overlap is poor and effective sample sizes after weighting are small [40]. These scenarios are common in health technology appraisals and make weighting methods sensitive to inordinate influence by a few subjects with extreme weights [40].
Research on variance estimation methods for MAIC has revealed that the extent of population overlap significantly impacts performance [43]. In scenarios with strong population overlap, all variance estimation methods (conventional estimators with raw weights, ESS-scaled weights, robust sandwich estimators, and bootstrapping) provided accurate estimates [43]. However, in scenarios with poor population overlap (approximately 77% reduction in ESS), variance was underestimated by conventional estimators with raw weights, bootstrapping, and sandwich estimators [43]. The use of conventional estimators with ESS-scaled weights produced standard errors and confidence intervals that were fairly precise across all scenarios [43].
For the recently developed 2SMAIC approach, simulation studies demonstrated improved precision and efficiency compared to standard MAIC while maintaining similarly low bias levels [40]. The two-stage approach was particularly effective when sample sizes in the IPD trial were small, as it controlled for chance imbalances in prognostic baseline covariates between study arms [40]. However, it was not as effective when overlap between the trials' target populations was poor and the extremity of the weights was high [40]. In these challenging scenarios, weight truncation produced substantial precision and efficiency gains but induced considerable bias, while the combination of a two-stage approach with truncation yielded the highest precision and efficiency improvements [40].
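The following sketch illustrates the weighting step these simulations evaluate: method-of-moments estimation of MAIC weights that align simulated IPD covariate means with hypothetical published aggregate means, followed by the effective sample size calculation used to gauge weight extremity. It is a minimal illustration on simulated data, not a substitute for a validated MAIC implementation with bootstrap or robust variance estimation.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Simulated IPD covariates from the index trial: age and a binary indicator (male).
n = 300
ipd = np.column_stack([rng.normal(60, 8, n), rng.binomial(1, 0.45, n)])

# Hypothetical published aggregate means from the comparator trial (the target).
target_means = np.array([64.0, 0.55])

# Method-of-moments MAIC: weights w_i = exp(z_i' a), with z_i the covariates
# centred at the target means. Minimising sum_i exp(z_i' a) forces the weighted
# means of the centred covariates to zero, i.e. the weighted IPD matches the target.
z = ipd - target_means
alpha = minimize(lambda a: np.sum(np.exp(z @ a)),
                 x0=np.zeros(z.shape[1]), method="BFGS").x
w = np.exp(z @ alpha)

weighted_means = (w[:, None] * ipd).sum(axis=0) / w.sum()
ess = w.sum() ** 2 / np.sum(w ** 2)   # effective sample size after weighting

print("weighted means:", np.round(weighted_means, 2))   # approx. [64.0, 0.55]
print("ESS: %.1f of %d" % (ess, n))
```

A sharp drop in ESS relative to the original sample size is the warning sign of poor population overlap discussed above.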
Diagram 2: Classification of Indirect Treatment Comparison Methods
Population-adjusted methods, particularly MAIC, have seen growing adoption in health technology assessment submissions. According to a recent systematic literature review, MAIC was the second most frequently described ITC technique (appearing in 30.1% of included articles) after network meta-analysis (79.5%) [5]. Among recent articles (published from 2020 onwards), the majority describe population-adjusted methods, with MAIC appearing in 69.2% of these recent publications [5].
The appropriate choice of ITC technique depends on several factors, including the feasibility of a connected network, evidence of heterogeneity between and within studies, the overall number of relevant studies, and the availability of individual patient-level data [5]. MAIC and STC have become common techniques for single-arm studies, which are increasingly conducted in oncology and rare diseases, while the Bucher method and NMA provide suitable options where no IPD is available [5]. Despite their growing use, ITC submissions to HTA agencies face acceptance challenges, with acceptance rates remaining relatively low due to various criticisms of source data, applied methods, and clinical uncertainties [7]. This highlights the need for continued methodological refinement and clear guidance on the application of population-adjusted methods in HTA submissions.
Population-adjusted indirect comparison methods represent a significant advancement in addressing cross-trial heterogeneity in comparative effectiveness research. MAIC and STC provide complementary approaches (weighting and outcome modeling, respectively) to adjust for imbalances in effect modifiers when comparing treatments across different studies. The theoretical foundation for these methods distinguishes between prognostic variables and genuine effect modifiers, emphasizing the importance of transporting treatment effects to common target populations.
The evidence from simulation studies and case applications demonstrates that these methods can provide valuable adjustments for cross-trial differences when appropriately applied, though they require careful implementation and acknowledgment of their limitations. Current methodological research continues to refine these approaches, with developments such as two-stage MAIC and doubly robust methods offering improvements in precision and protection against model misspecification. As these methods evolve and their application expands, they will play an increasingly important role in generating reliable comparative evidence for healthcare decision-making, particularly in situations where direct head-to-head evidence remains unavailable or infeasible to collect.
Within pharmaceutical development and health technology assessment (HTA), the identification of appropriate comparators is a critical foundation for generating robust evidence on the relative efficacy, safety, and value of new therapeutic interventions. When head-to-head clinical trial data are unavailable, indirect treatment comparisons (ITCs) become indispensable, relying on the use of a common comparator to link interventions across separate studies [44]. The validity of these analyses hinges entirely on the judicious selection of these comparators. This whitepaper examines detailed case studies from oncology and Alzheimer's disease to illustrate successful, real-world approaches to comparator identification, providing researchers and drug development professionals with actionable methodologies and frameworks.
Oncology drug development presents unique challenges for comparator selection, including rapidly evolving standard of care (SOC) and complex treatment pathways. The following cases demonstrate how strategic comparator identification can rescue a clinical trial and align a study with real-world clinical practice.
Background & Challenge: A U.S.-based biotech company was preparing a multisite Phase 3 immuno-oncology trial across the EU. One arm of the study required a specific PD-1 immune checkpoint inhibitor as a comparator [45]. The sponsor faced a critical situation: with only a few weeks before the first patient administration, their initial supplier failed to deliver the required 1,000 packs of the EU-origin product, which mandated a single-batch supply and a full Certificate of Analysis (CoA) to comply with EMA regulations [45].
Methodology & Solution: The rescue strategy, executed by a specialized comparator sourcing partner, involved a highly focused approach [45]:
Outcome: The project was delivered one week ahead of the five-week deadline, providing all 1,000 packs from a single validated batch with complete documentation. This successful intervention prevented clinical trial delays and avoided potential risks to first-patient dosing [45].
Table 1: Key Challenges and Solutions in Oncology Comparator Sourcing
| Challenge | Impact on Trial | Solution Applied |
|---|---|---|
| Compressed Timeline (<5 weeks) | High risk of delayed patient dosing | Agile, specialized sourcing partner with established supplier networks |
| Single-Batch Requirement (1000 packs) | Dramatically narrows available supply | Secured batch exclusivity through supplier negotiations |
| Mandatory EU-origin with CoA | Limits potential sourcing countries | Activated partners with access to fully documented EU stock |
Background & Challenge: A sponsor initiated a randomized Phase 3 study in non-small cell lung cancer (NSCLC) comparing a study drug plus nivolumab against chemotherapy in checkpoint inhibitor-refractory patients [46]. The original protocol stipulated that only second-line (2L) patients whose first-line (1L) therapy was an immuno-oncology (IO) platinum triplet/quadruplet were eligible. The comparator arm was docetaxel alone [46].
Analysis & Real-World Data: An analysis of real-world treatment patterns revealed the protocol's misalignment with global SOC [46]:
Protocol Amendment & Solution: Based on this data-driven rationale, the protocol was amended to [46]:
Outcome: The amendments significantly increased the number of eligible patients, leading to higher enrollment and a more viable, extensive country mix for the trial [46].
Beyond specific drug sourcing, selecting the right statistical comparator is fundamental for valid indirect comparisons. A novel large-scale empirical method has been developed to systematically rank candidate comparators.
Objective: To introduce an empirical approach for ranking candidate comparators based on their similarity to a target drug in a high-dimensional covariate space, thereby aiding study design [47].
Methodology: The process involves three key stages [47]:
Validation & Findings: The method was validated across five claims databases and 922,761 comparisons [47]. Key findings confirmed the method's validity:
The workflow for this empirical method is outlined below.
Alzheimer's disease (AD) presents a complex landscape for comparator identification due to diagnostic challenges, multiple drug classes, and frequent comorbidities.
Case Presentation: A 58-year-old woman was referred for evaluation of progressive cognitive decline over four years, with symptoms including memory impairment, attentional deficits, word-finding difficulties, and new neuropsychiatric symptoms including depression, anxiety, and recurrent visual hallucinations [48]. An initial suspected diagnosis was Alzheimer's disease, supported by MRI, EEG, and positive CSF biomarkers [48].
Differential Diagnosis & Comparator Consideration: The presence of well-formed visual hallucinations, "trance-like" states, dream-enacting behaviors, and motor symptoms suggested a contribution from Lewy body disease (LBD) neuropathology, as occurs in dementia with Lewy bodies (DLB) [48]. This complex presentation underscores a critical principle: accurate diagnostic distinction is a prerequisite for meaningful comparator selection. A clinical trial targeting pure AD would require a comparator active in AD (e.g., a cholinesterase inhibitor), whereas a trial for DLB might necessitate a different comparator set. This case illustrates that in diseases like Alzheimer's and related dementias, the patient population's pathological homogeneity is a fundamental factor in choosing an appropriate comparator.
Formal Methods for Establishing Similarity: In the context of health technology assessment, demonstrating clinical similarity between a target drug and its comparator is essential for cost-comparison analyses. A review of National Institute for Health and Care Excellence (NICE) appraisals found that formal methods for establishing equivalence via ITC are underutilized [10]. The most promising method identified is the estimation of noninferiority ITCs in a Bayesian framework, where the indirectly estimated treatment effect is probabilistically compared against a pre-specified noninferiority margin [10].
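As a concrete illustration of that approach, the sketch below approximates a Bayesian noninferiority ITC by simulation: the two direct effects versus the common comparator are drawn from normal approximations on the log hazard ratio scale, combined through the anchored relation, and the probability that the indirect effect lies within a pre-specified noninferiority margin is computed. All inputs are hypothetical; with vague priors, this simple Monte Carlo scheme approximates the posterior probability a full Bayesian model would report.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical published results on the log hazard ratio scale (normal approximation):
# trial 1 compares A with common comparator C; trial 2 compares B with C.
log_hr_ac, se_ac = np.log(0.80), 0.10
log_hr_bc, se_bc = np.log(0.85), 0.12

# Pre-specified noninferiority margin for A vs. B (HR must be below 1.15).
margin = np.log(1.15)

# Sample both direct effects and form the indirect effect:
# log HR(A vs B) = log HR(A vs C) - log HR(B vs C).
draws = 200_000
d_ab = rng.normal(log_hr_ac, se_ac, draws) - rng.normal(log_hr_bc, se_bc, draws)

print("Indirect HR (A vs B), median: %.2f" % np.exp(np.median(d_ab)))
print("P(noninferiority at margin 1.15): %.3f" % np.mean(d_ab < margin))
```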
Statistical Techniques for Indirect Comparison:
The adjusted indirect comparison is the most commonly accepted method for comparing two interventions (A vs. B) via a common comparator (C) [44]. This method preserves the original randomization of the component studies and is superior to a naïve indirect comparison, which simply contrasts results from two separate trials without adjustment.
The formula for a continuous outcome is [44]:
Difference (A vs. B) = [Difference (A vs. C)] - [Difference (B vs. C)]
For binary outcomes, the relative effect is calculated as [44]:
Relative Risk (A vs. B) = [Relative Risk (A vs. C)] / [Relative Risk (B vs. C)]
A key disadvantage of adjusted indirect comparisons is increased statistical uncertainty, as the variances from the two direct comparisons are summed [44].
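The sketch below applies these formulas on the log scale for a binary outcome, recovering standard errors from hypothetical published 95% confidence intervals and summing the variances as described. The inputs are illustrative; a real analysis would first confirm that the two trials satisfy the similarity assumption.

```python
import numpy as np
from scipy.stats import norm

def log_se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Recover the standard error of a log relative risk from its reported CI."""
    z = norm.ppf(1 - (1 - level) / 2)
    return (np.log(upper) - np.log(lower)) / (2 * z)

# Hypothetical published relative risks versus the common comparator C.
rr_ac, ci_ac = 0.70, (0.55, 0.89)   # trial 1: A vs. C
rr_bc, ci_bc = 0.85, (0.68, 1.06)   # trial 2: B vs. C

# Adjusted indirect comparison on the log scale; variances of the two direct
# comparisons are summed, reflecting the added uncertainty.
log_rr_ab = np.log(rr_ac) - np.log(rr_bc)
se_ab = np.sqrt(log_se_from_ci(*ci_ac) ** 2 + log_se_from_ci(*ci_bc) ** 2)

z = norm.ppf(0.975)
print("RR (A vs B): %.2f (95%% CI %.2f-%.2f)"
      % (np.exp(log_rr_ab), np.exp(log_rr_ab - z * se_ab), np.exp(log_rr_ab + z * se_ab)))
```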
Table 2: Essential Research Reagents and Solutions for Comparator Studies
| Research Reagent / Solution | Function in Comparator Identification & Analysis |
|---|---|
| EU-origin Licensed Product with CoA | Provides the regulated comparator agent for clinical trials in European markets, ensuring compliance with EMA requirements [45]. |
| Real-World Data (RWD) Repositories | Large, structured databases (e.g., administrative claims, electronic health records) used to analyze treatment patterns and define empirical comparators [47] [46]. |
| Anatomic Therapeutic Chemical (ATC) Classification | A standardized international system used to understand drug relationships and hypothesize potential comparator classes [47]. |
| Common Data Model (e.g., OMOP CDM) | Standardizes data from different RWD sources into a common format, enabling large-scale, reproducible analytics across datasets [47]. |
| Bayesian Statistical Software (e.g., R, Stan) | Enables the implementation of complex statistical models for mixed treatment comparisons and noninferiority ITCs [10] [44]. |
The acceptability of evidence derived from indirect comparisons and real-world evidence (RWE) varies across regulatory and HTA bodies, creating a complex landscape for drug developers.
Divergence in Acceptance: A review of European oncology medicine approvals found that RWE is primarily used as an external control for indirect treatment comparisons or to contextualize clinical trial results [49]. However, this evidence is often rejected due to methodological biases. Critically, a comparative assessment revealed discrepancies in RWE acceptability between the EMA and European HTA bodies, as well as among HTA bodies such as NICE (UK), G-BA (Germany), and HAS (France) [49]. This lack of consensus creates uncertainty for sponsors relying on such evidence for approvals and reimbursement.
The strategic identification of comparators is a multifaceted process critical to the success of drug development and evidence generation. As demonstrated by the case studies, success hinges on several key principles: proactive and agile sourcing of physical comparator drugs, deep analysis of real-world treatment patterns to ensure clinical relevance, and the application of robust statistical methodologies for empirical comparator ranking and indirect comparison. Furthermore, an understanding of the evolving and sometimes divergent requirements of regulators and HTA bodies is essential. By integrating these operational, clinical, and methodological strategies, researchers can enhance the validity of their comparative research, derisk drug development programs, and ultimately deliver meaningful new therapies to patients more efficiently.
For drug development professionals and researchers, the ability to generate reliable comparative evidence is fundamental. However, head-to-head randomized controlled trials (RCTs) are often unfeasible due to ethical constraints, cost limitations, or patient rarity, particularly in orphan diseases [5]. This reality necessitates indirect treatment comparisons (ITCs), which estimate the relative treatment effects of two interventions that have not been studied directly against each other in a single trial [2]. The validity of any ITC hinges on how well it addresses the inherent heterogeneity (the clinical, methodological, and statistical variability) between the trials being compared. Heterogeneity arises from differences in trial populations, designs, and outcome measurements, and if unaccounted for, it can introduce significant bias, confounding results and leading to erroneous conclusions for healthcare decision-makers [42] [50]. This guide provides an in-depth technical framework for identifying, assessing, and adjusting for heterogeneity to establish valid common comparators in indirect drug comparisons research.
Understanding the multifaceted nature of heterogeneity is the first step in addressing it. The following table summarizes the primary dimensions of heterogeneity that researchers must confront.
Table 1: Dimensions of Heterogeneity in Clinical Trials
| Dimension | Definition | Common Sources | Impact on ITCs |
|---|---|---|---|
| Clinical Heterogeneity | Differences in participant characteristics, intervention details, or outcome definitions. [51] | Age, sex, race, disease severity, comorbidities, drug dose/formulation, outcome measurement scales. [52] [51] | Violates the similarity assumption; patients in different trials may not be comparable, leading to biased effect estimates. |
| Methodological Heterogeneity | Differences in trial design and conduct. [51] | Randomization, blinding, allocation concealment, study duration, trial setting (e.g., academic vs. community). | Introduces varying levels of bias across studies, affecting the validity of the combined evidence. |
| Statistical Heterogeneity | Variability in the observed treatment effects beyond what is expected by chance. [51] | Arises as a consequence of clinical and methodological heterogeneity. | Manifests as a high I² statistic or significant chi-square test in meta-analyses, increasing uncertainty. |
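To make the statistical heterogeneity row concrete, the sketch below computes Cochran's Q and the I² statistic from hypothetical study-level log relative risks for the same comparison; a high I² signals variability beyond chance that should be explained before pooling or anchoring an ITC on these studies.

```python
import numpy as np

# Hypothetical study-level effects (log relative risks) and standard errors for
# the same comparison, e.g. Drug A versus the common comparator C in four trials.
theta = np.array([-0.35, -0.20, -0.50, -0.10])
se = np.array([0.12, 0.15, 0.20, 0.18])

w = 1 / se ** 2                              # inverse-variance weights
pooled = np.sum(w * theta) / np.sum(w)       # fixed-effect pooled estimate
q = np.sum(w * (theta - pooled) ** 2)        # Cochran's Q
df = len(theta) - 1
i2 = max(0.0, (q - df) / q) * 100            # I^2: % of variability beyond chance

print("Pooled log RR: %.3f, Q = %.2f (df = %d), I^2 = %.0f%%" % (pooled, q, df, i2))
```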
The foundational assumption for any valid ITC is transitivity (or similarity). This principle requires that the trials being indirectly compared are sufficiently similar in all key factors that could modify the treatment effect [50]. In practice, this means that the patients in one trial (e.g., comparing Drug A to Common Comparator C) could plausibly have been enrolled in the other trial (comparing Drug B to C), and vice versa. Assessing this goes beyond a single variable; it involves a holistic judgment of the clinical and methodological coherence of the evidence network [50].
Choosing the correct statistical methodology is paramount. The choice depends on the available data, the structure of the evidence network, and the extent of observed heterogeneity. The following sections detail the primary ITC techniques.
The simplest and most flawed method is the naïve indirect comparison, which directly contrasts the results from two separate trials as if they were from the same study. This approach breaks the randomization of the original trials and is subject to the same confounding biases as observational studies; it is not recommended for formal analysis [44] [50].
The Bucher method, an adjusted indirect comparison, provides a statistically robust alternative for a simple three-treatment network (A vs. C and B vs. C). It preserves within-trial randomization by using the common comparator C as an anchor. The relative effect of A vs. B is calculated as the difference of their respective effects versus C: ln(HR_A/B) = ln(HR_A/C) - ln(HR_B/C). The variance of this log effect estimate is the sum of the variances of the two component effects, correctly reflecting the increased uncertainty of the indirect comparison [44]. While this method is accepted by many HTA bodies like NICE and CADTH [44], its key limitation is the inability to adjust for population differences between the trials.
When patient-level data (IPD) is available for at least one trial, more advanced population-adjusted methods can be employed to balance cross-trial differences.
Matching-Adjusted Indirect Comparison (MAIC) is a prominent technique that uses IPD from one trial (e.g., of Drug A) and aggregate data from another (e.g., of Drug B). The IPD is re-weighted using propensity score principles so that its baseline characteristics match the published means of the comparator trial. After weighting, the outcomes are compared across the now-balanced populations [42]. MAIC is particularly valuable for aligning trials with different eligibility criteria or baseline prognoses. A critical, untestable assumption of MAIC is that there are no unobserved cross-trial differences that could confound the results [42].
Simulated Treatment Comparison (STC) is another population-adjusted method that uses IPD to build a model of the outcome based on patient characteristics in one trial. This model is then applied to the aggregate baseline data of the comparator trial to predict the outcomes, facilitating an adjusted comparison [5].
For evidence networks involving multiple treatments, Network Meta-Analysis (NMA) is the most frequently used and comprehensive method, described in 79.5% of methodological literature [5]. NMA integrates direct and indirect evidence for all treatments in a connected network within a single statistical model, typically using Bayesian or frequentist frameworks. This allows for the simultaneous ranking of all treatments and uses data more efficiently, reducing uncertainty. A core assumption of NMA is consistencyâthat the direct and indirect evidence for the same treatment comparison are in agreement [5]. The validity of an NMA is entirely dependent on the similarity of the trials forming the network.
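As a minimal illustration of how an NMA combines direct and indirect evidence under the consistency assumption, the following sketch fits a fixed-effect model to a hypothetical three-treatment network by generalised least squares on the reported contrasts. Dedicated tools (Bayesian models in WinBUGS/OpenBUGS or specialised R packages) would be used in practice and would also assess heterogeneity and inconsistency.

```python
import numpy as np

# Hypothetical network with treatments A (reference), B and C. Observed study-level
# contrasts (log hazard ratios) and variances: trial 1 (B vs A), trial 2 (C vs A),
# trial 3 (C vs B).
y = np.array([-0.20, -0.35, -0.10])
var = np.array([0.02, 0.03, 0.04])

# Basic parameters are d_AB and d_AC; each contrast is a linear combination:
# B vs A = d_AB, C vs A = d_AC, C vs B = d_AC - d_AB (consistency equation).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, 1.0]])

# Fixed-effect NMA as inverse-variance weighted (generalised) least squares.
W = np.diag(1 / var)
cov = np.linalg.inv(X.T @ W @ X)
d = cov @ X.T @ W @ y

for name, est, se in zip(["B vs A", "C vs A"], d, np.sqrt(np.diag(cov))):
    print("%s: log HR = %.3f (SE %.3f)" % (name, est, se))
print("C vs B (derived): log HR = %.3f" % (d[1] - d[0]))
```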
Understanding which methods are accepted in practice is crucial for drug developers. A large cross-sectional study of European Medicines Agency (EMA) orphan maintenance procedures between 2012 and 2022 provides revealing data on the real-world application of these methods.
Table 2: Use of Comparison Methods in EMA Orphan Drug Assessments (2012-2022) [53]
| Comparison Method | Frequency | Percentage of 418 Comparisons | Key Findings |
|---|---|---|---|
| Indirect Comparisons | 182 | 44% | The most common approach for demonstrating significant benefit. |
| - Naïve side-by-side | 129 | 71% of ICs | The predominant but less robust form of indirect comparison. |
| - Inferential methods (MAIC, NMA) | 53 | 29% of ICs | Use of adjusted methods nearly doubled in the latter half of the decade. |
| Qualitative Comparisons | 162 | 39% | Used where quantitative comparison was not feasible or presented. |
| Direct Comparisons | 74 | 18% | Head-to-head evidence from within a single trial. |
This data underscores the central role of indirect comparisons in regulatory success for orphan drugs. The trend towards more sophisticated inferential methods like MAIC and NMA highlights the growing regulatory expectation for robust statistical adjustments to address heterogeneity [53].
A systematic, pre-planned approach is non-negotiable. The following workflow provides a detailed protocol for conducting a robust ITC.
Step 1: Define PICO and Evidence Network
Step 2: Conduct Systematic Literature Review
Step 3: Assess Clinical and Methodological Heterogeneity
Step 4: Evaluate Transitivity (Similarity Assumption)
Step 5: Select and Justify Statistical Method
Step 6: Conduct Analysis and Validate Statistical Assumptions
Step 7: Interpret and Report Findings
Successfully executing an ITC requires a suite of analytical "reagents." The following table details the essential components.
Table 3: Essential Research Reagents for Indirect Comparisons
| Tool / Reagent | Function / Explanation | Application Context |
|---|---|---|
| Individual Patient Data (IPD) | Raw, patient-level data from a clinical trial. | Enables population-adjusted methods like MAIC and STC to balance for observed covariates. [42] |
| Aggregate Data | Published summary statistics (e.g., means, proportions) from trial reports. | The foundation for all ITC methods; used in Bucher, NMA, and as the comparator in MAIC. [44] |
| Common Comparator | A treatment (e.g., placebo or standard of care) used as a bridge in separate trials. | Serves as the statistical anchor for adjusted indirect comparisons like the Bucher method. [44] [2] |
| Effect Modifier Covariates | Pre-specified patient or trial characteristics believed to influence treatment effect. | The focus of clinical heterogeneity assessment; used to weight data in MAIC or for subgroup analysis. [51] |
| PRISMA Checklist | A reporting guideline for systematic reviews and meta-analyses. | Ensures comprehensive and transparent reporting of the literature search and study selection. [5] |
| Statistical Software (R, WinBUGS/OpenBUGS) | Platforms with specialized packages for complex statistical modeling. | Used to perform NMA (Bayesian/frequentist), MAIC, and statistical tests for heterogeneity/inconsistency. [5] |
Addressing heterogeneity is not merely a statistical exercise but a fundamental requirement for generating credible evidence in the absence of head-to-head trials. The process begins with a meticulous, protocol-driven assessment of clinical and methodological differences across studies. The choice of ITC method, from the simple Bucher adjustment to the more complex MAIC and NMA, must be justified by the structure of the evidence and the degree of observed heterogeneity. As regulatory and HTA landscapes evolve, the demand for sophisticated, population-adjusted comparisons that transparently acknowledge and adjust for cross-trial differences will only intensify. By adhering to the rigorous frameworks and protocols outlined in this guide, researchers and drug developers can navigate the challenges of heterogeneity to produce reliable, actionable comparative evidence.
Randomized Controlled Trials (RCTs) represent the gold standard for clinical evidence generation, providing robust, unbiased estimates of treatment effects through direct comparison. However, ethical constraints, practical limitations, and small patient populations often render traditional RCTs infeasible, particularly in oncology, rare diseases, and life-threatening conditions with unmet medical needs [55] [1]. In these contexts, drug development has increasingly relied on single-arm trials (SATs) to provide pivotal evidence for regulatory approval [56] [57].
When a SAT serves as the primary evidence for a new treatment, a critical analytical challenge emerges: how to compare its efficacy to established standard-of-care (SoC) treatments when no direct head-to-head trial exists. This challenge has propelled the development of indirect treatment comparison (ITC) methodologies, specifically unanchored population-adjusted indirect comparisons: advanced statistical techniques that enable comparative effectiveness analyses without a common comparator arm [41] [12]. This guide provides researchers and drug development professionals with comprehensive strategies for designing SATs and implementing unanchored comparison methods within the broader framework of identifying common comparators for indirect drug comparisons research.
A single-arm trial is a clinical study design in which only one experimental group receives the investigational intervention, without a parallel concurrent control group [55] [58]. All participants receive the same treatment, and outcomes are compared to historical controls or external data rather than an internal control arm. This design becomes scientifically and ethically preferable when randomization is not feasible or would be unethical, particularly when a condition is severe, rare, or lacks effective treatments [55] [57].
SATs are strategically employed across specific therapeutic contexts where traditional RCTs face significant barriers:
Table: Primary Application Scenarios for Single-Arm Trials
| Application Scenario | Key Characteristics | Examples |
|---|---|---|
| Oncology (advanced/refractory) | Life-threatening cancers; short survival period; lack of effective treatments [55] [58] [57] | CheckMate 032 (nivolumab) [41]; trastuzumab deruxtecan (advanced gastric cancer) [58] |
| Rare Diseases | Small, specific patient populations; difficulty recruiting for controlled trials; understanding of pathogenesis [55] | Malignant perivascular epithelioid cell tumour [55] |
| Emerging Infectious Diseases | Rapid spread and high severity; urgent need for treatments; willingness to try new therapies [55] | COVID-19 treatments [55] |
| Novel Treatment Modalities | Gene/cell therapies; new medical devices; significant, durable tumor responses [55] [58] | TriClip system (tricuspid regurgitation) [55] |
SATs offer distinct advantages that make them valuable in specific development scenarios. They provide equitable treatment access to all participants, respecting patient preferences and avoiding randomization to potentially inferior treatments [55]. They typically require smaller sample sizes and have shorter trial durations, saving costs and expediting development, particularly beneficial for rare diseases [55] [58]. Regulatory agencies including the FDA and EMA have established pathways for SAT data acceptance, especially in oncology [56] [57].
However, SATs present significant limitations that must be addressed. The absence of a concurrent control group introduces potential biases in interpreting results, as outcomes may be influenced by confounding factors or patient selection rather than the treatment itself [55] [58]. Without randomization and blinding, SATs cannot control for unknown confounding factors, limiting the strength of generated evidence compared to RCTs [55]. There is also heavy reliance on historical controls, where differences in patient populations, treatment protocols, or data collection methods may introduce bias and complicate interpretation [58].
Unanchored population-adjusted indirect comparisons (PAICs) are advanced statistical methodologies that enable comparison of treatments from different studies when there is no common comparator arm (or "anchor") connecting the evidence network [41] [12]. This scenario frequently arises when comparing a new treatment evaluated in a SAT against a comparator treatment from a separate RCT that did not share a common control arm.
The fundamental challenge unanchored comparisons address is disconnected evidence: Treatment A is studied in a single-arm trial, while Treatment B (the comparator) was evaluated in a different randomized trial with a different control group (C), but A and B have never been compared to the same common treatment. Unanchored methods aim to balance the distribution of effect modifiers between the study populations to enable a statistically valid comparison [12].
Three principal statistical methodologies form the foundation of unanchored PAICs for time-to-event and other endpoints:
Inverse Odds Weighting (IOW) / Matching-Adjusted Indirect Comparison (MAIC) MAIC uses propensity score weighting to create a "pseudo-population" where the weighted distribution of covariates in the IPD study matches that of the aggregate data study [41] [12]. Individual patient data from the SAT are reweighted so that the distribution of prognostic factors and effect modifiers matches the population in the comparator study. This creates a balanced basis for comparison despite the original population differences.
Regression Adjustment (RA) / Simulated Treatment Comparison (STC) STC uses outcome model-based adjustment to predict outcomes for the comparator treatment in the SAT population [41] [12]. A regression model is developed using IPD from the SAT to characterize the relationship between baseline covariates and outcomes. This model is then applied to the aggregate data from the comparator study to estimate the treatment effect that would have been observed if the comparator had been studied in the SAT population.
Doubly Robust (DR) Methods Doubly robust methods combine both propensity score weighting and regression adjustment, offering protection against model misspecification [41]. These methods produce consistent treatment effect estimates if either the propensity score model (allocation model) or the outcome model is correctly specified, making them more robust than approaches relying on a single model.
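The sketch below illustrates the regression-adjustment (STC-style) step for an unanchored comparison with a continuous outcome, using simulated IPD from the single-arm trial and hypothetical aggregate values for the comparator study. A linear outcome model is used so that evaluating it at the comparator's baseline means is unbiased; with non-linear models, the prediction should instead be marginalised over the covariate distribution. Uncertainty quantification (e.g., bootstrapping) is omitted for brevity.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

# Simulated IPD from the single-arm trial: baseline covariates and a continuous
# outcome (e.g. change from baseline) under the investigational treatment.
n = 250
age = rng.normal(62, 9, n)
severity = rng.normal(5.0, 1.5, n)
outcome = 10 - 0.05 * age - 1.2 * severity + rng.normal(0, 2, n)

# Step 1: model the outcome as a function of prognostic covariates in the IPD.
X = sm.add_constant(np.column_stack([age, severity]))  # columns: const, age, severity
fit = sm.OLS(outcome, X).fit()

# Step 2: predict the mean outcome the investigational treatment would achieve in the
# comparator study's population, using its (hypothetical) published baseline means.
comparator_means = np.array([[1.0, 66.0, 5.8]])        # const, mean age, mean severity
predicted_mean = float(fit.predict(comparator_means)[0])

# Step 3: contrast with the comparator study's reported mean outcome (hypothetical).
comparator_reported_mean = 1.5
print("Adjusted difference (investigational - comparator): %.2f"
      % (predicted_mean - comparator_reported_mean))
```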
Unanchored comparisons rely on strong assumptions that researchers must carefully consider. The unverifiable exchangeability assumption requires that all prognostic factors and effect modifiers have been identified, measured, and adjusted for; any unmeasured confounding can bias results [41] [12]. There must also be sufficient overlap in patient characteristics between studies, as comparisons can only be made within the region of common clinical characteristics [12].
The model specification assumption is crucial, particularly the "shared effect modifier" assumption: effect modifiers must impact treatments similarly across studies, which may not hold true for therapies with different mechanisms of action [12]. Unlike anchored comparisons that benefit from within-study randomization, unanchored methods cannot verify consistency because there is no common comparator to test the validity of adjustments [12].
Designing a robust SAT requires meticulous planning to maximize scientific validity despite the absence of a control group. The following workflow outlines key design considerations:
Endpoint Selection Criteria
Bias Mitigation Strategies
Implementing unanchored PAICs requires systematic execution to ensure methodological rigor, as detailed in the following workflow:
Prognostic Factor Identification
Model Implementation Steps
RA/STC Implementation:
Doubly Robust Implementation:
Regulatory agencies and Health Technology Assessment (HTA) bodies have demonstrated increasing acceptance of SATs and supporting ITCs, though with important caveats. A 2024 review of 185 assessment documents from regulatory and HTA agencies found that ITCs in orphan drug submissions were associated with a higher likelihood of contributing to positive decisions/recommendations compared to non-orphan submissions [1]. Among the 306 ITCs supporting these submissions, authorities more frequently favored anchored or population-adjusted ITC techniques for their effectiveness in data adjustment and bias mitigation [1].
The EMA has issued specific reflection papers establishing that SATs may be acceptable as pivotal evidence when RCTs are not feasible, but requires comprehensive justification and robust methodological approaches [56]. Similarly, the FDA has granted approvals based on SATs, particularly in oncology: between 2002 and 2021, 176 new malignant hematology and oncology indications received FDA approval based on SATs, including 116 (66%) accelerated approvals and 60 (34%) traditional approvals [57].
Table: Regulatory Requirements for SATs and Unanchored Comparisons
| Agency | SAT Requirements | ITC Method Preferences |
|---|---|---|
| EMA | Strong justification for RCT infeasibility; objectively measurable endpoints; pre-specified statistical analysis plan; comprehensive bias mitigation [56] | Population-adjusted methods preferred; anchored comparisons over unanchored; complete transparency of assumptions [1] |
| FDA | Substantial, durable tumor responses; well-defined natural history of disease; objective endpoints demonstrating clinical benefit; context-dependent benefit-risk assessment [57] | Methodological rigor over specific techniques; adjustment for all prognostic factors; sensitivity analyses supporting robustness [1] |
| HTA Agencies (NICE, CADTH, PBAC) | Clinical relevance to population; appropriate external controls; comparative effectiveness evidence [1] [2] | NMA and population-adjusted ITCs preferred; unadjusted comparisons often rejected; focus on decision uncertainty [1] [2] |
Implementing unanchored PAICs requires specialized statistical software capable of handling complex weighting and modeling approaches:
- R: stddiff for standardized differences, MatchThem for weighting, survival for time-to-event analyses, and boot for bootstrap confidence intervals [41].
- SAS: PROC GENMOD for generalized linear models, PROC PHREG for Cox regression, and PROC SGPANEL for balance assessment graphics.
- Python: pandas for data manipulation, statsmodels for statistical modeling, scikit-learn for machine learning approaches, and matplotlib for visualization.
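A core diagnostic these tools support is the standardized mean difference (SMD) between the weighted IPD and the aggregate target population. The sketch below computes it for a single covariate before and after weighting, using simulated data and illustrative (not estimated) weights; values close to zero after weighting indicate the covariate has been balanced.

```python
import numpy as np

def weighted_smd(x, w, target_mean, target_sd):
    """SMD between a (weighted) IPD covariate and an aggregate target value."""
    m = np.average(x, weights=w)
    v = np.average((x - m) ** 2, weights=w)
    pooled_sd = np.sqrt((v + target_sd ** 2) / 2)
    return (m - target_mean) / pooled_sd

rng = np.random.default_rng(3)
age = rng.normal(60, 8, 400)                 # simulated IPD covariate
target_mean, target_sd = 64.0, 7.0           # hypothetical published aggregate values

before = weighted_smd(age, np.ones_like(age), target_mean, target_sd)
w = np.exp(0.08 * (age - age.mean()))        # illustrative weights up-weighting older patients
after = weighted_smd(age, w, target_mean, target_sd)

print("SMD before weighting: %.2f, after weighting: %.2f" % (before, after))
```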
Table: Critical Bias Assessment Domains
| Bias Domain | Assessment Questions | Mitigation Strategies |
|---|---|---|
| Selection Bias | Was the patient population representative? Were inclusion/exclusion criteria appropriate? Could selection have influenced outcomes? [55] [58] [56] | Precisely predefined criteria; detailed documentation of the selection process; comparison to real-world populations |
| Confounding Bias | Were all prognostic factors identified? Were effect modifiers adequately adjusted? Could unmeasured confounding remain? [55] [12] | Comprehensive literature review |
| Measurement Bias | Were endpoints objectively measurable? Were assessors blinded to potential biases? Were measurement methods consistent? [56] | Objective endpoint selection; blinded endpoint adjudication; consistent measurement protocols |
| Analytical Bias | Was the analysis plan pre-specified? Were appropriate methods selected? Were sensitivity analyses conducted? [41] [56] | Pre-specified statistical analysis plan; methodological justification; comprehensive sensitivity analyses |
Single-arm trials with unanchored indirect comparisons represent a methodologically complex but necessary approach for drug development in contexts where traditional RCTs are not feasible. The successful implementation of these strategies requires meticulous attention to study design, comprehensive identification of prognostic factors, appropriate selection of statistical methods, and transparent reporting of assumptions and limitations.
Based on current regulatory guidance and methodological research, the following best practices emerge:
When rigorously designed and appropriately analyzed, SATs with unanchored comparisons can provide valid evidence for regulatory decision-making and help bring promising treatments to patients with serious conditions and unmet medical needs.
In health technology assessment (HTA), the gold standard for comparing the clinical efficacy and safety of new treatments is the head-to-head randomized controlled trial (RCT) [5]. However, in rapidly evolving therapeutic areas such as rare diseases and vaccine development, direct comparisons are often unfeasible, unethical, or impractical due to small patient populations, the emergence of new pathogens, and the rapid pace of innovation [5] [59]. In these contexts, indirect treatment comparisons (ITCs) provide essential evidence for decision-making by allowing for the comparison of interventions that have not been studied directly against one another in RCTs [7].
The selection of a common comparator is a foundational element for constructing a valid ITC. This common comparator, typically a standard of care or placebo, serves as the statistical bridge that allows for the indirect comparison of two or more interventions of interest. The process of identifying this common comparator is complex and must account for the specific challenges presented by rare diseases, with their very small patient populations, and vaccine development, which faces unique issues such as unpredictable outbreaks and the reliance on platform technologies [59]. This guide provides a technical framework for the strategic selection of common comparators and the application of ITC methods within these dynamic landscapes.
ITC encompasses a suite of statistical methods used to compare the relative effects of two or more treatments through a common comparator. The validity of any ITC hinges on underlying assumptions, primarily the constancy of relative effects, which includes homogeneity, similarity, and consistency of treatment effects across the studies being compared [7]. The appropriate selection of an ITC method is dictated by the available evidence, the structure of the treatment network, and the need to adjust for potential biases.
| ITC Method | Core Assumption | Framework | Key Application | Primary Limitation |
|---|---|---|---|---|
| Bucher Method [7] [5] | Constancy of relative effects | Frequentist | Pairwise comparisons via a common comparator | Limited to comparisons with a single common comparator |
| Network Meta-Analysis (NMA) [7] [5] | Constancy of relative effects (consistency) | Frequentist or Bayesian | Simultaneous comparison of multiple interventions | Complexity increases with network size; assumptions challenging to verify |
| Matching-Adjusted Indirect Comparison (MAIC) [7] [5] | Constancy of relative or absolute effects | Frequentist (often) | Adjusts for population imbalances using IPD; suited for single-arm trials | Limited to pairwise comparison; requires IPD |
| Simulated Treatment Comparison (STC) [5] | Constancy of relative or absolute effects | Bayesian (often) | Predicts outcomes using regression models based on IPD | Limited to pairwise ITC |
| Network Meta-Regression (NMR) [7] [5] | Conditional constancy of effects (effect modifiers) | Frequentist or Bayesian | Explores impact of study-level covariates on treatment effects | Not suitable for multi-arm trials |
A systematic literature review has shown that NMA is the most frequently described technique (79.5% of included articles), followed by population-adjusted methods like MAIC (30.1%) and NMR (24.7%), reflecting their growing importance in dealing with heterogeneous study populations [5].
Selecting a common comparator and an appropriate ITC method is a multi-stage process that requires close collaboration between health economics and outcomes research (HEOR) scientists and clinicians [7]. The following workflow provides a structured approach for researchers.
A rare disease is statutorily defined in the United States as one affecting fewer than 200,000 people [60]. Despite there being an estimated 7,000-10,000 rare diseases, drug development is concentrated in a few therapeutic areas. Analysis of the Orphan Drug Act reveals that from 1983 to 2022, only 392 rare diseases had an FDA-approved drug, meaning around 5% of rare diseases have an approved treatment [60].
The distribution of orphan drug designations and approvals is highly skewed [60]:
| Therapeutic Area | Percentage of Orphan Drug Designations (n=6,340) | Percentage of Initial Orphan Drug Approvals (n=882) |
|---|---|---|
| Oncology | 38% | 38% |
| Neurology | 14% | 10% |
| Infectious Diseases | 7% | 10% |
| Metabolism | 6% | 7% |
This concentration, particularly in oncology, influences the available evidence base and the choice of common comparators, often leading to a focus on established chemotherapies or best supportive care within specific cancer indications.
The primary challenge for ITCs in rare diseases is the scarcity of robust clinical data. This often manifests as a lack of RCTs, the use of single-arm trials due to ethical concerns, and small sample sizes leading to imprecise effect estimates. Furthermore, heterogeneity in patient populations across small studies is a major threat to the similarity assumption.
Experimental Protocol for ITC in Rare Diseases:
Systematic Literature Review and Feasibility Assessment:
Critical Appraisal of Similarity:
Selection and Application of ITC Method:
Vaccine development for rare infectious diseases faces a distinct set of challenges that complicate traditional trial design and, by extension, comparator selection for ITCs. A significant scientific hurdle is the unpredictable and sporadic nature of outbreaks. As industry experts note, "It can be very difficult to figure out the exact population that would benefit most from routine immunization" and "Getting enough of those patients to participate in clinical trials takes a very long time or an unexpected outbreak" [59]. This can lead to truncated clinical development, as seen with the Zika virus, where cases declined before trials could be completed.
From an investment perspective, vaccine development is significantly underfunded. Only 3.4% of total venture capital over the past decade went to companies with infectious disease vaccine programs, making rare infectious disease vaccines a "neglected area" [59]. The high risk is compounded because vaccine antigens are typically pathogen-specific, unlike therapeutics in oncology which can often be explored for multiple indications.
The core challenge for ITCs in vaccines is the frequent lack of a direct common comparator due to the use of placebo controls in pivotal trials for new vaccines, especially against emerging pathogens. Furthermore, differences in trial endpoints (e.g., immunogenicity vs. clinical efficacy), timing of outcome assessment, and circulating viral strains can violate the similarity assumption critical to ITCs.
Experimental Protocol for ITC in Vaccines:
Endpoint Harmonization and Alignment:
Leveraging Platform Technologies as a Conceptual Bridge:
Application of ITC in a Public Health Context:
Successfully navigating ITCs requires a suite of methodological and data resources. The following table details key components of the research toolkit.
| Tool/Resource | Function in ITC | Key Considerations |
|---|---|---|
| Individual Patient Data (IPD) | Enables population-adjusted methods (MAIC, STC) to balance baseline characteristics across studies. | Essential for single-arm trials; often difficult to obtain; requires significant resources for analysis [5]. |
| Systematic Review Protocol | Provides a pre-specified, reproducible plan for identifying and selecting all relevant evidence. | Mitigates bias in study selection; should be registered (e.g., PROSPERO) for transparency. |
| Drug/Disease Ontologies (e.g., ATC, SNOMED) | Standardizes the classification of interventions and medical conditions for accurate cohort definition. | Facilitates large-scale, empirical comparator selection by enabling computation of cohort similarity scores [16]. |
| Contrast Checker & Accessible Color Palette | Ensures data visualizations (charts, graphs) are interpretable by all viewers, including those with color vision deficiency. | Use high-contrast color pairs (e.g., blue/orange); avoid red-green combinations; employ patterns and labels alongside color [61] [62] [63]. |
| Statistical Software (R, Python) | Implements complex statistical models for NMA, MAIC, STC, and NMR. | Requires advanced statistical expertise; packages like gemtc (R) and pymc (Python) are commonly used. |
The evolving landscapes of rare diseases and vaccine development demand sophisticated approaches for comparative evidence generation. The strategic selection of a common comparator is not merely a statistical exercise but a multidisciplinary process grounded in clinical reasoning and methodological rigor. By following structured protocols for evidence assessment, leveraging advanced ITC methods like MAIC and NMA, and utilizing the appropriate research toolkit, developers and HTA bodies can navigate the inherent complexities. This ensures that robust, defensible evidence is generated to inform healthcare decisions, even in the absence of direct head-to-head trials, ultimately accelerating patient access to innovative vaccines and therapies for rare conditions.
The selection and justification of treatment comparators is a foundational element that directly determines the perceived clinical and economic value of a new health technology. For researchers and drug development professionals, this process has become increasingly critical with the implementation of the European Union Health Technology Assessment Regulation (EU HTAR), which began application in January 2025 [23] [64]. The joint clinical assessment (JCA) process under this regulation requires manufacturers to demonstrate comparative effectiveness against multiple standards of care across member states, making comparator choice and justification one of the most strategically decisive factors influencing market access outcomes [65] [66].
Comparator choice anchors cost-effectiveness analyses, price negotiations, and a product's position within clinical pathways [65]. A poorly chosen or justified comparator can lock a therapy into an unfavourable reference point, erode price potential, and restrict reimbursement options. Conversely, selecting a clinically relevant, forward-looking comparator aligned with the evolving standard of care can reinforce differentiation and preserve value as new entrants reshape the therapeutic landscape [65]. This technical guide provides a comprehensive framework for justifying comparator choices to health technology assessment (HTA) bodies, with specific emphasis on methodologies acceptable within the new EU JCA framework.
The EU HTAR (Regulation (EU) 2021/2282) is transforming HTA in Europe, with full enforcement for oncology drugs and advanced therapy medicinal products (ATMPs) beginning in January 2025 [23] [64]. The regulation establishes a framework for joint clinical assessments (JCAs) that will expand to include orphan drugs by January 2028 and all EMA-registered drugs [23]. This harmonized approach aims to reduce duplication of effort across member states, improve efficiency, and ultimately accelerate patient access to innovative therapies [67].
The JCA process involves assessors and co-assessors from different EU member states finalizing a population, intervention, comparator, and outcome (PICO) scope that incorporates input from patient organizations, healthcare professional organizations, and clinical societies [64]. Early experience from the first six months of implementation reveals that multiple PICOs are expected in the final JCA scope, often requiring evidence against numerous comparators reflecting variations in standards of care across member states [64] [67].
A review of EUnetHTA relative effectiveness assessments (REAs) conducted between 2010 and 2021 provides valuable insights into challenges that will likely persist in the JCA process. The analysis of 23 REAs found that twelve included indirect treatment comparisons (ITCs), with six in oncology indications [64]. Across these assessments, a median of four comparators were required per REA (range 1-18), and 25 comparisons were informed by indirect evidence [64].
Table: Evidence Generation Challenges in EUnetHTA Assessments
| Assessment Aspect | Findings from EUnetHTA REA Review | Implications for JCA Preparation |
|---|---|---|
| Number of Comparators | Median of 4 comparators per REA (range: 1-18) [64] | Prepare evidence strategies for multiple comparators across member states |
| ITC Utilization | 12 of 23 REAs included ITCs; 6 in oncology [64] | Develop robust ITC capabilities for evidence generation |
| ITC Acceptance | Suitability categorized as unclear in all but one of 25 comparisons [64] | Enhance methodological rigor and justification for indirect evidence |
| Oncology Focus | 9 of 23 REAs in oncology indications [64] | Prioritize oncology development expertise given early JCA focus |
The disconnect between potential PICO requests (particularly the possibility of a request for a "blended comparator" comprising different treatments under one comparator umbrella) and the recommended evidence synthesis options remains a significant concern for manufacturers [67].
The fundamental question in comparator justification is straightforward yet profound: "Compared with what?" [65] Answering this question effectively requires balancing clinical relevance, ethical considerations, and strategic imperatives:
Geographic variation adds substantial complexity to comparator selection, as standards of care differ across jurisdictions shaped by formularies, access policies, and clinical culture [65]. For global trials, designing a comparator strategy that holds across major markets (e.g., EU4, United Kingdom, United States, and Japan) is critical, as misalignment between a global trial comparator and local treatment practices can lead to costly post-hoc bridging analyses and delayed reimbursement [65].
When direct head-to-head randomized controlled trials (RCTs) are available, they remain the most robust means of generating comparative evidence [65]. However, the EU JCA process will most likely require health technology developers to use various indirect treatment comparison (ITC) approaches to address the multiple PICOs requested, recognizing the inherent limitations of these methodologies [64].
Table: Methodological Approaches for Treatment Comparisons
| Method Type | Key Methods | Application Context | HTA Acceptance Considerations |
|---|---|---|---|
| Direct Comparisons | Randomized controlled trials (RCTs) | Head-to-head comparisons when feasible | Gold standard; demonstrates superiority, non-inferiority, or equivalence [65] |
| Anchored Indirect Comparisons | Network meta-analysis (NMA), Bucher method, MAIC, ML-NMR | Connected evidence networks with common comparator | Preferred ITC approach; preserves randomization integrity [7] [22] |
| Unanchored Indirect Comparisons | Naïve comparison, STC | Single-arm trials or disconnected evidence | Higher bias risk; use only when anchored methods are unfeasible [22] |
| External Control Arms | Historical clinical trials, registry data | Rare diseases or oncology with single-arm trials | Considered supportive rather than definitive evidence [65] |
The EU HTA methodological guidelines for quantitative evidence synthesis describe two primary statistical approaches for evidence synthesis: frequentist and Bayesian [23]. No clear preference for either approach is stated; instead, the choice should be justified based on the specific scope and context of the analysis [23]. Bayesian methods are particularly useful in situations with sparse data because of the possibility of incorporating information from existing sources for prior distribution modeling [23].
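As a minimal illustration of how prior information can stabilise a sparse-data comparison in the Bayesian framework, the sketch below performs a conjugate normal-normal update of a log hazard ratio, combining a hypothetical informative prior derived from external evidence with an imprecise estimate from a small new trial; the posterior shrinks toward the prior in proportion to their relative precisions.

```python
import numpy as np

# Hypothetical inputs on the log hazard ratio scale.
prior_mean, prior_sd = np.log(0.85), 0.30    # informative prior from external evidence
trial_mean, trial_se = np.log(0.70), 0.40    # sparse new trial (wide uncertainty)

# Conjugate normal-normal update (precision weighting).
post_prec = 1 / prior_sd ** 2 + 1 / trial_se ** 2
post_mean = (prior_mean / prior_sd ** 2 + trial_mean / trial_se ** 2) / post_prec
post_sd = np.sqrt(1 / post_prec)

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print("Posterior HR: %.2f (95%% credible interval %.2f-%.2f)"
      % (np.exp(post_mean), np.exp(lo), np.exp(hi)))
```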
Justifying comparator choice requires a systematic, documented process that anticipates HTA body requirements. The following workflow outlines a comprehensive approach to comparator justification:
Figure 1. Workflow for systematic comparator justification. This process begins with comprehensive identification of potential comparators through systematic literature review and analysis of treatment guidelines across target markets. Subsequent steps evaluate clinical practice variations and document rationales for inclusion or exclusion before mapping to specific HTA body requirements and defining appropriate evidence generation strategies.
A particularly complex challenge in the JCA process is the potential requirement for blended comparators (where different treatments are grouped under one comparator umbrella) [67]. This approach creates significant methodological challenges for evidence synthesis, particularly when attempting indirect comparisons. When facing this scenario:
The EU HTA guidelines emphasize several key principles for maintaining statistical rigor in comparative analyses [23]:
Given the tight timelines of JCAs, preparation is critical [22]. Most of the workload for indirect treatment comparisons can be managed during the preparation phase, well before the final PICOS scoping is confirmed by the member states [22]. A systematic literature review can identify the bulk of relevant studies, and preliminary data extraction sheets and programming codes can be created to allow for swift adjustments and updates once the specific PICOS are confirmed [22].
Planning for a large scope is deemed less risky than updating an existing systematic literature review, and ITC preparation is faster than starting from scratch with an analysis that might have been overlooked in the preparation process [22]. This approach enables faster ITC implementation during the JCA, ensuring that results are delivered on time without compromising quality [22].
The practical implementation of the current guidance documents presents several challenges that manufacturers should address proactively [23]:
The following strategic approach ensures alignment with both EU and national HTA requirements:
Figure 2. Strategic timeline for JCA evidence preparation. This timeline outlines critical activities from early development through final submission, emphasizing early evidence planning, continuous evidence library maintenance, and final intensive evidence synthesis to meet JCA requirements.
With the implementation of JCAs, there is growing interest in how assessment findings might influence or be utilized in other jurisdictions [67]. The concept of evidence transportability (the ability to apply comparative effectiveness evidence from one country or context to another) becomes increasingly important [68]. A study examining trends in ITC methods used in reimbursement submissions in Canada and the US between 2020 and 2024 found that while naïve comparisons and Bucher analyses were less frequently used over time, the use of network meta-analysis and unanchored population-adjusted indirect comparisons remained consistent [24].
This suggests that methods currently recommended in JCA guidance are likely sufficient for decision problems facing manufacturers in other markets, though this may change as trial designs become more complex to address more specific therapeutic areas [24]. When designing global evidence generation strategies, manufacturers should:
Justifying comparator choice to HTA bodies requires a methodologically rigorous, strategically informed, and proactively implemented approach. With the implementation of the EU HTA Regulation, the stakes for appropriate comparator selection and justification have never been higher. Success depends on early and continuous preparation, careful attention to methodological guidelines, and strategic alignment of evidence generation plans with both European and national HTA requirements.
The first wave of JCAs will provide invaluable real-world guidance for manufacturers navigating this new landscape. By applying the best practices outlined in this guide (systematic comparator identification, appropriate use of direct and indirect comparison methodologies, proactive evidence planning, and careful attention to transportability considerations), researchers and drug development professionals can enhance their ability to demonstrate product value and secure market access in an increasingly complex regulatory environment.
The implementation of the EU HTA Regulation (HTAR) represents a fundamental shift in the European market access landscape, instituting a unified process for Joint Clinical Assessment (JCA) [69]. For health technology developers (HTDs), this new framework presents a significant methodological challenge: building evidence packages that simultaneously meet the needs of both the EU JCA and diverse national decision-making processes [69]. A central aspect of this challenge involves establishing robust comparative effectiveness in the frequent absence of head-to-head clinical trials, requiring sophisticated approaches to identify and utilize common comparators for indirect treatment comparisons (ITCs) [44] [2].
This technical guide addresses the core methodological and practical considerations for overcoming data gaps and timing issues in this new environment, with particular focus on strategies for identifying and justifying common comparators that will satisfy the rigorous standards of the JCA process and national HTA bodies.
The EU HTAR establishes that JCAs will focus exclusively on comparative clinical effectiveness and safety, while final reimbursement decisions incorporating economic, social, and other contextual factors remain at the Member State level [69]. This creates a complex evidentiary environment where HTDs must navigate both centralized and decentralized requirements.
An environmental scan of methodological guidance reveals that while there is consensus that clinical assessments should be based on a systematically identified, unbiased evidence base, significant differences exist in agency guidance regarding evidence derived from indirect treatment comparisons [69]. These differences are particularly pronounced in countries like France, Germany, Spain, and the Netherlands, each with established but distinct HTA methodologies [69]. The scoping process, which defines the assessment framework using the PICO format (Population, Intervention, Comparator, Outcomes), is especially critical as it establishes the foundation for all subsequent evidence generation and analysis [69].
The challenge of unavailable head-to-head evidence is substantial. Analysis of Institute for Clinical and Economic Review (ICER) reports found that indirect comparisons were deemed infeasible in 54% of assessments covering 53% of medicines, primarily due to differences in trial design and patient populations [70]. The most frequently cited reasons preventing valid indirect comparisons include:
Table 1: Reasons Preventing Feasible Indirect Treatment Comparisons
| Category | Frequency (%) | Examples |
|---|---|---|
| Population Differences | 71% | Different entry criteria, baseline characteristics (e.g., number of prior therapies) [70] |
| Outcome Measurement | 55% | Investigator-reported vs. patient-reported outcomes [70] |
| Time Frame Issues | 52% | Outcome assessment at 12 vs. 24 weeks [70] |
| Study Design | 55% | Crossover vs. parallel arm designs [70] |
| Intervention Differences | 36% | Variations in dosing, administration, or concomitant therapies [70] |
In the absence of direct head-to-head evidence, indirect treatment comparisons provide a methodological framework for estimating relative treatment effects between interventions that have not been studied in direct comparison [44] [2]. The most fundamental approach involves the use of a common comparator that serves as an anchor or link between the treatments of interest [44].
The underlying assumption of this approach is that the treatment effect of Drug A versus Drug B can be indirectly estimated by comparing the treatment effects of A versus C and B versus C, where C represents the common comparator [44]. This method preserves the randomization of the originally assigned patient groups within each trial, though it introduces additional statistical uncertainty [44].
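A minimal sketch of this anchored calculation, assuming effects are expressed on an additive scale such as log hazard ratios, is shown below; the `bucher_itc` helper and the numerical inputs are hypothetical and serve only to illustrate the arithmetic and the added uncertainty.

```python
import math

def bucher_itc(effect_ac, se_ac, effect_bc, se_bc, z=1.96):
    """Anchored indirect comparison of A vs B through common comparator C.

    Inputs are relative effects of A vs C and B vs C on an additive scale
    (e.g., log hazard ratios). Returns the indirect A vs B estimate and 95% CI.
    """
    effect_ab = effect_ac - effect_bc            # (A vs B) = (A vs C) - (B vs C)
    se_ab = math.sqrt(se_ac**2 + se_bc**2)       # variances of the two links add
    return effect_ab, (effect_ab - z * se_ab, effect_ab + z * se_ab)

# Hypothetical log hazard ratios from two placebo-anchored trials
estimate, ci = bucher_itc(effect_ac=-0.50, se_ac=0.15, effect_bc=-0.20, se_bc=0.18)
print(f"Indirect log-HR (A vs B): {estimate:.2f}, 95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

Because the variances of the two anchored estimates add, the indirect estimate is always less precise than either direct comparison, which is the additional statistical uncertainty referred to above.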
Several statistical approaches have been developed for implementing indirect comparisons, each with distinct methodological considerations and acceptance levels among HTA bodies:
Adjusted Indirect Comparisons: This method uses a common comparator (typically standard of care or placebo) as a link between two treatments [44]. The difference between Treatment A and Treatment B is estimated by comparing the difference between A and C against the difference between B and C [44]. This approach is generally accepted by HTA agencies including NICE, PBAC, and CADTH [44].
Network Meta-Analysis (NMA): NMAs extend the concept of adjusted indirect comparisons to incorporate multiple treatments and comparisons simultaneously within a connected network [8]. This approach typically uses Bayesian statistical models to incorporate all available data for a drug, including data not directly relevant to the comparator drug, which can reduce uncertainty [44].
Population-Adjusted Indirect Comparisons: When study populations differ significantly, methods such as matching-adjusted indirect comparison (MAIC) and simulated treatment comparison (STC) can be employed to adjust for cross-trial differences [8]. These techniques are increasingly referenced in HTA guidelines but require careful implementation and justification.
Table 2: Comparison of Indirect Treatment Comparison Methods
| Method | Key Principle | HTA Acceptance | Key Limitations |
|---|---|---|---|
| Adjusted Indirect Comparison | Uses common comparator C to link Treatments A and B [44] | Widely accepted (NICE, PBAC, CADTH) [44] | Increased statistical uncertainty; requires shared comparator [44] |
| Network Meta-Analysis | Incorporates multiple treatments in connected network [8] | Increasingly accepted with specific methodology requirements [8] | Requires similarity and consistency assumptions across network [8] |
| Population-Adjusted Methods (MAIC/STC) | Statistical adjustment for cross-trial population differences [8] | Conditional acceptance with rigorous validation [8] | Limited to addressing observed differences; no adjustment for unmeasured confounding [8] |
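As a concrete illustration of the population-adjusted row above, the sketch below shows the core weighting step of a MAIC as commonly described in the methodological literature: IPD from the index trial are re-weighted so that weighted covariate means match the aggregate means published for the comparator trial. All data are simulated and the variable names are assumptions, not outputs of any cited study.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
# Simulated IPD covariates for the index trial (e.g., age and a second characteristic)
ipd_covariates = rng.normal(loc=[60.0, 0.40], scale=[8.0, 0.49], size=(300, 2))
# Aggregate covariate means reported for the comparator trial (hypothetical)
aggregate_means = np.array([63.0, 0.50])

X = ipd_covariates - aggregate_means            # center IPD on the target means

def objective(alpha):
    return np.sum(np.exp(X @ alpha))            # convex; minimiser balances the means

def gradient(alpha):
    return X.T @ np.exp(X @ alpha)

alpha_hat = minimize(objective, x0=np.zeros(X.shape[1]), jac=gradient, method="BFGS").x
weights = np.exp(X @ alpha_hat)

# After weighting, the IPD covariate means should approximate the aggregate means,
# at the cost of a reduced effective sample size (ESS).
balanced_means = (ipd_covariates * weights[:, None]).sum(axis=0) / weights.sum()
ess = weights.sum() ** 2 / np.sum(weights ** 2)
print(balanced_means.round(2), round(ess, 1))
```

The drop from the nominal to the effective sample size is the price paid for balancing, and is one reason reviewers typically expect the ESS to be reported alongside MAIC results.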
The following workflow provides a methodological protocol for identifying and validating common comparators for JCA submissions. This systematic approach ensures transparency and methodological rigor throughout the process.
Step 1: Define the PICO Framework. Initiate the process by establishing the comprehensive PICO (Population, Intervention, Comparator, Outcomes) framework based on JCA scoping requirements [69]. This should reflect the diverse healthcare priorities of EU Member States and guide all subsequent evidence identification. Document all potential comparators relevant to different national settings, even if not uniformly applicable across all jurisdictions.
Step 2: Evidence Mapping. Conduct systematic literature reviews to identify all available randomized controlled trial evidence for each potential comparator. This mapping should extend beyond the immediate interventions of interest to include trials connecting potential common comparators. Create an evidence matrix documenting trial characteristics, including design, population characteristics, outcome measures, and timing of assessment (an illustrative sketch of such a matrix follows this workflow).
Step 3: PICOTS Alignment Assessment. Evaluate the alignment of potential comparators using the PICOTS framework (Population, Intervention, Comparator, Outcomes, Timing, Setting) [70]. Assess each trial for heterogeneity across these domains, with particular attention to population definitions (cited in 71% of failed ITCs) and outcome measurement approaches (cited in 55% of failed ITCs) [70]. Document any identified discrepancies and their potential impact on the validity of indirect comparisons.
Step 4: Methodological Similarity Evaluation. Assess the methodological similarity across trials, including randomization procedures, blinding, statistical analysis plans, and handling of missing data. Studies have shown that differences in study design account for 55% of cases where indirect comparisons are deemed infeasible [70]. Prioritize common comparators with trials that share fundamental methodological approaches.
Step 5: Optimal Comparator Selection. Select the optimal common comparator based on the strength of evidence and alignment assessment. The preferred common comparator typically has:
Step 6: Documentation and Justification. Thoroughly document the rationale for comparator selection, including transparent assessment of limitations and potential biases. This documentation should preemptively address potential criticisms from HTA bodies and demonstrate systematic consideration of alternative approaches.
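The sketch below illustrates the kind of evidence matrix described in Step 2, with simple flags that feed the PICOTS assessment in Step 3; all trial names, fields, and values are hypothetical.

```python
import pandas as pd

# One row per trial, capturing the fields needed to judge whether a common
# comparator plausibly links the trials (all entries are illustrative).
evidence = pd.DataFrame([
    {"trial": "TRIAL-A1", "intervention": "Drug A", "comparator": "Standard of care",
     "design": "parallel RCT", "prior_lines": "1", "primary_outcome": "PFS", "assessment_wk": 24},
    {"trial": "TRIAL-B7", "intervention": "Drug B", "comparator": "Standard of care",
     "design": "parallel RCT", "prior_lines": ">=2", "primary_outcome": "PFS", "assessment_wk": 12},
])

# Simple heterogeneity flags to carry into the PICOTS alignment assessment
evidence["shared_comparator"] = evidence["comparator"].nunique() == 1
evidence["timing_mismatch"] = evidence["assessment_wk"].nunique() > 1
print(evidence[["trial", "shared_comparator", "timing_mismatch", "prior_lines"]])
```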
The timing of outcome assessment represents a critical consideration in common comparator identification, as treatment effects may vary across different time horizons [71]. The following protocol addresses this specific challenge:
Implementation Guidelines:
Successfully implementing common comparator strategies for JCA requires both methodological expertise and practical tools. The following table summarizes key resources for researchers addressing these challenges.
Table 3: Essential Methodological Resources for Common Comparator Research
| Resource Category | Specific Tools/Methods | Application in Common Comparator Research |
|---|---|---|
| ITC Guidelines | NICE DSU TSD 18; ISPOR Good Practices [8] | Provide standardized methodologies for conducting and reporting indirect comparisons; ensure HTA compliance [8] |
| Statistical Software | R (gemtc, pcnetmeta); SAS; Stata | Implement network meta-analyses and population-adjusted indirect comparisons [44] [8] |
| PICOTS Framework | Structured assessment template [70] | Systematically evaluate transitivity assumptions across trials; identify heterogeneity sources [70] |
| Quality Assessment Tools | Cochrane Risk of Bias; ROB-MEN | Assess internal validity of trials included in the evidence network; inform sensitivity analyses |
| Data Curation Methods | Exact matching; IPSW [72] | Adjust for cross-trial differences when using real-world evidence as supplementary data [72] |
Navigating the evidentiary requirements of the EU JCA process requires sophisticated approaches to common comparator identification and validation. By implementing systematic protocols for comparator selection, addressing timing issues through time point-specific analyses, and leveraging appropriate methodological tools, health technology developers can build robust evidence packages that withstand scrutiny from both the JCA and national HTA bodies. The increasing methodological acceptance of advanced indirect comparison techniques provides opportunities to demonstrate comparative effectiveness even in the absence of head-to-head trials, but success depends on rigorous implementation, transparent documentation, and careful attention to the nuanced requirements of the evolving EU HTA landscape.
Indirect Treatment Comparisons (ITCs) have become indispensable methodological tools in Health Technology Assessment (HTA) for generating comparative evidence when head-to-head randomized controlled trials are not available or feasible. As health technology developers (HTDs) seek market access for new pharmaceuticals, HTA bodies worldwide increasingly rely on robust ITC methodologies to determine the relative efficacy, safety, and cost-effectiveness of new interventions compared to established standards of care. The implementation of the EU HTA Regulation in January 2025 has further elevated the importance of ITCs by establishing mandatory Joint Clinical Assessments (JCAs) for oncology drugs and Advanced Therapy Medicinal Products (ATMPs), with plans to expand to orphan drugs by 2028 and all new medicines by 2030 [73] [23]. This whitepaper examines current trends in ITC application across major HTA systems, analyzes evolving methodological preferences, and provides strategic guidance for researchers navigating this complex evidentiary landscape.
Recent analyses of HTA submissions reveal distinctive patterns in ITC application across different jurisdictions. In North America, between 2020 and 2024, 64% of oncology reimbursement reviews (61 of 95) submitted to Canada's Drug Agency (CDA-AMC) incorporated ITCs, whereas the Institute for Clinical and Economic Review (ICER) in the United States included ITCs in only one of 42 oncology assessments during the same period [24]. This disparity highlights significant differences in evidence requirements and acceptance thresholds between these neighboring systems.
In Germany, which operates under the rigorous AMNOG process, a comprehensive analysis of 334 subpopulations across 222 benefit assessments revealed that ITCs were most frequently employed in oncology (51.2%), followed by metabolic (15.0%) and infectious diseases (11.4%) [74]. However, only 22.5% of the submitted ITCs were accepted by the Federal Joint Committee (G-BA), with methodological deficiencies and insufficient similarity between compared studies representing the primary reasons for rejection [74].
The table below summarizes the evolving methodological preferences for ITCs in Canadian oncology submissions between 2020 and 2024:
Table 1: Trends in ITC Method Usage in Canadian Oncology Reimbursement Submissions (2020-2024)
| ITC Method | 2020 Usage (%) | 2024 Usage (%) | Trend |
|---|---|---|---|
| Network Meta-Analysis (NMA) | 35% | 36% | Consistent usage |
| Unanchored Population-Adjusted Indirect Comparisons | 22% | 21% | Consistent usage |
| Naïve Comparisons & Bucher Method | 26% | 0% | Significant decline |
Source: Adapted from ISPOR 2025 analysis of CDA-AMC database [24]
The data demonstrates a clear methodological maturation in HTA submissions, with simple naïve comparisons being phased out in favor of more sophisticated population-adjusted techniques and NMAs. This trend reflects both the increasing complexity of therapeutic landscapes and the growing methodological sophistication of HTA bodies.
The acceptance criteria for ITCs vary substantially across HTA systems, creating a complex landscape for evidence generation. The German G-BA maintains particularly stringent acceptance criteria, typically requiring adjusted comparisons like the Bucher method, though unadjusted comparisons may be accepted under exceptional circumstances such as highly vulnerable populations or significant therapeutic challenges [74]. For instance, in assessments of treatments for chronic hepatitis C virus infection and lysosomal acid lipase deficiency, unadjusted comparisons with historical controls were accepted due to ethical constraints preventing randomized trials [74].
Similarly, in Japan, analyses of 31 products evaluated under the HTA system revealed that ITCs were less frequently accepted for orphan drugs due to heightened uncertainty associated with limited data and the lack of appropriate comparators in clinical trials [75]. Products granted "usefulness premiums" for attributes not fully captured by QALYs (such as improved convenience and prolonged effect) showed greater discrepancies in incremental cost-effectiveness ratios between manufacturer and HTA agency calculations [75].
ITC methodologies can be systematically categorized based on their underlying assumptions and analytical frameworks. Contemporary literature classifies ITC methods into four primary classes according to their assumptions regarding the constancy of treatment effects and the number of comparisons involved [7]:
Table 2: Key ITC Methods: Applications, Strengths, and Limitations
| ITC Method | Analytical Framework | Key Applications | Strengths | Limitations |
|---|---|---|---|---|
| Bucher Method | Frequentist | Pairwise indirect comparisons with common comparator | Preserves randomization; relatively simple | Limited to single common comparator; cannot handle multi-arm trials |
| Network Meta-Analysis (NMA) | Frequentist or Bayesian | Multiple intervention comparisons; treatment ranking | Incorporates all available evidence; enables ranking | Complex assumptions difficult to verify; requires connected evidence network |
| Matching Adjusted Indirect Comparisons (MAIC) | Frequentist (typically) | Pairwise comparisons with population adjustment using IPD and aggregate data | Adjusts for cross-trial differences; no aggregate data needed for index treatment | Limited to pairwise comparison; reduced effective sample size after weighting |
| Multilevel Network Meta-Regression (ML-NMR) | Bayesian | Multiple ITCs with effect modifiers; population adjustment | Adjusts for effect modifiers; connects different data sources | Computational complexity; requires IPD for at least one trial |
Source: Adapted from comprehensive ITC methods overview [7]
Selecting an appropriate ITC method requires systematic consideration of the available evidence and the decision problem. The following diagram illustrates the key decision points in the ITC selection process:
Diagram 1: ITC Method Selection Framework
This methodological selection framework emphasizes the critical distinction between anchored and unanchored approaches. Anchored ITCs, which rely on randomized controlled trials with a common control group, are generally preferred by HTA bodies like the EU HTA Coordination Group as they preserve randomization and minimize bias [22]. Unanchored ITCs, typically employed when randomized controlled trials are unavailable and based on single-arm trials or observational data, are more susceptible to bias and should only be utilized when anchored approaches are infeasible [22].
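The selection logic can be summarized in a few lines of code; the sketch below is an illustrative distillation of the preferences described above (anchored before unanchored, population adjustment when IPD allow), not a reproduction of the JCA guidance or of the diagram itself.

```python
from dataclasses import dataclass

@dataclass
class EvidenceBase:
    common_comparator: bool       # do the contributing trials share a comparator arm?
    randomized: bool              # are the contributing studies randomized?
    ipd_available: bool           # is individual patient data held for the index trial?
    populations_comparable: bool  # are effect-modifier distributions broadly similar?

def suggest_itc_approach(e: EvidenceBase) -> str:
    if e.common_comparator and e.randomized:
        if e.populations_comparable:
            return "Anchored ITC (Bucher method or NMA)"
        if e.ipd_available:
            return "Anchored population-adjusted ITC (MAIC/STC or ML-NMR)"
        return "Anchored ITC with explicit heterogeneity caveats"
    if e.ipd_available:
        return "Unanchored population-adjusted ITC (strong assumptions; quantify bias)"
    return "Naive comparison only - generally not acceptable to HTA bodies"

print(suggest_itc_approach(EvidenceBase(True, True, False, False)))
```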
The implementation of the EU HTA Regulation has established standardized methodological requirements for ITCs in Joint Clinical Assessments. The Methodological and Practical Guidelines for Quantitative Evidence Synthesis, adopted in March 2024, specify that both frequentist and Bayesian statistical approaches are acceptable, with selection requiring justification based on the specific scope and context of the analysis [23]. Bayesian methods are particularly valuable in situations with sparse data due to their ability to incorporate information from existing sources for prior distribution modeling [23].
The guidelines emphasize several critical success factors for ITC acceptance in JCAs [23]:
For the complex evidence networks typically encountered in JCAs, the guidelines illustrate various potential configurations:
Diagram 2: Example Evidence Network with Direct and Indirect Comparisons
The tight statutory timelines governing JCAs necessitate advanced preparation for evidence generation. Since most ITC workload can be managed during preparation phases before final PICOS (Population, Intervention, Comparator, Outcome, Study Design) scoping is confirmed by member states, researchers should conduct systematic literature reviews early and create preliminary data extraction sheets and programming codes to facilitate rapid adaptation once specific PICOS are finalized [22]. Planning with a broad scope is considered less risky than updating existing reviews, enabling faster ITC implementation during official JCA periods [22].
HTA bodies are increasingly employing cost-comparison analyses (cost-minimization) to manage assessment demand, requiring demonstration of clinical similarity between interventions. A review of 33 National Institute for Health and Care Excellence (NICE) appraisals using cost-comparison based on ITCs found that none incorporated formal methods to determine equivalence; instead, companies relied on narrative summaries asserting similarity, often based merely on non-significant differences [10]. The most promising methodological approach identified was estimating noninferiority ITCs in a Bayesian framework followed by probabilistic comparison of indirectly estimated treatment effects against pre-specified noninferiority margins [10].
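A minimal sketch of that probabilistic check might look as follows, assuming posterior draws of the indirectly estimated log hazard ratio are available from a Bayesian ITC; the simulated posterior and the noninferiority margin are hypothetical assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
# Stand-in for posterior draws of the indirect log hazard ratio (new vs comparator);
# in practice these would come from a fitted Bayesian NMA/ITC model.
posterior_log_hr = rng.normal(loc=0.05, scale=0.12, size=20_000)

ni_margin_hr = 1.25                                   # pre-specified margin (assumption)
prob_noninferior = np.mean(posterior_log_hr < np.log(ni_margin_hr))
print(f"P(HR < {ni_margin_hr}) = {prob_noninferior:.3f}")
```

The decision rule is then explicit and pre-specifiable (for example, declaring noninferiority only if this probability exceeds an agreed threshold), which addresses the criticism that similarity claims rest on narrative summaries of non-significant differences.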
As clinical trial designs grow more complex to address specific therapeutic areas, ITC methodologies must correspondingly evolve. Future methodological development should focus on:
The application of ITCs in HTA submissions has evolved significantly, with sophisticated methods like NMA and population-adjusted approaches becoming standard while simpler methods like naïve comparisons have dramatically declined. The implementation of the EU HTA Regulation and JCAs has further standardized methodological expectations, emphasizing pre-specification, transparency, and comprehensive bias assessment. Successful navigation of this landscape requires early strategic planning, careful methodological selection based on available evidence networks, and rigorous attention to HTA-specific guidelines. As therapeutic landscapes continue to evolve toward more targeted interventions and complex trial designs, ITC methodologies must correspondingly advance to maintain their critical role in informing healthcare decision-making.
In the realm of drug development and health technology assessment (HTA), indirect treatment comparisons (ITCs) have become indispensable statistical tools for evaluating the relative efficacy and safety of interventions when head-to-head randomized controlled trials (RCTs) are unavailable, impractical, or unethical [76] [5]. The acceptance of evidence generated by these methods varies significantly across major regulatory and HTA bodies, creating a complex landscape for researchers and drug developers to navigate [76] [1]. This whitepaper provides a comprehensive analysis of ITC method acceptance across three major agencies: the European Medicines Agency (EMA), the National Institute for Health and Care Excellence (NICE) in England, and the Canadian Agency for Drugs and Technologies in Health (CADTH). Situated within broader research on identifying common comparators for indirect drug comparisons, this analysis synthesizes current quantitative data on acceptance rates, details preferred methodological approaches, and identifies common criticisms to support robust evidence generation for regulatory and reimbursement submissions.
The acceptance of ITC methodologies varies considerably across HTA agencies, influenced by factors such as methodological rigor, underlying evidence base, and jurisdictional preferences. The table below summarizes key acceptance metrics for NICE, CADTH, and other major HTA bodies based on recent analyses.
Table 1: Health Technology Assessment Agency Acceptance of Indirect Treatment Comparisons
| HTA Agency | Reports with ITCs | ITC Acceptance Rate | Most Accepted Methods | Context and Timeframe |
|---|---|---|---|---|
| NICE (England) | 51% of oncology reports [76] | 47% overall [76] | NMA, Bucher ITC [76] | Analysis of oncology evaluations (2018-2021) [76] |
| CADTH (Canada) | Frequently included [1] | Not specified (Favors anchored/PA methods) [1] [6] | NMA, Population-Adjusted ITCs [1] [6] | Oncology submissions (2021-2023) [1] [6] |
| France (HAS) | 6% of oncology reports [76] | 0% overall [76] | Not applicable | Analysis of oncology evaluations (2018-2021) [76] |
| Germany (G-BA) | Not specified | Not specified (Used in certain situations) [76] | Considered for novel ingredients [76] | Varying assessment frameworks [1] [6] |
| Overall (5 European Countries) | 22% of oncology reports [76] | 30% overall [76] | NMA (39% acceptance), Bucher (43% acceptance) [76] | Analysis of oncology evaluations (2018-2021) [76] |
For the EMA, which serves a regulatory rather than HTA function, the acceptance pattern differs. A review of 33 EMA submission documents for oncology drugs found that all received positive decisions (either standard or conditional marketing authorization) [6]. Of these, 51.5% (n=17) included at least one ITC informed by comparative trials, while the remainder utilized ITCs based on non-comparative evidence [6]. Among the 42 specific ITCs identified in EMA submissions, the methodological breakdown was: 61.9% unspecified methods, 16.7% propensity score methods (PSM), 14.3% matching-adjusted indirect comparisons (MAIC), and 7.1% naïve comparisons [6].
Table 2: European Medicines Agency ITC Analysis (Oncology Submissions 2021-2023)
| Parameter | Metric | Details |
|---|---|---|
| Total Submissions | 33 documents | All received positive decisions [6] |
| Submissions with ITCs | 51.5% (17/33) | Included ≥1 ITC informed by comparative trials [6] |
| ITC Methods Identified | 42 ITCs across submissions | Naïve (7.1%), MAIC (14.3%), PSM (16.7%), Unspecified (61.9%) [6] |
| Primary Justification | Absence of direct RCT comparisons | Most common rationale provided [6] |
A striking finding across agencies is that ITCs in orphan drug submissions more frequently led to positive decisions compared to non-orphan submissions [1] [6], highlighting the particular value of these methods in disease areas where conducting direct head-to-head trials is most challenging.
Major HTA agencies and regulatory bodies consistently express a preference for adjusted indirect comparison methods over naïve comparisons, which are considered prone to bias due to their failure to account for differences in trial designs and patient populations [5] [8] [7].
NICE: Guidance indicates that where direct comparison is impossible, ITC methods may be utilized, showing particular acceptance of network meta-analysis (NMA) and Bucher method ITCs, with acceptance rates of 39% and 43% respectively [76]. The agency provides comprehensive Technical Support Documents through its Decision Support Unit [76] [5].
CADTH: As part of Canada's Drug Agency, favors anchored or population-adjusted ITC techniques for their effectiveness in data adjustment and bias mitigation [1] [6]. The agency demonstrates preference for NMA and population-adjusted indirect comparisons over naïve or unadjusted methods [6].
EMA: As a regulatory body, the EMA accepts various ITC methods in submissions, with analyses showing inclusion of MAIC, propensity score methods, and other techniques [6]. The agency considers ITCs on a case-by-case basis, with particular consideration when direct evidence is unavailable [6].
Network Meta-Analysis (NMA): NMA extends standard pairwise meta-analysis to simultaneously compare multiple interventions using both direct and indirect evidence [5] [7]. The key assumptions include homogeneity (similarity of treatment effects across studies with the same comparison), similarity (similar distribution of effect modifiers across studies), and consistency (agreement between direct and indirect evidence) [7].
Table 3: Fundamental ITC Methodologies and Applications
| ITC Method | Key Assumptions | Data Requirements | Strengths | Common Applications |
|---|---|---|---|---|
| Bucher Method | Constancy of relative effects (homogeneity, similarity) [7] | Aggregate data from trials with a common comparator [5] | Simple approach for pairwise comparisons [7] | Pairwise indirect comparisons through a common comparator [7] |
| Network Meta-Analysis (NMA) | Constancy of relative effects (homogeneity, similarity, consistency) [7] | Aggregate data from multiple trials forming connected network [5] | Simultaneous comparison of multiple treatments [5] [7] | Multiple intervention comparisons or treatment ranking [5] [7] |
| Matching-Adjusted Indirect Comparison (MAIC) | Constancy of relative or absolute effects [7] | IPD from one trial and aggregate data from another [5] | Adjusts for population differences using propensity score weighting [7] | Studies with population heterogeneity, single-arm studies, unanchored comparisons [7] |
| Simulated Treatment Comparison (STC) | Constancy of relative or absolute effects [7] | IPD from one trial and aggregate data from another [5] | Predicts outcomes using outcome regression model [7] | Pairwise ITC with population heterogeneity [7] |
| Network Meta-Regression (NMR) | Conditional constancy of relative effects with shared effect modifier [7] | Aggregate data with study-level covariates [7] | Explores impact of study-level covariates on treatment effects [7] | Connected network evidence to investigate effect modifiers [7] |
The appropriate selection of ITC technique depends on several factors, including the feasibility of a connected network, evidence of heterogeneity between studies, the overall number of relevant studies, and the availability of individual patient-level data (IPD) [5].
Diagram 1: ITC Method Selection Workflow
Despite their utility, ITC methods face significant scrutiny from HTA agencies. The most common criticisms relate to data limitations and methodological concerns [76].
Additional criticisms include violation of key assumptions (such as similarity and consistency), choice of comparator therapy, and transparency in methodology and reporting [76] [1].
Table 4: Essential Components for Robust ITC Analysis
| Component | Function | Application Notes |
|---|---|---|
| Individual Patient Data (IPD) | Enables population-adjusted methods like MAIC and STC [7] | Critical when significant heterogeneity exists between trial populations [5] |
| Systematic Literature Review | Identifies all relevant evidence for inclusion in ITC [5] | Foundation for defining the evidence network and assessing similarity [5] |
| Effect Modifier Identification | Determines key variables influencing treatment response [7] | Clinical input essential for selecting appropriate adjustment variables [7] |
| Statistical Software Packages | Implements complex ITC methods (R, WinBUGS, OpenBUGS) [5] | Bayesian frameworks preferred when source data are sparse [7] |
| Sensitivity Analyses | Tests robustness of results to different assumptions [7] | Should include assessments of heterogeneity and inconsistency [7] |
The acceptance of indirect treatment comparisons across major agencies reveals a complex landscape where methodological rigor, transparency, and appropriate application serve as critical determinants of success. While overall acceptance rates remain modest (approximately 30% across European HTA agencies [76]), specific methods like network meta-analysis and population-adjusted techniques demonstrate higher acceptance when properly applied [76] [1] [6]. The significant variation in acceptance rates between agencies (from 47% in England to 0% in France according to one study [76]) underscores the importance of understanding jurisdiction-specific preferences and requirements.
For researchers engaged in identifying common comparators for indirect drug comparisons, this analysis suggests several strategic considerations: First, anchored comparison methods with a common comparator are generally preferred over unanchored approaches [1] [8]. Second, population-adjusted techniques like MAIC and STC demonstrate particular value when comparing across heterogeneous populations or incorporating single-arm studies [5] [7]. Third, proactive engagement with agency guidelines during the planning phase can prevent methodological missteps that undermine ITC credibility [76] [7].
As therapeutic landscapes continue to evolve, particularly in oncology and rare diseases, the strategic application of robust ITC methodologies will remain essential for demonstrating comparative effectiveness and securing patient access to innovative therapies. Future developments in ITC methodologies, particularly those addressing cross-study heterogeneity and leveraging real-world evidence, will likely further enhance their utility and acceptance across regulatory and HTA agencies worldwide.
The European Union Health Technology Assessment Regulation (EU 2021/2282), which entered into force in January 2022 and became applicable from January 2025, represents a transformative shift in how health technologies are evaluated across EU Member States [77]. This landmark legislation establishes a framework for joint clinical assessments (JCAs) that aim to harmonize the assessment of relative clinical effectiveness while respecting member states' competence in pricing and reimbursement decisions [78]. The HTAR fundamentally changes the evidence requirements for market access by introducing standardized comparator requirements that will significantly impact evidence generation strategies for drug developers.
The implementation of HTAR follows a stepwise approach over six years, beginning with oncology drugs and advanced therapy medicinal products (ATMPs) in January 2025, expanding to orphan medicinal products in January 2028, and ultimately encompassing all new medicines by January 2030 [79] [78]. This phased implementation provides developers with transitional periods to adapt to the new evidence requirements, particularly concerning comparator selection and the corresponding need for indirect treatment comparisons (ITCs) when direct head-to-head evidence is unavailable.
Table 1: Stepwise Implementation of EU HTAR (2025-2030)
| Implementation Date | Therapeutic Categories Included | Key Components Activated |
|---|---|---|
| January 12, 2025 | Oncology medicines and Advanced Therapy Medicinal Products (ATMPs) | Joint Clinical Assessments (JCAs), Joint Scientific Consultations (JSCs) |
| January 13, 2028 | Orphan medicinal products | JCAs, JSCs, horizon scanning |
| January 13, 2030 | All new medicines containing new active substances | Full implementation of all HTAR components |
The HTAR institutional framework is built around several key structures. The Member State Coordination Group on HTA (HTACG) comprises representatives from member states, primarily from HTA authorities and bodies, and issues guidance for joint work including JCAs and joint scientific consultations [77]. The European Commission's HTA Secretariat provides administrative, technical, and IT support to the Coordination Group and its subgroups [77]. Additionally, the HTA Stakeholder Network ensures input from patient associations, health technology developers, healthcare professionals, and other non-governmental organizations in the field of health [77].
A critical aspect of the regulation is its limited scope, covering only the clinical assessment of relative effectiveness, while economic evaluation, pricing, reimbursement decisions, and ethical considerations remain at the national level [78]. This separation creates an interface where JCAs inform national decisions without dictating them, requiring manufacturers to navigate both European and national evidence requirements simultaneously.
At the core of the JCA process lies the Population, Intervention, Comparator, and Outcomes (PICO) framework, which provides a structured approach to defining the scope of clinical assessments [67]. Under HTAR, the PICO framework undergoes a significant transformation, moving from nationally-defined comparators to harmonized EU-level comparator requirements that aim to reflect clinical practice variations across member states.
The JCA process mandates that EU member states collectively define the comparators through a scoping process that incorporates input from patient organizations, healthcare professional organizations, and clinical societies [64]. This process results in multiple comparator options representing different standards of care across member states, creating a complex evidence generation challenge for manufacturers. As noted in analysis of early EUnetHTA assessments that piloted the JCA approach, REAs required a median of four comparators per assessment, with some including up to 18 comparators [64].
The expanded comparator scope directly increases the need for indirect treatment comparisons (ITCs). With multiple comparators to address and the impracticality of conducting head-to-head trials against all potential standards of care, manufacturers must increasingly rely on advanced statistical methods to generate comparative evidence [64]. Analysis of EUnetHTA relative effectiveness assessments (REAs) conducted between 2010-2021 found that more than half (12 out of 23) included evidence based on ITCs, with oncology indications being particularly prevalent [64].
The methodological rigor expected for ITCs under HTAR presents another significant challenge. In the EUnetHTA experience, assessors considered the ITC data and/or methods appropriate in only one submission, categorizing most as 'unclear' in terms of suitability [64]. This demonstrates the high evidence threshold that manufacturers will need to meet under the formal JCA process and highlights the importance of robust ITC methodologies that can withstand scrutiny from assessors and co-assessors from different member states.
Figure 1: EU HTA Regulation JCA PICO Scoping Process and Impact on Comparator Requirements
The European Commission has published specific methodological guidelines for quantitative evidence synthesis to support JCAs, including detailed recommendations on direct and indirect comparisons [23]. These guidelines establish a framework for evidence synthesis that manufacturers must follow when conducting ITCs to address comparator requirements.
Table 2: Accepted Indirect Treatment Comparison Methods Under EU HTAR
| ITC Method | Key Principle | Data Requirements | Appropriate Use Cases |
|---|---|---|---|
| Bucher Method | Adjusted indirect treatment comparison using common comparator | Aggregate data (AgD) from studies with common comparator | Simple networks with one common comparator |
| Network Meta-Analysis (NMA) | Simultaneous comparison of multiple interventions using direct and indirect evidence | AgD from multiple studies forming connected network | Comparing three or more interventions when both direct and indirect evidence exists |
| Matching Adjusted Indirect Comparison (MAIC) | Re-weighting individual patient data (IPD) to match aggregate data baseline characteristics | IPD for index treatment, AgD for comparator | When population differences exist but IPD is available for one treatment |
| Simulated Treatment Comparison (STC) | Adjusting population data using outcome models | IPD for index treatment, AgD for comparator | When effect modifiers are known and can be modeled |
| Population-Adjusted Methods | Advanced statistical adjustment for population differences | IPD from multiple studies | When cross-study heterogeneity is present |
The guidelines do not endorse a single specific methodological approach but emphasize that the choice of method must be justified based on the specific evidence base, research question, and characteristics of the available data [23]. This flexibility acknowledges that different clinical contexts may require different methodological approaches, but places the burden on manufacturers to provide adequate justification for their selected methodology.
A fundamental requirement under HTAR is the pre-specification of ITC analyses in study protocols [23]. Manufacturers must clearly outline and pre-specify models and methods in advance to avoid selective reporting or "cherry-picking" of data, thus maintaining scientific integrity. This includes pre-specifying approaches for handling multiplicity when investigating numerous outcomes within the PICO framework, as well as plans for sensitivity analyses to assess the robustness of findings.
The guidelines emphasize transparency and scientific rigor throughout the ITC process [23]. Key considerations include the sufficiency of overlap between patient populations in different studies, comprehensive knowledge and use of effect modifiers, and quantification of uncertainty through appropriate statistical measures. For unanchored comparisons (often from single-arm studies), the guidelines note that these approaches "rely on very strong assumptions" and require extensive investigation and quantification of potential bias.
The practical implementation of HTAR presents several significant challenges for manufacturers and assessors alike. A primary concern is the disconnect between potential PICO requests and the recommended evidence synthesis options to cover such analytical scenarios [67]. Manufacturers face the particular challenge of "blended comparators" or "individualized treatment" (where different treatments are grouped under one comparator umbrella), which creates complex evidence synthesis requirements that may not align with available methodological approaches.
Another operational challenge is the need for early evidence planning in clinical development programs [67]. With the potential impact of "unforeseen PICO requests" on evidence generation and synthesis activities, manufacturers must engage in PICO simulation exercises at different phases of product development to anticipate potential evidence requirements. These activities can be resource-intensive, especially if performed close to the JCA submission deadline rather than earlier in the process, creating significant strategic planning challenges.
While HTAR creates harmonized EU-level assessments, national adaptation remains a critical challenge. Member states will continue to conduct their own HTA processes for determining reimbursement and pricing, using the JCA reports as input but not as binding documents [67]. This creates a complex interface between EU and national levels, where manufacturers must navigate both harmonized and country-specific requirements.
The experience from early implementation illustrates these national adaptation challenges. In Germany, for example, the JCA dossier does not replace the need for a national AMNOG benefit assessment, which remains evaluative rather than descriptive [67]. Although methodological alignment between HTAR and German HTA practices is relatively close, significant differences exist in comparator selection: German assessments require justification for selecting one treatment comparator only, following a systematic literature review, and prioritize randomized controlled trials over indirect comparisons [67].
Similarly, in France, the Haute Autorité de Santé (HAS) is actively adapting methods and processes to accommodate JCA outputs, but no changes are expected to the SMR and ASMR appraisal criteria, which continue to set a high acceptance bar for clinical evidence [67]. Companies may need to submit local French dossiers as soon as the Committee for Medicinal Products for Human Use opinion is positive, requiring close alignment of EU JCA and local dossier preparation [67].
Successful navigation of the HTAR comparator requirements demands proactive evidence planning that begins early in the drug development lifecycle. Manufacturers should initiate integrated evidence planning at least by Phase 2 of clinical development, with a focus on anticipating potential comparator scenarios and corresponding evidence needs [79]. This includes conducting systematic literature reviews to map the evolving treatment landscape and identify potential comparators across member states.
For products already in Phase 2 or 3 development, creating living evidence libraries (including treatment guidelines, landscape analyses, systematic literature reviews, and evidence synthesis assumptions) forms the cornerstone of a robust JCA preparatory strategy [67]. These dynamic resources should be updated regularly to reflect changes in the competitive landscape and treatment standards across EU markets.
Early and meaningful engagement with regulatory and HTA bodies through mechanisms such as joint scientific consultation (JSC) provides valuable opportunities to align on evidence generation plans and anticipate PICO requirements [79]. These consultations can help identify potential challenges in comparator selection and evidence generation early enough to adapt clinical development plans accordingly.
Manufacturers should also prioritize engagement with patient organizations and clinical experts throughout the development process [67]. The HTAR mandates patient involvement in the assessment process, and early understanding of patient and clinician perspectives on meaningful comparators and outcomes can inform more targeted evidence generation strategies.
Table 3: Strategic Preparedness Timeline for HTAR Compliance
| Development Phase | Key Activities for HTAR Comparator Readiness | Stakeholder Engagement |
|---|---|---|
| Early Development (Phase 1) | Disease area analysis, preliminary PICO simulations, initial evidence gap assessment | Early scientific advice, patient organization consultation |
| Mid-Development (Phase 2) | Living evidence libraries, comparative analysis planning, JSC preparation | Joint scientific consultations, regulatory-HTA parallel advice |
| Late Development (Phase 3) | Refined PICO simulations, ITC analytical plans, dossier preparation | Updated scientific advice, clinical expert engagement |
| Pre-submission (1-2 years before) | Final evidence synthesis, ITC execution, dossier drafting | Pre-submission meetings, patient input collection |
The EU HTA Regulation fundamentally transforms comparator requirements for market access in Europe, establishing a harmonized framework for defining relevant comparators while accommodating clinical practice variations across member states. The expanded comparator scope, coupled with the mandated use of the PICO framework, significantly increases the need for robust indirect treatment comparisons when direct head-to-head evidence is unavailable.
Successful navigation of this new landscape requires methodological rigor in evidence generation, strategic foresight in clinical development planning, and operational flexibility to adapt to both EU-level and national evidence requirements. Manufacturers who proactively address these changing requirements through early evidence planning, comprehensive stakeholder engagement, and methodological excellence will be best positioned to demonstrate product value in this new regulatory environment.
As the implementation of HTAR progresses, ongoing monitoring of methodological guidance updates, learning from early JCAs, and adaptation to evolving evidence standards will be essential for continuous compliance. The regulation represents not just a procedural shift but a strategic turning point with far-reaching implications for evidence generation and market access of new health technologies in Europe.
In modern drug development and comparative effectiveness research, the simultaneous comparison of multiple treatments is essential for informed decision-making. However, head-to-head randomized controlled trials for all treatments of interest are often unavailable due to logistical and financial constraints. This whitepaper examines the growing preference for network meta-analysis (NMA) and population-adjusted indirect treatment comparisons (ITCs) as robust methodologies for synthesizing evidence across studies. These techniques enable researchers to estimate relative treatment effects by leveraging a common comparator framework, even when direct evidence is limited. We explore the methodological foundations, advantages, and implementation considerations of these approaches within the context of identifying appropriate common comparators for indirect drug comparisons.
Evidence-based healthcare decision-making requires comparisons of all relevant competing interventions [80]. Traditional pairwise meta-analysis is limited to synthesizing evidence from studies comparing the same two interventions directly (head-to-head) [13] [81]. In reality, healthcare providers and policymakers need to choose among multiple available treatments, few of which have been directly compared in randomized controlled trials [81]. This evidence gap arises because pharmaceutical development typically focuses on placebo-controlled trials for regulatory approval rather than active comparator trials, which are more expensive and require larger sample sizes [81].
Network meta-analysis and population-adjusted indirect treatment comparisons have emerged as sophisticated statistical methodologies that address this challenge by combining direct and indirect evidence [32] [31]. These approaches allow for the estimation of relative treatment effects between interventions that have not been studied in head-to-head trials, while simultaneously synthesizing a greater share of the available evidence than traditional meta-analysis [80].
Indirect treatment comparison refers to the estimation of relative effects between two interventions via one or more common comparators [13] [80]. The simplest form involves interventions A and B that have both been compared to intervention C but not to each other, enabling an indirect A versus B comparison [13].
Network meta-analysis extends this concept to simultaneously compare multiple interventions by combining direct and indirect evidence across a network of trials [32] [81]. When both direct and indirect evidence exist for a particular pairwise comparison, this combined evidence is termed mixed treatment comparison [80].
Population-adjusted indirect comparisons are advanced methods that adjust for differences in patient characteristics between trials when individual patient data (IPD) is available for only a subset of studies [12] [15]. These methods, including Matching-Adjusted Indirect Comparison (MAIC) and Simulated Treatment Comparison (STC), relax the assumption that patient characteristics are similarly distributed across studies [12].
Table 1: Key Terminology in Indirect Comparisons
| Term | Definition | Key Reference |
|---|---|---|
| Common Comparator | An intervention used as a bridge to enable indirect comparison between two other interventions | [13] [81] |
| Direct Evidence | Evidence from head-to-head randomized controlled trials comparing interventions directly | [32] |
| Indirect Evidence | Evidence obtained through one or more common comparators | [32] |
| Transitivity | The assumption that there are no systematic differences between comparisons other than the treatments being compared | [32] |
| Consistency | The agreement between direct and indirect evidence for the same comparison | [13] |
The foundational approach for adjusted indirect comparisons was described by Bucher et al. [13]. In the simplest case of three interventions (A, B, and C), where A and B have both been compared to C but not to each other, the effect of B relative to A can be estimated indirectly using the direct estimators for the effects of C relative to A (effect_AC) and C relative to B (effect_BC) [13]:
effect_AB = effect_AC - effect_BC
The variance of this indirect estimator is the sum of the variances of the two direct estimators:
variance_AB = variance_AC + variance_BC
This approach maintains the randomized structure of the original trials, as it operates on the relative effect estimates rather than naively comparing outcomes across trial arms [13]. For relative effect measures (e.g., odds ratios, relative risks), this additive relationship holds true only on a logarithmic scale [13].
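A worked example with hypothetical values (not drawn from any cited trial) makes the arithmetic concrete on the log scale, using the notation above:

```latex
\begin{align*}
\text{effect}_{AC} &= -0.40,\quad \mathrm{SE}_{AC} = 0.15\\
\text{effect}_{BC} &= -0.10,\quad \mathrm{SE}_{BC} = 0.20\\
\text{effect}_{AB} &= -0.40 - (-0.10) = -0.30\\
\text{variance}_{AB} &= 0.15^2 + 0.20^2 = 0.0625,\quad \mathrm{SE}_{AB} = 0.25\\
95\%\ \text{CI} &= -0.30 \pm 1.96 \times 0.25 = (-0.79,\ 0.19)
\end{align*}
```

Exponentiating returns the result to the original ratio scale; note how the indirect confidence interval is wider than either direct interval alone.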
Network meta-analysis extends the basic indirect comparison to complex networks involving multiple treatments [81]. NMA provides effect estimates for all possible pairwise comparisons within the network by simultaneously combining direct and indirect evidence [13]. The analysis can be performed using either frequentist or Bayesian approaches, with Bayesian methods particularly common in health technology assessment for their ability to provide probabilistic interpretations [13] [81].
A key output of NMA is the ranking of treatments, often presented as probabilities (e.g., the probability that each treatment is best) or using statistics like the Surface Under the Cumulative Ranking Curve (SUCRA) [32] [31]. These rankings must be interpreted cautiously, considering the uncertainty in effect estimates and the clinical relevance of differences [31].
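The sketch below shows how rank probabilities and SUCRA values can be computed from posterior draws of relative effects; the draws are simulated stand-ins for output from a fitted Bayesian NMA (e.g., via the R package gemtc or a Stan/PyMC model), and lower effect values are assumed to indicate better outcomes.

```python
import numpy as np

rng = np.random.default_rng(1)
treatments = ["A", "B", "C"]
# Simulated posterior draws of each treatment's effect vs a common reference
draws = np.column_stack([
    rng.normal(-0.6, 0.15, 10_000),   # treatment A
    rng.normal(-0.3, 0.20, 10_000),   # treatment B
    rng.normal( 0.0, 0.10, 10_000),   # treatment C
])

ranks = draws.argsort(axis=1).argsort(axis=1) + 1          # rank 1 = best in each draw
n_t = len(treatments)
rank_probs = np.array([[np.mean(ranks[:, j] == r) for r in range(1, n_t + 1)]
                       for j in range(n_t)])                # P(treatment j has rank r)
# SUCRA = mean of cumulative rank probabilities over ranks 1..(n_t - 1)
sucra = rank_probs[:, :-1].cumsum(axis=1).mean(axis=1)

for t, s in zip(treatments, sucra):
    print(f"SUCRA({t}) = {s:.2f}")
```

SUCRA values near 1 indicate treatments that are consistently ranked highly; as emphasized above, they should be read alongside the magnitude and uncertainty of the underlying effect estimates rather than in isolation.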
When effect modifiers are distributed differently between trials, standard indirect comparisons may be biased [12]. Population-adjusted methods use IPD from one trial to adjust for cross-trial differences in patient characteristics [12] [15].
Matching-Adjusted Indirect Comparison (MAIC) uses propensity score-based weighting to make the IPD sample resemble the aggregate data trial population with respect to observed effect modifiers [12]. This is typically implemented through a method similar to raking or entropy balancing [12].
Simulated Treatment Comparison (STC) uses regression adjustment on the IPD to model the outcome, then applies this model to the aggregate data population characteristics to predict the counterfactual outcomes [12].
These methods can be implemented in either anchored comparisons (where a common comparator exists) or unanchored comparisons (where no common comparator exists, requiring stronger assumptions) [12]. Anchored comparisons are generally preferred as they respect within-trial randomization [12].
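As a simplified illustration of the anchored STC logic (an outcome regression fitted on IPD, then used to predict the treatment effect at the aggregate trial's covariate summary), the sketch below uses simulated data with a single effect modifier; the model form, variable names, and the published B-versus-C effect are all assumptions made for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
age = rng.normal(60, 8, n)                       # hypothetical effect modifier
treat = rng.integers(0, 2, n)                    # 1 = treatment A, 0 = comparator C
outcome = 2.0 - 1.0 * treat - 0.03 * (age - 60) * treat + rng.normal(0, 1, n)

# Outcome model with a treatment-by-age interaction, fitted on the A-vs-C IPD
X = sm.add_constant(np.column_stack([treat, age, treat * age]))
fit = sm.OLS(outcome, X).fit()

# Predicted A-vs-C effect at the comparator trial's reported mean age (aggregate datum)
target_mean_age = 65.0
b = fit.params
effect_ac_in_target = b[1] + b[3] * target_mean_age

# Bucher step: combine with the published B-vs-C effect (hypothetical value)
effect_bc = -0.60
print("Indirect A vs B in the target population:", round(effect_ac_in_target - effect_bc, 2))
```

In practice the covariates are usually centered at the target values before fitting, and uncertainty would be propagated (for example by bootstrapping or via the model's covariance matrix) rather than reported as a point estimate alone.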
Diagram 1: Population-Adjusted ITC Workflow
The common comparator serves as the foundational anchor that enables valid indirect comparisons [81]. In a network of three treatments (A, B, and C), if A is directly linked to B while C is also directly linked to B, treatment B functions as the common comparator [81]. This anchor preserves the randomization within trials, as each direct comparison maintains its internal validity [12].
The choice of common comparator significantly influences the results of indirect comparisons [13]. Ideally, the common comparator should be a standard treatment consistently used across trials, with well-understood effects and mechanisms of action [13]. In many drug development contexts, placebo or standard of care serves this function, enabling comparisons between new treatments that have each been tested against these common references [81].
The validity of indirect comparisons rests on the transitivity assumption (also called the similarity assumption), which requires that there are no systematic differences between the available comparisons other than the treatments being compared [13] [32]. This means that in a hypothetical multi-arm trial including all treatments in the network, patients could be randomized to any of the treatments [32].
Transitivity has three key components [13]:
Violations of transitivity occur when treatment effect modifiers are distributed differently across comparisons [32]. For example, if all trials comparing A versus B enrolled severely ill patients while all trials comparing A versus C enrolled mildly ill patients, and disease severity modifies treatment effects, the indirect B versus C comparison would be biased [32].
Diagram 2: Common Comparator Network Structure
NMA and population-adjusted ITCs enable simultaneous comparison of all relevant interventions for a condition, providing a complete evidence base for decision-making [81]. This comprehensive approach allows healthcare decision-makers to assess the relative benefits and harms of all available treatments, rather than being limited to pairwise comparisons [31]. By synthesizing both direct and indirect evidence, these methods maximize the use of available clinical trial data, potentially leading to more precise effect estimates than pairwise meta-analysis alone [31].
The ability to rank treatments is particularly valuable for clinical guideline development and formulary decisions [81]. While rankings should not be the sole basis for decisions, they provide useful supplementary information when considered alongside the magnitude of differences and certainty of evidence [31].
Indirect comparison methods address ethical and practical constraints in clinical research [81]. When numerous interventions exist for a condition, conducting head-to-head trials of all possible combinations is logistically challenging and potentially unethical if it requires recruiting unnecessarily large numbers of patients [81]. NMA provides a framework for leveraging existing evidence more efficiently, potentially reducing the need for additional clinical trials [81].
From a health technology assessment perspective, these methods allow for comparative effectiveness research even when direct evidence is lacking, supporting timely decision-making for new drug approvals and reimbursement [15]. This is particularly important for conditions with multiple treatment options and rapidly evolving therapeutic landscapes [15].
Population-adjusted methods enhance the generalizability of comparative effectiveness research by enabling estimation of treatment effects in specific target populations [12]. This is crucial for health technology assessment, where decisions must be made for specific healthcare systems and patient populations that may differ from those enrolled in clinical trials [12] [15].
When properly conducted and reported, NMA and population-adjusted ITCs increase transparency in treatment comparisons by making the evidence base and underlying assumptions explicit [13]. Guidelines such as the PRISMA extension for NMA have standardized reporting, facilitating critical appraisal and appropriate interpretation [13].
Table 2: Methodological Advantages and Supporting Evidence
| Advantage | Impact Measure | Evidence |
|---|---|---|
| Evidence Comprehensiveness | Increased proportion of relevant comparisons included | NMA allows simultaneous comparison of all interventions in a network, while pairwise meta-analysis is limited to two interventions at a time [81] |
| Precision of Estimates | Reduction in confidence interval width | Combined direct and indirect evidence in NMA can provide more precise estimates than direct evidence alone [31] |
| Decision-Making Utility | Provision of treatment rankings | NMA provides hierarchies and ranking probabilities (e.g., SUCRA values) to inform decisions [31] |
| Population Relevance | Adjustment for cross-trial differences | MAIC and STC enable comparison in specific target populations when IPD is available for one trial [12] |
Based on established guidelines, the following checklist provides key considerations for conducting and evaluating indirect comparisons and NMAs [13]:
Pre-specified Research Question: The clinical question and statistical hypotheses should be clearly defined in advance in a written protocol [13].
Rationale for Indirect Approach: The publication should explain why indirect comparisons are necessary, typically due to absence of head-to-head trials [13].
Common Comparator Justification: The choice of common comparators should be clinically justified and transparently explained [13].
Comprehensive Literature Search: Systematic searches should identify all relevant evidence for all treatments of interest and common comparators [13].
Pre-established Inclusion Criteria: Clear inclusion and exclusion criteria should be defined a priori and applied consistently [13].
Complete Data Reporting: Publications should report characteristics of all included trials, network diagrams, and results for all relevant outcomes [13].
Assessment of Key Assumptions: The assumptions of similarity, homogeneity, and consistency should be explicitly examined and reported [13].
Appropriate Statistical Methods: The statistical approach should be clearly described, including handling of multi-armed trials, choice of fixed or random effects, and software implementation [13].
Sensitivity Analyses and Limitations: Methodological uncertainties and limitations should be thoroughly discussed, with sensitivity analyses addressing key assumptions [13].
Data Collection and Network Geometry Assessment
Statistical Analysis
Software and Computation
Anchored MAIC Implementation
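The source provides no implementation details under this heading, so the following is a minimal sketch of the standard method-of-moments weighting step in anchored MAIC, assuming IPD on two effect modifiers (age and prior therapy, both hypothetical) and published aggregate means from the comparator trial: weights proportional to exp(x·α) are estimated so that the weighted IPD means match the aggregate means.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical IPD effect modifiers from the sponsor's trial: age and prior-therapy indicator.
ipd = np.column_stack([
    rng.normal(62.0, 8.0, 400),              # age (years)
    (rng.random(400) < 0.35).astype(float),  # prior therapy (0/1)
])

agd_means = np.array([65.0, 0.50])  # published baseline means of the comparator trial (hypothetical)

x_centered = ipd - agd_means  # centre IPD covariates on the aggregate-data means

def objective(alpha: np.ndarray) -> float:
    # Method-of-moments objective: its minimiser yields weights whose weighted
    # covariate means match the aggregate-data means exactly.
    return np.sum(np.exp(x_centered @ alpha))

alpha_hat = minimize(objective, x0=np.zeros(2), method="BFGS").x
weights = np.exp(x_centered @ alpha_hat)

weighted_means = (weights[:, None] * ipd).sum(axis=0) / weights.sum()
ess = weights.sum() ** 2 / np.sum(weights ** 2)  # effective sample size after weighting

print("Weighted IPD means:", np.round(weighted_means, 2), "target:", agd_means)
print(f"Effective sample size: {ess:.0f} of {len(weights)} patients")
```

In an anchored analysis these weights are applied to the within-trial contrast against the common comparator before it is combined with the published contrast from the other trial, and the reduced effective sample size should be reported alongside the result.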
Anchored STC Implementation
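Similarly, a minimal sketch of the anchored STC step described earlier is given below: an outcome regression is fitted to the IPD with covariates centred at the aggregate-data means, so the treatment coefficient can be read as the A-versus-anchor effect in the comparator trial's population, and this is then contrasted with the published B-versus-anchor estimate. All data, coefficients, and the published estimate are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 400

# Hypothetical IPD from the A-vs-anchor trial.
age = rng.normal(62.0, 8.0, n)
prior = (rng.random(n) < 0.35).astype(float)
treat_a = rng.integers(0, 2, n).astype(float)   # 1 = treatment A, 0 = common anchor
y = 10.0 - 0.05 * age - 1.0 * prior - 2.0 * treat_a - 0.8 * treat_a * prior + rng.normal(0.0, 2.0, n)

agd_means = np.array([65.0, 0.50])              # comparator trial's published baseline means

# Centre covariates at the aggregate-data means so the treatment coefficient becomes the
# predicted A-vs-anchor effect in the comparator trial's population.
age_c, prior_c = age - agd_means[0], prior - agd_means[1]
X = sm.add_constant(np.column_stack([age_c, prior_c, treat_a, treat_a * prior_c]))
fit = sm.OLS(y, X).fit()

d_a_anchor, se_a = fit.params[3], fit.bse[3]    # A vs anchor, adjusted to the target population
d_b_anchor, se_b = -1.2, 0.4                    # published B vs anchor estimate (hypothetical)

d_ab = d_a_anchor - d_b_anchor                  # anchored indirect A vs B
se_ab = np.sqrt(se_a**2 + se_b**2)
print(f"Anchored STC estimate, A vs B: {d_ab:.2f} (SE {se_ab:.2f})")
```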
Assumption Verification
Table 3: Key Components and Data Inputs for Indirect Comparisons
| Component | Function | Implementation Considerations |
|---|---|---|
| Individual Patient Data (IPD) | Enables population adjustment and detailed covariate analysis | Typically available only for sponsor's own trial in health technology assessment submissions [12] |
| Aggregate Data | Summary statistics from published trials or clinical study reports | Must include sufficient detail on baseline characteristics and effect modifiers [12] |
| Common Comparator | Analytical anchor connecting different treatments | Should be clinically relevant and consistently defined across trials [13] |
| Effect Modifiers | Patient characteristics that influence treatment effect | Identification requires clinical knowledge and may be scale-dependent [12] |
| Raking Algorithms | Weighting method to balance covariate distributions | Similar approach used in MAIC through propensity score weighting [82] |
| Network Geometry Visualization | Diagrammatic representation of evidence network | Identifies evidence gaps and informs feasibility of indirect comparisons [32] |
Population-adjusted indirect comparisons have seen substantial growth in recent years, with one methodological review finding that half of all identified publications appeared after May 2020 [15]. This trend is particularly prominent in oncology and hematology, which accounted for 53% of published PAIC studies [15]. The pharmaceutical industry is heavily involved in these applications, participating in 98% of published PAICs [15].
This rapid adoption reflects increasing acceptance by health technology assessment bodies such as the National Institute for Health and Care Excellence (NICE) [12]. However, this growth has outpaced the development of reporting standards, with only three of 133 reviewed articles adequately reporting all key methodological aspects [15].
A major concern in the field is evidence of reporting bias. In the methodological review of PAICs, 56% of analyses reported statistically significant benefits for the treatment evaluated with IPD, while only one PAIC significantly favored the treatment evaluated with aggregate data [15]. This strong asymmetry suggests selective publication or reporting of analyses that favor the sponsor's product [15].
Methodological quality also varies substantially, with key methodological aspects of the analyses reported inconsistently across publications [15].
These reporting gaps undermine the reliability and reproducibility of population-adjusted indirect comparisons [15].
Several areas require further methodological development and standardization:
Standardization of Reporting
Statistical Methods Development
Evidence Integration Frameworks
Network meta-analysis and population-adjusted indirect treatment comparisons represent significant methodological advances that address critical evidence gaps in comparative effectiveness research. Their growing adoption stems from their ability to provide comprehensive treatment comparisons, enhance decision-making efficiency, and improve the relevance of evidence to specific target populations. The proper application of these methods requires careful attention to the transitivity assumption, appropriate selection of common comparators, and transparent reporting of methodologies and limitations. As these techniques continue to evolve, standardization of reporting practices and ongoing methodological refinement will be essential to maintain scientific credibility and maximize their value for healthcare decision-making.
In the landscape of drug development, Indirect Treatment Comparisons (ITCs) have become indispensable for assessing the relative efficacy and safety of new therapeutics when head-to-head randomized controlled trials (RCTs) are unavailable or infeasible [5] [6]. Health Technology Assessment (HTA) bodies worldwide increasingly rely on ITCs to inform reimbursement and pricing decisions, particularly in oncology and rare diseases [6]. The strength and validity of this evidence directly impact patient access to innovative treatments. However, ITC methodologies are complex and rest on assumptions that, if unmet, can introduce significant uncertainty or bias. This technical guide examines common critiques of ITC evidence from an HTA perspective and outlines robust validation techniques to strengthen its credibility for regulatory and reimbursement submissions.
The choice of ITC method is foundational to evidence strength, as each technique carries specific assumptions and data requirements. Understanding this landscape is crucial for selecting an appropriate method and anticipating potential critiques.
Table 1: Overview of Common Indirect Treatment Comparison Methods
| ITC Method | Core Assumptions | Key Strengths | Inherent Limitations & Common Critiques |
|---|---|---|---|
| Bucher Method [7] [5] | Constancy of relative effects (homogeneity, similarity) | Simple for pairwise comparisons via a common comparator | Limited to simple networks with a single common comparator; cannot incorporate multi-arm trials [7]. |
| Network Meta-Analysis (NMA) [7] [5] [83] | Constancy of relative effects (homogeneity, similarity, consistency) | Simultaneously compares multiple interventions; can incorporate both direct and indirect evidence [7]. | Complex with challenging-to-verify consistency assumptions; threatened by sparse data and publication bias [7] [83]. |
| Matching-Adjusted Indirect Comparison (MAIC) [7] [5] [23] | Constancy of relative or absolute effects | Adjusts for population imbalances using IPD; useful for single-arm trials. | Limited to pairwise comparisons; requires IPD; can only adjust for known, measured effect modifiers [7] [23]. |
| Simulated Treatment Comparison (STC) [5] [23] | Constancy of relative or absolute effects | Uses outcome regression models to predict comparative effectiveness. | Limited to pairwise comparisons; relies on strong modeling assumptions and correct model specification [23]. |
| Network Meta-Regression (NMR) [7] [5] | Conditional constancy of relative effects with shared effect modifiers | Explores impact of study-level covariates on treatment effects. | Does not work for multi-arm trials; requires a connected network [7]. |
Figure 1: ITC Method Selection and Its Impact on Evidence Strength. This workflow outlines the decision process for selecting an ITC method based on data availability and network structure, highlighting how choices impact the potential strength and HTA acceptability of the resulting evidence. Methods in red, like unanchored comparisons, rely on stronger assumptions and are viewed as less robust [5] [23].
HTA agencies meticulously evaluate submitted ITCs, with critiques often focusing on several key methodological and clinical areas.
The validity of any ITC hinges on its foundational assumptions. Similarity (of trial designs and patient populations), homogeneity (of treatment effects across studies for the same comparison), and consistency (between direct and indirect evidence within a network) are paramount [7] [83]. HTA bodies frequently critique failures to measure and adjust for effect modifiers, the baseline characteristics that influence the relative treatment effect [23]. For example, differences in disease severity, prior lines of therapy, or standard of care across trials can violate the similarity assumption, making the ITC results unreliable [7].
A significant critique is the failure to adequately assess and account for statistical heterogeneity (variability in treatment effects beyond chance) and inconsistency (discrepancies between direct and indirect evidence) [7] [83]. Submissions that do not provide sensitivity analyses using different statistical models (e.g., fixed-effect vs. random-effects) or that ignore significant inconsistency in the network are viewed as less credible [23].
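As an illustration of the kind of quantitative checks expected here (all numbers are hypothetical), the sketch below computes Cochran's Q and the I² statistic across trials of the same comparison and applies a simple z-test for inconsistency between a pooled direct estimate and the corresponding indirect estimate obtained through a common comparator.

```python
import numpy as np
from scipy import stats

# Hypothetical log hazard ratios and standard errors from four A-vs-B trials.
d = np.array([-0.35, -0.10, -0.42, -0.05])
se = np.array([0.15, 0.20, 0.18, 0.25])

# Fixed-effect pooling, Cochran's Q, and I-squared (statistical heterogeneity).
w = 1.0 / se**2
d_pooled = np.sum(w * d) / np.sum(w)
q = np.sum(w * (d - d_pooled) ** 2)
df = len(d) - 1
i_squared = max(0.0, (q - df) / q) * 100.0
print(f"Pooled direct estimate: {d_pooled:.2f}, Q = {q:.2f} (p = {stats.chi2.sf(q, df):.2f}), I^2 = {i_squared:.0f}%")

# Consistency check: pooled direct A-vs-B estimate versus the indirect estimate via common comparator C.
d_direct, se_direct = d_pooled, np.sqrt(1.0 / np.sum(w))
d_indirect, se_indirect = -0.55, 0.22          # hypothetical Bucher-type estimate (A vs C minus B vs C)
z = (d_direct - d_indirect) / np.sqrt(se_direct**2 + se_indirect**2)
print(f"Inconsistency z = {z:.2f}, two-sided p = {2 * stats.norm.sf(abs(z)):.2f}")
```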
HTA guidelines emphasize the necessity of pre-specifying ITC methods, analysis populations, and key outcomes in a statistical analysis plan (SAP) before conducting analyses [23]. Analyses conducted post-hoc without a pre-specified plan are critiqued for increasing the risk of selective reporting and data-driven results, which inflate the chance of false-positive findings [23] [10]. The EU HTA methodology specifically requires pre-specification to mitigate this risk [23].
HTA bodies strongly discourage naive comparisons (simple, unadjusted cross-trial comparisons that ignore differences in trial design and patient populations) due to their high susceptibility to bias [5] [6]. Similarly, unanchored ITCs (e.g., MAIC or STC performed without a common comparator arm) are viewed with skepticism because they rely on the untestable assumption that all prognostic factors and effect modifiers have been identified and correctly adjusted for [6] [23]. A review of oncology submissions found that authorities more frequently favored anchored or population-adjusted techniques for their superior ability to mitigate bias [6].
To address common critiques and bolster the strength of ITC evidence, researchers should implement the following validation techniques and strategic practices.
The single most effective practice is comprehensive pre-planning. A prospectively written SAP should detail the chosen ITC method, the rationale for its selection, all effect modifiers to be considered, and the planned approach for assessing heterogeneity and inconsistency [23] [84]. For the EU JCA, pre-specification is not just a best practice but a formal requirement [23]. Starting this process early, even before Phase 3 trial finalization, ensures that pivotal trials are designed with future ITCs in mind, improving the validity of similarity assumptions [84].
Before statistical analysis, a thorough investigation of the PICO (Population, Intervention, Comparator, Outcome) elements across trials is essential. This involves comparing trial designs, patient baseline characteristics, and outcome definitions [7] [84]. Engaging clinical experts to evaluate the plausibility of a class effect and the relevance of identified differences is crucial for validating the clinical similarity assumption [10].
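A simple way to structure part of this assessment quantitatively is to compare reported baseline summaries across trials using standardized differences, as in the sketch below; the variables, summary values, and the 0.1 flagging threshold are all hypothetical choices that would need clinical justification.

```python
import numpy as np

# Hypothetical published baseline summaries: (mean, SD) for continuous variables, proportion for binary ones.
baseline = {
    "age_years":        {"trial_A": (62.0, 8.0), "trial_B": (66.0, 9.0), "type": "continuous"},
    "prior_therapy":    {"trial_A": 0.35,        "trial_B": 0.55,        "type": "binary"},
    "ecog_2_or_higher": {"trial_A": 0.10,        "trial_B": 0.12,        "type": "binary"},
}

def standardized_difference(entry: dict) -> float:
    """Standardized mean difference between the two trials' reported baseline summaries."""
    if entry["type"] == "continuous":
        (m1, s1), (m2, s2) = entry["trial_A"], entry["trial_B"]
        return (m1 - m2) / np.sqrt((s1**2 + s2**2) / 2.0)
    p1, p2 = entry["trial_A"], entry["trial_B"]
    return (p1 - p2) / np.sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2.0)

for name, entry in baseline.items():
    smd = standardized_difference(entry)
    flag = "review with clinicians" if abs(smd) > 0.1 else "broadly comparable"
    print(f"{name:18s} SMD = {smd:+.2f} ({flag})")
```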
Statistical validation is critical for supporting ITC findings. Key techniques include sensitivity analyses across alternative model specifications (for example, fixed-effect versus random-effects models), formal assessment of heterogeneity and inconsistency, and scenario analyses that vary the included studies and the set of adjusted effect modifiers.
Table 2: Essential Toolkit for ITC Validation and Analysis
| Tool Category | Specific Examples | Function in ITC Validation |
|---|---|---|
| Statistical Software Packages | R (e.g., `gemtc`, `netmeta`), SAS, WinBUGS/OpenBUGS, JAGS | Performs core statistical analyses for NMA, MAIC, and other ITC methods; facilitates consistency checks and model fitting [5]. |
| Effect Modifier & Prognostic Factor Lists | Clinical expert consultation, systematic literature reviews, clinical guidelines | Identifies key patient and disease characteristics that must be balanced or adjusted for to uphold the similarity assumption [23] [84]. |
| Risk of Bias/Study Quality Tools | Cochrane Risk of Bias tool, modified tools for non-randomized studies | Assesses the internal validity of included studies; informs sensitivity analyses by excluding high-risk studies [23]. |
| Pre-Specified Statistical Analysis Plan (SAP) | Protocol documenting methods, covariates, outcomes, and sensitivity analyses | Mitigates risks of selective reporting and post-hoc data dredging; required by EU HTA guidance [23]. |
In scenarios where ITCs are used to establish clinical equivalence for cost-comparison analyses, moving beyond a mere lack of significant difference is vital. The most promising method involves estimating a non-inferiority ITC within a Bayesian framework, followed by a probabilistic comparison of the indirectly estimated treatment effect against a pre-specified non-inferiority margin [10]. This provides a quantitative and transparent measure of the evidence for equivalence, which is more persuasive to HTA bodies than narrative summaries [10].
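A minimal sketch of this idea is shown below, assuming that the two anchored contrasts can be approximated by normal posterior distributions on the log hazard ratio scale and that a non-inferiority margin of HR 1.25 has been pre-specified; all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws = 100_000

# Hypothetical posterior summaries (log hazard ratios versus the common comparator C),
# approximated here by normal distributions for simplicity.
d_ac = rng.normal(-0.30, 0.12, n_draws)   # A vs C
d_bc = rng.normal(-0.25, 0.15, n_draws)   # B vs C

d_ab = d_ac - d_bc                        # indirect A vs B on the log hazard ratio scale
ni_margin = np.log(1.25)                  # pre-specified non-inferiority margin (HR 1.25)

prob_non_inferior = np.mean(d_ab < ni_margin)
print(f"Posterior probability that A is non-inferior to B: {prob_non_inferior:.3f}")
```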
The strength of evidence from Indirect Treatment Comparisons is not inherent but is built through meticulous methodology, rigorous validation, and transparent reporting. Common critiques from HTA bodies largely stem from failures in addressing core assumptions, inadequate pre-specification, and insufficient sensitivity analyses. By adopting a strategic approach that integrates early planning, comprehensive similarity assessment, robust statistical validation, and quantitative evaluation of equivalence, researchers can significantly enhance the credibility and acceptability of their ITC evidence. This, in turn, facilitates informed healthcare decision-making and ensures that patients have timely access to effective new therapies.
The strategic identification of common comparators is a cornerstone of robust Indirect Treatment Comparisons, directly influencing their acceptance by regulatory and HTA bodies. Success hinges on a deep understanding of the available methodological toolkit, from foundational approaches like the Bucher method to advanced population-adjusted techniques, and on the deliberate selection of the most appropriate method for the specific clinical and evidentiary context. As the landscape evolves with initiatives like the EU HTA Regulation, researchers must proactively address challenges of heterogeneity and data timing. Future efforts should focus on standardizing methodologies, developing best practices for complex scenarios like vaccine assessments, and fostering early dialogue with HTA bodies to ensure that ITCs continue to provide reliable, decision-grade evidence for the evaluation of new therapeutics.