# Bayesian adaptive design for pediatric clinical trials incorporating a community of prior beliefs | BMC Medical Research Methodology


### Case study

The case study is a published phase III, placebo-controlled, randomized pediatric clinical trial that evaluated the safety and efficacy of a single treatment with one of two doses (4 U/kg and 8 U/kg) of Botox, given with standardized physical therapy (PT), in pediatric patients with lower limb spasticity; pediatric approval was based on this trial. The same product was previously approved in adults on the basis of a single phase III placebo-controlled study in a similar indication. In the pediatric trial, 412 subjects aged 2 years to 16 years and 11 months were randomized in a 1:1:1 ratio to the Botox 8 U/kg group, Botox 4 U/kg group, or control group. The full label information is available at https://www.fda.gov/media/131444/download [22].

The original analyses for both the adult and pediatric trials used frequentist approaches, so we re-analyzed the primary efficacy endpoints using a Bayesian model to obtain posterior means and standard deviations, which are more convenient for applying Bayesian adaptive design methods.

Table 1 summarizes the pediatric and adult clinical trial designs and the results of the primary efficacy endpoints used in the approval of Botox for the treatment of pediatric lower limb spasticity. For a normally distributed endpoint, the posterior distribution is approximately normal, so an approximate 95% credible interval (CI) can be computed as posterior mean ± 2 × posterior SD. The approximate 95% CI for the treatment difference between the Botox 4 U/kg group and control is then (-0.10, 0.30), which contains zero, i.e., there is not enough evidence to declare treatment superiority over control. Therefore, we aimed to propose an innovative Bayesian adaptive design that can demonstrate treatment efficacy while maintaining good trial properties.
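The interval arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes a posterior mean of 0.10 and a posterior SD of 0.10 for the Botox 4 U/kg versus control difference; both values are back-calculated from the reported interval (-0.10, 0.30) rather than read from Table 1, and the function name is illustrative.

```python
from statistics import NormalDist


def approx_credible_interval(post_mean, post_sd, k=2.0):
    """Approximate 95% credible interval: posterior mean +/- k * posterior SD."""
    return post_mean - k * post_sd, post_mean + k * post_sd


# Assumed values, back-calculated from the reported interval (-0.10, 0.30).
post_mean, post_sd = 0.10, 0.10

low, high = approx_credible_interval(post_mean, post_sd)

# Posterior probability that treatment beats control under the normal approximation.
prob_benefit = 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)

print(f"approximate 95% CI: ({low:.2f}, {high:.2f})")
print(f"Pr(difference > 0 | data) = {prob_benefit:.3f}")
```

Under these assumed inputs the posterior probability of a positive difference is about 0.84, which is another way of seeing that the interval still covers zero.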

### Prior beliefs

For the case study, we focused on the Bayesian analysis on two arms, the Botox 4 U/kg group and control group as the Botox 4 U/kg group was less efficacious (Table 1) and arm dropping is not the focus of our proposed method. We specified the priors separately for the two arms, which would lead to a prior on the difference between the Botox 4 U/kg treatment group and control group, and then we created a community of priors to be imposed on the difference between treatment (Botox 4 U/kg) and control to be consistent with the original analysis.

The skeptical prior is the pediatric stand-alone prior following a normal distribution with mean zero and standard deviation (SD) 0.5, which indicates no difference between treatment and placebo, i.e., a skeptical viewpoint about treatment benefit. Our choice of SD for the skeptical prior was based on a prior sensitivity analysis. We investigated the impact of different choices of SD (0.1, 0.2, 0.5, 1, 2, 5, 10) on the posterior estimates of the difference between treatment and control and found that the posterior estimates were similar when SD ≥ 0.5. Therefore, we decided on a weakly informative prior of \(N(0, 0.5^2)\) for the difference between treatment and control. The enthusiastic prior is extrapolated from the adult trial results, with mean 0.20 and SD 0.10 obtained from the adult trial posterior distribution, i.e., an enthusiastic viewpoint about treatment benefit. The noninformative prior is a flat distribution with heavy tails centered at zero with SD 100, which provides no prior information with large variability and is therefore equivalent to the frequentist approach, i.e., it lets the data speak for itself with no underlying strong opinion about treatment benefit. The choice of SD for the noninformative prior was also based on sensitivity analysis.

We also calculated the prior effective sample size (ESS) to quantify the amount of information borrowed from the adult data through the prior [23]. We used the variance-ratio (VR) method [24] to compute the prior ESS in our case of a normal-normal model with conjugate prior. Based on Table 1, the variance of the pediatric trial data is \(\sigma^2=0.1^2\), so the prior ESS is \(\frac{\sigma^2}{\sigma_{skep}^2}=\frac{0.1^2}{0.5^2}=0.04\) for the skeptical prior and \(\frac{\sigma^2}{\sigma_{enthus}^2}=\frac{0.1^2}{0.1^2}=1\) for the enthusiastic prior. Therefore, both the skeptical and enthusiastic priors have minimal informativeness. Additionally, the prior ESS is \(\frac{\sigma^2}{\sigma_{noninf}^2}=\frac{0.1^2}{100^2}=0.000001\) for the noninformative prior.
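The variance-ratio calculation can be reproduced directly. The sketch below uses only the SDs quoted in the text; the function and dictionary names are illustrative and not taken from the authors' code.

```python
def prior_ess_variance_ratio(data_variance, prior_variance):
    """Prior effective sample size by the variance-ratio method."""
    return data_variance / prior_variance


data_sd = 0.10  # SD of the pediatric trial data, as quoted from Table 1
prior_sds = {"skeptical": 0.5, "enthusiastic": 0.1, "noninformative": 100.0}

ess = {
    name: prior_ess_variance_ratio(data_sd**2, sd**2)
    for name, sd in prior_sds.items()
}

for name, value in ess.items():
    print(f"{name:>15}: prior ESS = {value:.6f}")
```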

Figure 1 plots the distributions of these three different prior beliefs: the pink dashed line is the skeptical prior, the black solid line is the enthusiastic prior, and the green dashed line is the noninformative or flat prior.

In this section, we will re-design the phase III pediatric clinical trial to illustrate an innovative Bayesian adaptive design method incorporating two prior distributions which represent two extreme ends of prior beliefs: skeptical and enthusiastic. For demonstration purposes, we focused on the Bayesian sequential monitoring for the treatment difference between the Botox 4 U/kg group and control group in the virtual execution of the pediatric trial. So, we are re-designing a new trial that has two arms and randomization is 1:1 for allocation to control and treatment (the Botox 4 U/kg group).

In the re-design using the proposed Bayesian adaptive design method, the early stopping criterion for success was based on the skeptical prior and the early stopping criterion for futility was based on the enthusiastic prior. We adopted the Haybittle–Peto approach for the choice of early decision boundaries [25, 26], i.e., the same threshold at every interim analysis:

a) stop early for success based on skeptical prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{skeptical}\;\mathrm{prior}\right)>s_e$$

b) stop early for futility based on enthusiastic prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{enthusiastic}\;\mathrm{prior}\right)<f_e$$

where \(s_e\) is the early success boundary and \(f_e\) is the early futility boundary. The success and futility criteria were also evaluated at the final analysis, with late success boundary \(s_l\) and late futility boundary \(f_l\):

a) achieve late success based on skeptical prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{skeptical}\;\mathrm{prior}\right)>s_l$$

b) achieve late futility based on enthusiastic prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{enthusiastic}\;\mathrm{prior}\right)<f_l$$

If the trial does not meet any of the early or late success or futility criteria, the result is inconclusive. An inconclusive pediatric clinical trial still needs to fulfill post-marketing requirements, typically without a subsequent trial. Therefore, a definitive answer is important in pediatric trials, as it would prevent the delayed use or non-use of beneficial therapies [4].
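The decision rules above can be sketched in Python. This is a simplified illustration, not the model used in the paper: it assumes a normal prior placed directly on the treatment difference and a normal likelihood with a known standard error, so the posterior is available in closed form. The function names, the example estimates, and the standard errors are hypothetical; the boundaries 0.998 and 0.70 are the early success and early futility values reported later in the text.

```python
from math import sqrt
from statistics import NormalDist

# (mean, SD) of the priors on the treatment difference, as given in the text.
SKEPTICAL = (0.0, 0.5)
ENTHUSIASTIC = (0.2, 0.1)


def prob_treatment_better(diff_hat, se, prior):
    """Pr(delta > 0 | data, prior) for a conjugate normal-normal model.

    Assumes the observed difference diff_hat is normal with known SE.
    """
    prior_mean, prior_sd = prior
    precision = 1.0 / prior_sd**2 + 1.0 / se**2
    post_mean = (prior_mean / prior_sd**2 + diff_hat / se**2) / precision
    post_sd = sqrt(1.0 / precision)
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


def decide(diff_hat, se, success_bound, futility_bound):
    """Success is judged under the skeptical prior, futility under the enthusiastic one."""
    if prob_treatment_better(diff_hat, se, SKEPTICAL) > success_bound:
        return "success"
    if prob_treatment_better(diff_hat, se, ENTHUSIASTIC) < futility_bound:
        return "futility"
    return "continue or inconclusive"


# Hypothetical interim estimates (difference, standard error).
strong = decide(0.30, 0.08, success_bound=0.998, futility_bound=0.70)
weak = decide(-0.10, 0.08, success_bound=0.998, futility_bound=0.70)
middle = decide(0.10, 0.10, success_bound=0.998, futility_bound=0.70)
print(strong, weak, middle, sep=" | ")
```

Checking success under the skeptical prior and futility under the enthusiastic prior means that each conclusion has to convince the reader least inclined to accept it. An estimate that satisfies neither rule leaves the trial running at an interim, or inconclusive at the final analysis.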

Under the framework of Bayesian methodology, null and alternative hypotheses are defined as different scenarios under which we assess the performance of the simulated trials [27]. The null and alternative hypotheses are \(H_0:\delta=0\) versus \(H_1:\delta>0\), where \(\delta\) is the difference between the true treatment effect for the Botox 4 U/kg group and control group. For all the adaptive designs, the following operating characteristics were evaluated:

1) Type I error rate: under the null hypothesis scenario (\(H_0:\delta=0\)) of having no difference, the proportion of such simulations that falsely declared the treatment was superior to control, i.e., the total proportions of early and late success under \(H_0\)

2) Power: under a particular alternative hypothesis scenario (\(H_1:\delta=\delta_{\mathrm{target}}\)) of having a target difference of 0.05 (i.e., the observed difference between Botox 4 U/kg group and control is 0.05), the proportion of such simulations that concluded that the treatment was superior to control, i.e., the total proportions of early and late success under \(H_1\)

3) Futility rate: the total proportions of early and late futility under \(H_0\) or \(H_1\) separately

4) Mean number of subjects: the average sample size across all the simulations under \(H_0\) or \(H_1\) separately

5) Mean trial duration: the average trial duration (in weeks) across all the simulations under \(H_0\) or \(H_1\) separately

We need to calibrate and justify the decision boundaries for the proposed innovative Bayesian adaptive design by exploring their effect on the operating characteristics. When the Haybittle–Peto boundary is determined using the frequentist approach, the same significance threshold is used at every interim analysis, i.e., 0.001, and the final analysis is performed at the standard significance level of 2.5%. In the Bayesian approach, the trade-off between the strength of skepticism in the prior and the early success boundary allows for more flexible decision making in the trial relative to the Haybittle–Peto boundary, i.e., a relaxed Haybittle–Peto approach. More skepticism in the prior impacts the final analysis, whereas increasing the early decision threshold avoids some of this impact, possibly at the cost of a lower early stopping rate when favorable results are seen. We chose 99.8% as the early success boundary \(s_e\) because it balanced these concerns and controlled the overall type I error rate. The early futility boundary \(f_e\) was tuned to 70% to maintain power. At the final analysis, the late futility boundary \(f_l\) was set to the more stringent value of 85%.
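To illustrate how boundaries translate into operating characteristics, the following Monte Carlo sketch simulates a simplified version of the two-prior design. It is not a reproduction of the FACTS simulations and rests on several assumptions that are not in the text: a known per-subject SD of 0.15 (hypothetical), four equally spaced looks at 32, 64, 96 and 128 subjects per arm, outcomes available immediately at each look (the 12-week follow-up is ignored), a late success boundary of 0.975, and 2,000 simulated trials per scenario.

```python
import random
from collections import Counter
from math import sqrt
from statistics import NormalDist

SKEPTICAL, ENTHUSIASTIC = (0.0, 0.5), (0.2, 0.1)  # (mean, SD) priors on the difference


def prob_better(diff_hat, se, prior):
    """Pr(delta > 0 | data, prior) under a conjugate normal model with known SE."""
    m0, s0 = prior
    precision = 1 / s0**2 + 1 / se**2
    mean = (m0 / s0**2 + diff_hat / se**2) / precision
    return 1 - NormalDist(mean, sqrt(1 / precision)).cdf(0)


def simulate_trial(delta, rng, looks=(32, 64, 96, 128), subject_sd=0.15,
                   s_e=0.998, f_e=0.70, s_l=0.975, f_l=0.85):
    """Return (outcome, total subjects) for one simulated two-arm trial.

    subject_sd, looks and s_l are assumptions made for this sketch.
    """
    n_prev, sum_ctrl, sum_trt = 0, 0.0, 0.0
    for n in looks:  # n = cumulative subjects per arm at this look
        k = n - n_prev
        # The sum of k normal outcomes has mean k * mu and SD sqrt(k) * subject_sd.
        sum_ctrl += rng.gauss(0.0, subject_sd * sqrt(k))
        sum_trt += rng.gauss(k * delta, subject_sd * sqrt(k))
        n_prev = n
        diff_hat = (sum_trt - sum_ctrl) / n
        se = subject_sd * sqrt(2 / n)
        final = n == looks[-1]
        if prob_better(diff_hat, se, SKEPTICAL) > (s_l if final else s_e):
            return "success", 2 * n
        if prob_better(diff_hat, se, ENTHUSIASTIC) < (f_l if final else f_e):
            return "futility", 2 * n
    return "inconclusive", 2 * looks[-1]


def operating_characteristics(delta, n_sims=2000, seed=1):
    """Estimate success, futility and inconclusive rates and the mean sample size."""
    rng = random.Random(seed)
    results = [simulate_trial(delta, rng) for _ in range(n_sims)]
    counts = Counter(outcome for outcome, _ in results)
    summary = {label: counts[label] / n_sims
               for label in ("success", "futility", "inconclusive")}
    summary["mean_subjects"] = sum(n for _, n in results) / n_sims
    return summary


null = operating_characteristics(delta=0.0)   # success rate = type I error
alt = operating_characteristics(delta=0.05)   # success rate = power
print("H0:", null)
print("H1:", alt)
```

Rerunning the sketch with different values of `s_e` or `f_e` shows the trade-off described above: a stricter early success boundary lowers the type I error contribution of the interim looks, while a higher early futility boundary stops more trials early at some cost in power.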

In addition to the innovative design, we also investigated the fixed design and several alternative adaptive designs with variations in early stopping criteria (Table 2). We started with the fixed design, which did not include any interim analysis, and then moved on to investigate adaptive design options. As a comparison to adaptive design 3, we also looked at similar designs that incorporate only one type of prior belief at the interim analyses: Bayesian adaptive design 1 only stops early for success based on the skeptical prior, while adaptive design 2 only stops early for futility based on the enthusiastic prior. Similar to adaptive design 3, adaptive design 4 includes both early success and early futility decision rules, but both are based on the noninformative prior.

Frequentist group-sequential design (GSD) is often considered the benchmark for comparison. To ascertain that Bayesian adaptive design 4 with the noninformative prior is comparable to the frequentist GSD, we reran the simulation with frequentist decision rules chosen to form a 1-to-1 correspondence with the respective Bayesian decision boundaries under the noninformative prior, and calculated p-values based on a one-sided t-test at both the interim and final analyses. The Bayesian and corresponding frequentist decision rules at the interim analyses:

a) stop early for success based on noninformative prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{noninformative}\;\mathrm{prior}\right)>99.8\%$$

Comparable to frequentist one-sided t-test p-value < 0.002

b) stop early for futility based on noninformative prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{noninformative}\;\mathrm{prior}\right)<70\%$$

Comparable to frequentist one-sided t-test p-value > 0.3

The Bayesian and corresponding frequentist decision rule at the final analysis:

c) achieve late success based on noninformative prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{noninformative}\;\mathrm{prior}\right)>97.5\%$$

Comparable to frequentist one-sided t-test p-value < 0.025

d) achieve late futility based on noninformative prior if posterior probability

$$\Pr\left(\mathrm{treatment}>\mathrm{control}\mid\mathrm{data},\mathrm{noninformative}\;\mathrm{prior}\right)<85\%$$

Comparable to frequentist one-sided t-test p-value > 0.15

We could compare the operating characteristics of the frequentist GSD to Bayesian adaptive design 4 with non-informative prior.
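The 1-to-1 correspondence between the Bayesian boundaries and the p-value cut-offs can be seen in a small Python sketch. It assumes a flat prior and a known standard error, so that the posterior for the difference is normal around the estimate, and it uses a one-sided z-test as a stand-in for the t-test used in the paper. The function names and the example estimates are hypothetical.

```python
from statistics import NormalDist


def one_sided_p_value(diff_hat, se):
    """P-value for H0: delta = 0 versus H1: delta > 0 (z-test approximation)."""
    return 1.0 - NormalDist().cdf(diff_hat / se)


def flat_prior_prob_better(diff_hat, se):
    """Pr(delta > 0 | data) when the posterior is N(diff_hat, se^2), i.e. a flat prior."""
    return 1.0 - NormalDist(diff_hat, se).cdf(0.0)


# Bayesian boundary and the matching one-sided p-value cut-off, from the text.
THRESHOLD_PAIRS = {
    "early success": (0.998, 0.002),
    "early futility": (0.70, 0.30),
    "late success": (0.975, 0.025),
    "late futility": (0.85, 0.15),
}

for diff_hat in (-0.05, 0.0, 0.12):  # hypothetical estimates, SE fixed at 0.05
    posterior = flat_prior_prob_better(diff_hat, 0.05)
    p_value = one_sided_p_value(diff_hat, 0.05)
    print(f"estimate {diff_hat:+.2f}: Pr(delta > 0) = {posterior:.4f}, p = {p_value:.4f}")
```

In this setting the posterior probability equals one minus the one-sided p-value, which is why each Bayesian boundary in the list above pairs with a p-value cut-off that sums with it to 1.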

### Simulation settings

Design simulations were performed using the Fixed and Adaptive Clinical Trial Simulator (FACTS) version 6.3 [28]. For the execution aspects of the simulated trial, the maximum sample size was set to 256 and accrual was simulated in FACTS with a mean of 2 subjects per week and no dropouts, in line with the original trial. Patients were randomized to two arms (control, Botox 4 U/kg treatment) with equal allocation (1:1), and their scheduled visit was 12 weeks after randomization. The primary endpoint is a continuous variable following a normal distribution; therefore, a Bayesian independent dose model was used under the FACTS Core Design-Continuous module:

$$Y\sim N(\theta_d,\sigma^2)$$

$$\theta_d\sim N(\mu_d, v_d^2)$$

$$\sigma^2\sim\text{Inverse-Gamma}\left(\alpha,\beta\right)=\text{Scaled-inverse-chi-squared}\left(\frac{\sigma_n}{2},\frac{\sigma_\mu^2\sigma_n}{2}\right)$$

where \(d=1\) denotes the control group and \(d=2\) denotes the Botox 4 U/kg treatment group. As mentioned before, different prior beliefs will be imposed on the difference between treatment and control, i.e., \(\theta_2-\theta_1\). In FACTS, the prior for each experimental arm needs to be specified separately, so to achieve the same prior specification as denoted in Fig. 1, we could introduce priors for \(\theta_d,\ d=1,2\) as follows:

• Under the skeptical prior belief: \(\theta_1\sim N(0, 0.3536^2)\), \(\theta_2\sim N(0, 0.3536^2)\), so that \(\theta_2-\theta_1\sim N\left(0, 0.5^2\right)\) since \(\sqrt{0.3536^2+0.3536^2}=0.5\).

• Under the enthusiastic prior belief: \(\theta_1\sim N(0, 0.0707^2)\), \(\theta_2\sim N(0.2, 0.0707^2)\), so that \(\theta_2-\theta_1\sim N\left(0.2, 0.1^2\right)\) since \(\sqrt{0.0707^2+0.0707^2}=0.1\).

• Under the noninformative prior belief: \(\theta_1\sim N\left(0, 70.71^2\right)\), \(\theta_2\sim N\left(0, 70.71^2\right)\), so that \(\theta_2-\theta_1\sim N(0, 100^2)\) since \(\sqrt{70.71^2+70.71^2}=100\).
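The per-arm SDs in the three bullets follow from splitting the variance of the difference equally between two independent arms, which gives a per-arm SD equal to the difference SD divided by the square root of 2. A minimal Python sketch (function name illustrative):

```python
from math import hypot, sqrt


def per_arm_sd(diff_sd):
    """SD of each arm's prior so that two independent arms give the target SD on the difference."""
    return diff_sd / sqrt(2)


prior_diff_sds = {"skeptical": 0.5, "enthusiastic": 0.1, "noninformative": 100.0}

for name, diff_sd in prior_diff_sds.items():
    arm_sd = per_arm_sd(diff_sd)
    # hypot(a, a) = sqrt(a^2 + a^2) recovers the SD of the difference.
    recombined = hypot(arm_sd, arm_sd)
    print(f"{name:>15}: per-arm SD = {arm_sd:.4f}, SD of difference = {recombined:.4f}")
```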

For the prior imposed on \(\sigma^2\), the Inverse-Gamma distribution could be reparametrized as the Scaled-inverse-chi-squared distribution [29]:

$$\chi^{-2}\left(\sigma^2\mid\sigma_n,\sigma_\mu\right)=\frac{1}{\Gamma\left(\frac{\sigma_n}{2}\right)}\left(\frac{\sigma_\mu^2\sigma_n}{2}\right)^{\frac{\sigma_n}{2}}\left(\sigma^2\right)^{-\frac{\sigma_n}{2}-1}\exp\left(-\frac{\sigma_\mu^2\sigma_n}{2\sigma^2}\right)$$

where the parameter \(\sigma_n>0\) is the degrees of freedom or weight, and \(\sigma_\mu>0\) is the scale or central value. As noted in Gelman et al. [29], the Scaled-inverse-chi-squared distribution provides the information equivalent to \(\sigma_n\) observations with squared standard deviation \(\sigma_\mu^2\), and increasing \(\sigma_n\) corresponds to increasing the effective strength of the prior. For the prior choice, a weakly informative prior was considered instead of a noninformative prior, since the resulting posterior distribution was highly sensitive to the choice of weight \(\sigma_n\) and scale \(\sigma_\mu\), and a prior that is noninformative on the log scale may not work [30]. A prior sensitivity analysis was conducted to investigate the impact of different prior distributions of \(\sigma^2\) (different combinations of weight \(\sigma_n\) and scale \(\sigma_\mu\)) on the type I error rate, and we chose \(\sigma_n=1, \sigma_\mu=0.07\) to control the type I error at the nominal level of 2.5%.
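The reparametrization can be verified numerically. The sketch below writes the scaled-inverse-chi-squared density term by term from the weight and scale, converts the chosen values (weight 1, scale 0.07) to the shape and scale of an Inverse-Gamma distribution, and confirms that both densities agree. The function names and the evaluation points are illustrative.

```python
from math import exp, gamma


def inverse_gamma_pdf(x, alpha, beta):
    """Inverse-Gamma density with shape alpha and scale beta."""
    return beta**alpha / gamma(alpha) * x ** (-alpha - 1) * exp(-beta / x)


def scaled_inv_chi2_pdf(x, weight, scale):
    """Scaled-inverse-chi-squared density in the (weight, scale) parametrization."""
    half = weight / 2
    return (
        (scale**2 * weight / 2) ** half
        / gamma(half)
        * x ** (-half - 1)
        * exp(-(scale**2) * weight / (2 * x))
    )


def to_inverse_gamma(weight, scale):
    """Map (weight, scale) to Inverse-Gamma (alpha, beta)."""
    return weight / 2, scale**2 * weight / 2


alpha, beta = to_inverse_gamma(weight=1, scale=0.07)
print(f"alpha = {alpha}, beta = {beta:.5f}")

for variance in (0.001, 0.0049, 0.05):  # arbitrary evaluation points
    a = scaled_inv_chi2_pdf(variance, weight=1, scale=0.07)
    b = inverse_gamma_pdf(variance, alpha, beta)
    print(f"sigma^2 = {variance}: {a:.4f} vs {b:.4f}")
```

With a weight of 1, the prior carries roughly the information of a single observation with variance 0.07 squared, which is what makes it weakly informative.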

Using the specified model, we then performed FACTS simulations under the different hypothetical subject response scenarios presented in Table 3. To optimize the number of interims, we also simulated trials with between 1 and 18 interim analyses that were evenly spaced by number of patients enrolled (Table 4). Note that the scenario with 0 interims corresponds to the fixed design, which serves as a reference for each of the adaptive designs. For each adaptive design candidate, 10,000 virtual trials were simulated in FACTS under each hypothetical scenario and each specification of the number of interims. These simulations allow us to evaluate operating characteristics, including type I error rate and power, and to estimate the expected trial duration and number of subjects enrolled as the number of interim analyses increases.
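As a rough illustration of what "evenly spaced by number of patients enrolled" can look like, the sketch below places K interims at multiples of 256/(K+1) enrolled subjects and converts them to expected calendar weeks at 2 subjects per week. The exact spacing rule used in FACTS is not stated in the text, so the rounding rule, the deterministic accrual, and the function names are assumptions.

```python
def interim_schedule(n_interims, max_subjects=256, accrual_per_week=2.0):
    """Return (subjects enrolled, expected week) for each interim analysis.

    Assumes interims split enrollment into n_interims + 1 equal blocks.
    """
    looks = []
    for k in range(1, n_interims + 1):
        enrolled = round(k * max_subjects / (n_interims + 1))
        looks.append((enrolled, enrolled / accrual_per_week))
    return looks


def max_trial_weeks(max_subjects=256, accrual_per_week=2.0, follow_up_weeks=12):
    """Expected duration of a trial that runs to full accrual plus follow-up."""
    return max_subjects / accrual_per_week + follow_up_weeks


for n_interims in (0, 1, 3, 7):
    print(n_interims, interim_schedule(n_interims))

print("expected weeks to final analysis:", max_trial_weeks())
```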

Operating characteristics could be obtained directly from FACTS for the fixed design and Bayesian adaptive designs 1, 2, and 4. For the proposed adaptive design 3, additional handling of the FACTS output generated under the FACTS Core Design-Continuous module was conducted in R [31], and figures were produced using the package ggplot2, following the steps below (the FACTS screenshots and R code are provided in supplementary file 1):

Step 1: Create a FACTS adaptive design with the skeptical prior and include the interims and the QOIs but do not implement any stopping criteria so all interims are evaluated, and every simulation runs to full accrual and final analysis, then output weeks files for every simulation.

Step 2: Create a new FACTS adaptive design and change the prior to the enthusiastic prior and re-simulate without adaptation by keeping the same random number seed and making no other changes so that exactly the same patient responses are simulated.

Step 3: Aggregate the weeks files from Step 1 and Step 2 separately, so that the same simulated trials are available once under the skeptical prior and once under the enthusiastic prior.

Step 4: Load the 2 sets of aggregated weeks files into R and join them on the Sim and Scenario ID columns so we have posterior probabilities under either skeptical or enthusiastic prior at each interim.

Step 5: Analyze the joined data for each simulated trial to determine which trials stop early for success under the skeptical prior at an interim, which stop early for futility under the enthusiastic prior at an interim, which continue to full accrual without early stopping, and which are inconclusive at the final analysis.
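Steps 4 and 5 amount to a join followed by a first-crossing search over the interims. The authors carried this out in R (supplementary file 1); the sketch below is an independent Python illustration using toy rows. The column names `Scenario ID`, `Sim`, `Interim` and `Pr`, the use of the interim index as a join key, and the late success boundary of 0.975 are assumptions made for the example and may not match the FACTS weeks-file layout.

```python
from collections import defaultdict

KEYS = ("Scenario ID", "Sim", "Interim")  # hypothetical column names


def join_weeks(skeptical_rows, enthusiastic_rows):
    """Inner-join the two aggregated tables and group the looks by (scenario, sim)."""
    enth = {tuple(r[k] for k in KEYS): r["Pr"] for r in enthusiastic_rows}
    trials = defaultdict(list)
    for row in skeptical_rows:
        key = tuple(row[k] for k in KEYS)
        if key in enth:
            trials[key[:2]].append((row["Interim"], row["Pr"], enth[key]))
    return {trial: sorted(looks) for trial, looks in trials.items()}


def classify(looks, s_e=0.998, f_e=0.70, s_l=0.975, f_l=0.85):
    """Find the first look at which a boundary is crossed; the last look is the final analysis."""
    for i, (_, p_skeptical, p_enthusiastic) in enumerate(looks):
        final = i == len(looks) - 1
        stage = "late" if final else "early"
        if p_skeptical > (s_l if final else s_e):
            return f"{stage} success"
        if p_enthusiastic < (f_l if final else f_e):
            return f"{stage} futility"
    return "inconclusive"


def rows(values):
    """Build toy rows from {sim: [Pr at look 1, Pr at look 2]} for scenario 1."""
    return [
        {"Scenario ID": 1, "Sim": sim, "Interim": i + 1, "Pr": p}
        for sim, probs in values.items()
        for i, p in enumerate(probs)
    ]


# Toy posterior probabilities for three simulated trials with two looks each.
skeptical = rows({1: [0.9990, 0.9995], 2: [0.40, 0.35], 3: [0.90, 0.95]})
enthusiastic = rows({1: [0.9999, 0.9999], 2: [0.60, 0.55], 3: [0.97, 0.98]})

outcomes = {
    trial: classify(looks)
    for trial, looks in join_weeks(skeptical, enthusiastic).items()
}
for trial, outcome in sorted(outcomes.items()):
    print(trial, outcome)
```

Because both FACTS runs evaluate every interim without stopping, the stopping rule can be applied after the fact: the first look at which either boundary is crossed defines the trial's outcome and the sample size at which it would have stopped.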