Hepatocellular carcinoma prediction model performance decreases with long-term antiviral therapy in chronic hepatitis B patients
Abstract
Background/Aims
Existing hepatocellular carcinoma (HCC) prediction models are derived mainly from pretreatment or early on-treatment parameters. We reassessed the dynamic changes in the performance of 17 HCC models in patients with chronic hepatitis B (CHB) during long-term antiviral therapy (AVT).
Methods
Among 987 CHB patients administered long-term entecavir therapy, 660 patients had 8 years of follow-up data. Model scores were calculated using on-treatment values at 2.5, 3, 3.5, 4, 4.5, and 5 years of AVT to predict HCC occurrence over the subsequent three years. Model performance was assessed with the area under the receiver operating characteristic curve (AUROC). The original model cutoffs for distinguishing different levels of HCC risk were evaluated by the log-rank test.
Results
The AUROCs of the 17 HCC models varied from 0.51 to 0.78 when using on-treatment scores from years 2.5 to 5. Models with a cirrhosis variable showed numerically higher AUROCs (pooled estimates 0.65–0.73 for treated, untreated, or mixed models) than models without it (treated or mixed models: 0.61–0.68; untreated models: 0.51–0.59). Stratification into low, intermediate, and high-risk levels using the original cutoff values no longer reflected the true HCC incidence when scores obtained after 3.5 years of AVT were used for models without the cirrhosis variable, or after 4 years of AVT for models with it.
Conclusions
The performance of existing HCC prediction models, especially models without the cirrhosis variable, decreased in CHB patients on long-term AVT. The optimization of existing models or the development of novel models for better HCC prediction during long-term AVT is warranted.
INTRODUCTION
The hepatitis B virus (HBV) is the primary etiology of hepatocellular carcinoma (HCC) worldwide. Antiviral therapy (AVT) can profoundly suppress HBV DNA replication, attenuate hepatic necroinflammation and fibrosis, and halt progression to HCC, thereby reducing liver-related mortality [1,2]. Nonetheless, AVT reduces but does not eliminate the risk of HCC development. Therefore, accurate risk prediction is needed to support optimized HCC surveillance.
The existing HCC prediction models are mainly derived from untreated patients with chronic hepatitis B (CHB) or from patients within 1–2 years of AVT initiation. We previously found that most HCC prediction models demonstrated acceptable performance when using variable values obtained within two years of AVT [3]. However, long-term AVT modifies the clinical course of CHB by markedly decreasing serum HBV DNA levels, improving liver function (e.g., lowering alanine aminotransferase [ALT] and raising serum albumin [ALB]), and even reversing cirrhosis [4,5]. The performance of models based mainly on pretreatment variables is therefore expected to decline as the duration of AVT increases. Because retrieving pretreatment or early on-treatment information for HCC prediction is often not feasible in the clinical setting, calculating HCC risk from on-treatment variables would be an alternative approach. It is therefore worth characterizing any declining trajectory in the predictive performance of existing HCC models.
The validation of the existing HCC models with on-treatment variables during long-term AVT is essential for future model refinement and development. Therefore, in the present study, we comprehensively validated and reassessed the predictive performance of 17 HCC models in a multicenter cohort of CHB patients on long-term AVT.
MATERIALS AND METHODS
Study design
This is an external validation study of 17 HCC prediction models in CHB patients administered long-term AVT, based on a multicenter prospective cohort in China [6]. This cohort enrolled 987 treatment-naïve CHB patients aged 18–65 years between 2013 and 2015 who were followed for nearly eight years, until September 2022. Patients coinfected with hepatitis C virus or human immunodeficiency virus were excluded. The inclusion criteria for the validation cohort were as follows: (1) men or women aged 18–70 years; (2) treatment-naïve with chronic HBV-induced fibrosis (F2/F3) or with histological or clinical evidence of cirrhosis; (3) pretreatment HBV DNA >2,000 IU/mL for HBeAg-positive or >200 IU/mL for HBeAg-negative cirrhotic patients, and pretreatment HBV DNA >20,000 IU/mL for HBeAg-positive or >2,000 IU/mL for HBeAg-negative noncirrhotic patients. The study was approved by the ethics committee of Beijing Friendship Hospital, Capital Medical University (IRB numbers BJFHEC/2013-027 and 2016-P2-021-01), and informed consent was obtained from every patient.
At the initiation of the study, all participants were treated with entecavir at a dosage of 0.5 mg/day. During the follow-up period, ten patients (1%) switched to tenofovir disoproxil fumarate or tenofovir alafenamide. Follow-up with liver biochemistry, HBV DNA, and liver stiffness measurement (LSM) was performed at baseline and every 26 weeks thereafter. HBeAg was tested at baseline and every 1–2 years. Liver histology was reviewed in patients with liver biopsies available at baseline, week 78, and week 260 to evaluate the stage of fibrosis.
The status of cirrhosis during AVT was defined by liver biopsy, the presence of gastroesophageal varices on endoscopy, or by meeting at least two of the following four criteria: a) liver surface irregularity and parenchymal nodularity on imaging (ultrasonography [US], contrast-enhanced computed tomography [CT], or magnetic resonance imaging [MRI]); b) platelet count (PLT) <100×10⁹/L with no other cause; c) ALB <35.0 g/L or international normalized ratio >1.3; and d) LSM >12.4 kPa (when ALT <5× the upper limit of normal) [7].
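For illustration, this definition can be expressed as a small decision rule. The sketch below, written in R (the language used for the study's analyses), uses hypothetical input names and assumes laboratory values in the units given above; it is not the study's actual data structure.

```r
# Minimal sketch of the on-treatment cirrhosis definition (hypothetical inputs):
# cirrhosis if proven by biopsy or by varices on endoscopy, or if >=2 of 4 clinical criteria are met.
has_cirrhosis <- function(biopsy_cirrhosis, varices_on_endoscopy,
                          imaging_nodularity, plt, alb, inr, lsm, alt_x_uln) {
  criteria <- c(
    imaging   = isTRUE(imaging_nodularity),               # a) nodularity/irregularity on US/CT/MRI
    platelets = isTRUE(plt < 100),                        # b) PLT <100 x 10^9/L, no other cause
    synthetic = isTRUE(alb < 35.0) || isTRUE(inr > 1.3),  # c) ALB <35.0 g/L or INR >1.3
    stiffness = isTRUE(lsm > 12.4 && alt_x_uln < 5)       # d) LSM >12.4 kPa when ALT <5x ULN
  )
  isTRUE(biopsy_cirrhosis) || isTRUE(varices_on_endoscopy) || sum(criteria) >= 2
}

# Example: no biopsy/varices, but low platelets and high LSM with near-normal ALT -> cirrhosis
has_cirrhosis(FALSE, FALSE, FALSE, plt = 85, alb = 38, inr = 1.1, lsm = 14.2, alt_x_uln = 0.8)
```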
HCC surveillance was conducted by alpha-fetoprotein (AFP) measurement and liver ultrasonography every 26 weeks. Diagnosis of HCC was performed in accordance with the recommendations of the American Association for the Study of Liver Diseases.
HCC prediction models evaluated in the present study
We externally validated 14 HCC prediction models identified in a previous systematic review [8] and 3 additional models (aMAP [9], CAGE-B [10], and SAGE-B [10]) published afterward; an overview of the derivation cohort characteristics of these models is presented in Appendix 1.
We classified the models as “treated”, “untreated”, or “mixed” according to the treatment status of the derivative cohort of CHB patients (all treated, all untreated, or a mix of treated and untreated) in the original reports. Furthermore, we directed special attention to the inclusion of the variable “cirrhosis” in the original model scoring formula. Therefore, we stratified the models into four categories: 1) untreated models without the cirrhosis variable (REACH-B [11], NGM1-HCC [12], and NGM2-HCC [12]); 2) untreated models with the cirrhosis variable (GAG-HCC [13]); 3) treated or mixed models without the cirrhosis variable (mREACH-BI [14], mREACH-BII [14], LSM-HCC [15], SAGE-B, mPAGE-B [16], PAGE-B [17], and aMAP); 4) treated or mixed models with the cirrhosis variable (AASL-HCC [18], CAMD [19], REAL-B [20], CU-HCC [21], RWS-HCC [22], and CAGE-B).
Working definitions for predictors and outcomes used in the current study
For each model, six serial analyses were performed to predict three-year HCC occurrence, with 2.5, 3, 3.5, 4, 4.5, and 5 years of AVT defined as the respective reference timepoints. In each analysis, on-treatment variable values at the corresponding reference timepoint were used as the “baseline inputs” for calculating model risk scores, and model performance in predicting HCC occurrence over the subsequent three years was evaluated (Appendix 2).
Patients were eligible for inclusion at a given reference timepoint if they had not been diagnosed with HCC before that timepoint and had at least one clinical visit within the subsequent three years. Follow-up duration was defined as the interval between the reference timepoint and the date of the last clinical visit or of HCC diagnosis, whichever came first, within the subsequent three years.
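As an illustration of this landmark design, the following R sketch derives eligibility and follow-up time at a given reference timepoint; the data frame and column names (hcc_time and last_visit_time, both in years since AVT initiation) are hypothetical rather than the study's actual variables.

```r
# Sketch of landmark eligibility and follow-up time at a reference timepoint (years of AVT).
define_landmark <- function(d, ref_time, horizon = 3) {
  # Eligible: no HCC diagnosis before the timepoint and at least one visit afterwards
  keep <- (is.na(d$hcc_time) | d$hcc_time >= ref_time) & d$last_visit_time > ref_time
  d <- d[keep, , drop = FALSE]
  # Event: HCC diagnosed within the subsequent three years
  d$event <- !is.na(d$hcc_time) & d$hcc_time <= ref_time + horizon
  # Follow-up: time to HCC or last visit, whichever comes first, capped at the horizon
  end    <- ifelse(d$event, d$hcc_time, d$last_visit_time)
  d$time <- pmin(end, ref_time + horizon) - ref_time
  d
}

# Example: cohort eligible at year 2.5, followed for HCC through year 5.5
# landmark_2.5 <- define_landmark(cohort, ref_time = 2.5)
```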
Statistical analysis
Model discrimination was assessed by the area under the receiver operating characteristic curve (AUROC) with 95% confidence intervals (CIs). The criteria used to judge discrimination by AUROC were: poor, <0.60; possibly helpful, 0.60–0.75; and clearly useful, >0.75 [23]. Head-to-head comparisons of AUROCs were performed, with P-values adjusted by the Benjamini–Hochberg method to control the false discovery rate [24].
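A sketch of how such an assessment could be coded is shown below. The pROC package, the object names, and the placeholder P-values in the last line are assumptions for illustration (pROC is not among the packages listed in this section), not the study's actual implementation.

```r
# Sketch: AUROC with 95% CI for one model's on-treatment score, plus Benjamini-Hochberg
# adjustment of pairwise AUROC-comparison P-values (pROC assumed available; names illustrative).
library(pROC)

roc1 <- roc(response = as.numeric(d$event), predictor = d$score_modelA, quiet = TRUE)
auc(roc1)      # point estimate of the AUROC
ci.auc(roc1)   # 95% CI (DeLong by default)

# Pairwise comparison of two correlated AUROCs, then false discovery rate control
roc2  <- roc(response = as.numeric(d$event), predictor = d$score_modelB, quiet = TRUE)
p_one <- roc.test(roc1, roc2)$p.value
p.adjust(c(p_one, 0.030, 0.200), method = "BH")   # placeholder P-values for other pairs
```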
The association of the HCC risk score at each on-treatment timepoint with the subsequent three-year HCC incidence was evaluated with Cox proportional hazards regression, treating the model score as a continuous variable. To ensure comparability across models, the hazard ratio (HR) of HCC per 10% increase in score was calculated for each model. A lower 95% CI limit of the HR above 1 indicates that an increase in the risk score is significantly associated with an increased probability of HCC events.
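The sketch below illustrates one plausible way to obtain an HR per 10% increase in score: rescaling the score so that one unit equals 10% of its observed range. This rescaling and the variable names are assumptions for illustration, not necessarily the exact approach used in the study.

```r
# Sketch: Cox model with the continuous score rescaled so one unit = 10% of its observed range
library(survival)

d$score10 <- d$score_modelA / (0.1 * diff(range(d$score_modelA, na.rm = TRUE)))
fit <- coxph(Surv(time, event) ~ score10, data = d)
summary(fit)$conf.int   # HR per 10%-of-range increase, with 95% CI
```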
To assess the performance of the originally recommended score cutoffs in HCC risk stratification, cumulative HCC incidences in the high, intermediate, and low-risk groups were calculated for each model at the different on-treatment timepoints using the Kaplan‒Meier method. Differences in HCC incidence between risk groups were compared using the log-rank test.
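For instance, the stratification and log-rank comparison could be implemented as below; the cutoff values and column names are hypothetical stand-ins for each model's originally recommended cutoffs, and the data frame is the landmark dataset from the earlier sketch.

```r
# Sketch: stratify by (hypothetical) cutoffs, estimate 3-year cumulative HCC incidence
# per group with Kaplan-Meier, and compare risk groups with the log-rank test.
library(survival)

d$risk_group <- cut(d$score_modelA, breaks = c(-Inf, 9, 17, Inf),   # illustrative cutoffs
                    labels = c("low", "intermediate", "high"))
km <- survfit(Surv(time, event) ~ risk_group, data = d)
summary(km, times = 3)                                # cumulative incidence at 3 years = 1 - survival
survdiff(Surv(time, event) ~ risk_group, data = d)    # log-rank test across risk groups
```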
Calibration was evaluated both quantitatively using Brier scores and graphically using calibration plots for four models (REACH-B, REAL-B, mPAGE-B, and CAMD) whose original studies reported projected three-year HCC risks for the corresponding model scores.
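As a sketch, the Brier score at three years can be computed as the mean squared difference between each patient's projected three-year risk and the observed outcome; this simple form ignores censoring, for which weighted versions exist (e.g., inverse probability of censoring weighting). The column names are hypothetical.

```r
# Sketch: unweighted 3-year Brier score from projected risks and observed outcomes
# (censoring-adjusted variants would weight by the censoring distribution).
brier_3y <- mean((d$projected_risk_3y - as.numeric(d$event))^2, na.rm = TRUE)
brier_3y   # lower values indicate better overall prediction accuracy
```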
Missing values of variables, including demographic variables (age and sex), medical history (family history of HCC and diabetes), lifestyle factors (alcohol consumption), and laboratory variables (HBeAg, HBV DNA, PLT, ALB, ALT, total bilirubin [TBIL], AFP, and LSM), were handled by multiple imputation. Rubin’s rules were used to combine the point estimates and standard errors across the five imputation sets.
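A sketch of this workflow with the mice package is shown below; m = 5 matches the five imputation sets, while the seed, model formula, and variable names (including score10 from the earlier Cox sketch) are assumptions for illustration.

```r
# Sketch: multiple imputation with mice (5 imputed datasets), a Cox model fitted in each,
# and Rubin's-rule pooling of estimates and standard errors.
library(mice)
library(survival)

imp    <- mice(d, m = 5, seed = 2022, printFlag = FALSE)
fits   <- with(imp, coxph(Surv(time, event) ~ score10))   # model fitted in each imputed dataset
pooled <- pool(fits)                                       # Rubin's rules
summary(pooled)
```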
Sensitivity analyses were performed using complete-case and imputed datasets, and using on-treatment versus pretreatment cirrhosis when calculating scores for models involving the variable “cirrhosis”. Subgroup analyses were conducted in cirrhotic patients and in patients stratified as intermediate or high risk according to each model’s score and cutoff at the initiation of AVT.
Statistical analyses were conducted using R version 4.2.1 (R package MICE, survival, psfmi, iterativeBMA, and ggplot2). All reported P-values are 2-sided, with <0.05 considered statistically significant.
RESULTS
Patient characteristics
Among the 987 patients with CHB receiving AVT, 660 who had not developed HCC by year 2.5 and who had at least one visit after year 2.5 were included in the present study. At the time of recruitment, 75.3% of the included patients were male, 14.7% had a family history of HCC, 3.2% had diabetes mellitus, and 62.9% had pretreatment cirrhosis (Table 1).
At AVT year 2.5, 45.2% of the patients had cirrhosis, and this percentage decreased to 35.8% at AVT year 5. On-treatment laboratory profiles significantly improved after the initiation of AVT and then stabilized or decreased slightly during years 2.5 to 5, with the median HBV DNA decreasing from 1.0 to 0.5 log IU/mL, ALT decreasing from 23.0 to 21.0 U/L, and LSM decreasing from 7.9 to 7.4 kPa (Table 1).
During a median follow-up of 7.04 years (interquartile range [IQR], 6.97–7.25), 72 HCC cases were diagnosed, of which 45 were diagnosed after year 2.5. With 2.5, 3, 3.5, 4, 4.5, and 5 years of AVT as the reference timepoints, the numbers (proportions) of HCC cases within the subsequent three years were 33 (5.33%), 30 (4.65%), 28 (4.24%), 21 (3.38%), 20 (3.59%), and 16 (2.85%), respectively (Table 1).
On-treatment changes in HCC risk scores
After an early dramatic decline in the first two years, the HCC risk scores decreased slowly or remained stable from years 2.5 to 5 in the total cohort (Table 2, Appendix 3). Model scores also declined with the prolongation of AVT in both patients who did and did not develop HCC. However, the difference in scores between HCC and non-HCC patients was narrower in patients with pretreatment cirrhosis than in the total cohort (Appendix 3).
On-treatment changes in model discriminations predicting HCC development
From the initiation of AVT until year 5, a steadily decreasing trend in AUROC was observed for all models when using serial on-treatment variables (Appendix 4). When using on-treatment scores from years 2.5 to 5, the AUROCs of the risk models varied from 0.51 to 0.78 (Fig. 1).
For all three untreated models without the cirrhosis variable, the AUROCs obtained using on-treatment variable values at years 2.5 to 5 were poor, with pooled estimates varying from 0.51 to 0.59 (Table 3). All seven treated or mixed models without the cirrhosis variable showed possibly helpful AUROCs, with pooled estimates varying from 0.61 to 0.68 (Table 3). Importantly, models with the cirrhosis variable (whether derived from untreated, treated, or mixed cohorts, excluding CAGE-B) showed numerically higher AUROCs, with pooled estimates varying from 0.65 to 0.73 (Table 3).
Head-to-head comparisons after adjustment for multiple testing showed significantly higher AUROCs for models with the cirrhosis variable than for models without it when using on-treatment scores at years 3.5 and 4 (P-values from 0.0011 to 0.0495). No significant differences were found between any two models with the cirrhosis variable at any on-treatment timepoint (P-values from 0.0512 to 0.9996) (Appendix 5).
Trends in the HRs for HCC development associated with changes in on-treatment risk scores
Over the study period, the magnitude of the increase in HCC risk associated with every 10% increase in model score decreased over time, with a declining trend in HR estimates observed for all models (Fig. 2).
In the three untreated models without the cirrhosis variable, on-treatment scores did not significantly correlate with HCC risk at any timepoint (all P-values >0.05). For all seven treated or mixed models without the cirrhosis variable, HCC incidence increased significantly with increasing on-treatment scores at years 2.5 and 3, but the association gradually became nonsignificant thereafter.
For most models with the cirrhosis variable, whether derived from treated, mixed, or untreated CHB patients, scores remained significantly correlated with HCC incidence even when using year-5 variable values. A 10% increase in score was associated with a 47% increase in HCC risk for GAG-HCC (HR 1.47; 95% CI 1.02–2.10), a 41% increase for REAL-B (1.41; 1.01–1.96), a 34% increase for CU-HCC (1.34; 1.03–1.75), a 32% increase for RWS-HCC (1.32; 1.00–1.74), and a 29% increase for AASL-HCC (1.29; 1.01–1.65).
Performance of the score cutoffs originally recommended for HCC risk stratification
In the original reports, 12 models recommended score cutoffs that stratified patients into low, intermediate, and high-risk groups (Appendix 1, Fig. 3, Appendix 6). For treated or mixed models without the cirrhosis variable, HCC incidence did not significantly differ among the three risk groups according to on-treatment scores after year 3.5, except for mPAGE-B, which remained significant until year 4 (P-value=0.0481). For the untreated model with the cirrhosis variable, the recommended cutoff of 100 for model GAG-HCC failed to significantly stratify HCC risks between low and high-risk groups at most timepoints (P-values >0.05). For treated or mixed models with the cirrhosis variable, HCC risks remained significantly different across on-treatment risk groups until year 4 (P-values <0.05). The recommended cutoff value of 4.5 for the RWS-HCC model significantly distinguished risk groups even at 5 years of AVT (1.80% in the low-risk group and 5.15% in the high-risk group, P-value=0.0115).
However, the difference in HCC incidence between the high-risk and intermediate-risk groups gradually diminished with prolonged AVT (Fig. 3). For the mPAGE-B, CAMD, CAGE-B, and SAGE-B models, HCC risks were numerically higher in intermediate-risk groups than in high-risk groups according to on-treatment scores after year 4 (Fig. 3, Appendix 6).
HCC risk in patients stratified into low-risk groups was consistently found to be low for most models (ranging from 0.0% to 3.69%), except for GAG-HCC (2.72% to 4.61%) and LSM-HCC (2.32% to 4.58%). No HCC developed in the low-risk group defined by aMAP at a cutoff of 50 when using on-treatment scores from year 2.5 to year 4.5.
On-treatment model calibration for three-year HCC development
Models with the cirrhosis variable (CAMD and REAL-B) showed lower Brier scores than models without the cirrhosis variable (REACH-B and mPAGE-B). Irrespective of the on-treatment timepoint, REAL-B had the lowest Brier score, ranging from 0.022 to 0.046 (Appendix 7). The calibration plot revealed that REACH-B continuously underestimated HCC risks at all timepoints (Appendix 7). For mPAGE-B, CAMD, and REAL-B, the HCC risks were only well calibrated for patients within the low and relatively low-risk quantiles predicted by serial on-treatment model scores (Appendix 7).
Sensitivity and subgroup analysis
Sensitivity analysis with the complete-case datasets or using pretreatment cirrhosis rather than on-treatment cirrhosis showed similar results (Appendix 8). Subgroup analysis in patients with cirrhosis or patients stratified as intermediate or high-risk categories by the original models at the initiation of AVT demonstrated lower AUROCs than in the total cohort for most models (Appendix 9). The relative merits of models with the cirrhosis variable were not preserved in these high-risk subgroup patients (Appendix 9).
DISCUSSION
In this study, we validated and compared the predictive performance of 17 published HCC models in a prospective CHB cohort receiving long-term AVT. The predictability of all HCC risk models decreased with the prolongation of AVT, with modest to poor discriminations when using on-treatment values for AVT from years 2.5 to 5. However, models with the cirrhosis variable, derived from treated, untreated, or mixed CHB patients, achieved higher discrimination than models without the cirrhosis variable. We also found that the reported cutoffs for HCC risk stratification might require some amendment in the era of long-term AVT.
The key finding of the present study is that the predictability of the models attenuates when serial on-treatment values are used during long-term AVT. A previous report on Caucasian patients with CHB similarly found that the performance of PAGE-B, CU-HCC, and GAG-HCC was suboptimal when estimated at year 5 of AVT [25]. Two possible reasons might explain this decreasing trend. First, long-term AVT modifies the baseline HCC risk reflected in the pretreatment values of key predictive variables such as HBV DNA, ALT, AST, PLT, and LSM [9,10]. Therefore, when calculated with on-treatment values, the prognostic significance of these variables in existing HCC models is reduced after long-term AVT; our study showed this through the diminishing trend in the HR estimate of each variable (Appendix 10). Second, patient age, an independent risk factor for HCC, increases with the duration of AVT, which in turn increases the model score for HCC risk. The relative weights of these on-treatment predictors might therefore change with long-term AVT, so model scores based on the relative weights of predictors at pretreatment or early on-treatment timepoints, without accounting for dynamic changes in the key predictors, would perform suboptimally after long-term AVT. Future studies are required to investigate whether adjusting the relative weights of these predictors or applying artificial intelligence might improve the predictability of model scores during long-term AVT.
Nevertheless, the performance of models containing cirrhosis as a variable generally declined more slowly than that of the other models. Cirrhosis is a crucial risk factor for HCC development regardless of the use of antiviral drugs [26], but it can regress with long-term AVT. Indeed, the proportion of patients with cirrhosis in our validation cohort decreased from 62.9% at pretreatment to 35.8% at treatment year 5. Furthermore, cirrhosis is a more stable factor, less susceptible to acute flares, than other indicators [27]. Therefore, it is not surprising that the inclusion of cirrhosis allows better discrimination of HCC [28]. The inclusion of other on-treatment predictors that also gauge the severity of liver fibrosis (e.g., LSM, PLT, or ALB) [29-31] did not add significant value during long-term AVT in our cohort. Taken together, this evidence suggests that models that include the cirrhosis variable might be better options for HCC surveillance in CHB patients on long-term AVT [26].
Furthermore, an amendment to the original cutoffs for HCC risk stratification might be required when applying model scores in patients on long-term AVT. Accurate cutoff values are essential for stratifying HCC risk and optimizing HCC surveillance in patients with CHB. Using the originally recommended cutoffs, we found that the difference in HCC risk between the high and intermediate-risk groups diminished with the prolongation of AVT. Therefore, further optimization of cutoffs to identify truly high-risk patients would be justified. On the other hand, the three-year HCC risk in the low-risk groups defined by GAG-HCC and LSM-HCC was relatively high (2.32% to 4.61%) in our validation cohort. These values far exceed the recommended surveillance threshold of 0.2%/year in hepatitis B carriers and approach or exceed the threshold of 1.5%/year in patients with cirrhosis [32]. In the late antiviral period, an amendment to these cutoffs might help identify patients with minimal HCC risk, enable less intensive HCC surveillance, and spare patients undue anxiety and unnecessary interventions.
In addition, subgroup analysis showed that the AUROCs of HCC prediction models decreased more profoundly with AVT in patients with pretreatment cirrhosis than in the total cohort. Our results are consistent with previous findings of numerically lower AUROCs in cirrhotic patients using pretreatment model scores [33-36]. In our study, we found that the difference in the risk scores between patients with HCC development and those without was less obvious in cirrhotic patients than in the total cohort and further narrowed with the length of AVT. It is probable that in this relatively “homogeneous” subgroup, the predictive value of conventional predictors themselves or the classification of the predictors might be attenuated in discriminating HCC development [23]. Future development of novel biomarkers is warranted to improve the predictability for these high-risk CHB patients [37].
With a follow-up of eight years after the initiation of AVT, our external validation conducted in patients from 22 tertiary medical centers provided meaningful results on the utility of the 17 HCC prediction models. In addition, the comprehensive statistical analysis, including sensitivity and subgroup analysis with multiple imputation, increased the robustness of the study results.
However, several limitations should be mentioned. First, our study population involved only a single ethnicity; thus, the results cannot be generalized to other ethnic groups without further validation. Second, because the data were generated from a cohort treated with entecavir, it is unclear whether the conclusions apply to patients treated with other nucleoside/nucleotide analogs. Third, the relatively small number of endpoints might have limited the statistical power of the model comparisons. Fourth, calibration was not evaluated for all models because the necessary parameters were not reported in the original publications. Further external validation studies with larger sample sizes and broader population characteristics are needed to confirm these findings.
In conclusion, our study found that the performance of existing HCC prediction models in CHB patients with long-term AVT decreased to modest or poor levels. In addition to the baseline measurements, on-treatment modification of HCC risk factors should be emphasized in the future refinement and novel development of HCC prediction models for patients on long-term AVT.
Notes
Authors’ contribution
Hong You, Yuanyuan Kong, and Jidong Jia designed the study. Xiaoning Wu and Xiaoqian Xu drafted the manuscript and prepared the figures and tables. Xiaoqian Xu performed the data analysis. Hong You, Yuanyuan Kong, Jidong Jia and Hwai-I Yang revised the manuscript. Jialing Zhou, Yameng Sun, Huiguo Ding, Wen Xie, Guofeng Chen, Anlin Ma, Hongxin Piao, Bingqiong Wang, Shuyan Chen, Tongtong Meng, and Xiaojuan Ou collected the data and interpreted the results. All authors approved the final version of the paper.
Conflicts of Interest
The authors have no conflicts to disclose.
Acknowledgements
This work was supported by the National Major Science and Technology Projects of China (No. 2018ZX10302204, 2017ZX10203202-003), the High-level Public Health Technical Talents of the Beijing Municipal Health Commission (No. XUEKEGUGAN-010-018), and the Beijing Municipal Administration of Hospitals Incubating Program (No. PX2023005).
SUPPLEMENTAL MATERIAL
Supplementary material is available at Clinical and Molecular Hepatology website (http://www.e-cmh.org).
Abbreviations
AFP
alpha fetoprotein
ALT
alanine aminotransferase
AVT
antiviral therapy
AUROC
area under the receiver operating characteristic curve
CHB
chronic hepatitis B
CI
confidence intervals
HBV
hepatitis B virus
HCC
hepatocellular carcinoma
HR
hazard ratio
LSM
liver stiffness measurement
Notes
Study Highlights
• The performance of the 17 HCC prediction models decreased with the prolongation of antiviral therapy, with modest to poor AUROCs using on-treatment scores from years 2.5 to 5.
• Models containing the cirrhosis variable showed higher predictive performance and declined less profoundly than models without it during late antiviral treatment.
• During long-term antiviral therapy, further optimization of existing HCC prediction models or development of novel models is justified.