Clinical and Molecular Hepatology



Yeo and Yang: Correspondence on Letter regarding “Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma”

Correspondence on Letter regarding “Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma”

Yee Hui Yeo1, Ju Dong Yang1,2,3
Received November 10, 2023; Accepted November 13, 2023
Dear Editor,
We thank the LLM-Liver Investigators for commenting on our recent publication [1]. Their commentary presents a comprehensive assessment of the capability of various large language models (LLMs) to provide patient education on liver diseases [2]. It is a timely study that contributes to understanding the role of multiple recently introduced LLMs in disseminating healthcare information. The high performance of the LLMs in delivering appropriate responses underscores the potential for AI to support healthcare providers in disseminating accurate medical information. By setting a benchmark for LLM performance, the study not only contributes to the academic field but also lays the groundwork for the development of AI-driven patient education tools. The findings could play a crucial role in bridging the gap in health literacy and ensuring equitable access to medical information across diverse patient populations [3].
Several aspects need clarification for proper interpretation of the results. First, the term “steatotic liver disease” is a recently developed nomenclature [4]. The reliability of LLMs in providing up-to-date information depends on their training with current datasets. As the authors used “fatty liver disease” instead of “steatotic liver disease”, the study would benefit from disclosing the training data cutoff date of each LLM to ensure that the information provided meets contemporary standards. Furthermore, using the same terminology as recent guidelines would enhance the study’s applicability and clarity. Second, the methodology behind the question selection process remains unclear. An explanation of how the 30 questions for each LLM were chosen, potentially from clinical guidelines and the authors’ experience, would strengthen the study’s robustness. It would also pre-emptively address concerns that the selected questions might disproportionately favor the capabilities of LLMs.
Additionally, the “washout” period used in the study to minimize recall bias may raise concerns, because recalling responses from previous rounds may influence subsequent evaluations. Finally, it is unclear whether there is any statistically significant difference in performance among the LLMs, as no statistical analysis was performed.
In conclusion, the authors’ study represents an important contribution to the field of AI in patient education. By addressing the areas outlined above, the study can achieve greater validity and provide a more reliable framework for assessing the capability of LLMs in patient education. The medical community looks forward with great anticipation to additional research that builds on this work. We again congratulate the authors on a study that advances our ability to understand and harness AI’s potential to improve patient outcomes and health literacy.

Authors’ contribution

Drafting manuscript: YHY. Critical review and supervision: JDY.

Conflicts of Interest

The authors have no conflicts to disclose.


Abbreviations

LLMs, large language models



References

1. Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023;29:721-732.
2. Zhang Y, Wu L, Mu Z, Ren L, Chen Y, Liu H, et al. Assessing state-of-the-art online large language models for patient education regarding steatotic liver disease. Clin Mol Hepatol 2023 Nov 10. doi: 10.3350/cmh.2023.0440.

3. Singh N, Lawrence K, Richardson S, Mann DM. Centering health equity in large language model deployment. PLOS Digit Health 2023;2:e0000367.
4. Rinella ME, Lazarus JV, Ratziu V, Francque SM, Sanyal AJ, Kanwal F, et al. A multisociety Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 2023;78:1966-1986.
