Letter 2 regarding “Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma”

Yiwen Zhang; Liwei Wu; Zepeng Mu; Linlin Ren; Ying Chen; Hanyun Liu; Lili Xu; Yangang Wang; Yaxing Wang; Susan Cheng; Yih Chung Tham; Bin Sheng; Tien Yin Wong; Hongwei Ji

doi:10.3350/cmh.2023.0440

Letter to the Editor

Clin Mol Hepatol. 2024; 30(1): 113-117.

Published online: November 10, 2023

Yiwen Zhang^1,^*, Liwei Wu^2,^*, Zepeng Mu^1,^*, Linlin Ren^3,^#, Ying Chen^4,^#, Hanyun Liu^5,^#, Lili Xu^1,^#, Yangang Wang^1,^#, Yaxing Wang^6,^#, Susan Cheng^7,^#, Yih Chung Tham^8,^9,^#, Bin Sheng^10,^#, Tien Yin Wong^9,¹¹, Hongwei Ji^11,¹²

Corresponding authors: Tien Yin Wong, Tsinghua Medicine, Tsinghua University, Haidian District, Beijing, 100084, China
Tel: +86-10-62782835, Fax: +86-10-62782835, E-mail: wongtienyin@tsinghua.edu.cn

Hongwei Ji, Tsinghua Medicine, Tsinghua University, Haidian District, Beijing, 100084, China
Tel: +86-10-62782835, Fax: +86-10-62782835, E-mail: hongweijicn@gmail.com

^*Contributed equally.

^#LLM-Liver Investigators.

Editor: Ji Won Han, Catholic University of Korea, Korea

Received October 29, 2023; Revised November 8, 2023; Accepted November 10, 2023

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

See the letter "Correspondence on Letter regarding “Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma”" on page 124.

See the Original "Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma" on page 721.

Keywords: Natural language processing; Artificial intelligence; Non-alcoholic fatty liver disease; Patient education as topic

Dear Editor,

We read with great interest the recently published study analyzing the performance of ChatGPT with respect to the management of cirrhosis and hepatocellular carcinoma [1]. In addition to these advanced liver diseases, steatotic liver disease (SLD) also represents a considerable burden on global health, as it affects one-third of the worldwide population [2]. SLD requires long-term self-management and continuous support because of its slow progression, the emphasis on lifestyle changes, and the need for regular patient-physician interactions. For patients diagnosed with SLD, education therefore plays a pivotal role in understanding, managing, and possibly reversing their condition. Large language models (LLMs), generative AI systems trained on vast volumes of data and capable of producing human-like text, have emerged as promising aids for patient education [3], particularly because they support interaction through natural language dialogue [4]. However, because the efficacy of LLMs in SLD patient education might vary between models, their performance needs to be compared directly. We therefore conducted a comparative evaluation study to assess the performance of five leading LLMs in responding to SLD-related queries.

Our study was performed between September 8 and 28, 2023. We curated 30 common SLD-related queries spanning domains such as risk factors, clinical testing and diagnosis, treatment, follow-up, and prognosis, drawing on guideline-based topics and our clinical experience (Table 1) [5,6]. Each query was posed as a separate and independent prompt to five LLMs: ChatGPT-3.5, ChatGPT-4, Google Bard, Meta Llama2 and Anthropic Claude2, which yielded 30 responses per LLM chatbot (150 in total). The generated responses were then randomly ordered within each set of questions and stripped of revealing information (e.g., statements such as “I’m not a doctor” from ChatGPT) so that the reviewers were blinded to which LLM had produced each response. Three seasoned attending-level physicians independently graded the responses as either “appropriate” or “inappropriate” over five rounds, each on a separate day, with an overnight washout interval in between to mitigate memory bias (Supplementary Fig. 1). Specifically, the responses were graded as “appropriate” when they were free from errors and “inappropriate” when they contained potential factual errors that could harm or mislead the average patient. The final grade for each chatbot response was determined by majority consensus, that is, the grade assigned by at least two of the three expert graders.
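The random ordering and blinding step described above could be sketched as follows. This is only an illustration under stated assumptions: the function name `blind_responses`, the neutral labels A to E, the fixed seed and the placeholder texts are hypothetical and are not taken from the study, and the removal of revealing phrases is not attempted here.

```python
import random

MODELS = ["ChatGPT-3.5", "ChatGPT-4", "Bard", "Llama2", "Claude2"]


def blind_responses(responses, seed=0):
    """Shuffle each question's responses and hide the model names.

    responses: {question: {model_name: response_text}}
    Returns (blinded, key):
      blinded: {question: [(label, response_text), ...]} in shuffled order
      key:     {question: {label: model_name}}, to be kept from the graders
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    blinded, key = {}, {}
    for question, by_model in responses.items():
        models = list(by_model)
        rng.shuffle(models)  # random order within this question
        labels = [chr(ord("A") + i) for i in range(len(models))]
        blinded[question] = [(lab, by_model[m]) for lab, m in zip(labels, models)]
        key[question] = dict(zip(labels, models))
    return blinded, key


# Toy input with placeholder texts, not real model output.
toy = {
    "Can fatty liver disease be reversed?": {
        m: f"placeholder answer {i}" for i, m in enumerate(MODELS)
    }
}
blinded, key = blind_responses(toy)
```

Keeping the label-to-model key separate from the blinded list is what allows the grades to be mapped back to each LLM after grading is complete.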

We assessed the performance of the five LLMs in responding to SLD-related queries. As shown in Table 1, ChatGPT-4 provided 29 of 30 (96.7%) appropriate responses, followed by Bard and Llama2 with 27 of 30 (90.0%) each, and ChatGPT-3.5 and Claude2 with 24 of 30 (80.0%) each (chi-square test: χ²=6.17, P=0.18). A notable area of concern was that the LLMs frequently treated fatty liver disease as synonymous with nonalcoholic fatty liver disease (NAFLD). This oversimplification can lead to inaccuracies. For example, ChatGPT-3.5 replied to the question “Are there different stages of fatty liver disease, and how do they differ?” with the following response: “Yes, there are different stages of fatty liver disease, which is also known as nonalcoholic fatty liver disease (NAFLD). … The stages of NAFLD are typically categorized as follows: 1. Simple Steatosis (Fatty Liver): … 2. Nonalcoholic Steatohepatitis (NASH): … 3. Fibrosis: … 4. Cirrhosis: …”
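As a consistency check on the reported statistic, the chi-square test can be recomputed from the Table 1 counts. The sketch below uses only the Python standard library and computes both the Pearson and the likelihood-ratio forms; the letter does not state which variant was used, so the function names and the choice of variants are assumptions. On these counts the likelihood-ratio form comes out at about 6.17 with 4 degrees of freedom, in line with the reported value, whereas the Pearson form is about 5.66.

```python
import math

# Appropriate-response counts out of 30 queries, as reported in Table 1.
appropriate = {
    "ChatGPT-3.5": 24,
    "ChatGPT-4": 29,
    "Bard": 27,
    "Llama2": 27,
    "Claude2": 24,
}
N_PER_MODEL = 30


def chi_square_statistics(counts, n_per_model):
    """Return (pearson, likelihood_ratio, dof) for a models x 2 table."""
    k = len(counts)
    expected_ok = sum(counts.values()) / k  # equal group sizes
    expected_bad = n_per_model - expected_ok
    pearson = 0.0
    lr = 0.0
    for ok in counts.values():
        bad = n_per_model - ok
        for observed, expected in ((ok, expected_ok), (bad, expected_bad)):
            pearson += (observed - expected) ** 2 / expected
            if observed > 0:  # a zero cell contributes nothing to the LR sum
                lr += 2 * observed * math.log(observed / expected)
    return pearson, lr, k - 1  # dof = (rows - 1) * (2 - 1)


def chi2_sf_even_dof(x, dof):
    """Upper-tail chi-square probability; closed form for even dof only."""
    if dof % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


pearson, lr, dof = chi_square_statistics(appropriate, N_PER_MODEL)
print(f"Pearson chi-square = {pearson:.2f}, P = {chi2_sf_even_dof(pearson, dof):.3f}")
print(f"Likelihood-ratio chi-square = {lr:.2f}, P = {chi2_sf_even_dof(lr, dof):.3f}")
```

Either way the P value is above 0.05, so the choice of variant does not change the reading of the comparison.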

This evaluation study found that, among five state-of-the-art LLMs, ChatGPT-4 generated largely appropriate responses to patient queries regarding SLD, with an appropriateness rate of 96.7%. The other LLMs provided 80% to 90% appropriate responses. Health literacy, commonly defined as the degree to which individuals have the skills and abilities to obtain, process, and utilize health-related information, has emerged as a critical priority in reducing inequities among patients, including those with SLD [7,8]. Our findings underscore the varied potential of LLM chatbots to provide professional yet patient-friendly health literacy guidance to SLD patients [3]. Whereas prior investigations predominantly focused on ChatGPT-3.5 [1], our study assesses five popular LLMs, namely ChatGPT-3.5, ChatGPT-4, Bard, Llama2 and Claude2, specifically on their proficiency in addressing typical SLD-related patient queries. Notably, one in five responses from ChatGPT-3.5 and Claude2 was inappropriate, which highlights the need for further iterations and probably domain-specific fine-tuning. Although the exact parameters of ChatGPT-4 remain undisclosed, its strong performance may result from its large parameter set, extensive user feedback, advanced reasoning abilities, and the integration of insights from previous models into the system [9]. Strengths of this study include its design, with randomized response ordering, washout periods and a majority consensus grading process. However, there are also limitations. The 30 sample queries may represent only a small proportion of real-world scenarios. In addition, as the field of LLMs evolves at an unprecedented speed, future research is needed to confirm whether LLMs are adapting to new nomenclature, such as metabolic dysfunction-associated steatotic liver disease (MASLD).

In conclusion, generative AI with LLMs, especially ChatGPT-4, may offer valuable opportunities for patient education about SLD.

ACKNOWLEDGMENTS

This study was funded in part by the National Key R&D Program of China (2022YFC2502800), the National Natural Science Foundation of China (82103908), the Shandong Provincial Natural Science Foundation (ZR2021QH014), the Shuimu Scholar Program of Tsinghua University, and the National Postdoctoral Innovative Talent Support Program (BX20230189). The funding sources had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

FOOTNOTES

Authors’ contribution

Acquisition of data: Yiwen Zhang, Hongwei Ji, Liwei Wu, Zepeng Mu. Analysis and interpretation of data: Hongwei Ji. Drafting of the manuscript: All authors. Critical revision of the manuscript for important intellectual content: All authors. Statistical analysis: Yiwen Zhang, Hongwei Ji. Obtained funding: Hongwei Ji. Study supervision: Hongwei Ji.

Conflicts of Interest

The authors have no conflicts to disclose.

SUPPLEMENTAL MATERIAL

Supplementary material is available at the Clinical and Molecular Hepatology website (http://www.e-cmh.org).

Supplementary Figure 1.

Evaluation Process for State-of-the-Art Online Large Language Models in Responding to Steatotic Liver Disease-Related Queries. This figure illustrates the assessment workflow for evaluating the responses generated by LLMs to 30 steatotic liver disease (SLD)-related queries. Each LLM's responses are represented by a unique color. These responses were randomly ordered and stripped of revealing information to ensure a blind assessment. Subsequently, three physicians independently graded the responses as either "appropriate" or "inappropriate" over five days. The final grade for each response was determined using a majority consensus approach.

Table 1.

Performance of large language models in addressing patient queries regarding steatotic liver disease

	GPT-3.5	GPT-4	Bard	Llama2	Claude2
Appropriateness, n (%)	24 (80.0)	29 (96.7)	27 (90.0)	27 (90.0)	24 (80.0)
1. Risk factors
Who is more likely to get fatty liver disease?	1	3	3	3	3
What type of diet can help better manage fatty liver disease?	1	0^*	3	3	3
How does alcohol consumption affect my fatty liver disease, and should I abstain from alcohol completely?	3	3	2	3	2
What type and amount of physical activity is recommended for someone with fatty liver disease?	3	2	3	3	3
I have a lean build; how did I develop fatty liver disease?	3	3	2	3	3
How does my family’s health history impact the monitoring of my fatty liver disease?	3	3	3	3	1
2. Test and diagnosis
What are the early signs and symptoms of fatty liver disease that I should be aware of?	3	2	3	2	3
How is fatty liver disease diagnosed?	3	3	3	1	3
Are there different types of fatty liver disease, and how do they differ?	3	3	2	2	2
Are there different stages of fatty liver disease, and how do they differ?	1	3	1	1	1
At what point is a liver biopsy recommended for individuals with fatty liver disease?	0	2	2	1	2
What is the role of imaging tests such as ultrasound, MRI, or CT scan in diagnosing fatty liver disease?	3	3	2	3	2
I have fatty liver disease and my ALT is 100 U/L; how should I interpret this?	2	2	3	2	1
I have fatty liver disease and my FIB-4 score is 1.1; how should I interpret this?	3	3	2	3	3
3. Treatment
How is fatty liver disease treated?	3	2	3	3	3
Are there any specific medications that are commonly prescribed for fatty liver disease?	2	3	3	2	1
How should medication be used to avoid liver damage in fatty liver disease?	1	3	1	2	3
What lifestyle interventions can aid in the treatment of fatty liver disease?	2	3	2	3	3
In severe cases, are there surgical options available for treating fatty liver disease?	2	3	3	3	3
4. Follow-up and monitoring
How often should I be monitored if I have fatty liver disease?	2	2	3	2	3
I have fatty liver disease. What tests or procedures will be performed during follow-up appointments?	3	2	3	3	3
I have fatty liver disease. What signs or symptoms should prompt me to seek immediate medical attention?	3	3	3	3	3
5. Comorbidities and prognosis
What other health conditions are commonly associated with fatty liver disease?	3	2	3	3	3
What is the typical prognosis for someone with fatty liver disease?	3	3	3	3	3
Is there an increased risk of heart disease when living with fatty liver disease?	3	3	3	2	1
How does fatty liver disease affect diabetes management, and vice versa?	3	3	3	3	3
Does having fatty liver disease increases my risk for liver cancer?	2	3	2	2	1
How can I understand the stage of my fatty liver disease and the potential progression over time?	1	2	3	2	3
Can fatty liver disease be reversed?	3	3	3	3	3
Can children develop fatty liver disease, and if so, how does it affect their health as they grow up?	2	3	1	3	2

The numbers represent the count of ‘appropriate’ ratings given by the three reviewers. A count of ≥2 is considered ‘appropriate’.

^* Response from ChatGPT-4 for this particular query: Fatty liver disease, which includes conditions such as non-alcoholic fatty liver disease (NAFLD) and non-alcoholic steatohepatitis (NASH)… 5. Hydrate: - Drink plenty of water throughout the day to stay hydrated…7. Regular Physical Activity: - Along with a healthy diet, engaging in regular physical activity can help in managing the condition…

Two graders found the advice to ‘drink plenty of water throughout the day’ to be inappropriate, noting that while hydration is important, excessive water intake can disrupt electrolyte balance, particularly in individuals with conditions such as advanced liver disease that affect water regulation. One grader deemed the physical activity suggestion irrelevant to the query at hand. Additionally, the use of simplified examples such as NAFLD and NASH was seen as inappropriate by two graders, as the query specifically pertains to dietary modifications and such simplifications could downplay the role of alcohol and other dietary factors.
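The majority rule stated in the note to Table 1 (a count of at least 2 of 3 ‘appropriate’ ratings) can be applied to the per-query counts to recover the summary row. The sketch below transcribes the counts from Table 1 in row order; the variable and function names are illustrative only.

```python
MODELS = ["GPT-3.5", "GPT-4", "Bard", "Llama2", "Claude2"]

# One tuple per query, in Table 1 order. Each entry is the number of graders
# (0 to 3) who rated that model's response as appropriate.
SCORES = [
    # 1. Risk factors
    (1, 3, 3, 3, 3), (1, 0, 3, 3, 3), (3, 3, 2, 3, 2),
    (3, 2, 3, 3, 3), (3, 3, 2, 3, 3), (3, 3, 3, 3, 1),
    # 2. Test and diagnosis
    (3, 2, 3, 2, 3), (3, 3, 3, 1, 3), (3, 3, 2, 2, 2), (1, 3, 1, 1, 1),
    (0, 2, 2, 1, 2), (3, 3, 2, 3, 2), (2, 2, 3, 2, 1), (3, 3, 2, 3, 3),
    # 3. Treatment
    (3, 2, 3, 3, 3), (2, 3, 3, 2, 1), (1, 3, 1, 2, 3),
    (2, 3, 2, 3, 3), (2, 3, 3, 3, 3),
    # 4. Follow-up and monitoring
    (2, 2, 3, 2, 3), (3, 2, 3, 3, 3), (3, 3, 3, 3, 3),
    # 5. Comorbidities and prognosis
    (3, 2, 3, 3, 3), (3, 3, 3, 3, 3), (3, 3, 3, 2, 1), (3, 3, 3, 3, 3),
    (2, 3, 2, 2, 1), (1, 2, 3, 2, 3), (3, 3, 3, 3, 3), (2, 3, 1, 3, 2),
]


def is_appropriate(votes, n_graders=3):
    """Majority consensus: more than half of the graders voted 'appropriate'."""
    return 2 * votes > n_graders


totals = {
    model: sum(is_appropriate(row[i]) for row in SCORES)
    for i, model in enumerate(MODELS)
}
for model, n_ok in totals.items():
    print(f"{model}: {n_ok} of {len(SCORES)} ({100 * n_ok / len(SCORES):.1f}%)")
```

The totals agree with the ‘Appropriateness, n (%)’ row of the table: 24, 29, 27, 27 and 24 of 30.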

Abbreviations

SLD, steatotic liver disease; LLMs, large language models; NAFLD, nonalcoholic fatty liver disease; MASLD, metabolic dysfunction-associated steatotic liver disease

REFERENCES

1. Yeo YH, Samaan JS, Ng WH, Ting PS, Trivedi H, Vipani A, et al. Assessing the performance of ChatGPT in answering questions regarding cirrhosis and hepatocellular carcinoma. Clin Mol Hepatol 2023;29:721-732.

2. Devarbhavi H, Asrani SK, Arab JP, Nartey YA, Pose E, Kamath PS. Global burden of liver disease: 2023 update. J Hepatol 2023;79:516-537.

3. Varghese J, Chapiro J. ChatGPT: The transformative influence of generative AI on science and healthcare. J Hepatol 2023 Aug 5. doi: 10.1016/j.jhep.2023.07.028.

4. Sarraju A, Bruemmer D, Van Iterson E, Cho L, Rodriguez F, Laffin L. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023;329:842-844.

5. Younossi ZM, Corey KE, Lim JK. AGA clinical practice update on lifestyle modification using diet and exercise to achieve weight loss in the management of nonalcoholic fatty liver disease: Expert review. Gastroenterology 2021;160:912-918.

6. European Association for the Study of the Liver (EASL); European Association for the Study of Diabetes (EASD); European Association for the Study of Obesity (EASO). EASL-EASD-EASO Clinical Practice Guidelines for the management of non-alcoholic fatty liver disease. J Hepatol 2016;64:1388-1402.

7. Carroll AM, Rotman Y. Nutrition literacy is not sufficient to induce needed dietary changes in nonalcoholic fatty liver disease. Am J Gastroenterol 2023;118:1381-1387.

8. Coleman C, Birk S, DeVoe J. Health literacy and systemic racism: using clear communication to reduce health care inequities. JAMA Intern Med 2023;183:753-754.

9. OpenAI. GPT-4 Technical Report. arXiv:2303.08774 [Preprint]. 2023 [cited 2023 Oct 27]. Available from: https://doi.org/10.48550/arXiv.2303.08774.