Generative Artificial Intelligence in (laboratory) medicine: friend or foe?

AUTHORS

Davide Negrini, Giuseppe Lippi
Section of Clinical Biochemistry and School of Medicine, University of Verona, Verona, Italy

ABSTRACT

The artificial intelligence (AI) model ChatGPT, which is capable of generating human-like responses, has recently attracted enormous interest because of its ability to write scientific text. The current thinking is that AI cannot be granted authorship of manuscripts because it cannot take responsibility for them, though many scientists still believe that it could make writing easier and faster. The currently available online generative AI tools do not make it easy to determine accurately whether a given scientific text has been composed by humans or by AI systems. To this end, some aspects could be analyzed, namely repetition (by means of plagiarism checks), style and tone, coherence and structure, context and accuracy, though we expect these distinguishing elements to become subtler in the foreseeable future. In this article we have also tested the capacity of several generative AI systems to answer an easy laboratory medicine query, concluding that the output does not exactly match a text written by a skilled scientist and that the algorithms still present imprecisions, suggesting the need for better training. Authoring scientific articles requires skills that can only be developed after years of training and experience, including a good knowledge of the subject and the ability to think creatively and make connections. At this point in time, generative AI systems could certainly assist scientific writing, but they cannot replace the knowledge, skill and creativity of a human writer.

Generative Artificial Intelligence
Artificial Intelligence (AI) is not a “new topic”, considering that the term was first used in 1956 to convey the precise idea that computer systems could perform tasks that normally require human intelligence (1). Nonetheless, in the last few years, thanks to a remarkable increase in computational power and storage capacity, AI has spread throughout many scientific (and non-scientific) fields. Notable examples are the contributions made during the recent coronavirus disease 2019 (COVID-19) pandemic, when AI helped to screen patients with suspected SARS-CoV-2 infection and to stratify the risk of unfavourable disease progression (2).
ChatGPT, one of the most interesting applications to have emerged very recently (3), is an AI language model developed by OpenAI (OpenAI Inc., San Francisco, CA, USA) that is capable of generating human-like responses (answers or texts) to text-based inputs (questions or information). Its engaging aspect is that it works as a chatbot, simulating conversation with a human in a variety of contexts, for example customer service, education and entertainment. It can also be trained on specific domains or topics to provide more specialized responses. Notably, not only has the online service based on GPT algorithms emerged as a powerful subsidiary “writing instrument”, but the entire category of online tools capable of generating text has spread enormously, with a multitude of ChatGPT-like services now available on the Web.
As evidence of the burning interest in this field, the number of PubMed items retrieved with the term “chatgpt” (i.e., the most popular and widespread of these tools) is constantly increasing: just 5 documents were available in 2022 (the first one appeared on December 8th, 2022), but as many as 173 new items became available during the first 3 months of 2023, for a total of 178 items on March 31st, 2023 (PubMed query “(chatgpt) AND ((“2020/01/01”(Date – Publication) : “2023/03/31”(Date – Publication)))” conducted on April 3rd, 2023). This confirms that the pre-existing interest in AI technology has truly exploded during the last few months (4). ChatGPT belongs to the category called generative AI, a type of AI that can create new and original content, such as images (Figure 1), text, or other content that is similar to (or inspired by) existing data.

The concepts beneath Generative AI
One of the most popular approaches to generative AI is the use of generative models, such as Variational Autoencoders (VAEs) or Generative Adversarial Networks (GANs). These models are trained on a dataset of existing examples, and they learn to generate new content that is similar to the input data (5,6). The generation process in generative AI usually comprises three steps (7), as in all other supervised or semi-supervised Machine Learning (ML) methods (8). Specifically:
– model training: in this phase the AI is trained using input data; the algorithm used depends on the type of input and on the output needed;
– generation of new data: once the model has learned the patterns in the input data, it can autonomously generate new data never seen before; the generation process is based on the probability distributions learned during training;
– evaluation of results: the last step consists of evaluating the newly generated data, to understand whether they meet the desired standards, using automated metrics and/or human review; based on these evaluations, the model can be trained again with new information to improve data quality.
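
The three steps listed above can be illustrated with a deliberately tiny sketch. It does not use a VAE or a GAN: as a stand-in, it uses a character-level Markov chain, which is far simpler but follows the same train, generate and evaluate sequence. The function names, the toy corpus and the add-one smoothing used for scoring are assumptions made for this illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def train(corpus):
    """Step 1, model training: count which character follows which."""
    counts = defaultdict(Counter)
    for current, following in zip(corpus, corpus[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, rng):
    """Step 2, generation: sample each next character from the learned distribution."""
    text = start
    for _ in range(length - 1):
        followers = counts.get(text[-1])
        if not followers:  # no continuation was seen for this character
            break
        chars = list(followers)
        weights = [followers[c] for c in chars]
        text += rng.choices(chars, weights=weights, k=1)[0]
    return text


def evaluate(counts, text, vocabulary_size):
    """Step 3, evaluation: mean log-probability per transition (add-one smoothed)."""
    total = 0.0
    transitions = 0
    for current, following in zip(text, text[1:]):
        followers = counts.get(current, Counter())
        seen = followers[following]
        probability = (seen + 1) / (sum(followers.values()) + vocabulary_size)
        total += math.log(probability)
        transitions += 1
    return total / transitions if transitions else float("-inf")


# Hypothetical toy corpus, used only to exercise the three steps.
corpus = "troponin is a protein complex. cardiac troponin rises after myocardial injury. "
model = train(corpus)
sample = generate(model, "t", 40, random.Random(0))
print(sample)
print(evaluate(model, sample, len(set(corpus))))
```

A higher (less negative) score means the text looks more like the training data. In a real system the automated metric would be combined with human review, and poor scores would trigger a new round of training.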

ChatGPT-like AIs in the health scientific literature
AI language models are capable of generating texts on a wide range of topics, including scientific articles (9). When asked whether it can write scientific articles, ChatGPT answers: “My abilities are limited to generating text based on patterns and associations found in large amounts of data. While I can produce coherent and grammatically correct sentences, I do not have the same level of understanding, critical thinking, and creativity as a human writer, particularly when it comes to scientific research”.
Furthermore, the current thinking is that generative AI models cannot be considered “Authors” of manuscripts, because they cannot take responsibility for the publication and do not even fulfil widely adopted authorship criteria (10,11,12). Yet, others assert that they can help to reduce writing times (13), or even to produce better texts (14). A large number of online services that offer a chat with a text-generating AI are based on ChatGPT, with some modifications (15), for example New Bing (Microsoft Corp., Redmond, WA, USA), Jasper AI (Jasper Inc., Rollingwood, TX, USA) or ChatSonic (WriteSonic Inc., San Francisco, CA, USA). Other online tools have also been developed (or are currently under development), such as Google Bard (Alphabet Inc., Mountain View, CA, USA), Perplexity AI (Perplexity, San Francisco, CA, USA), DeepL Write (DeepL SE, Köln, Germany), Bearly.ai (Bearly Inc., New York City, NY, USA) or YouChat (SuSea Inc., Palo Alto, CA, USA).

Review of PubMed results about ChatGPT
Out of the 178 articles available in PubMed with a publication date up to March 31st, 2023 (PubMed query “(chatgpt) AND ((“2020/01/01”(Date – Publication) : “2023/03/31”(Date – Publication)))” conducted on April 3rd, 2023), 11 were inaccessible and 7 were unrelated to ChatGPT or generative AI. The largest group of published articles (n=104) consisted of editorials, opinions and commentaries, without any specific research conducted. Only 7 articles reviewed or specifically analysed the ethical implications of generative AI, and 2 articles were systematic reviews (16,17).
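
A search of this kind can also be expressed programmatically. The sketch below only builds the request URL for the NCBI E-utilities `esearch` endpoint, using its publication-date filter in place of the date clauses typed into the PubMed search box. It is not the procedure used for this article, the helper name `build_pubmed_count_url` is hypothetical, and the count returned today may differ from 178 because the PubMed index changes over time.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_count_url(term, min_date, max_date):
    """Return an ESearch URL asking only for the number of matching records."""
    params = {
        "db": "pubmed",       # search the PubMed database
        "term": term,         # free-text query term
        "datetype": "pdat",   # filter on publication date
        "mindate": min_date,  # format YYYY/MM/DD
        "maxdate": max_date,
        "rettype": "count",   # return the hit count, not the record list
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


url = build_pubmed_count_url("chatgpt", "2020/01/01", "2023/03/31")
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns a small JSON document containing the count.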
The most intriguing category of articles available on PubMed is that including conversations with ChatGPT, or challenging ChatGPT against various tests (n=47). Interestingly, two articles tested ChatGPT against the United States Medical Licensing Examination (USMLE) and obtained good results (18,19), whilst others tested its performance in answering clinical questions or evaluating clinical cases, generating either acceptable (20,21) or poor (22,23) results in terms of the quality of the text produced.
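
As a quick consistency check, the counts reported in this section can be tallied in a few lines. The sketch assumes that the four categories are mutually exclusive and cover all the articles that remained after the exclusions; the text does not state this explicitly, although the figures are compatible with it.

```python
# Counts taken from the text; the variable names are illustrative.
retrieved = 178
excluded = {
    "inaccessible": 11,
    "unrelated to ChatGPT or generative AI": 7,
}
categories = {
    "editorials, opinions and commentaries": 104,
    "ethical implications": 7,
    "systematic reviews": 2,
    "conversations with or tests of ChatGPT": 47,
}

eligible = retrieved - sum(excluded.values())  # articles left after exclusions
categorised = sum(categories.values())         # articles assigned to a category

print(f"eligible: {eligible}, categorised: {categorised}")
for name, count in categories.items():
    print(f"{name}: {count} ({count / eligible:.0%})")
```

Both totals come to 160, so roughly two thirds of the eligible articles were editorials, opinions or commentaries.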

Testing some generative AIs available online on a clinical laboratory question
We tested several generative AIs with an easy question for a laboratory professional: “Can you explain me the correlation between troponin and myocardial infarction giving some scientific references?”. The results of this query are provided in the Appendix. Basically, all the generative AI systems seem to follow the same path, with large similarities and overlapping areas. For example, the introductory sentence of most of the systems tested was nearly identical. The output is not exactly what a skilled scientist would write in the introduction of an article submitted to a peer-reviewed journal, and is neither sufficiently accurate nor scientifically sound. With the exception of one AI system, which correctly introduces the concept that “Troponin is a protein complex”, the others state that “Cardiac troponin is a protein”, which is biologically wrong, as there are three cardiac troponins (i.e., C, I and T), only two of which are currently used for diagnosing acute coronary syndromes (24). Interestingly, some of these generative AI systems provide a variety of distinctive details about the clinical use of cardiac troponins, such as “… troponin levels were highly sensitive and specific for the diagnosis of MI, with a pooled sensitivity and specificity of 0.93 and 0.88, respectively”, “… cTn assays ought to be quantitative rather than binary”, “… cardiac troponin concentrations at presentation are insufficient to distinguish type 1 myocardial infarction from other causes of myocardial injury” and “… a threshold 50 times the upper reference limit is required to achieve a positive predictive value ≥70%”, whose combination may actually contribute to developing a sufficiently usable scientific text.
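
To show how the quoted figures relate to one another, the sketch below derives a positive predictive value (PPV) from sensitivity, specificity and prevalence using Bayes' theorem. The sensitivity (0.93) and specificity (0.88) are simply the values quoted in the AI output above and have not been verified here; the three prevalence values are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / (true positives + false positives)."""
    true_positives = sensitivity * prevalence
    false_positives = (1.0 - specificity) * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical prevalences of myocardial infarction in the tested population.
for prevalence in (0.05, 0.15, 0.30):
    ppv = positive_predictive_value(0.93, 0.88, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.0%}")
```

With these inputs the PPV is about 29%, 58% and 77%, respectively, which illustrates why the same test characteristics can give very different predictive values depending on the population tested.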
On the other hand, when we tested a couple of these generative AI systems with the query “who is Giuseppe Lippi?” (i.e., the name of one of the two authors of this manuscript), some were indeed capable of generating a sufficiently recent and accurate biography, while others provided outdated (i.e., a lower number of scientific articles) or inaccurate (i.e., membership of the wrong scientific society) answers. This ultimately suggests that the underlying algorithms need to be further trained and/or improved.

Can we easily understand if it’s human or not?
The answer to this question is rather straightforward: no, not so easily (at least for now). We could only suspect that someone has used generative AI to produce part of the text of a manuscript by identifying a high degree of similarity with previously published documents, using instruments like iThenticate (Turnitin LLC, Oakland, CA, USA), Plagiarisma (Plagiarisma Ltd, London, UK), Grammarly (Grammarly, San Francisco, CA, USA) or other popular “plagiarism-check” tools. Owing to the remarkable advancements in AI language models, it can be very challenging to distinguish between text generated by AI and text written by humans. However, there are a few points to look for when checking whether a text has been written by an algorithm (25).
Style and tone: AI language models can sometimes produce generic or repetitive sentences; humans tend to have unique writing styles and can infuse their writing with personality.
Coherence and structure: AI models can generate coherent and grammatically correct sentences, but they may struggle to create a long, logically flowing text.
Context and accuracy: AI models may produce errors or inaccuracies when it comes to specific information or contexts, such as confusing different meanings of the same word, or referencing outdated research. Humans, on the other hand, can rely on their experience and knowledge to provide more correct, updated or integrated information.
Response time: AI language models can generate text much more quickly than humans, often in a matter of seconds; if you receive a long text response immediately after sending a request, it is very likely to have been generated by an AI.
Nevertheless, it is also noteworthy that further advancements in AI language models will probably make these distinguishing factors less apparent in the future, also considering the increasing performance of the GPT-4 model, whose technical report was published on March 15th, 2023 (26), and which has already been included in the development of New Bing and ChatSonic.
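
The repetition criterion mentioned above can be turned into a crude number, as in the following sketch. This is only an illustrative heuristic, not a validated detector of AI-generated text and not one of the commercial tools named in this section; the function name and the choice of three-word sequences are assumptions.

```python
import re
from collections import Counter


def repeated_ngram_fraction(text, n=3):
    """Fraction of n-word sequences in the text that occur more than once."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:  # text shorter than n words
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)


generic = "Troponin is a protein. Troponin is a protein found in muscle."
print(repeated_ngram_fraction(generic))
```

A value close to 0 means that almost no phrase is reused, while higher values indicate repetitive wording. Human authors can also be repetitive, so such a score can at most raise a suspicion.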

Conclusions
Authoring scientific articles requires a deep knowledge of the subject matter, good familiarity with the pertinent literature and critical thinking, along with the capability to conceive and describe new hypotheses. These skills are typically developed over years of education and experience, and involve a good knowledge of the subject treated, but also the ability to think creatively and make connections between often largely heterogeneous pieces of information. Generative AI can certainly assist in generating text on a given scientific topic (e.g., a short abstract), but cannot replace the knowledge, skill, and creativity of a human writer. Moreover, the quality of a scientific article depends on the quality of the research and analysis that lie “under the hood”. Generative AI can also be used in the medical field to enhance medical support and maybe help to improve quality of life, but it is probable that it will only enhance what humans normally do (27).
The decision of OpenAI to block access to ChatGPT from Italy* following the requests of the Italian data protection authority, the Garante della privacy (28) (but not of other European authorities, even though the same GDPR applies throughout Europe), was unexpected and has many ethical implications. Nevertheless, if privacy concerns are raised about OpenAI’s ChatGPT, the same worries should apply to the many other online services built on top of ChatGPT or on similar technologies, the number of which is increasing every day.

*OpenAI blocked ChatGPT for Italian users from March 31st, 2023 to April 28th, 2023.

CONFLICT OF INTEREST
None.

REFERENCES

1. McCarthy J, Minsky ML, Rochester N, Shannon CE. A proposal for the Dartmouth Summer Research Project on Artificial Intelligence, August 31, 1955. AIMag 2006;27:12.
2. Negrini D, Danese E, Henry BM, Lippi G, Montagnana M. Artificial intelligence at the time of COVID-19: who does the lion’s share? Clin Chem Lab Med 2022;60:1881-6.
3. OpenAI. GPT-3.5B: A large-scale autoregressive language model with improved training and modeling techniques. Technical Report 2021.
4. Lippi G. Machine learning in laboratory diagnostics: valuable resources or a big hoax? Diagnosis 2021;8:133-5.
5. Brock A, Donahue J, Simonyan K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. Epub aop 2019. doi: 10.48550/arXiv.1809.11096.
6. Dai AM, Le QV. Semi-supervised Sequence Learning. In: Advances in Neural Information Processing Systems. Curran Associates, Inc., hash/7137debd45ae4d0ab9aa953017286b20 (last accessed April 2023).
7. Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks. Epub aop 2014. doi: 10.48550/arXiv.1406.2661.
8. Negrini D, Zecchin P, Ruzzenente A, Bagante F, De Nitto S, Gelati M, et al. Machine Learning Model Comparison in the Screening of Cholangiocarcinoma Using Plasma Bile Acids Profiles. Diagnostics (Basel) 2020;10:551.
9. Salvagno M, Taccone FS, Gerli AG. Can artificial intelligence help for scientific writing? Crit Care 2023;27:75.
10. Thorp HH. ChatGPT is fun, but not an author. Science 2023;379:313.
11. Siegerink B, Pet LA, Rosendaal FR, Schoones JW. ChatGPT as an author of academic papers is wrong and highlights the concepts of accountability and contributorship. Nurse Educ Pract 2023;68:103599.
12. Brainard J. Journals take up arms against AI-written text. Science 2023;379:740-1.
13. Tregoning J. AI writing tools could hand scientists the ‘gift of time’. Nature Epub aop 2023. doi: 10.1038/d41586-023-00528-w.
14. Rozencwajg S, Kantor E. Elevating scientific writing with ChatGPT: a guide for reviewers, editors… and authors. Anaesth Crit Care Pain Med 2023;42:101209.
15. https://beebom.com/best-chatgpt-alternatives/ (last access April 2023).
16. Sallam M. ChatGPT Utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. Healthcare 2023;11:887.
17. Vaishya R, Misra A, Vaish A. ChatGPT: Is this version good for healthcare and research? Diabetes Metab Syndr 2023;17:102744.
18. Gilson A, Safranek CW, Huang T, Socrates V, Chi L, Taylor RA, et al. How does ChatGPT perform on the United States medical licensing examination? The implications of large language models for medical education and knowledge assessment. JMIR Med Educ 2023;9:e45312.
19. Kung TH, Cheatham M, Medenilla A, Sillos C, De Leon L, Elepaño C, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health 2023;2:e0000198.
20. Fijačko N, Gosak L, Štiglic G, Picard CT, John Douma M. Can ChatGPT pass the life support exams without entering the American heart association course? Resuscitation 2023;185:109732.
21. Johnson SB, King AJ, Warner EL, Aneja S, Kann BH, Bylund CL. Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information. JNCI Cancer Spectr 2023;7:pkad015.
22. Grünebaum A, Chervenak J, Pollet SL, Katz A, Chervenak FA. The exciting potential for ChatGPT in obstetrics and gynecology. Am J Obstet Gynecol 2023:S0002-9378(23)00154-0.
23. Morreel S, Mathysen D, Verhoeven V. Aye, AI! ChatGPT passes multiple-choice family medicine exam. Med Teach Epub aop 2023. doi: 10.1080/0142159X.2023.2187684.
24. Lippi G, Cervellin G. Genetic polymorphisms of human cardiac troponins as an unrecognized challenge for diagnosing myocardial injury. Int J Cardiol 2014;171:467-70.
25. https://trickmenot.ai/how-do-you-tell-if-something-was-written-by-an-ai/ (last access 03 April 2023).
26. OpenAI. GPT-4 Technical Report. Epub aop 2023. doi: 10.48550/arXiv.2303.08774.
27. Will ChatGPT transform healthcare? Nat Med 2023;29:505-6.
28. https://www.garanteprivacy.it/home/docweb/-/docweb-display/docweb/9870847 (last access 05 April 2023).