Do We Need to Redefine “Author”?

Letter from the Executive Director, February 2023

Agreeing on common vocabulary is one of the fundamental steps of standardization. Some projects are strictly about defining terminology, others focus on encoding that terminology in systems, and still others have nothing to do with terminology per se, yet they normally include a section of definitions. These definitions may draw on existing terminology and include definitions set in other documents, but inevitably, nearly every standard defines new terms or recommends applying terms in new ways. As we think about how new technology should be applied, we must first think about how we describe existing practice when using this new approach.

This is particularly relevant in the new world created by advances in artificial intelligence. In November 2022, with the launch of OpenAI’s ChatGPT, the world’s attention was drawn to the significant advances that have taken place in natural language processing and text generation. ChatGPT is a new prototype service that can answer specific questions, craft new text based on prompts, and summarize content. The tool is certainly not without its problems, from not citing its sources of information and lacking novel insight to, most troublingly, simply making things up.

From the perspective of definitions, as we have seen, people have been using ChatGPT to generate text for distribution. I did so myself on the SSP Scholarly Kitchen blog a few weeks back. There are several articles that list ChatGPT as a co-author. This situation leads to an important question: Who (or what) should be recognized as an author? If I use the Thesaurus tools in Microsoft Word, or the autocorrect feature when typing on my phone, we wouldn’t normally consider this to be “authoring.” What if I were using Google Docs, or another writing assistant tool such as Grammarly, and I swiped to accept the automated text suggestion to complete my sentence? Isn’t that AI steering me in a direction by statistically predicting my next phrase? Perhaps it’s offering me a word that I might have used, but often it isn’t. We wouldn’t think of this writing support as being “co-authoring.” Yet when one prompts ChatGPT with a question and the tool produces 500 words in a few dozen seconds, this has somehow crossed an important line. The tool is no longer supporting your work; oddly, it is doing the hard work for you.

Let’s set aside the fascinating question of whether the output is actually correct, whether factually accurate, logically well constructed, or even based on objective reality. Perhaps the “creativity” of using an AI system lies in the review, editing, and fact-checking necessary to ensure that what you put out under your own name, though written by your AI tool, is in fact true and worthy of sharing.

There has been a lot of talk of late about retractions, paper mills, and other types of professional malpractice in the publication of science. A concern that has caught on in the academy has been whether ChatGPT and other tools might be used to “cheat” or skirt the understood rules of creation. This newfound fear builds on a long history of people finding ways around doing the hard work of learning or discovery. In the 2010s, there were indications that contract writing of academic papers was employing tens of thousands of writers in India, Kenya, Pakistan, and Morocco. While paying someone to write a paper for you is viewed by the academic community as professional malfeasance, this isn’t true in every domain, particularly in publishing. 

In trade nonfiction publishing, it is estimated that between 50% and 60% of titles are written by ghostwriters. For as long as there has been publishing, there have been people willing to put forward the paid work of others as their own. It is surmised that some works claimed to have been written by Aeschylus in the fifth century BCE were, in fact, produced by his son, Euphorion. Ghostwriting has been particularly common in the realm of politics, notably in George Washington’s farewell address, and among many 20th-century US politicians as well. It seems that, at least in the trade, having a well-written text is more valued than having one written by the actual “author.”

Which gets to the important questions of definition. Should we consider the role of computational tools to be part of the authoring process, and how should we support transparency in the use of these tools? Some have begun decrying the practice, and Nature, for one, stated that it “would not allow AI to be listed as an author on a paper” it published. Already, some people have worked on developing tools to identify ChatGPT-produced papers. One can envision an adversarial model where one AI system, such as ChatGPT, produces text, while another AI system tries to determine whether the text was written by a computer or a person. The two systems would then go back and forth again and again, constantly improving their ability to write and critique a work, ad infinitum, perhaps yielding the most perfect prose, distinguishable as non-human in its absolute perfection.

Which brings me back to the role of clarity in our definitions. The NISO CRediT terminology was approved last year as a NISO standard. This first version was an adoption of the existing terminology, but it is already recognized that additions and edits are needed, and the group has begun work on them. It will be interesting to consider whether and how machine-learning tools should be incorporated into the standard. Should we qualify participation in the writing or research process? “We used this machine learning algorithm to process the data, this generative tool to create the associated images, and this NLP writing tool to help draft/edit the text.” We might also want to include in the taxonomy a term for ghostwriter, since it is an all-too-often-used role that almost never receives the appropriate recognition. Last week, Nature explained its position in an editorial, stating that it would not accept AI tools as co-authors, since these tools cannot accept responsibility for their outputs.

Sincerely,

Todd Carpenter
Executive Director, NISO