A Quarter Century In and 25 Years Forward

January 2025

Letter from the Executive Director, January 2025

As the calendar turns from one year to the next, it’s common to look back as well as forward. At the start of 2025, we’re at an inflection point worth reflecting on. It is an exciting time to think in much broader terms about NISO and the information community’s transformations over the past two and a half decades. It's also a good time to ponder where the information world might be headed in the decades ahead. While my 2017 Newsline introduction seems reasonably prescient in hindsight, it also feels a bit constrained. In that missive, focused on trends for the coming year, I highlighted information security and privacy, as well as issues around the “live web” and ensuring the preservation of a version of record for a post-factual world. As we near the quarter-century mark, we might take a broader view.

Back at the start of 1999, the information community was consumed by the excitement of the first internet bubble. And while nearly $50 million in grant funding was directed toward digitization efforts in the 1990s, publishers and libraries were still struggling with the scale of the project that lay before them. For example, JSTOR, launched in 1995, was digitizing upwards of 100,000 pages of journal content per month in 1999 and was working on digitizing the entire run of the Philosophical Transactions of the Royal Society.

Simultaneously, each player was independently struggling to determine its own approach to the basics of digital formats: linking, licensing, and metadata. Recognition that people with disabilities should be supported led to the first efforts to make web content more accessible; in 1999, the W3C released the first version of the Web Content Accessibility Guidelines (WCAG). The role of licenses, contracts, and business models was also at the forefront of people’s minds. Model licensing terms began to gain traction, with the UK Publishers Association and JISC announcing their first model license in January 1999. Two years later, in 2001, Creative Commons would launch as a model whereby authors could exert control over the distribution and licensing of their content.

In February 1999, NISO, along with the Digital Library Federation (DLF), the National Federation of Advanced Information Services (NFAIS), and the Society for Scholarly Publishing (SSP), co-hosted a workshop outside Washington, DC, on “Linkage from Citations to Electronic Journal Literature.” This was the first in a series of workshops and papers NISO produced as part of its reference linking initiative. At that point, it was far from clear that the nascent Digital Object Identifier (DOI) system would be widely adopted. Content providers were arranging reciprocal linking agreements, and the relationship between different versions of an object had yet to be clearly defined. Questions swirled about whether a print version was different from an electronic version and whether the two should be cited differently.

In early 2000, a separate initiative was getting underway, looking at how the move toward the web was redefining the publication process. NISO and NFAIS co-hosted another workshop as a pre-meeting of the NFAIS conference, which would become NISO Plus some 20 years later. The topic of this event was “Details on the Electronic Journals.” The outputs suggested a range of developing best practices around e-journal distribution, including DOI presentation, ISSN and masthead content for digital versions, updated standards for bibliographic references, consistent terminology in Document Type Definitions (DTDs), contextual linking, and archival formats and metadata for content. The conclusion of the event noted that “this Workshop did not cover many of the issues relevant to electronic publishing, including access control, rights management, licensing and contracts, and privacy of users.”

In the fall of 2000, NISO and the National Institute of Standards and Technology (NIST) hosted the third annual Electronic Book Conference 2000: Changing the Fundamentals of Reading. That meeting drew nearly 1,000 people, and the agenda included the changing role of authors and agents in the digital marketplace, digital rights management, hardware and software format standards, accessibility, and changing business models. 

When it came to the search and description of digital content, there was also much activity 25 years ago. The Dublin Core (DC) metadata element set was balloted in 2000, and the draft was approved in August, although final publication would come the next year. This propelled the DC terms toward worldwide adoption as an ISO standard in 2003. The indexing and thesaurus standards were being reconsidered for a new generation of digital applications. Discussions were also just beginning on how to locate and direct people to the appropriate copy of an object. This effort would eventually lead to the development of the OpenURL specification.
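To make the “appropriate copy” problem concrete, here is a minimal illustrative sketch, written in Python and not drawn from the original workshop materials, of how an OpenURL-style request packages citation metadata for an institution’s link resolver, which can then send the reader to a copy their library actually licenses. The resolver address and citation values below are hypothetical placeholders.

```python
# Illustrative sketch only: building an OpenURL-style (key/encoded-value) request
# that hands citation metadata to an institutional link resolver. The resolver
# base URL and all citation values are hypothetical placeholders.
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # hypothetical resolver

citation = {
    "url_ver": "Z39.88-2004",                       # OpenURL framework version
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",  # describing a journal article
    "rft.atitle": "An Example Article Title",
    "rft.jtitle": "An Example Journal",
    "rft.date": "1999",
    "rft.volume": "11",
    "rft.spage": "1",
    "rft.issn": "1234-5678",                        # placeholder ISSN
}

# The resolver inspects the metadata and redirects the user to the
# "appropriate copy" available through their institution.
print(f"{RESOLVER_BASE}?{urlencode(citation)}")
```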

While some of these topics might make up similar workshops today, we should recognize how much progress has been made in many of these areas. One can see throughlines from these nascent conversations to the significant advances we see today. The discussions around the need for a standard format for eBooks led to the creation of the Open eBook Publication Structure in 1999. The group behind this work eventually coalesced into the International Digital Publishing Forum (IDPF) in 2000, which went on to develop the EPUB standard. The focus on accessibility in these early eBook conversations led directly to the inclusion of accessibility as a foundation of the eventual EPUB standard. Today, EPUB has become the dominant format for creating and distributing eBooks in the marketplace; it has been adopted as an ISO standard and is supported by dozens of reading and composition systems.

The linking conversations launched in 1999 led to the formalization of the DOI within NISO and supported its establishment as the primary method for citation linking in scholarly publishing. The idea of consistent markup vocabularies for publication processes led to the creation of the National Library of Medicine DTD and the eventual standardization of the Journal Article Tag Suite (JATS) models. The adoption and extension of Dublin Core metadata, OpenURL, and the many components of the E-Resource Management Initiative (ERMI), such as COUNTER/SUSHI, KBART, and institutional identifiers, all grew slowly over time. Along the way, there have been significant advances in a variety of standards and technologies, from reference structures to the transliteration of non-Latin text for mapping Western keyboard input on smartphones.
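The durability of that linking layer rests on indirection: a citation carries a persistent identifier, and the identifier is resolved at click time to wherever the publisher currently hosts the content. The sketch below is a hypothetical Python example, not part of any NISO or DOI Foundation tooling; it simply follows a DOI through the doi.org proxy and reports the final landing URL. The DOI shown is a placeholder.

```python
# Illustrative sketch: following a DOI through the https://doi.org proxy,
# which answers with an HTTP redirect to the publisher's current landing page.
# The DOI below is a placeholder; substitute any real DOI to try it.
import urllib.request

def resolve_doi(doi: str) -> str:
    """Return the URL that the doi.org proxy currently redirects this DOI to."""
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    with urllib.request.urlopen(request) as response:
        # urlopen follows the redirect chain; response.url is the final location.
        return response.url

if __name__ == "__main__":
    print(resolve_doi("10.1000/xyz123"))  # placeholder DOI
```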

What will we focus on for the next 25 years?

If we consider the thorny questions emerging today, what might we be wrestling with for the next several decades? William Gibson’s observation that the future is already here, just unevenly distributed, applies. I don’t mean this in the sense that we have no agency or that our actions can’t shape the tools handed down from Silicon Valley. We do have agency, and our collective work over the past 25 years proves that we can and do influence the broader ecosystem.

We are not so much at an inflection point as amid a broad curve. The shape of that curve and the direction in which we are headed aren’t entirely clear, but having watched this industry and participated in it over the past 30 years, I believe that none of the transformations will happen quickly. The changes put into place today will inevitably be adapted and revised over time. We are also only at the beginning, at a time when bespoke solutions are the norm and the industry has yet to coalesce around consistent approaches.

Here are my thoughts on several themes that will remain the focus of our collective attention in information management through 2050:

Integration: A great deal of time has been spent developing different silos for different types of content. This started with journals, but then the energy moved to books and then to data sets. Now software, methods, models, and other elements of the research process are being digitized, identified, and made discoverable and reusable. Much of this is happening in content-type-specific repositories, and the ecosystem has yet to cohere into an integrated whole. Integration need not mean that all content is housed in a single place or experience, such as a journal article wrapper. It does mean that the various elements are not simply linked but seamless in the user experience. It will facilitate an understanding of the impact of an entire effort, not just one output, and make it far more difficult to fake any one element or mask its integrity. Integration, however, particularly across platforms, is difficult consensus work.

Identity and validation: With the recent proliferation of machine-generated content, deep fakes, and other malicious content generation applications, we are quickly returning to a world in which people are asked to prove who they are and what they have done. Establishing the provenance of content and the verifiability of credentials will be a long journey for the entire digital landscape. The security and trustworthiness of people, content, credentials, authentication methods, and systems are all being rethought. Unfortunately, the lack of secure identity for the people, devices, and things attached to networks or managed in digital systems was the original sin of the internet in its earliest days. For decades to come, we will be working to determine how to verifiably connect things in the real world to the digital world, and how to ensure that information about those things is valid, trustworthy, and reliable.
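One small, already-solvable piece of that verification puzzle can be illustrated with a sketch of my own (not a NISO standard or recommendation): if a provider publishes a cryptographic fingerprint alongside a digital object, anyone can later check that the copy in hand is bit-for-bit the one that was distributed. The file name and expected digest below are hypothetical.

```python
# Illustrative sketch: verifying that a retrieved file still matches a
# previously published fingerprint. The file path and expected digest are
# hypothetical placeholders, not real published values.
import hashlib

def sha256_of(path: str) -> str:
    """Compute the SHA-256 digest of a file's contents, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest the provider published at distribution time (hypothetical value).
EXPECTED = "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08"

if __name__ == "__main__":
    actual = sha256_of("article.pdf")  # hypothetical local copy
    print("matches published fingerprint" if actual == EXPECTED
          else "content differs from the published version")
```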

Processing: At present, artificial intelligence is a broad term that applies to dozens of digital technologies that manipulate, analyze, and curate data, whether in the form of text, audio, video, or images. What are its appropriate applications and uses? What are the best practices for those environments? What are the common understandings of the business processes associated with algorithmic processing of information? The next 25 years will be spent trying to determine the scope and application of those technologies.

Despite the current hype, large language models (LLMs), generative artificial intelligence systems, and other AI-based tools are simply the next iteration of a variety of algorithmic processing techniques. We have seen several iterations already, such as indexed search and linked open data. The reliability of these systems will rest on the quality of the content used to train them and to validate their results. We are just beginning to address the thorny questions of licensing, business models, and interoperability that will become the foundation of the next algorithmic processing age. This started with text and data mining licenses in the scholarly community, but the next generation of agreements will be far trickier, because the applications won’t be limited to a single analytical use. Rather, they will involve repurposing, reworking, and republishing the ingested content in novel ways. Authors and publishers will want to ensure credit and remuneration for their works. What that will look like will take years to work through. While many technical problems remain, the social and business issues always take longer to address.

Curation: To support the algorithmic processing I just described, content must be well-formed and structured in a way that algorithms can navigate, consume, and understand. A great deal of excitement (and money) has been generated by the notion of self-training neural network models. Aside from a few game-related examples, however, most of the models available today depend on human-curated training data sets that form the basis of future machine training. The worldwide market for data annotation services, primarily serving AI companies, was over US $1 billion in 2023 and is projected to grow at nearly 24% per year for the next five years. Human curation of data of all types is a critical but underappreciated component of the AI ecosystem. Unfortunately, content creators, be they authors, machine generation tools, data collection instruments, or scientists, are generally not good at the curation and annotation necessary to support robust analytics on data sets. How this gets done, effectively and efficiently, remains to be determined.

Preservation: Much like accessibility, preservation requires constant attention and ongoing investment. Digital preservation was identified as a significant problem years ago. While progress has been made, far too often more pressing concerns have shunted the issue to the side. Archiving and preservation have always been poor stepchildren to the technology systems of search, discovery, and service provision. A bad user experience for patrons or readers will always attract more attention than the environmental conditions of infrequently visited storage facilities. With digital systems, the pressure to improve preservation is even less visible. While there have been successes, such as the launch of the LOCKSS initiative in 1999 and the adoption of Portico, launched in 2005, a great deal of work remains. In some ways, because of the success of these projects and the creation of the Internet Archive in 1996 (although its crawls were not publicly available until the Wayback Machine launched in 2001), many view the problems of digital preservation as resolved. This is far from the case. Even what exactly should be preserved and what counts as the version of record are questions that will need constant revisiting; our current answers are very much based on notions of the physical rather than the digital world. The digital world is frequently an environment generated automatically for a single user, one that exists only as long as a browser tab remains open.

A Path Forward
The purpose of putting together a list of the trends of the past quarter century is to consider the broader arc of information creation and distribution since 1999. We have collectively spent the last two and a half decades, or longer, dealing with the transition from a print-based economy to a digital ecosystem for information. We have managed our way through many of the basic details of creating digital content: selling it, efficiently distributing it, and measuring and assessing its use. We can already see some of the outlines of the next 25 years, even if many futurists may say the changes will arrive much more quickly. Much attention will be focused on the application and novel use of new technologies. Artificial intelligence, or rather the algorithmic application of tools against a corpus of digital data, is an example of this type of novel use.

For all the hype around new technologies and their potential to transform how communication is undertaken, these changes take time to implement. While people can envision what a radically transformed ecosystem will look like, developing real-world implementations of those visions is a slow process. Getting users to change their behavior and buy into new tools or services, and coalescing around standard processes, implementations, and interchange, all take time as well.

People often presume that adoption curves should be measured in months, but this is rarely the case. While we can see the outlines of an era of content management and engagement that is algorithmically controlled, this vision isn’t actually that new. We’re moving through a decades-long transition, taking small steps every day. We at NISO have been supporting the community throughout this transition. Historically, we have been a place to convene to discuss trends and discern future development actions. The NISO Plus conference coming up in February is a continuation of that mission. NISO has also played a key role in developing and fostering deployment of the technical solutions that have improved and accelerated information exchange. This has been our legacy over the past quarter-century, and it will remain so for the next several decades as we continue this transformative journey. 

With best wishes for 2025,

Todd Carpenter
Executive Director