Editor’s note: This editorial originally appeared in ACRL’s Policy, Politics and International Relations Section Spring 2019 newsletter.
By Roger Valade, Chief Technology Officer, ProQuest
By the time I was 12 years old, I was an avid reader, a trait I may have picked up from my grandmother, who devoured books in insomniac chunks. I spent evenings, weekends, and holidays scanning as many pages — Stephen King, Tolkien, Asimov — into my brain as consciousness would afford. Pepsi fueled my pre-coffee caffeination needs. Drowsing, I would take a break to wash my face in cold water, hoping I could enjoy as many hours of Michigan’s dark, reading-friendly winter nights as possible. And, past my 9 p.m. curfew, I would stuff tube socks beneath my door to keep the light from my lamp hidden from parental surveillance.
One of my goals was to read every book that had ever been written, and it was with great regret that I realized soon enough that my reading pace was not keeping up with the growth of my to-read list. OK, great, I’ll have to be more selective. How do I read the best books? Should I be reading widely from many authors and many genres, or focus on one author or region? FOMO, or “fear of missing out,” wasn’t an ad campaign in the 1980s, but I really, really feared missing out.
Fast-forward a few decades and the percentage of books I’ve read has plummeted. I asked our Bowker team to pull a report of the new books published in the U.S. in the last five years. Since 2014, more than 2 million new books have been published each year. In 2018, that number jumped to almost 3.4 million. Looking at my reading log, I’ve read only 15.
I’ve indulged in this story because it mirrors the plight of the modern researcher, though in research the problem is naturally much more dire. Staying current with the volume of material being published is not humanly possible. On the ProQuest platform alone, we host more than a billion documents, from this morning’s newspapers to books written in the fifteenth century. We’ve been working on this problem since Eugene Power founded University Microfilms in 1938, dreaming of photographing and microfilming the world’s knowledge to catalyze its distribution and dissemination.
Helping researchers find the content they seek – whether it illuminates a hunch, points them in a new direction, or provides a frustrating counterpoint – is our fundamental mission, and we’ve typically focused on that challenge primarily with content, search tools and workflows. Today, though, you can’t watch a baseball game or a prime-time show (I think they still exist) without also hearing about machine learning or artificial intelligence and how it is going to fuel your workout, improve your commute/diet/playlist, or power your business.
My 12-year-old now asks me before each Michigan Wolverine basketball game who the ESPN app is predicting will win the game — and we watch the Win Probability update in real-time, right on our TV. Bringing these new technologies to bear to analyze text, images, video, and newer assets like raw data, programs, and algorithms is an exciting reality. At ProQuest, we’ve worked with university students using open-source tools to break newspaper pages into their constituent articles, to disambiguate Paris, Texas, from Paris, France, and to assess the reportability of a clinical trial report.
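To make the kind of text analysis described above concrete, here is a purely illustrative toy sketch in Python. The lexicons, the `extract_entities` and `sentiment_score` helpers, and the sample passage are all hypothetical inventions for this example — real student projects would reach for open-source libraries such as spaCy (entity extraction) or gensim (topic modeling) rather than anything this naive.

```python
import re

# Toy sketch only -- NOT ProQuest's pipeline or the students' actual tools.
# Tiny hypothetical sentiment lexicons, invented for illustration:
POSITIVE = {"dazzling", "exciting", "hope"}
NEGATIVE = {"dire", "frustrating", "fear"}

def extract_entities(text):
    """Naive entity extraction: title-case words not at a sentence start."""
    tokens = text.split()
    entities = set()
    for prev, tok in zip(tokens, tokens[1:]):
        word = tok.strip(".,!?;:")
        if word.istitle() and not prev.endswith((".", "!", "?")):
            entities.add(word)
    return sorted(entities)

def sentiment_score(text):
    """Lexicon-based sentiment: positive hits minus negative hits."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

passage = "Reading Tolkien in Michigan was dazzling, while missing books felt dire."
print(extract_entities(passage))   # ['Michigan', 'Tolkien']
print(sentiment_score(passage))    # 1 positive - 1 negative = 0
```

Crude as it is, the sketch shows why a laptop beats paper and pen: once the text is data, questions about names, moods, and themes become a few lines of code run over millions of pages.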
None of these options was available to me when I was studying literature. I’m needlessly jealous of today’s students: a question that would have taken me a week to work through with paper and pen, they can answer in mere minutes with a laptop and a Jupyter interactive notebook. You don’t have to read all the books that exist — you can build a model to read them, abstract them, and maybe answer some interesting questions about them using entity extraction, sentiment analysis, and topic modeling. It’s dazzling. But bringing these new technologies to bear is also a surgically delicate operation.
For example, the workflows that some of our drug safety services enable are tied into government-regulated processes. Missing an adverse event reported in a clinical trial is not an option. How do we build models that guide researchers transparently, without bias, and securely? Examples abound of artificial intelligence agents that develop bad traits from fake content — and clearly these are early days. And yet the hope is strong for these new approaches.
In a February 2019 New York Times article, drug discovery researcher Derek Lowe said, “It is not that machines are going to replace chemists. It’s that the chemists who use machines will replace those who don’t.” Our use of these machines will only become more sophisticated as well.
In his book Hit Refresh, Microsoft CEO Satya Nadella writes, “AI must be transparent. All of us, not just tech experts, should be aware of how the technology works and what its rules are. We want not just intelligent machines but intelligible machines; not just artificial intelligence but symbiotic intelligence.”
Artificial intelligence has been a dream for decades, and today’s realities of cloud computing and big data are finally realizing some of those dreams. But should we dread the negative outcomes often predicted by AI futurists? Will you be locked out of your home or your car or your computer by an AI that has taken on HAL 9000 characteristics? Stanford’s “One Hundred Year Study on Artificial Intelligence” 2016 report states, “Contrary to the more fantastic predictions for AI in the popular press, the Study Panel found no cause for concern that AI is an imminent threat to humankind.”
To confirm, I asked Alexa if it liked Siri. Alexa’s reply? “I’m partial to all AIs.” It’s a good start.