Today, ProQuest launches TDM Studio, a new solution that puts the power of text and data mining directly in the researcher’s hands.
TDM Studio unlocks a vast collection of current and historical ProQuest content for text and data mining (TDM), including news, journals, dissertations and theses, primary sources and more. Users can also upload their own content and combine it with ProQuest content for a truly comprehensive dataset.
A New COVID-19 Dataset: ProQuest has built a dataset of 500,000 recent articles – mostly from newspapers – relating to COVID-19. When analyzed by TDM Studio, this data can help researchers better understand trends as they’re reported in local and national news. Any user of TDM Studio can get access to this data. Sign up for a demo.
John Dillon, Product Manager for TDM Studio, said he’s seeing the product breathe new life into research across disciplines.
“Researchers now have these new methods to answer questions they haven’t had a good way to answer before,” he said. “It’s mind-blowing when you think about it. For years, so many researchers needed access to this content in machine-readable formats, in the appropriate computing environment and with the analytical tools to perform TDM, but it wasn’t always possible to put all these elements together at the same time. With TDM Studio, we were able to solve these challenges.”
Before he joined ProQuest, Dillon was one of those researchers. With a Ph.D. in English Literature, he spent his early career trying to attribute authorship to disputed or anonymous texts. A lot of the statistical methods he needed to use required coding knowledge, so he took coding and data science classes. Then, as a post-doc, he worked with IBM Research on studies to predict students’ emotions based on their comments and actions in online learning platforms.
But many people who could benefit from TDM aren’t skilled in computer or data science. That’s why TDM Studio is being designed for researchers of all skill levels.
“Those who know coding can either use predefined data analysis methods or their own methods created by open-source programming languages like R and Python,” said Dillon. “And, in a future release, non-coders will be able to use an interface that has embedded analytical methods and guides the user to visualize and partially manipulate the results.”
As the product continues to evolve, its goal is to provide a TDM solution across the university regardless of how familiar users are with TDM or coding.
Adding to the product’s flexibility are real-time collaboration and “anytime, anywhere” access. “This is especially crucial in today’s environment, where campuses are closed and most researchers are working remotely,” said Mindy Pozenel, Director of Product Management for TDM Studio. “Using TDM Studio, they can collaborate in ‘real time’ with their colleagues on projects, and they can also log in from home without being on the university’s network.”
Academic libraries – which already have a wealth of content for research – can use TDM Studio to drive more value from their existing collections, creating new opportunities for partnerships with research teams and enhanced teaching and learning.
“Libraries are already subscribing to a significant portion of this content,” said Pozenel. “But even if you have a database that’s highly used, nobody can read a million articles, and significant value remains untapped. When you can use TDM to derive value from a large amount of this content, it amplifies a library’s role as a service center – to spread knowledge and create more value for the research workflow.”
While TDM Studio is new to the market, a few researchers have already been using it. Over the past year, ProQuest has been collaborating with development partners and early-access researchers on more than 50 different research projects.
Caleb Rawson, an assistant professor of accounting at the University of Arkansas, is one of those development partners. Rawson had been working on a research project to determine how a CEO’s confidence may contribute to a company’s future success. He was trying to understand why some company leaders talk confidently about their trade secrets while others don’t, and the ramifications of both types of behaviors.
“For example,” Rawson said in an interview with ProQuest, “Elon Musk, the CEO of Tesla, likes to talk about everything Tesla is doing. But why? It’s giving his rivals an opportunity to spend more money working on competing products. This is what we call proprietary costs – the costs of disclosing what your trade secrets are, because competitors now know what you’re working on.”
To determine the outcome of this behavior, Rawson needed to conduct a broad analysis of years of media coverage: CEO profiles, interviews, features, news and other details. But answers would likely only come from something Rawson couldn’t do: reading hundreds of thousands of articles himself. That’s when he turned to ProQuest, and TDM Studio, for help.
With TDM Studio, the time scholars spend creating a content set has been reduced to hours, rather than the months required with traditional approaches.
Rawson said he started his TDM pilot project by giving ProQuest 2,500 pairings of firms and CEOs (Tesla and Musk, for example), a list of his desired publications, and a range of dates. His initial search results topped out at more than 323,000, which was – to put it lightly – “more than I was anticipating,” he said. “I didn’t want to spend six months reading hundreds of thousands of articles about CEOs if I didn’t need to. That’s where TDM stepped in and saved the day for me.”
Rawson took several steps to narrow down his dataset using TDM Studio. First, he removed all articles that mentioned a firm or CEO’s name in an advertisement. Then, he kept only the articles that contained certain words, such as confident, cautious, optimistic, gloomy and conservative. He continued to refine his dataset using additional text-mining measures until he reached about 22,000 articles.
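For readers who code, the first two narrowing steps might look something like the Python sketch below. It is only an illustration of the idea, not Rawson's actual code or a TDM Studio API: the article fields `text` and `is_advertisement` and the function names are hypothetical, and the word list simply reuses the examples above.

```python
import re

# Example tone words taken from the article; the real list was likely longer.
TONE_WORDS = {"confident", "cautious", "optimistic", "gloomy", "conservative"}


def keep_article(article):
    """Return True if the article is not an ad and contains a tone word.

    `article` is assumed to be a dict with a "text" field and an optional
    "is_advertisement" flag (both hypothetical field names).
    """
    if article.get("is_advertisement", False):
        return False
    tokens = set(re.findall(r"[a-z]+", article["text"].lower()))
    return bool(tokens & TONE_WORDS)


def narrow(articles):
    """Keep only the articles that pass both filters."""
    return [a for a in articles if keep_article(a)]
```

Matching on whole lowercase words keeps the filter simple; a real project would probably also need to handle word variants such as "confidently".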
“Now, for each article, I’m able to use an algorithm that looks at words that occur around the CEO’s name that describe them in a confident or cautious manner. TDM Studio saved me months of time I would have spent collecting and reading articles by hand,” said Rawson.
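One simple way to approximate the kind of algorithm Rawson describes is to count tone words within a few words of the CEO's name. The sketch below is an assumption-laden illustration, not his method: the function name, the five-word window, the split of words into "confident" and "cautious" groups, and the use of a single-word name (such as a surname) are all choices made here for the example.

```python
import re

# Hypothetical grouping of the article's example words into two tones.
CONFIDENT = {"confident", "optimistic"}
CAUTIOUS = {"cautious", "gloomy", "conservative"}


def tone_near_name(text, name, window=5):
    """Count confident and cautious words within `window` words of `name`.

    `name` is assumed to be a single word, for example a surname.
    Returns a (confident_count, cautious_count) tuple.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    name = name.lower()
    confident = cautious = 0
    for i, token in enumerate(tokens):
        if token != name:
            continue
        # Words on either side of this mention of the name.
        nearby = tokens[max(0, i - window): i + window + 1]
        confident += sum(word in CONFIDENT for word in nearby)
        cautious += sum(word in CAUTIOUS for word in nearby)
    return confident, cautious
```

If a name appears twice in close succession, the two windows overlap and a tone word can be counted more than once, which a fuller analysis would likely want to correct for.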
Ultimately, Rawson learned that CEOs who are overconfident reveal more insider information and trade secrets. This can give rival firms an advantage and it can hinder research efficiency at the overconfident CEO’s firm. This research is currently pending peer-reviewed publication.
Rawson, who asserts that accounting research isn’t as boring as most people make it out to be, said that what he’s done so far has only uncovered the tip of the iceberg of possibilities. “I’m really excited about TDM Studio,” said Rawson. “I see it having a lot of applications in the kind of research I’m doing.”
Learn more about TDM Studio, including how it can be applied in not only research, but also teaching and learning.