Mining Institutional Knowledge: Using Text and Data Mining to Enhance Discovery

The speakers were Mary Ellen Bates and Chris Bendall.

Mary Ellen’s bio: Mary Ellen Bates is the owner of Bates Information Services, providing strategic business decision support to business professionals and consulting services to the information industry. Based near Boulder, Colorado, her passion projects are beekeeping and coaching fellow solopreneurs.

Chris’ bio: During the last 14 years at Springer (and since 2015 Springer Nature) Chris has worked in Editorial, Product and Business Development roles, with a focus on Springer Nature’s fastest growing sectors including: regional expansion, open access, corporate markets and data services. One of his current projects is developing the infrastructure and business models to enable text and data mining of SN content for a variety of use cases from hypothesis generation to knowledge management. Chris came to Springer after a postdoc in geochemistry and a focus on gold exploration. While no longer an active researcher, his work at Springer Nature supports researchers everywhere.

The session was described as: "This session looks at the role info pros can play in mapping content with specialized tools and resources to enhance discoverability and support the strategic goals of their organization. Mary Ellen Bates reviews some of the initiatives that knowledge managers and special librarians have led to enhance information and map internal and external content through text and data mining, and offers a checklist of the questions an info pro needs to ask when evaluating knowledge mapping tools. Chris Bendall of Springer Nature discusses how info pros can leverage their specialized internal knowledge structure and work with online content providers to best address the needs of their clients and researchers. A corporate librarian/knowledge manager will describe how they implemented a KM project in which text and data mining tools were effectively applied."

There is a white paper associated with this presentation. Mary Ellen also wrote an earlier paper on TDM and info pros. Springer Nature now has a new blog about modern librarians. I haven't had a chance to look at any of these resources yet.

Examples of datasets can be found at: and at:

Text and Data Mining (TDM)

  • automated process
  • large amounts of data
  • purpose is either to increase discoverability of the underlying content or to discern patterns
    • increase recall with precision
    • return the most relevant articles: semantically enriched data exposes relationships so you get exactly what you are looking for
    • find patterns and trends across a dataset; TDM helps you find these when you don't know what you are looking for
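As a rough illustration (my own sketch, not from the session), the "find patterns you weren't looking for" side of TDM can be as simple as counting which terms co-occur across a document set. The corpus and terms below are hypothetical toy data.

```python
from collections import Counter
from itertools import combinations

# Toy corpus standing in for article abstracts (hypothetical data).
documents = [
    "gene expression linked to inflammation in covid patients",
    "drug candidate reduces inflammation via gene regulation",
    "covid drug trial targets gene expression pathways",
]

def cooccurrences(docs, stopwords=frozenset({"to", "in", "via"})):
    """Count how often pairs of terms appear in the same document."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(doc.split()) - stopwords)  # dedupe, drop stopwords
        pairs.update(combinations(terms, 2))          # all alphabetized term pairs
    return pairs

# Frequent pairs surface relationships (e.g. gene/inflammation) without a query.
top_pairs = cooccurrences(documents).most_common(5)
```

Real TDM pipelines do this over millions of records with entity recognition rather than raw word splits, but the principle is the same: patterns emerge from counting, not from a search query.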

Enhance access to full text. By getting better results, you can associate value with the output. The outcome isn't full-text articles themselves, so the value can be invisible; the datasets create relationships that lead to full-text articles.
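To make the "semantically enriched data" idea concrete, here is a minimal sketch (my own, with hypothetical records): when a TDM pipeline has tagged each article with the entities it is actually about, you can match on the concept instead of the keyword, which is what gives precision alongside recall.

```python
# Hypothetical enriched records: each article carries entity tags applied
# by a TDM pipeline, not just raw title text.
records = [
    {"title": "A jaguar speed study", "entities": {"Panthera onca"}},
    {"title": "Jaguar E-Type restoration", "entities": {"automobile"}},
]

def find_by_entity(records, entity):
    """Match on the tagged concept; a plain keyword search for 'jaguar'
    would return both records."""
    return [r["title"] for r in records if entity in r["entities"]]

find_by_entity(records, "Panthera onca")  # → ["A jaguar speed study"]
```

The enrichment step (entity tagging) is where the real work happens; once it exists, the relationships it exposes lead users straight to the relevant full text.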

TDM licensing is a challenge for all parties, because it is new and not as well understood.

What TDM requires:

  • good dataset with consistently applied metadata
  • info pros can help evaluate fee vs. free datasets
  • info pros know about internal datasets
  • info pros can evaluate datasets outside the clients’ subject area
  • info pros can bring in the right structure, such as APIs or other taxonomies that might be appropriate for the current project. We build relationships with other departments so we know what is out there
  • We can evaluate open access models
  • We bring focus and functionality to the data once it comes in house. I really like this comment. We can do good work once we have access to the info
    • expand ideas of how to use info
    • find datasets with consistent metadata
    • know how users query

Be Chief Ontology Officer

  • identify internal silos, etc.
  • sell the value of sharing
  • facilitate resource coordination
  • show value of cross platform searchability of resources

Explore adding TDM to content licensing

  • we are all figuring this out as we go along
  • we bring enterprise-wide knowledge to these discussions

What to Ask Before a TDM project

  • What is the outcome of the project truly supposed to be? Answering this helps define and clarify TDM projects
    • uncover existing content
    • OR find new patterns
  • What data do you need?
  • Do they need APIs developed?
  • Do they already have a knowledge map or structure, or do they need one evaluated?
  • Can this dataset or metadata be shared later enterprise-wide?
  • Should we get an institutional license later?
  • Are there other stakeholders we can get involved to help with funding?
  • What are your plans for archiving content and metadata later?
  • What user support are you expecting?

Springer Nature has TDM licensing examples

Chris Bendall talked second. He is responsible for understanding customer needs and determining the value of TDM.

One thing TDM can help combat is information overload. AI and machine learning technologies help with this. There were several examples of genes discovered by AI that could help develop drugs for a variety of conditions, including COVID-19.

Chris mostly showed use cases related to drug development as well as other research areas. I asked about any concerns about AI getting out of control, and Mary Ellen responded: "AI is still just an algorithm – at least in the context of TDM – and all it's doing is calculating and telling us what it discovers. It's up to humans to make decisions and take action. And unforeseen directions can be valuable – sometimes TDM surfaces unexpected insights or connections."

He mentioned KM systems linking internal and external information.

AI can also auto-summarize datasets and put them together into a variety of formats such as books and papers. I wasn’t clear whether these groupings were edited or reviewed by humans.

FAIR (Findable, Accessible, Interoperable, Reusable) principles were also mentioned. I am not familiar with these. I was told that FAIR comes from the data world, and that we want data worldwide to be FAIR.