The Cost of Knowledge: Evaluating Open vs. Paid Data for Pharma

Learn how choosing the right data source can be a game-changer in the competitive landscape of drug discovery and development.

DrugBank Team

Mar 22, 2024 • 6 min read

There are many obstacles to overcome on the road to success in early-stage drug discovery and drug repurposing research. The sector is constantly evolving and has become an increasingly competitive space in which to work. Research is emerging at a never-before-seen rate, making it difficult for researchers and scientists not only to stay on top of the latest publications but also to validate conflicting information. As budgets tighten and machine learning earns its place as an integral part of every research team, effective management of drug knowledge and databases has proved to be a critical determinant of success.

It is no longer realistic to have an in-house team fully responsible for sourcing data. As a result, the industry is turning to managed data sources and is having to weigh the costs and benefits of open data sources versus enterprise research tools. As with any big decision, there is no one-size-fits-all solution; each option brings its own unique challenges and opportunities.

Open Source Data

Open source data provides researchers easy and transparent access to vast amounts of information, often with a low barrier to entry. These data sources are known for being highly referenced and usable for specific research activities; moreover, a number of recognizable open sources, along with their identifiers, are considered industry-standard resources. Because it is so accessible, open source data is an appealing, cost-effective resource for emerging pharmaceutical companies and academic research projects.

Open source data can be very helpful; however, there are a handful of factors to consider to ensure it is the right solution for your work.

Key Benefits of Open Sources

Cost-Effective: One of the best things about open data sources is that they are often free and readily accessible.

Customizable: You’re in control here. Open source data gives you the freedom to customize what data you take in to meet your unique needs. Note, however, that this may require some level of technical expertise or bringing developers onto your team.

Self-Hosting: These ontologies and datasets are often downloadable and can be used within your own servers, providing a high level of control over security and infrastructure.

Considerations for Open Sources

Low-cost and readily available data sounds like an offer that can’t be beat, but open sources bring their own challenges, many of which can lead to delayed timelines due to unstructured data and headaches from data inaccuracies. Before betting everything on open source data, it is important to consider the following:

Manual curation: Open sources typically require a significant amount of cleanup, which means a larger time investment early in your research phases. They also have a reputation for containing hidden inaccuracies and out-of-date information.

Limited coverage: Open source ontologies and datasets tend to have a very narrow and particular focus. That focus can be anything from drug targets, indications, drug product details, or clinical trial details to drug categorization. If you need very specific information, this can be valuable; however, it is rare to find open source coverage that provides a broad scope of information. An often unseen consequence of these singularly focused datasets is the need to staff positions solely for the purpose of updating and maintaining individual sources. The result can be a team full of individuals who are overqualified and underutilized.

Low interoperability: Because many open data sources have a narrow focus, they often don’t prioritize crosswalks to other data sources, and they are regularly formatted in ways that make linking to related identifiers difficult. To improve the interoperability of open sources, your team may need to build these links itself, which can take a substantial amount of time.

Inconsistent quality: Each open source has its own cadence for updates and new version releases. While updates are vital to good research, teams often say they struggle to keep up with the differing and sometimes irregular release schedules of their sources. Even more challenging is the complex work necessary to integrate or update data within data lakes or tools. This increases the chance that inaccuracies are introduced or that outdated information goes unnoticed.

Limited support: Open source knowledge bases rarely provide dedicated customer support and services. This may or may not be important to you depending on your team’s comfort level; however, if challenges arise, you are often left to rely on community forums for assistance.
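To make the interoperability point above more concrete, here is a minimal sketch of what a hand-maintained crosswalk between two narrowly focused sources might look like. Everything in it is an assumption made for illustration: the source names, identifiers, field names, and the link_records helper are hypothetical and do not come from any real dataset.

```python
# Hypothetical records from two narrowly focused open sources.
# All identifiers and field names are invented for illustration only.
targets_source = [
    {"drug_id": "SRC_A:0001", "target": "Target X"},
    {"drug_id": "SRC_A:0002", "target": "Target Y"},
    {"drug_id": "SRC_A:0003", "target": "Target Z"},
]
indications_source = [
    {"compound": "src_b-17", "indication": "Indication P"},
    {"compound": "src_b-42", "indication": "Indication Q"},
]

# A crosswalk the team has to build and maintain itself:
# identifier in source A -> identifier in source B.
crosswalk = {
    "SRC_A:0001": "src_b-17",
    "SRC_A:0002": "src_b-42",
}


def link_records(targets, indications, crosswalk):
    """Join the two sources through the crosswalk.

    Returns the linked rows plus the source A identifiers that
    could not be matched, so gaps are visible rather than silent.
    """
    by_compound = {row["compound"]: row for row in indications}
    linked, unmatched = [], []
    for row in targets:
        other_id = crosswalk.get(row["drug_id"])
        match = by_compound.get(other_id)
        if match is None:
            unmatched.append(row["drug_id"])
            continue
        linked.append(
            {
                "drug_id": row["drug_id"],
                "target": row["target"],
                "indication": match["indication"],
            }
        )
    return linked, unmatched


linked, unmatched = link_records(targets_source, indications_source, crosswalk)
print(f"{len(linked)} linked, unmatched: {unmatched}")
```

Even in a toy case like this, the unmatched list is where the curation effort goes: every new release of either source can add, retire, or rename identifiers, and the crosswalk has to be revisited each time.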

Paid Knowledge Bases

Paying for access to a knowledge base, or licensing one, can be a difficult decision to make. Will the cost be worth it, and can’t you just get all that data on your own? It can be tough to know when the time is right to invest in a paid knowledge base, so let’s explore the benefits and the potential risks.

Paid knowledge bases are designed and curated solutions maintained by a dedicated team of experts; they generally focus on a broader and more robust scope of data, compared to the more specific information found in open sources. Often, paid knowledge bases provide users access to an internal support team, which can help shorten startup times and maximize value. Many of the paid knowledge bases on the market can be deployed as self-hosting solutions or through software as a service (SaaS) subscriptions.

Key Benefits of Paid Knowledge Bases

Easy deployment: Enterprise knowledge bases are intentionally designed for ease of use so you can get in and get started on your project as quickly as possible. They often offer integration alternatives and solutions that are flexible enough to support a wide range of users, goals, and problems.

High interoperability: Any good paid knowledge base will prioritize interoperability by offering multiple ways to receive data, catering to different types of users (both technical and non-technical), and offering integrations that make the data easier to use alongside other products. This enables users to tap into different resources on a deeper level, and to understand and analyze data with less effort. Paid knowledge bases also support interoperability by linking to a range of common identifiers and building crosswalks between disparate sources.

Customer support: Licensing or subscription fees often come with access to dedicated customer support teams who are there to assist you with setup, troubleshooting, and ongoing maintenance.

High reliability: One major benefit of going with a paid knowledge base is that you get to take advantage of their entire team of experts working to curate, validate, structure, and build the data you will be using. Most of these paid knowledge bases will have substantial teams with a variety of domain expertise focused on data accuracy, curating evidence-based information, and keeping their sources as up-to-date as possible. These teams can also be relied on to build connections within their information and map back to trusted sources to ensure the highest level of accuracy.

Augment team capacity: The strategic addition of a paid knowledge base can enable smaller teams to operate at a capacity similar to that of larger organizations, while requiring fewer resources and less overall labour. It is common for data scientists and researchers to spend 35%-80% of their time on data cleanup before they can even focus on model building. By outsourcing this heavy lifting, teams can get to work faster and focus on the important work they set out to do.

Accelerated breakthroughs: Combined, the benefits of a paid knowledge base (easier and shorter startup times, robust and interconnected data, and evidence-based sources curated by experts) make it easier for teams to uncover novel insights, respond to unexpected findings, and make big breakthroughs.

Considerations for Paid Knowledge Bases

Marketplace Overload: As paid knowledge bases become increasingly popular, it can be extremely challenging to know which products and solutions are best suited to your team’s specific goals. This becomes more complex still when you are weighing your team’s unique research focus against an ever-expanding market.

Cost: Enterprise knowledge base solutions typically involve upfront costs and ongoing subscription fees that can act as a barrier to entry, especially when compared to open source alternatives, which are offered at lower cost or for free.

Choosing the Right Knowledge Base Solution

The decision between a free open source knowledge base and a paid knowledge base depends on what will best suit your organization's specific requirements, budget, and long-term goals. It is important to take into consideration your customization needs, security concerns, scalability, and access to support teams when evaluating your options. Again, there is no one right solution, and what works best for you might change over time as you scale or your focus shifts.

How DrugBank Fits

DrugBank offers a comprehensive drug knowledge base and platform that maps to many disparate sources and offers a range of data integration options, making it easier to use alongside metadata and a variety of sources. Whether you opt for an open source or enterprise solution, DrugBank is here to enrich your knowledge base with trusted, evidence-based pharmaceutical data, unlocking the doors to novel insights.
