Artificial intelligence’s fit in drug development, life sciences

Bloomberg Intelligence

This analysis is by Bloomberg Intelligence Industry Analyst Andrew Galler and Senior Associate Analyst Jack Maltby. It appeared first on the Bloomberg Terminal.

Life sciences is among the most suitable sectors for AI disruption, we believe, given the confluence of decades-old paradigms in drug discovery and highly complex, multidimensional datasets characteristic of biology and chemistry. The proprietary nature of data paired with the inherent unreliability of published literature may disadvantage smaller developers, yet data federation can facilitate more equitably trained models.

AI well equipped for life-sciences work

Drug development requires synthesizing large, heterogeneous datasets spanning multiple formats, including text-based, graphical and numerical data. Though humans are naturally wired to detect patterns, the noise and diversity in these datasets can make meaningful patterns hard to find. Well-trained artificial-intelligence and machine-learning models can synthesize these data more consistently and draw meaningful conclusions, in theory without bias.

To use AI as a tool, drug developers need good models, robust training data and informed data scientists to effectively synthesize multimodal data and draw meaningful conclusions or inferences.

Scale of Data in Life Sciences

Generalizability an important quality for models

Generalizability for an AI/ML model typically refers to its ability to function with new datasets or in new settings. The gold standard is external generalizability, meaning a model can function effectively in a setting outside the data on which it was trained. Internally or temporally generalizable models are typically seen as less attractive in drug development, given their limited scope of use and inability to adapt to new data in an unknown space.

Generalizability typically stems from diverse training sets and sound model logic that is neither overfit nor underfit and is not biased by its training set. Together, these let models work with novel datasets that can span both known and unknown spaces.
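
The difference between internal and external generalizability comes down to how a model is tested. A minimal sketch, assuming each record carries a hypothetical data-source tag (a lab, assay platform or patient cohort): the internal test set is drawn from the same sources used for training, while the external test set is an entire source the model never sees. The function name and record layout are illustrative assumptions, not taken from any specific tool.

```python
from collections import defaultdict

# Hypothetical record layout: (data_source, features, label).
Record = tuple[str, list[float], int]


def internal_external_split(records: list[Record], external_source: str,
                            internal_test_every: int = 5):
    """Split records into a training set, an internal test set (same sources as
    training) and an external test set (a source never seen in training)."""
    train, internal_test, external_test = [], [], []
    seen = defaultdict(int)  # running count of records per training source
    for source, x, y in records:
        if source == external_source:
            external_test.append((source, x, y))
            continue
        seen[source] += 1
        # Deterministically hold out every Nth record from each training source.
        if seen[source] % internal_test_every == 0:
            internal_test.append((source, x, y))
        else:
            train.append((source, x, y))
    return train, internal_test, external_test


records = [("lab_A", [float(i)], i % 2) for i in range(10)]
records += [("lab_B", [float(i)], i % 2) for i in range(10)]
records += [("lab_C", [float(i)], i % 2) for i in range(4)]
train, internal_test, external_test = internal_external_split(records, external_source="lab_C")
print(len(train), len(internal_test), len(external_test))  # 16 4 4
```

A model that scores well on `internal_test` but poorly on `external_test` is generalizing only internally.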

Other qualities that make for a good AI model

Generalizability is arguably the most important quality for a value-accretive AI model, but we also see a lack of bias as an important quality for drug-development AI models. A model’s bias largely reflects its training dataset, which may itself carry inherent biases. The model learns and incorporates these biases, which distorts its ability to process new data and offer meaningful output.

On the explainability side, a key strength of artificial intelligence and machine learning is that they can represent and process data along far more dimensions than humans can. Given the complexity of biochemical models, it seems unlikely that humans could fully explain their outputs.

Considerations for an adequate training dataset

There’s no one-size-fits-all approach in determining what constitutes a robust training set for AI models in drug development. Yet we believe the most useful to industry will be diverse training sets containing both positive and negative outcomes, derived from multiple sources, multimodal across data types and relatively free of bias through means like random sampling. On that final point, larger-cap companies with long histories of drug development and broad internal datasets might be advantaged, given the universe of published data is frequently cherry-picked to support a hypothesis.
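
Random sampling that deliberately keeps both outcomes represented can be sketched as class-balanced (stratified) random sampling. This is a minimal illustration under stated assumptions: each record is a dict with a hypothetical binary `label` field (1 for a positive outcome, 0 for a negative one), and the toy data mimic a positively skewed literature by containing three times as many positives as negatives.

```python
import random


def balanced_random_sample(records, n_per_class, seed=0):
    """Randomly draw the same number of positive (label 1) and negative (label 0)
    records so that neither outcome dominates the training set."""
    rng = random.Random(seed)  # seeded so the draw is reproducible
    positives = [r for r in records if r["label"] == 1]
    negatives = [r for r in records if r["label"] == 0]
    if min(len(positives), len(negatives)) < n_per_class:
        raise ValueError("not enough records of one outcome class")
    sample = rng.sample(positives, n_per_class) + rng.sample(negatives, n_per_class)
    rng.shuffle(sample)  # mix classes so ordering carries no signal
    return sample


# Toy data: 90 positive and 30 negative outcomes (a 3:1 skew).
records = [{"id": i, "label": 1 if i < 90 else 0} for i in range(120)]
sample = balanced_random_sample(records, n_per_class=20)
print(len(sample), sum(r["label"] for r in sample))  # 40 20
```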

We think combinations of clinical, cellular and chemistry data can provide a more robust decision engine than any one individually, but they carry the challenge of properly annotating multimodal datasets.
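
The annotation challenge shows up even in the simplest multimodal setup, early fusion, where each compound’s features from every modality are concatenated into one vector. In this sketch, each modality is assumed to be a dict mapping a hypothetical compound ID to a feature list; the IDs and values are made up. Any compound lacking an annotation in one modality drops out of the fused set.

```python
def fuse_modalities(clinical, cellular, chemistry):
    """Early fusion: concatenate per-compound feature vectors from three
    modalities. Compounds missing any modality are reported, not imputed."""
    fused, incomplete = {}, []
    all_ids = set(clinical) | set(cellular) | set(chemistry)
    for compound_id in sorted(all_ids):
        parts = [m.get(compound_id) for m in (clinical, cellular, chemistry)]
        if any(p is None for p in parts):
            incomplete.append(compound_id)
            continue
        fused[compound_id] = [v for part in parts for v in part]
    return fused, incomplete


clinical = {"cmpd-1": [0.7], "cmpd-2": [0.2]}
cellular = {"cmpd-1": [1.5, 0.3], "cmpd-2": [0.9, 0.1], "cmpd-3": [0.4, 0.8]}
chemistry = {"cmpd-1": [312.4], "cmpd-3": [280.1]}
fused, incomplete = fuse_modalities(clinical, cellular, chemistry)
print(fused)       # {'cmpd-1': [0.7, 1.5, 0.3, 312.4]}
print(incomplete)  # ['cmpd-2', 'cmpd-3']
```

In the toy example, two of three compounds are lost, which is the practical cost of incomplete annotation.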

Annual publications on PubMed

Reproducibility underscores issues with public data

Survey on Whether a Reproducibility Crisis Exists

Federation as a way to resolve data inequality

Data federation, which allows multiple databases to function as a single dataset, has been cited as a way to bridge the gap between upstart AI-enabled drug developers and larger biopharmaceutical companies. Absent initiatives like data federation, it’s hard to see how AI-enabled developers can close a gap built on decades of basic-science research without performing much of that research themselves.
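
The core idea of federation, querying several databases as if they were one without moving the raw records, can be sketched with a federated mean. Assumptions: each hypothetical `Site` shares only a record count and a sum with a coordinator; real deployments would add protections such as secure aggregation, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One data holder (e.g. a company). Raw values never leave this object."""
    name: str
    values: list[float]

    def local_summary(self):
        # Only aggregate statistics are shared with the coordinator.
        return len(self.values), sum(self.values)


def federated_mean(sites):
    """Combine per-site (count, sum) pairs into a global mean, as if the sites'
    databases were one dataset, without pooling the raw records."""
    total_n, total_sum = 0, 0.0
    for site in sites:
        n, s = site.local_summary()
        total_n += n
        total_sum += s
    if total_n == 0:
        raise ValueError("no data at any site")
    return total_sum / total_n


sites = [Site("pharma_A", [1.0, 2.0, 3.0]), Site("biotech_B", [4.0, 5.0])]
print(federated_mean(sites))  # 3.0, identical to the mean of the pooled values
```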

Companies like Verge Genomics and InSitro are doing this by generating their own experimental datasets. Yet because they’re doing so disease by disease, or locally, this approach may limit generalizability in the near term.

Schema of Data Federation

In-focus: the MELLODDY federation initiative

In 2019, the MELLODDY project was launched under the European Innovative Medicines Initiative as a data-federation project bringing together large and small biopharmaceutical companies. Members pooled proprietary data and assays to improve each company’s individual models. Data privacy remained a constraint: companies could not access the broader dataset or other members’ assays, and they received only their own optimized assays in return for participating.
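
The premise of such a consortium is that members improve their models without exposing their data. The generic technique behind that premise is federated learning; the sketch below shows federated averaging (FedAvg) on a one-parameter linear model, y = w * x, where members share only updated weights. It is an illustration under simplifying assumptions with synthetic member data, not MELLODDY’s actual architecture, which was more involved.

```python
def local_update(w, data, lr=0.01, epochs=20):
    """Gradient descent on one member's private (x, y) pairs for y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(members, rounds=10, w=0.0):
    """FedAvg: each member trains locally, then the coordinator averages the
    returned weights, weighted by dataset size. Raw data is never shared."""
    total = sum(len(d) for d in members)
    for _ in range(rounds):
        updates = [(local_update(w, d), len(d)) for d in members]
        w = sum(wi * n for wi, n in updates) / total
    return w


# Synthetic private datasets, both generated from y = 2x.
members = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(4.0, 8.0), (5.0, 10.0)],
]
w = federated_averaging(members)
print(round(w, 4))  # 2.0
```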

Some observers ultimately declared MELLODDY a success, based on improvements in the submitted assays, albeit relatively small ones. Yet the submitted data yielded few novel conclusions.

Improvements Observed in MELLODDY

Reluctance to share data may be due to prior investment

Beyond the integral role of data in drug development more broadly, the desire to keep data proprietary rather than open-source it is reflected in the number of biopharmaceutical companies that have paid to gain or maintain access to data. GSK, as part of its Open Targets initiative under prior chief scientific officer Hal Barron, entered an exclusive partnership with 23andMe that included a $300 million equity investment and a 50/50 cost-sharing agreement.

An unwillingness to federate data cuts both ways, to an extent: it might further pigeonhole large players in therapeutic areas where they have historically had a presence, given the breadth of their data in those verticals.

Large Pharma Data Acquisitions

Can small drug developers skirt the data gap?

Given the issues with training AI models exclusively on publicly available data, and the significant upfront investment needed to generate enough preclinical experimental data to train a model adequately, the question is whether smaller drug developers can avoid these hurdles. Though the approach remains to be validated, some models have been constrained by natural parameters, such as biochemical or physical axioms, to control their outputs, which can preempt the need for large training sets. Imparting domain knowledge this way, however, carries some less-appealing characteristics.
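
One common way to impart domain knowledge is a soft constraint: a penalty added to the training loss whenever a prediction violates a known physical rule. The sketch assumes a hypothetical rule that predicted values (for example, concentrations) cannot be negative, and a hypothetical penalty weight `lam`. It illustrates the general idea only and does not describe any company’s platform.

```python
def constrained_loss(preds, targets, lam=10.0):
    """Mean squared error plus a quadratic penalty for physically impossible
    outputs. The (hypothetical) constraint here: predictions must be >= 0."""
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")
    n = len(preds)
    mse = sum((p - t) ** 2 for p, t in zip(preds, targets)) / n
    # Only negative predictions contribute to the violation term.
    violation = sum(min(p, 0.0) ** 2 for p in preds) / n
    return mse + lam * violation


print(constrained_loss([1.0, 2.0], [1.0, 2.0]))   # 0.0
print(constrained_loss([-1.0, 2.0], [0.0, 2.0]))  # 5.5 (MSE 0.5 + 10 * 0.5)
```

The trade-off is that a wrongly specified rule pushes the model toward the wrong answer no matter what the data say, one way that imparted domain knowledge can cut against a model.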

Schrödinger is the most advanced AI drug developer to take this tack, and early clinical data readouts in 2025 could indicate whether a physics-based model can generate best-in-class molecule profiles.

The data included in these materials are for illustrative purposes only. The BLOOMBERG TERMINAL service and Bloomberg data products (the “Services”) are owned and distributed by Bloomberg Finance L.P. (“BFLP”) except (i) in Argentina, Australia and certain jurisdictions in the Pacific Islands, Bermuda, China, India, Japan, Korea and New Zealand, where Bloomberg L.P. and its subsidiaries (“BLP”) distribute these products, and (ii) in Singapore and the jurisdictions serviced by Bloomberg’s Singapore office, where a subsidiary of BFLP distributes these products. BLP provides BFLP and its subsidiaries with global marketing and operational support and service. Certain features, functions, products and services are available only to sophisticated investors and only where permitted. BFLP, BLP and their affiliates do not guarantee the accuracy of prices or other information in the Services. Nothing in the Services shall constitute or be construed as an offering of financial instruments by BFLP, BLP or their affiliates, or as investment advice or recommendations by BFLP, BLP or their affiliates of an investment strategy or whether or not to “buy”, “sell” or “hold” an investment. Information available via the Services should not be considered as information sufficient upon which to base an investment decision. The following are trademarks and service marks of BFLP, a Delaware limited partnership, or its subsidiaries: BLOOMBERG, BLOOMBERG ANYWHERE, BLOOMBERG MARKETS, BLOOMBERG NEWS, BLOOMBERG PROFESSIONAL, BLOOMBERG TERMINAL and BLOOMBERG.COM. Absence of any trademark or service mark from this list does not waive Bloomberg’s intellectual property rights in that name, mark or logo. All rights reserved. © 2024 Bloomberg.
