William Whitford

April 17, 2024

Pharma 4.0 has initiated smart integration of digital technologies to enhance efficiency, quality, and innovation throughout the pharmaceutical production life cycle. The paradigm has transformed the industry with tools such as data analytics, the Internet of Things (IoT), enterprise software and process control systems, automation, and artificial intelligence (AI). Rapid development of AI and machine learning (ML) applications is revolutionizing medicine in fields such as pharmacology, drug discovery, and bioinformatics (1). AI brings new capabilities to biomarker and therapeutic target discovery, expediting processes and enabling developers to make well-informed decisions about their programs.

For example, AI can mine databases, digital libraries, and research publications to discover and characterize biological pathways, molecules, and individual moieties associated with disease. Its ability to provide understanding of molecular relationships advances biodiscovery and molecular development of therapeutics (BMDT) investigation and validation (2). AI also helps scientists develop tools that function without deterministic programming. In turn, probabilistic tools help analyze the rapidly growing store of biological data needed to identify the functional relevance and ligandability of structural domains within potential drug candidates.

Diagnostic, staging, and prognostic biomarkers arise from a broad array of biological phenomena and offer insights into a patient’s specific physiological system. Objective measurements of cellular, tissue, and organ activity are used to gauge the state or progression of disease.

Figure 1: Artificial intelligence (AI) can explore data across many parameters using techniques from multiple disciplines. It drives discovery of biomarkers and therapy targets by providing rapid and multivariate analysis of DNA, RNA, protein, metabolite, cytometric data, and more.


Biomarker research is evolving continually. New biomarkers are discovered through analysis of bodily tissues, cells, and fluids using methods such as genome-wide association analysis powered by next-generation sequencing (NGS) and mass-spectrometry (MS)–based proteomics, as well as imaging techniques such as magnetic resonance imaging (MRI) and positron emission tomography (PET).

Like biomarkers, therapy targets are composed of different structural and functional molecular elements — often proteins — and are directly associated with disease. But a drug target can be modulated by a therapy to improve a disease state. A classic example of a therapeutic antibody target is human epidermal growth factor receptor 2 (HER2). HER2 is overexpressed in some cancers, contributing to uncontrolled cell growth. Trastuzumab is a monoclonal antibody (mAb) that targets the extracellular domain of HER2, disrupting a critical signaling pathway and decreasing cell growth.

Following innovations in assay and experimental technologies, target-identification technology is increasingly important in antibody-therapy discovery. But the power provided by AI in newer multiomic studies promises further significant gains (Figure 1). Researchers can use AI-enabled meta-analysis of available literature to map out a therapy’s primary and off-target activities and suggest treatment effects and clinical outcomes. In personalized medicine, AI-enabled tools or functions can analyze open-access libraries, databases, and published research and diagnostic data to help hone BMDT activities for specific populations or individual patients.


AI in Biomarker/Target Discovery and Analysis

AI is accelerating both the identification and validation of potential drug targets, beginning by improving traditional approaches to BMDT. Such tools extend laboratory-based affinity measurement and comparative profiling. They also enable researchers to incorporate high-throughput chemogenomic-library screening to examine thousands of possibilities rapidly.

However, perhaps the most remarkable impact of AI lies in its ability to transform data-mining approaches, notably in comparing -omic profiles from current “panomic” gene-disease association methods, as well as in ligand docking and pharmacophore analysis within computational-structure approaches. Both sets of approaches rely on diverse, expansive biological data sets (3). Such analyses can reveal important signaling, endocrine, immune, or metabolic pathways to identify critical nodes in disease-related networks, especially in the context of complex targets, agonists, multispecifics, or multivalent biologics.

A tremendous array of data is generated by analyzing the chemical, physical, and biological systems associated with new approaches to discovering candidate biomarkers or revealing their possible disease correlations. Similar data help developers to identify new targets for diseases and predict new applications for existing drugs. Some small-molecule drugs, such as aspirin, operate by covalent binding. AI helps developers investigate the supramolecular interactions (those that are neither ionic nor covalent) between biological molecules, such as those active in mAb-based therapies.

AI enables developers to identify and validate potential biomarkers and drug candidates with remarkable efficiency, first by supporting traditional research approaches. For example, “wet-lab” affinity measurements and comparative profiling are improved by AI because those methods rely upon enormous chemogenomic libraries and high-throughput techniques to screen thousands of possibilities based on initial experimental findings.

However, AI shines brightest in two areas. First, it excels at data mining for panomic gene-disease association methods that correlate various -omic profiles. It also is ideal for computational structure-based approaches that examine the orientation and conformation of ligands with putative BMDT-identified macromolecules, including pharmacophores. Those two approaches apply the diverse, enormous, and still growing biological data sets and libraries that are now available (4).

Developers also use in vitro methods such as yeast two-hybrid (Y2H) assays and coimmunoprecipitation to study interactions among proteins. Many analytical approaches use tools such as high-throughput screening (HTS) and high-content analysis (HCA). Although those methods already have significantly contributed to omics-based target discovery, AI can improve them (4, 5).


AI-enabled functions can be organized into distinct fields, with each providing distinct utility to help identify candidates (Table 2). Such functions can handle large amounts of disparate data types, identify patterns, provide classifications, and make predictions from their findings (6). Applications of such functions can transform slow and biased human-operated approaches into fast and more objective systems. For example, supervised machine- and deep-learning classifiers calculate mathematical scores from training databases to create operative nodes in neural-network models. The results can be used to organize data into groups or classes.
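As a rough illustration of how a supervised classifier turns training data into scores and then into classes, the sketch below trains a single logistic unit, the simplest building block of a neural network, by gradient descent. The two marker measurements, the labels, and all function names are invented for illustration and are not drawn from any study cited here.

```python
import math

# Hypothetical training data: two normalized marker measurements per sample,
# labeled 0 (control) or 1 (disease). The values are invented for illustration.
SAMPLES = [(0.2, 0.1), (0.3, 0.4), (0.1, 0.3), (0.4, 0.2),
           (0.8, 0.9), (0.9, 0.7), (0.7, 0.8), (0.6, 0.9)]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, lr=0.1, epochs=2000):
    """Fit one logistic unit with stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = p - y  # gradient of the log loss with respect to the raw score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return (class label, probability score) for one sample."""
    score = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return (1 if score >= 0.5 else 0), score


weights, bias = train_logistic(SAMPLES, LABELS)
label, score = predict(weights, bias, (0.85, 0.85))
print(label, round(score, 3))
```

Deep-learning classifiers stack many such units and learn from far larger training sets, but the principle of converting a learned score into a class assignment is the same.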

AI can characterize a biomolecule’s potential value in reporting on a disease. AI models can correlate data from a biological candidate’s genotypic or phenotypic properties to the desired molecule and thereby predict its value or suggest alternatives. Dynamic in silico modeling uses deep-learning algorithms to model cell- and vesicle-surface structural moieties, properties, activities, and molecular interactions, as well as a drug candidate’s potential disease-modulating effects. The power of that activity is demonstrated by Insilico Medicine’s AI-discovered small-molecule drug, INS018_055, which is now undergoing a phase 2 clinical trial (7).

Automated and high-throughput laboratory techniques measure several distinct variables in thousands of experimental samples. Because such variables in biological systems typically are correlated, their enormous data sets are multiparametric and sometimes of high dimensionality. Yet they are not always analyzed as such. Accurate results depend upon multivariate analysis (MVA).

AI-enabled MVA reveals the relationships, dependencies, interactions, and feedback within complex, dynamic biological systems. It provides approaches to detect and model numerous interconnected factors to provide an accurate interpretation of complex realities. AI-empowered MVA provides pattern recognition, understanding, and recommendations to drug developers (6).
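To make the case for multivariate analysis concrete, the following sketch compares two new samples against a reference set of two correlated variables. It uses the Mahalanobis distance, a standard multivariate statistic chosen here for illustration rather than a method named in the sources. All measurements are invented.

```python
import math
from statistics import mean

# Hypothetical reference measurements of two strongly correlated variables.
XS = [1, 2, 3, 4, 5, 6, 7, 8]
YS = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1]


def covariance_2d(xs, ys):
    """Return the center and the sample covariance terms (sxx, sxy, syy)."""
    mx, my = mean(xs), mean(ys)
    n = len(xs)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_2d(point, center, cov):
    """Distance from the center, scaled by the inverse 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - center[0], point[1] - center[1]
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


center, cov = covariance_2d(XS, YS)
consistent = (7, 7)   # follows the x-y trend of the reference data
discordant = (7, 2)   # each value is in range, but the pair breaks the trend

d_consistent = mahalanobis_2d(consistent, center, cov)
d_discordant = mahalanobis_2d(discordant, center, cov)
print(round(d_consistent, 2), round(d_discordant, 2))
```

Both values of the discordant sample sit inside the range seen in the reference data, so a variable-by-variable check would pass it. Only the joint view flags it as unusual.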

ConPLex is an open-access ML program that was developed by researchers at Massachusetts Institute of Technology (MIT). The software predicts drug-target interactions, using only simple descriptions of the candidate drugs and the sequences of a system’s proteins (8). And the AI-Bind model combines network-based sampling with unsupervised pretraining to improve binding predictions for novel proteins and ligands. It is a deep-learning drug–target combination identification algorithm that shows promise in drug discovery (9).

AI also is used in chemoinformatics. Although in silico screening applications have used neural networks for years, synthetic chemistry and chemoinformatics — based upon AI techniques — promise even more powerful analysis using new virtual libraries that represent a much larger chemical space (10).

Current ML models can predict the properties of new compounds by using measured features of a chemical itself or by using theoretical descriptors derived from its chemical graph or three-dimensional–structure data (11). In a related AI system, graph neural network (GNN) algorithms integrate panomic data sets and extract valuable insights from the resulting interconnected and interdependent data nodes, edges, or graphs (12).
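The idea of deriving descriptors from a chemical graph can be sketched with one round of neighbor averaging, the core operation that graph neural networks repeat with learned weights and nonlinear activations. The example below is a simplification under stated assumptions: it uses the three heavy atoms of ethanol, a one-hot atom-type feature, and plain averaging with no learned parameters. The function names are hypothetical.

```python
def message_passing_round(features, edges):
    """Replace each node vector with the mean of itself and its neighbors."""
    neighbors = {node: [] for node in features}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


def graph_readout(features):
    """Sum all node vectors into a single graph-level descriptor."""
    return [sum(vals) for vals in zip(*features.values())]


# Heavy-atom graph of ethanol (C-C-O); features are [is_carbon, is_oxygen].
atom_features = {"C1": [1, 0], "C2": [1, 0], "O": [0, 1]}
bonds = [("C1", "C2"), ("C2", "O")]

updated = message_passing_round(atom_features, bonds)
descriptor = graph_readout(updated)
print(updated)
print(descriptor)
```

After one round, the carbon bonded to oxygen carries part of the oxygen signal, so each atom's vector now encodes its local chemical environment. A trained GNN would feed a descriptor of this kind into a property-prediction layer.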

AI can create system insights from the data sets generated from laboratory experimentation, such as results from HTS; online libraries of raw chemical, biological, and pharmaceutical data; and online biomedical literature. Derivative relationship-specific databases provide annotations and understanding from the above primary libraries.

For example, the STRING database has a comprehensive library of known and predicted protein–protein interactions, including direct and indirect associations. That database gains information from computational prediction, knowledge transfer between organisms, and interactions aggregated from other biobanks. Another example is the IUPHAR database, which is a compendium of molecular, biophysical, and pharmacological properties of mammalian ion and cyclic nucleotide-modulated channels. Finally, the UniProt database is a central hub with functional information on proteins. It is known for accurate, consistent, and rich annotation.

Recently, powerful AI-based resources have appeared to support efficient ingestion and processing of such data, contributing to the democratization of AI. As both open-access and proprietary algorithms become available, entity developers need not maintain expensive subject-matter experts on staff for data science, AI theory, and program coding.

Ontoforce’s Disqover platform is available commercially and integrates public, private, and licensed data sets to interface with linked data for optimized exploration and search functionality (13). Such knowledge discovery platforms often are built on knowledge graph technology. The platform can use ML and AI algorithms to predict links between entities, revealing insights into potential novel targets.
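Link prediction on a knowledge graph can be illustrated with a simple neighborhood-overlap score. The sketch below is not the Disqover platform's method, which the source does not describe. It applies the Jaccard coefficient, a common baseline, to a small graph whose entities and links are invented.

```python
from itertools import combinations

# Hypothetical knowledge-graph edges linking genes, a pathway, a drug, and a disease.
EDGES = [
    ("gene_A", "disease_X"),
    ("gene_A", "pathway_P"),
    ("gene_B", "pathway_P"),
    ("gene_B", "disease_X"),
    ("gene_C", "pathway_P"),
    ("drug_D", "gene_A"),
]


def build_adjacency(edges):
    """Map each entity to the set of entities it is directly linked to."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard_scores(adj):
    """Score every unlinked pair by shared neighbors over total neighbors."""
    scores = {}
    for a, b in combinations(sorted(adj), 2):
        if b in adj[a]:
            continue  # already linked, nothing to predict
        shared = adj[a] & adj[b]
        if shared:
            scores[(a, b)] = len(shared) / len(adj[a] | adj[b])
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranked = jaccard_scores(build_adjacency(EDGES))
for pair, score in ranked:
    print(pair, round(score, 2))
```

In this toy graph, the unlinked pair of drug_D and disease_X receives a nonzero score because both connect to gene_A, which is the kind of indirect evidence that can suggest a candidate for follow-up. Production systems typically rely on learned graph embeddings rather than a single overlap measure.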

Many other AI-powered applications such as virtual and augmented reality (VR and AR) aid in therapeutic-candidate and BMDT understanding. Such tools enable developers to visualize and interact with complex biological structures and data sets. That capability was reviewed at the Computer Vision and Pattern Recognition (CVPR) conference in Vancouver, Canada, in June 2023 (14).

AI also can speed up existing data-processing activities and applications radically. For example, AI-driven natural language processing (NLP) dramatically reduces the time involved in examining and summarizing vast amounts of research and medical literature during review.

Examples of AI Success in Biopharmaceutical Discovery

The AI-empowered AlphaFold program can predict protein structures from amino-acid sequences. The activity of immunogens, receptor traps, and enzymes often is mediated by a small number of functional residues, with their domains properly presented by the overall protein structure. OpenAI’s generative pretrained transformer (GPT-4) model is an AI network that demonstrates therapeutically functional design of proteins from simple molecular specifications. Such successes have nurtured hope in AI’s ability to design novel protein structures as customized antibodies that contain prespecified and understood functional sites in an effective orientation for disease therapy (15).

ML techniques enabled researchers at University of California, San Francisco (UCSF) and IBM Research to expand chimeric antigen receptor (CAR) T-cell technology and make a more quantitative and predictive design. The team used neural networks that were trained to decode combinatorial CAR signaling motifs to discover key system-design rules (16).

The ProSurfScan algorithm from Chemotargets is a new commercially available program used to model the compatibility and binding mode of drug candidates on different regions of a protein surface (17). The protein surface is represented as a graph of nodes that define local supramolecular-interaction features. Convolutional neural networks then can agnostically estimate candidate interactions with distinct regions on a protein surface. For instance, Biolojic Design used AI technology to create AU-007, a highly differentiated epitope-specific mAb, by using computational design and big data (18). In early 2024, Aulos Bioscience dosed the first patient in the phase 2 portion of its phase 1/2 clinical trial of AU-007.

Emerging Therapeutic Modalities and Related Technologies

Remarkable developments are emerging for advanced therapy medicinal products (ATMPs), their delivery, and their production. AI is becoming essential for handling the structured data that comes from the many automated, high-throughput, and high-density analytics and screenings inherent to ATMP development and production (19).

Deep-learning algorithms such as recurrent neural networks (RNNs) are well suited for analyzing massive amounts of multivariate time-series data and providing high-level analysis of cell motion and microfluidics. New single-cell–analysis techniques enable researchers to study individual cell behavior and -omics, leading to a more nuanced understanding of cellular and tissue heterogeneity in disease mechanisms. Here, AI provides point-to-point analysis of hundreds of individual cells, yielding deeper knowledge than would be attained by averaging data from millions of cells.

Recently, improvements in single-cell whole-genome sequencing have shown promise in treating tumors that display a mosaic of genetic variation or subclonal mutations that compound their aggression (20). When applied to the massive data sets that such sequencing generates, AI classifiers can surface actionable information for developers.

Leveraging Evidence-Based Associations

By examining structural, biological, or functional similarity during BMDT for diseases of related ontology, AI-based systems can identify candidates with appropriate genetic, biological-pathway, or cellular activity. Existing disease modulators can provide valuable data for understanding properties of other disease etiologies.

Studying disease ontology requires developers to understand genetic and clinical data, as well as symptoms through evidence-based associations. The Disease Ontology Knowledgebase (DO-KB) group pledges to provide an open-source tool for integrating biomedical data associated with human diseases, disease features, and mechanisms. It will serve as a framework for multiscale biomedical-data integration and analysis toward strengthening the disease information ecosystem (21). Focusing on function rather than mere affinity or other facile properties enables the identification of more operative biologic relationships.

Existing therapies have immense amounts of research, clinical, and incremental post-launch data that can reveal the nature and associations of disease phenotype, environment, and genetics. AI can enable comprehensive, multiparametric collection and analysis of those data as well as information from clinical trials, electronic health records, and less structured sources to identify potential BMDT and therapeutic candidates.

Challenges: AI algorithms that lack transparency (known as black boxes) make it difficult to know whether a system is fair, accurate, and complete. Using an AI system also may be inappropriate when developers make assumptions about its application context, users, or their current needs and procedures. AI also may contribute to cybersecurity risks.

Drug developers desire explainability and transparency in algorithms to ensure accuracy and compliance in outputs — yet that goal remains difficult to achieve. Some internal processes can be hard to interpret. An emerging standard called the Open Neural Network Exchange (ONNX) is working to address that concern. ONNX seeks to create an open ecosystem where AI models can be developed, trained, and deployed across the different frameworks and tools provided by member organizations to promote transparency and interoperability.

The dynamic nature of some AI models combines with drift in both real-world scenarios and user behavior to require continued process monitoring and data/model updating. Collaboration among domain experts, data scientists, and software testers establishes and maintains the quality, reliability, safety, and alignment of AI systems throughout an entire product life cycle (22).
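One common way to monitor for the drift described above is to compare the distribution of incoming data with the distribution seen at training time. The sketch below uses the population stability index (PSI), which is an illustrative choice and not a method prescribed by the cited work. The data, the bin count, and the alert threshold of 0.25 (a frequently quoted rule of thumb) are all assumptions.

```python
import math


def population_stability_index(reference, current, n_bins=5):
    """Compare two samples of one variable using bins set by the reference range."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width) if width else 0
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids taking the logarithm of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical feature values at training time and after a shift in production.
training_values = [i / 10 for i in range(100)]
production_values = [v + 4 for v in training_values]

psi_same = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, production_values)
ALERT_THRESHOLD = 0.25  # assumed rule-of-thumb cutoff for a significant shift

print(psi_same, round(psi_shifted, 2), psi_shifted > ALERT_THRESHOLD)
```

A check of this kind could run on each model input at a regular interval, with an alert prompting review by the domain experts and data scientists responsible for the model.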

The growing size and complexity of technology systems increase the difficulty of identifying critical functionalities, defining appropriate testing, and establishing acceptance criteria. Software as a service (SaaS); risk-based validation; computer software assurance (CSA); and Pharma 4.0 initiatives have driven the need for a change in validation requirements and solutions. To address that, the US Food and Drug Administration (FDA) released two discussion papers to spur conversation about AI use in drug development and manufacturing (23).

Generating sufficient high-quality data may be the greatest limitation for many applications. The biopharmaceutical industry needs improved techniques for data generation, capture, storage, and processing. Thankfully, AI itself aids both the generation of purpose-built databases and the curation of existing data to make such information accessible and interoperable (24).

A Promising Future

The combination of modern data science, power computing, system connectivity, and AI is creating a sea change in therapeutic target discovery, lead testing, and development. AI use in biology, chemistry, and computational sciences even could lead to the eventual development of digital twins of human cells and advanced human virtual models (HVMs) (22).


1 Miura K, et al. Deep Learning-Based Model Detects Atrial Septal Defects from Electrocardiography: A Cross-Sectional Multicenter Hospital-Based Study. EClinicalMedicine 63(4) 2023: 102141; https://doi.org/10.1016/j.eclinm.2023.102141.

2 Hosseini M, Hammami B, Kazemi M. Identification of Potential Diagnostic Biomarkers and Therapeutic Targets for Endometriosis Based on Bioinformatics and Machine Learning Analysis. J. Assist. Reprod. Genet. 40, 2023: 2439–2451; https://doi.org/10.1007/s10815-023-02903-y.

3 Pun FW, Ozerov I, Zhavoronkov A. AI-Powered Therapeutic Target Discovery. Trends Pharmacol. Sci. 44(9) 2023: 561–572; https://doi.org/10.1016/j.tips.2023.06.010.

4 Qureshi R, et al. AI in Drug Discovery and Its Clinical Relevance. Heliyon 9(7) 2023: e17575; https://www.doi.org/10.1016/j.heliyon.2023.e17575.

5 Kupczyk E, et al. Unleashing High Content Screening in Hit Detection — Benchmarking AI Workflows Including Novelty Detection. Comput. Struct. Biotechnol. J. 20, 2022: 5453–5465; https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9530837.

6 Manzano T, Whitford W. Chapter 4 — AI Applications for Multivariate Control in Drug Manufacturing. A Handbook of Artificial Intelligence in Drug Delivery. Academic Press: Cambridge, MA, 2023: 55–82; https://doi.org/10.1016/B978-0-323-89925-3.00023-X.

7 Fultinavičiūtė U. Insilico’s AI Drug Enters Phase II IPF Trial. Clinical Trials Arena 27 June 2023; https://www.clinicaltrialsarena.com/news/insilico-medicine-ins018055-ai.

8 Singh R, et al. Contrastive Learning in Protein Language Space Predicts Interactions Between Drugs and Protein Targets. PNAS 120(24) 2023: e2220778120; https://doi.org/10.1073/pnas.2220778120.

9 Chatterjee A, et al. Improving the Generalizability of Protein-Ligand Binding Predictions with AI-Bind. Nat. Commun. 14(1) 2023: 1989; https://doi.org/10.1038/s41467-023-37572-z.

10 Sadybekov AV, Katritch V. Computational Approaches Streamlining Drug Discovery. Nature 616, 2023: 673–685; https://doi.org/10.1038/s41586-023-05905-z.

11 Lo Y-C, et al. Artificial Intelligence-Based Drug Design and Discovery. Cheminformatics and its Applications. IntechOpen: London, UK, 2019; http://dx.doi.org/10.5772/intechopen.89012.

12 Gaudelet T, et al. Utilizing Graph Machine Learning Within Drug Discovery and Development. Brief. Bioinform. 22(6) 2021: bbab159; https://doi.org/10.1093/bib/bbab159.

13 Transform Data into Knowledge. Ontoforce: Gent, Belgium, 2023; https://www.ontoforce.com.

14 The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023. CVPR: Vancouver, Canada, 2023; https://cvpr2023.thecvf.com.

15 Avery C, et al. Protein Function Analysis Through Machine Learning. Biomolecules 12(9) 2022: 1246; https://doi.org/10.3390/biom12091246.

16 Daniels KG, et al. Decoding CAR T Cell Phenotype Using Combinatorial Signaling Motif Libraries and Machine Learning. Science 378(6625) 2022: 1194–1200; https://www.science.org/doi/10.1126/science.abq0225.

17 Chemotargets Announces First AI-Designed Drug for Huntington’s Disease To Enter Clinical Trials. Chemotargets: Barcelona, Spain, 22 April 2021; https://chemotargets.com/chemotargets-announces-first-ai-designed-drug-for-huntingtons-disease-to-enter-clinical-trials.

18 First Patient Dosed in Phase 2 Portion of Aulos Bioscience’s Phase 1/2 Clinical Trial for AU-007, a Computationally Designed IL-2 Antibody for Solid Tumor Cancers. BioSpace: West Des Moines, IA, 25 January 2021; https://www.biospace.com/article/releases/first-patient-dosed-in-phase-2-portion-of-aulos-bioscience-s-phase-1-2-clinical-trial-for-au-007-a-computationally-designed-il-2-antibody-for-solid-tumor-cancers.

19 Build and Deploy Digital Twins of Bioprocessing Facilities. Basetwo: Toronto, Canada, 2024; https://www.basetwo.ai/bioprocessing.

20 Hård J, et al. Long-Read Whole Genome Analysis of Human Single Cells. Nat. Commun. 14(1) 2023: 5164; https://doi.org/10.1101/2021.04.13.439527.

21 About. Disease Ontology: Baltimore, Maryland, 2024; https://disease-ontology.org/about.

22 Manzano T, et al. Artificial Intelligence Algorithm Qualification: A Quality by Design Approach To Apply Artificial Intelligence in Pharma. PDA J. Pharm. Sci. Technol. 75(1) 2021: 100–118; https://www.doi.org/10.5731/pdajpst.2019.011338.

23 Cavazzoni P. FDA Releases Two Discussion Papers To Spur Conversation About Artificial Intelligence and Machine Learning in Drug Development and Manufacturing. US Food and Drug Administration: Rockville, MD, 10 May 2023; https://www.fda.gov/news-events/fda-voices/fda-releases-two-discussion-papers-spur-conversation-about-artificial-intelligence-and-machine.

24 Wossnig L, et al. Best Practices for Machine Learning in Antibody Discovery and Development. arXiv 2023; https://arxiv.org/abs/2312.08470.

Longtime BPI editorial advisor William Whitford is life sciences strategic solutions leader at Arcadis.