Where Software Meets Bio: A Tale of Caution

In this piece, I share my thoughts on where software fits into designing and manufacturing biologic drug candidates. More precisely, I want to argue that it is very hard for today’s computational approaches to add meaningful value to the incumbent ways of designing biologics, and to explain why pharma customers are hesitant to introduce software-driven approaches to manufacturing biologic candidates. My argument draws on the time I spent diving into the broader software-in-biopharma space with Romulus Capital, and on conversations I’ve had with dozens of entrepreneurs, end customers, academics, and other investors in the space.

What are biologics?

Pharmaceutical drugs can be segmented into two classes: small molecules, which are chemically synthesized, and biologics, which are produced in or derived from living organisms (E. coli bacteria, mice, cows, humans, etc.). For decades, the dominant paradigm of drug discovery was small molecules, led by the giant Merck. Merck once had a greater market cap than all other pharma companies combined: its model was to have an army of chemists in a massive industrial complex in New Jersey, testing one small molecule after another against any noteworthy druggable target. However, about two decades ago, smaller pharma companies began moving their headquarters to Kendall Square one after another, because they believed the future of drug discovery lay in biology, not chemistry. They were right: one by one, they overtook Merck, and today Merck doesn’t even rank in the top 5 by market cap.

So what is biotech? At its core, it is the process of synthesizing biological molecules, i.e. biologics. The best example is insulin: we first found the gene responsible for producing insulin, put it inside bacterial cells, and got those bacteria to pump out large quantities of insulin. Today, the most common biologics are monoclonal antibodies (mAbs). Antibodies are, in brief, large Y-shaped proteins that the immune system mainly uses to neutralize foreign antigens like pathogenic bacteria and viruses. By engineering specific antibodies, we can instead direct them against things that have gone awry inside our body, whether cancer cells or receptors involved in autoimmune attacks. The classic way to mass produce such antibodies starts with mice: antibody-producing B cells from an immunized mouse are fused with immortal myeloma cells, and each resulting hybridoma cell line churns out copies of a single antibody.

Because these mass-produced antibodies all originate from a single parent cell, they are called monoclonal antibodies (mAbs). Today, biologics make up 7 of the 10 best-selling drugs, most of which are mAbs:

Drugs ranked by sales in 2017. Source.

My interest in all this lies in the prospect of scalable, software-driven ventures at the intersection of biologics design and machine learning. A blog post by Simon Smith at BenchSci does a great job of delineating the broad buckets where ML intersects with different components of the drug discovery process, and I encourage you to take a look at the companies he maps out. On the preclinical side, the main buckets are:

  1. Aggregating and synthesizing medical knowledge
  2. Understanding mechanisms of disease
  3. Establishing biomarkers (indicators of disease)
  4. Repurposing existing drugs
  5. Generating novel drug candidates
  6. Validating and optimizing drug candidates
  7. Designing drugs
  8. Designing preclinical experiments

Here’s a small sample of deals in this space that I’ve looked into closely:

  • Visterra, which developed its Hierotope platform “to identify unique disease targets and to design and engineer precision [antibodies]”, recently exited for $430M
  • Asimov.io, which is working to create the “building blocks” for designing and manufacturing bispecific antibodies (which, for example, can be engineered to cross the blood-brain barrier), recently raised a $5M seed round from the venture firm a16z
  • A number of companies have spun out of Professor David Baker’s lab at the University of Washington, each working to commercialize different aspects of the Rosetta platform (which, among other things, can model protein structures). All have recently raised rounds.

It is exceptionally easy to get swept up in the allure of “AI for Drug Discovery”, but I want the rest of this piece to serve as a cautionary tale against that narrative.

Biologic Target Identification

This is where it all begins: the first stage of drug discovery involves finding a biological target that a drug can be designed to modulate. Because biologics are large molecules that generally cannot get inside cells, these targets usually have to be easily accessible: GPCRs or ion channels on the cell’s surface, extracellular proteins like VEGFs, or freely circulating cytokines. The issue is that there are so many protein, receptor, and pathway databases out there, each with an overwhelming number of potential targets for drug intervention, that it is often difficult to parse out which targets are most worth pursuing. Furthermore, these databases can be quite fragmented and messy, which explains why so many startups are “using AI” to read scientific articles, review literature, aggregate and impute disjoint databases, and clean data. I am personally quite skeptical of this space, mainly because most of the companies operating here start by selling to academic labs and never successfully break into commercial markets. They often offer free or heavily discounted academic licenses, but when it comes time to sell, most cannot demonstrate clearly differentiated value to pharmaceutical companies, which more often suffer from an overabundance of promising targets than from a shortage of them.

Biologic Drug Design

When analyzing a company building out software approaches for biologic drug design, I frame my discussion with three main questions:

  1. Why do we need a computational approach?
  2. What can the computational approach actually do?
  3. Is it a software company or a biotech company?

Question 1: Why do we need a computational approach?

This is typically the hardest question to answer. Consider a platform like Rosetta, which can design proteins de novo; that is to say, for certain classes of targets, Rosetta can computationally come up with a scaffold (the structural backbone of a protein) that is just right for the particular target. Put differently, it can design protein scaffolds that aren’t based on the traditional Y-shape of monoclonal antibodies, which, as discussed earlier, account for most of today’s leading biologics. It can also predict characteristics like binding affinity (how well the drug binds to its target), toxicity, and so on entirely in silico. Results from one design iteration can be fed back into the next, so that with ML the predictions get better and better.

While exploring new scaffolds is great, do we actually need to? Companies like Molecular Partners and Adnexus, working on DARPin and fibronectin scaffolds respectively, have gone public or been acquired, but to my knowledge, no non-antibody scaffold has led to an approved drug. The infrastructure around mAbs is so strong that, even though non-antibody scaffolds in theory offer advantages over mAbs, it’s hard to make a compelling case for moving away from them.

So what about software for designing mAbs? Visterra, which as I mentioned earlier had a successful exit, comes to mind. But again, it’s unclear what it has actually been able to produce and how valuable that has been.

Question 2: What can the computational approach actually do?

This question is also complicated. While ML iteration is great and can perhaps predict binding affinity and toxicity, it’s almost impossible to predict whether factors like immunogenicity (whether the body’s immune system will attack the treatment) will make a drug candidate non-viable. Validating factors like immunogenicity is done entirely in vivo, and it’s hard to argue that the work done to validate one candidate carries over to the work needed to validate another, which is exactly the kind of reuse we would love to see in a software company. That leads to…

Question 3: Is it a software company or a biotech company?

Software companies are very alluring in the context of biotech. The issue is that, more often than not, these software approaches are fundamentally no different from any of the hundreds of other biotech “programs” coming out of Kendall Square startups; at the end of the day, they all produce a binary outcome. A program either produces a viable therapeutic and is rewarded with a large amount of cash, or it doesn’t and gets close to nothing. This is very different from traditional software companies, which typically experience much steadier, incremental growth. Therefore, particularly when investing, it’s important to stay cognizant that a software approach to drug design has the same return profile as any other traditional biology-driven drug design program.
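To make the contrast concrete, here is a toy Python sketch. Every number in it is a hypothetical assumption chosen for illustration (a 10% chance of success and a payoff of 1,000 arbitrary units), not an estimate for any real program, and the names `BinaryProgram` and `incremental_path` are likewise made up. It treats a drug program as a two-outcome bet and a software business as a steadily compounding revenue line.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class BinaryProgram:
    """Hypothetical drug program: it either succeeds (large payoff) or fails (little or nothing)."""
    p_success: float             # assumed probability the candidate becomes a viable therapeutic
    payoff_success: float        # assumed value if it succeeds (arbitrary units)
    payoff_failure: float = 0.0  # assumed value if it fails

    def expected_value(self) -> float:
        # Probability-weighted average of the two outcomes
        return (self.p_success * self.payoff_success
                + (1 - self.p_success) * self.payoff_failure)

    def std_dev(self) -> float:
        # A two-point distribution has variance p * (1 - p) * (a - b)^2
        spread = self.payoff_success - self.payoff_failure
        return math.sqrt(self.p_success * (1 - self.p_success)) * abs(spread)


def incremental_path(start: float, growth: float, years: int) -> List[float]:
    """Hypothetical software-style revenue that compounds at a steady annual rate."""
    return [start * (1 + growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    program = BinaryProgram(p_success=0.1, payoff_success=1000.0)
    print(f"expected value: {program.expected_value():.1f}")
    print(f"std deviation:  {program.std_dev():.1f}")
    print("software path:", incremental_path(start=10.0, growth=0.5, years=2))
```

Under those assumed numbers, the program’s expected value is 100 units but its standard deviation is 300 units, and 90% of the time it returns nothing. The compounding path, by contrast, never drops to zero. The point is the shape of the outcome distribution, not the specific figures.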

Biologic Manufacturing

Another use case for ML lies in the manufacturing process. Certain classes of biologics (like bispecific antibodies) tend to be very difficult to manufacture to exact specifications because of their complexity. Companies like Asimov have arisen to address this problem. As Vijay Pande, the a16z General Partner who led an investment in the company, describes it:

“With Asimov’s approach, high-accuracy simulation, and circuit building-blocks, we can greatly speed the development of biological circuits — decreasing their cost, and greatly increasing their sophistication and complexity. Continuing with our analogy of computers here, we’re still in the “transistor phase” of things, so are not yet at the point where the full complexity of a modern microprocessor can be realized into the circuits of cells. But there are many initial applications where this technology can make major advances — much like how early microprocessors, as simple as they were, became a dramatically enabling technology.”

Vijay Pande, General Partner at a16z

Designing biological circuits the way Electronic Design Automation (EDA) tools are used to design computer chips is a remarkable concept. The chief concern I’ve seen in diving into companies in this space comes from the pharma customer side. These technologies will become very tightly integrated into the biologic design process, where so many things can already go wrong; as a result, pharma customers are, frustratingly, unwilling to introduce another source of uncertainty into the biologic discovery flywheel without an enormous burden of validation. It’s quite difficult to predict when a startup in this space will be able to acquire that validation.

Takeaways

  • Biologics constitute a very exciting space today
  • A software approach doesn’t make a biotech company a software company
  • Software has a long way to go to establish its value in biologic drug design
  • Software also has a long way to go to establish its reliability in biologic manufacturing

Thanks for reading! I’ll follow up this post with a similar dive into software applications in the clinical trial space. Feel free to reach out if you have any questions!

About The Author

Co-President of the Harvard Technology Review.
