David Baker of the University of Washington presents recent advances in de novo protein design. Deep learning pattern recognition hallucinates the desired protein structure and also generates a peptide sequence that folds into it, and the predicted structures closely match the actual proteins produced in the lab.

Results transfer readily to production scale on a timeline of weeks. The applications are broad. One current interest is designing protein assemblies for molecular machines: components have been successfully assembled and, although the results are not yet satisfactorily validated, they appear to perform controlled work. Another application is targeting cells with a more accurate, computation-based recognition method. A third is a novel vaccine, now in clinical trials, that is highly effective against coronavirus.

Longer-term applications:

1 – Universal Vaccines
2 – Larger Alphabet of Novel Amino Acids
3 – Advanced Drug Delivery
4 – Smart Therapeutics
5 – Next-Generation Materials


Presentation: Protein-based Assemblies and Molecular Machines

  • The way we’ve been thinking about protein folding is that the structure of each protein is likely the global energy minimum for its sequence.  The approach to predicting folding was therefore to generate many different structures computationally and then search among them for the lowest-energy conformation.
  • What we are doing with de novo protein design is the opposite. We start with the desired structure, and then we search for the amino acid sequence that folds into that backbone.
  • We use a program called Rosetta that samples through the space of possible structures and the space of possible sequences.  Very recently we started applying deep learning to these problems.
  • The landscape of potential proteins is enormous, and the sheer size of this search space has been the primary obstacle to de novo protein design.  The number of possible sequences and conformations is literally astronomical: a chain of only 100 residues, with 20 amino acids available at each position, has 20^100 ≈ 1.3 x 10^130 possible sequences.
  • With the very recent emergence of deep learning as a tool for protein design, we can now create novel proteins in the vast space of potential proteins that do not yet exist, represented by the grey background.
  • The red dots represent existing proteins, which cluster tightly because natural evolution proceeds by incremental change, branching out within families of proteins.
  • Previous protein structure prediction methods start with an amino acid sequence, and the structure prediction network outputs a sharp map of the distances between residues within the protein.
  • With network hallucination, random noise is fed into the network and the blurry distance map is revised step by step, becoming sharper and more coherent until it looks like the sharp distance map of a folded protein.
  • This process is very similar to taking a deep network that recognizes cat pictures on the internet, feeding it random noise, and progressively revising the noise until it evolves into a novel cat that the network recognizes as a cat.
  • This is a residue-by-residue contact map. The maps at step 0 show random noise fed into the deep network.  With iterations, a de novo structure emerges that is not related to natural proteins.  A black dot in the map indicates that residues are in close proximity.
  • When we start from different random seeds, we get a large number of hallucinated protein structures, none of which are related to naturally occurring proteins.
  • We’ve made many de novo proteins that are unrelated to natural proteins.  Not only can we hallucinate them into existence but we can also create these proteins in the lab; we have three successful proteins in hand from this generation process.
  • A shortcoming of previous physics-based prediction methods is that they searched for energy minima of the amino acid sequence.  The predicted structure could end up in one low-energy well while one or more other wells available to the same sequence went undiscovered.
  • With the use of Rosetta we can see the full landscape of probable structures for a given amino acid sequence. 
  • Previous methods used only the physical coordinates of a protein in their calculations.  This method also uses probability distributions over the distances between all pairs of residues.  As the sequence is changed, the probability distributions are all updated.  The deep network implicitly knows about the partition function when it computes these probabilities, and this implicit awareness is the key to the new method.
  • We can design for function by constraining selected portions of the overall structure and allowing hallucination to design the regions in between, so that the constrained portions are positioned as desired.
  • The exact mechanisms for filtration and selectivity of potassium have been debated for years.  We proved simple principles of potassium filtration by construction. 
  • We recently learned how to design a wide variety of larger beta barrels for all kinds of molecular filtration, including single-molecule DNA sequencing.  Single-molecule DNA sequencing works by reading changes in electrical current as each base of a DNA strand passes through the pore, and we are beginning that design process.  To make larger transmembrane beta barrels, we need more precise control over the backbone and placement of side chains to compensate for the loss of compact stabilizing interactions.
  • Antibodies have a stable structural component that is uniform across antibodies, and a variable functional component.  We use the stable component, with its 2-fold symmetry, to bind a protein with 5-fold symmetry; the two combine into a self-assembling nanocage that still presents active antibody sites.
  • The stable portion of the antibody binds the 5-fold structural protein, so the structural component of the antibody forms part of the cage while the active sites remain exposed.  The antibody cages are highly effective as coronavirus vaccines when compared to mRNA coronavirus vaccines, and they are already in clinical trials.
  • We designed de novo proteins that block the coronavirus spike protein by binding to its active site, which prevents the virus from binding to the ACE2 receptor that it uses to enter cells.  These proteins effectively block coronavirus in animal trials.
  • Variant de novo proteins were designed that block other strains of the coronavirus, including those first identified in South Africa, the UK, and Brazil.  Turnaround is rapid, with a 2-week development cycle until ready for trials.  Mass-scale production is possible with standard bacterial production of proteins.
  • Biosensor cages open when a target is present, allowing luciferase to bind to the cage and to generate light.  This type of sensor can be used for numerous types of targets including botulinum toxins, coronavirus, and troponin.
  • Cellular surface AND logic using de novo proteins allows for more discriminatory cellular targeting.  This can kill tumor cells and spare healthy tissue.
  • Markers 1 and 2 are present on diseased cells and marker 3 is present on healthy cells.  CAR T cells will now target cells that carry markers 1 and 2 but not marker 3.  This targets the cancer cell and ignores healthy cells, because marker 3 is present on them.
  • Rotary Motion with de Novo Protein Design
  • Can we use de novo protein design to make axles, rotors, and wheels as a step towards precise protein nano-assemblies?  How would we thread the wheels onto the axles?  How could we power the assembly?
  • Protein Rotary Machine: Components
  • We already structurally validated a wide variety of de novo protein assembly components.   Connecting the rotors around axles seemed to be a big challenge, but we were able to connect the components with relative ease.
  • Design for Smooth Motion
  • We have rotors and axles of varying symmetries and we have successfully connected rotors and axles with the same symmetry and with different symmetries.   Matched symmetry rotors with matched symmetry axles produce energy minima “ruts” while mismatched symmetries produce smoother mobility about the axle.
  • Threading Rotors onto Axles
  • Electrostatics is the primary means of connecting the components of our de novo protein assemblies.  In this instance, changing the pH was used to connect the rotor around the axle.  When the pH is lowered, the rotor components self-assemble around the axle.  The rotor is then stable, locked onto the axle, and ready to be powered.
  • Rotary Work Mechanism
  • Once connected, we use catalysis to drive the rotary component around the axle.  There is a simple catalytic site between the rotor and the axle.  A small molecule enters the site and the catalytic reaction drives the rotation of the rotor.
  • The motion is suspected to be unidirectional, but the evidence so far is limited and inconclusive.  Characterizing and validating this unidirectional motion is the biggest challenge in this research project.
  • Rotary Work Fuel
  • The catalysis involves aldol fuels and diketone suicide inhibitors.  The suicide inhibitors can be used to chemically modify the active enzyme site that drives reactions.
  • D3 and D2 symmetry proteins self-assemble into a hexagonal lattice of protein chainmail.  These were used to signal cell changes by surface interactions, and they can be used for large-scale receptor clustering, blocking endocytosis, computing, catalysis, and bottom-up atomically precise manufacturing.
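
The hallucination loop described in the notes above (start from noise, revise step by step, keep what the network scores as more protein-like) can be sketched in Python. This is a toy under stated assumptions, not the code used in the talk: `toy_score` is a hypothetical stand-in for the structure prediction network, the Metropolis acceptance rule and all parameter values are assumptions, and the alternating hydrophobic pattern it rewards was chosen only so the example runs without a trained model.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids
HYDROPHOBIC = set("AVILMFWY")          # toy grouping, used only by the stand-in score


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for the network's judgement of a sequence.

    Rewards an alternating hydrophobic/polar pattern. A real pipeline would
    instead score how sharp the predicted distance map is.
    """
    hits = sum((aa in HYDROPHOBIC) == (i % 2 == 0) for i, aa in enumerate(seq))
    return hits / len(seq)


def hallucinate(length=40, steps=2000, temperature=0.005, seed=0):
    """Assumed Metropolis search: mutate one residue at a time and track the best sequence."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]  # step 0: random noise
    score = toy_score("".join(seq))
    start_score = score
    best_seq, best_score = "".join(seq), score
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = toy_score("".join(seq))
        # Accept improvements; accept worse moves with a small Boltzmann probability.
        if new_score >= score or rng.random() < math.exp((new_score - score) / temperature):
            score = new_score
            if score > best_score:
                best_seq, best_score = "".join(seq), score
        else:
            seq[pos] = old  # reject: undo the mutation
    return best_seq, start_score, best_score


best_seq, start_score, best_score = hallucinate()
print(f"score {start_score:.2f} -> {best_score:.2f}: {best_seq}")
```

Calling `hallucinate` with a different `seed` starts from different noise and generally ends at a different sequence, which mirrors the point that many random seeds give many distinct designs.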

There are five grand challenges that can be addressed with de novo protein design:


1 – Universal Vaccines

We can design a flu vaccine where one shot gives a lifetime of immunity from the flu.  We can also protect against intentional acts of bioterrorism, with rapid two-week development times until clinical trials.


2 – Larger Alphabet

Instead of only 20 amino acids from nature, we can design thousands of novel amino acids.
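
The sequence-space figure quoted earlier in these notes (about 1.3 x 10^130 for a chain of 100 residues) is 20^100, and the same arithmetic shows how quickly the space grows with a larger alphabet. A minimal Python sketch follows; the function names and the alphabet size of 1,000 are illustrative assumptions, not figures from the talk.

```python
def sequence_space_size(chain_length: int, alphabet_size: int = 20) -> int:
    """Number of distinct sequences when every position can hold any residue."""
    return alphabet_size ** chain_length


def as_power_of_ten(n: int):
    """Return (mantissa, exponent) such that n is about mantissa * 10**exponent."""
    exponent = len(str(n)) - 1          # exact digit count, no floating point needed
    return n / 10 ** exponent, exponent


natural = sequence_space_size(100)                          # 20 natural amino acids
mantissa, exponent = as_power_of_ten(natural)
print(f"20^100 ~ {mantissa:.2f} x 10^{exponent}")           # 20^100 ~ 1.27 x 10^130

expanded = sequence_space_size(100, alphabet_size=1000)     # hypothetical alphabet size
mantissa, exponent = as_power_of_ten(expanded)
print(f"1000^100 ~ {mantissa:.2f} x 10^{exponent}")         # 1000^100 ~ 1.00 x 10^300
```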


3 – Advanced Drug Delivery

We will be able to target disease cells with more precision and deliver drugs to target cells that were previously inaccessible.


4 – Smart Therapeutics

We will be able to perform calculations within the body.  This will enable, for example, smart targeting that differentiates between subsets of the same type of immune cell.
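
The cell-surface AND logic from the presentation notes is one concrete form of such a calculation: target a cell only if markers 1 and 2 are both present and marker 3 is absent. A small Python sketch of that rule is given here, assuming made-up marker and cell names; it models only the Boolean decision, not the protein mechanism that implements it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    markers: frozenset  # surface markers present on this cell (hypothetical names)


REQUIRED = frozenset({"marker1", "marker2"})  # both must be present (AND)
VETO = frozenset({"marker3"})                 # must be absent (NOT)


def should_target(cell: Cell, required=REQUIRED, veto=VETO) -> bool:
    """Return True only when all required markers are present and no veto marker is."""
    has_all_required = required <= cell.markers
    has_veto = bool(veto & cell.markers)
    return has_all_required and not has_veto


cells = [
    Cell("tumor", frozenset({"marker1", "marker2"})),
    Cell("healthy", frozenset({"marker1", "marker2", "marker3"})),
    Cell("bystander", frozenset({"marker1"})),
]
for cell in cells:
    print(cell.name, should_target(cell))
# tumor True
# healthy False
# bystander False
```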


5 – Next-Generation Materials

Silk, abalone shell, tooth, horns, and hair are all protein-based.  We can approach ecological issues by mimicking or creating new types of materials.


5 to 15 Year Possibilities

Repatterning matter at the atomic scale is a major goal and is within reach with the broad mechanosynthetic capabilities of de novo protein design. 

A key step towards this is asymmetric reactions, which may be achieved with ropes and walkers or by other means of facilitating more complex reactions, including surface chemistry, lego chemistry, or compartmentalized chemistry.

Other possibilities include more complex logic, with additional features such as time delays.


Antiviral: Longxing Cao, Brian Coventry, Inna Goreshnik with Mike Diamond, Brett Case, Dan Barouch, David Veesler

Diagnostic: Alfredo Rubio, Andy Yeh, Byung-Ha Oh

Antibody Nanocages: Robby Divine with David Veesler, Ha Dang

Self Assembling 2D lattice: Ariel Ben-Sasson with Joe Watson, Emmanuel Derivery

De novo designed membrane proteins: Chunfu Xu, Peilong Lu, Anastassia Vorobieva, Samuel Lemma, Yulai Liu

Designed Protein Logic: Marc Lajoie, Scott Boyken, Jilliane Bruffey, with Stan Riddell, Alex Salter

Foldit: Brian Koepnick

Rotary Motor: Alexis Courbet, Yakov Kipnis with Jesse Hansen, Justin Kollman

Protein folding by deep network hallucination: Ivan Anishchenko, Sam Pellock, Tamuka Chidyausiku


Seminar summary by Tim Potter.