Francesca-Zhoufan Li

AI for Science & Engineering, currently focusing on machine learning for proteins

California Institute of Technology

Arnold Lab

Yue Crew

About

With a broad interest in applying AI to science and engineering problems, I am currently focusing on machine learning-assisted protein engineering as a Bioengineering Ph.D. student at Caltech, co-advised by Frances Arnold and Yisong Yue. My current project is on multi-modal representation learning for predicting top protein fitness from site-saturation mutagenesis libraries. During a summer internship at Microsoft Research New England, I also worked with Kevin K. Yang, Alex X. Lu, and Ava P. Amini on transfer learning for pretrained protein language models.

During and shortly after my time at the University of California, Berkeley, where I earned a B.S. in Bioengineering and a B.S. in Chemical Biology, I worked on RNA-seq software tool development at Zymergen, genetic circuit component discovery with Richard Murray, cancer immunotherapy and SARS-CoV-2 antibody therapeutics development at NYU Langone Health with Shohei Koide, cell-free platform optimization at Tierra Biosciences, and hands-on wet-lab projects in metabolic engineering and synthetic biology tool building at the Dueber Lab.

Outside of research, I most enjoy being active outdoors, experiencing diverse cultures, solving fun puzzles, and doing minimalist iPhoneography. Given my personal background and journey, I strive to provide equitable opportunity and individualized education, especially in STEM, through mentoring, teaching, and outreach volunteering.

Download my resumé.

Interests
  • Machine learning
  • Protein engineering
  • Computational biology
  • Bioengineering
Education
  • Ph.D. in Bioengineering, 2025

    California Institute of Technology

  • B.S. in Bioengineering, 2019

    University of California, Berkeley

  • B.S. in Chemical Biology, 2019

    University of California, Berkeley

ML Experience

BioML Research Intern
Microsoft Research New England
Jun 2022 – Sep 2022 Cambridge, MA
Transfer learning for pretrained protein language models

Machine Learning for Proteins
Arnold Lab & Yue Group, Caltech
Jan 2021 – Present Pasadena, CA
  • Multi-modal (i.e., sequence, structure) representation learning pipeline for protein fitness (i.e., binding, catalysis) prediction
  • Protein single- to multi-mutant fitness prediction

Projects

Multimodal Representation Learning for Proteins
Multi-modal representation learning for predicting top protein fitness from site-saturation mutagenesis libraries