Pretrained protein language model transfer learning: is the final layer representation what we want?

Large pretrained protein language models have improved protein sequence-to-function prediction. This often takes the form of transfer learning, where final-layer representations from large pretrained models are extracted for downstream tasks. Although pretrained models have been empirically successful, there is little current understanding of how the features learned by pretraining relate to and are useful for downstream tasks. In this work, we investigate whether transferring a partial model, by using the output from a middle layer, is as effective as full model transfer, and if so, whether successful transfer depends on the downstream task and model properties. Across datasets and tasks, we evaluate partial model transfer of pretrained transformer and convolutional neural networks of varying sizes. We observe that pretrained representations outperform the one-hot baseline for most tasks. More importantly, we find that representations from middle layers can be as effective as those from later layers. To our knowledge, our work is the first to report the effectiveness of partial model transfer for protein property prediction. Our results point to a mismatch between the pretraining and downstream tasks, indicating a need for more relevant pretraining tasks so that representations from later layers can be better utilized for downstream tasks.