Internship and thesis proposals
Controlling and understanding representations in state-space models of biological sequences

Fields
Condensed matter
Statistical physics
Biophysics
Nonequilibrium statistical physics
Physics of living systems

Type of internship
Theoretical, numerical
Description
State-space models (SSMs) process sequences token by token, storing information about past tokens in a latent state vector. Capturing long-range dependencies is crucial, especially for biological sequences such as proteins or RNA, where sites far apart along the sequence interact in the folded 3D structure. This project aims to understand and control these latent representations in order to design artificial biomolecules with desired properties. On the computational side, we will train modern SSMs (e.g. Mamba) on biological data and develop methods to steer the state vector during sequence generation, using ideas from disentangled representations and guidance techniques. Designed molecules may be tested experimentally with collaborators. On the theoretical side, we will build simplified models and use tools from statistical physics, such as dynamical mean-field theory, to analyze how SSMs learn long-range correlations and how controlling the representation affects the generated sequences. The internship and PhD project are aimed at students interested in the intersection of statistical physics, machine learning, biology, and computational modeling.
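
To make the idea of steering a state vector concrete, here is a minimal sketch in plain Python. It is not the project's method and not Mamba: it assumes a toy linear SSM with a diagonal state transition, h_t = a * h_{t-1} + B[:, x_t], softmax readout logits = C h_t, and a hypothetical steering rule that adds a fixed direction to the state before each token is sampled. The function name `generate`, the parameters `steer` and `strength`, the three-letter alphabet, and all numerical values are illustrative assumptions.

```python
import math
import random


def generate(a, B, C, length, steer=None, strength=0.0, seed=0):
    """Sample tokens from a toy diagonal linear SSM (illustrative only).

    a: list of d decay factors (diagonal state transition).
    B: d x V input matrix, one column per token.
    C: V x d readout matrix, logits = C h.
    steer: optional direction of length d (hypothetical steering rule).
    """
    rng = random.Random(seed)
    d, vocab = len(a), len(C)
    h = [0.0] * d          # latent state summarizing the past
    tokens = []
    for _ in range(length):
        if steer is not None:
            # Assumed steering rule: push the state along a fixed direction.
            # The shift is kept in h, so it also enters the next update.
            h = [h_i + strength * s_i for h_i, s_i in zip(h, steer)]
        # Read out next-token logits from the (possibly steered) state.
        logits = [sum(C[v][i] * h[i] for i in range(d)) for v in range(vocab)]
        top = max(logits)  # subtract the max for a numerically stable softmax
        weights = [math.exp(l - top) for l in logits]
        tok = rng.choices(range(vocab), weights=weights)[0]
        tokens.append(tok)
        # State update: decay the old state and write in the new token.
        h = [a[i] * h[i] + B[i][tok] for i in range(d)]
    return tokens


# Toy parameters (assumptions): 2-dimensional state, 3-letter alphabet.
A_DIAG = [0.9, 0.5]
B_IN = [[1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0]]
C_OUT = [[1.0, 0.0],
         [0.0, 1.0],
         [-1.0, -1.0]]

free = generate(A_DIAG, B_IN, C_OUT, length=20)
steered = generate(A_DIAG, B_IN, C_OUT, length=20,
                   steer=[1.0, 0.0], strength=50.0)
```

In this toy setting, a strong push along the first state coordinate makes token 0 overwhelmingly likely at every step, whereas the unsteered run samples from the model's own dynamics. In a trained model, finding directions that correspond to meaningful molecular properties is the open question the project addresses.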

Contact
Jorge FERNANDEZ DE COSSIO DIAZ
Laboratory: IPhT
Team: Statistical and condensed matter physics