Group of Immunosequencing Algorithms

The Group's main focus is the development of computational methods and bioinformatic solutions for pre-processing, analysis and interpretation of deep immune repertoire sequencing data. We aim to transform the readout of millions of T- and B-cell receptor sequences into immunologically relevant features, such as the overall immune repertoire diversity, the magnitude of selection pressure exerted by self and foreign antigens, and, ultimately, the ability of certain T- and B-cells to mount an effective response against specific pathogens.

The Group collaborates with the Immunosequencing methods laboratory (Prof Chudakov DM), the Structural organization of T-cell immunity group (Dr Britanova OV), and the Laboratory of Comparative and Functional Genomics (Prof Lebedev YuB, Dr Mamedov IZ, Dr Zvyagin IV), as well as with Prof. Can Kesmir (Utrecht University), Prof. Andrew Sewell (Cardiff University) and Prof. David Price (Cardiff University).

The Group branched off from Prof Chudakov's Genomics of Adaptive Immunity Lab in 2017.

  • Developing algorithms for comprehensive analysis of T- and B-cell receptor sequencing data. These cover basic summary statistics, such as rearrangement probabilities for certain germline segments and physicochemical characteristics of hypervariable CDR3 regions, as well as somatic hypermutation profiles and lineage tracing for B-cells.
  • Developing and maintaining the database of T-cell receptors (TCRs) with known antigen specificities (VDJdb). The database contains TCR sequences that were experimentally validated to recognize known epitopes in a given HLA context.
  • Developing and maintaining the database of TCR:peptide:MHC structures. Developing TCR specificity prediction algorithms that integrate structural data with the TCR specificity database to predict TCR recognition based on linear amino acid sequences of TCR, peptide and MHC.
  • HLA binding prediction. Inferring the effect of certain HLA alleles on the structure of the T-cell repertoire and on the set of so-called "public" clonotypes. Using HLA binding prediction and cognate TCR repertoire data to infer antigen immunogenicity.
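
As an illustration of the basic summary statistics mentioned in the first item above, here is a minimal Python sketch. It is not the Group's software: the function names, the `(V segment, CDR3, read count)` record layout and the toy records are assumptions made for this example. It computes read-weighted V segment usage, the mean Kyte-Doolittle hydropathy of a CDR3 amino acid sequence (one possible physicochemical characteristic), and the Shannon entropy of clonotype frequencies (one possible diversity measure).

```python
import math
from collections import Counter

# Kyte-Doolittle hydropathy scale for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def v_usage(clonotypes):
    """Return the read-weighted frequency of each V segment."""
    total = sum(count for _, _, count in clonotypes)
    usage = Counter()
    for v_segment, _, count in clonotypes:
        usage[v_segment] += count
    return {v: n / total for v, n in usage.items()}


def mean_hydropathy(cdr3):
    """Return the mean Kyte-Doolittle hydropathy of a CDR3 sequence."""
    return sum(KD[aa] for aa in cdr3) / len(cdr3)


def shannon_diversity(clonotypes):
    """Return the Shannon entropy (natural log) of clonotype frequencies."""
    total = sum(count for _, _, count in clonotypes)
    return -sum(
        (count / total) * math.log(count / total)
        for _, _, count in clonotypes
    )


# Toy records invented for this example: (V segment, CDR3, read count).
toy = [
    ("TRBV9", "CASSVGQGYEQYF", 3),
    ("TRBV9", "CASSLGQGYEQYF", 1),
    ("TRBV20-1", "CSARDRTGNGYTF", 4),
]

print(v_usage(toy))
print({cdr3: round(mean_hydropathy(cdr3), 2) for _, cdr3, _ in toy})
print(round(shannon_diversity(toy), 3))
```

Weighting by read count rather than by unique clonotype is a design choice of this sketch; counting each clonotype once would describe the repertoire's composition rather than its clonal expansion.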


Mikhail Shugay

A new approach to the functional analysis of sequencing data for T-lymphocyte repertoires

In collaboration with the Laboratory of immunosequencing methods and the Laboratory of comparative and functional genomics

To extract clinically relevant information from large high-throughput sequencing datasets of TCR repertoires, we created a new statistical approach, Antigen-specific Lymphocyte Identification by Clustering of Expanded sequences (ALICE) /fig a/. We applied the algorithm to distinguish naïve from effector memory cells in available TCR beta repertoires /fig b/, to identify reactive T-cell clones in a mixed lymphocyte reaction (MLR) assay /fig c, d/, and to fractionate TCR repertoires of patients with autoimmune disease, patients undergoing cancer immunotherapy, and patients with an acute viral infection. In summary, ALICE facilitates the identification of TCR variants associated with diseases and conditions, which can be used for diagnostics and rational vaccine design.
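
The text above does not spell out how ALICE works internally, so the following Python sketch should not be read as its implementation. It only illustrates the general idea of clustering similar sequences by counting, for each CDR3 amino acid sequence, how many other sequences in the sample differ from it at exactly one position. The one-mismatch similarity rule, the function names, the fixed `min_neighbours` threshold and the toy sequences are all assumptions of this example. A statistical approach would replace the fixed threshold with a comparison against the number of neighbours expected by chance.

```python
from collections import defaultdict


def count_neighbours(cdr3s):
    """Count, for each unique CDR3, the other unique CDR3s of the same
    length that differ from it at exactly one position."""
    unique = set(cdr3s)
    # Group sequences by every "one position masked" pattern, so that
    # sequences sharing a pattern differ only at the masked position.
    buckets = defaultdict(set)
    for seq in unique:
        for i in range(len(seq)):
            buckets[seq[:i] + "*" + seq[i + 1:]].add(seq)
    neighbours = {seq: set() for seq in unique}
    for group in buckets.values():
        for seq in group:
            neighbours[seq].update(group - {seq})
    return {seq: len(found) for seq, found in neighbours.items()}


def flag_dense(cdr3s, min_neighbours=2):
    """Return the CDR3s with at least `min_neighbours` neighbours.

    The fixed threshold is a placeholder assumption, standing in for a
    proper statistical test against a null expectation."""
    counts = count_neighbours(cdr3s)
    return sorted(seq for seq, n in counts.items() if n >= min_neighbours)


# Toy sequences invented for this example.
sample = ["CASSLGQYF", "CASSVGQYF", "CASSIGQYF", "CASSLGEYF", "CSARDRTGF"]
print(count_neighbours(sample))
print(flag_dense(sample))
```

Indexing by masked patterns avoids comparing every pair of sequences directly, which matters for repertoires with millions of clonotypes.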