Fifth International Conference on
Principles and Practice of Constraint
Programming

October 12-16, 1999
Alexandria, Virginia, USA











Tutorial

Bioinformatics and Constraints

Rolf Backofen (1) and David Gilbert (2)

(1) Ludwig-Maximilians University, Munich, Germany: backofen@informatik.uni-muenchen.de; (2) City University, London, UK: drg@cs.city.ac.uk

ABSTRACT

Bioinformatics is an exciting and rapidly developing field that focuses on solving problems arising from biology using methodology from computer science. This tutorial introduces the topic for computer scientists and highlights those areas that we believe are suitable for the application of constraint solving techniques. The tutorial covers the following:
  1. Biological background: sequence, structure, function.
  2. Review of problem areas, including: genome map & physical map, transcription, expression, properties of DNA language, alignment (sequence, structure), protein docking, ligand binding, metabolic pathways, protein design, structure prediction, models of evolution.
  3. Computational techniques: representation & visualisation of biological knowledge, database design for biological resources, data mining, pattern searching & discovery, phylogenetic trees, clustering.

OUTLINE

  1. Introduction
    1. Definition of bioinformatics -- solving problems arising from biology using methodology from computer science.
    2. The elicitation of DNA sequences from genetic material, their annotation, the control of gene expression (the production of proteins from DNA via transcription and translation), the relationship between the amino acid sequence of proteins and their structure.
    3. The importance of data: bio-databases (sequence, structure, expression, biochemical networks), and the design & implementation of algorithms that exploit the nature of the data.
  2. Biological background:
    1. Genes / DNA; proteins: coding via DNA, expression, mutual interaction, structure & function; metabolic pathways and regulatory networks.
    2. The Central Dogma: the sequence of amino acids making up a protein, and hence its (folded-state) structure and thus its function, is determined by DNA, which is transcribed into RNA and then translated into protein.
    3. Holy grails: development of computational methods to determine the relationship between (protein) amino-acid sequence, structure and function.
  3. Classification of current problem areas in Bioinformatics
    1. Central Dogma related: sequence, structure, function
    2. Data related: storage, retrieval, analysis and abstraction. Data mining, database design, representation/visualisation of biological knowledge.
    3. Simulation of biological processes: protein folding (molecular dynamics), composition of viruses, metabolic pathways
  4. Details of current problem areas
    1. Comparison and alignment of sequences or structures, in order to determine homology (evolutionary relationships). String matching, graph comparison, mixtures of both.
    2. Search and pattern discovery in genome, protein, expression and metabolic/regulatory databases (large databases, noisy data). Stochastic approaches including Hidden Markov Models. Machine learning. Pattern languages.
    3. Phylogenetic trees, and models of evolution.
    4. Structure prediction: relationship between sequence and structure (RNA or protein). Protein folding and inverse protein folding.
    5. Genome related: genome map / physical map, genomic rearrangement, computational methods for DNA sequencing
    6. Biological function: protein docking, ligand binding; metabolic pathways and regulatory networks.
  5. Bioinformatics and Constraints
    1. NMR Structure Determination: finding a (maximal) consistent subset of distance constraints.
    2. Alignment and threading
    3. Protein structure prediction, protein docking
    4. Metabolic pathway analysis
    5. Patterns and databases
  6. The future of Bioinformatics
  7. How to get involved
  8. On-line resources and bibliography
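
As an illustration of the sequence comparison topic in item 4.1 of the outline, the following is a minimal sketch of global alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). It is not taken from the tutorial material: the function name and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for the example, and real applications typically use substitution matrices and affine gap penalties instead.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1,
                           mismatch: int = -1,
                           gap: int = -1) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions, not tutorial content.
    """
    # prev[j] is the best score for aligning the prefix of `a` processed
    # so far with b[:j]; row 0 aligns the empty prefix against gaps only.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: a[:i] aligned entirely against gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Align a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Align a[i-1] with a gap in b.
            up = prev[j] + gap
            # Align b[j-1] with a gap in a.
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the score table are kept, so memory grows with the length of the second sequence; recovering the alignment itself, rather than just its score, would require keeping the full table or traceback pointers.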

Brief CVs

Dr. Rolf Backofen is a researcher and lecturer in the computer science department of the University of Munich. His main research interests are constraint programming, natural language processing, and bioinformatics. He is actively working on the protein structure prediction problem, where he successfully applies constraint techniques to solve the problem on simplified models. He has regularly given courses on bioinformatics for three years.

Dr David Gilbert holds a PhD in Computing from Imperial College and is a Senior Lecturer in the Department of Computing at City University, where he is leader of the Bioinformatics Group. He is also a Visiting Research Fellow at the European Bioinformatics Institute, and the Department of Biochemistry, UCL. His primary research interest is the application of constraint solving techniques, within the framework of logic programming, to problems in Bioinformatics, and his focus is on the use of constraints in pattern-based search and the analysis of bio-databases (genomic, proteomic and metabolic). He has developed, and maintains, a fast topology-based protein structure comparison service at the EBI and is currently developing tools for analysing biochemical networks as part of a project at the EBI.

His other research interests include semantics of interaction and distributed computations, agents, and the application of computational logic and constraint technology to the design and construction of software systems.