Tutorial
Bioinformatics and Constraints
Rolf Backofen (1) and David Gilbert (2)
(1) Ludwig-Maximilians University, Munich, Germany:
backofen@informatik.uni-muenchen.de
(2) City University, London, UK: drg@cs.city.ac.uk
ABSTRACT
Bioinformatics is an exciting and rapidly developing field that focuses
on solving problems arising from biology using methodology from computer
science. This tutorial introduces the topic for computer scientists, and
highlights those areas which we believe to be suitable for the application
of constraint solving techniques. The tutorial covers the following:
- Biological background: sequence, structure, function.
- Review of problem areas, including: genome map & physical
map, transcription, expression, properties of DNA language, alignment
(sequence, structure), protein docking, ligand binding, metabolic
pathways, protein design, structure prediction, models of evolution.
- Computational techniques: representation & visualisation of biological
knowledge, database design for biological resources, data mining,
pattern searching & discovery, phylogenetic trees, clustering.
OUTLINE
- Introduction
- Definition of bioinformatics -- solving problems arising from biology
using methodology from computer science.
- The elicitation of DNA sequences from genetic material,
their annotation, the control of gene expression (production of
proteins from DNA via transcription and translation), the relationship
between the amino acid sequence of proteins and their structure.
- The importance of data: bio-databases (sequence, structure,
expression, biochemical networks), and the design & implementation
of algorithms that exploit the nature of the data.
- Biological background:
- Genes / DNA; proteins: coding via DNA, expression,
mutual interaction, structure & function; metabolic pathways and
regulatory networks.
- The Central Dogma: the sequence of amino acids making up a protein,
and hence its (folded state) structure and thus its function, is
determined by DNA via RNA (transcription followed by translation).
- Holy grails: development of computational methods to determine the
relationship between (protein) amino-acid sequence, structure and
function.
- Classification of current problem areas in Bioinformatics
- Central Dogma related: sequence, structure, function
- Data related: storage, retrieval, analysis and abstraction. Data mining,
database design, representation/visualisation of biological knowledge.
- Simulation of biological processes: protein folding (molecular dynamics),
composition of viruses, metabolic pathways
- Details of current problem areas
- Comparison and alignment of sequences or structures, in order to determine
homology (evolutionary relationships). String matching, graph comparison,
mixtures of both.
- Search and pattern discovery in genome, protein, expression and
metabolic/regulatory databases (large databases, noisy data).
Stochastic approaches including Hidden Markov Models. Machine learning.
Pattern languages.
- Phylogenetic trees, and models of evolution.
- Structure prediction: relationship between sequence and structure
(RNA or protein). Protein folding and inverse protein folding.
- Genome related: genome map / physical map, genomic rearrangement,
computational methods for DNA sequencing
- Biological function: protein docking, ligand binding; metabolic pathways
and regulatory networks.
- Bioinformatics and Constraints
- NMR Structure Determination: finding a (maximal) consistent subset of
distance constraints.
- Alignment and threading
- Protein structure prediction, protein docking
- Metabolic pathway analysis
- Patterns and databases
- The future of Bioinformatics
- How to get involved
- On-line resources and bibliography
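
As an illustration of the "comparison and alignment of sequences" item in
the outline above, global alignment can be sketched with the
Needleman-Wunsch dynamic programme. This is a minimal sketch and not
material taken from the tutorial itself: the function name
global_alignment_score and the scoring values (match +1, mismatch -1,
linear gap penalty -1) are assumptions chosen purely for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b.

    Needleman-Wunsch with a linear gap penalty. The scoring values are
    illustrative assumptions, not a biologically calibrated scheme.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1] (match or mismatch) ...
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # ... or insert a gap in b (up) or in a (left).
            up = prev[j] + gap
            left = curr[j - 1] + gap
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the dynamic programming table are kept, which is enough
for the score; recovering the alignment itself would require storing the
full table (or traceback pointers).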
Brief CVs
Dr. Rolf Backofen
is a researcher and lecturer in the computer science
department of the University of Munich. His main research interests are
constraint programming, natural language processing, and bioinformatics.
He is actively working on the protein structure prediction problem, where
he successfully applies constraint techniques to solve this problem on
simplified models. He has regularly given courses on bioinformatics for
3 years.
Dr David Gilbert
holds a PhD in Computing from Imperial College and is a Senior Lecturer in
the Department of Computing at City University where he is leader of the
Bioinformatics Group. He is also a Visiting Research Fellow at the
European Bioinformatics Institute, and the Department of Biochemistry,
UCL. His primary research interest is the application of constraint
solving techniques, within the framework of logic programming, to problems
in Bioinformatics, and his focus is on the use of constraints in
pattern-based search, and the analysis of bio-databases (genomic,
proteomic and metabolic). He has developed, and maintains, a fast
topology-based protein structure comparison service at the EBI and is
currently developing tools for analysing biochemical networks, as part of
a project at the EBI.
His other research interests include semantics of interaction and distributed
computations, agents, and the application of computational logic and
constraint technology to the design and construction of software systems.