Projects

CoXpress

We have developed CoXpress, a software package for R that can be used to find groups of genes that display differential co-expression patterns in microarray datasets. It examines the relationship between genes, rather than their absolute expression. It looks at the differences between groups of co-expressed genes in two datasets and compares those differences to a random distribution in order to determine a p-value.

Traditional methods of analysing gene expression data often include a statistical test to find differentially expressed genes, or a clustering algorithm to find groups of genes that behave similarly across a dataset. However, these methods may miss groups of genes that are co-expressed under one subset of experimental conditions but not under another. CoXpress allows researchers to identify such differentially co-expressed groups.
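
The idea of comparing a group's co-expression to a random distribution can be illustrated with a short sketch. This is not the CoXpress code or its API: it is written in Python rather than R, and it assumes, purely for illustration, that co-expression is summarised as the mean pairwise Pearson correlation of a gene group and that the random distribution is built from randomly drawn groups of the same size. All function names and the toy data are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: treat a flat profile as uncorrelated
    return sxy / (sxx * syy) ** 0.5


def mean_pairwise_correlation(data, genes):
    """Average correlation over every pair of genes in the group."""
    return mean(pearson(data[a], data[b]) for a, b in combinations(genes, 2))


def coexpression_p_value(data, group, n_random=1000, seed=0):
    """Return (observed score, empirical p-value) for one group in one dataset.

    The p-value is the share of random same-size gene groups whose score
    is at least as high as the observed one.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_correlation(data, group)
    all_genes = sorted(data)
    hits = 0
    for _ in range(n_random):
        candidate = rng.sample(all_genes, len(group))
        if mean_pairwise_correlation(data, candidate) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)


def make_toy_data(coordinated, n_genes=20, n_samples=8, seed=1):
    """Hypothetical dataset: the first four genes share a trend if `coordinated`."""
    rng = random.Random(seed)
    trend = list(range(1, n_samples + 1))
    data = {}
    for i in range(n_genes):
        if coordinated and i < 4:
            data[f"gene_{i}"] = [t + rng.uniform(-0.1, 0.1) for t in trend]
        else:
            data[f"gene_{i}"] = [rng.uniform(0, 10) for _ in range(n_samples)]
    return data


group = ["gene_0", "gene_1", "gene_2", "gene_3"]
for label, coordinated in (("dataset 1", True), ("dataset 2", False)):
    score, p = coexpression_p_value(make_toy_data(coordinated), group)
    print(f"{label}: mean correlation {score:.2f}, p = {p:.3f}")
```

A group that scores highly with a small p-value in one dataset but not in the other is a candidate for differential co-expression. The published package should be consulted for the statistic and resampling scheme it actually uses.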

ProGenExpress

ProGenExpress is a software package for R that we have developed. It allows users to quickly and easily visualise numerical data, such as microarray data or mutation scores, in the context of the genome organisation of sequenced prokaryotes.

The integration of genomic information with quantitative experimental data is a key component of systems biology. An increasing number of microbial genomes are being sequenced, leading to an increasing amount of data from post-genomics technologies. Prokaryotic genomes contain many structures, such as operons, pathogenicity islands and prophage sequences, whose behaviour during infection and disease is of interest. There is a need for simple and novel tools to display and analyse these integrated datasets, so we have developed ProGenExpress as a tool for visualising arbitrarily complex numerical data in the context of prokaryotic genomes.

DetectiV

Another software package that we have developed is DetectiV. It enables users to visualise, normalise and statistically test pathogen-detection microarray data.

DNA microarrays offer the possibility of testing for the presence of thousands of microorganisms in a single sample. However, there is a lack of reliable bioinformatics tools for analysing such data. DetectiV, a package for the statistical software R, offers powerful yet simple visualisation, normalisation and significance testing tools. On a large publicly available dataset, DetectiV performs better than previously published software.
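
To make the normalise-then-test idea concrete, here is a minimal sketch in Python. It is not DetectiV's code or interface. It assumes, for illustration only, that each probe is normalised as a log2 ratio against the array-wide median intensity and that each pathogen's probes are then summarised with a one-sample t statistic against zero. The function names, probe names and intensities are all hypothetical.

```python
from math import copysign, inf, log2, sqrt
from statistics import mean, median, stdev


def normalise_to_median(intensities):
    """Log2 ratio of each probe against the array-wide median (assumed scheme).

    Intensities must be positive for the logarithm to be defined.
    """
    centre = median(intensities.values())
    return {probe: log2(value / centre) for probe, value in intensities.items()}


def one_sample_t(values, mu=0.0):
    """t statistic for the null hypothesis that the mean of `values` equals mu."""
    diff = mean(values) - mu
    sd = stdev(values)  # requires at least two values
    if sd == 0:
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / (sd / sqrt(len(values)))


def score_pathogens(intensities, probe_to_pathogen):
    """Return {pathogen: (mean log2 ratio, t statistic)} for groups of 2+ probes."""
    groups = {}
    for probe, ratio in normalise_to_median(intensities).items():
        groups.setdefault(probe_to_pathogen[probe], []).append(ratio)
    return {
        pathogen: (mean(ratios), one_sample_t(ratios))
        for pathogen, ratios in groups.items()
        if len(ratios) >= 2
    }


# Hypothetical array: virus_A probes are bright, the others sit near background.
toy_intensities = {
    "a1": 800, "a2": 1000, "a3": 1200,
    "b1": 90, "b2": 100, "b3": 110,
    "c1": 95, "c2": 105, "c3": 100,
}
toy_probe_map = {probe: "virus_" + probe[0].upper() for probe in toy_intensities}

for pathogen, (ratio, t) in sorted(score_pathogens(toy_intensities, toy_probe_map).items()):
    print(f"{pathogen}: mean log2 ratio {ratio:+.2f}, t = {t:+.1f}")
```

A large positive t statistic suggests that a pathogen's probes are consistently brighter than the array as a whole. Converting the statistic to a p-value needs the t distribution, which is left out here to keep the sketch free of external dependencies.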

IAH Bioinformatics Resources

The bioinformatics group offers support, advice and analysis in several areas:

Sequence Analysis and Genome Annotation

From sequence assembly through to genome annotation, we can offer advice and custom analysis of DNA and protein sequences. We have experience of analysing both prokaryotic and eukaryotic sequences. Problems we can often solve include assembling sequencing reads into contigs, running large BLAST searches (i.e. when you have thousands of sequences to analyse), annotating small fragments or entire genomes, gene prediction and gene annotation.

Microarray Data Analysis

We have custom pipelines for analysing virtually any kind of microarray data, from either diagnostic or expression arrays. We can normalise, quality-check and statistically analyse data, whether for differential expression or for more complex pattern recognition techniques such as cluster analysis or class prediction. We can help organise data into a MIAME-compliant format for submission to public databases, and we can also analyse array sequences to obtain the latest annotation for each spot.

Statistics

We can offer statistical advice in any area of biology, from simple significance testing to visualisation tools and multivariate analysis.

Custom Analysis

In addition to the above, we can usually install and run any published software that you have encountered in the literature, as long as it is available to download.

Software we have at present includes:

BLAT
E-Predict
EMBOSS_4
GFMerge
GeneSplicer
GenoMap
GlimmerHMM
RepeatMasker
SNAP
TransTerm
apache
artemis
base
bioperl
NCBI blast
clustalw
dialign2
ensembl
fasttrans
glimmer2
hmmer
inforsense3.0
mfold
mysql
probelynx
rnahybrid
signalp
trf
wise2.2.0
wublast

If you have any questions relating to bioinformatics, then we encourage you to get in touch: michael.watson@bbsrc.ac.uk

Sample Code

KEGGSOAP: Example code

Here is some example code that can be used with R (http://www.r-project.org) and several Bioconductor (http://www.bioconductor.org) packages to overlay microarray data on KEGG pathways. The code uses the GEOquery, biomaRt, KEGGSOAP and geneplotter packages, so make sure these are properly installed. This is just one way of performing this task, and I do not claim that it is the best or most efficient way of doing so!

Data are downloaded from GEO for an experiment that used the Affymetrix chicken array. Links from that array's probes to Ensembl gene identifiers are then obtained using biomaRt. Files from the KEGG FTP site are downloaded and merged, giving a data frame that links the array data to KEGG genes and pathways. These data are then overlaid on the MAPK signalling pathway from KEGG.

The code is offered without any warranty or guarantee. Feel free to share or redistribute this code free of charge, but please acknowledge the author when doing so.
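
The merging step described above (array values, then Ensembl genes, then KEGG genes, then pathways) can be sketched independently of the R packages. The following is a minimal Python illustration, not the author's code. It calls none of GEOquery, biomaRt or KEGGSOAP, and every identifier and mapping in it is a made-up placeholder standing in for what those sources would return.

```python
from statistics import mean


def link_array_to_pathways(expression, probe_to_gene, gene_to_kegg, kegg_to_pathways):
    """Join the lookup tables into one row per (probe, pathway) combination.

    Probes that cannot be followed all the way to a pathway are dropped,
    which is what an inner merge of the tables would do.
    """
    rows = []
    for probe, value in expression.items():
        gene = probe_to_gene.get(probe)
        kegg_gene = gene_to_kegg.get(gene)
        for pathway in kegg_to_pathways.get(kegg_gene, []):
            rows.append(
                {"probe": probe, "gene": gene, "kegg_gene": kegg_gene,
                 "pathway": pathway, "value": value}
            )
    return rows


def pathway_values(rows, pathway):
    """Mean value per KEGG gene within one pathway, ready to overlay on a map."""
    per_gene = {}
    for row in rows:
        if row["pathway"] == pathway:
            per_gene.setdefault(row["kegg_gene"], []).append(row["value"])
    return {kegg_gene: mean(values) for kegg_gene, values in per_gene.items()}


# Placeholder inputs: real identifiers would come from GEO, Ensembl and KEGG.
expression = {"probe_1": 2.0, "probe_2": 1.0, "probe_3": -1.5, "probe_4": 0.3}
probe_to_gene = {"probe_1": "gene_A", "probe_2": "gene_A", "probe_3": "gene_B"}
gene_to_kegg = {"gene_A": "kegg_1", "gene_B": "kegg_2"}
kegg_to_pathways = {"kegg_1": ["pathway_X"], "kegg_2": ["pathway_X", "pathway_Y"]}

rows = link_array_to_pathways(expression, probe_to_gene, gene_to_kegg, kegg_to_pathways)
print(pathway_values(rows, "pathway_X"))
```

Averaging probes that map to the same gene is one design choice among several. Taking the probe with the largest absolute value would be an equally reasonable alternative.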

Publications:

  • Watson M (2007) DetectiV: Visualisation, normalisation and significance testing for pathogen-detection microarray data. Genome Biology 8:R19.
  • Watson M (2006) CoXpress: differential co-expression in gene expression data. BMC Bioinformatics 7:509.
  • Widdison S, Schreuder LJ, Villarreal-Ramos B, Howard CJ, Watson M, Coffey TJ (2006) Cytokine Expression Profiles of Bovine Lymph Nodes: Effects of Mycobacterium bovis Infection and BCG Vaccination. Clinical and Experimental Immunology 144:281-289.
  • Widdison S, Watson M, Piercy J, Howard C, Coffey TJ (2007) Granulocyte chemotactic properties of M. tuberculosis versus M. bovis-infected bovine alveolar macrophages. Molecular Immunology.
  • Watson M (2005) ProGenExpress: Visualization of quantitative data on prokaryotic genomes. BMC Bioinformatics 6:98.
  • Kaiser P, Poh T-y, Rothwell L, Avery S, Balu S, Pathania U, Hughes S, Goodchild M, Morrell S, Watson M, Bumstead N, Kaufman J, Young J (2005) A genomic analysis of chicken cytokines and chemokines. Journal of Interferon and Cytokine Research 25(8):467-484.