Data mining: The TranscriptomeBrowser approach (TBrowser)
We recently developed a novel clustering algorithm, DBF-MCL (“Density Based Filtering and Markov Clustering”), that relies on nearest-neighbor analysis followed by a graph partitioning step using Markov clustering. One interesting aspect of DBF-MCL is that it was designed to handle noisy expression datasets: it detects informative genes (those that fall in a cluster) prior to the classification procedure. Taking advantage of the capabilities of DBF-MCL, we searched for clusters of co-regulated genes in a large panel of human, mouse and rat Affymetrix microarray datasets stored in the Gene Expression Omnibus database. All transcriptional signatures (TS) were stored in a relational database, and a Java interface, TranscriptomeBrowser (TBrowser), was developed. As reported earlier, TBrowser can be used to search through hundreds of experiments for the joint regulation of several genes. In our first work, we provided several case studies in which very simple Boolean queries in TranscriptomeBrowser identified novel, biologically interesting groups of genes associated with breast cancer-related genes or T-cell development. That study provided compelling evidence of the usefulness of TranscriptomeBrowser.
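The filtering idea described above can be illustrated with a minimal Python sketch. It is not the DBF-MCL implementation: it only assumes that a gene is "informative" when the distance to its k-th nearest neighbor is small, and it uses a fixed, hypothetical cutoff (how the actual algorithm chooses its threshold is not described here). The function names, the toy profiles and the parameter values are all illustrative, and the Markov clustering step that would partition the retained genes is not shown.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kth_neighbor_distance(profiles, k):
    """Map each gene to the distance of its k-th nearest neighbor."""
    result = {}
    for name, vec in profiles.items():
        dists = sorted(
            euclidean(vec, other)
            for other_name, other in profiles.items()
            if other_name != name
        )
        result[name] = dists[k - 1]
    return result

def density_filter(profiles, k, cutoff):
    """Keep genes lying in a dense region (k-th neighbor within cutoff).

    The fixed cutoff is an assumption made for this sketch.
    """
    dknn = kth_neighbor_distance(profiles, k)
    return sorted(name for name, d in dknn.items() if d <= cutoff)

# Toy data: three genes with similar profiles and one isolated gene.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [0.9, 1.9, 3.1],
    "geneD": [5.0, -4.0, 0.0],
}

kept = density_filter(profiles, k=2, cutoff=0.5)
print(kept)
```

In this toy example the isolated profile is discarded before any partitioning, which is the behavior that makes a density-based filter attractive for noisy datasets.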
T-cell development and differentiation
Adaptive responses rely on the generation of T-cell populations that can recognize a broad spectrum of foreign pathogens. In order to generate a large repertoire, stochastic rearrangements of the TCR (T Cell Receptor) genes are performed during thymocyte development. This process leads to the production of self-reactive and potentially harmful T-cell clones that are mostly eliminated before their export to the periphery. TCR stimulation by antigen-presenting cells in the periphery leads to differentiation of CD4 T cells towards several phenotypes known as Th1, Th2, Th17 or Treg. These differentiation processes are driven by cytokines in the microenvironment. Indeed, interferon (IFN)-γ and interleukin-12 (IL-12) are known to be potent inducers of Th1, whereas IL-4 enforces commitment toward the Th2 phenotype. More recently, the crucial role of IL-23 in inducing the Th17 phenotype has been shown. Each of these differentiated CD4 T-cell subtypes plays a particular role in immune system functions. Th1 cells participate in the elimination of intracellular pathogens and induce production of complement-fixing antibodies by B cells. Th2 cells produce IL-4, IL-5, IL-13 and IL-25 and participate in clearance of extracellular pathogens and parasites. Th2 CD4 T cells may play an important role in the pathophysiology of allergic diseases, including asthma. To gain new insight into the development of Th sub-populations, we are using classical molecular biology tools, high-throughput methods (such as microarrays) and data-mining approaches (microarrays, ChIP-chip, ChIP-seq, PPI, GO, ...). The aim is to create dynamical models of developing T cells, integrated into the GINsim software.
RTools4TB: Programmatic access to TBrowser
In order to ease programmatic access to the TranscriptomeBrowser database (TBrowserDB), we have developed an R package (RTools4TB) implementing functions that retrieve TS through a dedicated web service. Furthermore, the library also implements the DBF-MCL algorithm, a fast and robust alternative to conventional clustering algorithms. For the representation of DBF-MCL results (DBFMCLresult class), we used the ‘S4’ system of formal classes and methods, which was popularized by the Bioconductor project. The core subroutines of the DBF-MCL algorithm were written in C and are linked dynamically into R. Currently, the partitioning step is performed using a system call to the MCL application. This limits the use of RTools4TB to Unix-like platforms. RTools4TB implements several popular normalization methods that can be applied to the dataset prior to classification (normal score transformation, quantile normalization, rank normalization). Furthermore, the DBF-MCL function can be used with various metrics for distance calculation (Euclidean distance, Pearson's correlation coefficient-based distance, Spearman's rank correlation-based distance).
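To make the distance metrics listed above concrete, here is a short, self-contained Python sketch. It does not use or reproduce the RTools4TB API; all function names are illustrative. It also assumes that a correlation r is converted into a distance as 1 - r, a common convention that may differ from what the package does. The `ranks` helper doubles as a simple rank normalization of a single profile.

```python
import math

def ranks(values):
    """Rank-transform a profile (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = mean_rank
        i = j + 1
    return result

def pearson(a, b):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson_distance(a, b):
    # Assumed convention: distance = 1 - correlation.
    return 1 - pearson(a, b)

def spearman_distance(a, b):
    # Spearman's correlation is Pearson's correlation computed on ranks.
    return 1 - pearson(ranks(a), ranks(b))

gene_x = [1.0, 2.0, 3.0, 4.0]
gene_y = [1.0, 4.0, 9.0, 16.0]  # monotonic but non-linear in gene_x

print(euclidean_distance(gene_x, gene_y))
print(pearson_distance(gene_x, gene_y))
print(spearman_distance(gene_x, gene_y))
```

The two toy profiles are related monotonically but not linearly, so the rank-based distance is zero while the Pearson-based distance is small but positive. This illustrates why the choice of metric can change which genes are considered neighbors.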