Analysis of High-throughput Gene Expression Profiling

Why Measure Gene Expression?
1. It determines which genes are induced or repressed in response to a developmental phase or an environmental change.
2. Sets of genes whose expression rises and falls under the same conditions are likely to have a related function.
3. Features such as a common regulatory motif can be detected within co-expressed genes.
4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation, which makes it a useful tool for cancer diagnosis.

Why Measure Gene Expression on a Large Scale?
Traditional vs. High-throughput Approaches

Techniques Used to Detect Gene Expression Level
High-throughput:
* Microarray (single or dual channel)
* SAGE
* EST/cDNA library
Others:
* Northern blots
* Subtractive hybridisation
* Differential hybridisation
* Representational difference analysis (RDA)
* DNA/RNA fingerprinting (RAP-PCR)
* Differential display (DD-PCR)
* aCGH: array CGH (DNA level)

Basic Information of Microarray, SAGE and cDNA Library

(DNA) Microarray
1. Developed around 1987.
2. Employs methods previously exploited in the immunoassay context: specific binding and marking techniques.
3. Two types of probes:
* Format I: probe cDNA (500-5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University; traditionally called DNA microarrays.
* Format II: an array of oligonucleotide probes (20-80-mer oligos) is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies manufacture oligonucleotide-based chips using alternative in-situ synthesis or deposition technologies. Historically called DNA chips.

Microarray
* Single channel: sub-type classification
* Dual channel: screening for differentially expressed genes
* Tissue microarray
* Protein microarray

Array CGH
* Detects DNA copy number variation via a microarray approach
* A hotspot in recent work, especially in cancer
research

Microarray Analysis
* gene discovery
* pattern discovery
* inferences about biological processes
* classification of biological processes
Which genes are up-regulated, down-regulated, co-regulated, not regulated?

SAGE
* An experimental technique designed to gain a quantitative measure of gene expression.
* 10-20 base "tags" are produced (immediately adjacent to the 3' end of the 3'-most NlaIII restriction site).
* The SAGE technique does not measure the expression level of a gene directly; it quantifies a tag that represents the transcription product of a gene.

SAGE
* Tags are isolated and concatemerized.
* Relative expression levels can be compared between cells in different states.

SAGEmap

SAGE: comparing two relational libraries

EST library (UniGene)

Gene expression info from the UniGene library

An Example of In-house EST Library Analysis

The Algorithms and Challenges of High-throughput Gene Expression Analysis

Seeing is believing? No, errors need to be corrected.

SAGE
* A typical experiment requires 30,000 gene expression comparisons between a normal and a diseased cell.
* The results are subject to the size and reliability of the SAGE libraries.
* Statistical measures are used to filter out candidate genes and reduce the dimensionality of the data, but it is tedious and time-consuming to adjust these measures until a good set is found.

SAGE
* TPM: a simple normalization method
  TPM = Count * 1,000,000 / TotalCount
* Bayesian approach

Microarray: Sources of errors
* systematic
* random
(Figure axes: log signal intensity, log RNA abundance)

Sources of Errors (Cont.)
* Printing and/or tip problems
* Labeling and dye effects (differing amounts of RNA labeled between the 2 channels)
* Differences in the power of the two lasers (or other scanner problems)
* Differences in DNA concentration on arrays (plate effects)
* Spatial biases in ratios across the surface of the microarray due to uneven hybridization
* cDNA arrays cannot distinguish alternatively spliced forms

Errors that cannot be corrected by statistics
* Competitive hybridization of different targets on
the chip
* Failure to distinguish different splicing forms
* Misinterpretation of time course data when there are not sufficient points
* Misinterpretation of relative intensity

Does a clustered time course really mean co-expression?
Yes, you can study a known system (such as the cell cycle) this way; but what about unknown systems?

Normalization by iterative linear regression
* fit a line (y = mx + b) to the data set
* set aside outliers (residuals > 2 x s.e.)
* repeat until r^2 changes by < 0.001
* then apply the slope and intercept to the original data set
D Finkelstein et al.

Normalization (Curvilinear)
G Tseng et al., NAR 2001

After Normalization
* Differentially expressed (DE) gene screening
  * T-test
  * T-statistics
  * SVM
* Clustering
  * Hierarchical
  * SOM
  * K-means
* Network (pathway) analysis
  * BioCarta, KEGG, GO databases
  * Bayesian network learning
  * Topology

Bioinformatics challenges
1. data management
2. utilizing data from multiple experiments
3. utilizing data from multiple groups
  * with different technologies
  * with only processed data available

Bioinformatics for Integrated Analysis of Gene Expression Profiling

Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression
Daniel R. Rhodes et al., PNAS, 2004 (101), 9309-9314
* T-test
* Q values (estimated false discovery rates) were calculated as
  Q = (P x n) / i
  where P is the P value, n is the total number of genes, and i is the sorted rank of the P value.

Cont.
Meta-profiling. The purpose of meta-profiling is to address the hypothesis that a selected set of differential expression signatures shares a significant intersection of genes (a meta-signature), thus inferring a biological relatedness.

67 genes were screened by meta-analysis

Integrated Cancer Gene Expression Map

7 genes were discovered by the system
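The Q value calculation on the meta-analysis slide (P value times the number of genes, divided by the sorted rank of the P value) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `q_values` is hypothetical, ranks are assumed to be 1-based with the smallest P value ranked first, and capping Q at 1.0 is an added assumption.

```python
def q_values(p_values):
    """Estimate Q = P * n / i for every P value.

    n is the total number of genes (P values) and i is the 1-based rank
    of each P value after sorting in ascending order.
    """
    n = len(p_values)
    # Indices of the P values, ordered from smallest to largest P.
    order = sorted(range(n), key=lambda k: p_values[k])
    q = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # Assumption: cap at 1.0 so Q can still be read as a rate.
        q[idx] = min(1.0, p_values[idx] * n / rank)
    return q


if __name__ == "__main__":
    # Made-up P values for four genes, for illustration only.
    for p, q in zip([0.01, 0.04, 0.03, 0.5], q_values([0.01, 0.04, 0.03, 0.5])):
        print(f"P = {p:.2f}  Q = {q:.4f}")
```

Results are returned in the input order, so each Q value lines up with the gene whose P value produced it.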

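The steps on the "Normalization by iterative linear regression" slide can be sketched as follows. This is a minimal sketch under stated assumptions, not the published procedure of Finkelstein et al.: "s.e." is taken to be the residual standard error of the fit, "apply the slope and intercept" is interpreted as mapping y onto the scale of x via (y - b) / m, and all function and parameter names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = m*x + b; returns (m, b, r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    m = sxy / sxx
    b = mean_y - m * mean_x
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return m, b, r2


def iterative_regression_normalize(xs, ys, k=2.0, tol=0.001, max_iter=50):
    """Fit, set aside outliers, repeat until r2 stabilizes, then rescale ys."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired measurements")
    keep = list(range(len(xs)))
    prev_r2 = None
    for _ in range(max_iter):
        kx = [xs[i] for i in keep]
        ky = [ys[i] for i in keep]
        m, b, r2 = fit_line(kx, ky)
        # Stop once r2 changes by less than tol between iterations.
        if prev_r2 is not None and abs(r2 - prev_r2) < tol:
            break
        prev_r2 = r2
        residuals = [y - (m * x + b) for x, y in zip(kx, ky)]
        # Assumption: s.e. is the residual standard error of this fit.
        se = math.sqrt(sum(r * r for r in residuals) / (len(keep) - 2))
        inliers = [i for i, r in zip(keep, residuals) if abs(r) <= k * se]
        if len(inliers) < 3 or len(inliers) == len(keep):
            break
        keep = inliers
    # Final fit on the retained points, then apply it to the ORIGINAL data.
    m, b, _ = fit_line([xs[i] for i in keep], [ys[i] for i in keep])
    normalized = [(y - b) / m for y in ys]
    return m, b, normalized


if __name__ == "__main__":
    # Made-up intensities: y = 2x + 1 with one gross outlier at x = 5.
    xs = [float(i) for i in range(10)]
    ys = [2.0 * x + 1.0 for x in xs]
    ys[5] += 20.0
    slope, intercept, norm = iterative_regression_normalize(xs, ys)
    print(slope, intercept)
```

The outlier is excluded only from the fit; it is still rescaled with the final slope and intercept, so a genuinely different spot remains visible after normalization.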

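The TPM formula from the SAGE normalization slide (TPM = Count * 1,000,000 / TotalCount) is simple enough to show directly. The sketch below assumes a library is represented as a mapping from tag to raw count; the function name and the tag names are hypothetical and the counts are made up.

```python
def tags_per_million(counts):
    """Convert raw SAGE tag counts to tags per million (TPM).

    counts maps each tag to its raw count in one library.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * 1_000_000 / total for tag, count in counts.items()}


if __name__ == "__main__":
    # Two made-up libraries of different sizes.
    normal = {"TAG_A": 30, "TAG_B": 70}
    diseased = {"TAG_A": 300, "TAG_B": 100}
    print(tags_per_million(normal))
    print(tags_per_million(diseased))
```

Scaling both libraries to the same total makes tag counts comparable between libraries of different sizes, which is the purpose of the normalization.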
