Second-Generation Sequencing: Experiments and Sequencing Principles

Uploaded by: t****d | Document ID: 243404151 | Upload date: 2024-09-22 | Format: PPT | Pages: 43 | Size: 3.30 MB

Slide 1
Library Preparation and Sequencing Principles of Second-Generation Sequencing
Shanghai Center for Bioinformation Technology

Slide 2: CONTENTS
- Overview of sample processing and sequencing principles
  - Roche 454
  - Illumina Solexa
- Quality control of raw data

Slide 3: TRUSEQ RNA AND DNA SAMPLE PREPARATION
(figure only)

Slide 4: CLUSTER GENERATION OVERVIEW
- 1000-6000 molecules per cluster

Slide 5: CLUSTER GENERATION, TEMPLATE HYBRIDIZATION
Diagram labels: flowcell, P5, P7, OH, diol
Steps: template hybridization -> initial extension -> denaturation

Slide 6: CLUSTER GENERATION, BRIDGE PCR
- 1st cycle: denaturation -> annealing -> extension
- 2nd cycle: denaturation -> annealing -> extension
- n = 25

Slide 7: TEMPLATE PREPARATION: BRIDGE PCR
Steps: adaptor ligation -> surface attachment -> bridge amplification -> denaturation
Source: Trends in Genet 24:133 (2008)

Slide 8: SEQUENCING BY SYNTHESIS OVERVIEW
- Cycle 1: add sequencing reagents; the first base is incorporated
- Detect signal
- Cleave terminator and dye
- Cycle 2-n: add sequencing reagents and repeat

Slide 9: CYCLIC REVERSIBLE TERMINATION
- All four labeled reversible terminators are added per cycle
- Remove unincorporated bases and detect signal
- Remove the terminating group and the fluorescent dye
Diagram labels: terminating group, fluorophore cleavage
Sources: Trends in Genet 24:133 (2008); Nat Rev Genet 11:31 (2010)

Slide 10: BASE CALLING
(figure only)

Slide 11: FLOWCELL LAYOUT ON GAII
- A flow cell contains 8 lanes (Lane 1, Lane 2, ..., Lane 8)
- Each lane contains 2 columns
- Each column contains 60 tiles
- Each tile is imaged 4 times per cycle

Slide 12: PRIMARY DATA ANALYSIS BY FIRECREST AND BUSTARD IN RTA/OLB
Flow: TIFF image file -> Firecrest -> intensity file -> Bustard -> sequence file
- Intensity file: Lane#, Tile#, position (X, Y), and A C G T intensities for each cycle (Cycle 1, Cycle 2, ...)
- Sequence file: Lane#, X, Y, sequence

Slide 13: CLUSTER GENERATION, SEQUENCING PRIMER HYBRIDIZATION
(processing steps for single-read sequencing)
Steps: linearization -> blocking with ddNTP -> denaturation and hybridization (SBS3)

Slide 14: SEQUENCE MULTIPLE SAMPLES IN THE SAME LANES
Multiplexing: multiple samples in the same lanes
Diagram labels: DNA insert, Index; Read 1, Index Read, Read 2; sequencing primers Rd1 SP, Index SP, Rd2 SP

Slide 15: ADVANTAGES OF PAIRED-END SEQUENCING
- Read 1 and Read 2 are separated by a known distance
- Repetitive DNA: a single read maps to multiple positions
- A paired read maps uniquely

Slide 16: MATE-PAIR LIBRARY CONSTRUCTION AND SEQUENCING
Diagram labels: Read 1, Read 2, known distance
Source: Molecular Ecology Resources (2011)

Slide 17: TEMPLATE PREPARATION: EMULSION PCR
Steps: fragmentation -> ligation -> water-in-oil emulsion (micro-reactor) -> emPCR -> PicoTiter Plate loading
Source: Trends in Genet 24:133 (2008)

Slide 18: PYROSEQUENCING
- A single dNTP type flows per cycle
- Released inorganic pyrophosphate (PPi) drives visible light emission through a series of reactions
- Remove unincorporated nucleotide
Source: Trends in Genet 24:133 (2008)

Slide 19: BASE CALLING
- Homopolymer error
Figure label: GV6330

Slide 20: FLEXIBLE MULTI-SAMPLE TAGGING
Diagram labels: Primer A, MID, Key, library fragment, Primer B, sequencing primer

Slide 21: 454 AND SOLEXA SEQUENCING MODES
454          Solexa
Single       Single (or nothing stated)
Pair end     Pair end
             Mate pair

Slide 22: SEMICONDUCTOR SEQUENCING
- Detects H+ released as a voltage change: fast
- Common microchip design standards: low-cost manufacturing
- Sequencing volume is increasing

Slide 23: FASTQ SEQUENCE FORMAT
A FASTQ file records each sequence in 4 lines:
- Line 1 starts with the '@' character, followed by the sequence identifier and description
- Line 2 is the sequence itself
- Line 3 starts with the '+' character; the rest may be empty or repeat line 1
- Line 4 encodes the quality of the sequence in line 2 and must have the same length as line 2

Example:
@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG
CGACAATTTTTTTTGATATTAATAAAGATAGAACTTTCTTCCTATGAGTTTTCTCTC
+
CCCFFDFFHHHHGJJGHIIJGIIJJJJIIJJHJJJJJIJJIIIGIIIJGGIHJDIJIGAHEHFFGHGHE

Slide 24: ILLUMINA SEQUENCE IDENTIFIERS (before Casava 1.8)
HWI-EAS364_0004:4:1:995:9044#0/1

HWI-EAS364_0004   Unique instrument name
4                 Flowcell lane
1                 Tile number within the flowcell lane
995               x-coordinate of the cluster within the tile
9044              y-coordinate of the cluster within the tile
#0                Index number for a multiplexed sample (0 means no index)
/1                Member of the pair

Slide 25: ILLUMINA SEQUENCE IDENTIFIERS (Casava 1.8)
HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG

HWI-ST507    Unique instrument name
211          Run ID
C18E6ACXX    Flowcell ID
2            Flowcell lane
1101         Tile number within the flowcell lane
1688         x-coordinate of the cluster within the tile
1992         y-coordinate of the cluster within the tile
1            Member of the pair (1 or 2)
N            Whether the read failed filtering (Y: the read is bad; N: the read is good)
0            Control bits; 0 means no control bits are set
GAGTGG       Index sequence

Slide 26: SEQUENCE QUALITY
Quality calculation: Q is computed as a Phred quality score:
    Q = -10 log10(p)
where p is the probability that the corresponding base call is wrong.
The resulting Q value is an integer. Add 33 or 64 to Q, then convert the sum to an ASCII character.
Note: before Solexa 1.3, quality was computed as:
    Q = -10 log10(p / (1 - p))

Slide 27: Q VALUES AND THEIR ASCII CODES
Code  Encoding                    Typical raw-read Q range   ASCII codes
S     Sanger, Phred+33            (0, 40)                    33-73
X     Solexa, Solexa+64           (-5, 40)                   59-104
I     Illumina 1.3+, Phred+64     (0, 40)                    64-104
J     Illumina 1.5+, Phred+64     (3, 40)                    67-104
L     Illumina 1.8+, Phred+33     (0, 41)                    33-74
For Illumina 1.5+: 0 = unused, 1 = unused, 2 = Read Segment Quality Control Indicator.

Slides 28-30
(figures only)

Slide 31: 454 RAW DATA: IMAGES, SFF FORMAT, FASTA FORMAT (QUAL)
FASTA record:
>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
ACGTGTTCTGAGCCATATTGCGGTACTGGAAGGTGCGCCTGCACTGTCTGAGCACTGGTCACTGCTCGATACCAATGAAGCCTTATTTGATGAGGCGCGCACCACGCAGGCGGCGACTATTATCTTCTCGTTTGATCCAGAATAACCAAATCGAAAACGCTGGCAAGGCACACAGGGGATA

QUAL record:
>HSAPGDX01D1KDA length=181 xy=1540_3788 region=1 run=R_2012_08_01_00_39_39
40 40 40 40 40 40 40 39 37 38 36 34 24 23 19 19 19 24 20 19 18 18 26 26 18 18 19 18 20 20 20 25 25 26 19 20 20 22 22 22 25 28 26 24 22 22 22 25 24 28 28 28 29 29 28 30 30 30 26 26 26 27 27 27 31 31 30 28 28 28 30 30 30 30 26 21 21 20 20 26 27 28 24 25 20 20 20 20 19 19 19 27 28 28 30 30 31 30 28 28 30 31 31 32 32 31 31 30 30 30 31 27 24 24 22 20 20 20 22 26 26 22 22 23 16 16 16 19 22 16 13 13 13 16 22 23 23 23 26 26 24 24 26 13 13 11 11 12 12 19 22 18 18 11 11 13 13 18 24 24 24 24 26 26 26 27 29 29 31 33 32 31 31 27 27 27 29 29 28 26 22

Slide 32: 454 RAW DATA LENGTH DISTRIBUTION (same after quality control)
(figure only)

Slide 33: BASIC CONCEPTS IN SEQUENCING
- Yield: data size produced by the sequencer
- Reads: sequenced fragments
- Read length and quality
- Coverage fold: number of times a nucleotide is represented
- Depth: the average coverage fold
- Coverage rate: ratio of the region sequenced to the whole genome
- Homopolymer: e.g. AAAAA

Slide 34: TYPICAL DEEP-SEQUENCING DATA PROCESSING WORKFLOW
Columns: basic analysis procedure | consideration | common software
- Image data: SCS, IPAR, Pipeline
- Sequence data (quality calibrated): Sff_extract script
- Data filtering (length and quality): Seqclean, lucy, fastx-tools
- Assembly (no reference): CAP3, gassembler, MIRA, Celera
- Mapping (with reference): Bowtie, SOAP, BWA, SSAHA2
- Quality control and statistics calculation: coverage, depth, mapping efficiency
- Advanced analysis (depends on the application):
  - SNP: MAQ, Pyrobayes, ssahasnp, QualitySNP
  - Gene prediction: Glimmer, glimmerHMM
  - Gene annotation: Blast, wublast

Slide 35: SEQUENCE QUALITY ASSESSMENT
- FastQC: a quality control tool for high throughput sequence data
- Implemented in Java

Slides 36-40
(figures only)

Slide 41: QC PIPELINE
Flow: raw reads -> format conversion -> filter low-quality reads -> trim low-quality ends -> QC report -> pass: analysis-ready reads / fail

Slide 42: QC FILTERING OF RAW DATA
Sequence level:
- Short sequences
- Adaptor/primer
- polyA | T region
- Overall low-complexity sequence (Dust)
- Contamination/unwanted sequences
- Ns (low quality ends)
Quality level:
- Low quality base or region
Goal: every read that is kept is high-quality data that actually goes into the bioinformatics analysis.

Slide 43: CLEAN READS
Paired-end filtering rules:
- Remove reads that contain adaptor sequence
- When the N content of either single read exceeds 10% of that read's length, remove the read pair
- When the number of low-quality bases (quality below 5) in either single read exceeds 50% of that read's length, remove the read pair
Criteria for counting unqualified bases in a read:
- An N in the read counts as one
- A base with a quality score below 20 counts as one
Reads are removed when:
- Unqualified bases make up 10% or more of the read length (i.e., 10 bp)
- The read has no 3' adaptor
- The read is contaminated with 5' adaptor
- The read has no insert detected
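
The quality encoding on the sequence quality slides (Phred score Q from the error probability p, then an ASCII offset of 33 or 64) can be sketched as follows. This is a minimal illustration, not the code of any particular pipeline; the function names are hypothetical, and rounding Q to the nearest integer is an assumption.

```python
import math


def error_prob_to_phred(p: float) -> int:
    """Phred score Q = -10 * log10(p), rounded to an integer (assumed rounding)."""
    return round(-10 * math.log10(p))


def phred_to_error_prob(q: int) -> float:
    """Inverse of the Phred formula: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_to_char(q: int, offset: int = 33) -> str:
    """Encode a Q value as one ASCII character (offset 33 or 64)."""
    return chr(q + offset)


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into a list of Q values."""
    return [ord(c) - offset for c in qual]


if __name__ == "__main__":
    # First five quality characters of the FASTQ example, assuming Phred+33.
    print(decode_quality("CCCFF"))
    print(phred_to_char(40, 33), phred_to_char(40, 64))
```

Choosing the wrong offset shifts every score by 31, so the offset should be confirmed from the instrument or pipeline version before decoding.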
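
The Casava 1.8 identifier fields listed on the sequence identifier slide can be split mechanically. The sketch below assumes the header always has exactly the eleven fields shown on that slide; the `Casava18Id` class and `parse_casava18_id` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Casava18Id:
    instrument: str
    run_id: int
    flowcell_id: str
    lane: int
    tile: int
    x: int
    y: int
    pair_member: int
    failed_filter: bool  # True when the flag is "Y" (bad read)
    control_bits: int
    index_seq: str


def parse_casava18_id(header: str) -> Casava18Id:
    """Parse '@instrument:run:flowcell:lane:tile:x:y member:filter:control:index'."""
    header = header.strip().lstrip("@")
    # The two halves are separated by a single space.
    left, right = header.split(" ", 1)
    instrument, run_id, flowcell_id, lane, tile, x, y = left.split(":")
    member, filtered, control, index_seq = right.split(":")
    return Casava18Id(
        instrument=instrument,
        run_id=int(run_id),
        flowcell_id=flowcell_id,
        lane=int(lane),
        tile=int(tile),
        x=int(x),
        y=int(y),
        pair_member=int(member),
        failed_filter=(filtered == "Y"),
        control_bits=int(control),
        index_seq=index_seq,
    )


if __name__ == "__main__":
    rec = parse_casava18_id(
        "@HWI-ST507:211:C18E6ACXX:2:1101:1688:1992 1:N:0:GAGTGG"
    )
    print(rec)
```

Headers with a different field count raise `ValueError` from the tuple unpacking, which is a deliberate choice here so that malformed identifiers are not silently accepted.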
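
Two of the paired-end rules on the clean reads slide (N content above 10% of the read length, and bases with quality below 5 above 50% of the read length) can be sketched as below. The function names and the representation of a read as a `(sequence, quality list)` tuple are assumptions for illustration; adaptor detection is not covered.

```python
def fails_read_filter(
    seq: str,
    quals: list[int],
    max_n_frac: float = 0.10,      # N content threshold from the slide
    low_q: int = 5,                # "low quality" means Q below this value
    max_low_q_frac: float = 0.50,  # low-quality base fraction threshold
) -> bool:
    """Return True if a single read violates either threshold."""
    length = len(seq)
    if length == 0:
        return True  # assumption: treat empty reads as failing
    n_frac = seq.upper().count("N") / length
    low_q_frac = sum(1 for q in quals if q < low_q) / length
    return n_frac > max_n_frac or low_q_frac > max_low_q_frac


def keep_pair(read1: tuple[str, list[int]], read2: tuple[str, list[int]]) -> bool:
    """Keep a pair only if neither read fails; one bad read removes both."""
    return not (fails_read_filter(*read1) or fails_read_filter(*read2))


if __name__ == "__main__":
    good = ("ACGTACGTAC", [30] * 10)
    too_many_n = ("ACGTNNGTAC", [30] * 10)
    print(keep_pair(good, good), keep_pair(good, too_many_n))
```

The comparisons are strict ("exceeds"), so a read with exactly 10% N is kept under this reading of the slide.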