hg19 (GRCh37) vs. hg38 (GRCh38): Comparison of Data Differences

Uploaded by: su****e  Document ID: 244936113  Upload date: 2024-10-06  Format: PPTX  Pages: 31  Size: 2.18 MB
hg19 (GRCh37) vs. hg38 (GRCh38)
Human Genome Reference Comparison
Zuotian Tatum
Department of Human Genetics, Leiden University Medical Center
Slide date: 9/29/2017

Timeline
GRCh37:
- First release: Feb 27, 2009
- Latest patch: Jun 28, 2013 (p13)
GRCh38:
- First release: Dec 24, 2013
- Latest patch: Oct 14, 2014 (p1)
Source: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/

Content
                            GRCh37.p13      GRCh38.p2
Total bases                 3.23 billion    3.21 billion
Total bases (without N)     2.99 billion    3.05 billion
N50                         46 million      67 million
Number of alternative loci  9               261
Non-nuclear genome          No              Yes
Source: http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/data/

UCSC tracks for GRCh38
- UCSC RefSeq available since April 2014.
- Ensembl regulatory build available since September 2014.
- dbSNP 141 available since October 2014.
- ENCODE and FANTOM5 track hubs are still not available (Nov 2014).

New in GRCh38 release
Three new sequence files, in addition to the standard assembly files:
- GCA_000001405.15_GRCh38_top-level.fna.gz
- GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
The analysis set files are created to avoid false mapping in NGS alignment pipelines.

GCA_000001405.15_GRCh38_top-level.fna.gz
All the top-level objects in the full assembly:
- Chromosomes
- Unlocalized scaffolds
- Unplaced scaffolds
- Alternate locus scaffolds
- Mitochondrial genome
The sequence identifiers are International Nucleotide Sequence Database Collaboration (INSDC) accession.versions, and the definition lines are GenBank style.
No sequences have been hard-masked.

GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
- Chromosomes from the GRCh38 Primary Assembly unit.
  Note: the two PAR regions on chrY have been hard-masked with Ns. The chromosome Y sequence provided therefore has the same coordinates as the GenBank sequence, but it is not identical to the GenBank sequence. Similarly, duplicate copies of centromeric arrays and WGS on chromosomes 5, 14, 19, 21 & 22 have been hard-masked with Ns.
- Mitochondrial genome from the GRCh38 non-nuclear assembly unit.
- Unlocalized scaffolds from the GRCh38 Primary Assembly unit.
- Unplaced scaffolds from the GRCh38 Primary Assembly unit.
- Epstein-Barr virus (EBV) sequence.
  Note: the EBV sequence is not part of the genome assembly but is included in the analysis set as a sink for alignment of reads that are often present in sequencing samples.

GCA_000001405.15_GRCh38_full_analysis_set.fna.gz
= GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz
+ alt-scaffolds from the GRCh38 ALT_REF_LOCI_* assembly units

Alt-loci add complexity to RNASeq quantification
[Figure: Ideogram of GRCh38.p2]

RNASeq quantification
- Fragments (reads) per million per kilobase (FPKM/RPKM) values to quantify gene expression
- Unique mapping only: analysis tools do not distinguish allelic duplication from paralogous duplication
- Non-overlapping gene regions

To understand the effect of alt-loci on RNASeq quantification
Compare alignment of the chromosome 6 MHC region between:
- hg19 full set with 7 alt-loci
- hg38 analysis set without alt-loci
Sequence content is largely unchanged between hg19 and hg38.

Mapping/alignment for RNASeq
                     hg19          hg38
mapped               14,655,299    14,704,427
mappedDiffChr        4,959         4,017
mappedPairProper     14,639,261    14,690,090
mappedPairProperPct  92.62         92.94
total                15,805,561    15,805,561
totalSplice          5,060,829     5,078,133
unmapped             1,150,262     1,101,134
hg19: with alt loci; hg38: without alt loci

Effect of alt loci in RNASeq alignments
[Figure panels: Gene RPKM (hg38); Distribution of RPKM difference]

Major Histocompatibility complex region on chromosome 6
[Figure: HLA-A, sample D1, on hg19 full set chr6, chr6_mann_hap4, chr6_qb1_hap6 and chr6_dbb_hap3]
[Figure: HLA-A, samples D1-D3, hg19 full set chr6 vs. hg38 analysis set]
[Figure: HLA-C, samples D1-D3, hg19 full set vs. hg38 analysis set]
[Figure: HLA-DRA, samples D1-D3, hg19 full set vs. hg38 analysis set]

Major Histocompatibility complex region on chromosome 6: Class III
MHC Class III:
- 700 kb stretch, 60 genes
- The most gene-dense region of the human genome
- 14% coding, 72% transcribed
- Highly conserved
- Only a few have clearly defined and proven function
[Figure: TNF, samples D1.control and D1.treated, hg19 full set chr6 vs. hg38 analysis set chr6]

Highly variant immune regions retiled
LILRA3 moved to alt-loci in hg38:
- hg19: LILRB2, LILRA3, LILRA5
- hg38: LILRB2, LILRA5
[Figure labels: Phantom LILRA3; LILRA3 in hg19; Intergenic; LILRB3; LILRA4; LILRB5]

Gene length calculation
We need gene length for calculating RPKM.
- If alignment uses alt loci: RPKM would be artificially lowered for alt loci genes.
- If alignment does not use alt loci: remove alt loci annotations from the official set.

Need more comprehensive approach to genome variation
- Assembly model is neither haploid nor diploid
- Analysis tools penalize reads mapping to more than 1 location and do not distinguish allelic duplication from paralogous duplication
- A graph structure is a
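
The "Gene length calculation" point above can be illustrated with a small sketch. It assumes the common RPKM definition (reads per kilobase of gene length per million mapped reads) and a unique-mapping-only pipeline in which reads matching both the primary copy and an alt-locus copy of a gene are discarded. The function name `rpkm` and every number below are hypothetical and chosen only for easy arithmetic; they are not taken from the slides. As a simplification, the total mapped read count is held constant in both scenarios.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical values, not from the slides.
GENE_LENGTH_BP = 2_000        # annotated length of the gene on the primary assembly
TOTAL_MAPPED = 10_000_000     # total mapped reads in the sample
READS_FROM_GENE = 1_000       # reads that truly originate from the gene
READS_STILL_UNIQUE = 400      # reads left after dropping those that also hit the alt copy

# Reference without alt loci: every read from the gene maps uniquely.
without_alt = rpkm(READS_FROM_GENE, GENE_LENGTH_BP, TOTAL_MAPPED)

# Reference with alt loci: multi-mapped reads are discarded, so the count drops
# while the gene length stays the same, and the RPKM falls with it.
with_alt = rpkm(READS_STILL_UNIQUE, GENE_LENGTH_BP, TOTAL_MAPPED)

print(f"RPKM without alt loci: {without_alt}")
print(f"RPKM with alt loci:    {with_alt}")
```

In this toy case the gene's expression appears lower purely because of how the reference is laid out, which is the artifact the slide warns about for genes located in alt loci.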