[Virology foreign-language literature] 2005 Reasoning of spike glycoproteins being more vulnerable to mutations among 158 coronavirus proteins from different s

ORIGINAL PAPER

Guang Wu, Shaomin Yan

Reasoning of spike glycoproteins being more vulnerable to mutations among 158 coronavirus proteins from different species

Received: 18 March 2004 / Accepted: 30 August 2004 / Published online: 9 December 2004
© Springer-Verlag 2004

J Mol Model (2005) 11: 8–16
DOI 10.1007/s00894-004-0210-0

Electronic Supplementary Material is available for this article if you access the article at http://dx.doi.org/10.1007/s00894-004-0210-0

G. Wu, S. Yan
Computational Mutation Project, DreamSciTech Consulting Co. Ltd., 301, Building 12, Nanyou A-zone, Jiannan Road, Shenzhen, Guangdong Province 518054, China
E-mail: hongguanglishibahao
Tel.: +86-755-22029353

Abstract

In this study, we used the probabilistic models that we have developed over the last several years to analyze 158 proteins from coronaviruses in order to determine which protein is more vulnerable to mutations. The results provide three lines of evidence suggesting that the spike glycoprotein is different from the other coronavirus proteins: (1) the spike glycoprotein is more sensitive to mutations, which is the current state of the spike glycoprotein; (2) the spike glycoprotein has undergone more mutations in the past, which is the history of the spike glycoprotein; and (3) the spike glycoprotein has a bigger potential towards future mutations, which is the future of the spike glycoprotein. Furthermore, this study gives a clue on the species susceptibility regarding different proteins.

Keywords: Coronavirus, Protein, Probability, SARS

Introduction

With the occurrence of new cases of severe acute respiratory syndrome (SARS), the prognosis of a possible return of SARS in the near future is coming true. The hypothesis that the new SARS cases could differ somewhat from the previous cases, possibly in mutated forms, also appears to be true. Accumulating evidence shows that there are mutations in the SARS-related coronavirus (SARS-CoV) [1, 2], which may lead to difficulties in diagnosis, treatment and prevention.

The SARS-CoV is an enveloped RNA virus. Naturally, we would expect the different components of human SARS-CoV to have different sensitivities to mutation. Identifying which component of human SARS-CoV is most subject to mutations would therefore minimize the difficulties in identifying SARS-CoV and facilitate the diagnosis, treatment and prevention of SARS. Clearly, we should not limit ourselves to SARS-CoV alone, not only because many species carry coronaviruses [3, 4], but also, more importantly, because the coronavirus from civets is likely to be the source of SARS [5].

Among the various components of coronaviruses, we are most interested in the proteins, because over the last several years we have developed three models to analyze the protein primary structure (for a review, see [6]), including the proteins from SARS-CoV [7, 8]. In general, our first model classifies a protein into randomly predictable and unpredictable portions, and our findings demonstrate that the unpredictable portion is more sensitive to mutations than the predictable one. Thus, we can find which protein is more vulnerable to mutations by comparing the unpredictable portion with the predictable one among proteins.

So far, the envelope protein, hemagglutinin-esterase precursor, membrane glycoprotein, nonstructural protein, nucleocapsid protein, spike glycoprotein, replicase polyprotein and hypothetical proteins have been identified in coronaviruses [9–12]. These proteins have the following functions. The hemagglutinin-esterase is the major receptor determinant, binding to sialic acid-containing receptors on the host cell and enabling penetration of the virus genome into the host cell cytoplasm by fusion of the virus and host cell membranes. Both the envelope and membrane glycoproteins are components of the viral envelope that play a central role in virus morphogenesis and assembly via their interactions with other viral proteins. The nonstructural proteins mediate nuclear export of viral RNPs and bind RNA, thereby inhibiting host mRNA translation and regulating viral pre-mRNA splicing
and translation. The nucleocapsid protein is the major structural component of virions that associates with genomic RNA to form a helical nucleocapsid. The replicase polyprotein is a multifunctional protein containing the activities necessary for the transcription of negative-stranded RNA, leader RNA, subgenomic mRNAs and progeny virion RNA, as well as proteinases responsible for the cleavage of the polyprotein into functional products. The spike glycoprotein is responsible both for binding to receptors on host cells and for membrane fusion [13–21].

Currently, the sequences of 158 coronavirus proteins from different species have been documented. Each protein must have its own specific sensitivity to mutations; otherwise, all proteins would have the same ratio of mutations per amino acid sequence. However, no such uniform ratio has been found; it is therefore important to define which protein is more sensitive to mutations than the others. The aim of the present study is to discover which protein is more sensitive to mutations among 158 coronavirus proteins, using the model that we have developed over the last several years.

Materials and methods

The amino acid sequences of 158 coronavirus proteins were obtained from the Swiss-Prot databank [22]. These proteins are grouped as envelope proteins, hemagglutinin-esterase precursors, membrane glycoproteins, nonstructural proteins, nucleocapsid proteins, spike glycoproteins and others, including replicase polyprotein and hypothetical proteins (for details, see Supplementary Material).

The detailed calculations of randomly predictable and unpredictable portions in proteins have been published previously (for a review, see [6]). The calculations, governed by the simple permutation principle [23], are described here for the example of the spike glycoprotein from human SARS-CoV, which consists of 1,255 amino acids. An amino acid pair in a protein can be composed of any of the 20 kinds of amino acids, so theoretically there are 400 possible types of amino acid
pairs. In terms of amino acid pairs, proteins differ from one another in the number of possible types of amino acid pairs, in the frequency of each type, or in both.

Randomly predictable present type of amino acid pair with predictable frequency. There are 39 arginines (R) and 96 serines (S) in the spike glycoprotein from human SARS-CoV; the random frequency of the amino acid pair RS is 3 (39/1,255 × 96/1,254 × 1,254 = 2.983). Actually, we find three RSs in the spike glycoprotein, so the type RS is present and its frequency is 3. In this case, both the presence of type RS and its frequency are randomly predictable, and the difference between actual and predicted values is 0.

Randomly predictable present type of amino acid pair with unpredictable frequency. There are 84 alanines (A) in the spike glycoprotein from human SARS-CoV. The frequency of random presence of AA is 6 (84/1,255 × 83/1,254 × 1,254 = 5.555). In fact, AA appears ten times. Thus, the presence of type AA is randomly predictable, but its frequency is randomly unpredictable, and the difference between actual and predicted values is 4.

Randomly unpredictable present type of amino acid pair. There are 11 tryptophans (W) in the spike glycoprotein from human SARS-CoV; the frequency of random presence of WR is 0 (11/1,255 × 39/1,254 × 1,254 = 0.342), i.e. the type WR would not appear in the spike glycoprotein. However, WR appears once in reality, so the presence of type WR is randomly unpredictable. Naturally, its frequency is unpredictable too, and the difference between actual and predicted values is 1.

Randomly predictable absent type of amino acid pair. The frequency of random presence of RW is 0 (39/1,255 × 11/1,254 × 1,254 = 0.342), i.e. the type RW would not appear in the spike glycoprotein, which is true in the real situation. In this case, the absence of type RW, together with its frequency, is randomly predictable, and the difference between actual and predicted values is 0.

Randomly unpredictable absent type of amino acid pairs. There are 99 threonines (T) in the
spike glycoprotein; the frequency of random presence of RT is 3 (39/1,255 × 99/1,254 × 1,254 = 3.076), i.e. there would be three RTs in the spike glycoprotein. However, no RT is found; therefore, the absence of RT from the spike glycoprotein is randomly unpredictable. Naturally, its frequency is unpredictable too, and the difference between actual and predicted values is −3.

Statistics

With respect to actual and predicted values in a single protein, the statistical inference is carried out as follows. Generally, each of the 20 kinds of amino acids has a chance of 1/20 (p = 0.05) to repeat once, and a type of amino acid pair has a chance of 1/400 (p = 0.0025) to repeat once. In the case of the spike glycoprotein from human SARS-CoV, there are 99 Ts (the most abundant amino acid) and 11 Ws (the least abundant amino acid). If the first amino acid is T, then the chance of the second amino acid being T is 98/1,254 (p = 0.078 > 0.05); if the first amino acid is W, then the chance of the second amino acid being W is 10/1,254 (p = 0.008 < 0.01). Thus, the chance of the first TT is 99/1,255 × 98/1,254 (p = 0.0062 < 0.01) and the chance of the second TT is 97/1,253 × 96/1,252 (p = 0.0059 < 0.01). If we consider the least abundant amino acid, W, the chance of the first WW is 11/1,255 × 10/1,254 (p = 0.00007 < 0.001) and the chance of the second WW is 9/1,253 × 8/1,252 (p = 0.00005 < 0.001). Clearly, the probability is less than 0.05 if the difference between actual and predicted values is equal to or larger than 1.

With respect to the comparisons among proteins, the statistical inference is conducted as follows. All the data are examined by the Kolmogorov-Smirnov test to determine their distribution properties. For normal distributions, the data are presented as mean ± SD. For non-normal distributions, the data are presented as median with interquartile range. Outliers are detected according to Healy's method [24]. The one-way ANOVA and the Friedman ANOVA rank tests are used for parametric and non-parametric tests, respectively, followed by comparison tests. SigmaStat for Windows, SPSS Inc., 1992-
2003, is used to perform all the statistical tests, and p < 0.05 is considered statistically significant.

Results

After such calculations, the amino acid pairs in a protein are classified into randomly predictable and unpredictable portions. By comparing the percentages of predictable and unpredictable portions among different proteins, we can find which protein has a larger unpredictable portion than the others. Consequently, this protein is more sensitive to mutations according to our previous studies [25–32].

Figure 1 shows the predictable and unpredictable portions in coronavirus proteins. This figure can be read as follows. The length of each bar represents 100%, divided between the unpredictable and predictable sides, which are separated by a dotted line. For example, the unfilled bar in the spike glycoprotein group represents the absent types, which are composed of a 19.70% randomly predictable portion with an interquartile range from 16.67 to 26.89% (right panel) and an 80.30% randomly unpredictable portion with an interquartile range from 73.11 to 83.33% (left panel).

The statistical inference in Fig. 1, as well as in Fig. 2, is conducted by using the ANOVA test to detect whether or not there is a difference among different proteins in a panel, followed by a comparison test. For example, regarding the absent type in Fig. 1, we first use the Friedman ANOVA rank test to determine whether or not there is a difference among the different protein groups. Taking the three bars in Fig. 1 into account, the spike glycoproteins have a larger unpredictable portion than the others. These results suggest that the spike glycoprotein is more sensitive to mutations than other coronavirus proteins.

Although different proteins have different types of unpredictable absent amino acid pairs, some types are absent from all members of a group of proteins. For …

[Figure 1: horizontal bar chart. Horizontal axis: percent of unpredictable/predictable portions, running from 100 to 0 on the unpredictable side and from 0 to 100 on the predictable side. Groups: envelope proteins (n = 8), hemagglutinin-esterase precursors (n = 10), membrane glycoproteins (n = 17), nonstructural proteins (n = 38), nucleocapsid proteins (n = 28), spike glycoproteins (n = 27), other proteins (n = 30). Bars: absent type, present type, frequency of present type.]

Fig. 1 Predictable and unpredictable portions in coronavirus proteins. The data are presented as median with interquartile range. The following differences are marked in the figure: the predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in any other protein group at the p < 0.05 level, except for the hemagglutinin-esterase precursor group; the predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in the hemagglutinin-esterase precursor, membrane protein and nucleocapsid protein groups at the p < 0.05 level; the predictable and unpredictable portions in the spike glycoprotein group are statistically different from those in the hemagglutinin-esterase precursor and membrane protein groups at p …

[Figure 2: panels for actual value larger than predicted value and actual value smaller than predicted value.]

Fig. 2 Percent of unpredictable types and frequencies with respect to whether the actual value is larger or smaller than the predicted value in coronavirus proteins. The data are presented as mean ± SD. The following differences are marked in the figure: the percents of unpredictable types/frequencies in the spike glycoprotein group are statistically different from those in other protein groups at the p < 0.05 level; the percents of unpredictable types in the spike glycoprotein group are statistically different from those in any other protein group at the p < 0.05 level, except for the hemagglutinin-esterase precursor and nucleocapsid protein groups.

Table 1 Unpredictable absent amino acid pairs that disappear from a group of proteins
Column headings: Hemagglutinin-esterase precursor; Spike glycoprotein
Pairs listed: RA, RD, NQ, DR, CA, CS, QF, IK, LK, FA, FC, FQ, FP, VK, WI

… has the largest scale for the horizontal axis. Still, we can see which species is more sensitive to mutations in each figure. For instance, the human spike glycoprotein is more sensitive to mutation in Fig. 9.

Discussion

Without clearly identifying the source of SARS-CoV, its fast spreading process and its mutations, the battle with SARS is unlikely to be finished soon; therefore, sooner or later we
would expect to see new mutated forms of SARS-CoV. In such a case, the determination of vulnerable proteins in SARS-CoV is important and pressing.

The coronaviruses exhibit considerable serologic and sequence variation, with the most extreme variability being within S genes [3]. Variant spike glycoproteins [34] are now known to impact pathogenic outcome [15, 35–37].

This study provides three lines of evidence suggesting that the spike glycoprotein is different from the others: (1) the spike glycoprotein is more sensitive to mutations, which is the current state of the spike glycoprotein; (2) the spike glycoprotein has experienced more mutations in the past, which is the history of the spike glycoprotein; and (3) the spike glycoprotein has a bigger potential towards future mutations, which is the future of the spike glycoprotein.

With respect to the first line of evidence, the argument is that the randomly unpredictable portion is larger in spike glycoproteins than in others (Fig. 1). If we compare the unpredictable portion in spike glycoproteins with the proteins we have studied in the past (columns I and II in Table 2, similar to the left panel in Fig. 1), we find that the unpredictable portion of the present types is statistically larger in spike glycoproteins than in others, whereas the unpredictable portion of the present frequencies is statistically similar. This suggests that the spike glycoprotein is not only more sensitive to mutations than other coronavirus proteins, but also more sensitive than the proteins in Table 2.

With respect to the second line of evidence, we find that the spike glycoprotein has a larger percentage of unpredictable types and frequencies whose actual values are smaller than the predicted values in Fig. 2. Actually, 172 mutations have currently been documented in coronavirus proteins, of which 153 occur in spike glycoproteins. This supports our argument that the spike glycoprotein has undergone more mutations in the past. Moreover, if we look at the nine proteins that have been documented with more mutations (column IX in Table 2), we find that the percentage of unpredictable types in spike glycoproteins is statistically similar to that of the proteins in Table 2 (columns III and IV in Table 2, similar to the right panel in Fig. 2), but the difference regarding the percentage of unpredictable frequencies is statistically significant. This suggests that the intensity of mutations in spike glycoproteins is weaker than in the first nine proteins listed in Table 2.

[Figure 3: axis, difference between actual and predicted values (0 to 4 in each direction). Groups: envelope proteins (n = 8), hemagglutinin-esterase precursors (n = 10), membrane glycoproteins (n = 17), nonstructural proteins (n = 38), nucleocapsid proteins (n = 28), spike glycoproteins (n = 27), other proteins (n = 30). Bars: unpredictable type, unpredictable frequency.]

Fig. 3 Magnitude of difference between actual and predicted values in coronavirus proteins. The data are presented as mean ± SD. Markers in the figure indicate where the difference between actual and predicted values in the spike glycoprotein group is statistically different from that in any other protein group at the p < 0.05 level, and where it is statistically different from that in other protein groups at the p < 0.05 level except for the envelope protein group.

[Figure 4: axes, difference between actual and predicted values in envelope proteins; number of amino acid pairs (logarithmic scale). Species: human (n = 2), canine (n = 1), equine (n = 1), feline (n = 1), porcine (n = 2), turkey (n = 1).]

Fig. 4 Number of amino acid pairs in envelope proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.

With respect to the third line of evidence, we find that the difference between actual and predicted values in spike glycoproteins is larger than in others (Fig. 3). Comparison with the first nine proteins in Table 2 (columns V, VI, VII and VIII in Table 2, similar to Fig. 3) shows that the difference between actual and predicted values is statistically larger in spike glycoproteins regarding unpredictable types and is statistically smaller regarding unpredictable frequency. This suggests that the spike glycoprotein still has more potential for mutations than the first nine proteins in Table 2.

[Figure 5: axes, difference between actual and predicted values in hemagglutinin-esterase precursor proteins; number of amino acid pairs (logarithmic scale). Species: human (n = 1), bovine (n = 4), equine (n = 1), murine (n = 3), rat (n = 1).]

Fig. 5 Number of amino acid pairs in hemagglutinin-esterase precursor proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.

[Figure 6: axes, difference between actual and predicted values in membrane glycoproteins; number of amino acid pairs (logarithmic scale). Species: human (n = 3), bovine (n = 1), canine (n = 1), equine (n = 1), feline (n = 3), murine (n = 2), porcine (n = 2), rat (n = 3), turkey (n = 1).]

Fig. 6 Number of amino acid pairs in membrane glycoproteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.

[Figure 7: axes, difference between actual and predicted values in nonstructural proteins; number of amino acid pairs (logarithmic scale). Species: human (n = 2), bovine (n = 7), canine (n = 5), equine (n = 2), feline (n = 3), murine (n = 6), porcine (n = 6), rat (n = 3), turkey (n = 4).]

Fig. 7 Number of amino acid pairs in nonstructural proteins from different species with respect to the difference between their actual and predicted values. The data are presented as mean ± SD.

For the species susceptibility, the vulnerability of a species depends on the number of amino acid pairs with the largest difference between actual and predicted values. Figures 4, 5, 6, 7, 8, 9 and 10 may at least partly highlight the species susceptibility, for example, why so many mutations have been found in the human spike glycoproteins.

Although it is obvious that an individual protein is different from the other proteins of a genome, our results quantitatively and systematically determine the difference between the spike and other proteins by comparing their
predictable and unpredictabl
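
The pair classification described in Materials and methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `predicted_count`, `classify` and `analyze` are hypothetical, rounding the predicted frequency to the nearest integer is inferred from the worked examples (2.983 becomes 3, 5.555 becomes 6, 0.342 becomes 0), and counting overlapping adjacent pairs is inferred from the 1,254 pair positions of a 1,255-residue protein.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def predicted_count(n_first: int, n_second: int, length: int) -> float:
    """Expected number of an ordered pair under random permutation.

    Mirrors the arithmetic in the text, e.g. 39/1255 * 96/1254 * 1254 for RS.
    For a pair of identical residues, pass n_second = n_first - 1 (84 and 83 for AA).
    Requires length >= 2.
    """
    return (n_first / length) * (n_second / (length - 1)) * (length - 1)


def classify(predicted: int, actual: int) -> str:
    """Assign a pair type to one of the five categories named in the text."""
    if predicted == 0:
        return "predictable absent" if actual == 0 else "unpredictable present"
    if actual == 0:
        return "unpredictable absent"
    if actual == predicted:
        return "predictable present, predictable frequency"
    return "predictable present, unpredictable frequency"


def analyze(sequence: str) -> dict:
    """Return {pair: (predicted, actual, category)} for all 400 pair types."""
    length = len(sequence)
    composition = Counter(sequence)
    # Assumption: pairs are overlapping adjacent residues (length - 1 positions).
    observed = Counter(sequence[i:i + 2] for i in range(length - 1))
    result = {}
    for a, b in product(AMINO_ACIDS, repeat=2):
        n_second = composition[b] - 1 if a == b else composition[b]
        n_second = max(n_second, 0)  # residue absent from the sequence
        # Assumption: predicted frequency is rounded to the nearest integer.
        predicted = round(predicted_count(composition[a], n_second, length))
        actual = observed[a + b]
        result[a + b] = (predicted, actual, classify(predicted, actual))
    return result
```

With the residue counts quoted for the human SARS-CoV spike glycoprotein, `predicted_count(39, 96, 1255)` rounds to 3 for RS and `predicted_count(39, 99, 1255)` rounds to 3 for RT, so `classify(3, 3)` and `classify(3, 0)` give the predictable-frequency and unpredictable-absent categories, respectively. The difference between actual and predicted values is then simply `actual - predicted` for each pair type.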