Medical Statistics (full English class),Ji-Qian Fang,School of Public Health,Sun Yat-Sen University

Chapter 10,Statistical Analysis of Enumeration Data

10.1 Statistical Description for enumeration data

Absolute measure: the numbers counted for each category (frequencies).
The absolute measure can hardly be used for comparison between different populations.

1. Relative measure
Three kinds of relative measures: frequency (proportion), intensity (rate), and ratio.

(1) Relative frequency
Note: The Chinese textbook is wrong! It is not a "rate"; it is a proportion or frequency.

Example 10-1 (P.304, revised)
Question: In which grade is myopia the most serious problem?

Prevalence rates describe: P(Myopia | First grade), P(Myopia | Second grade), P(Myopia | Third grade).
The constitution among myopia cases describes: P(First grade | Myopia), P(Second grade | Myopia), P(Third grade | Myopia).

Which grade has the most serious condition of myopia?
Answer:
- P(Myopia | Third grade) is the maximum: the third grade has the highest prevalence of myopia.
- P(Second grade | Myopia) is the maximum: among the myopia cases, the absolute number of second-grade students is the highest.
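The two kinds of conditional proportion can be checked numerically. The sketch below assumes the per-grade counts 67/442, 68/428 and 56/305 (myopia cases / students) that appear in the pooled-estimate example later in this chapter, taken in grade order; the variable names are illustrative only.

```python
# Counts assumed from the pooled-estimate example later in the chapter
# (myopia cases and students examined, grades 1 to 3 in order).
cases = {"First": 67, "Second": 68, "Third": 56}
students = {"First": 442, "Second": 428, "Third": 305}

total_cases = sum(cases.values())

# Prevalence: P(Myopia | grade); the denominator is the students in that grade.
prevalence = {g: cases[g] / students[g] for g in cases}

# Constitution: P(grade | Myopia); the denominator is all myopia cases.
constitution = {g: cases[g] / total_cases for g in cases}

for g in cases:
    print(f"{g:6s} prevalence = {prevalence[g]:.2%}, share of cases = {constitution[g]:.2%}")

worst_by_prevalence = max(prevalence, key=prevalence.get)
largest_share = max(constitution, key=constitution.get)
print(worst_by_prevalence, largest_share)
```

The two maxima fall in different grades because the denominators differ: one is the grade size, the other is the total number of cases.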

(2) Intensity
Example: A smoking population was followed up for 562,833 person-years, and 346 lung cancer cases were found.
The incidence rate of lung cancer in the smoking population is:
Incidence rate = 346/562833 = 61.47 per 100,000 person-years
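The arithmetic of this example can be reproduced in a few lines; this is only a sketch of the calculation, with illustrative variable names.

```python
cases = 346                 # lung cancer cases observed during follow-up
person_years = 562_833      # total follow-up time in person-years

incidence_rate = cases / person_years      # events per person-year
per_100k = incidence_rate * 100_000        # rescaled to per 100,000 person-years
print(f"{per_100k:.2f} per 100,000 person-years")
```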

Example: The mortality rate of liver cancer in Guangzhou is 32 per 100,000 per year.

In general:
Denominator: sum of the person-years observed in the period
Numerator: total number of events occurring in the period
Unit: person/person-year, or 1/year
Nature: the relative frequency per unit of time

(3) Ratio
A ratio is a number divided by another related number.
Examples:
Sex ratio of students in this class: No. of males : No. of females = 52%
Coefficient of variation: CV = SD/mean
Ratio of time spent per clinic visit: Large hospital : Community health station = 81.9 min : 18.6 min = 4.40

2. Caution in use of relative measures
a. The denominator should be big enough! Otherwise the absolute measure should be used.
Example: Out of 5 cases, 3 were cured. 60%?
b. Pay attention to the population that the relative measure comes from.
Mistake in the textbook (P.305): "Distinguish between constitutes and proportion"!?
We should say: "Distinguish between prevalence rate and constitute among patients".
Prevalence rate: the population is the students in the same grade.
Constitute: the population is all the patients.

The above two frequency distributions reflect two populations of all patients; to describe the prevalence rate, one has to look at the general population.

c. Pooled estimate of the frequency
Pooled estimate = Σ numerators / Σ denominators
Example: The prevalence of myopia among the 3 grades is not the simple average (15.16 + 15.89 + 18.37)/3, but
(67 + 68 + 56)/(442 + 428 + 305) = 191/1175 = 16.26%

d. Comparability between frequencies or between frequency distributions
Notice the balance of other conditions.

e. If the distributions of other variables are different, "standardization" is needed to improve the comparability.

f. To compare two samples, a hypothesis test is needed. (See Chi-square test)

The following will emphasize the above two points: standardization and hypothesis test.

3. Standardization for crude frequency or crude intensity
Table 10-3 Incidence rates of infectious diseases, children of two cities
Crude incidence rate of city A = 28.96; crude incidence rate of city B = 35.03. Strange!?
They are not comparable, because the constitutes are quite different.

1. Sampling error of frequency
Example: Suppose the death rate is 0.2 if the rats are fed with a kind of poison.
What will happen when we do the experiment on n = 1, 2, 3 or 4 rat(s)?
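One way to answer the rat question is to list every possible outcome and its probability. The sketch below assumes the rats die independently with the same probability 0.2, so the number of deaths is binomial; the function name is hypothetical.

```python
from math import comb

def death_count_distribution(n, pi=0.2):
    """Binomial probabilities P(X = x) for x = 0..n deaths among n rats."""
    return [comb(n, x) * pi**x * (1 - pi)**(n - x) for x in range(n + 1)]

for n in (1, 2, 3, 4):
    probs = death_count_distribution(n)
    # The observed frequency p = x/n can only take the values 0, 1/n, ..., 1.
    line = ", ".join(f"p={x / n:.2f}: {pr:.4f}" for x, pr in enumerate(probs))
    print(f"n={n}: {line}")
```

With such small n, the observed frequency can never equal 0.2 exactly (the closest possible value is 0.25 when n = 4), which illustrates the sampling error of a frequency.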

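In general:
Suppose the population proportion is $\pi$ and the sample size is $n$.
The frequency $p$ is a random variable with standard error $\sigma_p = \sqrt{\pi(1-\pi)/n}$.
When $\pi$ is unknown and $n$ is big enough, $\sigma_p$ is approximately equal to $s_p = \sqrt{p(1-p)/n}$.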

Example 10-5 HBV Surface antigen. 200 people were tested, 7 positive.
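For Example 10-5, the observed frequency and its estimated standard error follow from $s_p = \sqrt{p(1-p)/n}$. A minimal sketch of the calculation, with illustrative variable names:

```python
from math import sqrt

n = 200          # people tested
x = 7            # positive results

p = x / n                        # observed frequency
s_p = sqrt(p * (1 - p) / n)      # estimated standard error of p
print(f"p = {p:.3f}, s_p = {s_p:.4f}")
```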

If the sample size $n$ is big enough and the observed frequency is $p$, then we have approximately
$$\frac{p-\pi}{s_p} \sim N(0,1), \qquad s_p = \sqrt{p(1-p)/n}$$

2. Confidence interval of probability
If the sample size $n$ is big enough and the observed frequency is $p$, then with $s_p = \sqrt{p(1-p)/n}$:
95% confidence interval: $p \pm 1.96\, s_p$
99% confidence interval: $p \pm 2.58\, s_p$
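Applied to the HBV surface antigen data of Example 10-5 (7 positives among 200 people), the normal-approximation intervals can be sketched as follows. The helper name `proportion_ci` is hypothetical, and the multipliers 1.96 and 2.58 are the usual two-sided normal quantiles.

```python
from math import sqrt

def proportion_ci(x, n, z):
    """Normal-approximation interval p ± z * s_p for a proportion."""
    p = x / n
    s_p = sqrt(p * (1 - p) / n)
    return p - z * s_p, p + z * s_p

ci95 = proportion_ci(7, 200, 1.96)
ci99 = proportion_ci(7, 200, 2.58)
print(f"95% CI: {ci95[0]:.4f} to {ci95[1]:.4f}")
print(f"99% CI: {ci99[0]:.4f} to {ci99[1]:.4f}")
```

The lower limit of the 99% interval is close to zero, a reminder that the normal approximation becomes rough when the number of positives is small.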

Example 10-5 HBV surface antigen. 200 people were tested, 7 positive.

3. The hypothesis testing of proportion (u test)
1. Comparison of sample proportion and population proportion
Example 10.6 Cerebral infarction

              Cases   Cure rate
New method    98      50%
Routine               30%

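Statistic $u$:
$$u = \frac{p-\pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$$
Decision rule: if $|u| \ge u_\alpha$, then reject $H_0$; otherwise, there is no reason to reject $H_0$ (accept $H_0$).
Here $u = (0.50-0.30)/\sqrt{0.30 \times 0.70/98} \approx 4.32$, which exceeds the critical value, so reject $H_0$.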
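The test for Example 10.6 can be sketched in code. The slide does not show whether the test is one-sided or two-sided; the sketch assumes a two-sided P value, and both function names are hypothetical.

```python
from math import sqrt, erf

def one_sample_u(p, pi0, n):
    """u statistic for H0: pi = pi0, with the standard error computed under H0."""
    return (p - pi0) / sqrt(pi0 * (1 - pi0) / n)

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

u = one_sample_u(p=0.50, pi0=0.30, n=98)
p_two_sided = 2 * (1 - normal_cdf(abs(u)))   # assumption: two-sided alternative
print(f"u = {u:.2f}, two-sided P = {p_two_sided:.2g}")
```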

2. Comparison of two sample proportions
Example 10.7 Carrier rate of Hepatitis B
City: 522 people were tested, 24 carriers, 4.60% (population carrier rate: $\pi_1$)
Countryside: 478 people were tested, 33 carriers, 6.90% (population carrier rate: $\pi_2$)

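Pooled estimate:
$$p_c = \frac{X_1 + X_2}{n_1 + n_2}$$
Standard error of $p_1 - p_2$:
$$s_{p_1-p_2} = \sqrt{p_c(1-p_c)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$$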

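Statistic $u$:
$$u = \frac{p_1 - p_2}{\sqrt{p_c(1-p_c)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$
Decision rule: if $|u| \ge u_\alpha$, then reject $H_0$; otherwise, there is no reason to reject $H_0$ (accept $H_0$).
With 24/522 and 33/478 carriers, $|u| \approx 1.57 < 1.96$, so $H_0$ is not rejected.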
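The same steps for Example 10.7, using the carrier counts 24/522 and 33/478, can be sketched as below. The function name is hypothetical, and 1.96 is the two-sided critical value at the 0.05 level.

```python
from math import sqrt

def two_sample_u(x1, n1, x2, n2):
    """u statistic for H0: pi1 = pi2 using the pooled estimate p_c."""
    p1, p2 = x1 / n1, x2 / n2
    p_c = (x1 + x2) / (n1 + n2)                       # pooled estimate
    se = sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))    # standard error of p1 - p2
    return (p1 - p2) / se

u = two_sample_u(24, 522, 33, 478)
reject = abs(u) >= 1.96   # two-sided test at alpha = 0.05
print(f"u = {u:.2f}, reject H0: {reject}")
```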

Summary
The parameter estimation and hypothesis testing of proportion are based on the normal approximation (when the sample size is big enough).
How big is enough? By experience, $n\pi \ge 5$ and $n(1-\pi) \ge 5$.
If the sample size is not big, the $u$ test can't be used, and there is no $t$-test for proportion. (See a more detailed textbook.)

10.3 Chi-square test

The $u$ test can only be used for comparing $\pi$ with a given $\pi_0$ (one sample), or comparing $\pi_1$ with $\pi_2$ (two samples).
If we need to compare more than two samples, the Chi-square test is widely used.

1. Basic idea of the $\chi^2$ test
Given a set of observed frequencies $A_1, A_2, A_3, \ldots$, we test whether the data follow a certain theory.
If the theory is true, then we will have a set of theoretical frequencies $T_1, T_2, T_3, \ldots$
Compare $A_1, A_2, A_3, \ldots$ with $T_1, T_2, T_3, \ldots$: if they are quite different, then the theory might not be true; otherwise, the theory is acceptable.

Example 10-8 Acute lower respiratory infection (theoretical frequencies in parentheses)

Treatment   Effect            Non-effect       Total        Effect rate
Drug A      68 (64.82)  a     6 (9.18)   b     74 (a+b)     91.89%
Drug B      52 (55.18)  c     11 (7.82)  d     63 (c+d)     82.54%
Total       120 (a+c)         17 (b+d)         137          87.59%

(2) Chi-square test for a 2×2 table
$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \ne \pi_2$, $\alpha = 0.05$

To calculate the theoretical frequencies: if $H_0$ is true, $\pi_1 = \pi_2 \approx 120/137$, so
$T_{11} = 74 \times 120/137 = 64.82$, $T_{21} = 63 \times 120/137 = 55.18$
$T_{12} = 74 \times 17/137 = 9.18$, $T_{22} = 63 \times 17/137 = 7.82$

To compare A and T by a statistic:
$$\chi^2 = \sum \frac{(A-T)^2}{T}$$
If $H_0$ is true, $\chi^2$ follows a chi-square distribution with $\nu$ = (row-1)(column-1).
If the $\chi^2$ value is big enough, we doubt $H_0$ and then reject $H_0$!

For Example 10-8: $\nu$ = (row-1)(column-1) = (2-1)(2-1) = 1, $\chi^2_{0.05(1)} = 3.84$.
Now $\chi^2 = 2.734 < 3.84$, $P > 0.05$, so $H_0$ is not rejected.
We have no reason to say the effects of the two treatments are different.

For a 2×2 table, there is a specific formula for the chi-square calculation:
$$\chi^2 = \frac{(ad-bc)^2 N}{(a+b)(c+d)(a+c)(b+d)}$$

Large sample is required:
(1) $N \ge 40$ and $T_i \ge 5$
(2) If $N < 40$ or $T_i < 1$, the $\chi^2$ test is not applicable.
(3) If $N \ge 40$ and $1 \le T_i < 5$ … 40); otherwise, an adjustment is needed.

If the $\chi^2$ value is too big, then reject $H_0$.
$H_0: \pi_1 = \pi_2$, $H_1: \pi_1 \ne \pi_2$, $\alpha = 0.05$
Example 10-10: $\nu = 1$, $\chi^2 = 4.92 > 3.84$, $P < 0.05$, so $H_0$ is rejected.
Conclusion: There is a significant difference in positive rates between the two diagnosis methods.

… "$H_0$ is rejected" only means that there is a difference among some groups.
If the above requirements are violated, what should we do?
(1) Increase the sample size.
(2) Re-organize the categories: pool some categories, or cancel some categories.

Think: In fact, it is not appropriate to use a Chi-square test for Example 10-10 in the textbook. Why?
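Returning to Example 10-8 (Drug A: 68 effective, 6 not; Drug B: 52 effective, 11 not), the chi-square calculation can be sketched in code. The function names are hypothetical. The sketch computes the statistic both from the theoretical frequencies and from the usual 2×2 shortcut $(ad-bc)^2 N/((a+b)(c+d)(a+c)(b+d))$, and obtains the P value from the fact that a chi-square variable with 1 degree of freedom is a squared standard normal variable. Without rounding the theoretical frequencies, the statistic comes out at about 2.74 rather than the 2.734 obtained from two-decimal values; the conclusion is the same.

```python
from math import sqrt, erfc

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]] via theoretical frequencies."""
    observed = [[a, b], [c, d]]
    row = [a + b, c + d]
    col = [a + c, b + d]
    n = a + b + c + d
    # T_ij = row total * column total / grand total
    expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    return chi2, expected

def chi_square_2x2_shortcut(a, b, c, d):
    """Same statistic from the closed-form 2x2 formula."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

chi2, expected = chi_square_2x2(68, 6, 52, 11)
shortcut = chi_square_2x2_shortcut(68, 6, 52, 11)
# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, shortcut = {shortcut:.3f}, P = {p_value:.3f}")
```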