资源描述
Click to edit Master title style,Click to edit Master text styles,Second level,Third level,Fourth level,Fifth level,*,HRP223 2005,HRP 223 2005,Topic 9-,Working with Multiple Tables,Copyright 1999-2005 Leland Stanford Junior University.All rights reserved.,Warning:This presentation is protected by copyright law and international treaties.Unauthorized reproduction of this presentation,or any portion of it,may result in severe civil and criminal penalties and will be prosecuted to maximum extent possible under the law.,Combining Data,(1),When you have data sets that need to be combined you can do six different things:,concatenating,interleaving,one to one table reading,one to one table matching,matched merging,Updating,These things can be done with data steps or with,proc,sql,.I use and recommend SQL but I will show you the SAS way so you can read other peoples code.,Combining Data,(2),Concatenating,*typical method;,data,all_dx,;,set,dxyr1 dxyr2;,run,;,*faster method;,proc,append,base,=dxyr1,data,=dxyr2;,run,;,common variable names,dataset names,Combining Data,(3),Interleaving,data,all_dx,;,set,dxyr1 dxyr2;,by,dxyear,;,run,;,Combining Data,(4),One-to-one Read or Merge,data,datetype,;,set,dxyr1;,set,cancer;,run,;,data,datetype,;,merge,dxyr1 cancer;,run,;,Combining Data,(5),Match Merge,data,new;,merge,pers_dat,can_dat,;,by,id;,run,;,Combining Data,(6),Updating,data,new;,update,birth_dx,theup,;,by,id;,run,;,Database Reality,Since the 1980s,the dominant model for databases has been“relational.”Relational databases use multiple tables to store data.Each of these tables,which look just like spreadsheet pages,stores related information.The primary goals in laying out a relational database are to remove redundancy and extra variables.This process is called,normalization,.,Normalization,In a normalized database there is always a variable(or set of variables)that can be used to link the tables,but other than that,the same information is not repeated in a table or across tables.,This buys you lots of space and speed!,A Normalized Database,In the Has_CA library:,ID is a key joining the master and cancer tables.,ID and,dxDate,make a joint key.,ID is a key joining the master and dude tables.,hName,is a key relating the treatment and hospital tables.,Getting Data Out of A RDB!,The dominant language for manipulating data in relational databases is called SQL,which is short for Structured Query Language.,Pronounced sequel or,Ess-Que-Ell,SAS has a well implemented version of SQL and I use it to combine my tables.,You can directly access other databases like Oracle,Sybase,MS Access and MS FoxPro.,PROC SQL,(1),The syntax is a bit different than the rest of SAS because it is SQL not SAS!,proc,sql,;,select,VARIABLES,from,LOCATION,where,TEST;,quit,;,A SAS step has lots of statements,each beginning with a key word and ending in a semicolon.A SQL has lots of,clauses,each beginning with a keyword but there is only one semicolon at the end of a long statement.,PROC SQL,(2),The syntax is confusing because it uses periods to separate libraries from tables and tables from variables.,While you,never write this,here is the underlying way to tell,proc,sql,how to reference a variable:,libname.table.variable,Another difference is that,proc SQL,does not use the keyword run.Rather,your SQL gets processed with every semicolon.SQL keeps running until you submit a,quit,statement.,Again the stylized representation,proc,sql,;,select,VARIABLES,from,LOCATION,where,TEST;,quit,;,Proc SQL Naming,Must be specified as,libname.table,Can be specified using,table.variable,or just,variable,proc,sql,;,select,dude.name,from,has_ca.dude,where,dude.id 2001;,quit,;,PROC SQL,(3),This is nice but you already know how to do that with a data step.Say you needed the names and cancer type of everyone who had a dreadful response to radiation.This requires you to link together information from two tables.This process is called,joining,two tables.,proc,sql,;,create,table dreadful,as,select,dude.name,response.icd9code,from,has_CA.dude,has_CA.response,where,response.res,=,10,and,dude.id=response.id;,quit,;,PROC SQL,(4),Here is the same thing rewritten with the short names:,proc,sql,;,create,table dreadful,as,select,name,icd9code,from,has_CA.dude,has_CA.response,where,res,=,10,and,dudes.id=response.id;,quit,;,PROC SQL,(5),You can abbreviate the names of the tables you are using:,proc,sql,;,create,table,dreadful,as,select,name,icd9code,from,has_CA.dude,as,d,has_CA.response,as,r,where,res,=,10,and,d.id=r.id;,quit,;,PROC SQL,(6),You can specify all variables in a table with*,proc,sql,;,select,dude.*,from,has_CA.dude,where,id,2001,;,quit,;,You can get unique values with,distinct,.,proc,sql,;,select,distinct,id,from,has_CA.dude,where,id,2001,;,quit,;,PROC SQL,(7),You can also get ordered data without using a sort procedure:,proc,sql,;,select,distinct,id,from,dude,where,id,10,;,quit,;,SQL to Process Groups,You can tell SQL to process groups of records for you.,This code tells you how
展开阅读全文