Analysis of Overdispersed Data in SAS
Jessica Harwood, M.S.
Statistician, Center for Community Health
JHarwood@mednet.ucla.edu
CHIPTS Methods Seminar, 1/8/2013

Outline
- Overdispersion
  - Definition, background, and causes of overdispersion
  - Consequences of ignoring overdispersion
- Accounting for overdispersion in regression analysis in SAS
  - For count data
  - For binary data
- Concluding remarks

Overdispersed data
- Also known as extra variation
- Arises when count or binary data exhibit variances larger than those assumed under the Poisson or binomial distributions


Count data
- Definition: non-negative integer values {0, 1, 2, 3, ...} arising from counting rather than ranking
- Example: the number of days a student is absent in one school year
- Commonly analyzed using the Poisson distribution, e.g., Poisson regression

Poisson distribution
- Poisson: number of occurrences of a random event in an interval of time or space
- Poisson regression yields the IRR (incidence rate ratio, interpreted as a relative risk)
- Natural model for count data
- Disadvantage: strong assumption that variance = mean
- Overdispersion: variance > mean

Binary data
- Binary: 0 or 1
  - Example: ever tested for HIV (1) or not (0)
- Grouped binary
  - Example: proportion tested for HIV
- Commonly analyzed using the binomial distribution, e.g., logistic regression
  - Binary response: tested_HIV
  - Grouped response: num_tested_HIV/num_subjects
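The mean-variance assumption on the Poisson slide above can be written out explicitly. This is a minimal sketch in standard notation; the symbols Y, mu, x, beta_0, and beta_1 are introduced here for illustration and do not appear in the slides.

```latex
% Poisson model for a count Y with mean \mu (notation assumed for illustration)
\[
  P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \quad y = 0, 1, 2, \dots,
  \qquad
  \mathrm{E}(Y) = \mathrm{Var}(Y) = \mu .
\]
% Poisson regression with a log link and a single predictor x
\[
  \log \mu = \beta_0 + \beta_1 x,
  \qquad
  \mathrm{IRR} = e^{\beta_1} .
\]
```

Overdispersion in this notation means that the observed Var(Y) exceeds mu, so the equality in the first line fails.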

Binomial distribution
- Binomial: the number of successes in a sequence of random processes, each of which results in one of two mutually exclusive outcomes
- Overdispersion: variance larger than that assumed under the binomial distribution

Causes of overdispersion
- Observed data rarely follow statistical distributions exactly
- The variance of count variables tends to increase with the size of the counts
- Correlated (e.g., clustered) data
- Heterogeneity among observations
- Large number of 0s

Consequences of ignoring overdispersion

Checking for overdispersion in SAS: count data
- PROC MEANS: variance > mean
- PROC GENMOD with dist=negbin: dispersion parameter significant

Example: count data
- Differences in baseline depression between intervention conditions in an RCT
- Independent variable: INTV, intervention condition
  - 1 = randomized to intervention condition
  - 0 = randomized to control condition
- Dependent variable: EPDS, Edinburgh Postnatal Depression Scale; weighted count of depressive symptoms felt in the past week

[Figure: histogram of EPDS. x-axis: EPDS score (range 0-30); y-axis: percent]

Example: count data, checking mean and variance for overdispersion

SAS code:
    * Mean and variance;
    proc means data=base mean var;
      var EPDS;
    run;

    * Conditional mean and variance;
    proc means data=base mean var;
      var EPDS;
      class INTV;
    run;

Example: count data, SAS regression analysis
    * Poisson regression: ignore overdispersion;
    proc genmod data=base;
      model EPDS = INTV / dist=poisson;
    run;

    * Negative binomial regression: account for overdispersion;
    proc genmod data=base;
      model EPDS = INTV / dist=negbin;
    run;

Example: count data, checking for overdispersion with negative binomial regression in PROC GENMOD
- Dispersion parameter significantly different from zero (see 95% CI): indicates significant over- (> 0) or under- (< 0) dispersion
- Use negative binomial rather than Poisson

Example: count data, results
- P-values quite different
- Different conclusions regarding similarity of intervention conditions at baseline

Accounting for overdispersion in SAS: count data
- Negative binomial
- Variance-adjustment models
  - Quasi-likelihood estimation
  - Empirical (aka robust, sandwich) variance estimation
- Models for correlated data
- Zero-inflated models

Negative binomial (NB)
- Negative binomial distribution: variance is larger than the mean, which makes it an excellent model for overdispersed count data
- Negative binomial regression yields a relative risk
- Disadvantage: estimating an extra parameter (dispersion)
- PROC GENMOD, PROC COUNTREG

SAS code: negative binomial regression
    proc genmod data=base;
      model EPDS = INTV / dist=negbin;
    run;

    proc countreg data=base;
      model EPDS = INTV / dist=negbin;
    run;

SAS output: negative binomial regression
- Compare to Poisson regression: INTV: Estimate=-0.0974; SE=0.0218; P<.0001
- NB: variance > mean
- Variance = mean + k × mean^2
- SAS estimate of dispersion parameter k: Dispersion (GENMOD), _Alpha (COUNTREG)
- If k is significantly different from zero, use NB rather than Poisson
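The NB variance relationship stated above can be made concrete with a small worked calculation. The values mu = 6 and k = 0.5 are hypothetical numbers chosen only for illustration; they are not estimates from the EPDS data.

```latex
% Negative binomial variance with dispersion parameter k (as on the slide)
\[
  \mathrm{Var}(Y) = \mu + k\,\mu^{2}
\]
% As k approaches 0 the extra term vanishes and the Poisson variance is recovered
\[
  k \to 0 \;\Longrightarrow\; \mathrm{Var}(Y) \to \mu
\]
% Hypothetical illustration: \mu = 6, k = 0.5
\[
  \mathrm{Var}(Y) = 6 + 0.5 \times 6^{2} = 24
  \quad \text{(compared with 6 under the Poisson assumption)}
\]
```

Because the extra term grows with the square of the mean, the same k implies more excess variation for larger counts.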

Count data: quasi-likelihood estimation (QLE)
- QLE allows for adjusting the variance without specifying the distribution exactly
- Variances inflated by:
  - Deviance/DOF (GENMOD: dscale)
  - Pearson's chi-square/DOF (GENMOD: pscale; GLIMMIX: random _residual_)
- Applies to Poisson and negative binomial regression (and logistic regression)

QLE example: SAS code
Use dscale as the norm!
    * Poisson regression: no adjustment for overdispersion;
    proc genmod data=base;
      model EPDS = INTV / dist=poisson;
    run;

    * Poisson regression: adjust for overdispersion using DSCALE;
    proc genmod data=base;
      model EPDS = INTV / dist=poisson dscale;
    run;

QLE: standard errors (SE) corrected
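The variance inflation described above can be sketched as a formula. This is the standard quasi-likelihood scaling for a Poisson model, written with notation assumed here (n observations, p estimated regression parameters, D the deviance, X^2 the Pearson chi-square); it is an illustration, not output from the seminar's analysis.

```latex
% Estimated scale: deviance-based (dscale) and Pearson-based (pscale)
\[
  \hat{\phi}_{D} = \frac{D}{n - p},
  \qquad
  \hat{\phi}_{P} = \frac{X^{2}}{n - p},
  \qquad
  X^{2} = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^{2}}{\hat{\mu}_i}
\]
% Coefficients are unchanged; model-based standard errors are rescaled
\[
  \mathrm{SE}_{\text{adjusted}}(\hat{\beta})
  = \sqrt{\hat{\phi}}\;\mathrm{SE}_{\text{model}}(\hat{\beta})
\]
```

As a hypothetical example, an estimated scale of 4 would double every standard error while leaving the coefficient estimates the same.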

QLE: Poisson vs. NB

QLE with PSCALE: SAS code
    proc genmod data=base;
      model EPDS = INTV / dist=poisson pscale;
    run;

    proc glimmix data=base;
      model EPDS = INTV / dist=poisson s;
      random _residual_;
    run;

Count data, QLE: in sum
- Use as the norm, in Poisson or NB
- DSCALE better than PSCALE, especially for low counts

Count data: empirical variance estimation
- Empirical (or robust, or sandwich) variance estimation accounts for extra variation by using both empirical-based and model-based estimates in variance estimation
- Applies to Poisson and NB regression (and logistic regression)
- GENMOD: REPEATED statement
- GLIMMIX: EMPIRICAL option

Empirical variance estimation: GENMOD REPEATED
    * PID = participant ID, 1 observation per PID;
    proc genmod data=base;
      class PID;
      model EPDS = INTV / dist=poisson;
      repeated subject=PID;
    run;
Compare to unadjusted Poisson regression: INTV: Estimate=-0.0974; SE=0.0218; P<.0001

Empirical variance estimation: GLIMMIX EMPIRICAL
    * PID = participant ID, 1 observation per PID. MBN is a small-sample bias correction;
    proc glimmix data=base empirical=mbn;
      class PID;
      model EPDS = INTV / dist=poisson s;
      random _residual_ / subject=PID;
    run;

Thank you very much!
Questions
JHarwood@mednet.ucla.edu
