Semiparametric Censored Regression Models
Kenneth Y. Chay and James L. Powell
A regression model is censored when the recorded data on the dependent variable cut off outside a certain range with multiple observations at the endpoints of that range. When the data are censored, variation in the observed dependent variable will understate the effect of the regressors on the “true” dependent variable. As a result, standard ordinary least squares regression using censored data will typically result in coefficient estimates that are biased toward zero.
Traditional statistical analysis uses maximum likelihood or related procedures to deal with the problem of censored data. However, the validity of such methods requires correct specification of the error distribution, which can be problematic in practice. In the past two decades, a number of semiparametric alternatives for dealing with censored data have been proposed. In a semiparametric approach, part of the functional form of the model (usually the regression function) is parametrically specified by the researcher based upon plausible assumptions, while the rest of the model is not parameterized.1 While the theoretical literature has produced several semiparametric estimators for the censored data model, published applications of these estimators to empirical problems in economics have lagged far behind.
This paper reviews the intuition and computation of a handful of semiparametric estimators proposed for the censored regression model. The various estimators are used to examine changes in black-white earnings inequality during the 1960s, around the time of the passage of the Civil Rights Act of 1964, based on longitudinal Social Security Administration (SSA) earnings records. These earnings records are censored at the taxable maximum; that is, anyone earning more than the maximum that was taxable under Social Security is recorded as having earned at the maximum. Thus, above the maximum, the data on earnings do not accurately reflect actual earnings. Ordinary least squares analysis of these data implies little convergence in the earnings of black and white workers during the 1960s. On the other hand, the estimates from the semiparametric models that account for censoring suggest that significant black-white earnings convergence did occur after 1964. Comparisons of the results from parametric and semiparametric procedures help pinpoint sources of misspecification in the parametric approach.
1 In this issue, the paper by DiNardo and Tobias offers further discussion of nonparametric and semiparametric analysis.
† Kenneth Y. Chay is Assistant Professor of Economics and James L. Powell is Professor of Economics, University of California, Berkeley, California.
Journal of Economic Perspectives
Censored Regression Models and Estimators
The Social Security Administration data set that we analyze suffers from the simplest form of data censoring, interval censoring, for which the values of the “true” dependent variable, $y^*$, are observed only if they fall within some known, often one-sided, interval $[a, b]$. Otherwise, the closest endpoint of the interval is observed instead of $y^*$. Tobin (1958) used this model to analyze consumer expenditures on automobiles, with $a = 0$ and $b = \infty$, and economists generally refer to regression models with nonnegativity constraints as Tobit models. Other typical applications of these censored regression models are to right-censored data, where $a = -\infty$ and $b$ represents a maximum recordable value for the dependent variable. Such models arise for top-coded data, where sufficiently large values of the true variable $y^*$ are recorded as “at least equal to $b$.” In our primary empirical application, the dependent variable, the logarithm of annual earnings, is “top-coded,” or censored from above, with $b$ equal to the logarithm of the maximum annual earnings subject to Social Security taxes in a given year.
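Algebraically, the model for the observed dependent variable $y$ under interval censoring is
$$
y = \begin{cases} a & \text{if } x'\beta + \varepsilon < a, \\ b & \text{if } x'\beta + \varepsilon > b, \\ x'\beta + \varepsilon & \text{otherwise,} \end{cases}
$$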
where $y$ is the observed value of the dependent variable, $x$ is a vector of observed explanatory variables, $\beta$ is a vector of unknown regression coefficients to be estimated, $\varepsilon$ is an unobserved error term, and $a$ and $b$ are the censoring interval endpoints. While the true dependent variable $y^*$ satisfies a standard linear regression model, the observed variable $y$ clearly does not when $y^*$ lies outside $[a, b]$. Because $y$ does not vary with the regressors $x$ when it is censored (unlike the true variable $y^*$), standard least squares regression will underestimate the magnitude of the regression slope coefficients.
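To make the interval-censoring map and the attenuation claim concrete, here is a minimal sketch on simulated data. The design (one regressor, $y^* = 1 + 2x + \varepsilon$ with standard normal errors, top-coding at $b = 3$ so that $a = -\infty$) and the helper names `interval_censor` and `ols_slope` are illustrative assumptions, not part of the study.
```python
import math
import random


def interval_censor(y_star, a=-math.inf, b=math.inf):
    """Map a latent y* to the observed y under interval censoring on [a, b]."""
    if y_star < a:
        return a  # piles up at the lower endpoint
    if y_star > b:
        return b  # piles up at the upper endpoint (top-coding)
    return y_star


def ols_slope(xs, ys):
    """Least squares slope of y on x, with an intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical design: y* = 1 + 2x + eps, eps ~ N(0, 1), top-coded at b = 3.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(2000)]
y_star = [1.0 + 2.0 * x + rng.gauss(0.0, 1.0) for x in xs]
y_obs = [interval_censor(y, b=3.0) for y in y_star]

slope_latent = ols_slope(xs, y_star)    # estimates the true slope of 2
slope_censored = ols_slope(xs, y_obs)   # attenuated toward zero
print(f"latent: {slope_latent:.2f}  top-coded: {slope_censored:.2f}")
```
The top-coded slope falls short because, for large $x$, an increasing share of observations sit at $b$ and no longer respond to $x$.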
If the distribution of the error terms given the regressors has a known parametric form (for example, normally distributed and homoskedastic errors), it is straightforward to derive and maximize the likelihood function. This provides a consistent and approximately normal estimator of the regression coefficients (see, for example, Amemiya, 1985, chapter 10). However, in many empirical problems, the distribution of the errors is not known or is subject to heteroskedasticity of unknown form. In such cases, the maximum likelihood estimator will not provide a consistent estimate (Goldberger, 1983; Arabmazar and Schmidt, 1981, 1982). Also, for censored panel data with fixed effects (that is, censored data with repeated observations on individuals over time and intercept terms that are allowed to vary freely across individuals), maximum likelihood estimation methods will generally be inconsistent even when the parametric form of the conditional error distribution is correctly specified (Honoré, 1992).
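As an illustration of the likelihood route, the sketch below writes out the log-likelihood for the special case of normal, homoskedastic errors and top-coding at $b$: uncensored points contribute the normal log-density, and top-coded points contribute $\log[1 - \Phi((b - x'\beta)/\sigma)]$. It assumes a single regressor, and the crude grid over the slope (holding the intercept and $\sigma$ at their simulated values) is a stand-in for a proper numerical optimizer; all names are hypothetical.
```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the censoring probability


def tobit_loglik(beta0, beta1, sigma, xs, ys, b):
    """Log-likelihood of y = min(beta0 + beta1*x + eps, b), eps ~ N(0, sigma^2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if y >= b:
            # Top-coded: log P(y* >= b | x) = log[1 - Phi((b - mu) / sigma)].
            tail = 1.0 - STD_NORMAL.cdf((b - mu) / sigma)
            total += math.log(max(tail, 1e-300))
        else:
            # Uncensored: normal log-density of y.
            z = (y - mu) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return total


# Hypothetical data from the correctly specified model, top-coded at b = 3.
rng = random.Random(0)
xs = [rng.uniform(0.0, 1.0) for _ in range(1000)]
ys = [min(1.0 + 2.0 * x + rng.gauss(0.0, 1.0), 3.0) for x in xs]

# Crude maximization over the slope only (beta0 = 1 and sigma = 1 held fixed);
# a real application would maximize over all parameters numerically.
grid = [1.0 + 0.05 * k for k in range(41)]
best_slope = max(grid, key=lambda s: tobit_loglik(1.0, s, 1.0, xs, ys, 3.0))
print(f"grid maximizer of the slope: {best_slope:.2f}")
```
The same function evaluated on data with nonnormal or heteroskedastic errors would still have a maximizer, but it would no longer target the true coefficients, which is the inconsistency problem described above.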
Thus, it is important to develop estimation methods that provide consistent estimates for censored data even when the error distribution is nonnormal or heteroskedastic. Here, we focus on describing three particular semiparametric estimators for the censored regression model, with acronyms CLAD, SCLS and ICLAD. All three estimators can be computed by alternating between a “recensoring” step, in which the data are “trimmed” (using the current parameter estimates) to compensate for the censoring problem, and a “regression” step using the trimmed data to obtain coefficient estimates. More complete algebraic derivations and discussions of the various alternatives are available in Powell (1994, section 5.3). Further details on large-sample properties and standard error formulae can be found in the cited references.
The censored least absolute deviations (CLAD) estimation method was proposed by Powell (1984). For the linear model, the method of least absolute deviations obtains regression coefficient estimates by minimizing the sum of absolute residuals. It is a generalization of the sample median to the regression context, just as least squares is a generalization of the sample mean to the linear model. If the true dependent variable $y^*$ were observed, then its conditional median given $x$ would be the regression function $x'\beta$ under the condition that the errors have a zero median. Least absolute deviations could then be used to estimate the unknown coefficients.
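A minimal least absolute deviations sketch for one regressor, using only the Python standard library. It relies on two facts: for a fixed slope the best intercept is a median of $y - bx$ (the sample-median property mentioned above), and the resulting profile loss is convex in the slope, so a ternary search finds the minimizer. This is an illustrative solver, not the linear-programming routines typically used in practice; the function names are hypothetical.
```python
import statistics


def abs_loss(xs, ys, a, b):
    """Sum of absolute residuals for the line y = a + b*x."""
    return sum(abs(y - a - b * x) for x, y in zip(xs, ys))


def lad_fit(xs, ys, lo=-10.0, hi=10.0, iters=100):
    """Least absolute deviations fit of y = a + b*x (one regressor).

    For a fixed slope b, a loss-minimizing intercept is a median of
    y - b*x; the resulting profile loss is convex in b, so ternary search
    on [lo, hi] converges to a minimizing slope.
    """
    def profile(b):
        a = statistics.median(y - b * x for x, y in zip(xs, ys))
        return abs_loss(xs, ys, a, b), a

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profile(m1)[0] <= profile(m2)[0]:
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    b = 0.5 * (lo + hi)
    return profile(b)[1], b


# Four points on y = 1 + 2x plus one gross outlier: like the median,
# the LAD line is not dragged toward the outlier.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 100.0]
intercept, slope = lad_fit(xs, ys)
print(intercept, slope)
```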
When the dependent variable $y$ is censored, its median is unaffected by the censoring if the regression function $x'\beta$ is in the uncensored region (that is, if $x'\beta$ is in the interval $[a, b]$). However, if the regression function $x'\beta$ is below the lower threshold $a$ (or above the upper threshold $b$), then more than 50 percent of the distribution will “pile up” at $a$ (or $b$). In this case, the median of $y$ is that interval endpoint, which does not depend on $x'\beta$. Thus, computation of the CLAD estimator alternates between deleting observations with estimates of the regression function $x'\beta$ that are outside the uncensored region $[a, b]$ (the “recensoring” step) and estimating the regression coefficients by applying least absolute deviations to the remaining observations (the “regression” step), as described by Buchinsky (1994).2
The symmetrically censored least squares (SCLS) estimation method, proposed by Powell (1986b), is based on a “symmetric trimming” idea. For simplicity, suppose that $a = -\infty$, so that the data are “top-coded” at $b$ (as in our empirical application), and assume that the true dependent variable $y^*$ is symmetrically distributed around the regression function $x'\beta$. Due to the censoring, the observed dependent variable $y$ has an asymmetric distribution, since its upper tail is “piled up” at the censoring point $b$. This situation is illustrated in Figure 1. However, symmetry can be restored by “symmetrically censoring” the dependent variable $y$ from below at the point $2x'\beta - b$. Now the regression function is equidistant from both censoring points. Since this new “recensored” dependent variable is symmetrically distributed around the regression function, the regression coefficients can be estimated by least squares. Iterating between this “symmetric censoring” of the dependent variable using the current estimates (which drops observations with values of the regression function above $b$) and least squares estimation of the regression coefficients using the “symmetrically trimmed” data yields the SCLS estimator.
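The two alternating procedures described above can be sketched for a single regressor and top-coding at $b$ (here `b_top`, with $a = -\infty$). In `clad`, the recensoring step keeps observations whose fitted value $x'\beta$ lies below $b$ and the regression step reruns LAD on them; in `scls`, observations with fitted values at or above $b$ are dropped and the rest are censored from below at $2x'\beta - b$ before least squares is rerun. Function names, starting values, iteration limits and the simulated design are assumptions for illustration, and this simple CLAD iteration is not guaranteed to reach the global minimum of the CLAD objective.
```python
import random
import statistics


def lad_fit(xs, ys, lo=-10.0, hi=10.0, iters=60):
    """LAD fit of y = a + b*x: median intercept, ternary search on the slope."""
    def profile(b):
        a = statistics.median(y - b * x for x, y in zip(xs, ys))
        return sum(abs(y - a - b * x) for x, y in zip(xs, ys)), a

    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if profile(m1)[0] <= profile(m2)[0]:
            hi = m2
        else:
            lo = m1
    b = 0.5 * (lo + hi)
    return profile(b)[1], b


def ols_fit(xs, ys):
    """Least squares fit of y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def clad(xs, ys, b_top, max_iter=20, tol=1e-8):
    """CLAD for data top-coded at b_top: alternate trimming and LAD."""
    a, s = lad_fit(xs, ys)  # starting values: LAD on all observations
    for _ in range(max_iter):
        # "Recensoring" step: keep points whose fitted median is below b_top.
        kept = [(x, y) for x, y in zip(xs, ys) if a + s * x < b_top]
        # "Regression" step: LAD on the retained observations.
        new_a, new_s = lad_fit([x for x, _ in kept], [y for _, y in kept])
        converged = abs(new_a - a) + abs(new_s - s) < tol
        a, s = new_a, new_s
        if converged:
            break
    return a, s


def scls(xs, ys, b_top, max_iter=500, tol=1e-10):
    """SCLS for data top-coded at b_top: alternate symmetric trimming and OLS."""
    a, s = ols_fit(xs, ys)  # starting values: OLS on all observations
    for _ in range(max_iter):
        kx, ky = [], []
        for x, y in zip(xs, ys):
            fit = a + s * x
            if fit < b_top:  # drop points whose fitted value is above b_top
                kx.append(x)
                # Censor from below at 2x'beta - b, mirroring the top-code.
                ky.append(max(y, 2.0 * fit - b_top))
        new_a, new_s = ols_fit(kx, ky)
        converged = abs(new_a - a) + abs(new_s - s) < tol
        a, s = new_a, new_s
        if converged:
            break
    return a, s


# Hypothetical design: y* = 0.5 + 1.0x + eps, eps ~ N(0, 1), top-coded at 3.
rng = random.Random(0)
xs = [rng.uniform(0.0, 4.0) for _ in range(600)]
ys = [min(0.5 + 1.0 * x + rng.gauss(0.0, 1.0), 3.0) for x in xs]

ols_est = ols_fit(xs, ys)
clad_est = clad(xs, ys, 3.0)
scls_est = scls(xs, ys, 3.0)
print("OLS", ols_est, "CLAD", clad_est, "SCLS", scls_est)
```
In this design the OLS slope on the top-coded data is well below the true value of 1, while both trimmed estimators should land near it.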
Finally, the identically censored least absolute deviations (ICLAD) and identically censored least squares (ICLS) estimation methods were proposed by Honoré and Powell (1994). The motivation for these estimators is similar to the “symmetric trimming” idea used to derive the SCLS estimator, but involves recensoring the dependent variable for pairs of observations so that their density functions have the same shape. Suppose that the dependent variables for two observations, $y_1$ and $y_2$, are censored from above at $b$, as depicted in Figure 2. While the shape of the densities for these two observations would be the same in the absence of censoring, the censored densities have different shapes. Also, the distances from the regression functions, $x_1'\beta$ and $x_2'\beta$, to the censoring point $b$ are different.
However, the second observation $y_2$, which has the smaller regression function $x_2'\beta$, can be artificially “recensored” at the point $x_2'\beta - x_1'\beta + b \equiv \Delta x'\beta + b$. The resulting “identically censored” density for $y_2$ will have the same shape as the density for $y_1$. Further, the difference between the two identically censored variables will be symmetrically distributed around the difference in their regression functions, $\Delta x'\beta$. As a result, the regression coefficients can be estimated by finding the value of $\beta$ that minimizes the sum of absolute (ICLAD) or squared (ICLS) differences of the “identically censored” residuals across all distinct pairs of observations. As with the estimators discussed above, the ICLS and ICLAD estimators can be calculated by repeated application of linear least squares or least absolute deviations regression programs.3
Honoré (1992) originally proposed the concept behind the ICLAD and ICLS estimators for censored panel data with individual-specific intercepts (also known as “fixed effects”). Instead of identically censoring observations for pairs of individuals, the approach can be applied to pairs of observations across time periods for each individual. Differencing the identically censored observations for a fixed individual eliminates the fixed effect, just as time differences eliminate fixed effects in the standard linear panel data model. In fact, this approach yields consistent estimates even when correctly specified maximum likelihood would not.
2 Quantile regression, as discussed by Koenker and Hallock in this issue, is based on a weighted version of the least absolute deviations approach (Koenker and Bassett, 1978). Indeed, it is straightforward to extend the CLAD approach to analyze the case of censored quantile regression (CRQ) estimation, as proposed by Powell (1986a).
3 In the empirical application, we focus on the least absolute deviations version of the estimator (ICLAD) rather than the least squares version (ICLS), since simulation evidence suggests that it performs better in small samples (Honoré and Powell, 1994).
Figure 1
Density of y and “Symmetrically Censored” Density
Figure 2
Densities of y1 and y2 and “Identically Censored” Densities
Each of these estimation procedures imposes a particular assumption on the underlying error distribution. The SCLS estimator is based on the assumption that the error terms are symmetrically distributed around zero, which implies that their median (and mean) is zero. While compatible with the traditional assumption of normally distributed and homoskedastic errors, the symmetry assumption is less restrictive and still permits consistent estimation when these parametric conditions fail to hold. However, it is stronger than the “zero median” restriction exploited by the CLAD estimator, which permits nonnormal, heteroskedastic and asymmetric errors. The ICLAD and ICLS estimators assume that the error terms are identically (but not necessarily symmetrically) distributed, ruling out heteroskedasticity but permitting asymmetry of the error distribution.4 As with the choice of regressors, it is ultimately up to the empirical researcher to determine which assumption is most plausible for the particular application. In practice, though, computation of several parametric and semiparametric estimators provides a useful guide to the sensitivity of the results to the identifying assumptions, either through casual comparison of the coefficient estimates and standard errors or through more formal specification tests of the kind described by Newey (1987). In other words, the different estimation approaches give the researcher additional ways to “cut the data” to see which results are robust to alternative specifications.
The interval censored regression model is a special case of the more general censored selection model, in which the dependent variable $y$ is generated as
$$
y = d \times (x'\beta + \varepsilon),
$$
where $d$ is an observable binary (“dummy”) variable indicating whether the true dependent variable $y^* = x'\beta + \varepsilon$ is observed ($d = 1$) or “censored” ($d = 0$). For the special case of interval censoring considered above, $d$ is an indicator for whether the true variable $y^*$ is in the uncensored region $[a, b]$. More generally, though, $d$ will depend on regressors and error terms that are related to, but distinct from, those in the equation for $y^*$. For example, market wages may only be observed for individuals with positive labor supply, which is a different form of censoring than top-coding in wages.
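A small sketch of the selection representation, with hypothetical names: the outcome is recorded only when $d = 1$, interval censoring corresponds to $d = 1\{a \le y^* \le b\}$, and in the more general case $d$ comes from a separate index with its own regressor `z` and an error correlated with $\varepsilon$. The participation rule and its coefficients are invented for illustration; `z` enters only the participation rule, which is the kind of exclusion restriction discussed in the next paragraph.
```python
import math
import random


def selected_outcome(x_beta, eps, d):
    """Observed y = d * (x'beta + eps): y* is recorded only when d == 1."""
    return d * (x_beta + eps)


def interval_indicator(y_star, a=-math.inf, b=math.inf):
    """Interval censoring as a special case: d = 1 exactly when a <= y* <= b."""
    return 1 if a <= y_star <= b else 0


# Hypothetical participation rule: d depends on its own regressor z and an
# error u correlated with eps (as when wages are seen only for workers).
rng = random.Random(1)
sample = []
for _ in range(1000):
    x_beta, z = rng.uniform(1.0, 2.0), rng.uniform(-1.0, 1.0)
    eps = rng.gauss(0.0, 1.0)
    u = 0.5 * eps + rng.gauss(0.0, 1.0)
    d = 1 if 0.3 + z + u > 0 else 0
    sample.append((d, selected_outcome(x_beta, eps, d)))
share_observed = sum(d for d, _ in sample) / len(sample)
print(f"share with d = 1: {share_observed:.2f}")
```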
In such selection models, parametric estimation methods for the coefficients $\beta$ are typically based on maximum likelihood (Gronau, 1973) or the “two-step” strategy proposed by Heckman (1976, 1979). When the error distribution is not parametrically specified, however, semiparametric estimation of the regression coefficients generally involves explicit nonparametric estimation of density or regression functions, unlike the simpler methods for interval censored data described above.5 Also, semiparametric identification of the censored selection model generally requires an “exclusion restriction”; that is, a regressor that is included in the set of regressors for the binary variable $d$ must be excluded from the list of regressors $x$ in the equation of interest. This exclusion restriction (or instrumental variable), which is not required for the interval-censoring model, may not be plausible in many empirical applications. Due to these difficulties, empirical applications of semiparametric selection models are even less common than applications of the semiparametric censored regression models described above.6
4 Much of the theoretical literature on semiparametric estimation of censored regression models has focused on this assumption. However, some simulation evidence suggests that heteroskedasticity causes greater bias in standard maximum likelihood estimation than nonnormality (Powell, 1986b).
An Empirical Application: Relative Earnings of Black Men in the South During the 1960s
Title VII of the Civil Rights Act of 1964, which went into effect on July 2, 1965, outlawed discrimination against black and female workers and established the Equal Employment Opportunity Commission to monitor compliance with Title VII and to enforce its statutes. Executive Order 11246, signed by President Johnson on September 24, 1965, prohibited discrimination by federal contractors and created its enforcement arm, the Office of Federal Contract Compliance, to monitor contractors. Many states had also adopted their own fair employment practice laws before 1964 forbidding discrimination by employers located within the state. These laws were similar to Title VII and established state-level commissions to hear individual discrimination claims. However, none of the 21 states with enforceable state laws before the passage of the 1964 Civil Rights Act were in the South.
We use longitudinal data on earnings to estimate the impact of these civil rights policies. In a joint project of the Census Bureau and the Social Security Administration (SSA), respondents to the 1973 and 1978 March Current Population Surveys were matched by their Social Security numbers to their Social Security earnings histories. The resulting files contain survey responses on race, gender, education, age and region of residence as of the survey year for persons in the March surveys, linked to any earnings for which they paid Social Security taxes.
We examine the pooled data containing earnings information from 1958 to 1974, with a particular focus on the years 1963, 1964, 1970 and 1971. We use a sample of black and white men living in southern states who were born in the period 1910–1939 (the youngest man in the sample was 24 in 1963, while the oldest was 61 in 1971). We focus on men in the South because none of the states in the South had fair employment practice laws before 1965, and Title VII enforcement activity was primarily directed at racial discrimination in the South. A within-cohort analysis is used to control for the changing composition of workers over time. The final sample consists of 10,105 men, and our analysis uses all men with nonzero earnings in a given year.7
A significant shortcoming of the earnings data is that many records are censored at the Social Security maximum taxable earnings level. In addition, the real value of the tax ceiling changed substantially during the time period of interest, rising from a little over $15,000 (in 1982–1984 dollars) in 1963–1964 to about $20,000 in 1970–1971. At least 32 percent of the sample is classified as earning the top-coded amount from 1958 to 1974, and this share fluctuates considerably during the key periods, reaching a peak of 54 percent in 1965. Consequently, any estimates of the impact of Title VII on the black-white earnings gap that do not explicitly account for censoring at the tax ceiling and changes in it could be severely biased.
5 An exception is given by Honoré, Kyriazidou and Udry (1997).
6 One example of an early implementation is provided by Newey, Powell and Walker (1990), which applied semiparametric estimation methods to the Mroz (1987) data on the labor supply of married women.
We use several approaches to estimate the interval censoring model. The dependent variable in each case is the natural logarithm of annual taxable earnings, and the explanatory variables are race, level of education, age and age-squared. Table 1 presents the estimation results for the race and education coefficients based on the various estimators. The first column, headed OLS1, contains the ordinary least squares estimates based on all of the data. The second column, headed OLS2, presents the least squares results using only the observations that are not censored. The third column contains the Tobit maximum likelihood estimates under the assumption that the errors are normally distributed and homoskedastic. The remaining columns present the results for the three semiparametric estimators: CLAD, SCLS and ICLAD. The Tobit, CLAD and SCLS estimators were implemented using the Stata software package, while the ICLAD estimator was calculated using the Gauss package. For each estimator, we have created Stata “ado” files that are available at <http://elsa.berkeley.edu/~kenchay>.8
Table 1
Estimated Effects of Race and Education on Log-Earnings (estimated standard errors in parentheses)
          OLS1            OLS2            MLE             CLAD            SCLS            ICLAD
Black-White Gap
  1963    -0.355 (0.033)  -0.183 (0.038)  -0.629 (0.044)  -0.416 (0.027)  -0.444 (0.031)  -0.474 (0.032)
  1964    -0.349 (0.032)  -0.154 (0.038)  -0.674 (0.044)  -0.428 (0.033)  -0.444 (0.036)  -0.473 (0.031)
  1970    -0.262 (0.032)  -0.115 (0.037)  -0.508 (0.044)  -0.278 (0.020)  -0.302 (0.031)  -0.338 (0.029)
  1971    -0.242 (0.031)  -0.111 (0.038)  -0.486 (0.044)  -0.244 (0.022)  -0.287 (0.032)  -0.312 (0.031)
Returns to Education
  1963     0.041 (0.003)   0.012 (0.004)   0.102 (0.004)   0.051 (0.004)   0.068 (0.007)   0.073 (0.003)
  1964     0.040 (0.003)   0.013 (0.005)   0.103 (0.004)   0.064 (0.006)   0.079 (0.007)   0.075 (0.003)
  1970     0.037 (0.003)   0.003 (0.005)   0.101 (0.004)   0.055 (0.003)   0.066 (0.006)   0.071 (0.003)
  1971     0.035 (0.002)   0.002 (0.004)   0.100 (0.004)   0.054 (0.003)   0.065 (0.005)   0.070 (0.003)
Notes: The dependent variable is the natural logarithm of annual taxable earnings. Regressions also include a constant and age and age-squared as explanatory variables. Observations with nonpositive earnings are dropped from the analysis. The sample sizes for 1963, 1964, 1970 and 1971 are 8525, 8529, 8391 and 8275, respectively. The OLS2 specification also drops top-coded observations, leading to sample sizes of 4632, 4267, 4485 and 4163. MLE is Tobit maximum likelihood; CLAD is censored least absolute deviations; SCLS is symmetrically censored least squares; ICLAD is identically censored least absolute deviations.
It is clear from Table 1 that the least squares and maximum likelihood estimates of the black-white log-earnings gap and the returns to education are extremely biased when compared to the semiparametric estimators. We think of the CLAD estimator as the natural benchmark, since it is consistent under the normality of errors assumption justifying the maximum likelihood estimator, under the independence of errors assumption justifying the ICLAD estimator, and under the conditional symmetry of errors assumption justifying the SCLS estimator. When compared to the CLAD benchmark, the least squares estimator based on all of the data (OLS1) actually does better than the maximum likelihood estimator. For this application, it appears that misspecifying the errors as being normally distributed and using maximum likelihood estimation results in more biased estimates than ignoring the censoring problem entirely and using least squares estimation. A more formal test of the normality assumption also suggests that it is violated for the log-earnings model.9
There are sizeable differences in the estimated effects of education on earnings across the three semiparametric estimators. While the ICLAD and SCLS estimators of the education premium are similar, they are always greater than the CLAD estimator. These differences are significant given the precision of the estimates and range from 17 percent to 43 percent for the ICLAD estimator and 20 percent to 33 percent for the SCLS estimator. The differences in the estimates of the black-white earnings gap are smaller, with the ICLAD and SCLS estimates about 11 percent to 28 percent and 4 percent to 18 percent larger in magnitude than the CLAD estimates, respectively. Strikingly, the semiparametric approaches all result in more precise estimates of the race coefficient than maximum likelihood estimation. For example, the standard errors of the CLAD estimator are 25 percent to 55 percent smaller than the standard errors of the Tobit estimator.10
The differences in the coefficient estimates across the various estimators can be used as a sort of specification check, similar in spirit to the Newey (1987) specification analysis mentioned earlier. For the education coefficient, the large differences between the maximum likelihood and semiparametric estimates suggest that nonnormal errors are an important source of bias in the Tobit estimator. Further, the significant differences among the semiparametric estimates imply that heteroskedasticity and asymmetry of the errors are also sources of misspecification in the maximum likelihood estimator of the education premium. Conversely, for the black-white earnings gap, the smaller differences among the semiparametric estimates suggest that nonnormality is the biggest source of bias in the Tobit estimator, with heteroskedasticity and asymmetry playing smaller roles.
7 Over 84 percent of the men in the sample have positive earnings in 1963–1964. This figure is 83 percent in 1970–1971.
8 The standard errors for OLS, MLE and ICLAD were calculated using standard approximations. The standard errors for CLAD and SCLS were calculated using the bootstrap techniques discussed by Brownstone and Valletta in this issue. It is much more efficient to calculate the ICLAD estimator using Gauss instead of Stata.
9 Chay and Honoré (1998) calculate the test statistics for nonnormality and for heteroskedasticity in censored regression models discussed by Chesher and Irish (1987). The test statistic for detecting nonnormality ranges from 900.47 to 1200.68. Under the null, this statistic has an (asymptotic) $\chi^2(2)$ distribution with a 1 percent critical value of 9.21. Therefore, we easily reject the hypothesis that the errors are normally distributed. The test for heteroskedasticity yields statistics between 84.71 and 90.85. Under the null, these have (asymptotic) $\chi^2(12)$ distributions with a 1 percent critical value of 26.22. We therefore also reject the null of no heteroskedasticity.
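The percentage comparisons quoted above can be reproduced from the Table 1 entries. The sketch below measures each semiparametric estimate relative to the CLAD estimate for the same year, and compares CLAD and Tobit standard errors for the race coefficient; the dictionary layout and helper names are just one way to organize the numbers.
```python
# Table 1 entries for 1963, 1964, 1970 and 1971.
EDUC = {
    "CLAD": [0.051, 0.064, 0.055, 0.054],
    "SCLS": [0.068, 0.079, 0.066, 0.065],
    "ICLAD": [0.073, 0.075, 0.071, 0.070],
}
GAP = {
    "CLAD": [-0.416, -0.428, -0.278, -0.244],
    "SCLS": [-0.444, -0.444, -0.302, -0.287],
    "ICLAD": [-0.474, -0.473, -0.338, -0.312],
}
GAP_SE = {
    "MLE": [0.044, 0.044, 0.044, 0.044],
    "CLAD": [0.027, 0.033, 0.020, 0.022],
}


def pct_larger(other, base):
    """Percent by which |other| exceeds |base|, year by year."""
    return [100.0 * (abs(o) / abs(b) - 1.0) for o, b in zip(other, base)]


def pct_range(values):
    """Smallest and largest value, rounded to whole percent."""
    return round(min(values)), round(max(values))


educ_iclad = pct_range(pct_larger(EDUC["ICLAD"], EDUC["CLAD"]))
educ_scls = pct_range(pct_larger(EDUC["SCLS"], EDUC["CLAD"]))
gap_iclad = pct_range(pct_larger(GAP["ICLAD"], GAP["CLAD"]))
gap_scls = pct_range(pct_larger(GAP["SCLS"], GAP["CLAD"]))
se_reduction = pct_range(
    [100.0 * (1.0 - c / m) for c, m in zip(GAP_SE["CLAD"], GAP_SE["MLE"])]
)
print(educ_iclad, educ_scls, gap_iclad, gap_scls, se_reduction)
# (17, 43) (20, 33) (11, 28) (4, 18) (25, 55)
```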
To examine the question of specification in more detail, we estimated the distribution of the error terms derived from the CLAD estimates, using the Kaplan and Meier (1958) estimator. The resulting estimated error distribution for log-earnings has fatter tails than does a normal distribution. The maximum likelihood estimator is sensitive to values in the tails, while the least absolute deviations estimator, which focuses on the median value, is unaffected by extreme observations. Since black men are more likely to be in the left-hand tail of the distribution, fat tails can explain the consistently larger (in magnitude) maximum likelihood estimates of the race coefficient. They can also explain the larger sampling errors of the Tobit estimator relative to the semiparametric estimators. Thus, abnormally long tails in the log-earnings distribution may be the major source of misspecification in the maximum likelihood estimates of the black-white earnings gap (Chay, 1995; Chay and Honoré, 1998).
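A minimal product-limit (Kaplan–Meier) sketch for right-censored residuals of the kind that arise here: for a top-coded observation, the residual $b - x'\hat\beta$ computed from the CLAD fit is only a lower bound on the error. The function name and input format are assumptions; at a tied value, observations censored at that value are kept in the risk set when the failures there are counted.
```python
def kaplan_meier(values, censored):
    """Product-limit estimate of the survival function S(t) = P(error > t).

    values[i] is an observed residual; censored[i] is True when the
    observation was top-coded, so the true error is only known to exceed it.
    Returns (t, S(t)) at each distinct uncensored value t.
    """
    data = sorted(zip(values, censored))
    at_risk = len(data)  # observations with value >= current t
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = failures = 0
        while i < len(data) and data[i][0] == t:
            tied += 1
            failures += 0 if data[i][1] else 1
            i += 1
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
        at_risk -= tied  # everything at t leaves the risk set afterwards
    return curve


# Toy residuals: the second one comes from a top-coded observation.
curve = kaplan_meier([1.0, 2.0, 3.0, 4.0], [False, True, False, False])
print(curve)  # [(1.0, 0.75), (3.0, 0.375), (4.0, 0.0)]
```
Comparing the tails of such an estimated curve with those of a fitted normal is one way to see the fat-tailed pattern described in the paragraph above.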
Based on the series of cross-sectional estimators for the four years shown in Table 1, the maximum likelihood and semiparametric approaches yield very similar estimates of changes in black-white relative earnings during the late 1960s. The maximum likelihood and semiparametric estimates all imply that the black-white earnings gap narrowed about 0.15 log points from 1963–1964 to 1970–1971. We conclude from this that while there is bias in the Tobit estimator of the race coefficient, this bias is roughly constant over time. Thus, it is “differenced out” when one examines changes in the estimated race coefficient. However, the two ordinary least squares estimators imply that relative earnings only converged between 0.06 (OLS2) and 0.10 (OLS1) log points during the period of interest. Not accounting for the severe censoring in the earnings data results in downwardly biased estimates of the impact of Title VII. Also, although the CLAD estimator imposes the weakest stochastic restrictions on the error terms, it results in the most precise estimates of the policy effects.
10 The standard errors for the CLAD and SCLS estimators were calculated using 500 bootstrap replications. Applying the bootstrap to the Tobit maximum likelihood estimator results in standard errors that are nearly identical to those presented in Table 1, which are based on the asymptotic approximation. Thus, the bootstrap technique is not the source of the differences in the estimated standard errors.
Figure 3
Top-Code Rate and Estimates of Black-White Log-Earnings Gap, 1958–1974
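The convergence figures quoted above can be checked against Table 1 by averaging the gap estimates within 1963–1964 and within 1970–1971 and taking the difference. This averaging rule is our reading of how the comparison is made, not a procedure stated in the text; the names are illustrative.
```python
# Table 1 estimates of the black-white log-earnings gap: 1963, 1964, 1970, 1971.
GAP = {
    "OLS1": [-0.355, -0.349, -0.262, -0.242],
    "OLS2": [-0.183, -0.154, -0.115, -0.111],
    "MLE": [-0.629, -0.674, -0.508, -0.486],
    "CLAD": [-0.416, -0.428, -0.278, -0.244],
    "SCLS": [-0.444, -0.444, -0.302, -0.287],
    "ICLAD": [-0.474, -0.473, -0.338, -0.312],
}


def convergence(est):
    """Narrowing of the gap from the 1963-64 average to the 1970-71 average."""
    before = (est[0] + est[1]) / 2.0
    after = (est[2] + est[3]) / 2.0
    return after - before


for name, est in GAP.items():
    print(f"{name:6s} {convergence(est):.4f}")
```
On this reading the Tobit and semiparametric rows cluster around 0.15 log points, while the two least squares rows show roughly 0.06 and 0.10.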
Figures 3A and 3B provide a more detailed picture of the various estimators. The top panel shows the percentage of workers in the sample recorded as earning at the taxable maximum (the top-code rate) from 1958 to 1974, separately by race. The bottom panel plots the estimated black-white log-earnings gaps from the OLS1, OLS2, MLE and CLAD estimators for the series of cross-sections from 1958–1974. There is a striking correspondence between changes in the top-code rate in the top panel and changes in the ordinary least squares estimates in the bottom panel. Indeed, it seems that most of the changes in the ordinary least squares estimates over time are being driven by changes in the amount of censoring, which vary by race. This results in a severe understatement of the racial earnings convergence in the late 1960s. The time series of the CLAD and maximum likelihood estimates, on the other hand, have no association with the top-code rates, although, as noted earlier, the maximum likelihood estimates systematically overstate the size of the black-white earnings gap when compared to the CLAD estimates. The maximum likelihood and CLAD estimates imply substantial black economic progress in the South after 1964, a result that is masked by the ordinary least squares estimates.
Table 2
Fixed-Effects ICLAD Estimates of Effects of Race and Education (estimated standard errors in parentheses)
                          1963–64        1963–70        1963–71        1964–70        1964–71        1970–71
Change in
  Black-White Gap         0.011 (0.007)  0.102 (0.017)  0.136 (0.021)  0.095 (0.020)  0.108 (0.019)  0.015 (0.007)
  Returns to Education    0.002 (0.001)  0.001 (0.002)  0.000 (0.003)  0.000 (0.003) -0.003 (0.002)  0.000 (0.001)
Notes: See notes to Table 1. The sample is the 7,435 men with positive earnings in all four years. For each pair of years, the absolute error loss function was used to estimate the identically censored panel data model with fixed effects. The estimates represent the change in the coefficients between the two years.
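The Table 1 notes also let one back out the pooled top-code rate in each of the four cross-sections, assuming (as the notes state) that the difference between the full and OLS2 sample sizes consists of top-coded observations. These are rates for black and white men combined, whereas Figure 3 plots them separately by race; the variable names are illustrative.
```python
# Sample sizes from the Table 1 notes: all positive earners, and those left
# after the OLS2 specification drops top-coded observations.
years = [1963, 1964, 1970, 1971]
n_all = [8525, 8529, 8391, 8275]
n_uncensored = [4632, 4267, 4485, 4163]

# Percent of positive earners recorded at the taxable maximum.
top_code_rate = {
    year: round(100.0 * (1.0 - u / n), 1)
    for year, n, u in zip(years, n_all, n_uncensored)
}
print(top_code_rate)  # {1963: 45.7, 1964: 50.0, 1970: 46.5, 1971: 49.7}
```
Roughly half of each cross-section is top-coded, consistent with the 32 to 54 percent range reported for the full 1958–1974 period.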
Finally, Table 2 presents the fixed-effects estimation results based on the ICLAD estimator for each of the six possible pairs of time periods in our panel data. The table entries give the estimated change in the race and education coefficients between the two specified periods. The analysis includes age and age-squared as explanatory variables. To compare the parameter estimates across the six columns, the sample is restricted to the 7,435 men with positive earnings in all four years. This reduces the sample size by about 10 percent to 13 percent relative to the cross-sectional samples.
The black-white earnings gap narrowed substantially during the period of interest, even after accounting for individual-specific fixed effects. The relative earnings of black men increased about 0.12–0.14 log points from 1963 to 1971. These estimates are similar to those implied by the series of maximum likelihood and semiparametric estimates of the cross-sectional censored regression model in Table 1. A key assumption underlying the fixed-effects ICLAD estimator is that the distribution of the unobservables is the same in all time periods for a given individual. A formal specification test did not reject the restrictions implied by this assumption at conventional levels of significance (Chay and Honoré, 1998).
Conclusion
When data are censored, ordinary least squares regression can provide misleading estimates. The results from the semiparametric models show that there was significant earnings convergence among black and white men in the American South after the passage of the 1964 Civil Rights Act, a result that was masked by least squares analysis. The semiparametric methods can also provide information on the sources of misspecification in parametric estimation approaches. In the log-earnings model, it appears that abnormally long tails are the major source of bias in the Tobit maximum likelihood estimates.
† We thank Bo Honoré, David Lee, and especially Alan Krueger, Timothy Taylor and Michael Waldman for their helpful comments. Marcelo Moreira provided outstanding research assistance. Support from the Alfred P. Sloan Foundation and the Center for Advanced Study in the Behavioral Sciences is gratefully acknowledged.
References
Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge: Harvard University Press.
Arabmazar, Abbas and Peter Schmidt. 1981. “Further Evidence on the Robustness of the Tobit Estimator to Heteroskedasticity.” Journal of Econometrics. November, 17:2, pp. 253–58.
Arabmazar, Abbas and Peter Schmidt. 1982. “An Investigation of the Robustness of the Tobit Estimator to Non-Normality.” Econometrica. July, 50:4, pp. 1055–63.
Buchinsky, Moshe. 1994. “Changes in the U.S. Wage Structure 1963–1987: Application of Quantile Regression.” Econometrica. March, 62:2, pp. 405–58.
Chay, Kenneth Y. 1995. “Evaluating the Impact of the 1964 Civil Rights Act on the Economic Status of Black Men using Censored Longitudinal Earnings Data.” Unpublished manuscript, Department of Economics, Princeton University.
Chay, Kenneth Y. and Bo E. Honoré. 1998. “Estimation of Semiparametric Censored Regression Models: An Application to Changes in Black-White Earnings Inequality During the 1960s.” Journal of Human Resources. Winter, 33:1, pp. 4–38.
Chesher, Andrew and Margaret Irish. 1987. “Residual Analysis in the Grouped and Censored Normal Linear Model.” Journal of Econometrics. January/February, 34:1-2, pp. 33–61.
Goldberger, Arthur S. 1983. “Abnormal Selection Bias,” in Studies in Econometrics, Time Series, and Multivariate Statistics. S. Karlin et al., eds. New York: Academic Press, pp. 67–84.
Gronau, Reuben. 1973. “The Effect of Children on the Housewife’s Value of Time.” Journal of Political Economy. March/April, 81:2, pp. S168–S199.
Heckman, James J. 1976. “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models.” Annals of Economic and Social Measurement. Fall, 5:4, pp. 475–92.
Heckman, James J. 1979. “Sample Selection Bias as a Specification Error.” Econometrica. January, 47:1, pp. 153–61.
Honoré, Bo E. 1992. “Trimmed LAD and Least Squares Estimation of Truncated and Censored Regression Models with Fixed Effects.” Econometrica. May, 60:3, pp. 533–65.
Honoré, Bo E. and James L. Powell. 1994. “Pairwise Difference Estimators for Censored and Truncated Regression Models.” Journal of Econometrics. September/October, 64:1-2, pp. 241–78.
Honoré, Bo E., Ekaterini Kyriazidou and Christopher Udry. 1997. “Estimation of Type 3 Tobit Models Using Symmetric Trimming and Pairwise Comparisons.” Journal of Econometrics. January/February, 76:1-2, pp. 107–28.
Kaplan, E. L. and P. Meier. 1958. “Nonparametric Estimation from Incomplete Observations.” Journal of the American Statistical Association. 53, pp. 457–81.
Koenker, Roger and Gilbert S. Bassett, Jr. 1978. “Regression Quantiles.” Econometrica. January, 46:1, pp. 33–50.
Mroz, Thomas A. 1987. “The Sensitivity of an Empirical Model of Married Women’s Hours of Work to Economic and Statistical Assumptions.” Econometrica. July, 55:4, pp. 765–99.
Newey, Whitney K. 1987. “Specification Tests for Distributional Assumptions in the Tobit Model.” Journal of Econometrics. January/February, 34:1-2, pp. 125–45.
Newey, Whitney K., James L. Powell and James M. Walker. 1990. “Semiparametric Estimation of Selection Models: Some Empirical Results.” American Economic Review. May, 80:2, pp. 324–28.
Powell, James L. 1984. “Least Absolute Deviations Estimation for the Censored Regression Model.” Journal of Econometrics. July, 25:3, pp. 303–25.
Powell, James L. 1986a. “Censored Regression Quantiles.” Journal of Econometrics. June, 32:1, pp. 143–55.
Powell, James L. 1986b. “Symmetrically Trimmed Least Squares Estimation for Tobit Models.” Econometrica. November, 54:6, pp. 1435–60.
Powell, James L. 1994. “Estimation of Semiparametric Models,” in Handbook of Econometrics, Volume IV. Robert F. Engle and Daniel L. McFadden, eds. Amsterdam: North Holland, pp. 2443–521.
Tobin, James. 1958. “Estimation of Relationships for Limited Dependent Variables.” Econometrica. January, 26, pp. 24–36.