DistributedScientificComputing
P.Luksch,U.Maier,S.Rathmayer,M.Weidmann
Lehrstuhlf¨urRechnertechnikundRechnerorganisation/Parallelrechnerarchitektur
(LRR-TUM)Institutf¨urInformatikTechnischeUniversit¨atM¨unchen
D-80290M¨unchen
e-mail:luksch,maier,maiers,weidmann@informatik-tu.muenchen.de
WWW:http://wwwbode.informatik.tu-muenchen.de/
Tel.:(089)2105-8164;Fax:(089)2105-8232
Abstract.Inthispaper,wepresentaninterdisciplinaryresearchprojectwhosecentralobjectiveistodevelopnewsoftwareengineering(SWE)methodsfordistributedmemoryparallelscientificcomputing.Ouremphasisisonputtingintopracticeandevaluatingtheproposedmethods.ThemaintestcasefortheirdefinitionandevaluationistheparallelizationofanindustrialCFDsoftwarepackage.Amajorconcernistoachievemaximumportability,i.e.coveralargenumberoftargetsystemsrangingfromnetworksofworkstations(NOWs)tomassivelyparallelsystems(MPPs).InordertooptimizeutilizationofNOWs,aresourcemanagementsystemisbeingdesigned,whichrunsparallelapplicationsinbatchmodeanddynamicallyassignsavailableresourcestothetasksoftheparallelapplications.Besidesgivinganoverviewoftheobjectivesfollowedintheproject,wegiveaprogressreport,whichdetailstheachievementsreachedsofar.
1Objectives
TheinterdisciplinaryresearchprojectSEMPA(SarallelScientificA
ngineeringM
Bundesministeriumf¨urBildung,Wissenschaft,ForschungundTechnologie
Academia
SFB 342 TPA1DFGGerman Science FoundationIndustry
ProgrammingEnvironments +ToolsComputerScienceThe Tool-SetInteractive on-lineCoCheckcontrol of scientificcomputationsTASCflow3d CFD packageNavier-Stokesequationsunstructured gridsASCAMGAdvancedAlgebraic ScientificMultigrid ComputingSolverGmbHVIPERPriority Program CFDDFGParallelizationTU Münchenof scientificLehrstuhl fürFORTWIHRapplicationsRechnertechnikBavarian ScienceFoundatiionBFSundParNSFLEXRechnerorganisationPAR-CVDMUMUSBMBFLRR-TUMFASTESTInstitut fürComputer-anwendungenICA IIIMechanicalEngineeringSEMPAAdaptiveMultigridMethodsGENIASSoftwareGmbHAppliedNumericalMathematicsUGUni StuttgartToolboxfor adaptivesolution for partialdifferential equationson unstructured meshesin 2D and 3DCODINEBatch Queueing Systemfor NOW’sComputerScienceColor Code: Software PackageProjectFunding Institution
Fig.1.partnersinvolvedinSEMPA
1.1SoftwareEngineeringMethodsforParallelScientificApplications
UnderstandingComplexSoftwareSystemsInparallelscientificcomputing,softwareengineersusuallyarefacedwithanexistingprogramorwithexistingmodules,typicallywritteninFORTRAN77,whichtheyareexpectedtoparallelizeforexecutiononadistributedmemorymultiprocessor(NOWorMPP).
Thefirststepinparallelizinganexistingsoftwarepackageistoachieveabasicunderstandingofitscontrolandatastructures,whichisnecessarytodefineaconceptforparallelization.Inaddition,comprehensiveprofilinghastobedonewitharepresentativesetoftestcasesinordertoidentifythecomputationallymostintensivemodules,i.e.themostpromisingcandidatesforparallelization.
Theknowledgegainedabouttheprogramstructureshouldbedocumentedinanappropriateformforfuturereference.Designdocumentsbytheauthorsoftheprogramoftenareunavailable,sincemanyscientificcodesaretheresultofalongdevelopmentprocessinvolvingseveralgenerationsofprogrammers.Evenifgooddesigndocumenta-tionisavailable,asisthecasewithTASCFLOW,itturnsoutthatdocumentationwrittenasareferenceforCFDdevelopersisinmanyaspectsinappropriateforthecomputerscientistfacedwiththeproblemofparallelizingthecode.
WethereforedecidedtoorganizeaseriesofmeetingswhereCFDdevelopersfromASCexplainedthealgorithmsandtheirimplementationinTASCFLOWtocomputerscientistsatLRR-TUM.Asaresultofthesemeetings,designdocumentshavebeenwrittenatLRR-TUMandreviewedbyASC.Wefoundthisprocedureveryeffectiveforacquiringtheknow-howthatisneededbeabletoanalyzetheprogramandtosetupaparallelizationconcept.Theimmediatereviewofourdocumentshashelpedtofixsourcesofmisconceptionataveryearlystage.
Itis,however,nottoocommoninscientificcomputingthatcodedeveloperscanprovideextensivesupportforunderstandingaprogram.Analternativewayforobtai-ninginsightintothestructureoftheprogramunderconsiderationistouseinteractiveparallelizationtools.Theycanprovidevaluableinsightaboutdataandcontrolflowdependencies,andprovidedetailedprofilinginformation.
MostofourinsightintoTASCFLOWresultsfromthedesigndocumentsourjointmeetingswithASC,whichcertainlyisduetothefactthatTASCFLOWhasbeenimple-mentedfollowingsoftwareengineeringguidelinesdefinedbythedevelopmentteam.DocumentationTheresultofanalyzingtheserialprogramaswellasconceptsforitsparallelizationhavetobedocumentedindetailtoprovideacommonrepositoryofinformationforfutureworkintheproject.Tobeuseful,thesedocumentsmustfol-lowstandardsthatwillhavetobedefinedaccordingtotherequirementsoftheintendedaudience.Ourexperienceisthattherequirementsfordesigndocumentationthatisinten-dedasareferenceforCFDdevelopersandnumericalanalystsdifferconsiderablyfromtherequirementsofcomputerscientistsinvolvedinparallelization.Havingdocumentsreviewedbyprojectmembersnotdirectlyinvolvedintheprocessbeingdocumentedhasprovedtobeveryhelpfulinassuringhigh-qualitydocumentation.
GeneralApproachestoParallelizationofScientificComputationsFormostscien-tificcomputations,SPMDlikeparallelismisanaturalchoicebecauseitscaleswellwiththesizeoftheproblem.BeingabletosolvelargerproblemsusuallyisthestrongestmotivationforparallelizingsoftwareinScientificComputing,especiallyinCFD.
Manyscientificapplicationsaresimilarinthestructureoftheirlogicalobjectsbutdifferintheirimplementation.Forexample,everyCFDpackageimplementstheconceptofagridandasystemoflinearequations.AnumberofdecisionsinimplementingSPMDlikeparallelismcanbeexpressedintermsoftheselogicalobjectsandthereforecanbegeneralizedtoacertainextend.Therefore,theparallelizationconceptforTASCFLOWhasbeendefinedattheleveloflogicalobjects.
Dataparallelismrequirestheproblemdescriptiontobepartitioned.Ithastobedecidesonwhichlogicalobjectsthedistributionistobebased.Datadistriubtionitselfisagraphpartitioningproblemforwhichanumberofsoftwarepackagesareavailable.
StandardsforProgramDevelopmentGuidelineshavetobesetupthatdefinethethestagesofthedevelopmentprocess:functionalspecification,highlevelanddetaileddesign,implementation,testandverification,documentation,andrelease.Itisimportant
thateachstageofspecificationanddesigniscarefullyreviewedbeforeproceedingtothenextstep.
StartingwiththesoftwareengineeringguidelinesapprovedinthedesignofTASCFLOW,wearedefiningnewmethodsthataddressissuesrelatedtoparallelism.Object-orientedConceptsinScientificComputingFORTRANstillistheprevailingprogramminglanguageinscientificcomputing.Fromthesoftwareengineeringpointofview,itisstronglydesirabletomovetonewerlanguagesthatprovidebettersupportforthedesignofmodularandre-usablesoftware.ExampleswhichareavailableforalargenumberofhardwareplatformsareC++andFortran90.
ThereareanumberofreasonsforthepersistenceofFORTRANinScientificCompu-ting.Firstofall,mostsoftwarepackageshavealonghistoryofdevelopment.Usually,thereisalotof“dustydeck”code,forwhichnodesigndocumentationexistsandwhichthereforeisusedasa“blackbox”.FORTRANcompilersusuallyimplementsophisticatedoptimizationtechniquesandthereforeproducefastobjectcode.Inaddition,thereisanumberofhighperformancenumericallibrariesforFORTRANprogrammers.
ManyengineershesitatetodealwithC++orFortran90becauseoftheefforttolearnanewlanguageandbecausetheyareconcernedaboutefficiency.Re-implementingawholesoftwarepackageinanewlanguageusuallyisinfeasiblebecauseoftimeandmanpowerrestrictions.However,inordertotakefulladvantageofobjectorientedfeatures,themaindatastructureshavetobere-implementedandencapsulated,whichrequiresthewholecodetoberewritten.
Ourapproachthereforehasbeentoidentifyasthebasisfordemonstratingobjectorientedtechniquesamoduleofreasonablebutmanageablesizethatcanbeimplementedasastand-aloneprogram.InTASCFLOW,wefoundthesmootherofthealgebraicmultigridsolvertobeagoodcandidate.ItisbeingimplementedinFortran90andC++.Itslimitedsizeallowsustoconsidertwolanguagesthatoffersupportforobjectorientedprogrammingandtospendtimeininvestigatingimplementationalternativesineachlanguage.Theobjectiveistodemonstratehowobjectorientedconceptscanbeusedina“realworld”scientificapplicationwithoutsacrificingtoomuchefficiency.1.2
ParallelizationofTASCflow
TASCFLOWisaCFDsimulationpackagethatsolvestheNavierStokesequationsin3dspace.Theprogram,whichisdevelopedandmarketedbyASC,canbeappliedtoawiderangeoffluidproblems.TASCFLOW’sfinitevolumediscretizationisbasedonunstructuredgridsthataregeneratedinapreprocessingstepbyfillingspacewithelementsofpredefinedtopologies,e.g.hexahedralandwedgeelements.Whilethediscretizationmethodisbasedonfinitevolumes,thecontrolflowinitsimplementationiselement-based.
Theresultofthediscretizationisasystemoflinearequations,whichisverysparse,structurallysymmetricanddiagonaldominant.ItssolutioniscomputedbyanAlgebraicMultigrid(AMG)solver[Raw94],whichusesamodifiedversionofincompleteLUdecompositioncalledILU0assmoother.Atthecoarsestgridlevel,eitherILU0(withalargernumberofiterations)oradirectsolvercanbeused.
TheparallelversionofTASCFLOWisintendedtorunonNOWsaswellasonMPPs.Bycoveringalargerangeofhardwareplatformswithdifferentpriceandperformancecharacteristics,theparallelprogramisbecominginterestingforalargeusercommunity.WhilegraduallymovingfromsmallscaleNOWstomorepowerfulconfigurationsincludingMPPs,userssavetheirinvestmentsinmodeldesign.1.3LoadBalancingandResourceManagement
Balanceddistributionofloadontoprocessorsisanimportantprerequisiteforparallelefficiency.Therearequiteafewfactorsthatmaycausecomputationalloadtobedis-tributedunevenly.StaticpartitioningofdataforanSPMDparallelprogrammayresultinpartitionsofunequalsize,orpartitionsofequalsizemayrequiredifferentamountsofcomputation.Ifgridsaregenerateddynamically,asisthecasewithallgridlevels(exceptthefinestone)inAMG,loadimbalanceisverylikelytooccur.Besidesunevendistributionofcomputationcausedbythealgorithm,multiprocessandmultiuseropera-tionespeciallyinNOWsaffectloaddistribution.Inheterogeneoussystems,processorsdifferintheircomputationalpoweraswellasinmainmemoryanddiskcapacity,sothatprocesseswillproceedatdifferentspeedsondifferentnodeseveniftheydothesamecomputationandtherearenootherprocessesontheirprocessors.
Inourproject,differentapproachestodynamicloadbalancingareconsidered:–Loadbalancingattheapplicationlevelbyrepartitioningtheproblemdescriptionatruntime.
–loadbalancingatthesytemlevelbyhavingaresourcemanagermigrateprocessesoftheparallelapplicationatneed.
–loadbalancingimplementedasacooperativeprocessofresourcemanagerandapplication.Theresourcemanagerprovidesinformationaboutthecurrentresourceutilizationandoffersamigrationservicethatallowstheapplicationtorequesttaskstobemoved.
2ProgressReport
Asastartingpointforourresearchinsoftwareengineering(SWE)methods,wehaveadoptedASC’ssetofguidelinesandprocedures,whichhavebeenappliedinthedesignandimplementationofTASCFLOW.Theanalysisoftheserialprogramhasbeenalmostfinished.Astandardizedformatfordocumentingitsdesignisbeingdefined.Asalreadymentioned,themosteffectivesourceofinformationaboutTASCFLOW’sdesignhavebeenourregularmeetingswithdevelopersfromASC.Wehavealsoconsideredaninteractiveparallelizer,FORGE[Lev92],which,however,untilnowcouldonlybeappliedsuccessfullytosomeprogramsthataremuchlesscomplexthanTASCFLOW.
FORGEidentifiesthemostsignificantloopsarebyeitherusingprofilinginformation,orcheckingthecodeforthedeepestloopnesting.Oncetheloopshavebeenchosen,thearraysreferencedinsideofthemarepartitionedanddistributedaccordingtothepartitions.Theparallelprocessesthenrunthesameprogrambutonlyonasubsetofthepartitioneddatastructuresfollowingtheso-calledownercomputesrule.
Theadvantageofthistypeoftoolisthattheusercangetabetterunderstandingoftheprogramthatheisabouttoparallelize.Healsoistakenofftheburdentoexplicitlyprogrammessagepassingcode.Ontheotherhandheanyhowhastohaveagoodunderstandingofhowmessagepassingreallyworksbecausethetoolsarenotyetatapointwheretheycanproduceefficientcode.NeithercantheyreallyworkonverycomplexpackagesasforexampleTASCflow.
Basedontheanalysisoftheserialprogram,aconceptforparallelizingTASCFLOWaccordingtotheSPMDparadigmhasbeendefinedanddocumentedattheleveloflogicalobjects.Asafirststeptowardsitsimplementation,apartitioningproce-durehasbeenintegratedintoTASCFLOW.WeusethepublicdomaingraphpartitioningpackageMeTiS[MET95]toassignthenodesofthegridtopartitions.Inordertoreducethenumberofsynchronizationpointsaswellastheamountofdatatobeexchanged,wehavedecidedtoreplicateelementswhosenodesbelongtodifferentpartitionsoneachofthepartitionsinvolved.
3ConclusionandFutureWork
SEMPAaddressestheproblemofsoftwareengineeringfordistributedmemoriesfromapointofviewthatemphasizespracticalapplicabilityofthemethodsthatarebeingproposed.Thereforeitisveryimportanttohavea“realworld”testcasefromwhichrequirementsfornewSWEmethodsaswellasfornewtoolscanbederivedandthenevaluatedinpractice.Thesameistrueforourresearchinloadbalancingandresourcemanagement.
TheresultoftheprojectwillbeprototypeimplementationsoftheparallelCFDsimulationpackageandtheresourcemanagerforNOWswhichareintendedtobecomeproductsaftertheprojecthasfinished.
Aswehavebeenreportingonworkinprogress,ouremphasishasbeenonmotivationandobjectives.Weexpecttobeabletoreportonsomeimplementationresultsbyspring1996.
References
[KN94]JosephA.KaplanundMichaelLNelson.AComparisonofQueueing,Clusterand
DistributedComputingSystems.Bericht,NASALangleyResearchCenter(Juni1994).
[Lev92]J.Levesque.FORGE90andHighPerformanceFortran(HPF).Bericht,Applied
ParallelResearch(Juni1992).
[MET95]METIS:UnstructuredGraphPartitioningandSparseMatrixOrderingSystem“.Ge-”
orgeKarypisandVipinKumar,UniversityofMinnesota(1995).
[Raw94]MichaelRaw.ACoupledAlgebraicMultigridMethodforthe3DNavierStokes
Equations(1994).
AThisarticlewasprocessedusingtheLTEXmacropackagewithLLNCSstyle
因篇幅问题不能全部显示,请点此查看更多更全内容