搜索
您的当前位置:首页Software Engineering Methods for Parallel and Distributed Scientific Computing

Software Engineering Methods for Parallel and Distributed Scientific Computing

来源:智榕旅游
SoftwareEngineeringMethodsforParalleland

DistributedScientificComputing

P.Luksch,U.Maier,S.Rathmayer,M.Weidmann

Lehrstuhlf¨urRechnertechnikundRechnerorganisation/Parallelrechnerarchitektur

(LRR-TUM)Institutf¨urInformatikTechnischeUniversit¨atM¨unchen

D-80290M¨unchen

e-mail:luksch,maier,maiers,weidmann@informatik-tu.muenchen.de

WWW:http://wwwbode.informatik.tu-muenchen.de/

Tel.:(089)2105-8164;Fax:(089)2105-8232

Abstract.Inthispaper,wepresentaninterdisciplinaryresearchprojectwhosecentralobjectiveistodevelopnewsoftwareengineering(SWE)methodsfordistributedmemoryparallelscientificcomputing.Ouremphasisisonputtingintopracticeandevaluatingtheproposedmethods.ThemaintestcasefortheirdefinitionandevaluationistheparallelizationofanindustrialCFDsoftwarepackage.Amajorconcernistoachievemaximumportability,i.e.coveralargenumberoftargetsystemsrangingfromnetworksofworkstations(NOWs)tomassivelyparallelsystems(MPPs).InordertooptimizeutilizationofNOWs,aresourcemanagementsystemisbeingdesigned,whichrunsparallelapplicationsinbatchmodeanddynamicallyassignsavailableresourcestothetasksoftheparallelapplications.Besidesgivinganoverviewoftheobjectivesfollowedintheproject,wegiveaprogressreport,whichdetailstheachievementsreachedsofar.

1Objectives

TheinterdisciplinaryresearchprojectSEMPA(SarallelScientificA

ngineeringM

Bundesministeriumf¨urBildung,Wissenschaft,ForschungundTechnologie

Academia

SFB 342 TPA1DFGGerman Science FoundationIndustry

ProgrammingEnvironments +ToolsComputerScienceThe Tool-SetInteractive on-lineCoCheckcontrol of scientificcomputationsTASCflow3d CFD packageNavier-Stokesequationsunstructured gridsASCAMGAdvancedAlgebraic ScientificMultigrid ComputingSolverGmbHVIPERPriority Program CFDDFGParallelizationTU Münchenof scientificLehrstuhl fürFORTWIHRapplicationsRechnertechnikBavarian ScienceFoundatiionBFSundParNSFLEXRechnerorganisationPAR-CVDMUMUSBMBFLRR-TUMFASTESTInstitut fürComputer-anwendungenICA IIIMechanicalEngineeringSEMPAAdaptiveMultigridMethodsGENIASSoftwareGmbHAppliedNumericalMathematicsUGUni StuttgartToolboxfor adaptivesolution for partialdifferential equationson unstructured meshesin 2D and 3DCODINEBatch Queueing Systemfor NOW’sComputerScienceColor Code: Software PackageProjectFunding Institution

Fig.1.partnersinvolvedinSEMPA

1.1SoftwareEngineeringMethodsforParallelScientificApplications

UnderstandingComplexSoftwareSystemsInparallelscientificcomputing,softwareengineersusuallyarefacedwithanexistingprogramorwithexistingmodules,typicallywritteninFORTRAN77,whichtheyareexpectedtoparallelizeforexecutiononadistributedmemorymultiprocessor(NOWorMPP).

Thefirststepinparallelizinganexistingsoftwarepackageistoachieveabasicunderstandingofitscontrolandatastructures,whichisnecessarytodefineaconceptforparallelization.Inaddition,comprehensiveprofilinghastobedonewitharepresentativesetoftestcasesinordertoidentifythecomputationallymostintensivemodules,i.e.themostpromisingcandidatesforparallelization.

Theknowledgegainedabouttheprogramstructureshouldbedocumentedinanappropriateformforfuturereference.Designdocumentsbytheauthorsoftheprogramoftenareunavailable,sincemanyscientificcodesaretheresultofalongdevelopmentprocessinvolvingseveralgenerationsofprogrammers.Evenifgooddesigndocumenta-tionisavailable,asisthecasewithTASCFLOW,itturnsoutthatdocumentationwrittenasareferenceforCFDdevelopersisinmanyaspectsinappropriateforthecomputerscientistfacedwiththeproblemofparallelizingthecode.

WethereforedecidedtoorganizeaseriesofmeetingswhereCFDdevelopersfromASCexplainedthealgorithmsandtheirimplementationinTASCFLOWtocomputerscientistsatLRR-TUM.Asaresultofthesemeetings,designdocumentshavebeenwrittenatLRR-TUMandreviewedbyASC.Wefoundthisprocedureveryeffectiveforacquiringtheknow-howthatisneededbeabletoanalyzetheprogramandtosetupaparallelizationconcept.Theimmediatereviewofourdocumentshashelpedtofixsourcesofmisconceptionataveryearlystage.

Itis,however,nottoocommoninscientificcomputingthatcodedeveloperscanprovideextensivesupportforunderstandingaprogram.Analternativewayforobtai-ninginsightintothestructureoftheprogramunderconsiderationistouseinteractiveparallelizationtools.Theycanprovidevaluableinsightaboutdataandcontrolflowdependencies,andprovidedetailedprofilinginformation.

MostofourinsightintoTASCFLOWresultsfromthedesigndocumentsourjointmeetingswithASC,whichcertainlyisduetothefactthatTASCFLOWhasbeenimple-mentedfollowingsoftwareengineeringguidelinesdefinedbythedevelopmentteam.DocumentationTheresultofanalyzingtheserialprogramaswellasconceptsforitsparallelizationhavetobedocumentedindetailtoprovideacommonrepositoryofinformationforfutureworkintheproject.Tobeuseful,thesedocumentsmustfol-lowstandardsthatwillhavetobedefinedaccordingtotherequirementsoftheintendedaudience.Ourexperienceisthattherequirementsfordesigndocumentationthatisinten-dedasareferenceforCFDdevelopersandnumericalanalystsdifferconsiderablyfromtherequirementsofcomputerscientistsinvolvedinparallelization.Havingdocumentsreviewedbyprojectmembersnotdirectlyinvolvedintheprocessbeingdocumentedhasprovedtobeveryhelpfulinassuringhigh-qualitydocumentation.

GeneralApproachestoParallelizationofScientificComputationsFormostscien-tificcomputations,SPMDlikeparallelismisanaturalchoicebecauseitscaleswellwiththesizeoftheproblem.BeingabletosolvelargerproblemsusuallyisthestrongestmotivationforparallelizingsoftwareinScientificComputing,especiallyinCFD.

Manyscientificapplicationsaresimilarinthestructureoftheirlogicalobjectsbutdifferintheirimplementation.Forexample,everyCFDpackageimplementstheconceptofagridandasystemoflinearequations.AnumberofdecisionsinimplementingSPMDlikeparallelismcanbeexpressedintermsoftheselogicalobjectsandthereforecanbegeneralizedtoacertainextend.Therefore,theparallelizationconceptforTASCFLOWhasbeendefinedattheleveloflogicalobjects.

Dataparallelismrequirestheproblemdescriptiontobepartitioned.Ithastobedecidesonwhichlogicalobjectsthedistributionistobebased.Datadistriubtionitselfisagraphpartitioningproblemforwhichanumberofsoftwarepackagesareavailable.

StandardsforProgramDevelopmentGuidelineshavetobesetupthatdefinethethestagesofthedevelopmentprocess:functionalspecification,highlevelanddetaileddesign,implementation,testandverification,documentation,andrelease.Itisimportant

thateachstageofspecificationanddesigniscarefullyreviewedbeforeproceedingtothenextstep.

StartingwiththesoftwareengineeringguidelinesapprovedinthedesignofTASCFLOW,wearedefiningnewmethodsthataddressissuesrelatedtoparallelism.Object-orientedConceptsinScientificComputingFORTRANstillistheprevailingprogramminglanguageinscientificcomputing.Fromthesoftwareengineeringpointofview,itisstronglydesirabletomovetonewerlanguagesthatprovidebettersupportforthedesignofmodularandre-usablesoftware.ExampleswhichareavailableforalargenumberofhardwareplatformsareC++andFortran90.

ThereareanumberofreasonsforthepersistenceofFORTRANinScientificCompu-ting.Firstofall,mostsoftwarepackageshavealonghistoryofdevelopment.Usually,thereisalotof“dustydeck”code,forwhichnodesigndocumentationexistsandwhichthereforeisusedasa“blackbox”.FORTRANcompilersusuallyimplementsophisticatedoptimizationtechniquesandthereforeproducefastobjectcode.Inaddition,thereisanumberofhighperformancenumericallibrariesforFORTRANprogrammers.

ManyengineershesitatetodealwithC++orFortran90becauseoftheefforttolearnanewlanguageandbecausetheyareconcernedaboutefficiency.Re-implementingawholesoftwarepackageinanewlanguageusuallyisinfeasiblebecauseoftimeandmanpowerrestrictions.However,inordertotakefulladvantageofobjectorientedfeatures,themaindatastructureshavetobere-implementedandencapsulated,whichrequiresthewholecodetoberewritten.

Ourapproachthereforehasbeentoidentifyasthebasisfordemonstratingobjectorientedtechniquesamoduleofreasonablebutmanageablesizethatcanbeimplementedasastand-aloneprogram.InTASCFLOW,wefoundthesmootherofthealgebraicmultigridsolvertobeagoodcandidate.ItisbeingimplementedinFortran90andC++.Itslimitedsizeallowsustoconsidertwolanguagesthatoffersupportforobjectorientedprogrammingandtospendtimeininvestigatingimplementationalternativesineachlanguage.Theobjectiveistodemonstratehowobjectorientedconceptscanbeusedina“realworld”scientificapplicationwithoutsacrificingtoomuchefficiency.1.2

ParallelizationofTASCflow

TASCFLOWisaCFDsimulationpackagethatsolvestheNavierStokesequationsin3dspace.Theprogram,whichisdevelopedandmarketedbyASC,canbeappliedtoawiderangeoffluidproblems.TASCFLOW’sfinitevolumediscretizationisbasedonunstructuredgridsthataregeneratedinapreprocessingstepbyfillingspacewithelementsofpredefinedtopologies,e.g.hexahedralandwedgeelements.Whilethediscretizationmethodisbasedonfinitevolumes,thecontrolflowinitsimplementationiselement-based.

Theresultofthediscretizationisasystemoflinearequations,whichisverysparse,structurallysymmetricanddiagonaldominant.ItssolutioniscomputedbyanAlgebraicMultigrid(AMG)solver[Raw94],whichusesamodifiedversionofincompleteLUdecompositioncalledILU0assmoother.Atthecoarsestgridlevel,eitherILU0(withalargernumberofiterations)oradirectsolvercanbeused.

TheparallelversionofTASCFLOWisintendedtorunonNOWsaswellasonMPPs.Bycoveringalargerangeofhardwareplatformswithdifferentpriceandperformancecharacteristics,theparallelprogramisbecominginterestingforalargeusercommunity.WhilegraduallymovingfromsmallscaleNOWstomorepowerfulconfigurationsincludingMPPs,userssavetheirinvestmentsinmodeldesign.1.3LoadBalancingandResourceManagement

Balanceddistributionofloadontoprocessorsisanimportantprerequisiteforparallelefficiency.Therearequiteafewfactorsthatmaycausecomputationalloadtobedis-tributedunevenly.StaticpartitioningofdataforanSPMDparallelprogrammayresultinpartitionsofunequalsize,orpartitionsofequalsizemayrequiredifferentamountsofcomputation.Ifgridsaregenerateddynamically,asisthecasewithallgridlevels(exceptthefinestone)inAMG,loadimbalanceisverylikelytooccur.Besidesunevendistributionofcomputationcausedbythealgorithm,multiprocessandmultiuseropera-tionespeciallyinNOWsaffectloaddistribution.Inheterogeneoussystems,processorsdifferintheircomputationalpoweraswellasinmainmemoryanddiskcapacity,sothatprocesseswillproceedatdifferentspeedsondifferentnodeseveniftheydothesamecomputationandtherearenootherprocessesontheirprocessors.

Inourproject,differentapproachestodynamicloadbalancingareconsidered:–Loadbalancingattheapplicationlevelbyrepartitioningtheproblemdescriptionatruntime.

–loadbalancingatthesytemlevelbyhavingaresourcemanagermigrateprocessesoftheparallelapplicationatneed.

–loadbalancingimplementedasacooperativeprocessofresourcemanagerandapplication.Theresourcemanagerprovidesinformationaboutthecurrentresourceutilizationandoffersamigrationservicethatallowstheapplicationtorequesttaskstobemoved.

2ProgressReport

Asastartingpointforourresearchinsoftwareengineering(SWE)methods,wehaveadoptedASC’ssetofguidelinesandprocedures,whichhavebeenappliedinthedesignandimplementationofTASCFLOW.Theanalysisoftheserialprogramhasbeenalmostfinished.Astandardizedformatfordocumentingitsdesignisbeingdefined.Asalreadymentioned,themosteffectivesourceofinformationaboutTASCFLOW’sdesignhavebeenourregularmeetingswithdevelopersfromASC.Wehavealsoconsideredaninteractiveparallelizer,FORGE[Lev92],which,however,untilnowcouldonlybeappliedsuccessfullytosomeprogramsthataremuchlesscomplexthanTASCFLOW.

FORGEidentifiesthemostsignificantloopsarebyeitherusingprofilinginformation,orcheckingthecodeforthedeepestloopnesting.Oncetheloopshavebeenchosen,thearraysreferencedinsideofthemarepartitionedanddistributedaccordingtothepartitions.Theparallelprocessesthenrunthesameprogrambutonlyonasubsetofthepartitioneddatastructuresfollowingtheso-calledownercomputesrule.

Theadvantageofthistypeoftoolisthattheusercangetabetterunderstandingoftheprogramthatheisabouttoparallelize.Healsoistakenofftheburdentoexplicitlyprogrammessagepassingcode.Ontheotherhandheanyhowhastohaveagoodunderstandingofhowmessagepassingreallyworksbecausethetoolsarenotyetatapointwheretheycanproduceefficientcode.NeithercantheyreallyworkonverycomplexpackagesasforexampleTASCflow.

Basedontheanalysisoftheserialprogram,aconceptforparallelizingTASCFLOWaccordingtotheSPMDparadigmhasbeendefinedanddocumentedattheleveloflogicalobjects.Asafirststeptowardsitsimplementation,apartitioningproce-durehasbeenintegratedintoTASCFLOW.WeusethepublicdomaingraphpartitioningpackageMeTiS[MET95]toassignthenodesofthegridtopartitions.Inordertoreducethenumberofsynchronizationpointsaswellastheamountofdatatobeexchanged,wehavedecidedtoreplicateelementswhosenodesbelongtodifferentpartitionsoneachofthepartitionsinvolved.

3ConclusionandFutureWork

SEMPAaddressestheproblemofsoftwareengineeringfordistributedmemoriesfromapointofviewthatemphasizespracticalapplicabilityofthemethodsthatarebeingproposed.Thereforeitisveryimportanttohavea“realworld”testcasefromwhichrequirementsfornewSWEmethodsaswellasfornewtoolscanbederivedandthenevaluatedinpractice.Thesameistrueforourresearchinloadbalancingandresourcemanagement.

TheresultoftheprojectwillbeprototypeimplementationsoftheparallelCFDsimulationpackageandtheresourcemanagerforNOWswhichareintendedtobecomeproductsaftertheprojecthasfinished.

Aswehavebeenreportingonworkinprogress,ouremphasishasbeenonmotivationandobjectives.Weexpecttobeabletoreportonsomeimplementationresultsbyspring1996.

References

[KN94]JosephA.KaplanundMichaelLNelson.AComparisonofQueueing,Clusterand

DistributedComputingSystems.Bericht,NASALangleyResearchCenter(Juni1994).

[Lev92]J.Levesque.FORGE90andHighPerformanceFortran(HPF).Bericht,Applied

ParallelResearch(Juni1992).

[MET95]METIS:UnstructuredGraphPartitioningandSparseMatrixOrderingSystem“.Ge-”

orgeKarypisandVipinKumar,UniversityofMinnesota(1995).

[Raw94]MichaelRaw.ACoupledAlgebraicMultigridMethodforthe3DNavierStokes

Equations(1994).

AThisarticlewasprocessedusingtheLTEXmacropackagewithLLNCSstyle

因篇幅问题不能全部显示,请点此查看更多更全内容

Top