Optimal processor allocation for sort-last compositing under(4)


Maximum number of                     No. of renderers
compositing operations        16              64              128
per processor                      Compositing processors
                         Number  Time    Number  Time    Number  Time
1                          15    0.24s     63    0.24s    127    0.25s
2                          10    0.24s     42    0.26s     85    0.26s

3581315214463752210.46s0.46s1.56s1.56s3.11s2721994210.46s0.47s1.35s1.35s3.80s6.91s13.8s554218189440.46s0.69s1.57s1.58s3.13s7.34s7.34s80100214s214s

Table 1. Number of compositing processors used and compositing time after first image (to ignore pipeline startup overhead) for a full BSP-tree with 16, 64, or 128 rendering processors.

SL-full versus SL-sparse: Note that one can easily replace the SL-full implemented above with a SL-sparse. All the complexity of manipulating sparse images can be localized inside the functions that send and receive the images. For maximum performance and flexibility, the rendering algorithms should generate images in scan-line order, and also provide a compact representation for scan-lines (i.e., only the full pixels are represented).
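As a rough illustration of such a compact scan-line representation, the sketch below stores only runs of non-empty pixels. The names `encode_scanline` and `decode_scanline`, the run format, and the choice of "alpha is non-zero" as the meaning of a full pixel are assumptions made for this example; they are not taken from the implementation described here.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float, float]  # (R, G, B, alpha); assumed layout
Run = Tuple[int, List[Pixel]]              # (start column, consecutive full pixels)

EMPTY: Pixel = (0.0, 0.0, 0.0, 0.0)


def encode_scanline(line: List[Pixel]) -> List[Run]:
    """Keep only runs of pixels whose alpha is non-zero (assumed meaning of 'full')."""
    runs: List[Run] = []
    current: List[Pixel] = []
    start = 0
    for x, px in enumerate(line):
        if px[3] != 0.0:
            if not current:
                start = x          # a new run begins at this column
            current.append(px)
        elif current:
            runs.append((start, current))  # an empty pixel closes the run
            current = []
    if current:
        runs.append((start, current))      # run that reaches the end of the line
    return runs


def decode_scanline(runs: List[Run], width: int) -> List[Pixel]:
    """Expand runs back into a full-width scan-line padded with EMPTY pixels."""
    line = [EMPTY] * width
    for start, pixels in runs:
        line[start:start + len(pixels)] = pixels
    return line
```

With a representation of this kind, only the send and receive functions need to call the encoder and decoder; the compositing code can keep working on full scan-lines.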

5. PERFORMANCE RESULTS

For our experiments, we used an Intel Paragon XP/S running SUNMOS (installed at Sandia National Laboratories), and we used it in NX compatibility mode. We studied the maximum frame rate that can be achieved with our pipelined evaluation scheme for a given full BSP-tree, and how this frame rate degrades as we increase K (i.e., the maximum number of compositing operations performed on one compositing processor). We ran tests over three different rendering configurations: 16, 64 and 128 rendering processors; and several variations on K, leading to several compositing configurations. Note that as K increases from 1 to the total number of compositing operations in the tree, the number of compositing processors decreases from that total to 1. Table 1 summarizes the performance on each of these configurations; Figure 5 shows graphically how the compositing time changes as K increases. In these tests, our primary interest was to study the correlation of K and the compositing time. To achieve the correct effect, we need to make the compositing cluster operate at its maximum speed (for a given K). This was done by making the rendering processors render a single image, and simply send the same image on every subsequent request. This is the scenario where the compositing cluster is the bottleneck of the rendering process.

The times reported in Table 1 are those reported by the PVR collector node, and represent actual wall-clock times. That is, if rendering were fast enough, and the images could be pushed onto a frame buffer by the collector, this would be the actual frame rate a user would get. In particular, nothing else needs to be done to the images to prepare them for presentation; in fact, the fully composed image is stored on the node that contains the root of the compositing tree. Note that frame-to-frame coherence does not matter, since we are not exploiting a sparse image representation (these experiments only show the performance of a SL-full architecture).

The compositing capacity required of a compositing pipeline is defined as the number of frames that need to be composed per unit time. Note that by varying the number of rendering nodes from 16 to 128, essentially, we make the compositing tree work harder. With 16 rendering nodes producing images at 4 frames per second, we need a compositing capacity of 64 frames per second in our compositing pipeline. The Intel Paragon has very slow processors by today's standards: using our image representation (an RGBA image is stored as four floats per pixel), it takes 0.22 seconds to alpha composite two 250 × 250 images. So, a single processor can composite 4.5 frames per second. Obtaining a compositing pipeline with a capacity of 64 frames per second therefore needs 14.2 (64/4.5) processors. Hence, in our experimental setup, the compositing pipeline forms the bottleneck even at the lowest rendering speeds (16 rendering nodes at 4 frames a second).
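The arithmetic in this paragraph can be reproduced with a short sketch. The function names below are hypothetical and chosen only for this example; the inputs (16 renderers, 4 frames per second, 0.22 seconds per composite) are the figures quoted above.

```python
import math


def required_capacity(renderers: int, fps_per_renderer: float) -> float:
    """Frames per second the compositing pipeline has to absorb."""
    return renderers * fps_per_renderer


def processors_needed(capacity_fps: float, composite_seconds: float) -> float:
    """Fractional number of processors needed to sustain capacity_fps."""
    per_processor_fps = 1.0 / composite_seconds  # about 4.5 frames/s at 0.22 s
    return capacity_fps / per_processor_fps


capacity = required_capacity(16, 4.0)         # 64 frames per second
needed = processors_needed(capacity, 0.22)    # about 14.08 without rounding
whole_processors = math.ceil(needed)          # 15 whole processors
```

The text rounds the per-processor rate to 4.5 frames per second before dividing, which gives 14.2; without that rounding the quotient is about 14.08. Either way the pipeline would need 15 whole processors to reach the stated capacity.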

The following observations can be drawn from the data in Table 1:

As K increases, the frame rates decrease accordingly, since the compositing capacity decreases. We can also see that our tree partitioning scheme is effective in distributing the load. One can clearly see the finite boundaries where it is possible to save a processor and still achieve the same frame rate (e.g., when K is equal to 8 or 13). In fact, with our partitioning algorithm, one can reliably predict the frame rate based (almost) solely on the compositing capacity of a given compositing tree.

In our measurements, this was accomplished by making ComputeImage return a pre-computed image immediately upon call.

In this paper, we consider a parallel rendering model that exploits the fundamental distinction between rendering and compositing operations, by assigning processors from specialized pools for each of these operations. Our motivation is to support the para

[Plot: time in seconds versus K (0 to 100); recovered legend entry: 16 Rendering Processors]

Figure 5. Variation of overall compositing time with K across the three benchmarked configurations. Notice that the compositing capacity needed increases with the number of rendering nodes. In order to keep the desired frame rate, one needs to increase the number of processors allocated to the compositing tree. By keeping K constant, this is achieved automatically, since the number of compositing processors needed also grows, and can be computed by our optimal partitioning algorithm.

Our asynchronous evaluation of the compositing tree hides almost all the communication cost. In fact, the frame rates of the pipeline are independent of its depth. Furthermore, the overall speed of the pipeline is directly related to the maximum number of compositions performed by each node (related to K). For instance, when every processor performs one compositing operation, the frame rate is 4 frames per second (i.e., 0.25 s per image), extremely close to the best achievable frame rate of 4.5. As can be seen from the data, it also degrades gracefully.
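A simplified way to read this observation is that the steady-state time per frame is set by the most heavily loaded compositing processor, not by the depth of the tree. The sketch below encodes that reading. The function name `predicted_frame_time` and the model itself (communication cost ignored, 0.22 seconds per compositing operation) are assumptions made for illustration, not the analysis used in this paper.

```python
from typing import Sequence


def predicted_frame_time(ops_per_processor: Sequence[int],
                         composite_seconds: float = 0.22) -> float:
    """Assumed model: the busiest processor paces the whole pipeline.

    ops_per_processor holds the number of compositing operations assigned
    to each compositing processor; tree depth does not appear at all.
    Raises ValueError if the sequence is empty.
    """
    return max(ops_per_processor) * composite_seconds


# One operation per processor: the prediction is the same for 15 or 127 processors.
small_tree = predicted_frame_time([1] * 15)
large_tree = predicted_frame_time([1] * 127)
```

Under this model both configurations come out at 0.22 seconds per frame, which is in line with the 0.24 s to 0.25 s measured in the first row of Table 1; the remaining difference would be overhead that the model leaves out.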

6. RELATED WORK

Most parallel rendering work (for both geometric and volumetric primitives) on general purpose MIMD machines has used the same processors for both phases. In fact, many techniques have been devised to effectively interleave the two phases on one processor. For example, in volume rendering using the Binary-Swap method, all processes synchronize between rendering and compositing phases as well as during composition. For polygon rendering, the method described by Ellsworth changes states locally between the transformation and the rasterization phases, avoiding global synchronization. In contrast, using processors that perform specialized tasks, the rendering and compositing phases can overlap in time, and in fact, can be pipelined. Hardware builders have been using dual type configurations for a long time. The distinction between the two categories has bee ……
