
Optimal processor allocation for sort-last compositing under(4)

Source: collected from the web. Date: 2026-05-19

No. of renderers                 16                  64                  128
Compositing processors     Number   Time       Number   Time       Number   Time

Maximum number of compositing operations per processor:
1                            15     0.24s        63     0.24s       127     0.25s
2                            10     0.24s        42     0.26s        85     0.26s

Remaining rows, as extracted (column boundaries lost):
3581315214463752210.46s0.46s1.56s1.56s3.11s2721994210.46s0.47s1.35s1.35s3.80s6.91s13.8s554218189440.46s0.69s1.57s1.58s3.13s7.34s7.34s80100214s214s

Table 1. Number of compositing processors used and compositing time after first image (to ignore pipeline startup overhead) for a full BSP-tree with 16, 64, or 128 rendering processors.

SL-full versus SL-sparse: Note that one can easily replace the SL-full implementation above with an SL-sparse one. All the complexity of manipulating sparse images can be localized inside the functions that send and receive the images. For maximum performance and flexibility, the rendering algorithms should generate images in scan-line order, and also provide a compact representation for scan-lines (i.e., only the full pixels are represented).

5. PERFORMANCE RESULTS

For our experiments, we used an Intel Paragon XP/S running SUNMOS (installed at Sandia National Laboratories), and we used it in NX compatibility mode. We studied the maximum frame rate that can be achieved with our pipelined evaluation scheme for a given full BSP-tree, and how this frame rate degrades as we increase K (i.e., the maximum number of compositing operations performed on one compositing processor). We ran tests over three different rendering configurations: 16, 64 and 128 rendering processors; and several variations on K, leading to several compositing configurations. Note that as K is increased up to the number of compositing operations, the number of compositing processors decreases. Table 1 summarizes the performance on each of these configurations; Figure 5 shows graphically how the compositing time changes as K increases. In these tests, our primary interest was to study the correlation of K and the compositing time. To achieve the correct effect, we need to make the compositing cluster operate at its maximum speed (for a given K). This was done by making the rendering processors render a single image, and simply send the same image on every subsequent request. This is the scenario where the compositing cluster is the bottleneck of the rendering process.

The times reported in Table 1 are those reported by the PVR collector node, and represent actual wall-clock times. That is, if rendering were fast enough, and the images could be pushed onto a frame buffer by the collector, this would be the actual frame rate a user would get. In particular, nothing else needs to be done to the images to prepare them for presentation; in fact, the fully composed image is stored on the node that contains the root of the compositing tree. Note that frame-to-frame coherence does not matter, since we are not exploiting a sparse image representation (these experiments only show the performance of an SL-full architecture).

The compositing capacity required of a compositing pipeline is defined as the number of frames that need to be composed per unit time. Note that by varying the number of rendering nodes from 16 to 128, we essentially make the compositing tree work harder. With 16 rendering nodes producing images at 4 frames per second, we need a compositing capacity of 64 frames per second in our compositing pipeline. The Intel Paragon has very slow processors by today's standards: using our image representation (an RGBA image is stored as four floats per pixel), it takes 0.22 seconds to alpha composite two 250 × 250 images. So, a single processor can composite 4.5 frames per second. A compositing pipeline with a capacity of 64 frames per second therefore needs 14.2 (64/4.5) processors. Hence, in our experimental setup, the compositing pipeline forms the bottleneck even at the lowest rendering speeds (16 rendering nodes at 4 frames a second).
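The capacity arithmetic in the paragraph above can be retraced with a short sketch. The function names are hypothetical and chosen only for illustration; the numbers (16 nodes, 4 frames per second, 0.22 seconds per composite, the rounded rate of 4.5) are the ones quoted in the text.

```python
import math

COMPOSITE_TIME_S = 0.22  # time to alpha composite two 250 x 250 images (from the text)


def required_capacity(num_render_nodes: int, frames_per_second: float) -> float:
    """Frames the compositing pipeline must compose per second."""
    return num_render_nodes * frames_per_second


def single_processor_rate(composite_time_s: float = COMPOSITE_TIME_S) -> float:
    """Composites per second that one processor can sustain."""
    return 1.0 / composite_time_s


capacity = required_capacity(16, 4)          # 64 frames per second
rate = round(single_processor_rate(), 1)     # 4.5, the rounded figure used in the text
processors = capacity / rate                 # 64 / 4.5
whole_processors = math.ceil(processors)     # processors cannot be fractional

print(f"capacity={capacity}, rate={rate}, processors={processors:.1f}, "
      f"rounded up={whole_processors}")
```

Rounding up to a whole processor count is an added step for illustration; the text itself only states the fractional figure of 14.2.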

The following observations can be drawn from the data in Table 1:

As K increases, the frame rates decrease accordingly, since the compositing capacity decreases. We can also see that our tree partitioning scheme is effective in distributing the load. One can clearly see the finite boundaries where it is possible to save a processor and still achieve the same frame rate (e.g., when K is equal to 8 or 13). In fact, with our partitioning algorithm, one can reliably predict the frame rate based (almost) solely on the compositing capacity of a given compositing tree.
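The partitioning algorithm itself is not reproduced in this section. As a rough illustration of how a bound K translates into a processor count, the sketch below uses a generic greedy bottom-up partition of a full binary compositing tree into connected groups of at most K compositing nodes. This is an assumed stand-in, not the authors' algorithm, and `partition` and `processors_needed` are hypothetical names. It also assumes that the number of rendering processors is a power of two, so that n renderers feed a full tree of n - 1 compositing nodes.

```python
def partition(height: int, k: int) -> tuple[int, int]:
    """Greedily partition a full binary tree of the given height.

    Returns (open_size, closed_groups): the size of the group that still
    contains the subtree root, and the number of groups already cut off.
    """
    if height == 0:
        return 0, 0  # empty subtree (below the lowest compositing node)
    child_open, child_closed = partition(height - 1, k)
    open_sizes = [child_open, child_open]  # both children are identical
    closed = 2 * child_closed
    size = 1 + sum(open_sizes)
    # While the root's group is too large, cut off the largest child group.
    while size > k:
        size -= open_sizes.pop()
        closed += 1
    return size, closed


def processors_needed(num_renderers: int, k: int) -> int:
    """Groups (one per compositing processor) for n renderers and bound k >= 1."""
    height = num_renderers.bit_length() - 1  # log2 for powers of two
    _, closed = partition(height, k)
    return closed + 1  # the group containing the tree root


for n in (16, 64, 128):
    print(n, [processors_needed(n, k) for k in (1, 2)])
```

Under these assumptions the sketch yields 15, 63 and 127 groups for K = 1 and 10, 42 and 85 groups for K = 2, which happen to agree with the processor counts listed in the first two rows of Table 1. No claim is made that it matches the remaining rows.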

In our measurements, this was accomplished by making ComputeImage return a pre-computed image immediately upon call.

In this paper, we consider a parallel rendering model that exploits the fundamental distinction between rendering and compositing operations, by assigning processors from specialized pools for each of these operations. Our motivation is to support the para

[Figure 5: plot of compositing time in seconds (vertical axis, 0 to 16) against K (horizontal axis, 0 to 100); legend entry: 16 Rendering Processors.]

Figure 5. Variation of overall compositing time with K across the three benchmarked configurations.

Notice that the compositing capacity needed increases with the number of rendering nodes. In order to keep the desired frame rate, one needs to increase the number of processors allocated to the compositing tree. By keeping K constant, this is achieved automatically, since the number of compositing processors needed also grows, and can be computed by our optimal partitioning algorithm.

Our asynchronous evaluation of the compositing tree hides almost all the communication cost. In fact, the frame rates of the pipeline are independent of its depth. Furthermore, the overall speed of the pipeline is directly related to the maximum number of compositions performed by each node (related to K). For instance, when every processor performs one compositing operation, the frame rate is 4 frames per second (i.e., 0.25 s per image), extremely close to the best achievable frame rate of 4.5. As can be seen from the data, it also degrades gracefully.
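The relationship described above can be captured in a small model. The sketch assumes an idealized pipeline in which communication is fully hidden and every composite costs the 0.22 seconds quoted earlier, so the steady-state time per frame is set only by the busiest processor. The function names are hypothetical, and the linear cost model is an assumption for illustration, not a result reported by the authors.

```python
COMPOSITE_TIME_S = 0.22  # time per compositing operation (from the text)


def steady_state_frame_time(ops_per_processor: list[int]) -> float:
    """Seconds per frame once the pipeline is full.

    In a pipeline, throughput is limited by the slowest stage, here the
    processor that performs the most compositing operations per frame.
    The number of stages (the pipeline depth) does not enter the result.
    """
    return max(ops_per_processor) * COMPOSITE_TIME_S


def steady_state_frame_rate(ops_per_processor: list[int]) -> float:
    """Frames per second once the pipeline is full."""
    return 1.0 / steady_state_frame_time(ops_per_processor)


shallow = [1, 1, 1]       # three processors, one composite each
deep = [1] * 15           # fifteen processors, one composite each
uneven = [2, 1, 1, 1]     # one processor does two composites

print(round(steady_state_frame_rate(shallow), 1))
print(round(steady_state_frame_rate(deep), 1))
print(round(steady_state_frame_time(uneven), 2))
```

With one operation per processor the model gives the ideal rate of about 4.5 frames per second regardless of depth; the measured 4 frames per second quoted above sits slightly below it, which is consistent with a small residual overhead that the model ignores.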

6. RELATED WORK

Most parallel rendering work (for both geometric and volumetric primitives) on general purpose MIMD machines has used the same processors for both phases. In fact, many techniques have been devised to effectively interleave the two phases on one processor. For example, in volume rendering using the Binary-Swap method, all processes synchronize between rendering and compositing phases as well as during composition. For polygon rendering, the method described by Ellsworth changes states locally between the transformation and the rasterization phases, avoiding global synchronization. In contrast, using processors that perform specialized tasks, the rendering and compositing phases can overlap in time, and in fact, can be pipelined. Hardware builders have been using dual type configurations for a long time. The distinction between the two categories has bee ……
