教学文库网 - 权威文档分享云平台
您的当前位置:首页 > 文库大全 > 资格考试 >

2012--Super-Bit Locality-Sensitive Hashing(3)

来源:网络收集 时间:2026-02-09
导读: Foreachdataset,wefurtherconductasimplepreprocessingstepasin[12],i.e.mean-centeringeachdatasample,soastoobtainadditionalmean-centeredversionsoftheabovedatasets,PhotoTourismSIFT(mean),andMIR-Flickr(mea

Foreachdataset,wefurtherconductasimplepreprocessingstepasin[12],i.e.mean-centeringeachdatasample,soastoobtainadditionalmean-centeredversionsoftheabovedatasets,PhotoTourismSIFT(mean),andMIR-Flickr(mean).Theexperimentonthesemean-centereddatasetswilltesttheperformanceofSBLSHwhentheanglesofdatapairsarenotconstrainedin(0,π/2].

3.1.2TheEffectofSuper-BitDepthNandCodeLengthK

Ineachdataset,foreach(N,K)pair,i.e.Super-BitdepthNandcodelengthK,werandomlysample10,000data,whichinvolveabout50,000,000datapairs,andrandomlygenerateSRP-LSHfunctions,togetherwithSBLSHfunctionsbyorthogonalizingthegeneratedSRPinbatches.Werepeatthetestfor10times,andcomputethemeansquarederror(MSE)oftheestimation.

TotesttheeffectofSuper-BitdepthN,we xK=120forPhotoTourismSIFTandK=3000forMIR-Flickrrespectively,andtotesttheeffectofcodelengthK,we xN=120forPhotoTourismSIFTandN=3000forMIR-Flickr.Werepeattheexperimentonthemean-centeredversionsofthesedatasets,anddenotethemethodsbyMean+SRP-LSHandMean+SBLSHrespectively.1

2http://phototour.cs.washington.edu/patches/default.htmhttp://users.ecs.soton.ac.uk/jsh2/mirflickr/

SRP-LSH SBLSH Mean+SRP-LSH Mean+SBLSH

SRP-LSH SRP-LSH SBLSH SBLSH Mean+SRP-LSH Mean+SRP-LSH Mean+SBLSH Mean+SBLSH

Figure3:TheeffectofSuper-BitdepthN(1<N≤min(d,K))with xedcodelengthK(K=N×L),andtheeffectofcodelengthKwith xedSuper-BitdepthN.

Table1:ANNretrievalresults,measuredbyproportionofgoodneighborswithinquery’sHammingballofradius3.NotethatthecodelengthK=30.

Data

NotreDame

HalfDome

TreviE2LSH0.4675±0.09000.4503±0.07120.4661±0.0849SRP-LSH0.7500±0.05250.7137±0.04130.7591±0.0464SBLSH0.7845±0.03520.7535±0.02760.7891±0.0329

Figure3showsthatwhenusing xedcodelengthK,astheSuper-BitdepthNgetslarger(1<N≤min(d,K)),theMSEofSBLSHgetssmaller,andthegapbetweenSBLSHandSRP-LSHgetslarger.Particularly,whenN=K,over30%MSEreductioncanbeobservedonallthedatasets.Thisveri esCorollary2thatwhenapplyingSBLSH,thebeststrategywouldbetosettheSuper-BitdepthNaslargeaspossible,i.e.min(d,K).Aninformalexplanationtothisinterestingphenomenonisthatasthedegreeoforthogonalityoftherandomprojectionsgetshigher,thecodebecomesmoreandmoreinformative,andthusprovidesbetterestimate.Ontheotherhand,itcanbeobservedthattheperformancesonthemean-centereddatasetsaresimilarasontheoriginaldatasets.Thisshowsthatevenwhentheanglebetweeneachdatapairisnotconstrainedin(0,π/2],SBLSHstillgivesmuchmoreaccurateestimation.

Figure3alsoshowsthatwith xedSuper-BitdepthNSBLSHsigni cantlyoutperformsSRP-LSH.WhenincreasingthecodelengthK,theaccuraciesofSBLSHandSRP-LSHshallbothincrease.Theperformancesonthemean-centereddatasetsaresimilarasontheoriginaldatasets.

3.2ApproximateNearestNeighborRetrieval

Inthissubsection,weconductANNretrievalexperiment,whichcomparesSBLSHwithtwootherwidelyuseddata-independentbinaryLSHmethods:SRP-LSHandE2LSH(weusethebinaryver-sionin[23],theoriginalversionisin[1]).WeusethedatasetsNotreDame,HalfDomeandTrevifromthePhotoTourismpatchdataset[24],whichisalsousedin[12,10,13]forANNretrieval.Weuse128DSIFTrepresentationandnormalizethevectorstounitnorm.Foreachdataset,werandomlypick1,000samplesasqueries,andtherestsamples(around100,000)asthecorpusfortheretrievaltask.Wede nethegoodneighborstoaqueryqasthesampleswithinthetop5%nearestneighbors(measuredinEuclideandistance)toq.Weadopttheevaluationcriteriausedin[12,23],i.e.theproportionofgoodneighborsinreturnedsamplesthatarewithinthequery’sHammingballofradiusr.Wesetr=ingcodelengthK=30,werepeattheexperimentfor10timesandtakethemeanoftheresults.ForSBLSH,we xtheSuper-BitdepthN=K=30.Table1showsthatSBLSHperformsbestamongallthesedata-independenthashingmethods.

4RelationstoOtherHashingMethods

ThereexistdifferentkindsofLSHmethods,e.g.bit-samplingLSH[9,7]forHammingdistanceand 1-distance,min-hash[2]forJaccardcoef cient,p-stable-distributionLSH[6]for p-distancewhenp∈(0,2].Thesedata-independentmethodsaresimple,thuseasytobeintegratedasamoduleinmorecomplicatedalgorithmsinvolvingpairwisedistanceorsimilaritycomputation,e.g.nearestneighborsearch.Newdata-independentmethodsforimprovingtheseoriginalLSHmethodshave

beenproposedrecently.[1]proposedanear-optimalLSHmethodforEuclideandistance.Lietal.

[16]proposedb-bitminwisehashwhichimprovestheoriginalmin-hashintermsofcompactness.

[17]showsthatb-bitminwisehashcanbeintegratedinlinearlearningalgorithmsforlarge-scalelearningtasks.[14]reducesthevarianceofrandomprojectionsbytakingadvantageofmarginalnorms,andcomparesthevarianceofSRPwithregularrandomprojectionsconsideringthemargins.

[15]proposedverysparserandomprojectionsforacceleratingrandomprojectionsandSRP.

PriortoSBLSH,SRP-LSH[3]wastheonlyhashingmethodproventoprovideunbiasedestimateofangularsimilarity.TheproposedSBLSHmethodisthe rstdata-independentmethodthatoutper-formsSRP-LSHintermsofhigheraccuracyinestimatingangularsimilarity.

Ontheotherhand,data-dependenthashingmethodshavebeenextensivelystudied.Spectralhashing

[23]isadata-dependentunsupervisedmethodforEuclideandistance.Kulisetal.[13]proposedker-nelizedlocality-sensitivehashing(KLSH),whichisbasedonSRP-LSH,toapproximatetheangularsimilarityinveryhighorevenin nitedimensionalspaceinducedbyanygivenkernel,withaccesstodataonlyviakernels.Therearealsoabunchofworksdevotedtosemi-supervisedorsupervisedhashingmethods[10,19,21,22],whichtrytocapturenotonlythegeometryoftheoriginaldata,butalsothesemanticrelations.

5Discussion

InsteadoftheGram-Schmidtprocess,wecanuseothermethodtoorthogonalizetheprojectionvec-tors,e.g.Householdertransformation,whichisnumericallymorestable.TheadvantageofGram-Schmidtprocessisitssimplicityindescribingthealgorithmandbuildinguptheoreticalguarantees.Inthepaperwedidnottestthemethodondataofveryhighdimension.Whenthedimensionishigh,andNisnotsmall,theGram-Schmidtprocesswillbecomputationallyexpensive.Infact,whenthedimensionofdataisveryhigh,therandomnormalprojectionvectors{vi}i=1,2...,Kwilltendto …… 此处隐藏:5916字,全部文档内容请下载后查看。喜欢就下载吧 ……

2012--Super-Bit Locality-Sensitive Hashing(3).doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
本文链接:https://www.jiaowen.net/wenku/96811.html(转载请注明文章来源)
Copyright © 2020-2025 教文网 版权所有
声明 :本网站尊重并保护知识产权,根据《信息网络传播权保护条例》,如果我们转载的作品侵犯了您的权利,请在一个月内通知我们,我们会及时删除。
客服QQ:78024566 邮箱:78024566@qq.com
苏ICP备19068818号-2
Top
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)