2012--Super-Bit Locality-Sensitive Hashing(3)
Foreachdataset,wefurtherconductasimplepreprocessingstepasin[12],i.e.mean-centeringeachdatasample,soastoobtainadditionalmean-centeredversionsoftheabovedatasets,PhotoTourismSIFT(mean),andMIR-Flickr(mean).Theexperimentonthesemean-centereddatasetswilltesttheperformanceofSBLSHwhentheanglesofdatapairsarenotconstrainedin(0,π/2].
3.1.2TheEffectofSuper-BitDepthNandCodeLengthK
Ineachdataset,foreach(N,K)pair,i.e.Super-BitdepthNandcodelengthK,werandomlysample10,000data,whichinvolveabout50,000,000datapairs,andrandomlygenerateSRP-LSHfunctions,togetherwithSBLSHfunctionsbyorthogonalizingthegeneratedSRPinbatches.Werepeatthetestfor10times,andcomputethemeansquarederror(MSE)oftheestimation.
TotesttheeffectofSuper-BitdepthN,we xK=120forPhotoTourismSIFTandK=3000forMIR-Flickrrespectively,andtotesttheeffectofcodelengthK,we xN=120forPhotoTourismSIFTandN=3000forMIR-Flickr.Werepeattheexperimentonthemean-centeredversionsofthesedatasets,anddenotethemethodsbyMean+SRP-LSHandMean+SBLSHrespectively.1
2http://phototour.cs.washington.edu/patches/default.htmhttp://users.ecs.soton.ac.uk/jsh2/mirflickr/
SRP-LSH SBLSH Mean+SRP-LSH Mean+SBLSH
SRP-LSH SRP-LSH SBLSH SBLSH Mean+SRP-LSH Mean+SRP-LSH Mean+SBLSH Mean+SBLSH
Figure3:TheeffectofSuper-BitdepthN(1<N≤min(d,K))with xedcodelengthK(K=N×L),andtheeffectofcodelengthKwith xedSuper-BitdepthN.
Table1:ANNretrievalresults,measuredbyproportionofgoodneighborswithinquery’sHammingballofradius3.NotethatthecodelengthK=30.
Data
NotreDame
HalfDome
TreviE2LSH0.4675±0.09000.4503±0.07120.4661±0.0849SRP-LSH0.7500±0.05250.7137±0.04130.7591±0.0464SBLSH0.7845±0.03520.7535±0.02760.7891±0.0329
Figure3showsthatwhenusing xedcodelengthK,astheSuper-BitdepthNgetslarger(1<N≤min(d,K)),theMSEofSBLSHgetssmaller,andthegapbetweenSBLSHandSRP-LSHgetslarger.Particularly,whenN=K,over30%MSEreductioncanbeobservedonallthedatasets.Thisveri esCorollary2thatwhenapplyingSBLSH,thebeststrategywouldbetosettheSuper-BitdepthNaslargeaspossible,i.e.min(d,K).Aninformalexplanationtothisinterestingphenomenonisthatasthedegreeoforthogonalityoftherandomprojectionsgetshigher,thecodebecomesmoreandmoreinformative,andthusprovidesbetterestimate.Ontheotherhand,itcanbeobservedthattheperformancesonthemean-centereddatasetsaresimilarasontheoriginaldatasets.Thisshowsthatevenwhentheanglebetweeneachdatapairisnotconstrainedin(0,π/2],SBLSHstillgivesmuchmoreaccurateestimation.
Figure3alsoshowsthatwith xedSuper-BitdepthNSBLSHsigni cantlyoutperformsSRP-LSH.WhenincreasingthecodelengthK,theaccuraciesofSBLSHandSRP-LSHshallbothincrease.Theperformancesonthemean-centereddatasetsaresimilarasontheoriginaldatasets.
3.2ApproximateNearestNeighborRetrieval
Inthissubsection,weconductANNretrievalexperiment,whichcomparesSBLSHwithtwootherwidelyuseddata-independentbinaryLSHmethods:SRP-LSHandE2LSH(weusethebinaryver-sionin[23],theoriginalversionisin[1]).WeusethedatasetsNotreDame,HalfDomeandTrevifromthePhotoTourismpatchdataset[24],whichisalsousedin[12,10,13]forANNretrieval.Weuse128DSIFTrepresentationandnormalizethevectorstounitnorm.Foreachdataset,werandomlypick1,000samplesasqueries,andtherestsamples(around100,000)asthecorpusfortheretrievaltask.Wede nethegoodneighborstoaqueryqasthesampleswithinthetop5%nearestneighbors(measuredinEuclideandistance)toq.Weadopttheevaluationcriteriausedin[12,23],i.e.theproportionofgoodneighborsinreturnedsamplesthatarewithinthequery’sHammingballofradiusr.Wesetr=ingcodelengthK=30,werepeattheexperimentfor10timesandtakethemeanoftheresults.ForSBLSH,we xtheSuper-BitdepthN=K=30.Table1showsthatSBLSHperformsbestamongallthesedata-independenthashingmethods.
4RelationstoOtherHashingMethods
ThereexistdifferentkindsofLSHmethods,e.g.bit-samplingLSH[9,7]forHammingdistanceand 1-distance,min-hash[2]forJaccardcoef cient,p-stable-distributionLSH[6]for p-distancewhenp∈(0,2].Thesedata-independentmethodsaresimple,thuseasytobeintegratedasamoduleinmorecomplicatedalgorithmsinvolvingpairwisedistanceorsimilaritycomputation,e.g.nearestneighborsearch.Newdata-independentmethodsforimprovingtheseoriginalLSHmethodshave
beenproposedrecently.[1]proposedanear-optimalLSHmethodforEuclideandistance.Lietal.
[16]proposedb-bitminwisehashwhichimprovestheoriginalmin-hashintermsofcompactness.
[17]showsthatb-bitminwisehashcanbeintegratedinlinearlearningalgorithmsforlarge-scalelearningtasks.[14]reducesthevarianceofrandomprojectionsbytakingadvantageofmarginalnorms,andcomparesthevarianceofSRPwithregularrandomprojectionsconsideringthemargins.
[15]proposedverysparserandomprojectionsforacceleratingrandomprojectionsandSRP.
PriortoSBLSH,SRP-LSH[3]wastheonlyhashingmethodproventoprovideunbiasedestimateofangularsimilarity.TheproposedSBLSHmethodisthe rstdata-independentmethodthatoutper-formsSRP-LSHintermsofhigheraccuracyinestimatingangularsimilarity.
Ontheotherhand,data-dependenthashingmethodshavebeenextensivelystudied.Spectralhashing
[23]isadata-dependentunsupervisedmethodforEuclideandistance.Kulisetal.[13]proposedker-nelizedlocality-sensitivehashing(KLSH),whichisbasedonSRP-LSH,toapproximatetheangularsimilarityinveryhighorevenin nitedimensionalspaceinducedbyanygivenkernel,withaccesstodataonlyviakernels.Therearealsoabunchofworksdevotedtosemi-supervisedorsupervisedhashingmethods[10,19,21,22],whichtrytocapturenotonlythegeometryoftheoriginaldata,butalsothesemanticrelations.
5Discussion
InsteadoftheGram-Schmidtprocess,wecanuseothermethodtoorthogonalizetheprojectionvec-tors,e.g.Householdertransformation,whichisnumericallymorestable.TheadvantageofGram-Schmidtprocessisitssimplicityindescribingthealgorithmandbuildinguptheoreticalguarantees.Inthepaperwedidnottestthemethodondataofveryhighdimension.Whenthedimensionishigh,andNisnotsmall,theGram-Schmidtprocesswillbecomputationallyexpensive.Infact,whenthedimensionofdataisveryhigh,therandomnormalprojectionvectors{vi}i=1,2...,Kwilltendto …… 此处隐藏:5916字,全部文档内容请下载后查看。喜欢就下载吧 ……
相关推荐:
- [资格考试]石油钻采专业设备项目可行性研究报告编
- [资格考试]2012-2013学年度第二学期麻风病防治知
- [资格考试]道路勘测设计 绪论
- [资格考试]控烟戒烟知识培训资料
- [资格考试]建设工程安全生产管理(三类人员安全员
- [资格考试]photoshop制作茶叶包装盒步骤平面效果
- [资格考试]授课进度计划表封面(09-10下施工)
- [资格考试]麦肯锡卓越工作方法读后感
- [资格考试]2007年广西区农村信用社招聘考试试题
- [资格考试]软件实施工程师笔试题
- [资格考试]2014年初三数学复习专练第一章 数与式(
- [资格考试]中国糯玉米汁饮料市场发展概况及投资战
- [资格考试]塑钢门窗安装((专项方案)15)
- [资格考试]初中数学答题卡模板2
- [资格考试]2015-2020年中国效率手册行业市场调查
- [资格考试]华北电力大学学习实践活动领导小组办公
- [资格考试]溃疡性结肠炎研究的新进展
- [资格考试]人教版高中语文1—5册(必修)背诵篇目名
- [资格考试]ISO9001-2018质量管理体系最新版标准
- [资格考试]论文之希尔顿酒店集团进入中国的战略研
- 全国中小学生转学申请表
- 《奇迹暖暖》17-支2文学少女小满(9)公
- 2019-2020学年八年级地理下册 第六章
- 2005年高考试题——英语(天津卷)
- 无纺布耐磨测试方法及标准
- 建筑工程施工劳动力安排计划
- (目录)中国中央空调行业市场深度调研分
- 中国期货价格期限结构模型实证分析
- AutoCAD 2016基础教程第2章 AutoCAD基
- 2014-2015学年西城初三期末数学试题及
- 机械加工工艺基础(完整版)
- 归因理论在管理中的应用[1]0
- 突破瓶颈 实现医院可持续发展
- 2014年南京师范大学商学院决策学招生目
- 现浇箱梁支架预压报告
- Excel_2010函数图表入门与实战
- 人教版新课标初中数学 13.1 轴对称 (
- Visual Basic 6.0程序设计教程电子教案
- 2010北京助理工程师考试复习《建筑施工
- 国外5大医疗互联网模式分析




