2012--Super-Bit Locality-Sensitive Hashing(2)
Lemma1.([8],Lemma3.2)LetSd 1denotetheunitsphereinRd.GivenarandomvectorvuniformlysampledfromSd 1,wehavePr[hv(a)=hv(b)]=θa,b/π.
Lemma2.Ifv∈Rdfollowsanisotropicdistribution,thenv¯=v/ v isuniformlydistributedond 1S.
Thislemmacanbeprovenbythede nitionofisotropicdistribution,andweomitthedetailshere.Lemma3.Givenkvectorsv1,...,vk∈Rd,whicharesampledi.i.d.fromthenormaldistributionN(0,Id),andspanasubspaceSk,letPSkdenotetheorthogonalprojectionontoSk,thenPSkisarandommatrixuniformlydistributedontheGrassmannmanifoldGk,d k.
ThislemmacanbeprovenbyapplyingtheTheorem2.2.1(iii),Theorem2.2.2(iii)in
[4].Lemma4.IfPisarandommatrixuniformlydistributedontheGrassmannmanifoldGk,d k,1≤k≤d,andv~N(0,Id)isindependentofP,thentherandomvectorv =Pvfollowsanisotropicdistribution.
FromtheuniformityofPontheGrassmannmanifoldandthepropertyofthenormaldistributionN(0,Id),wecangetthisresultdirectly.Wegiveasketchofproofbelow.
Proof.WecanwriteP=UUT,wherethecolumnsofU=[u1,u2,...,uk]constitutesanorthonor-malbasisofarandomk-dimensionalsubspace.Sincethestandardnormaldistributionis2-stableT[6],v =UTv=[v i~N(0,1),andit1,v2,...,vk]isanN(0,Ik)-distributedvector,whereeachvk iui.Sinceui,...,ukiseasytoverifythatv isindependentofU.Thereforev =Pv=Uv =Σi=1v
canbeanyorthonormalbasisofanyk-dimensionalsubspacewithequalprobabilitydensity,and{v followsanisotropicdistribution.1,v2,...,vk}arei.i.d.N(0,1)randomvariables,v
Theorem1.GivenNi.i.d.randomvectorsv1,v2,...,vN∈Rdsampledfromthenormaldistri-butionN(0,Id),where1≤N≤d,performtheGram-SchmidtprocessonthemandproduceNorthogonalizedvectorsw1,w2,...,wN,thenforanytwodatavectorsa,b∈Rd,byde ningNindicatorrandomvariablesX1,X2,...,XNas
1,hwi(a)=hwi(b)Xi=0,hwi(a)=hwi(b)
wehaveE[Xi]=θa,b/π,forany1≤i≤N.
Proof.DenoteSi 1thesubspacespannedby{w1,...,wi 1},andtheorthogonalprojectionontoits⊥⊥.Thenwi=PSv.Denotew¯=wi/ wi .orthogonalcomplementasPSi 1ii 1
Forany1≤i≤N,E[Xi]=Pr[Xi=1]=Pr[hwi(a)=hwi(b)]=Pr[hw¯(a)=hw¯(b)].Fori=1,byLemma2andLemma1,wehavePr[X1=1]=θa,b/π.
Forany1<i≤N,considerthedistributionofwi.Bylemma3,PSi 1isarandommatrix⊥uniformlydistributedontheGrassmannmanifoldGi 1,d i+1,thusPS=I PSi 1isuniformlyi 1distributedonGd i+1,i 1.Sincevi~N(0,Id)isindependentofv1,v2,...,vi 1,viisindependent
⊥⊥vfollowsanisotropicdistribution.ByLemmaofPS.ByLemma4,wehavethatwi=PSi 1i 1i
2,w¯=wi/ wi isuniformlydistributedontheunitsphereinRd.ByLemma1,Pr[hw¯(a)=hw¯(b)]=θa,b/π.
Corollary1.ForanySuper-BitdepthN,1≤N≤d,assumingthatthecodelengthK=N×L,theHammingdistancedHamming(h(a),h(b))isanunbiasedestimateofθa,b,foranytwodatavectorsaandb∈Rd,uptoaconstantscalefactorC=K/π.
Proof.ApplyTheorem1andwegetE[dHamming(h(a),h(b))]=L×E[ΣNi=1Xi]=L×
Kθa,bNΣN=Cθa,b.i=1E[Xi]=L×Σi=1θa,b/π=2.2Variance
Inthissubsectionweprovethatwhentheangleθa,b∈(0,π/2],thevarianceofSBLSHisstrictlysmallerthanthatofSRP-LSH.
Lemma5.Fortherandomvariables{Xi}de nedinTheorem1,wehavethefollowingequalityPr[Xi=1|Xj=1]=Pr[Xi=1|X1=1],1≤j<i≤N≤d.
/2
Proof.Withoutlossofgenerality,weassume wk =1for1≤k≤N.Pr[Xi=1|Xj=1]=Pr[hwi(a)=hwi(b)|Xj=1]=Pr[hvi Σi 1wkwTvi(a)=hvi Σi 1wkwTvi(b)|hwj(a)=kkk=1k=1hwj(b)].Since{w1,...wi 1}isauniformlyrandomorthonormalbasisofarandomsubspaceuni-formlydistributedonGrassmannmanifold,byexchangingtheindexjand1wehavethatequalsPr[hvi Σi 1wkwTvi(a)=hvi Σi 1wkwTvi(b)|hw1(a)=hw1(b)]=Pr[Xi=1|X1=1].k=1kk=1k
Lemma6.For{Xi}de nedinTheorem1,wehavePr[Xi=1|Xj=1]=Pr[X2=1|X1=1],
θa,b1≤j<i≤N≤d.Givenθa,b∈(0,π
],wehavePr[X2=1|X1=1]<.
Theproofofthislemmaislong,thusweprovideitintheAppendix(insupplementary le).
Theorem2.Giventwovectorsa,b∈Rdandrandomvariables{Xi}de nedasinTheorem1,denotep2,1=Pr[X2=1|X1=1],andSX=ΣNi=1XiwhichistheHammingdistancebetween
Nθpθa,bNθtheN-Super-Bitsofaandb,for1<N≤d,thenVar[SX]=a,b+N(N 1)2,1 (a,b)2.Proof.ByLemma6,Pr[Xi=1|Xj=1]=Pr[X2=1|X1=1]=p2,1when1≤j<i≤N.
pθa,b,forany1≤j<ThereforePr[Xi=1,Xj=1]=Pr[Xi=1|Xj=1]Pr[Xj=1]=2,1
2222i≤N.ThereforeVar[SX]=E[SX] E[SX]2=ΣNi=1E[Xi]
+2Σj<iE[XiXj] NE[X1]=
Nθa,bNθNθpθa,bNθ+2Σj<iPr[Xi=1,Xj=1] (a,b)2=a,b+N(N 1)2,1 (a,b)2.Corollary2.DenoteVar[SBLSHN,K]asthevarianceoftheHammingdistanceproducedbySBLSH,where1≤N≤distheSuper-Bitdepth,andK=N×Listhecodelength.ThenVar[SBLSHN,K]=L×Var[SBLSHN,N].Furthermore,givenθa,b∈(0,π
],ifK=N1×L1=
N2×L2and1≤N2<N1≤d,thenVar[SBLSHN1,K]<Var[SBLSHN2,K].
Proof.Sincev1,v2,...,vKareindependentlysampled,andw1,w2,...,wKareproducedbyorthog-onalizingeveryNvectors,theHammingdistancesproducedbydifferentN-Super-Bitsareinde-pendent,thusVar[SBLSHN,K]=L×Var[SBLSHN,N].
a,bThereforeVar[SBLSHN1,K]=L1×(1
a,b+N1(N1 1)2,1 (1
a,b)2)=a,b+K(N1 θθa,bpθa,bπ2 KN1(a,b1)2,1
).ByLemma6,whenθa,b∈(0,],forN1>N2>1,0≤p2,1<.
KθθThereforeVar[SBLSHN1,K] Var[SBLSHN2
,K]=a,b(N1 N2)(p2,1 a,b
)<0.For
Kθa,bθN1>N2=1,Var[SBLSHN1,K] Var[SBLSHN2,K]=(N1 1)(p2,1 a,b
)<0NθpθNθKθ
Corollary3.DenoteVar[SRPLSHK]asthevarianceoftheHammingdistanceproducedbySRP-LSH,where
K=N×L
isthecodelengthandLisapositiveinteger,1<N≤d.Ifθa,b∈(0,π
],
thenVar[SRPLSHK]>Var[SBLSHN,K].
Proof.ByCorollary2,Var[SRPLSHK]=Var[SBLSH1,K]>Var[SBLSHN,K].
2.2.1Numericalveri cation
Figure2:ThevarianceofSRP-LSHandSBLSHagainsttheangleθa,btoestimate.
InthissubsectionweverifynumericallythebehaviorofthevariancesofbothSRP-LSHandSBLSHfordifferentanglesθa,b∈(0,π].ByTheorem2,thevarianceofSBLSHiscloselyrelatedtop2,1de nedinTheorem2.Werandomlygenerate30pointsinR10,whichinvolves435angles.Foreachangle,wenumericallyapproximatep2,1usingsamplingmethod,wherethesamplenumberis1000.We xK=N=d,andplotthevariancesVar[SRPLSHN]and …… 此处隐藏:5754字,全部文档内容请下载后查看。喜欢就下载吧 ……
相关推荐:
- [资格考试]石油钻采专业设备项目可行性研究报告编
- [资格考试]2012-2013学年度第二学期麻风病防治知
- [资格考试]道路勘测设计 绪论
- [资格考试]控烟戒烟知识培训资料
- [资格考试]建设工程安全生产管理(三类人员安全员
- [资格考试]photoshop制作茶叶包装盒步骤平面效果
- [资格考试]授课进度计划表封面(09-10下施工)
- [资格考试]麦肯锡卓越工作方法读后感
- [资格考试]2007年广西区农村信用社招聘考试试题
- [资格考试]软件实施工程师笔试题
- [资格考试]2014年初三数学复习专练第一章 数与式(
- [资格考试]中国糯玉米汁饮料市场发展概况及投资战
- [资格考试]塑钢门窗安装((专项方案)15)
- [资格考试]初中数学答题卡模板2
- [资格考试]2015-2020年中国效率手册行业市场调查
- [资格考试]华北电力大学学习实践活动领导小组办公
- [资格考试]溃疡性结肠炎研究的新进展
- [资格考试]人教版高中语文1—5册(必修)背诵篇目名
- [资格考试]ISO9001-2018质量管理体系最新版标准
- [资格考试]论文之希尔顿酒店集团进入中国的战略研
- 全国中小学生转学申请表
- 《奇迹暖暖》17-支2文学少女小满(9)公
- 2019-2020学年八年级地理下册 第六章
- 2005年高考试题——英语(天津卷)
- 无纺布耐磨测试方法及标准
- 建筑工程施工劳动力安排计划
- (目录)中国中央空调行业市场深度调研分
- 中国期货价格期限结构模型实证分析
- AutoCAD 2016基础教程第2章 AutoCAD基
- 2014-2015学年西城初三期末数学试题及
- 机械加工工艺基础(完整版)
- 归因理论在管理中的应用[1]0
- 突破瓶颈 实现医院可持续发展
- 2014年南京师范大学商学院决策学招生目
- 现浇箱梁支架预压报告
- Excel_2010函数图表入门与实战
- 人教版新课标初中数学 13.1 轴对称 (
- Visual Basic 6.0程序设计教程电子教案
- 2010北京助理工程师考试复习《建筑施工
- 国外5大医疗互联网模式分析




