教学文库网 - 权威文档分享云平台
您的当前位置:首页 > 文库大全 > 资格考试 >

2012--Super-Bit Locality-Sensitive Hashing(2)

来源:网络收集 时间:2026-02-09
导读: Lemma1.([8],Lemma3.2)LetSd 1denotetheunitsphereinRd.GivenarandomvectorvuniformlysampledfromSd 1,wehavePr[hv(a)=hv(b)]=θa,b/π. Lemma2.Ifv∈Rdfollowsanisotropicdistribution,thenv=v/ v isuniformlydist

Lemma1.([8],Lemma3.2)LetSd 1denotetheunitsphereinRd.GivenarandomvectorvuniformlysampledfromSd 1,wehavePr[hv(a)=hv(b)]=θa,b/π.

Lemma2.Ifv∈Rdfollowsanisotropicdistribution,thenv¯=v/ v isuniformlydistributedond 1S.

Thislemmacanbeprovenbythede nitionofisotropicdistribution,andweomitthedetailshere.Lemma3.Givenkvectorsv1,...,vk∈Rd,whicharesampledi.i.d.fromthenormaldistributionN(0,Id),andspanasubspaceSk,letPSkdenotetheorthogonalprojectionontoSk,thenPSkisarandommatrixuniformlydistributedontheGrassmannmanifoldGk,d k.

ThislemmacanbeprovenbyapplyingtheTheorem2.2.1(iii),Theorem2.2.2(iii)in

[4].Lemma4.IfPisarandommatrixuniformlydistributedontheGrassmannmanifoldGk,d k,1≤k≤d,andv~N(0,Id)isindependentofP,thentherandomvectorv =Pvfollowsanisotropicdistribution.

FromtheuniformityofPontheGrassmannmanifoldandthepropertyofthenormaldistributionN(0,Id),wecangetthisresultdirectly.Wegiveasketchofproofbelow.

Proof.WecanwriteP=UUT,wherethecolumnsofU=[u1,u2,...,uk]constitutesanorthonor-malbasisofarandomk-dimensionalsubspace.Sincethestandardnormaldistributionis2-stableT[6],v =UTv=[v i~N(0,1),andit1,v2,...,vk]isanN(0,Ik)-distributedvector,whereeachvk iui.Sinceui,...,ukiseasytoverifythatv isindependentofU.Thereforev =Pv=Uv =Σi=1v

canbeanyorthonormalbasisofanyk-dimensionalsubspacewithequalprobabilitydensity,and{v followsanisotropicdistribution.1,v2,...,vk}arei.i.d.N(0,1)randomvariables,v

Theorem1.GivenNi.i.d.randomvectorsv1,v2,...,vN∈Rdsampledfromthenormaldistri-butionN(0,Id),where1≤N≤d,performtheGram-SchmidtprocessonthemandproduceNorthogonalizedvectorsw1,w2,...,wN,thenforanytwodatavectorsa,b∈Rd,byde ningNindicatorrandomvariablesX1,X2,...,XNas

1,hwi(a)=hwi(b)Xi=0,hwi(a)=hwi(b)

wehaveE[Xi]=θa,b/π,forany1≤i≤N.

Proof.DenoteSi 1thesubspacespannedby{w1,...,wi 1},andtheorthogonalprojectionontoits⊥⊥.Thenwi=PSv.Denotew¯=wi/ wi .orthogonalcomplementasPSi 1ii 1

Forany1≤i≤N,E[Xi]=Pr[Xi=1]=Pr[hwi(a)=hwi(b)]=Pr[hw¯(a)=hw¯(b)].Fori=1,byLemma2andLemma1,wehavePr[X1=1]=θa,b/π.

Forany1<i≤N,considerthedistributionofwi.Bylemma3,PSi 1isarandommatrix⊥uniformlydistributedontheGrassmannmanifoldGi 1,d i+1,thusPS=I PSi 1isuniformlyi 1distributedonGd i+1,i 1.Sincevi~N(0,Id)isindependentofv1,v2,...,vi 1,viisindependent

⊥⊥vfollowsanisotropicdistribution.ByLemmaofPS.ByLemma4,wehavethatwi=PSi 1i 1i

2,w¯=wi/ wi isuniformlydistributedontheunitsphereinRd.ByLemma1,Pr[hw¯(a)=hw¯(b)]=θa,b/π.

Corollary1.ForanySuper-BitdepthN,1≤N≤d,assumingthatthecodelengthK=N×L,theHammingdistancedHamming(h(a),h(b))isanunbiasedestimateofθa,b,foranytwodatavectorsaandb∈Rd,uptoaconstantscalefactorC=K/π.

Proof.ApplyTheorem1andwegetE[dHamming(h(a),h(b))]=L×E[ΣNi=1Xi]=L×

Kθa,bNΣN=Cθa,b.i=1E[Xi]=L×Σi=1θa,b/π=2.2Variance

Inthissubsectionweprovethatwhentheangleθa,b∈(0,π/2],thevarianceofSBLSHisstrictlysmallerthanthatofSRP-LSH.

Lemma5.Fortherandomvariables{Xi}de nedinTheorem1,wehavethefollowingequalityPr[Xi=1|Xj=1]=Pr[Xi=1|X1=1],1≤j<i≤N≤d.

/2

Proof.Withoutlossofgenerality,weassume wk =1for1≤k≤N.Pr[Xi=1|Xj=1]=Pr[hwi(a)=hwi(b)|Xj=1]=Pr[hvi Σi 1wkwTvi(a)=hvi Σi 1wkwTvi(b)|hwj(a)=kkk=1k=1hwj(b)].Since{w1,...wi 1}isauniformlyrandomorthonormalbasisofarandomsubspaceuni-formlydistributedonGrassmannmanifold,byexchangingtheindexjand1wehavethatequalsPr[hvi Σi 1wkwTvi(a)=hvi Σi 1wkwTvi(b)|hw1(a)=hw1(b)]=Pr[Xi=1|X1=1].k=1kk=1k

Lemma6.For{Xi}de nedinTheorem1,wehavePr[Xi=1|Xj=1]=Pr[X2=1|X1=1],

θa,b1≤j<i≤N≤d.Givenθa,b∈(0,π

],wehavePr[X2=1|X1=1]<.

Theproofofthislemmaislong,thusweprovideitintheAppendix(insupplementary le).

Theorem2.Giventwovectorsa,b∈Rdandrandomvariables{Xi}de nedasinTheorem1,denotep2,1=Pr[X2=1|X1=1],andSX=ΣNi=1XiwhichistheHammingdistancebetween

Nθpθa,bNθtheN-Super-Bitsofaandb,for1<N≤d,thenVar[SX]=a,b+N(N 1)2,1 (a,b)2.Proof.ByLemma6,Pr[Xi=1|Xj=1]=Pr[X2=1|X1=1]=p2,1when1≤j<i≤N.

pθa,b,forany1≤j<ThereforePr[Xi=1,Xj=1]=Pr[Xi=1|Xj=1]Pr[Xj=1]=2,1

2222i≤N.ThereforeVar[SX]=E[SX] E[SX]2=ΣNi=1E[Xi]

+2Σj<iE[XiXj] NE[X1]=

Nθa,bNθNθpθa,bNθ+2Σj<iPr[Xi=1,Xj=1] (a,b)2=a,b+N(N 1)2,1 (a,b)2.Corollary2.DenoteVar[SBLSHN,K]asthevarianceoftheHammingdistanceproducedbySBLSH,where1≤N≤distheSuper-Bitdepth,andK=N×Listhecodelength.ThenVar[SBLSHN,K]=L×Var[SBLSHN,N].Furthermore,givenθa,b∈(0,π

],ifK=N1×L1=

N2×L2and1≤N2<N1≤d,thenVar[SBLSHN1,K]<Var[SBLSHN2,K].

Proof.Sincev1,v2,...,vKareindependentlysampled,andw1,w2,...,wKareproducedbyorthog-onalizingeveryNvectors,theHammingdistancesproducedbydifferentN-Super-Bitsareinde-pendent,thusVar[SBLSHN,K]=L×Var[SBLSHN,N].

a,bThereforeVar[SBLSHN1,K]=L1×(1

a,b+N1(N1 1)2,1 (1

a,b)2)=a,b+K(N1 θθa,bpθa,bπ2 KN1(a,b1)2,1

).ByLemma6,whenθa,b∈(0,],forN1>N2>1,0≤p2,1<.

KθθThereforeVar[SBLSHN1,K] Var[SBLSHN2

,K]=a,b(N1 N2)(p2,1 a,b

)<0.For

Kθa,bθN1>N2=1,Var[SBLSHN1,K] Var[SBLSHN2,K]=(N1 1)(p2,1 a,b

)<0NθpθNθKθ

Corollary3.DenoteVar[SRPLSHK]asthevarianceoftheHammingdistanceproducedbySRP-LSH,where

K=N×L

isthecodelengthandLisapositiveinteger,1<N≤d.Ifθa,b∈(0,π

],

thenVar[SRPLSHK]>Var[SBLSHN,K].

Proof.ByCorollary2,Var[SRPLSHK]=Var[SBLSH1,K]>Var[SBLSHN,K].

2.2.1Numericalveri cation

Figure2:ThevarianceofSRP-LSHandSBLSHagainsttheangleθa,btoestimate.

InthissubsectionweverifynumericallythebehaviorofthevariancesofbothSRP-LSHandSBLSHfordifferentanglesθa,b∈(0,π].ByTheorem2,thevarianceofSBLSHiscloselyrelatedtop2,1de nedinTheorem2.Werandomlygenerate30pointsinR10,whichinvolves435angles.Foreachangle,wenumericallyapproximatep2,1usingsamplingmethod,wherethesamplenumberis1000.We xK=N=d,andplotthevariancesVar[SRPLSHN]and …… 此处隐藏:5754字,全部文档内容请下载后查看。喜欢就下载吧 ……

2012--Super-Bit Locality-Sensitive Hashing(2).doc 将本文的Word文档下载到电脑,方便复制、编辑、收藏和打印
本文链接:https://www.jiaowen.net/wenku/96811.html(转载请注明文章来源)
Copyright © 2020-2025 教文网 版权所有
声明 :本网站尊重并保护知识产权,根据《信息网络传播权保护条例》,如果我们转载的作品侵犯了您的权利,请在一个月内通知我们,我们会及时删除。
客服QQ:78024566 邮箱:78024566@qq.com
苏ICP备19068818号-2
Top
× 游客快捷下载通道(下载后可以自由复制和排版)
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
VIP包月下载
特价:29 元/月 原价:99元
低至 0.3 元/份 每月下载150
全站内容免费自由复制
注:下载文档有可能出现无法下载或内容有问题,请联系客服协助您处理。
× 常见问题(客服时间:周一到周五 9:30-18:00)