{"title": "Collaboratively Learning Preferences from Ordinal Data", "book": "Advances in Neural Information Processing Systems", "page_first": 1909, "page_last": 1917, "abstract": "In personalized recommendation systems, it is important to predict preferences of a user on items that have not been seen by that user yet. Similarly, in revenue management, it is important to predict outcomes of comparisons among those items that have never been compared so far. The MultiNomial Logit model, a popular discrete choice model, captures  the structure of the hidden preferences  with a low-rank matrix. In order to predict the preferences, we want to learn the underlying model from noisy observations of the low-rank matrix, collected as revealed preferences in various forms of ordinal data. A natural approach to learn such a model is to solve a convex relaxation of nuclear norm minimization. We present the convex relaxation approach in two contexts of interest: collaborative ranking and bundled choice modeling. In both cases, we show that the convex relaxation is minimax optimal. 
We prove an upper bound on the resulting error with finite samples, and provide a matching information-theoretic lower bound.", "full_text": "Collaboratively Learning Preferences from Ordinal Data

Sewoong Oh, Kiran K. Thekumparampil
University of Illinois at Urbana-Champaign
{swoh,thekump2}@illinois.edu

Jiaming Xu
The Wharton School, UPenn
jiamingx@wharton.upenn.edu

Abstract

In personalized recommendation systems, it is important to predict preferences of a user on items that have not been seen by that user yet. Similarly, in revenue management, it is important to predict outcomes of comparisons among those items that have never been compared so far. The MultiNomial Logit model, a popular discrete choice model, captures the structure of the hidden preferences with a low-rank matrix. In order to predict the preferences, we want to learn the underlying model from noisy observations of the low-rank matrix, collected as revealed preferences in various forms of ordinal data. A natural approach to learn such a model is to solve a convex relaxation of nuclear norm minimization. We present the convex relaxation approach in two contexts of interest: collaborative ranking and bundled choice modeling. In both cases, we show that the convex relaxation is minimax optimal. We prove an upper bound on the resulting error with finite samples, and provide a matching information-theoretic lower bound.

1 Introduction

In recommendation systems and revenue management, it is important to predict preferences on items that have not been seen by a user, or to predict outcomes of comparisons among those that have never been compared. Predicting such hidden preferences would be hopeless without further assumptions on the structure of the preference. Motivated by the success of matrix factorization models in collaborative filtering applications, we model hidden preferences with low-rank matrices, in order to collaboratively learn preference matrices from ordinal data. This paper considers the following scenarios:

- Collaborative ranking. Consider an online market that collects each user's preference as a ranking over a subset of items that are 'seen' by the user. Such data can be obtained by directly asking users to compare some items, or by indirectly tracking online activities: which items are viewed, how much time is spent on each page, or how the user rated the items. In order to make personalized recommendations, we want a model which (a) captures how users who prefer similar items are also likely to have similar preferences on unseen items, and (b) predicts which items a user might prefer, by learning from such ordinal data.

- Bundled choice modeling. Discrete choice models describe how a user makes decisions on what to purchase. Typical choice models assume the willingness to buy an item is independent of what else the user bought. In many cases, however, we make 'bundled' purchases: we buy particular ingredients together for one recipe, or we buy two connecting flights. One choice (the first flight) has a significant impact on the other (the connecting flight). In order to optimize the assortment (which flight schedules to offer) for maximum expected revenue, it is crucial to accurately predict the willingness of the consumers to purchase, based on past history. We consider a case where there are two types of products (e.g. jeans and shirts), and want (a) a model that captures such interacting preferences for pairs of items, one from each category; and (b) to predict the consumer's choice probabilities on pairs of items, by learning such models from past purchase history.

We use a discrete choice model known as the MultiNomial Logit (MNL) model [1] (described in Section 2.1) to represent the preferences. In the collaborative ranking context, MNL uses a low-rank matrix to represent the hidden preferences of the users. Each row corresponds to a user's preference over all the items, and when presented with a subset of items, the user provides a ranking over those items, which is a noisy version of the hidden true preference. The low-rank assumption naturally captures the similarities among users and items, by representing each on a low-dimensional space. In the bundled choice modeling context, the low-rank matrix now represents how pairs of items are matched. Each row corresponds to an item from the first category and each column corresponds to an item from the second category. An entry in the matrix represents how much the pair is preferred by a randomly chosen user from a pool of users. Notice that in this case we do not model individual preferences, but the preference of the whole population. The purchase history of the population is the record of which pair was chosen among a subset of items that were presented, which is again a noisy version of the hidden true preference. The low-rank assumption captures the similarities and dissimilarities among the items in the same category and the interactions across categories.

Contribution. A natural approach to learn such a low-rank model from noisy observations is to solve a convex relaxation of nuclear norm minimization (described in Section 2.2), since the nuclear norm is the tightest convex surrogate for the rank function. We present such an approach for learning the MNL model from ordinal data, in two contexts: collaborative ranking and bundled choice modeling. In both cases, we analyze the sample complexity of the algorithm, and provide an upper bound on the resulting error with finite samples. We prove minimax-optimality of our approach by providing a matching information-theoretic lower bound (up to a poly-logarithmic factor). Technically, we utilize the Random Utility Model (RUM) [2, 3, 4] interpretation (outlined in Section 2.1) of the MNL model to prove both the upper bound and the fundamental limit, which could be of interest in analyzing a more general class of RUMs.

Related work. In the context of collaborative ranking, MNL models have been proposed to model partial rankings from a pool of users. Recently, there have been new algorithms, and analyses of those algorithms, for learning MNL models from samples in the case when each user provides pairwise comparisons [5, 6]. [6] proposes solving a convex relaxation of maximizing the likelihood over matrices with bounded nuclear norm. It is shown that this approach achieves a statistically optimal generalization error rate, instead of the Frobenius norm error that we analyze. Our analysis techniques are inspired by [5], which proposed the convex relaxation for learning MNL, but when the users provide only pairwise comparisons. In this paper, we generalize the results of [5] by analyzing more general sampling models beyond pairwise comparisons.

The remainder of the paper is organized as follows. In Section 2, we present the MNL model and propose a convex relaxation for learning the model, in the context of collaborative ranking. We provide theoretical guarantees for collaborative ranking in Section 3. In Section 4, we present the problem statement for bundled choice modeling, and analyze a similar convex relaxation approach.

Notations. We use |||A|||_F and |||A|||_∞ to denote the Frobenius norm and the ℓ_∞ norm, |||A|||_nuc = ∑_i σ_i(A) to denote the nuclear norm where σ_i(A) denotes the i-th singular value, and |||A|||_2 = σ_1(A) for the spectral norm. We use ⟨⟨u, v⟩⟩ = ∑_i u_i v_i and ‖u‖ to denote the inner product and the Euclidean norm. The all-ones vector is denoted by 1, and I(A) is the indicator function of the event A. The set of the first N integers is denoted by [N] = {1, ..., N}.

2 Model and Algorithm

In this section, we present a discrete choice model for collaborative ranking, and propose an inference algorithm for learning the model from ordinal data.

2.1 MultiNomial Logit (MNL) model for comparative judgment

In collaborative ranking, we want to model how people who have similar preferences on a subset of items are likely to have similar tastes on other items as well. When users provide ratings, as in collaborative filtering applications, matrix factorization models are widely used, since the low-rank structure captures the similarities between users. When users provide ordered preferences, we use a discrete choice model known as the MultiNomial Logit (MNL) model [1], which has a similar low-rank structure capturing the similarities between users and items.

Let Θ* be the d1 × d2 dimensional matrix capturing the preference of d1 users on d2 items, where the rows and columns correspond to users and items, respectively. Typically, Θ* is assumed to be low-rank, having a rank r that is much smaller than the dimensions. However, in the following we allow a more general setting where Θ* might be only approximately low rank. When a user i is presented with a set of alternatives S_i ⊆ [d2], she reveals her preferences as a ranked list over those items. To simplify the notation, we assume all users compare the same number k of items, but the analysis naturally generalizes to the case where the size differs from user to user. Let v_{i,ℓ} ∈ S_i denote the (random) ℓ-th best choice of user i. Each user gives a ranking, independent of other users' rankings, from

P{v_{i,1}, ..., v_{i,k}} = ∏_{ℓ=1}^{k} exp(Θ*_{i,v_{i,ℓ}}) / ∑_{j ∈ S_{i,ℓ}} exp(Θ*_{i,j}),   (1)

where S_{i,ℓ} ≡ S_i ∖ {v_{i,1}, ..., v_{i,ℓ−1}} and S_{i,1} ≡ S_i. For a user i, the i-th row of Θ* represents the underlying preference vector of the user, and the more preferred items are more likely to be ranked higher. The probabilistic nature of the model captures the noise in the revealed preferences.

The random utility model (RUM), pioneered by [2, 3, 4], describes the choices of users as manifestations of the underlying utilities. The MNL model is a special case of RUM where each decision maker and each alternative are represented by r-dimensional feature vectors u_i and v_j respectively, such that Θ*_{ij} = ⟨⟨u_i, v_j⟩⟩, resulting in a low-rank matrix. When presented with a set of alternatives S_i, the decision maker i ranks the alternatives according to their random utilities, drawn from

U_{ij} = ⟨⟨u_i, v_j⟩⟩ + ξ_{ij},   (2)

for item j, where the ξ_{ij} follow the standard Gumbel distribution. Intuitively, this provides a justification for the MNL model: it models the decision makers as rational beings seeking to maximize utility. Technically, this RUM interpretation plays a crucial role in our analysis, in proving restricted strong convexity in Appendix A.5 and also in proving the fundamental limit in Appendix C.

There are a few cases where Maximum Likelihood (ML) estimation for RUM is tractable. One notable example is the Plackett-Luce (PL) model, which is a special case of the MNL model where Θ* is rank-one and all users have the same features. The PL model has been widely applied in econometrics [1], in analyzing elections [7], and in machine learning [8]. Efficient inference algorithms have been proposed [9, 10, 11], and the sample complexity has been analyzed for the MLE [12] and for Rank Centrality [13]. Although PL is quite restrictive, in the sense that it assumes all users share the same features, little is known about inference in RUMs beyond PL. Recently, to overcome such a restriction, mixed PL models have been studied, where Θ* is rank-r but there are only r classes of users and all users in the same class have the same features. Efficient inference algorithms with provable guarantees have been proposed by applying recent advances in tensor decomposition methods [14, 15], by directly clustering the users [16, 17], or by using sampling methods [18]. However, this mixture PL is still restrictive, and both the clustering and the tensor based approaches rely heavily on the fact that the distribution is a “mixture” and require additional incoherence assumptions on Θ*. For more general models, efficient inference algorithms have been proposed [19], but no performance guarantee is known for finite samples. Although the MLE for the general MNL model in (1) is intractable, we provide a polynomial-time inference algorithm with provable guarantees.

2.2 Nuclear norm minimization

Assuming Θ* is well approximated by a low-rank matrix, we estimate Θ* by solving the following convex relaxation, given the observed preferences in the form of ranked lists {(v_{i,1}, ..., v_{i,k})}_{i∈[d1]}:

Θ̂ ∈ arg min_{Θ∈Ω} L(Θ) + λ |||Θ|||_nuc,   (3)

where the (negative) log likelihood function according to (1) is

L(Θ) = −(1/(k d1)) ∑_{i=1}^{d1} ∑_{ℓ=1}^{k} ( ⟨⟨Θ, e_i e_{v_{i,ℓ}}^T⟩⟩ − log ∑_{j ∈ S_{i,ℓ}} exp(⟨⟨Θ, e_i e_j^T⟩⟩) ),   (4)

with S_i = {v_{i,1}, ..., v_{i,k}} and S_{i,ℓ} ≡ S_i ∖ {v_{i,1}, ..., v_{i,ℓ−1}}, and an appropriately chosen set Ω defined in (7). Since the nuclear norm is a tight convex surrogate for the rank, the above optimization searches for a low-rank solution that maximizes the likelihood. Nuclear norm minimization has been widely used in rank minimization problems [20], but provable guarantees typically exist only for a quadratic loss function L(Θ) [21, 22]. Our analysis extends such analysis techniques to identify the conditions under which restricted strong convexity is satisfied for a convex loss function that is not quadratic.

3 Collaborative ranking from k-wise comparisons

We first provide background on the MNL model, and then present the main results on the performance guarantees. Notice that the distribution (1) is independent of shifting each row of Θ* by a constant. Hence, there is an equivalence class of Θ* that gives the same distributions for the ranked lists:

[Θ*] = {A ∈ R^{d1×d2} | A = Θ* + u 1^T for some u ∈ R^{d1}}.   (5)

Since we can only estimate Θ* up to this equivalence class, we search for the one whose rows sum to zero, i.e. ∑_{j∈[d2]} Θ*_{i,j} = 0 for all i ∈ [d1]. Let α ≡ max_{i,j1,j2} |Θ*_{i,j1} − Θ*_{i,j2}| denote the dynamic range of the underlying Θ*, such that when k items are compared, we always have

(1/k) e^{−α} ≤ 1/(1 + (k−1)e^{α}) ≤ P{v_{i,1} = j} ≤ 1/(1 + (k−1)e^{−α}) ≤ (1/k) e^{α},   (6)

for all j ∈ S_i, all S_i ⊆ [d2] satisfying |S_i| = k, and all i ∈ [d1]. We do not make any assumptions on α other than that α = O(1) with respect to d1 and d2. The purpose of defining the dynamic range in this way is that we seek to characterize how the error scales with α. Given this definition, we solve the optimization in (3) over

Ω_α = {A ∈ R^{d1×d2} | |||A|||_∞ ≤ α, and for all i ∈ [d1] we have ∑_{j∈[d2]} A_{ij} = 0}.   (7)

While in practice we do not require the ℓ_∞ norm constraint, we need it for the analysis. For the related problem of matrix completion, where the loss L(Θ) is quadratic, either a similar condition on the ℓ_∞ norm or a different condition on incoherence is required.

3.1 Performance guarantee

We provide an upper bound on the resulting error of our convex relaxation, when the multi-set of items S_i presented to user i is drawn uniformly at random with replacement. Precisely, for a given k, S_i = {j_{i,1}, ..., j_{i,k}}, where the j_{i,ℓ}'s are independently drawn uniformly at random over the d2 items. Further, if an item is sampled more than once, i.e. if there exists j_{i,ℓ1} = j_{i,ℓ2} for some i and ℓ1 ≠ ℓ2, then we assume that the user treats these two items as if they are two distinct items with the same MNL weights Θ*_{i,j_{i,ℓ1}} = Θ*_{i,j_{i,ℓ2}}. The resulting preference is therefore always over k items (with possibly multiple copies of the same item), and distributed according to (1). For example, if k = 3, it is possible to have S_i = {j_{i,1} = 1, j_{i,2} = 1, j_{i,3} = 2}, in which case the resulting ranking can be (v_{i,1} = j_{i,1}, v_{i,2} = j_{i,3}, v_{i,3} = j_{i,2}) with probability e^{Θ*_{i,1}}/(2e^{Θ*_{i,1}} + e^{Θ*_{i,2}}) × e^{Θ*_{i,2}}/(e^{Θ*_{i,1}} + e^{Θ*_{i,2}}). Such sampling with replacement is necessary for the analysis, where we require independence in the choice of the items in S_i in order to apply the symmetrization technique (e.g. [23]) to bound the expectation of the deviation (cf. Appendix A.5). Similar sampling assumptions have been made in existing analyses on learning low-rank models from noisy observations, e.g. [22].

Let d ≡ (d1 + d2)/2, and let σ_j(Θ*) denote the j-th singular value of the matrix Θ*. Define

λ_0 ≡ e^{2α} √( (d1 log d + d2 (log d)^2 (log_2 d)^4) / (k d1^2 d2) ).

Theorem 1. Under the described sampling model, assume 24 ≤ k ≤ min{ d1^2 log d, ((d1^2 + d2^2)/(2 d1)) log d, (1/e) d2 (4 log d2 + 2 log d1) }, and λ ∈ [480 λ_0, c_0 λ_0] with any constant c_0 = O(1) larger than 480. Then, solving the optimization (3) achieves

(1/(d1 d2)) |||Θ̂ − Θ*|||_F^2 ≤ 288 √2 e^{4α} c_0 λ_0 √r |||Θ̂ − Θ*|||_F + 288 e^{4α} c_0 λ_0 ∑_{j=r+1}^{min{d1,d2}} σ_j(Θ*),   (8)

for any r ∈ {1, ..., min{d1,d2}}, with probability at least 1 − 2d^{−3} − d^{−32}, where d = (d1 + d2)/2.

A proof is provided in Appendix A. The above bound shows a natural splitting of the error into two terms, one corresponding to the estimation error for the rank-r component and the second corresponding to the approximation error for how well one can approximate Θ* with a rank-r matrix. This bound holds for all values of r, and one could potentially optimize over r. We show such results in the following corollaries.

Corollary 3.1 (Exact low-rank matrices). Suppose Θ* has rank at most r. Under the hypotheses of Theorem 1, solving the optimization (3) with the choice of the regularization parameter λ ∈ [480 λ_0, c_0 λ_0] achieves, with probability at least 1 − 2d^{−3} − d^{−32},

(1/√(d1 d2)) |||Θ̂ − Θ*|||_F ≤ 288 √2 e^{6α} c_0 √( r (d1 log d + d2 (log d)^2 (log_2 d)^4) / (k d1) ).   (9)

The number of entries is d1 d2 and we rescale the Frobenius norm error appropriately by 1/√(d1 d2). When Θ* is a rank-r matrix, the degrees of freedom in representing Θ* is r(d1 + d2) − r^2 = O(r(d1 + d2)). The above theorem shows that the total number of samples, which is k d1, needs to scale as O(r d1 (log d) + r d2 (log d)^2 (log_2 d)^4) in order to achieve an arbitrarily small error. This is only a poly-logarithmic factor larger than the degrees of freedom. In Section 3.2, we provide a lower bound on the error directly, which matches the upper bound up to a logarithmic factor.

The dependence on the dynamic range α, however, is sub-optimal. It is expected that the error increases with α, since Θ* scales as α, but the exponential dependence in the bound seems to be a weakness of the analysis, as seen from the numerical experiments in the right panel of Figure 1. Although the error increases with α, numerical experiments suggest that it increases at most linearly. However, tightening the scaling with respect to α is a challenging problem, and such sub-optimal dependence is also present in the existing literature for learning even simpler models, such as the Bradley-Terry model [13] or the Plackett-Luce model [12], which are special cases of the MNL model studied in this paper. A practical issue in achieving the above rate is the choice of λ, since the dynamic range α is not known in advance. Figure 1 illustrates that the error is not sensitive to the choice of λ over a wide range.

Another issue is that the underlying matrix might not be exactly low rank. It is more realistic to assume that it is approximately low rank. Following [22], we formalize this notion with the “ℓ_q-ball” of matrices defined as

B_q(ρ_q) ≡ {Θ ∈ R^{d1×d2} | ∑_{j∈[min{d1,d2}]} |σ_j(Θ)|^q ≤ ρ_q}.   (10)

When q = 0, this is the set of rank-ρ_0 matrices. For q ∈ (0, 1], this is the set of matrices whose singular values decay relatively fast. Optimizing the choice of r in Theorem 1, we get the following result.

Corollary 3.2 (Approximately low-rank matrices). Suppose Θ* ∈ B_q(ρ_q) for some q ∈ (0, 1] and ρ_q > 0. Under the hypotheses of Theorem 1, solving the optimization (3) with the choice of the regularization parameter λ ∈ [480 λ_0, c_0 λ_0] achieves, with probability at least 1 − 2d^{−3},

(1/√(d1 d2)) |||Θ̂ − Θ*|||_F ≤ (2 √ρ_q / √(d1 d2)) ( 288 √2 c_0 e^{6α} √( d1 d2 (d1 log d + d2 (log d)^2 (log_2 d)^2) / (k d1) ) )^{(2−q)/2}.   (11)

This is a strict generalization of Corollary 3.1. For q = 0 and ρ_0 = r, this recovers the exact low-rank estimation bound up to a factor of two. For approximately low-rank matrices in an ℓ_q-ball, we lose in the error exponent, which reduces from one to (2 − q)/2. A proof of this corollary is provided in Appendix B.

The left panel of Figure 1 confirms the scaling of the error rate as predicted by Corollary 3.1. The lines merge to a single line when the sample size is rescaled appropriately. We make a choice of λ = (1/2) √((log d)/(k d2)). This choice is independent of α and is smaller than proposed in Theorem 1. We generate random rank-r matrices of dimension d × d, where Θ* = U V^T with the entries of U ∈ R^{d×r} and V ∈ R^{d×r} generated i.i.d. from the uniform distribution over [0, 1]. Then the row-mean is subtracted from each row, and then the whole matrix is scaled such that the largest entry is α = 5. Note that this operation does not increase the rank of the matrix Θ. This is because the de-meaning can be written as Θ − Θ 1 1^T / d2, and both terms in the operation are of the same column space as Θ, which is of rank r. The root mean squared error (RMSE) is plotted, where RMSE = (1/d) |||Θ* − Θ̂|||_F. We implement and solve the convex optimization (3) using the proximal gradient descent method as analyzed in [24]. The right panel in Figure 1 illustrates that the actual error is insensitive to the choice of λ for a broad range λ ∈ [√((log d)/(k d2)), 2^8 √((log d)/(k d2))], after which it increases with λ.

[Figure 1: The (rescaled) RMSE scales as √(r (log d)/k), as expected from Corollary 3.1, for fixed d = 50 (left panel; r ∈ {3, 6, 12, 24}; x-axis: sample size k). In the inset, the same data is plotted versus the rescaled sample size k/(r log d). The (rescaled) RMSE is stable for a broad range of λ and α, for fixed d = 50 and r = 3 (right panel; α ∈ {5, 10, 15}; x-axis: λ/√((log d)/(k d2))).]

3.2 Information-theoretic lower bound for low-rank matrices

For the polynomial-time algorithm of convex relaxation, we gave in the previous section a bound on the achievable error. We next compare this to the fundamental limit of this problem, by giving a lower bound on the error achievable by any algorithm (efficient or not). A simple parameter counting argument indicates that the number of samples needs to scale as the degrees of freedom, i.e. k d1 ∝ r(d1 + d2), to estimate a d1 × d2 dimensional matrix of rank r. We construct an appropriate packing over the set of low-rank matrices with bounded entries in Ω_α defined in (7), and show that no algorithm can accurately estimate the true matrix with high probability, using the generalized Fano's inequality. This provides a constructive argument to lower bound the minimax error rate, which in turn establishes that the bound in Theorem 1 is sharp up to a logarithmic factor, and proves that no other algorithm can significantly improve over nuclear norm minimization.

Theorem 2. Suppose Θ* has rank r. Under the described sampling model, for large enough d1 and d2 ≥ d1, there is a universal numerical constant c > 0 such that

inf_{Θ̂} sup_{Θ*∈Ω_α} E[ (1/√(d1 d2)) |||Θ̂ − Θ*|||_F ] ≥ c min{ α e^{−α} √( r d2 / (k d1) ), α d2 / (√(d1 d2) log d) },   (12)

where the infimum is taken over all measurable functions over the observed ranked lists {(v_{i,1}, ..., v_{i,k})}_{i∈[d1]}.

A proof of this theorem is provided in Appendix C. The term of primary interest in this bound is the first one, which shows the scaling of the (rescaled) minimax rate as √(r(d1 + d2)/(k d1)) (when d2 ≥ d1), and matches the upper bound in (8). It is the dominant term in the bound whenever the number of samples is larger than the degrees of freedom by a logarithmic factor, i.e. k d1 > r(d1 + d2) log d, ignoring the dependence on α. This is a typical regime of interest, where the sample size is comparable to the latent dimension of the problem. In this regime, Theorem 2 establishes that the upper bound in Theorem 1 is minimax-optimal up to a logarithmic factor in the dimension d.

4 Choice modeling for bundled purchase history

In this section, we use the MNL model to study another scenario of practical interest: choice modeling from bundled purchase history. In this setting, we assume that we have bundled purchase history data from n users. Precisely, there are two categories of interest with d1 and d2 alternatives in each category respectively. For example, there are d1 toothpastes to choose from and d2 toothbrushes to choose from. For the i-th user, a subset S_i ⊆ [d1] of alternatives from the first category is presented along with a subset T_i ⊆ [d2] of alternatives from the second category. We use k1 and k2 to denote the number of alternatives presented to a single user, i.e. k1 = |S_i| and k2 = |T_i|, and we assume that the number of alternatives presented to each user is fixed, to simplify notations. Given these sets of alternatives, each user makes a 'bundled' purchase, and we use (u_i, v_i) to denote the bundled pair of alternatives (e.g. a toothbrush and a toothpaste) purchased by the i-th user. Each user makes a choice of the best alternative, independent of other users' choices, according to the MNL model as

P{(u_i, v_i) = (j1, j2)} = exp(Θ*_{j1,j2}) / ∑_{j1' ∈ S_i, j2' ∈ T_i} exp(Θ*_{j1',j2'}),   (13)

for all j1 ∈ S_i and j2 ∈ T_i. The distribution (13) is independent of shifting all the values of Θ* by a constant. Hence, there is an equivalence class of Θ* that gives the same distribution for the choices: [Θ*] ≡ {A ∈ R^{d1×d2} | A = Θ* + c 1 1^T for some c ∈ R}. Since we can only estimate Θ* up to this equivalence class, we search for the one whose entries sum to zero, i.e. ∑_{j1∈[d1], j2∈[d2]} Θ*_{j1,j2} = 0. Let α = max_{j1,j1'∈[d1], j2,j2'∈[d2]} |Θ*_{j1,j2} − Θ*_{j1',j2'}| denote the dynamic range of the underlying Θ*, such that when k1 × k2 alternatives are presented, we always have

(1/(k1 k2)) e^{−α} ≤ P{(u_i, v_i) = (j1, j2)} ≤ (1/(k1 k2)) e^{α},   (14)

for all (j1, j2) ∈ S_i × T_i, and for all S_i ⊆ [d1] and T_i ⊆ [d2] such that |S_i| = k1 and |T_i| = k2. We do not make any assumptions on α other than that α = O(1) with respect to d1 and d2. Assuming Θ* is well approximated by a low-rank matrix, we solve the following convex relaxation, given the observed bundled purchase history {(u_i, v_i, S_i, T_i)}_{i∈[n]}:

Θ̂ ∈ arg min_{Θ∈Ω′_α} L(Θ) + λ |||Θ|||_nuc,   (15)

where the (negative) log likelihood function according to (13) is

L(Θ) = −(1/n) ∑_{i=1}^{n} ( ⟨⟨Θ, e_{u_i} e_{v_i}^T⟩⟩ − log ∑_{j1∈S_i, j2∈T_i} exp(⟨⟨Θ, e_{j1} e_{j2}^T⟩⟩) ), and   (16)

Ω′_α ≡ {A ∈ R^{d1×d2} | |||A|||_∞ ≤ α, and ∑_{j1∈[d1], j2∈[d2]} A_{j1,j2} = 0}.   (17)

Compared to collaborative ranking, (a) rows and columns of Θ* correspond to an alternative from the first and second category, respectively; (b) each sample corresponds to the purchase choice of a user, which follows the MNL model with Θ*; (c) each person is presented subsets S_i and T_i of items from each category; and (d) each sampled data point represents the most preferred bundled pair of alternatives.

4.1 Performance guarantee

We provide an upper bound on the error achieved by our convex relaxation, when the multi-sets of alternatives S_i from the first category and T_i from the second category are drawn uniformly at random with replacement from [d1] and [d2] respectively. Precisely, for given k1 and k2, we let S_i = {j^{(i)}_{1,1}, ..., j^{(i)}_{1,k1}} and T_i = {j^{(i)}_{2,1}, ..., j^{(i)}_{2,k2}}, where the j^{(i)}_{1,ℓ}'s and j^{(i)}_{2,ℓ}'s are independently drawn uniformly at random over the d1 and d2 alternatives, respectively. Similar to the previous section, this sampling with replacement is necessary for the analysis. Define

λ_1 = √( e^{2α} max{d1, d2} log d / (n d1 d2) ).   (18)

Theorem 3. Under the described sampling model, assume 16 e^{2α} min{d1, d2} log d ≤ n ≤ min{ d^5, k1 k2 max{d1^2, d2^2} } log d, and λ ∈ [8 λ_1, c_1 λ_1] with any constant c_1 = O(1) larger than max{8, 128/√(min{k1, k2})}. Then, solving the optimization (15) achieves

(1/(d1 d2)) |||Θ̂ − Θ*|||_F^2 ≤ 48 √2 e^{2α} c_1 λ_1 √r |||Θ̂ − Θ*|||_F + 48 e^{2α} c_1 λ_1 ∑_{j=r+1}^{min{d1,d2}} σ_j(Θ*),   (19)

for any r ∈ {1, ..., min{d1,d2}}, with probability at least 1 − 2d^{−3}, where d = (d1 + d2)/2.

A proof is provided in Appendix D. Optimizing over r gives the following corollaries.

Corollary 4.1 (Exact low-rank matrices). Suppose Θ* has rank at most r. Under the hypotheses of Theorem 3, solving the optimization (15) with the choice of the regularization parameter λ ∈ [8 λ_1, c_1 λ_1] achieves, with probability at least 1 − 2d^{−3},

(1/√(d1 d2)) |||Θ̂ − Θ*|||_F ≤ 48 √2 e^{3α} c_1 √( r (d1 + d2) log d / n ).   (20)

This corollary shows that the number of samples n needs to scale as O(r(d1 + d2) log d) in order to achieve an arbitrarily small error. This is only a logarithmic factor larger than the degrees of freedom. We provide a fundamental lower bound on the error, which matches the upper bound up to a logarithmic factor. For approximately low-rank matrices in an ℓ_q-ball as defined in (10), we show an upper bound on the error, whose error exponent reduces from one to (2 − q)/2.

Corollary 4.2 (Approximately low-rank matrices). Suppose Θ* ∈ B_q(ρ_q) for some q ∈ (0, 1] and ρ_q > 0. Under the hypotheses of Theorem 3, solving the optimization (15) with the choice of the regularization parameter λ ∈ [8 λ_1, c_1 λ_1] achieves, with probability at least 1 − 2d^{−3},

(1/√(d1 d2)) |||Θ̂ − Θ*|||_F ≤ (2 √ρ_q / √(d1 d2)) ( 48 √2 c_1 e^{3α} √( d1 d2 (d1 + d2) log d / n ) )^{(2−q)/2}.   (21)

Since the proof is almost identical to the proof of Corollary 3.2 in Appendix B, we omit it.

Theorem 4. Suppose Θ* has rank r. Under the described sampling model, there is a universal constant c > 0 such that the minimax rate, where the infimum is taken over all measurable functions over the observed purchase history {(u_i, v_i, S_i, T_i)}_{i∈[n]}, is lower bounded by

inf_{Θ̂} sup_{Θ*∈Ω_α} E[ (1/√(d1 d2)) |||Θ̂ − Θ*|||_F ] ≥ c min{ e^{−5α} √( r (d1 + d2) / n ), α (d1 + d2) / (√(d1 d2) log d) }.   (22)

See Appendix E.1 for the proof. The first term is dominant, and when the sample size is comparable to the latent dimension of the problem, Theorem 3 is minimax optimal up to a logarithmic factor.

5 Discussion

We presented a convex program to learn MNL parameters from ordinal data, motivated by two scenarios: recommendation systems and bundled purchases. We take the first-principles approach of identifying the fundamental limits and also developing efficient algorithms matching those fundamental tradeoffs. There are several remaining challenges. (a) Nuclear norm minimization, while polynomial-time, is still slow. We want first-order methods that are efficient with provable guarantees. The main challenge is providing a good initialization to start such non-convex approaches. (b) For simpler models, such as the PL model, more general sampling over a graph has been studied. We want analytical results for more general sampling. (c) The practical use of the model and the algorithm needs to be tested on real datasets of purchase history and recommendations.

Acknowledgments

This research is supported in part by NSF CMMI award MES-1450848 and NSF SaTC award CNS-1527754.

References

[1] Daniel McFadden. Conditional logit analysis of qualitative choice behavior. 1973.
[2] Louis L. Thurstone. A law of comparative judgment. Psychological Review, 34(4):273, 1927.
[3] Jacob Marschak. Binary-choice constraints and random utility indicators. In Proceedings of a Symposium on Mathematical Methods in the Social Sciences, volume 7, pages 19-38, 1960.
[4] D. R. Luce. Individual Choice Behavior. Wiley, New York, 1959.
[5] Yu Lu and Sahand N. Negahban. Individualized rank aggregation using nuclear norm regularization. arXiv preprint arXiv:1410.0860, 2014.
[6] Dohyung Park, Joe Neeman, Jin Zhang, Sujay Sanghavi, and Inderjit S. Dhillon. Preference completion: Large-scale collaborative ranking from pairwise comparisons. 2015.
[7] Isobel Claire Gormley and Thomas Brendan Murphy. A grade of membership model for rank data. Bayesian Analysis, 4(2):265-295, 2009.
[8] Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225-331, 2009.
[9] D. R. Hunter. MM algorithms for generalized Bradley-Terry models. Annals of Statistics, pages 384-406, 2004.
[10] John Guiver and Edward Snelson. Bayesian inference for Plackett-Luce ranking models. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 377-384. ACM, 2009.
[11] Francois Caron and Arnaud Doucet. Efficient Bayesian inference for generalized Bradley-Terry models. Journal of Computational and Graphical Statistics, 21(1):174-196, 2012.
[12] B. Hajek, S. Oh, and J. Xu. Minimax-optimal inference from partial rankings. In Advances in Neural Information Processing Systems, pages 1475-1483, 2014.
[13] S. Negahban, S. Oh, and D. Shah. Iterative ranking from pair-wise comparisons. In NIPS, pages 2483-2491, 2012.
[14] S. Oh and D. Shah. Learning mixed multinomial logit model from ordinal data. In Advances in Neural Information Processing Systems, pages 595-603, 2014.
[15] W. Ding, P. Ishwar, and V. Saligrama. A topic modeling approach to rank aggregation. Boston University Center for Info. and Systems Engg. Technical Report, http://www.bu.edu/systems/publications, 2014.
[16] A. Ammar, S. Oh, D. Shah, and L. Voloch. What's your choice? Learning the mixed multinomial logit model. In Proceedings of the ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems, 2014.
[17] Rui Wu, Jiaming Xu, R. Srikant, Laurent Massoulié, Marc Lelarge, and Bruce Hajek. Clustering and inference from pairwise comparisons. arXiv preprint arXiv:1502.04631, 2015.
[18] H. Azari Soufiani, H. Diao, Z. Lai, and D. C. Parkes. Generalized random utility models with multiple types. In Advances in Neural Information Processing Systems, pages 73-81, 2013.
[19] H. A. Soufiani, D. C. Parkes, and L. Xia. Random utility theory for social choice. In NIPS, pages 126-134, 2012.
[20] B. Recht, M. Fazel, and P. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471-501, 2010.
[21] E. J. Candès and B. Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717-772, 2009.
[22] S. Negahban and M. J. Wainwright. Restricted strong convexity and (weighted) matrix completion: Optimal bounds with noise. Journal of Machine Learning Research, 2012.
[23] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
[24] A. Agarwal, S. Negahban, and M. Wainwright. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. In NIPS, pages 37-45, 2010.
[25] J. Tropp. User-friendly tail bounds for sums of random matrices. Foundations of Comput. Math., 2011.
[26] S. Van De Geer. Empirical Processes in M-estimation, volume 6. Cambridge University Press, 2000.
[27] M. Ledoux. The Concentration of Measure Phenomenon. Number 89. American Mathematical Soc., 2005.", "award": [], "sourceid": 1179, "authors": [{"given_name": "Sewoong", "family_name": "Oh", "institution": "UIUC"}, {"given_name": "Kiran", "family_name": "Thekumparampil", "institution": "UIUC"}, {"given_name": "Jiaming", "family_name": "Xu", "institution": null}]}