On the Difficulty of Unbiased Alpha Divergence Minimization. Tomas Geffner, Justin Domke. Abstract: Existing alpha-divergence minimization algorithms can be classified into two broad groups: biased methods (Li & Several approxim...
On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks. Hancheng Min, Salma Tarmoun, René Vidal, Enrique Mallada. Abstract: without explicit regularization, enjoys g...
On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients. Difan Zou, Quanquan Gu. Abstract: tions such as Bayesian inference, reinforcement learning, and computer vision. In the past decades, many MCMC Hamiltonian Mo...
On Perceptual Lossy Compression: The Cost of Perceptual Reconstruction and An Optimal Training Framework. Zeyu Yan, Fei Wen, Rendong Ying, Chao Ma, Peilin Liu. Abstract: 2017; Santurkar et al., 2018; Shaham & Michaeli, 2018). For loss...
Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes. James Lucas, Juhan Bae, Michael R. Zhang, Stanislav Fort, Richard Zemel, Roger Grosse. Abstract: Figure 1. Monotonic linear interpolation for a ResN...
On Linear Identifiability of Learned Representations. Geoffrey Roeder, Luke Metz, Diederik P. Kingma. Abstract: et al., 2018; Devlin et al., 2018), and sequential decision making (Oord et al., 2018). Identifiability is a desirable p...
On Explainability of Graph Neural Networks via Subgraph Explorations. Hao Yuan, Haiyang Yu, Jie Wang, Kang Li, Shuiwang Ji. Abstract: 2018; Wang et al., 2019), and graph pooling (Yuan & Ji, 2020; Gao & Ji, 2019; Zhang et al., 2018). Howeve...
On a Combination of Alternating Minimization and Nesterov’s Momentum. Sergey Guminov, Pavel Dvurechensky, Nazarii Tupitsa, Alexander Gasnikov. Abstract: and sparse recovery (Daubechies et al., 2010). The famous Expect...
Of Moments and Matching: A Game-Theoretic Framework for Closing the Imitation Gap. Gokul Swamy, Sanjiban Choudhury, J. Andrew Bagnell, Zhiwei Steven Wu. Abstract: We provide a unifying view of a large family of (a...
Nonparametric Decomposition of Sparse Tensors. Conor Tillinghast, Shandian Zhe. Abstract: decomposition is a fundamental framework for multiway data analysis. In general, tensor decomposition aims to es- Tensor decomposition is...
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent. Kangqiao Liu, Liu Ziyin, Masahito Ueda. Abstract: and Teh, 2011). When the noise is due to minibatch sampling, the noise is called the SGD noise or minibatch...
On the Proof of Global Convergence of Gradient Descent for Deep ReLU Networks with Linear Widths. Quynh Nguyen. Abstract: training data, then the output at layer l is given by We give a simple proof for the global convergence of gradien...
On the Problem of Underranking in Group-Fair Ranking. Sruthi Gorantla, Amit Deshpande, Anand Louis. Abstract: ethical concerns and can potentially cause long-term economic and societal harm to demographics and businesses Bias in r...
On the price of explainability for some clustering problems. Eduardo Laber, Lucas Murtinho. Abstract: decision tree with 3 leaves. As an example, the blue cluster can be explained as the set of points that satisfy Feature The price of exp...
On the Predictability of Pruning Across Scales. Jonathan Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit. Abstract: As a first try, we could attempt to answer this question using brute force: we could prune every member of a net...
Maximum Mean Discrepancy Test is Aware of Adversarial Attacks. Ruize Gao, Feng Liu, Jingfeng Zhang, Bo Han, Tongliang Liu, Gang Niu, Masashi Sugiyama. Abstract: sup in Eq. (1), Gretton et al. (2012b) restricted F to be a unit ball in the...
Lossless Compression of Efficient Private Local Randomizers. Vitaly Feldman, Kunal Talwar. Abstract: The concept of a local randomizer dates back to the work of Warner (1965) where it was used to encourage truthful- Locally Different...
LogME: Practical Assessment of Pre-trained Models for Transfer Learning. Kaichao You, Yong Liu, Jianmin Wang, Mingsheng Long. Abstract: trained with large-scale supervised data (Deng et al., 2009; Russakovsky et al., 2015) and...
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning. Yuhuai Wu, Markus Rabe, Wenda Li, Jimmy Ba, Roger Grosse, Christian Szegedy. Abstract: mathematical reasoning. Attempts to design elaborate architectur...
Learning De-identified Representations of Prosody from Raw Audio. Jack Weston, Raphaël Lenain, Udeepa Meepegama, Emil Fristed. Abstract: The phonetic problem has automatic speech recognition (ASR) as its obvious use-case. In rec...