Training Graph Neural Networks with 1000 Layers. Guohao Li, Matthias Müller, Bernard Ghanem, Vladlen Koltun. Abstract: Deep graph neural networks (GNNs) have achieved excellent results on various tasks...
Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks. George Dasoulas, Kevin Scaman, Aladin Virmaux. Abstract (fragment): ... classification (Velickovic et al., 2018; Li et al., 2016) and computer vision...
Leveraging Sparse Linear Layers for Debuggable Deep Networks. Eric Wong, Shibani Santurkar, Aleksander Mądry. Abstract (fragment): ... model's failure modes or evaluate corrective interventions without in-depth problem-specific studies. ...
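As a rough illustration of the approach named in the title above (the visible excerpt is only motivational), the following sketch fits a sparse, L1-regularized linear layer on top of frozen deep features, so that each prediction depends on a small, inspectable set of feature directions. The random features, synthetic labels, and regularization strength are placeholders for illustration, not the authors' setup.

# Minimal sketch (not the paper's pipeline): fit a sparse linear layer on top of
# frozen deep features so that each class depends on only a few features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholder "deep features": in practice these would come from a frozen backbone.
features = rng.normal(size=(500, 256))                   # 500 examples, 256-dim representations
labels = (features[:, :3].sum(axis=1) > 0).astype(int)   # synthetic binary labels

# The L1 penalty drives most weights to exactly zero, leaving a small set to inspect.
sparse_head = LogisticRegression(penalty="l1", solver="saga", C=0.05, max_iter=5000)
sparse_head.fit(features, labels)

active = np.flatnonzero(sparse_head.coef_[0])
print(f"{active.size} of {features.shape[1]} features have non-zero weight: {active[:10]}")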
Global Optimality Beyond Two Layers: Training Deep ReLU Networks via Convex Programs. Tolga Ergen, Mert Pilanci. Abstract: Understanding the fundamental mechanism behind the success of deep neural networks is one of the...
BASE Layers: Simplifying Training of Large, Sparse Models. Mike Lewis, Shruti Bhosale, Tim Dettmers, Naman Goyal, Luke Zettlemoyer. Abstract: We introduce a new balanced assignment of experts...
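The phrase "balanced assignment of experts" in the excerpt above suggests routing tokens so that every expert receives an equal share while maximizing token-expert affinity. Below is a minimal sketch of that idea using an off-the-shelf linear-assignment solver; the dot-product affinities, the even-divisibility assumption, and the use of scipy's Hungarian solver are illustrative assumptions, not the authors' implementation.

# Minimal sketch (assumed, not the paper's algorithm): assign tokens to experts so
# that every expert gets the same number of tokens and total affinity is maximized.
import numpy as np
from scipy.optimize import linear_sum_assignment

def balanced_expert_assignment(token_reprs, expert_embeds):
    # token_reprs: (num_tokens, dim); expert_embeds: (num_experts, dim)
    num_tokens, num_experts = token_reprs.shape[0], expert_embeds.shape[0]
    assert num_tokens % num_experts == 0, "sketch assumes tokens split evenly"
    capacity = num_tokens // num_experts
    scores = token_reprs @ expert_embeds.T                # token-expert affinities
    # Replicate each expert 'capacity' times so the problem becomes one-to-one,
    # then minimize negative affinity (= maximize total affinity).
    cost = -np.repeat(scores, capacity, axis=1)
    rows, cols = linear_sum_assignment(cost)
    expert_of_token = np.empty(num_tokens, dtype=int)
    expert_of_token[rows] = cols // capacity              # map replicated column -> expert id
    return expert_of_token

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 16))                         # 8 token representations
experts = rng.normal(size=(2, 16))                        # 2 expert embeddings
print(balanced_expert_assignment(tokens, experts))        # each expert gets exactly 4 tokens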
Deep Residual Output Layers for Neural Language Generation. Nikolaos Pappas, James Henderson. Abstract (fragments): ... embeddings to capture the similarity structure of the output label space, so that data for similar labels can help classi- ... Many tasks...
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. Asa Cooper Stickland, Iain Murray. Abstract (fragment): However, fine-tuning separate models for each task often works better in practice. Although we...
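To make the title above concrete, here is a small sketch of what a task-specific "projected attention" adapter could look like: hidden states are projected to a low dimension, attention is applied in that space, and the result is projected back and added residually, so only the adapter's few parameters are trained per task. The dimensions, residual placement, and module structure are assumptions for illustration, not the paper's exact architecture.

# Minimal sketch (an assumption about the general shape of a projected-attention
# adapter, not the paper's exact design): a small per-task module alongside a
# frozen transformer layer.
import torch
import torch.nn as nn

class ProjectedAttentionAdapter(nn.Module):
    def __init__(self, hidden_dim=768, proj_dim=64, num_heads=4):
        super().__init__()
        self.down = nn.Linear(hidden_dim, proj_dim)    # down-project to a small space
        self.attn = nn.MultiheadAttention(proj_dim, num_heads, batch_first=True)
        self.up = nn.Linear(proj_dim, hidden_dim)      # project back to model size

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, hidden_dim) from the shared, frozen model
        low = self.down(hidden_states)
        attended, _ = self.attn(low, low, low)
        return hidden_states + self.up(attended)       # residual add of the task-specific path

adapter = ProjectedAttentionAdapter()
x = torch.randn(2, 16, 768)
print(adapter(x).shape)                                # torch.Size([2, 16, 768])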