Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. Noam Shazeer, Mitchell Stern. Abstract: vector summarizing the history of squared gradients, usually obtained through summation as in Adagrad (Duchi et al., In severa...
State-Frequency Memory Recurrent Neural Networks. Hao Hu, Guo-Jun Qi. Abstract: RNN models such as Long Short-term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) have been proven as Modeling temporal sequences plays a fundamen- pow...