High Performance Zero-Memory Overhead Direct Convolutions

Jiyuan Zhang¹ Franz Franchetti¹ Tze Meng Low¹

Abstract

[Figure: Performance normalized to OpenBLAS GEMM on AMD PileDriver 4.0GHz, 4/...]

The computation of convolution layers in deep...