UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data. Chengyi Wang¹, Yu Wu², Yao Qian², Kenichi Kumatani², Shujie Liu², Furu Wei², Michael Zeng², Xuedong Huang². Abstract: majority of the nearly 7000 languages s...
Unsupervised Speech Decomposition via Triple Information Bottleneck. Kaizhi Qian¹,², Yang Zhang¹, Shiyu Chang¹, David Cox¹, Mark Hasegawa-Johnson². Abstract: of the tone of the speaker, and rhythm characterizes how fast the speaker utte...
Deep Graph Random Process for Relational-Thinking-Based Speech Recognition. Hengguan Huang¹, Fuzhao Xue¹, Hao Wang², Ye Wang¹. Abstract: on innumerable unconscious percepts pertaining to relations between current sensory signals a...
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network. Vincent Wan¹, Chun-an Chan¹, Tom Kenter¹, Jakub Vit², Rob Clark¹. Abstract: μ Predicted prosodic features T...
Almost Unsupervised Text to Speech and Automatic Speech Recognition. Yi Ren¹, Xu Tan², Tao Qin², Sheng Zhao³, Zhou Zhao¹, Tie-Yan Liu². Abstract: Text to Speech (TTS) and automa...
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron. RJ Skerry-Ryan¹, Eric Battenberg¹, Ying Xiao¹, Yuxuan Wang¹, Daisy Stanton¹, Joel Shor¹, Ron J. Weiss¹, Rob Clark¹, Rif A. Saurous¹. Abstract: can also be spok...
Parallel WaveNet: Fast High-Fidelity Speech Synthesis. Aaron van den Oord¹, Yazhe Li¹, Igor Babuschkin¹, Karen Simonyan¹, Oriol Vinyals¹, Koray Kavukcuoglu¹, George van den Driessche¹, Edward Lockhart¹, Luis C. Cobo¹, Florian Stimberg¹, No...
Multichannel End-to-end Speech Recognition. Tsubasa Ochiai¹, Shinji Watanabe², Takaaki Hori², John R. Hershey². Abstract: into a single neural network. Specifically, an attention-based encoder-decoder framework (Chorowski et al.,...