Recent advances in transfer learning, style transfer, speaker encoding, and factor disentanglement offer potential solutions for low-resource voice cloning. In 2020, iQIYI co-organized the Multi-Speaker Multi-Style Voice Cloning Challenge (M2VoC) at ICASSP 2021. The M2VoC Challenge aims to provide a common dataset and a fair testing platform for research on voice cloning. As one of the flagship Signal Processing Grand Challenges of the International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), M2VoC strongly encouraged researchers from both academia and industry to participate.
A total of 153 teams from around the world signed up after the challenge was launched. The organizing committee conducted two rounds of strict subjective evaluation: the first round covered the systems of all submitting teams, and the second round covered the top-scoring teams from the first round (including teams whose scores differed only marginally).
The winners were chosen by combining the results of the two rounds of testing. For Track 1, the final score is the average MOS for voice quality, style similarity, and speaker similarity; for Track 2, the final score is the average MOS for voice quality and speaker similarity. Note that the intelligibility test score is not included in the final results of the competition.
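For concreteness, the sketch below shows how such an equal-weighted average of MOS dimensions could be computed. The dimension names, the equal weighting, and the example ratings are illustrative assumptions, not the committee's published aggregation procedure.

```python
# Minimal sketch of the final-score computation described above:
# an equal-weighted average of per-dimension MOS ratings.
# Dimension names and the ratings below are illustrative assumptions,
# not official M2VoC values.

def final_score(mos_by_dimension: dict) -> float:
    """Average the per-dimension MOS scores (equal weighting assumed)."""
    return sum(mos_by_dimension.values()) / len(mos_by_dimension)

# Track 1: voice quality, style similarity, speaker similarity
track1 = final_score({"quality": 4.2, "style": 3.9, "speaker": 4.0})

# Track 2: voice quality and speaker similarity only
track2 = final_score({"quality": 4.1, "speaker": 3.8})

print(f"Track 1 final score: {track1:.2f}")  # 4.03
print(f"Track 2 final score: {track2:.2f}")  # 3.95
```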
Recently, the organizing committee reviewed the submitted system descriptions and published the list of top-scoring teams of ICASSP 2021 M2VoC.
You can click "Read the original article" to visit the official website.
At 2:00 PM on March 10, we invited representatives of the top-2 teams from each of the four tracks to gather online, together with members of the organizing committee and the judges, to share their winning solutions and review highlights of the competition.
Event details
Time: March 10, 2:00-5:00 PM
Agenda (some time slots were garbled in the source and are left partial):

Time (PM) | Guest | Topic |
---|---|---|
2:00 - | Li Hai, Senior Manager, iQIYI | |
 | Xie Lei, Professor, Northwestern Polytechnical University | Recent Advances in Human-like Speech Synthesis |
- 3:00 | Wu Zhiyong, Associate Professor, Tsinghua University | Controllable Accented Speech Generation for Intelligent Speech Interaction |
3:00-3:20 | Tian Xiaohai, Research Fellow, National University of Singapore | Voice Conversion with Non-parallel Data |
3:20-3:50 | Yang Mingqi, Researcher, Yuanfudao AI Lab | The Yuanfudao TTS System for M2VoC 2021 |
3:50 - | Wang Tao, PhD Student, Institute of Automation, Chinese Academy of Sciences | Style Transfer for Personalized Speech Synthesis |
- 4:50 | Li Hongbin, Researcher, vivo Shenzhen Research Institute | ICASSP2021 M2VoC Competition Model Sharing |
4:50-5:00 | Chung-Ming Chien, Researcher, Speech Processing Lab, National Taiwan University | Investigating on Incorporating Pretrained and Learnable Speaker Representation for Multi-Speaker Multi-Style Text-to-Speech |
Participation:
Scan the QR code to join the sharing group and get the live-stream link!
If the group is full, add the iQIYI assistant on WeChat (iqiyixiaozhushou) with the note "lecture", and you will be added to the group.
Introductions of the sharing-session guests
- Xie Lei:
- Professor and doctoral supervisor at Northwestern Polytechnical University, head of the Audio, Speech and Language Processing Laboratory (ASLP@NPU), standing committee member of the Speech Dialogue and Auditory Processing Technical Committee of the China Computer Federation, deputy director of the Speech Information Technical Committee of the Chinese Information Processing Society of China, and editorial board member of IEEE/ACM Transactions on Audio, Speech, and Language Processing, a top journal in the speech field. He has published more than 180 papers.
- Wu Zhiyong:
- Associate professor and doctoral supervisor at the Shenzhen International Graduate School, Tsinghua University, and deputy director of the Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems. His research focuses on intelligent speech interaction technology for artificial intelligence. He has led a number of projects sponsored by the National Natural Science Foundation of China and the Research Grants Council (RGC) of the HKSAR Government, and received the Ministry of Education Science and Technology Progress Award in 2009 and 2016. Many students under his supervision have won excellent-dissertation awards, national scholarships, and outstanding-graduate honors, and his team won the championship of the "AI Voice Imitation and Detection Attack and Defense Competition" at the 2017 Global Geek Competition.
- Tian Xiaohai:
- Xiaohai Tian received his Ph.D. from Nanyang Technological University, Singapore, and his B.Sc. and M.Sc. degrees from Northwestern Polytechnical University, Shaanxi, China, in 2006 and 2011, respectively. He is now a research fellow at the Human Language Technology (HLT) Lab, Department of Electrical and Computer Engineering, National University of Singapore. His research interests include voice conversion, speech synthesis, singing synthesis, and anti-spoofing.
Organizers
Scan the QR code below for more exciting content!