Synthetic Speech Evaluation by 2D GMM Classification in Pleasure-Arousal Scale

Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

Title:	Synthetic Speech Evaluation by 2D GMM Classification in Pleasure-Arousal Scale
Other Titles:	Hodnocení syntetické řeči pomocí 2D GMM klasifikace ve škále potěšení-vzrušení
Authors:	Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
Citation:	PŘIBIL, J., PŘIBILOVÁ, A., MATOUŠEK, J. Synthetic Speech Evaluation by 2D GMM Classification in Pleasure-Arousal Scale. In: 2020 43rd International Conference on Telecommunications and Signal Processing (TSP). New York: IEEE, 2020. pp. 10-13. ISBN 978-1-72816-376-5.
Issue Date:	2020
Publisher:	IEEE
Document type:	conference paper (conferenceObject)
URI:	2-s2.0-85090553924; http://hdl.handle.net/11025/42771
ISBN:	978-1-72816-376-5
Keywords:	GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
Abstract:	The paper describes a system for automatic evaluation of synthetic speech quality based on two-dimensional detection in the Pleasure-Arousal (P-A) scale. The original speech material of the speaker used for synthesis is compared with the synthesized speech to find similarities and differences between them. A Gaussian mixture model (GMM) classifier is used for continuous P-A detection. The GMMs of the P-A classes are built and trained on sound/speech material from a database labelled directly in the P-A scale, which is unrelated to the original speech and to the tested sentences. Basic experiments confirm that the developed system works in principle. Additional analysis shows that the number of mixtures and the type of sound/speech database used to build the GMMs must be chosen carefully. The objective evaluation results are highly correlated with the subjective ratings of human evaluators.
Rights:	The full text is accessible within the university to logged-in users. © IEEE
Appears in Collections:	Conference papers (NTIS); Conference papers (KKY); OBD
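
The classification scheme summarized in the abstract (one GMM per P-A class, with a test recording assigned to the class whose model scores it highest) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes diagonal-covariance GMMs fitted by plain EM, synthetic two-dimensional feature vectors in place of real speech features, and two hypothetical class labels (`low_arousal`, `high_arousal`); all function and class names are invented for illustration, and only the Python standard library is used.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Hypothetical minimal GMM with diagonal covariances, fitted by EM."""

    def __init__(self, n_mix, n_iter=25, seed=0):
        self.n_mix, self.n_iter, self.seed = n_mix, n_iter, seed

    def _component_logs(self, x):
        # log(weight_k) + log N(x | mean_k, var_k) for every mixture k
        return [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(self.weights, self.means, self.variances)
        ]

    def fit(self, data):
        rng = random.Random(self.seed)
        n, dim = len(data), len(data[0])
        # Initialise means from random samples, variances from the global spread.
        self.means = [list(p) for p in rng.sample(data, self.n_mix)]
        center = [sum(p[d] for p in data) / n for d in range(dim)]
        spread = [
            max(sum((p[d] - center[d]) ** 2 for p in data) / n, 1e-3)
            for d in range(dim)
        ]
        self.variances = [list(spread) for _ in range(self.n_mix)]
        self.weights = [1.0 / self.n_mix] * self.n_mix
        for _ in range(self.n_iter):
            # E-step: responsibility of each mixture for each vector.
            resp = []
            for x in data:
                logs = self._component_logs(x)
                norm = logsumexp(logs)
                resp.append([math.exp(v - norm) for v in logs])
            # M-step: re-estimate weights, means and floored variances.
            for k in range(self.n_mix):
                nk = sum(r[k] for r in resp) + 1e-10
                self.weights[k] = nk / n
                self.means[k] = [
                    sum(r[k] * x[d] for r, x in zip(resp, data)) / nk
                    for d in range(dim)
                ]
                self.variances[k] = [
                    max(sum(r[k] * (x[d] - self.means[k][d]) ** 2
                            for r, x in zip(resp, data)) / nk, 1e-3)
                    for d in range(dim)
                ]
        return self

    def score(self, frames):
        """Total log-likelihood of a sequence of feature vectors."""
        return sum(logsumexp(self._component_logs(x)) for x in frames)


def classify(models, frames):
    """Return the label of the class model with the highest score."""
    return max(models, key=lambda label: models[label].score(frames))


def make_frames(center, n, rng, std=0.5):
    """Synthetic stand-in for feature vectors extracted from speech."""
    return [[rng.gauss(c, std) for c in center] for _ in range(n)]


rng = random.Random(42)
training = {
    "low_arousal": make_frames((-2.0, -2.0), 200, rng),
    "high_arousal": make_frames((2.0, 2.0), 200, rng),
}
models = {label: DiagGMM(n_mix=2).fit(frames)
          for label, frames in training.items()}
test_frames = make_frames((2.0, 2.0), 50, rng)
predicted = classify(models, test_frames)
print(predicted)
```

The `n_mix` argument corresponds to the number of mixtures, which the abstract identifies as an important choice. In this sketch it is fixed at 2 purely for illustration.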

Files in This Item:

File	Size	Format
TSP2020-proceedings_AJPs.pdf	503.29 kB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/42771
