GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale

Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich

Title:	GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale
Other Titles:	Hodnocení syntetické řeči založené na GMM klasifikaci ve 2D škále potěšení-vzrušení
Authors:	Přibil, Jiří; Přibilová, Anna; Matoušek, Jindřich
Citation:	PŘIBIL, J., PŘIBILOVÁ, A., MATOUŠEK, J. GMM-Based Evaluation of Synthetic Speech Quality Using 2D Classification in Pleasure-Arousal Scale. Applied Sciences, 2021, vol. 11, no. 1, pp. 1-18. ISSN: 2076-3417
Issue Date:	2021
Publisher:	MDPI
Document type:	article
URI:	2-s2.0-85098620235; http://hdl.handle.net/11025/45598
ISSN:	2076-3417
Keywords:	GMM classification; statistical analysis; synthetic speech evaluation; text-to-speech system
Abstract:	The paper describes a system for the automatic evaluation of synthetic speech quality based on a Gaussian mixture model (GMM) classifier. Speech material from a real speaker is compared with synthesized material to determine the similarities and differences between them. The final evaluation order is determined by the distances in the Pleasure-Arousal (P-A) space between the original speech and synthetic speech produced with different synthesis and/or prosody manipulation methods implemented in the Czech text-to-speech system. The GMMs for continual 2D detection of P-A classes are trained on sound/speech material from databases unrelated to the original speech or the synthesized sentences. Preliminary and auxiliary analyses show that the number of mixtures, the number and type of speech features used, the size of the processed speech material, and the type of database used to create the GMMs all have a substantial influence on the P-A classification process and on the final evaluation result. The main evaluation experiments confirm the functionality of the developed system. The objective evaluation results correlate in principle with the subjective ratings of human evaluators; however, partial differences were indicated, so a subsequent detailed investigation is needed.
Rights:	©CC-BY
Appears in Collections:	Články / Articles (KKY) OBD

Files in This Item:

File	Size	Format
applsci-11-00002-v2.pdf	4.96 MB	Adobe PDF

Please use this identifier to cite or link to this item: http://hdl.handle.net/11025/45598
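
The abstract states that the final evaluation order is determined by distances in the P-A space between the original and the synthetic speech. The following is a minimal sketch of that ranking step only, not the authors' implementation. It assumes that each speech sample has already been mapped to a single (pleasure, arousal) point and that the distance is Euclidean; the record specifies neither. The function name, method names, and coordinates are hypothetical and invented for illustration.

```python
import math


def rank_by_pa_distance(original, synthetic):
    """Order synthesis methods by distance to the original in the P-A plane.

    original:  (pleasure, arousal) point for the original speech
    synthetic: dict mapping a method name to its (pleasure, arousal) point
    Euclidean distance is an assumption; the abstract only says "distances".
    """
    distances = {
        name: math.dist(original, point) for name, point in synthetic.items()
    }
    # Smallest distance first: closest to the original speech ranks best.
    return sorted(distances.items(), key=lambda item: item[1])


# Hypothetical coordinates, not taken from the paper.
original = (0.0, 0.0)
synthetic = {
    "method_a": (3.0, 4.0),
    "method_b": (1.0, 0.0),
    "method_c": (0.0, 2.0),
}
ranking = rank_by_pa_distance(original, synthetic)
for name, distance in ranking:
    print(f"{name}: {distance:.2f}")
```

Under these assumptions, the method whose point lies closest to the original speech comes first in the returned list.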