Whose Voice Speaks Volumes? The Problem with Gender Identification from Speech

Matteo Gay;Claudia Roberta Combei
2024-01-01

Abstract

This study investigates voice-based gender identification, focusing on its accuracy and ethical implications. First, we review the available literature to understand what shapes expectations of gender as revealed through voice. We then conduct an experiment to examine the impact of age on voice-based gender classification. Our evaluation of a Multilayer Perceptron neural network model using MFCC features indicates an overall accuracy of 0.90 in binary gender classification. This seemingly high score masks a substantial 10% misclassification rate, which would matter in real-world applications. When age is considered, accuracy drops further, ranging from 0.64 to 0.88 across the seven age groups. Overall, our results indicate age-related variability in model performance, highlighting limitations and ethical concerns in generalizing across age groups. Regarding societal implications, our literature review and experiments suggest that greater awareness and diversity-informed approaches are needed in the design, development, and marketing of speech technologies.
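The gap between an overall accuracy score and per-group scores can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function name `accuracy_by_group`, the group labels, and the toy predictions are all hypothetical, chosen only to show how an aggregate score can hide weaker performance in one subgroup.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return (overall_accuracy, {group: accuracy}) for three parallel lists."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("inputs must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    correct = defaultdict(int)  # correct predictions per group
    total = defaultdict(int)    # samples per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)

    overall = sum(correct.values()) / len(y_true)
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group


# Toy, made-up labels for illustration only (NOT data from the study).
y_true = ["f", "m", "f", "m", "f", "m", "f", "m", "f", "m"]
y_pred = ["f", "m", "f", "m", "f", "m", "f", "m", "m", "f"]
groups = ["group_a"] * 6 + ["group_b"] * 4  # hypothetical age groups

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(overall)    # aggregate accuracy across all samples
print(per_group)  # accuracy broken down by group
```

In this toy case every error falls in the second group, so the aggregate figure looks much better than that group's own score. Reporting the breakdown alongside the overall number is one way to surface this kind of variability.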
2024
La voce nei media e nelle nuove tecnologie: produzione e percezione [The voice in the media and new technologies: production and perception]
Valentina De Iacovo, Bianca Maria De Paolis and Daniela Mereu
Language & Linguistics covers resources concerned with the theoretical, descriptive, and historical aspects of linguistics.
Anonymous experts
English
International
Electronic
225
241
17
978-88-97657-73-6
Officinaventuno
Milan
Italy
Open access article
speech technology; voice; gender; bias; ethics
https://www.studi.aisv.it/index.php/home/article/view/268
no
2 Contribution in Volume::2.1 Contribution in volume (Chapter or Essay)
2
268
none
Gay, Matteo; Combei, Claudia Roberta
info:eu-repo/semantics/bookPart
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1538776