Towards Automating the Rhythmic Analysis of Speech: A Comparative Study of English Spoken by American and Thai Speakers

Palahan, Sirinda and Tongvivat, Yuwaree (2025) Towards Automating the Rhythmic Analysis of Speech: A Comparative Study of English Spoken by American and Thai Speakers. International Journal of Speech Technology, 28 (1). pp. 261-279. ISSN 1572-8110

Item Type: Article

Abstract

Efforts to classify language rhythm have significantly contributed to methodologies for identifying acoustic features that define speech “beats”. This study advances the field through three key contributions: (1) validation of algorithmic rhythm detection, (2) characterization of distinctive rhythmic features in native versus non-native speech, and (3) development of an automated, scalable approach for rhythmic analysis. The effectiveness of the maxD parameter—representing the moment of fastest energy increase—was evaluated as an alternative to manual annotation for identifying syllabic beats. Speech samples from 34 speakers (17 American and 17 Thai) were analyzed using statistical and machine learning models, including Support Vector Machine, Random Forest, Gradient Boosting, and Logistic Regression, to classify rhythmic patterns. Results indicate that the maxD parameter demonstrates strong alignment with manually annotated beat locations, achieving high predictive accuracy for native speakers (RMSE: 0.1182, MAPE: 0.3290%). Additionally, SHapley Additive exPlanations (SHAP) analysis revealed key rhythmic features distinguishing the two groups: maxD_value, intensity_ratio, and max_intensity were identified as key characteristics of native speech, while shorter vowel and syllable durations were established as distinctive native features. Systematic differences in maxD, pitch, and intensity alignment patterns between native and non-native speech were documented, particularly in content words and multisyllabic structures. These findings contribute to the ultimate goal of improving non-native speech by identifying critical rhythmic features that differentiate native and non-native speakers. This research study lays the groundwork for targeted pronunciation training tools and language-learning applications, offering a foundation for enhancing naturalness and fluency in second language acquisition.
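As a rough illustration of the quantities named in the abstract, the sketch below locates the steepest rise in an energy envelope (the idea behind maxD) and computes RMSE and MAPE between predicted and manually annotated beat times. This record does not describe the authors' actual procedure, so the function names, the first-difference approximation, and the sample values are all assumptions made for illustration only.

```python
import math


def max_d_index(energy):
    """Index i at which energy[i + 1] - energy[i] is largest.

    Assumption: the "moment of fastest energy increase" is approximated
    by the largest first difference of a sampled energy envelope.
    """
    if len(energy) < 2:
        raise ValueError("need at least two energy samples")
    diffs = [b - a for a, b in zip(energy, energy[1:])]
    return max(range(len(diffs)), key=diffs.__getitem__)


def rmse(predicted, reference):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def mape(predicted, reference):
    """Mean absolute percentage error, in percent.

    Reference values must be non-zero.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    ratios = [abs((p - r) / r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical energy envelope for one syllable, not data from the study.
    envelope = [0.1, 0.2, 0.9, 1.0, 0.6]
    print("steepest rise starts at sample", max_d_index(envelope))

    # Hypothetical beat times in seconds: automatic versus manual annotation.
    automatic = [0.52, 1.01, 1.49]
    manual = [0.50, 1.00, 1.50]
    print("RMSE:", rmse(automatic, manual))
    print("MAPE (%):", mape(automatic, manual))
```

Multiplying the returned sample index by the envelope's frame step would convert it to a time in seconds; that step size is likewise an assumption and is not given in this record.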

Keywords: maxD, language beat, L2 pronunciation, rhythm detection, speech technology

PDF (Bespoke Publisher licence): 2025_Palahan&Tongvivat_Rhythm.pdf (884kB)
Restricted to Repository staff only until 11 March 2026.

More Information

Additional Information: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10772-025-10177-1
Depositing User: Yuwaree Tongvivat

Identifiers

Item ID: 19067
Identification Number: https://doi.org/10.1007/s10772-025-10177-1
ISSN: 1572-8110
URI: http://sure.sunderland.ac.uk/id/eprint/19067
Official URL: https://link.springer.com/article/10.1007/s10772-0...

Users with ORCIDs

ORCID for Sirinda Palahan: orcid.org/0000-0002-1110-8928
ORCID for Yuwaree Tongvivat: orcid.org/0009-0000-0233-3528

Catalogue record

Date Deposited: 22 May 2025 10:59
Last Modified: 22 May 2025 10:59

Contributors

Author: Sirinda Palahan
Author: Yuwaree Tongvivat

University Divisions

Faculty of Business, Law and Tourism > Sunderland Business School

Subjects

Languages > Languages
Computing > Programming
