Towards Automating the Rhythmic Analysis of Speech: A Comparative Study of English Spoken by American and Thai Speakers

Palahan, Sirinda and Tongvivat, Yuwaree (2025) Towards Automating the Rhythmic Analysis of Speech: A Comparative Study of English Spoken by American and Thai Speakers. International Journal of Speech Technology, 28 (1). pp. 261-279. ISSN 1572-8110

Item Type:	Article

Abstract

Efforts to classify language rhythm have significantly contributed to methodologies for identifying acoustic features that define speech “beats”. This study advances the field through three key contributions: (1) validation of algorithmic rhythm detection, (2) characterization of distinctive rhythmic features in native versus non-native speech, and (3) development of an automated, scalable approach for rhythmic analysis. The effectiveness of the maxD parameter—representing the moment of fastest energy increase—was evaluated as an alternative to manual annotation for identifying syllabic beats. Speech samples from 34 speakers (17 American and 17 Thai) were analyzed using statistical and machine learning models, including Support Vector Machine, Random Forest, Gradient Boosting, and Logistic Regression, to classify rhythmic patterns. Results indicate that the maxD parameter demonstrates strong alignment with manually annotated beat locations, achieving high predictive accuracy for native speakers (RMSE: 0.1182, MAPE: 0.3290%). Additionally, SHapley Additive exPlanations (SHAP) analysis revealed key rhythmic features distinguishing the two groups: maxD_value, intensity_ratio, and max_intensity were identified as key characteristics of native speech, while shorter vowel and syllable durations were established as distinctive native features. Systematic differences in maxD, pitch, and intensity alignment patterns between native and non-native speech were documented, particularly in content words and multisyllabic structures. These findings contribute to the ultimate goal of improving non-native speech by identifying critical rhythmic features that differentiate native and non-native speakers. This research study lays the groundwork for targeted pronunciation training tools and language-learning applications, offering a foundation for enhancing naturalness and fluency in second language acquisition.
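As an illustration of the quantities named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes an energy envelope has already been extracted for one syllable, takes maxD to be the time at which that envelope has its largest first derivative, and uses the standard definitions of RMSE and MAPE. The function names (max_d_time, rmse, mape) are hypothetical and do not come from the article.

```python
import numpy as np


def max_d_time(energy, times):
    """Return the time at which the energy envelope rises fastest.

    Assumption: maxD is approximated by the maximum of the first
    derivative of the envelope, estimated with finite differences.
    """
    energy = np.asarray(energy, dtype=float)
    times = np.asarray(times, dtype=float)
    slope = np.gradient(energy, times)  # d(energy)/d(time) at each sample
    return float(times[np.argmax(slope)])


def rmse(predicted, reference):
    """Root mean squared error between predicted and reference values."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.sqrt(np.mean((predicted - reference) ** 2)))


def mape(predicted, reference):
    """Mean absolute percentage error, in percent.

    Reference values must be non-zero.
    """
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean(np.abs((predicted - reference) / reference)) * 100.0)


if __name__ == "__main__":
    # Synthetic envelope: a smooth rise centred at 0.4 s (illustrative only).
    t = np.linspace(0.0, 1.0, 101)
    envelope = 1.0 / (1.0 + np.exp(-30.0 * (t - 0.4)))
    print("estimated maxD time:", max_d_time(envelope, t))

    # Hypothetical beat times in seconds: algorithmic vs. manually annotated.
    auto_beats = [0.41, 0.98, 1.52]
    manual_beats = [0.40, 1.00, 1.50]
    print("RMSE:", rmse(auto_beats, manual_beats))
    print("MAPE (%):", mape(auto_beats, manual_beats))
```

In this sketch, lower RMSE and MAPE values mean the automatically detected beat locations lie closer to the manual annotations. The beat times shown are invented for demonstration and are not data from the study.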

Keywords: maxD, language beat, L2 pronunciation, rhythm detection, speech technology


PDF (Bespoke Publisher licence)
2025_Palahan&Tongvivat_Rhythm.pdf
Restricted to Repository staff only until 11 March 2026.

More Information

Additional Information: This version of the article has been accepted for publication, after peer review (when applicable) and is subject to Springer Nature’s AM terms of use, but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: https://doi.org/10.1007/s10772-025-10177-1

Related URLs: Publisher

Depositing User: Yuwaree Tongvivat