Ivan Jokić

Assistant Professor • Repository of Papers

Bibliographic References

The author's publications and papers are shown as compact cards.

Use of Covariance Matrix in Automatic Speaker Recognition

M33

Publication / journal name

Proceedings of International Scientific Conference „ALFATECH – Smart Cities and modern technologies“

Paper title

Use of Covariance Matrix in Automatic Speaker Recognition

Authors

Ivan Jokić; Stevan Jokić

Year of publication

2025

Vol/No.

—

ISSN

—

ISBN

978-86-6461-093-3

DOI

10.46793/ALFATECHproc25.204J

Pages

204–207

Link

https://doi.org/10.46793/ALFATECHproc25.204J

Abstract

This paper tests a procedure for automatic speaker recognition that uses 21 mel-frequency cepstral coefficients as speaker features and a covariance matrix as the speaker model. Tests are conducted on the Solo part of the CHAINS speech database, which contains 37 recordings for each of 36 speakers. Each speech recording is represented by a matrix of feature vectors, and the recording is modeled by the covariance matrix of that feature matrix. Recognition accuracy is compared for two cases: with and without a sigmoid function applied to the elements of the speaker model. Tests are done in five stages. In most tests, applying the sigmoid function to the elements of the covariance matrices significantly increases recognition accuracy. The mean recognition accuracy over all tests is 87.84% without the sigmoid function and 94.64% with it.
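
The modeling step described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the use of the sample covariance (division by n - 1), and the plain logistic form of the sigmoid are all assumptions, and the feature extraction and model comparison steps are left out because the abstract does not specify them.

```python
import math


def covariance_matrix(features):
    """Sample covariance of feature vectors (rows = frames, columns = coefficients).

    Assumption: normalization by n - 1; the abstract does not state the convention.
    """
    n = len(features)
    dim = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(dim)]
    cov = [[0.0] * dim for _ in range(dim)]
    for row in features:
        centered = [row[j] - means[j] for j in range(dim)]
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += centered[i] * centered[j]
    return [[value / (n - 1) for value in line] for line in cov]


def sigmoid(x):
    """Numerically stable logistic function (assumed form of the sigmoid)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def speaker_model(features, use_sigmoid=True):
    """Model one recording by its covariance matrix, optionally passing
    every element through the sigmoid (the two cases compared in the paper)."""
    cov = covariance_matrix(features)
    if not use_sigmoid:
        return cov
    return [[sigmoid(value) for value in line] for line in cov]


# Toy input: 3 frames with 2 coefficients each (the paper uses 21 MFCCs per frame).
toy_features = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
plain_model = speaker_model(toy_features, use_sigmoid=False)
squashed_model = speaker_model(toy_features, use_sigmoid=True)
```

With real data, each row of the input would hold the 21 coefficients of one frame, giving a 21 × 21 model per recording.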

Keywords

Automatic speaker recognition; Mel-Frequency Cepstral Coefficients; Covariance Matrix.

Publication category

M33

Application of Mel-Frequency Cepstral Coefficients in Automatic Speaker Recognition as Part of IoT Solutions for Security and Optimization in Smart Cities

M53

Publication / journal name

ALFATECH Journal

Paper title

Application of Mel-Frequency Cepstral Coefficients in Automatic Speaker Recognition as Part of IoT Solutions for Security and Optimization in Smart Cities

Authors

Ivan Jokić; Vlado Delić; Zoran Perić

Year of publication

2025

Vol/No.

1/1

ISSN

1XX0-3XX1

ISBN

—

DOI

10.46793/AlfaTech1.1.05J

Pages

5–10

Link

https://doi.org/10.46793/AlfaTech1.1.05J

Abstract

This paper presents an implementation of automatic speaker recognition utilizing feature vectors composed of 21 mel-frequency cepstral coefficients (MFCCs) as part of an IoT-driven solution for enhancing security and optimization in smart cities. Experiments are conducted on the Solo portion of the CHAINS database, containing 33 unique sentences pronounced by each of 36 speakers. Results indicate that recognition accuracy varies with the training and testing datasets and improves with longer test recordings. A comparative analysis of MFCC calculation methods reveals that accuracy is generally higher when a sigmoidal square of amplitude characteristic is applied to frequency-selective ranges, rather than an exponential approach. Models are developed for each speaker’s recordings, represented by a covariance matrix of feature vectors, and applying a sigmoid function to the model elements yields a 5% increase in recognition accuracy in most cases. These findings highlight the potential for MFCC-based speaker recognition as a scalable, data-driven IoT tool for security, public safety, and resource optimization in the context of smart cities.

Keywords

Automatic speaker recognition; Mel-frequency cepstral coefficients (MFCCs); Covariance matrix; Exponential; Sigmoidal.

Publication category

M53

Mel-Frequency Cepstral Coefficients and Spectrum Based Additional Features in Automatic Speaker Recognition

M23

Publication / journal name

Facta Universitatis Series: Electronics and Energetics

Paper title

Mel-Frequency Cepstral Coefficients and Spectrum Based Additional Features in Automatic Speaker Recognition

Authors

Ivan Jokić; Stevan Jokić; Vlado Delić; Zoran Perić

Year of publication

2025

Vol/No.

38/4

ISSN

0353-3670

ISBN

—

DOI

10.2298/FUEE2504663J

Pages

663–680

Link

https://doi.org/10.2298/FUEE2504663J

Abstract

The efficiency of the proposed automatic speaker recognizer is evaluated using two speech databases. The feature vector consists of 21 mel-frequency cepstral coefficients (MFCCs), along with up to three additional features derived from the amplitude spectrum. The additional features are calculated based on the logarithm of the energy around the appropriate local maximum in the spectrum, the frequency of that maximum, and the logarithm of the energy of the maximum component in the spectrum across all frames of the observed signal. The speaker identification procedure for a closed set of speakers is tested on the Solo section of the CHAINS database and a speech database with expressed emotions, developed within the S-ADAPT project. The achieved maximum mean recognition accuracies are 97.11% on the CHAINS database, using a feature vector of 21 MFCCs and two additional features, and 98.65% on neutral speech, as well as 98.72% on the entire database, for the S-ADAPT database, using a feature vector of 21 MFCCs.
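
As a rough illustration of the three spectrum-based features named in the abstract, the sketch below computes them from precomputed amplitude spectra. Several details are assumptions rather than the authors' definitions: the "appropriate local maximum" is taken here to be the largest component of each frame, the energy window `half_width`, the bin spacing `bin_hz`, the small constant `EPS`, and all function names are hypothetical.

```python
import math

EPS = 1e-12  # hypothetical guard against log(0)


def frame_peak_features(amplitudes, bin_hz, half_width=2):
    """Return (log energy around the peak, peak frequency in Hz) for one frame.

    Assumption: the peak is the largest amplitude component of the frame and
    the energy is summed over half_width bins on each side of it.
    """
    peak = max(range(len(amplitudes)), key=lambda k: amplitudes[k])
    lo = max(0, peak - half_width)
    hi = min(len(amplitudes), peak + half_width + 1)
    energy = sum(a * a for a in amplitudes[lo:hi])
    return math.log(energy + EPS), peak * bin_hz


def global_peak_log_energy(frames):
    """Log energy of the largest spectral component over all frames of a signal."""
    peak = max(max(frame) for frame in frames)
    return math.log(peak * peak + EPS)


# Toy amplitude spectra: 2 frames with 5 frequency bins each, 100 Hz apart.
toy_frames = [
    [0.0, 1.0, 3.0, 1.0, 0.0],
    [0.0, 4.0, 0.0, 0.0, 0.0],
]
log_energy, peak_hz = frame_peak_features(toy_frames[0], bin_hz=100.0, half_width=1)
global_log_energy = global_peak_log_energy(toy_frames)
```

In a full recognizer these values would be appended to the 21 MFCCs of each frame; how the paper combines or scales them is not stated in the abstract.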

Keywords

accuracy; audio recording; human voice; speaker recognition; spectral analysis

Publication category

M23