Ivan Jokić
Assistant Professor • Repository of Works
Bibliographic references
The author's publications and papers are listed below as compact cards.
Use of Covariance Matrix in Automatic Speaker Recognition
M33 • Proceedings of International Scientific Conference „ALFATECH – Smart Cities and modern technologies“
Ivan Jokić; Stevan Jokić
2025
—
—
978-86-6461-093-3
204–207
This paper tests a procedure for automatic speaker recognition that uses 21 mel-frequency cepstral coefficients as speaker features and a covariance matrix as the speaker model. Tests are conducted on the Solo part of the CHAINS speech database, which contains 37 recordings for each of 36 speakers. Each speech recording is represented by a matrix of feature vectors, and the recording is modeled by the covariance matrix of that feature matrix. Recognition accuracy is compared for two cases: with and without a sigmoid function applied to the elements of the speaker model. The tests are carried out in five stages. In most tests, applying the sigmoid function to the elements of the covariance matrices significantly increases recognition accuracy. The mean recognition accuracy over all tests is 87.84% without the sigmoid function and 94.64% with it.
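The modeling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature matrices are synthetic stand-ins for MFCC vectors, the function names are hypothetical, and the Frobenius-norm distance used to compare models is an assumption, since the abstract does not state which similarity measure the paper uses.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def speaker_model(features, use_sigmoid=True):
    """Model one recording by the covariance matrix of its feature vectors.

    features: array of shape (n_frames, n_coeffs), one feature vector per row.
    Returns an (n_coeffs, n_coeffs) matrix, optionally passed through a sigmoid.
    """
    cov = np.cov(features, rowvar=False)  # columns are the variables
    return sigmoid(cov) if use_sigmoid else cov

def identify(test_features, models, use_sigmoid=True):
    """Return the name of the enrolled model closest to the test recording.

    Assumption: closeness is the Frobenius norm of the model difference.
    """
    test_model = speaker_model(test_features, use_sigmoid)
    distances = {name: np.linalg.norm(test_model - model)
                 for name, model in models.items()}
    return min(distances, key=distances.get)

# Synthetic stand-ins for two speakers' 21-coefficient feature matrices.
rng = np.random.default_rng(0)
train_a = rng.standard_normal((500, 21)) * 1.0
train_b = rng.standard_normal((500, 21)) * 2.0
test_a = rng.standard_normal((500, 21)) * 1.0  # unseen recording of speaker "A"

models_sig = {"A": speaker_model(train_a, True), "B": speaker_model(train_b, True)}
models_raw = {"A": speaker_model(train_a, False), "B": speaker_model(train_b, False)}

result_sig = identify(test_a, models_sig, use_sigmoid=True)
result_raw = identify(test_a, models_raw, use_sigmoid=False)
print(result_sig, result_raw)
```

The `use_sigmoid` flag mirrors the two cases compared in the paper; the same setting must be used when building the enrolled models and the test model so that they remain comparable.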
Automatic speaker recognition; Mel-Frequency Cepstral Coefficients; Covariance Matrix.
Records of works • Ivan Jokić
Application of Mel-Frequency Cepstral Coefficients in Automatic Speaker Recognition as Part of IoT Solutions for Security and Optimization in Smart Cities
M53 • ALFATECH Journal
Ivan Jokić; Vlado Delić; Zoran Perić
2025
1/1
1XX0-3XX1
—
5–10
This paper presents an implementation of automatic speaker recognition utilizing feature vectors composed of 21 mel-frequency cepstral coefficients (MFCCs) as part of an IoT-driven solution for enhancing security and optimization in smart cities. Experiments are conducted on the Solo portion of the CHAINS database, containing 33 unique sentences pronounced by each of 36 speakers. Results indicate that recognition accuracy varies with the training and testing datasets and improves with longer test recordings. A comparative analysis of MFCC calculation methods reveals that accuracy is generally higher when a sigmoidal square of amplitude characteristic is applied to frequency-selective ranges, rather than an exponential approach. Each speaker's recordings are modeled by a covariance matrix of feature vectors, and applying a sigmoid function to the model elements yields a 5% increase in recognition accuracy in most cases. These findings highlight the potential of MFCC-based speaker recognition as a scalable, data-driven IoT tool for security, public safety, and resource optimization in the context of smart cities.
Automatic speaker recognition; Mel-frequency cepstral coefficients (MFCCs); Covariance matrix; Exponential; Sigmoidal.
Mel-Frequency Cepstral Coefficients and Spectrum Based Additional Features in Automatic Speaker Recognition
M23 • Facta Universitatis Series: Electronics and Energetics
Ivan Jokić; Stevan Jokić; Vlado Delić; Zoran Perić
2025
38/4
0353-3670
—
663–680
The efficiency of the proposed automatic speaker recognizer is evaluated using two speech databases. The feature vector consists of 21 mel-frequency cepstral coefficients (MFCCs), along with up to three additional features derived from the amplitude spectrum. The additional features are calculated based on the logarithm of the energy around the appropriate local maximum in the spectrum, the frequency of that maximum, and the logarithm of the energy of the maximum component in the spectrum across all frames of the observed signal. The speaker identification procedure for a closed set of speakers is tested on the Solo section of the CHAINS database and a speech database with expressed emotions, developed within the S-ADAPT project. The achieved maximum mean recognition accuracies are 97.11% on the CHAINS database, using a feature vector of 21 MFCCs and two additional features, and 98.65% on neutral speech, as well as 98.72% on the entire database, for the S-ADAPT database, using a feature vector of 21 MFCCs.
accuracy; audio recording; human voice; speaker recognition; spectral analysis