Statistical models in forensic voice comparison: Chapter 21 in *Handbook of Forensic Statistics*

**Introduction**

This webpage provides material related to:

- Morrison, G.S., Enzinger, E., Ramos, D., González-Rodríguez, J., Lozano-Díez, A. (2020). Statistical models in forensic voice comparison. In Banks, D.L., Kafadar, K., Kaye, D.H., Tackett, M. (Eds.), *Handbook of Forensic Statistics* (Ch. 21). Boca Raton, FL: CRC.

**Abstract**

- This chapter describes a number of signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. Techniques described include mel-frequency cepstral coefficient (MFCC) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings / x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (aka calibration). Empirical validation of forensic voice comparison systems is also covered. The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic speaker recognition literature from which the signal-processing and statistical-modeling techniques are mostly drawn. Knowledge of the likelihood ratio framework for the evaluation of forensic evidence is assumed. It is hoped that the material presented here will be of value to students of forensic voice comparison and to researchers interested in learning about statistical modeling techniques that could potentially also be applied to data from other branches of forensic science.

**Table of Contents**

1 Introduction

2 Feature extraction

2.1 Mel-frequency cepstral coefficients (MFCCs)

2.2 Deltas and double deltas

2.3 Voice-activity detection (VAD) and diarization

3 Mismatch compensation in the feature domain

3.1 Cepstral-mean subtraction (CMS) and Cepstral-mean-and-variance normalization (CMVN)

3.2 Feature warping

4 GMM-UBM

4.1 Training the relevant-population model (the UBM): Expectation maximization (EM) algorithm

4.2 Training the known-speaker model: Maximum a posteriori (MAP) adaptation

4.3 Calculating a score

4.4 Remarks regarding UBM training data

5 i-vector PLDA

5.1 i-vectors

5.2 i-vector domain mismatch compensation (LDA)

5.3 PLDA

6 DNN-based systems

6.1 DNN senone posterior i-vector systems

6.2 Bottleneck-feature based systems

6.3 DNN speaker embedding systems (x-vector systems)

7 Score-to-likelihood-ratio conversion (calibration)

8 Validation

8.1 List of published validation studies

9 Conclusion

10 Acknowledgments

11 Appendix A: Mathematical details of T matrix training and i-vector extraction

12 References

Legal references

**Preprint**

- Morrison, et al (2020) Statistical models in forensic voice comparison - 2019-12-23a.pdf

- also available at arXiv:1912.13242

**Color Figures**

**Video Lectures**

Lectures by Geoffrey Stewart Morrison, Aston University

This webpage is maintained by Geoffrey Stewart Morrison.

Last update 2020-01-01