> 연구성과 > 학술발표
Financial institutions employ speech dictation systems that convert the conversation recordings between telemarketer and client into the texts. The dictation system is necessary for checking incomplete sales, in which a telemarketer fails to provide important sales information to a client. However, the manually performed dictation procedure takes too much time and effort. Automatic speech dictation system is being adopted as an alternative. We suggest that, in such an automatic speech dictation system, a speaker diarization to be performed prior to speech recognition. In this paper, we propose a diarization method based on pitch detection, which suits very well to given condition in which two speakers, telemarketer and client, make a conversation in a telephone recording. We suggest a method based on average short time spectral feature and unsupervised learning scheme. In the experiments, actual telephone recordings for insurance contraction are used. In average about 5% of actual telemarketer’s voice was classified as client. Also in average about 5% of client’s voice was classified as telemarketer’s.