This paper proposes a hierarchical audio classification algorithm that combines rules with hidden Markov models (HMMs). A rule-based classifier first divides the audio stream of a news program into silence, speech and music; HMMs then subdivide the speech and music into six classes: male-anchor speech, female-anchor speech, alternate reporting, monologue speech, live speech and music. Experimental results show that classification works best for male-anchor speech, female-anchor speech and music, for which precision and recall both exceed 90%. Alternate reporting is classified worst, with a precision of 57.5% and a recall of 79.3%. Performance on the remaining classes lies in between, at roughly 70% to 90%. Compared with similar algorithms, the proposed algorithm achieves relatively high classification performance.
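The two-stage structure described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the abstract does not state the actual rules, features or HMM topology, so the stage-one features (`energy`, `zcr_var`), their thresholds, the discrete observation symbols and the toy model parameters below are all assumptions chosen for illustration. Stage two scores a segment against one HMM per class with the scaled forward algorithm and returns the class whose model gives the highest log-likelihood.

```python
import math

def rule_stage(energy, zcr_var, energy_floor=0.01, zcr_var_split=0.5):
    """Stage 1: hypothetical rules (thresholds are assumptions, not the paper's)."""
    if energy < energy_floor:
        return "silence"
    # Assumed heuristic: speech shows more zero-crossing-rate variation than music.
    return "speech" if zcr_var > zcr_var_split else "music"

def log_forward(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence under one HMM,
    computed with the scaled forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for symbol in obs[1:]:
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
        # Propagate through the transition matrix, then weight by the emission.
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
    final = sum(alpha)
    return loglik + math.log(final) if final > 0.0 else float("-inf")

def classify_segment(energy, zcr_var, obs, models):
    """Stage 1 filters out silence; stage 2 picks the best-scoring class HMM.
    Scoring every non-silent segment against all models is an assumption."""
    if rule_stage(energy, zcr_var) == "silence":
        return "silence"
    return max(models, key=lambda name: log_forward(obs, *models[name]))

# Toy two-state, two-symbol models (made-up parameters, for illustration only).
# Each entry is (start probabilities, transition matrix, emission matrix).
TOY_MODELS = {
    "male_anchor": ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.2, 0.8]]),
    "music":       ([1.0, 0.0], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.8, 0.2]]),
}

if __name__ == "__main__":
    print(classify_segment(0.5, 0.9, [0, 0, 0, 0], TOY_MODELS))
    print(classify_segment(0.5, 0.2, [1, 1, 1, 1], TOY_MODELS))
    print(classify_segment(0.001, 0.0, [0, 1], TOY_MODELS))
```

In a fuller system there would be one trained model per class (six in the setting described above), and the observations would typically be continuous feature vectors rather than discrete symbols; the discrete form is used here only to keep the sketch dependency-free.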