Abstract:
Objective A laser microphone is a device that uses the optical Doppler effect to acquire acoustic vibration information (speech). Compared with conventional microphones, laser microphones offer extended range, high precision and non-contact measurement. They can collect distant sound field information directionally while avoiding interference from the sound field close to the equipment. However, when a laser microphone is used to collect speech from a remote sound field, the quality of the captured speech is affected by many factors and declines severely. Research on speech enhancement algorithms for laser microphone speech is still at a preliminary stage. Traditional single-channel speech enhancement methods require the signal and noise to satisfy stationarity or correlation conditions, and their performance drops significantly under complex conditions such as low signal-to-noise ratio and non-stationary noise. Methods based on deep neural networks can learn the complex mapping between noisy speech and clean speech and outperform the traditional methods. These methods, however, generalize poorly to laser speech from complex targets in environments not seen in advance, because different targets have different frequency response characteristics. Therefore, to improve the quality of far-field speech captured by laser microphones, this paper proposes a laser microphone speech enhancement method based on a ResUnet network and a TFGAN network.
Methods Remote speech acquisition experiments were carried out on four different types of target objects using a laboratory-built laser microphone (Fig.6). The recorded speech was processed with the method proposed in this paper and compared with the nonlinear function harmonic reconstruction method and the DNN+harmonic reconstruction method (Fig.9). Finally, the perceptual evaluation of speech quality (PESQ) and the time-domain segmental signal-to-noise ratio (SNRseg) were used to quantitatively evaluate the processed laser speech (Fig.11).
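As an illustration of the SNRseg metric named above, the following is a minimal sketch of one common definition of the time-domain segmental SNR: the per-frame SNR in dB, clamped to a fixed range and averaged over frames. The function name `segmental_snr`, the frame length, the hop size and the clamping limits of −10 dB and 35 dB are assumptions chosen for illustration; the abstract does not state the settings actually used in the paper.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  floor_db=-10.0, ceil_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB between a clean reference and a processed signal.

    All default parameter values are illustrative assumptions, not the
    settings reported in the paper.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")

    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        signal_energy = 0.0
        error_energy = 0.0
        for i in range(start, start + frame_len):
            signal_energy += clean[i] ** 2
            error_energy += (clean[i] - enhanced[i]) ** 2
        # eps avoids division by zero and log of zero in silent frames.
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        # Clamp so that a few extreme frames do not dominate the average.
        frame_snrs.append(min(max(snr_db, floor_db), ceil_db))

    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

Unlike PESQ, which follows a standardized perceptual model, this measure only needs a time-aligned clean reference and the processed waveform, so it can be computed directly on the sample sequences.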
Results and Discussions Compared with the two methods above, the proposed method better suppresses broadband noise and impulse noise and reconstructs more accurate high-frequency information after stepwise enhancement of the collected laser speech. After processing with this method, the PESQ scores of laser speech from the A4 paper, A4 paper box, corrugated box and PET plastic bottle targets are 2.126, 1.818, 1.804 and 1.951, improvements of 0.129, 0.113, 0.117 and 0.22, respectively. The corresponding SNRseg scores are −5.31 dB, −3.36 dB, −5.07 dB and −3.40 dB, improvements of 1 dB, 6.25 dB, 1.41 dB and 0.17 dB, respectively. The experimental results show that the proposed ResUnet+TFGAN method can effectively improve the laser speech quality for these targets.
Conclusions In this study, a laser microphone speech enhancement method based on ResUnet and TFGAN networks is proposed. Speech samples were collected from various targets with a laboratory-built laser microphone, and the proposed method was verified experimentally. The results show that the method can enhance laser microphone speech from a variety of objects. Compared with the nonlinear function harmonic reconstruction method and the DNN+harmonic reconstruction method, its advantage is that the ResUnet and TFGAN networks respectively perform clean Mel spectrum prediction and time-domain waveform recovery for laser speech. This avoids the high-frequency noise that the harmonic reconstruction methods introduce when reconstructing the speech signal, and at the same time recovers clearer high-frequency information. The PESQ and SNRseg results demonstrate that the proposed method improves the speech quality of the laser microphone. The method extends the application range of laser microphones to a certain extent, and we will further verify and improve it on objects with more complex materials and shapes.