Despite the fact that SRC-HE reduces the number of FEs, audio measurements extraction based on SRC would still be not suitable for real-time applications [39]. The previous SRC-HE module is then replaced by the generalised cross correlation phase transform (GCC-PHAT) introduced in Section 2.1, as this does not involve cumbersome point function estimations. The drawback is that the basic GCC algorithm can only detect one source at a time and it is known to be sensitive to room reverberations [5], however it is still effective under moderate reverberant environments (T60≈0.3s) [40]. For these reasons, at first experiments where only a speaker is active at any given time are carried out, as it often happens in a polite conversation between two or more people. Speech segments using a voice activity detector (VAD) [41] are further extracted and processed using a GCC-PHAT step, for the signal to be more robust to reverberations. Thus, the measure vector obtained za (see Section 2.1) can now be rewritten as za={τm(t)}, where each component τm is the TDOA collected at the m-th microphone pair at each time step t. Since TDOAs are not linear in the speaker position, they must be input into an extended Kalman filter (EKF), as in [10] to get an audio position estimation.
