Category Archives: Articles & Conference Papers

Pitch and Timbre Interactions in Dynamically Varying Complex Tones

Abstract:

Perceptual interactions between pitch and timbre have been demonstrated repeatedly in psychoacoustics using discrimination tasks with static sounds (Allen & Oxenham, 2014; Marozeau & de Cheveigne, 2007). Furthermore, interactions may be influenced by the congruency of pitch and timbre cues in the stimulus. Larger interference effects on fundamental frequency (F0) and spectral centroid (SC) discrimination have been observed when shifts in F0 and SC occurred in the same direction across intervals as opposed to inverse directions (Allen & Oxenham, 2014). Previous interference effects were often prompted via pitch and timbre cues that were static over the duration of the stimulus. Timbre and pitch, however, are rarely this simple in natural sounds. In the current study, effects of stimulus congruency on pitch-timbre interactions are investigated in static and dynamic sounds. Listeners discriminated the pitch of static complex sounds and sounds with movement in F0 and SC in two yes/no tasks. Measures of pitch sensitivity indicate that stimulus congruency affects perceptual interactions for both static and dynamic stimuli, with larger effects as the difference between cues increases. Analyses of cue movement and congruency suggest larger interaction effects for dynamic sounds than for static sounds.
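As a rough illustration of the two cues named in the abstract, the sketch below computes the spectral centroid of a harmonic complex tone for a given F0. The amplitude-weighted definition of the centroid, the function name, and the example amplitudes are assumptions made for illustration; the abstract does not state how the stimuli were synthesized or how SC was defined.

```python
def spectral_centroid(f0_hz, harmonic_amps):
    """Amplitude-weighted mean frequency of a harmonic complex tone.

    Assumption: harmonics sit at integer multiples of f0_hz and the
    centroid is weighted by linear amplitude (power weighting is
    another common convention).
    """
    freqs = [f0_hz * (k + 1) for k in range(len(harmonic_amps))]
    weighted_sum = sum(f * a for f, a in zip(freqs, harmonic_amps))
    return weighted_sum / sum(harmonic_amps)


# Hypothetical spectra: same F0 (same nominal pitch), different spectral tilt.
flat = [1.0, 1.0, 1.0, 1.0]
tilted = [1.0, 0.5, 0.25, 0.125]

sc_flat = spectral_centroid(200.0, flat)      # harmonics at 200..800 Hz
sc_tilted = spectral_centroid(200.0, tilted)  # energy shifted toward low harmonics
```

Holding F0 fixed while changing the spectral tilt moves the centroid (a timbre cue) without moving the fundamental (the pitch cue), which is the kind of independent manipulation a congruency comparison relies on.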

Publication(s):

Anderson, Ryan & Shen, Yi & Shofner, William. (2023). Pitch and Timbre Interactions in Dynamically Varying Complex Tones.

Authors:

Ryan Anderson, Yi Shen, William P Shofner

Differential sensitivity to speech rhythms in young and older adults

Abstract:

Sensitivity to the temporal properties of auditory patterns tends to be poorer in older listeners, and this has been hypothesized to be one factor contributing to their poorer speech understanding. This study examined sensitivity to speech rhythms in young and older normal-hearing subjects, using a task designed to measure the effect of speech rhythmic context on the detection of changes in the timing of word onsets in spoken sentences. A temporal-shift detection paradigm was used in which listeners were presented with an intact sentence followed by two versions of the sentence in which a portion of speech was replaced with a silent gap: one with correct gap timing (the same duration as the missing speech) and one with altered gap timing (shorter or longer than the duration of the missing speech), resulting in an early or late resumption of the sentence after the gap. The sentences were presented with either an intact rhythm or an altered rhythm preceding the silent gap. Listeners judged which sentence had the altered gap timing, and thresholds for the detection of deviations from the correct timing were calculated separately for shortened and lengthened gaps. Both young and older listeners demonstrated lower thresholds in the intact rhythm condition than in the altered rhythm conditions. However, shortened gaps led to lower thresholds than lengthened gaps for the young listeners, while older listeners were not sensitive to the direction of the change in timing. These results show that both young and older listeners rely on speech rhythms to generate temporal expectancies for upcoming speech events. However, the absence of lower thresholds for shortened gaps among the older listeners indicates a change in speech-timing expectancies with age. 
A further examination of individual differences within the older group revealed that those with better rhythm-discrimination abilities (from a separate study) tended to show the same heightened sensitivity to early events observed with the young listeners.

Publication(s):

Pearson, Dylan & Shen, Yi & McAuley, J. & Kidd, Gary. (2023). Differential sensitivity to speech rhythms in young and older adults. Frontiers in Psychology. 14. 10.3389/fpsyg.2023.1160236.

Authors:

Dylan V. Pearson, Yi Shen, J. Devin McAuley, Gary R. Kidd

Spectral weighting for sentence recognition in steady-state and amplitude-modulated noise

Abstract:

Spectral weights in octave-frequency bands from 0.25 to 4 kHz were estimated for speech-in-noise recognition using two sentence materials (i.e., the IEEE and AzBio sentences). The masking noise was either unmodulated or sinusoidally amplitude-modulated at 8 Hz. The estimated spectral weights did not vary significantly across two test sessions and were similar for the two sentence materials. Amplitude-modulating the masker increased the weight at 2 kHz and decreased the weight at 0.25 kHz, which may support an upward shift in spectral weights for temporally fluctuating maskers.
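For readers unfamiliar with the masker type, the following is a minimal sketch of sinusoidally amplitude-modulated noise at 8 Hz. The Gaussian carrier, the 100% modulation depth, the starting phase, and the sampling rate are assumptions for illustration; the abstract specifies only the modulation rate.

```python
import math
import random


def sam_noise(duration_s, fs, mod_rate_hz=8.0, mod_depth=1.0, seed=0):
    """Gaussian noise multiplied by the envelope 1 + m * sin(2*pi*fm*t).

    mod_depth (m) and the sine starting phase are illustrative choices,
    not values taken from the study.
    """
    rng = random.Random(seed)
    n_samples = int(round(duration_s * fs))
    samples = []
    for i in range(n_samples):
        t = i / fs
        envelope = 1.0 + mod_depth * math.sin(2.0 * math.pi * mod_rate_hz * t)
        samples.append(envelope * rng.gauss(0.0, 1.0))
    return samples


# One second of modulated noise at a 16 kHz sampling rate (hypothetical values).
noise = sam_noise(duration_s=1.0, fs=16000)
```

Setting `mod_depth=0.0` in this sketch yields the unmodulated (steady-state) counterpart.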

Publication(s):

Shen, Yi & Langley, Lauren. (2023). Spectral weighting for sentence recognition in steady-state and amplitude-modulated noise. JASA express letters. 3. 10.1121/10.0017934.

Authors:

Yi Shen, Lauren Langley

Verification of Estimated Output Signal-to-Noise Ratios From a Phase Inversion Technique Using a Simulated Hearing Aid

Abstract:

Purpose: The signal-to-noise ratio (SNR) for speech presented in background noise may vary after being processed by digital hearing aids with nonlinear signal processing algorithms, such as wide dynamic range compression (WDRC). A phase inversion technique has been previously developed to assess the output SNR of hearing aids. However, systematic validations of this technique have not been conducted. This study aims to validate the phase inversion technique. Method: A simulated hearing aid with multichannel WDRC was implemented, from which the output SNRs for connected speech in background noise were directly computed via shadow filtering. The agreement between the shadow filter output SNRs and those estimated using the phase inversion technique for the same stimuli was utilized to validate the phase inversion technique. The background noise was 2- or 20-talker babble noise, and the speech stimuli were presented at SNRs of −10 to +10 dB at the input of the simulated hearing aid. The simulated hearing aid was configured to provide amplification for four representative audiograms, and the WDRC was set to be fast or slow acting. To investigate the effects of additive noise, independent of the presented noise stimulus, on the phase inversion estimated output SNR, the same simulated hearing aid was implemented with an additive Gaussian noise at its input (45 and 60 dB SPL). Results: The phase inversion technique could either overestimate or underestimate output SNR, depending on the test condition; the estimation errors tended to coincide with temporal landmarks, such as natural pauses between consecutive sentences or fricatives; and increasing the simulated noise led to poorer estimates of output SNR. Conclusions: Results imply that the accuracy of the phase inversion technique is dependent on the test conditions. Thus, the phase inversion technique should be used with caution, and its validity should be evaluated further.
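The abstract does not spell out the steps of the phase inversion technique. As it is commonly described, the device is measured twice, once with speech plus noise and once with the noise polarity inverted; half the sum of the two outputs estimates the processed speech and half the difference estimates the processed noise. The sketch below follows that description. The `linear_gain` device is a hypothetical stand-in, not the multichannel WDRC simulation used in the study, and all signal values are illustrative.

```python
import math
import random


def power(x):
    """Mean-square power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def phase_inversion_snr_db(process, speech, noise):
    """Estimate the output SNR of `process` from two passes through it."""
    out_plus = process([s + n for s, n in zip(speech, noise)])
    out_minus = process([s - n for s, n in zip(speech, noise)])
    # Noise cancels in the sum; speech cancels in the difference.
    speech_est = [(a + b) / 2.0 for a, b in zip(out_plus, out_minus)]
    noise_est = [(a - b) / 2.0 for a, b in zip(out_plus, out_minus)]
    return 10.0 * math.log10(power(speech_est) / power(noise_est))


def linear_gain(x, gain=2.0):
    """Hypothetical device: a fixed linear gain (no compression)."""
    return [gain * v for v in x]


rng = random.Random(1)
fs = 16000
speech = [math.sin(2.0 * math.pi * 200.0 * i / fs) for i in range(fs)]  # tone as a speech stand-in
noise = [0.5 * rng.gauss(0.0, 1.0) for _ in range(fs)]

input_snr_db = 10.0 * math.log10(power(speech) / power(noise))
output_snr_db = phase_inversion_snr_db(linear_gain, speech, noise)
```

For a purely linear device the separation is exact, so the estimated output SNR matches the input SNR. With nonlinear processing such as WDRC the two passes are no longer processed identically, which is consistent with the estimation errors the abstract reports.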

Publication(s):

Yun, Donghyeon & Shen, Yi & Lentz, Jennifer. (2023). Verification of Estimated Output Signal-to-Noise Ratios From a Phase Inversion Technique Using a Simulated Hearing Aid. American Journal of Audiology. 32. 1-13. 10.1044/2022_AJA-22-00023.

Authors:

Donghyeon Yun, Yi Shen, Jennifer J Lentz

System and Method for Individualized Hearing Aid Prescription

Abstract:

Disclosed is a system and method for individualized hearing aid prescription which consists of a test procedure that enables the fitting of an individualized estimation of the SII model to individual listeners efficiently and an optimization process which translates the resulting individualized model to the prescribed gains across frequencies for programming into the user’s hearing aids. The test involves the recognition of one or more words presented in background noise, which better approximates the daily listening experiences of hearing-aid users compared to the pure-tone detection in a silent environment task which is commonly utilized during conventional audiometric testing. In the estimated SII model, five parameters describe in a custom fashion the relative weights of speech information across the five frequency bands for a given listener. The results from the speech test are used to determine the desirable amount of gain for each frequency region to optimize the user’s aided speech intelligibility. The resulting gains for the individual may then be programmatically applied to the individual’s prescribed hearing aid device(s).

Publication(s):

Shen, Yi. (2023). SYSTEM AND METHOD FOR INDIVIDUALIZED HEARING AID PRESCRIPTION.

Authors:

Yi Shen

The effect of rhythm on selective listening in multiple-source environments for young and older adults

Abstract:

Understanding continuous speech with competing background sounds is challenging, particularly for older adults. One stimulus property that may aid listeners' understanding of to-be-attended (target) material is temporal regularity (rhythm). In the context of speech-in-noise understanding, McAuley and colleagues recently showed a target rhythm effect whereby recognition of target speech was better when natural speech rhythm of a target talker was intact than when it was temporally altered. The current study replicates the target rhythm effect using a synthetic vowel sequence paradigm in young adults (Experiment 1) and then uses this paradigm to investigate potential age-related changes in the effect of rhythm on recognition (Experiment 2). Listeners identified the last three vowels of temporally regular (isochronous) and irregular (anisochronous) synthetic vowel sequences in quiet and with a competing background sequence of vowel-like harmonic tone complexes presented at various tempos. The results replicated the target rhythm effect whereby temporal regularity in the vowel sequences improved identification accuracy of young listeners compared to irregular vowel sequences. The magnitude of the effect was not found to be influenced by background tempo, but faster background tempos led to greater vowel identification accuracy independent of regularity. Older listeners also demonstrated a target rhythm effect but received less benefit from the temporal regularity of the target sequences than did young listeners. This study highlights the importance of rhythm for understanding age-related differences in selective listening in complex environments and provides a novel paradigm for investigating effects of rhythm on perception.

Publication(s):

Pearson, Dylan & Shen, Yi & McAuley, J. & Kidd, Gary. (2023). The effect of rhythm on selective listening in multiple-source environments for young and older adults. Hearing Research. 435. 108789. 10.1016/j.heares.2023.108789.

Authors:

Dylan V. Pearson, Yi Shen, J. Devin McAuley, Gary R. Kidd

Investigation on the Band Importance of Phase-aware Speech Enhancement

Abstract:

Many existing phase-aware speech enhancement algorithms consider the phase at all spectral frequencies to be equally important to perceptual quality and intelligibility. Although improvements are observed according to both objective and subjective measures, as compared to phase-insensitive approaches, it is not clear whether phase information is equally important across the frequency spectrum. In this paper, we investigate the importance of estimating phase across spectral regions, by conducting a pairwise listening study to determine if phase enhancement can be limited to certain frequency bands. Our experimental results suggest that estimating phase at lower-frequency bands is mostly important for speech quality in normal-hearing (NH) listeners. We further propose a hybrid deep-learning framework that adopts two sub-networks for handling phase differently across the spectrum. The proposed hybrid-net significantly improves the model compatibility with low-resource platforms while achieving superior performance to the original phase-aware speech enhancement approaches.

Publication(s):

Zhang, Zhuohuang & Williamson, Donald & Shen, Yi. (2022). Investigation on the Band Importance of Phase-aware Speech Enhancement. 4651-4655. 10.21437/Interspeech.2022-284.

Authors:

Zhuohuang Zhang, Donald Williamson, Yi Shen

Feasibility of hearing aid gain self-adjustment using speech recognition

Abstract:

Personal hearing devices, such as hearing aids, may be fine-tuned by allowing the users to conduct self-adjustment. Two self-adjustment procedures were developed to collect the listener-preferred gains in six octave-frequency bands from 0.25 kHz to 8 kHz. These procedures were designed to allow rapid exploration of a multi-dimensional parameter space using a simple, one-dimensional user control interface (i.e., a programmable knob). The two procedures differ in whether the user interface controls the gains in all frequency bands simultaneously (Procedure A) or only the gain in one frequency band (Procedure B) on a given trial. Monte-Carlo simulations suggested that for both procedures the gain preference identified by simulated listeners rapidly converged to the ground-truth preferred gain profile over the first 20 trials. Initial behavioral evaluations of the self-adjustment procedures, in terms of test-retest reliability, were conducted using 20 young, normal-hearing listeners. Each estimate of the preferred gain profile took less than 20 minutes. The deviation between two separate estimates of the preferred gain profile, conducted at least a week apart, was about 10 to 15 dB.

Publication(s):

Yun, Donghyeon & Shen, Yi & Zhang, Zhuohuang. (2022). Feasibility of hearing aid gain self-adjustment using speech recognition. The Journal of the Acoustical Society of Korea. 41. 76-86. 10.7776/ASK.2022.41.1.076.

Authors:

Donghyeon Yun, Yi Shen, Zhuohuang Zhang

Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners

Abstract:

Phase serves as a critical component of speech that influences its quality and intelligibility. Current speech enhancement algorithms are beginning to address phase distortions, but the algorithms focus on normal-hearing (NH) listeners. It is not clear whether phase enhancement is beneficial for hearing-impaired (HI) listeners. We investigated the influence of phase distortion on speech quality through a listening study, in which NH and HI listeners provided speech-quality ratings using the MUSHRA procedure. In one set of conditions, the speech was mixed with babble noise at 4 different signal-to-noise ratios (SNRs) from -5 to 10 dB. In another set of conditions, the SNR was fixed at 10 dB and the noisy speech was presented in a simulated reverberant room with T60s ranging from 100 to 1000 ms. The speech level was kept at 65 dB SPL for NH listeners and amplification was applied for HI listeners to ensure audibility. Ideal ratio masking (IRM) was used to simulate speech enhancement. Two objective metrics (i.e., PESQ and HASQI) were utilized to compare subjective and objective ratings. Results indicate that phase distortion has a negative impact on perceived quality for both groups and PESQ is more closely correlated with human ratings.
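As a brief illustration of ideal ratio masking, the sketch below computes a mask value for each time-frequency unit from known speech and noise magnitudes. The square-root exponent (`beta=0.5`) is a common convention and is an assumption here, as are the function name and the toy magnitudes; the abstract does not give the mask definition used in the study.

```python
import math


def ideal_ratio_mask(speech_mag, noise_mag, beta=0.5):
    """Per-unit mask (S^2 / (S^2 + N^2)) ** beta, with values in [0, 1].

    Requires the clean speech and noise magnitudes separately, which is
    why the mask is "ideal". beta=0.5 is an assumed, commonly used value.
    """
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        s_pow, n_pow = s * s, n * n
        total = s_pow + n_pow
        mask.append((s_pow / total) ** beta if total > 0.0 else 0.0)
    return mask


# Toy magnitudes for three time-frequency units (hypothetical values).
speech_mag = [1.0, 0.5, 0.0]
noise_mag = [0.0, 0.5, 1.0]
mask = ideal_ratio_mask(speech_mag, noise_mag)
```

The mask scales magnitudes only, so an enhanced signal built this way still carries whatever phase it is recombined with, which is the component the listening study manipulates.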

Publication(s):

Zhang, Zhuohuang & Williamson, Donald & Shen, Yi. (2020). Investigation of Phase Distortion on Perceived Speech Quality for Hearing-impaired Listeners.

Authors:

Zhuohuang Zhang, Donald Williamson, Yi Shen

Individualized estimation of the Speech Intelligibility Index for short sentences: Test-retest reliability

Abstract:

The speech intelligibility index (SII) model was modified to allow individualized parameters. These parameters included the relative weights of speech cues in five octave-frequency bands ranging from 0.25 to 4 kHz, i.e., the band importance function, and the transfer function that allows the SII to generate predictions on speech-recognition scores. A Bayesian adaptive procedure, the quick-band-importance-function (qBIF) procedure, was utilized to enable efficient estimation of the SII parameters from individual listeners. In two experiments, the SII parameters were estimated for 30 normal-hearing adults using Institute of Electrical and Electronics Engineers (IEEE) sentences at speech levels of 55, 65, and 75 dB sound pressure level (in Experiment I) and for 15 hearing-impaired (HI) adult listeners using amplified IEEE or AzBio sentences (in Experiment II). In both experiments, even without prior training, the estimated model parameters showed satisfactory reliability between two runs of the qBIF procedure at least one week apart. For the HI listeners, inter-listener variability in most estimated SII parameters was larger than intra-listener variability of the qBIF procedure.
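To make the role of the band importance function concrete, here is a minimal sketch of an SII-style index as an importance-weighted sum of band audibilities over five octave bands. The audibility mapping (band SNR clipped to the range -15 to +15 dB and rescaled to 0 to 1) is the conventional one and is assumed here; the weights are hypothetical, and the transfer function from index to recognition score is omitted because the abstract does not give its form. The modified model in the paper may differ in these details.

```python
def band_audibility(snr_db):
    """Map a band SNR in dB to an audibility between 0 and 1 (assumed mapping)."""
    return min(1.0, max(0.0, (snr_db + 15.0) / 30.0))


def sii(band_weights, band_snrs_db):
    """Importance-weighted sum of band audibilities.

    Weights are normalized to sum to 1 so the index stays within [0, 1].
    """
    total_weight = sum(band_weights)
    return sum(
        (w / total_weight) * band_audibility(snr)
        for w, snr in zip(band_weights, band_snrs_db)
    )


# Hypothetical band importance function for 0.25, 0.5, 1, 2, and 4 kHz.
weights = [0.15, 0.20, 0.25, 0.25, 0.15]
index = sii(weights, [0.0, 0.0, 0.0, 0.0, 0.0])  # 0 dB SNR in every band
```

Individualizing the model, in this simplified picture, amounts to estimating the five entries of `weights` (plus the transfer function parameters) for each listener rather than using a fixed set.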

Publication(s):

Shen, Yi & Yun, Donghyeon & Liu, Yi. (2020). Individualized estimation of the Speech Intelligibility Index for short sentences: Test-retest reliability. The Journal of the Acoustical Society of America. 148. 1647-1661.

Authors:

Yi Shen, Donghyeon Yun, Yi Liu