Research
 

I currently work at the Speech Communication Lab, where my research focuses on the extraction of acoustic parameters for speech and speaker recognition. A related area I have worked on is the study of phoneme production strategies. My advisor is Prof. Carol Espy-Wilson.

 

Automatic Detection of Irregular Phonation in Continuous Speech

Srikanth Vishnubhotla, Carol Espy-Wilson, Speech Communication Lab, UMD

I am currently working on the extraction of acoustic features that can detect irregular phonation in a speech signal. This work is part of a larger project that aims to distinguish between different voice qualities. In particular, I am using the Aperiodicity, Periodicity and Pitch (APP) Detector [Ref. Om D Deshmukh & Carol Espy-Wilson] to analyze creaky voice (and other instances of irregular phonation) for its characteristic APP profile. I then apply further knowledge-based constraints to separate irregular phonation from other sounds that can be confused with it. A clearer description of this work is given in a paper (to appear) at ICSLP (Interspeech) 2006. I also recently presented a poster on this work at the 150th ASA Meeting in Minneapolis. This research was supported by NSF grant # BCS-0519256.

A PDF version of the ICSLP paper can be obtained here.
A PDF version of the ASA poster can be obtained here.

A New Set of Parameters for Text-Independent Speaker Identification

Carol Espy-Wilson, Sandeep Manocha, Srikanth Vishnubhotla, Speech Communication Lab, UMD

This work compared the performance of a set of acoustic features against that of the standard Mel-Frequency Cepstral Coefficients (MFCCs) for text-independent speaker identification. The eight acoustic features were the four formants F1 through F4, the spectral slope, the harmonic difference H1-H2, and the aperiodicity and periodicity content of the speech signal. The first four parameters capture vocal tract information, such as the dynamic range of configurations of the speaker's vocal tract. The latter four capture the source information of the speaker and help characterize the voice quality. This set of parameters, which explicitly captures speaker-specific information, gave performance comparable to the standard MFCCs on average and performed better for female speakers in general. A clearer description of this work is given in a paper (to appear) at ICSLP (Interspeech) 2006. This work was supported by NSF grant # BCS-0519256.

A PDF version of the ICSLP paper can be obtained here.
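
To make one of the source features above concrete, the following is a minimal sketch of how the harmonic difference H1-H2 could be measured on a single frame. It is only an illustration and not the method used in the paper: the function name `h1_h2_db`, the `search_hz` tolerance, and the synthetic two-harmonic test frame are all assumptions, and the pitch `f0` is taken as known rather than estimated.

```python
import numpy as np

def h1_h2_db(frame, sample_rate, f0, search_hz=20.0):
    """Return H1-H2 in dB for one frame, given an already-known pitch f0.

    Hypothetical helper for illustration only: H1 and H2 are taken as the
    largest spectral magnitudes near f0 and 2*f0 respectively.
    """
    windowed = frame * np.hanning(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    def peak_db(target_hz):
        # Pick the strongest bin within +/- search_hz of the target harmonic.
        band = (freqs >= target_hz - search_hz) & (freqs <= target_hz + search_hz)
        return 20.0 * np.log10(spectrum[band].max())

    return peak_db(f0) - peak_db(2.0 * f0)

# Synthetic frame (assumed data): second harmonic at half the amplitude
# of the first, so H1-H2 should come out at roughly 6 dB.
sample_rate = 16000
f0 = 200.0
t = np.arange(int(0.04 * sample_rate)) / sample_rate
frame = np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
value = h1_h2_db(frame, sample_rate, f0)
```

On real speech the pitch would first have to be estimated, and the harmonic amplitudes are often corrected for the influence of nearby formants; neither step is shown here.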

Study of Formant Behavior in American English /r/

Srikanth Vishnubhotla, Carol Espy-Wilson, Speech Communication Lab, UMD

This work involved two tasks: the first was to investigate the effect of speaker position (upright vs. supine) on the production strategy for /r/, and the second was to study the effect of a perturbation block placed along the vocal tract, together with V-C-V transitions involving /r/. In both cases, the time-frequency behavior, as reflected in the variation of the formant frequencies, formed the basis of the research.

In the first study, formant data were collected from a database of ten speakers in the upright and supine positions. Statistics of the formant behavior were analyzed, and it was found that the formant behavior does not change much with position, except for the higher formants F5-F6.

In the second study, the database consisted of V-C-V utterances with the vowels /a/, /i/ and /u/, the consonant being /r/ in each case. Three speakers, two female and one male, were analyzed in four conditions: with and without the perturbation block, each before and after adaptation to it. The male speaker showed some change in formant behavior in one of the conditions, while the other speakers did not show any significant changes.

Overall, the results indicate that the formant behavior, and thus the vocal tract configuration for /r/ production, remains unaffected by speaker position. However, use of the perturbation block does affect the dynamic behavior, and can thus be used to identify specific critical points along the vocal tract that affect the production of /r/. Further, the onset of /r/ also seems to be affected by the presence of the perturbation block.

A PDF version of the report of this study can be obtained here.

The following is a short description of work done during my undergraduate studies.

A Radar Target Simulator Model: Receiver Performance based on Data Update Rates, Antenna Beam-width, Noise; Performance Enhancement

Srikanth Vishnubhotla, Abhishek Ivaturi, Srinivas N, Defense Electronics Research Laboratory, Hyderabad, India

The aim was to design a new kind of track-while-scan (TWS) tracker that would have the functionality of a single-target tracker (STT). This tracker not only maintains a more accurate track than conventional TWS radars, but also takes the maneuvers of the target into account when directing the antenna towards the target in the next scan. The antenna does not need to scan the horizon to find a particular target; instead, a built-in logic device computes the target's next probable position and steers the antenna to it. This considerably reduces the track time. The logic device was implemented in MATLAB, and three-dimensional tracking capability was provided.
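
The idea of predicting where to point the antenna on the next scan can be illustrated with a standard alpha-beta filter. This is a sketch of a textbook technique written in Python, not the MATLAB logic of the project, whose algorithm is not described here; the function name `alpha_beta_track`, the gain values, and the noise-free constant-velocity target are all assumptions made for the example.

```python
import numpy as np

def alpha_beta_track(measurements, scan_period, alpha=0.85, beta=0.5):
    """Predict the target's 3-D position at the next scan.

    Hypothetical illustration using an alpha-beta filter: each scan the
    state is propagated forward, then corrected by a fraction of the
    residual between the measured and predicted positions.
    """
    position = np.asarray(measurements[0], dtype=float)
    velocity = np.zeros(3)  # no velocity information before the second scan
    for measured in measurements[1:]:
        measured = np.asarray(measured, dtype=float)
        predicted = position + velocity * scan_period
        residual = measured - predicted
        position = predicted + alpha * residual
        velocity = velocity + (beta / scan_period) * residual
    # Where the antenna should point on the next scan.
    return position + velocity * scan_period

# Assumed test target: constant velocity, noise-free (x, y, z) in metres.
scan_period = 1.0
start = np.array([1000.0, 2000.0, 500.0])
true_velocity = np.array([100.0, -50.0, 10.0])
measurements = [start + true_velocity * k * scan_period for k in range(21)]

predicted_next = alpha_beta_track(measurements, scan_period)
true_next = start + true_velocity * 21 * scan_period
```

A fixed-gain filter of this kind lags behind a maneuvering target; handling maneuvers, as the tracker described above is said to do, would need adaptive gains or a richer motion model.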

 

Copyright © Srikanth Vishnubhotla 2006