Hidden conditional random fields for visual speech recognition

Adrian Pass, Jianguo Zhang, Darryl Stewart

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    1 Citation (Scopus)

    Abstract

    In this paper we present the application of hidden conditional random fields (HCRFs) to modeling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.
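
As a rough illustration of what "taking past and future observations into account" can mean in practice, the sketch below stacks each video frame's feature vector with its w preceding and w following frames, so that any per-state score computed on the stacked vector sees a window of context. This is only an assumption about how such a window might be built: the abstract does not specify the authors' feature construction, and the function name `stack_context`, the window half-width `w`, and the edge padding by repetition are all hypothetical choices made for this example.

```python
def stack_context(frames, w):
    """Concatenate each frame with its w past and w future neighbours.

    frames: list of per-frame feature vectors (lists of floats).
    w: window half-width (hypothetical parameter); w = 0 returns copies
       of the original frames.
    Edges are padded by repeating the first or last frame (an assumption).
    """
    if w < 0:
        raise ValueError("w must be non-negative")
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - w, t + w + 1):
            k_clamped = min(max(k, 0), n - 1)  # clamp index at sequence edges
            window.extend(frames[k_clamped])
        stacked.append(window)
    return stacked


# Toy sequence of four one-dimensional "visual features".
frames = [[1.0], [2.0], [3.0], [4.0]]
result = stack_context(frames, 1)
```

With w = 1 every stacked vector is three times the original dimension. Increasing w widens the context available at each time step, at the cost of more parameters per state.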
    Original language: English
    Title of host publication: Proceedings 13th International Machine Vision and Image Processing Conference, 2009
    Subtitle of host publication: IMVIP '09
    Editors: Ken Dawson-Howe, Rozenn Dahyot, Anil Kokaram, Gerard Lacey
    Place of Publication: Los Alamitos, Calif.
    Publisher: IEEE
    Pages: 117-122
    Number of pages: 6
    ISBN (Electronic): 9780769537962
    ISBN (Print): 9781424448753
    DOI: https://doi.org/10.1109/IMVIP.2009.28
    Publication status: Published - 2009
    Event: 13th International Machine Vision and Image Processing Conference - Dublin, Ireland
    Duration: 2 Sep 2009 to 4 Sep 2009

    Conference

    Conference: 13th International Machine Vision and Image Processing Conference
    Country: Ireland
    City: Dublin
    Period: 2/09/09 to 4/09/09

    Fingerprint

    Speech recognition
    Image compression

    Cite this

    Pass, A., Zhang, J., & Stewart, D. (2009). Hidden conditional random fields for visual speech recognition. In K. Dawson-Howe, R. Dahyot, A. Kokaram, & G. Lacey (Eds.), Proceedings 13th International Machine Vision and Image Processing Conference, 2009: IMVIP '09. (pp. 117-122). Los Alamitos, Calif.: IEEE. https://doi.org/10.1109/IMVIP.2009.28
    Pass, Adrian ; Zhang, Jianguo ; Stewart, Darryl. / Hidden conditional random fields for visual speech recognition. Proceedings 13th International Machine Vision and Image Processing Conference, 2009: IMVIP '09. editor / Ken Dawson-Howe ; Rozenn Dahyot ; Anil Kokaram ; Gerard Lacey. Los Alamitos, Calif. : IEEE, 2009. pp. 117-122.
    @inproceedings{c92936b4083c489cbca845c7a26975b7,
    title = "Hidden conditional random fields for visual speech recognition",
    abstract = "In this paper we present the application of hidden conditional random fields (HCRFs) to modeling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.",
    author = "Adrian Pass and Jianguo Zhang and Darryl Stewart",
    year = "2009",
    doi = "10.1109/IMVIP.2009.28",
    language = "English",
    isbn = "9781424448753",
    pages = "117--122",
    editor = "Ken Dawson-Howe and Rozenn Dahyot and Anil Kokaram and Gerard Lacey",
    booktitle = "Proceedings 13th International Machine Vision and Image Processing Conference, 2009",
    publisher = "IEEE",
    }

    Pass, A, Zhang, J & Stewart, D 2009, Hidden conditional random fields for visual speech recognition. in K Dawson-Howe, R Dahyot, A Kokaram & G Lacey (eds), Proceedings 13th International Machine Vision and Image Processing Conference, 2009: IMVIP '09. IEEE, Los Alamitos, Calif., pp. 117-122, 13th International Machine Vision and Image Processing Conference, Dublin, Ireland, 2/09/09. https://doi.org/10.1109/IMVIP.2009.28

    Hidden conditional random fields for visual speech recognition. / Pass, Adrian; Zhang, Jianguo; Stewart, Darryl.

    Proceedings 13th International Machine Vision and Image Processing Conference, 2009: IMVIP '09. ed. / Ken Dawson-Howe; Rozenn Dahyot; Anil Kokaram; Gerard Lacey. Los Alamitos, Calif. : IEEE, 2009. p. 117-122.

    TY - GEN
    T1 - Hidden conditional random fields for visual speech recognition
    AU - Pass, Adrian
    AU - Zhang, Jianguo
    AU - Stewart, Darryl
    PY - 2009
    Y1 - 2009
    N2 - In this paper we present the application of hidden conditional random fields (HCRFs) to modeling speech for visual speech recognition. HCRFs may be easily adapted to model long range dependencies across an observation sequence. As a result visual word recognition performance can be improved as the model is able to take more of a contextual approach to generating state sequences. Results are presented from a speaker-dependent, isolated digit, visual speech recognition task using comparisons with a baseline HMM system. We firstly illustrate that word recognition rates on clean video using HCRFs can be improved by increasing the number of past and future observations being taken into account by each state. Secondly we compare model performances using various levels of video compression on the test set. As far as we are aware this is the first attempted use of HCRFs for visual speech recognition.
    U2 - 10.1109/IMVIP.2009.28
    DO - 10.1109/IMVIP.2009.28
    M3 - Conference contribution
    SN - 9781424448753
    SP - 117
    EP - 122
    BT - Proceedings 13th International Machine Vision and Image Processing Conference, 2009
    A2 - Dawson-Howe, Ken
    A2 - Dahyot, Rozenn
    A2 - Kokaram, Anil
    A2 - Lacey, Gerard
    PB - IEEE
    CY - Los Alamitos, Calif.
    ER -

    Pass A, Zhang J, Stewart D. Hidden conditional random fields for visual speech recognition. In Dawson-Howe K, Dahyot R, Kokaram A, Lacey G, editors, Proceedings 13th International Machine Vision and Image Processing Conference, 2009: IMVIP '09. Los Alamitos, Calif.: IEEE. 2009. p. 117-122. https://doi.org/10.1109/IMVIP.2009.28