SURE

Sunderland Repository records the research produced by the University of Sunderland including practice-based research and theses.

DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization

Chen, Rhuihan, Tan, Junpeng, Yang, Zhijing, Yang, Xiaojung, Dai, Quingyun, Cheng, Yongqiang and Lin, Liang (2024) DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization. IEEE Transactions on Multimedia. ISSN 1520-9210

Item Type: Article

Abstract

Natural Language Video Localization (NLVL) has recently attracted much attention because of its practical significance. However, existing methods still face the following challenges: 1) when the models learn intra-modal semantic associations, temporal causal interaction information and contextual semantic discriminative information are ignored, resulting in a lack of intra-modal semantic context connection; 2) when learning fusion representations, existing cross-modal interaction modules lack a hierarchical attention function to extract inter-modal similarity information and intra-modal self-correlation information, resulting in insufficient cross-modal information interaction; 3) when the loss function is optimized, existing models ignore the correlation of causal inference between the start and end boundaries, resulting in inaccurate start and end boundary calibration. To address these challenges, we proposed a novel NLVL model called the Discriminative Parallel and Hierarchical Attention Network (DPHANet). Specifically, we emphasized the importance of temporal causal interaction information and contextual semantic discriminative information, and correspondingly proposed a Discriminative Parallel Attention Encoder (DPAE) module to infer and encode this critical information. In addition, to overcome the shortcomings of existing cross-modal interaction modules, we designed a Video-Query Hierarchical Attention (VQHA) module, which performs cross-modal interaction and intra-modal self-correlation modeling in a hierarchical manner. Furthermore, a novel deviation loss function was proposed to capture the correlation of causal inference between the start and end boundaries and to force the model to focus on continuity and temporal causality in the video. Finally, extensive experiments on three benchmark datasets demonstrated the superiority of the proposed DPHANet model, which achieved average performance improvements of about 1.5% and 3.5%, and maximum performance improvements of about 2.5% and 7.5%, on the Charades-STA and TACoS datasets respectively.

PDF (Author Accepted Manuscript on publisher template/ following requested formatting)
FINAL VERSION - TMM.pdf

More Information

Additional Information: © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Uncontrolled Keywords: Location awareness, Semantics, TV, Natural languages, Correlation, Glass, Electronic mail
Depositing User: Yongqiang Cheng

Identifiers

Item ID: 17612
Identification Number: https://doi.org/10.1109/TMM.2024.3395888
ISSN: 1520-9210
URI: http://sure.sunderland.ac.uk/id/eprint/17612
Official URL: https://ieeexplore.ieee.org/document/10517423

Users with ORCIDS

ORCID for Yongqiang Cheng: orcid.org/0000-0001-7282-7638

Catalogue record

Date Deposited: 21 Jun 2024 14:15
Last Modified: 21 Jun 2024 14:30

Contributors

Author: Yongqiang Cheng
Author: Rhuihan Chen
Author: Junpeng Tan
Author: Zhijing Yang
Author: Xiaojung Yang
Author: Quingyun Dai
Author: Liang Lin

University Divisions

Faculty of Technology > School of Computer Science

Subjects

Computing > Artificial Intelligence
Computing > Information Systems
