DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization

Chen, Rhuihan, Tan, Junpeng, Yang, Zhijing, Yang, Xiaojung, Dai, Quingyun, Cheng, Yongqiang and Lin, Liang (2024) DPHANet: Discriminative Parallel and Hierarchical Attention Network for Natural Language Video Localization. IEEE Transactions on Multimedia. ISSN 1520-9210

Item Type:	Article

Abstract

Natural Language Video Localization (NLVL) has recently attracted much attention because of its practical significance. However, existing methods still face the following challenges: 1) when models learn intra-modal semantic associations, they ignore temporal causal interaction information and contextual semantic discriminative information, so the intra-modal semantic context is poorly connected; 2) when learning fusion representations, existing cross-modal interaction modules lack a hierarchical attention function to extract inter-modal similarity information and intra-modal self-correlation information, resulting in insufficient cross-modal information interaction; 3) when the loss function is optimized, existing models ignore the correlation of causal inference between the start and end boundaries, resulting in inaccurate start and end boundary calibration. To address these challenges, we proposed a novel NLVL model called the Discriminative Parallel and Hierarchical Attention Network (DPHANet). Specifically, we emphasized the importance of temporal causal interaction information and contextual semantic discriminative information, and correspondingly proposed a Discriminative Parallel Attention Encoder (DPAE) module to infer and encode this critical information. In addition, to overcome the shortcomings of existing cross-modal interaction modules, we designed a Video-Query Hierarchical Attention (VQHA) module, which performs cross-modal interaction and intra-modal self-correlation modeling in a hierarchical manner. Furthermore, a novel deviation loss function was proposed to capture the correlation of causal inference between the start and end boundaries and to force the model to focus on continuity and temporal causality in the video. Finally, extensive experiments on three benchmark datasets demonstrated the superiority of the proposed DPHANet model, which achieved average performance improvements of about 1.5% and 3.5% and maximum performance improvements of about 2.5% and 7.5% on the Charades-STA and TACoS datasets, respectively.
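
The abstract describes attention applied "in a hierarchical manner": cross-modal interaction first, then intra-modal self-correlation. The sketch below is only an illustration of that general two-level idea, not the paper's VQHA module. It assumes single-head scaled dot-product attention with no learned weights, a residual sum for fusion, and toy 2-dimensional features; the names `attend` and `hierarchical_attention` are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one list of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    # Scaled dot-product attention over plain lists of vectors.
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def hierarchical_attention(video, query):
    # Level 1 (inter-modal, assumed form): each video clip gathers
    # context from the query words.
    cross = attend(video, query, query)
    # Assumed fusion: residual sum of clip features and query context.
    fused = [[a + b for a, b in zip(v, c)] for v, c in zip(video, cross)]
    # Level 2 (intra-modal, assumed form): fused clips attend to each other.
    return attend(fused, fused, fused)

# Toy inputs: 3 video clips and 2 query words, each a 2-d feature vector.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [[1.0, 0.0], [0.0, 1.0]]
fused = hierarchical_attention(video, query)
```

The output keeps one vector per video clip, so a boundary predictor (not shown, and not specified in this record) could score each clip as a candidate start or end position.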

PDF (Author Accepted Manuscript on publisher template/ following requested formatting)
FINAL VERSION - TMM.pdf
Available under License Creative Commons Attribution.

More Information

Uncontrolled Keywords: Location awareness, Semantics, TV, Natural languages, Correlation, Glass, Electronic mail

Depositing User: Yongqiang Cheng

Identifiers

Item ID: 17612

Identification Number: 10.1109/TMM.2024.3395888

ISSN: 1520-9210

URI: http://sure.sunderland.ac.uk/id/eprint/17612

Official URL: https://ieeexplore.ieee.org/document/10517423

Users with ORCIDS

ORCID for Yongqiang Cheng: orcid.org/0000-0001-7282-7638

Catalogue record

Date Deposited: 21 Jun 2024 14:15

Last Modified: 04 Jun 2025 14:58

Contributors

Author:	Yongqiang Cheng
Author:	Rhuihan Chen
Author:	Junpeng Tan
Author:	Zhijing Yang
Author:	Xiaojung Yang
Author:	Quingyun Dai
Author:	Liang Lin

University Divisions

Faculty of Business and Technology > School of Computer Science and Engineering

Subjects

Computing > Artificial Intelligence
Computing > Information Systems
