Disentangled Image-Text Classification: Enhancing Visual Representations with MLLM-driven Knowledge Transfer
Shuai, Qianjun, Chen, Xiaohao, Cheng, Yongqiang, Fang, Miao and Jin, Libiao (2026) Disentangled Image-Text Classification: Enhancing Visual Representations with MLLM-driven Knowledge Transfer. Expert Systems with Applications, 304. p. 130790.
| Item Type: | Article |
|---|---|
Abstract
Multimodal image-text classification plays a critical role in applications such as content moderation, news recommendation, and multimedia understanding. Despite recent advances, the visual modality faces higher representation learning complexity than the textual modality in semantic extraction, which often leads to a semantic gap between visual and textual representations. In addition, conventional fusion strategies introduce cross-modal redundancy, further limiting classification performance. To address these issues, we propose MD-MLLM, a novel image-text classification framework that leverages large multimodal language models (MLLMs) to generate semantically enhanced visual representations.
To mitigate redundancy introduced by direct MLLM feature integration, we introduce a hierarchical disentanglement mechanism based on the Hilbert-Schmidt Independence Criterion (HSIC) and orthogonality constraints, which explicitly separates modality-specific and shared representations. Furthermore, a hierarchical fusion strategy combines original unimodal features with disentangled shared semantics, promoting discriminative feature learning and cross-modal complementarity. Extensive experiments on two benchmark datasets, N24News and Food101, show that MD-MLLM achieves consistently stable improvements in classification accuracy and exhibits competitive performance compared with various representative multimodal baselines. The framework also demonstrates good generalization ability and robustness across different multimodal scenarios. The code is available at https://github.com/xiaohaochen0308/MD-MLLM.
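For readers unfamiliar with the two regularizers named in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimator, tr(KHLH) / (n - 1)^2, and a squared-Frobenius-norm orthogonality penalty. It is not the authors' implementation (see the linked repository for that). The function names, the RBF kernel, and the bandwidth `sigma` are assumptions made for illustration, and how the paper applies these terms hierarchically is not reproduced here.

```python
import math

def rbf_kernel(rows, sigma=1.0):
    """Gram matrix of an RBF kernel over a list of feature vectors (assumed kernel choice)."""
    n = len(rows)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K

def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between paired samples X and Y (lists of vectors).

    Values near zero suggest the two representations are close to independent
    under the chosen kernel; larger values indicate stronger dependence.
    """
    n = len(X)
    Kc = center(rbf_kernel(X, sigma))
    L = rbf_kernel(Y, sigma)
    # tr(K H L H) equals the elementwise sum of (H K H) * L for symmetric matrices.
    trace = sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2

def orthogonality_penalty(A, B):
    """Squared Frobenius norm of A^T B, where rows of A and B are paired samples.

    The penalty is zero when every feature column of A is orthogonal to every
    feature column of B across the batch.
    """
    d_a, d_b = len(A[0]), len(B[0])
    total = 0.0
    for p in range(d_a):
        for q in range(d_b):
            dot = sum(a[p] * b[q] for a, b in zip(A, B))
            total += dot * dot
    return total
```

In a disentanglement setting, terms like these would typically be added to the classification loss so that modality-specific and shared features are pushed toward statistical independence and orthogonality; the weighting of each term is a design choice not stated in the abstract.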
PDF
MD_MLLM(eswa) (Clean author submitted).pdf - Accepted Version (15MB). Restricted to Repository staff only until 18 December 2027. Available under License Creative Commons Attribution Non-commercial No Derivatives.
More Information
| Additional Information: The code is available at https://github.com/xiaohaochen0308/MD-MLLM. |
| Depositing User: Yongqiang Cheng |
Identifiers
| Item ID: 19756 |
| Identification Number: 10.1016/j.eswa.2025.130790 |
| URI: https://sure.sunderland.ac.uk/id/eprint/19756 |
| Official URL: https://www.sciencedirect.com/science/article/abs/... |
Catalogue record
| Date Deposited: 23 Dec 2025 07:39 |
| Last Modified: 23 Dec 2025 07:39 |
| Author: | Qianjun Shuai |
| Author: | Xiaohao Chen |
| Author: | Yongqiang Cheng |
| Author: | Miao Fang |
| Author: | Libiao Jin |
University Divisions
Faculty of Business and Technology > School of Computer Science and Engineering
Subjects
Computing > Data Science
Computing > Artificial Intelligence