Eugene, Tan Boon Hong (2024) Visual Semantic Context-aware Attention-based Dialog Model. PhD thesis, Universiti Sains Malaysia.
Abstract
The visual dialogue dataset VisDial v1.0 includes a wide range of Microsoft Common Objects in Context (MSCOCO) image contents, with questions collected via a crowdsourcing marketplace platform (Amazon Mechanical Turk). However, the existing question history and images alone no longer contribute to a better understanding of the image context, as they do not cover the entire image semantic context.
This research proposes the DsDial dataset, a context-aware visual dialogue dataset that groups all relevant dialogue histories extracted according to their respective MSCOCO image categories. This research also exploits the overlapping visual context between images via adaptive relevant-dialogue-history selection during generation of the new dataset, based on the groups of all relevant dialogue histories. The resulting dataset comprises half of the 2.6 million question-answer pairs.
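The grouping and selection step described above can be illustrated with a minimal sketch. This is not the thesis pipeline: the VisDial-style record layout, the per-image MSCOCO category sets, the shared-category relevance rule, and all function and field names below are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the DsDial generation pipeline):
# dialogue histories are bucketed by the MSCOCO categories of their image, so
# that relevant histories for a new image can be drawn from other images that
# share a category with it.
from collections import defaultdict

def group_histories_by_category(dialogs, image_categories):
    """dialogs: list of {"image_id": ..., "dialog": [...]} records.
    image_categories: dict mapping image_id -> set of MSCOCO category names."""
    groups = defaultdict(list)
    for record in dialogs:
        for category in image_categories.get(record["image_id"], ()):
            groups[category].append(record)
    return groups

def relevant_histories(image_id, groups, image_categories, limit=10):
    """Select dialogue histories from other images sharing a category."""
    seen, selected = set(), []
    for category in image_categories.get(image_id, ()):
        for record in groups.get(category, ()):
            other = record["image_id"]
            if other != image_id and other not in seen:
                seen.add(other)
                selected.append(record)
                if len(selected) >= limit:
                    return selected
    return selected

if __name__ == "__main__":
    dialogs = [
        {"image_id": 1, "dialog": ["Is there a dog?", "Yes"]},
        {"image_id": 2, "dialog": ["What colour is the cat?", "Black"]},
        {"image_id": 3, "dialog": ["How many dogs are there?", "Two"]},
    ]
    categories = {1: {"dog"}, 2: {"cat"}, 3: {"dog"}}
    groups = group_histories_by_category(dialogs, categories)
    print([r["image_id"] for r in relevant_histories(1, groups, categories)])  # [3]
```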
Meanwhile, this research proposes Diverse History-Dialog (DS-Dialog) to resolve the missing visual semantic information for each image via context-aware visual attention. The context-aware visual attention comprises question-guided and relevant-dialogue-history-guided visual attention modules that extract the relevant visual context once both have achieved high confidence (a sketch of this fusion is given after the abstract). The qualitative and quantitative experimental results on the VisDial v1.0 and DsDial datasets demonstrate that the proposed DS-Dialog not only outperforms existing methods, but also achieves competitive results by contributing to better visual semantic extraction. The DsDial dataset has proven its significance on the LF model compared with VisDial v1.0. Overall, the quantitative results show that DS-Dialog with the DsDial dataset achieved the best test scores for recall@1, recall@5, recall@10, mean rank, MRR, and NDCG.
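The context-aware visual attention described above can likewise be sketched as two guided attention modules over image region features, one conditioned on the question and one on the relevant dialogue history, whose attended contexts are fused by a learned confidence gate. This is a hedged PyTorch sketch, not the DS-Dialog implementation; the module names, feature dimensions, and gating formulation are assumptions.

```python
# Minimal sketch (assumptions, not the DS-Dialog implementation) of
# context-aware visual attention: question-guided and relevant-dialogue-
# history-guided attention over image regions, fused by a confidence gate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedVisualAttention(nn.Module):
    """Single-query additive attention over image region features."""
    def __init__(self, vis_dim, query_dim, hidden_dim):
        super().__init__()
        self.proj_v = nn.Linear(vis_dim, hidden_dim)
        self.proj_q = nn.Linear(query_dim, hidden_dim)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, regions, query):
        # regions: (batch, num_regions, vis_dim); query: (batch, query_dim)
        h = torch.tanh(self.proj_v(regions) + self.proj_q(query).unsqueeze(1))
        weights = F.softmax(self.score(h).squeeze(-1), dim=-1)
        context = torch.bmm(weights.unsqueeze(1), regions).squeeze(1)
        return context, weights

class ContextAwareVisualAttention(nn.Module):
    """Fuses question-guided and history-guided visual contexts with a gate."""
    def __init__(self, vis_dim, q_dim, h_dim, hidden_dim):
        super().__init__()
        self.q_attn = GuidedVisualAttention(vis_dim, q_dim, hidden_dim)
        self.h_attn = GuidedVisualAttention(vis_dim, h_dim, hidden_dim)
        # Gate estimates how much to trust each attended visual context.
        self.gate = nn.Sequential(nn.Linear(2 * vis_dim, 1), nn.Sigmoid())

    def forward(self, regions, question_emb, history_emb):
        q_ctx, _ = self.q_attn(regions, question_emb)
        h_ctx, _ = self.h_attn(regions, history_emb)
        g = self.gate(torch.cat([q_ctx, h_ctx], dim=-1))   # (batch, 1)
        return g * q_ctx + (1.0 - g) * h_ctx                # fused visual context

if __name__ == "__main__":
    regions = torch.randn(2, 36, 2048)   # e.g. region features from a detector
    question = torch.randn(2, 512)
    history = torch.randn(2, 512)
    model = ContextAwareVisualAttention(2048, 512, 512, 256)
    print(model(regions, question, history).shape)  # torch.Size([2, 2048])
```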