Learning visual and multimodal representations

Abstract

Representations lie at the heart of artificial intelligence, enabling machines to perceive, interpret, and interact with the world. Visual representations, extracted from images or videos, enable tasks such as image classification, image retrieval, and object detection. Visual-textual representations, bridging the gap between the visual and linguistic domains, enable tasks like image captioning, visual question answering, and cross-modal retrieval. The ability to learn and manipulate these representations is paramount for advancing the state of the art in computer vision and beyond. In this dissertation, we investigate novel methods for learning both visual (unimodal) and visual-textual (multimodal) representations, focusing mainly on applications in deep metric learning, image classification, and composed image retrieval. We address the challenges of learning representations from both data-centric and model-centric perspectives, aiming to unlock new capabilities for visual understanding ...

All items in the National Archive of Ph.D. Theses are protected by copyright.

DOI
10.12681/eadd/57401
Handle URL
http://hdl.handle.net/10442/hedi/57401
ND
57401
Alternative title
Εκμάθηση οπτικών και πολυτροπικών αναπαραστάσεων (Greek: "Learning visual and multimodal representations")
Author
Psomas, Vasileios (Father's name: Emmanouil)
Date
2024
Degree Grantor
National Technical University of Athens (NTUA)
Committee members
Καράντζαλος Κωνσταντίνος
Αργιαλάς Δημήτριος
Τόλιας Γεώργιος
Καραθανάση Βασιλεία
Παπουτσής Ιωάννης
Κομοντάκης Νικόλαος
Βακαλοπούλου Μαρία
Discipline
Engineering and Technology ➨ Electrical Engineering, Electronic Engineering, Information Engineering ➨ Media Technology
Keywords
Neural networks; Computer vision; Deep learning; Remote sensing; Representation learning
Country
Greece
Language
English
Description
im., tbls., fig., ch.