News

Audio-visual Segmentation (AVS) is framed as a conditional generation task, in which audio serves as the conditional variable for segmenting the sound producer(s). In this case, audio ...

LININ: Logic Integrated Neural Inference Network for Explanatory Visual Question Answering
Abstract: Explanatory Visual Question Answering (EVQA) is a recently proposed multimodal reasoning task ...