Faculty Research Profile

인공지능대학원

SENOCAKARDA

조교수ARDA SENOCAK

SENOCAKARDA

ARDA SENOCAK

Biography

학력

• 2025-2025: Research Assistant Professor, KAIST
• 2022-2025: Postdoctoral Researcher, KAIST
• 2016-2022: Ph.D. Electrical Engineering, KAIST
• 2015-2016: Ph.D. Computer Science, Stevens Institute of Technology, USA
• 2015: M.S. Electrical Engineering, KAIST
• 2013: B.S. Computer Science, KAIST

수상/학회/외부활동

- ICCV 2025 Outstanding Reviewer (2025)
- ICCV 2023 Outstanding Reviewer (2023)
- 28th HumanTech Paper Award, Samsung Electronics Co., Ltd.
- Qualcomm Innovation Awards (Qualcomm Innovation Fellowship) (2018)
- Google Travel and Conference Grants (2018)

Research

Multisensory Intelligence Lab. (MILab)

Multisensory Intelligence Lab. (MILab)

Most of today’s foundational models perceive only a small fraction of the world through language in textual form. However, text modality primarily contains high-level supervisory information created by humans. In contrast, how humans experience the world is profoundly multimodal, as we learn through sensory inputs and by discovering correlations across different modalities. When a baby eats a red shiny apple, they taste the sweet and sour flavor, smell the fresh and sweet fruity aroma, hear the crunchy sound and feel the hard texture. Through this experience, the baby learns to associate the crunch sound with the hard texture, the flavor with the red color, and the circular shape of the object, among other connections, as these associations are formed between different sensory input. True multimodal learning requires expanding the range of mediums, and consequently adopting an approach that can handle all these diverse and heterogeneous modalities. This presents a challenging next step in multimodal learning with significant potential. Such a model would be capable of learning previously unseen interactions between modalities. Furthermore, it would enable searching and retrieving information across various modalities. These learned cross-modal representations could also be leveraged in generative tasks to create diverse types of content. Most importantly, this type of system could ultimately interact with and assist humans. Some of the technical research directions our group plan to pursue include: (1) developing better methods to integrate and learn from the extreme heterogeneity across diverse modalities, (2) designing new architectural innovations that can dynamically select which modality or modalities to use for specific tasks, and (3) creating new approaches to handle the imbalance in the information content carried by different modalities. Our research group will push forward this research agenda and make significant contributions to the field of multisensory intelligence, aiming to develop AI systems that can interact with the world through a full-body experience – enhancing and augmenting human capabilities and creativity.

Most of today’s foundational models perceive only a small fraction of the world through language in textual form. However, text modality primarily contains high-level supervisory information created by humans. In contrast, how humans experience the world is profoundly multimodal, as we learn through sensory inputs and by discovering correlations across different modalities. When a baby eats a red shiny apple, they taste the sweet and sour flavor, smell the fresh and sweet fruity aroma, hear the crunchy sound and feel the hard texture. Through this experience, the baby learns to associate the crunch sound with the hard texture, the flavor with the red color, and the circular shape of the object, among other connections, as these associations are formed between different sensory input. True multimodal learning requires expanding the range of mediums, and consequently adopting an approach that can handle all these diverse and heterogeneous modalities. This presents a challenging next step in multimodal learning with significant potential. Such a model would be capable of learning previously unseen interactions between modalities. Furthermore, it would enable searching and retrieving information across various modalities. These learned cross-modal representations could also be leveraged in generative tasks to create diverse types of content. Most importantly, this type of system could ultimately interact with and assist humans. Some of the technical research directions our group plan to pursue include: (1) developing better methods to integrate and learn from the extreme heterogeneity across diverse modalities, (2) designing new architectural innovations that can dynamically select which modality or modalities to use for specific tasks, and (3) creating new approaches to handle the imbalance in the information content carried by different modalities. Our research group will push forward this research agenda and make significant contributions to the field of multisensory intelligence, aiming to develop AI systems that can interact with the world through a full-body experience – enhancing and augmenting human capabilities and creativity.

Multisensory Intelligence Lab. (MILab)

연구분야

Multimodal Machine Learning

Multimodal Machine Learning

연구 희망분야

Multimodal Machine Learning, Audio Processing, Computer Vision, Audio-Visual Learning

Multimodal Machine Learning, Audio Processing, Computer Vision, Audio-Visual Learning

연구주제

- Multimodal Learning
- Audio-Visual Learning
- Computer Vision
- Audio/Speech Processing

- Multimodal Learning
- Audio-Visual Learning
- Computer Vision
- Audio/Speech Processing

Outputs

논문

- Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment, TPAMI 2025
- Seeing Speech and Sound: Distinguishing and Locating Audios in Visual Scenes, CVPR 2025
- Sound Source Localization is All about Cross-Modal Alignment, ICCV 2023
- Learning to Localize Sound Source in Visual Scenes: Analysis and Applications, TPAMI 2021
- Learning to Localize Sound Source in Visual Scenes, CVPR 2028