Publications

Preprint (arXiv)

[P5] Hybrid-Vector Retrieval for Visually Rich Documents: Combining Single-Vector Efficiency and Multi-Vector Accuracy

Juyeon Kim, Geon Lee, Dongwon Choi, Taeuk Kim, Kijung Shin
arXiv preprint (arXiv:2510.22215)

[P4] ADVICE: Answer-Dependent Verbalized Confidence Estimation

Ki Jung Seo, Sehun Lim, Taeuk Kim
arXiv preprint (arXiv:2510.10913)

[P3] CMR-SPB: Cross-Modal Multi-Hop Reasoning over Text, Image, and Speech with Path Balance

Seunghee Kim, Ingyu Bang, Seokgyu Jang, Changhyeon Kim, Sanghwan Bae, Jihun Choi, Richeng Xuan, Taeuk Kim
arXiv preprint (arXiv:2508.16198)

[P2] UniKnow: A Unified Framework for Reliable Language Model Behavior across Parametric and External Knowledge

Youna Kim, Hyuhng Joon Kim, Minjoon Choi, Sungmin Cho, Hyunsoo Cho, Sang-goo Lee, Taeuk Kim
arXiv preprint (arXiv:2502.13648)

[P1] Investigating the Influence of Prompt-Specific Shortcuts in AI Generated Text Detection

Choonghyun Park, Hyuhng Joon Kim, Junyeob Kim, Youna Kim, Taeuk Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-goo Lee, Kang Min Yoo
arXiv preprint (arXiv:2406.16275)

International Conferences

2026

[C33] Development and Evaluation of a Dual-Expertise, Utterance-Level Framework for LLM-Based Science Classroom Discourse Analysis

Jin Eun Yoo, Nam-Hwa Kang, Suna Ryu, Jun-ki Lee, Youngsun Kwak, Taeuk Kim, Hyeong Gwan Kim, Youngwoo Shin, Uiji Hwang
The 16th International Learning Analytics & Knowledge Conference (LAK 2026)

2025

[C32] Beyond Task-Oriented and Chitchat Dialogues: Proactive and Transition-Aware Conversational Agents

Yejin Yoon, Yuri Son, Namyeong So, Minseo Kim, Minsoo Cho, Chanhee Park, Seungshin Lee, Taeuk Kim
The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)

[C31] MAGIC: A Multi-Hop and Graph-Based Benchmark for Inter-Context Conflicts in Retrieval-Augmented Generation

Jungyeon Lee*, Kangmin Lee*, Taeuk Kim (*: equal contribution)
Findings of the Association for Computational Linguistics: EMNLP 2025 (Findings of EMNLP 2025)

[C30] Memorization or Reasoning? Exploring the Idiom Understanding of LLMs

Jisu Kim*, Youngwoo Shin*, Uiji Hwang, Jihun Choi, Richeng Xuan, Taeuk Kim (*: equal contribution)
The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)

[C29] Does Localization Inform Unlearning? A Rigorous Examination of Local Parameter Attribution for Knowledge Unlearning in Language Models

Hwiyeong Lee, Uiji Hwang, Hyelim Lim, Taeuk Kim
The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP 2025)

[C28] FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning

Seunghee Kim*, Changhyeon Kim*, Taeuk Kim (*: equal contribution)
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

[C27] When to Speak, When to Abstain: Contrastive Decoding with Abstention

Hyuhng Joon Kim, Youna Kim, Sang-goo Lee, Taeuk Kim
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025)

[C26] ENGinius: A Bilingual LLM Optimized for Plant Construction Engineering

Wooseong Lee, Minseo Kim, Taeil Hur, Gyeong Hwan Jang, Woncheol Lee, Maro Na, Taeuk Kim
The 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025): Industry Track

[C25] KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking

Juyeon Kim, Geon Lee, Taeuk Kim, Kijung Shin
The 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2025)

[C24] Subgraph-Aware Training of Language Models for Knowledge Graph Completion Using Structure-Aware Contrastive Learning

Youmin Ko*, Hyemin Yang*, Taeuk Kim, Hyunjoon Kim (*: equal contribution)
The Web Conference 2025 (WWW 2025)

[C23] ESPRESSO: An Effective Approach to Passage Retrieval for High-Quality Conversational Recommender Systems

Taeho Kim, Hyeongjun Jang, Juwon Yu, Taeuk Kim, Hyunyoung Lee, Jihui Im, Sang-Wook Kim
The 39th AAAI Conference on Artificial Intelligence (AAAI 2025)

2024

[C22] Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts

Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2024 (Findings of EMNLP 2024)

[C21] Revisiting the Impact of Pursuing Modularity for Code Generation

Deokyeong Kang*, Ki Jung Seo*, Taeuk Kim (*: equal contribution)
Findings of the Association for Computational Linguistics: EMNLP 2024 (Findings of EMNLP 2024)

[C20] Aligning Language Models to Explicitly Handle Ambiguity

Hyuhng Joon Kim, Youna Kim, Cheonbok Park, Junyeob Kim, Choonghyun Park, Kang Min Yoo, Sang-goo Lee, Taeuk Kim
The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024)

[C19] Hyper-CL: Conditioning Sentence Representations with Hypernetworks

Young Hyun Yoo*, Jii Cha*, Changhyeon Kim, Taeuk Kim (*: equal contribution)
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

[C18] Analysis of Multi-Source Language Training in Cross-Lingual Transfer

Seong Hoon Lim, Taejun Yun, Jinhyeon Kim, Jihun Choi, Taeuk Kim
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

[C17] BlendX: Complex Multi-Intent Detection with Blended Patterns

Yejin Yoon, Jungyeon Lee, Kangsan Kim, Chanhee Park, Taeuk Kim
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

2023

[C16] X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity

Taejun Yun, Jinhyeon Kim, Deokyeong Kang, Seong Hoon Lim, Jihoon Kim, Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2023 (Findings of EMNLP 2023)

[C15] Universal Domain Adaptation for Robust Handling of Distributional Shifts in NLP

Hyuhng Joon Kim, Hyunsoo Cho, Sang-Woo Lee, Junyeob Kim, Choonghyun Park, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
Findings of the Association for Computational Linguistics: EMNLP 2023 (Findings of EMNLP 2023)

[C14] Prompt-Augmented Linear Probing: Scaling Beyond The Limit of Few-shot In-Context Learners

Hyunsoo Cho, Hyuhng Joon Kim, Junyeob Kim, Sang-Woo Lee, Sang-goo Lee, Kang Min Yoo, Taeuk Kim
The 37th AAAI Conference on Artificial Intelligence (AAAI 2023)

2022

[C13] Ground-Truth Labels Matter: A Deeper Look into Input-Label Demonstrations

Kang Min Yoo, Junyeob Kim, Hyuhng Joon Kim, Hyunsoo Cho, Hwiyeol Jo, Sang-Woo Lee, Sang-goo Lee, Taeuk Kim
The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

[C12] Enhancing Out-of-Distribution Detection in Natural Language Understanding via Implicit Layer Ensemble

Hyunsoo Cho, Choonghyun Park, Jaewook Kang, Kang Min Yoo, Taeuk Kim, Sang-goo Lee
Findings of the Association for Computational Linguistics: EMNLP 2022 (Findings of EMNLP 2022)

[C11] Revisiting the Practical Effectiveness of Constituency Parse Extraction from Pre-trained Language Models

Taeuk Kim
The 29th International Conference on Computational Linguistics (COLING 2022)

Before 2022

[C10] Multilingual Chart-based Constituency Parse Extraction from Pre-trained Language Models

Taeuk Kim, Bowen Li, Sang-goo Lee
Findings of the Association for Computational Linguistics: EMNLP 2021 (Findings of EMNLP 2021)

[C9] Self-Guided Contrastive Learning for BERT Sentence Representations

Taeuk Kim, Kang Min Yoo, Sang-goo Lee
The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021)

[C8] Heads-up! Unsupervised Constituency Parsing via Self-Attention Heads

Bowen Li, Taeuk Kim, Reinald Kim Amplayo, Frank Keller
The 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing (AACL-IJCNLP 2020)

[C7] Are Pre-trained Language Models Aware of Phrases? Simple but Strong Baselines for Grammar Induction

Taeuk Kim, Jihun Choi, Daniel Edmiston, Sang-goo Lee
International Conference on Learning Representations 2020 (ICLR 2020)

[C6] Cell-aware Stacked LSTMs for Modeling Sentences

Jihun Choi, Taeuk Kim, Sang-goo Lee
The 11th Asian Conference on Machine Learning (ACML 2019)

[C5] Don't Just Scratch the Surface: Enhancing Word Representations for Korean with Hanja

Taeuk Kim*, Kang Min Yoo*, Sang-goo Lee (*: equal contribution)
Conference on Empirical Methods in Natural Language Processing 2019 and 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019)

[C4] Intrinsic Evaluation of Grammatical Information within Word Embeddings

Daniel Edmiston, Taeuk Kim
The 33rd Pacific Asia Conference on Language, Information and Computation (PACLIC 33)

[C3] A Cross-Sentence Latent Variable Model for Semi-Supervised Text Sequence Matching

Jihun Choi, Taeuk Kim, Sang-goo Lee
The 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019)

[C2] Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations

Taeuk Kim, Jihun Choi, Daniel Edmiston, Sanghwan Bae, Sang-goo Lee
Thirty-Third AAAI Conference on Artificial Intelligence (AAAI 2019)

[C1] Element-wise Bilinear Interaction for Sentence Matching

Jihun Choi, Taeuk Kim, Sang-goo Lee
The Seventh Joint Conference on Lexical and Computational Semantics (*SEM 2018) at NAACL HLT 2018

Domestic Conferences

[D13] 거대 언어 모델 관용구 처리의 역전의 저주 현상 탐구
(Reversal Curse in Idiomatic Tasks of Large Language Models)

Jisu Kim, Taeuk Kim
Korea Software Congress 2025 (KSC 2025)

[D12] 대화 맥락 추론 향상을 위한 리즈닝 피드백 기반 학습
(Enhancing Conversational Context Inference via Reasoning Feedback-Based Learning)

Yuri Son, Taeuk Kim
The 37th Annual Conference on Human and Cognitive Language Technology (HCLT 2025)

[D11] 한국어 표 설명 능력 향상을 위한 전처리 및 학습 방법론 탐구
(Learning Strategies to Improve Table Understanding and Explanation in Korean)

Changhyeon Kim, Seunghee Kim, Taeuk Kim
The 36th Annual Conference on Human and Cognitive Language Technology (HCLT 2024) [Best Paper Award (최우수논문상)]

[D10] KR-HumanEval을 활용한 언어 모델의 한국어 프로그램 합성 성능 분석
(Analysis of Language Models in Korean Program Synthesis Based on the KR-HumanEval Benchmark)

Deokyeong Kang, Taeuk Kim
The 36th Annual Conference on Human and Cognitive Language Technology (HCLT 2024) [Best Paper Award (최우수논문상)]

[D9] 한국어 발화의 다중 의도 감지 연구
(Multi-Intent Detection for Korean Spoken Language)

Yejin Yoon*, Jisu Kim*, Jungmin Im, Jungyeon Lee, Taeuk Kim (*: equal contribution)
Korea Computer Congress 2024 (KCC 2024) [Outstanding Paper Award (우수논문상)]

[D8] 다중 레이블 분류를 위한 프롬프팅 고도화
(Enhanced Prompting for Multi-Label Classification)

Jungyeon Lee, Youngwoo Shin, Yejin Yoon, Taeuk Kim
Korea Computer Congress 2024 (KCC 2024)

[D7] 생략과 상호참조를 보강한 다중 의도 데이터셋
(A Multi-Intent Dataset Enhanced with Implicit Concatenation)

Sungmin So*, Jiwoo Min*, Yejin Yoon, Jungyeon Lee, Taeuk Kim (*: equal contribution)
Korea Computer Congress 2024 (KCC 2024)

[D6] 기계 독해를 활용한 한국어 의미역 결정
(Korean Semantic Role Labeling with Machine Reading Comprehension)

Kang Min Lee, Dong Geon Seo, Eunrang Kwon, Junmo Song, Jeonghan Kang, Taeuk Kim
Korea Computer Congress 2024 (KCC 2024)

[D5] 대조학습 기반 문장표현 방법론 개선을 위한 공통 오류 분석 및 앙상블 기법
(Enhancing Sentence Representations with Common Error Analysis and Ensemble Techniques in Contrastive Learning)

Jii Cha, Taeuk Kim
Korea Software Congress 2023 (KSC 2023)

[D4] 효과적인 한국어 교차언어 전송을 위한 특성 연구
(Research on Features for Effective Cross-Lingual Transfer in Korean)

Taejun Yun, Taeuk Kim
The 35th Annual Conference on Human and Cognitive Language Technology (HCLT 2023)

[D3] MAdapter: 효율적인 중간 층 도입을 통한 Adapter 구조 개선
(MAdapter: A Refinement of Adapters by Augmenting Efficient Middle Layers)

Jinhyeon Kim, Taeuk Kim
The 35th Annual Conference on Human and Cognitive Language Technology (HCLT 2023)

[D2] 원천 언어 다각화를 통한 교차 언어 전이 성능 향상
(Enhanced Zero-Shot Cross-Lingual Transfer with the Diversification of Source Languages)

Seong Hoon Lim, Taeuk Kim
Korea Computer Congress 2023 (KCC 2023)

[D1] 한국어 문장 표현을 위한 비지도 대조 학습 방법론의 비교 및 분석
(Comparison and Analysis of Unsupervised Contrastive Learning Approaches for Korean Sentence Representations)

Young Hyun Yoo, Kyumin Lee, Minjin Jeon, Jii Cha, Kangsan Kim, Taeuk Kim
The 34th Annual Conference on Human and Cognitive Language Technology (HCLT 2022)

International Workshops

[W7] RAISE: Enhancing Scientific Reasoning in LLMs via Step-by-Step Retrieval

Minhae Oh, Jeonghye Kim, Nakyung Lee, Donggeon Seo, Taeuk Kim, Jungwoo Lee
The 5th Workshop on Mathematical Reasoning and AI (MATH-AI) at NeurIPS 2025

[W6] Self-Generated In-Context Learning: Leveraging Auto-regressive Language Models as a Demonstration Generator

Hyuhng Joon Kim, Hyunsoo Cho, Junyeob Kim, Taeuk Kim, Kang Min Yoo, Sang-goo Lee
Workshop on Large-scale Pre-trained Language Models (LPLM 2022) at NAACL 2022

[W5] HYU at SemEval-2022 Task 2: Effective Idiomaticity Detection with Consideration at Different Levels of Contextualization