HYU Natural Language Processing (NLP) Laboratory

Welcome to the Natural Language Processing (NLP) Lab at Hanyang University.

We study various approaches and problems with regard to natural language, chiefly based on machine learning and AI technologies.

We are looking for MS/Ph.D. students (and interns) who are self-motivated and passionate about doing research in NLP.

Please submit your information on this page if you are interested in applying to our lab.

News!

(25/05/16) Three papers—two in the Main Track and one in the Industry Track—have been accepted at ACL 2025. One main paper is the result of a collaboration with SNU, while the other stems from our internal project. Also pleased to have our first Industry Track paper, produced in collaboration with Hyundai Engineering and Jenti AI. Congratulations to our students Seunghee, Changhyeon, and Minseo!

(25/02/27) Honored to share that HYU NLP will receive a three-year grant (Outstanding Young Scientist Grants) from the NRF for the project titled "Conversational Agents with Hyper Long-Term Memory"!

(25/02/21) Deokyeong and Kang Min have graduated with their Master's degrees. Wish them all the best! (A new group photo from the ceremony!)

(24/09/24) Two papers have been accepted at HCLT 2024. Congrats to Deokyeong, Changhyeon, and Seung Hee! Also honored to share that both papers have received the Best Paper Award (최우수논문상) from the conference!

(24/09/20) Three papers—one Main and two Findings—have been accepted at EMNLP 2024. Two of them are the result of collaboration with SNU and Naver. The other is the outcome of an internal project. Congratulations to Deokyeong and Ki Jung!

Recent Publications

FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning (ACL 2025)

Abstract

Real-world decision-making often requires integrating and reasoning over information from multiple modalities. While recent multimodal large language models (MLLMs) have shown promise in such tasks, their ability to perform multi-hop reasoning across diverse sources remains insufficiently evaluated. Existing benchmarks, such as MMQA, face challenges due to (1) data contamination and (2) a lack of complex queries that necessitate operations across more than two modalities, hindering accurate performance assessment. To address this, we present Financial Cross-Modal Multi-Hop Reasoning (FCMR), a benchmark created to analyze the reasoning capabilities of MLLMs by urging them to combine information from textual reports, tables, and charts within the financial domain. FCMR is categorized into three difficulty levels (Easy, Medium, and Hard), facilitating a step-by-step evaluation. In particular, problems at the Hard level require precise cross-modal three-hop reasoning and are designed to prevent the disregard of any modality. Experiments on this new benchmark reveal that even state-of-the-art MLLMs struggle, with the best-performing model (Claude 3.5 Sonnet) achieving only 30.4% accuracy on the most challenging tier. We also conduct analysis to provide insights into the inner workings of the models, including the discovery of a critical bottleneck in the information retrieval phase.

Revisiting the Impact of Pursuing Modularity for Code Generation (EMNLP 2024 Findings)

Abstract

Modular programming, which aims to construct the final program by integrating smaller, independent building blocks, has been regarded as a desirable practice in software development. However, with the rise of recent code generation agents built upon large language models (LLMs), a question emerges: is this traditional practice equally effective for these new tools? In this work, we assess the impact of modularity in code generation by introducing a novel metric for its quantitative measurement. Surprisingly, unlike conventional wisdom on the topic, we find that modularity is not a core factor for improving the performance of code generation models. We also explore potential explanations for why LLMs do not exhibit a preference for modular code compared to non-modular code. Our code is available at https://github.com/HYU-NLP/Revisiting-Modularity.

Hyper-CL: Conditioning Sentence Representations with Hypernetworks (ACL 2024)

Abstract

While the introduction of contrastive learning frameworks in sentence representation learning has significantly contributed to advancements in the field, it still remains unclear whether state-of-the-art sentence embeddings can capture the fine-grained semantics of sentences, particularly when conditioned on specific perspectives. In this paper, we introduce Hyper-CL, an efficient methodology that integrates hypernetworks with contrastive learning to compute conditioned sentence representations. In our proposed approach, the hypernetwork is responsible for transforming pre-computed condition embeddings into corresponding projection layers. This enables the same sentence embeddings to be projected differently according to various conditions. Evaluation on two representative conditioning benchmarks, namely conditional semantic text similarity and knowledge graph completion, demonstrates that Hyper-CL is effective in flexibly conditioning sentence representations, showcasing its computational efficiency at the same time. We also provide a comprehensive analysis of the inner workings of our approach, leading to a better interpretation of its mechanisms. Our code is available at https://github.com/HYU-NLP/Hyper-CL.
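The idea of turning a condition embedding into a projection layer can be illustrated with a toy sketch. This is not the code from the Hyper-CL repository: the `LinearHypernetwork` class, the `condition_sentence` helper, the dimensions, and the randomly initialized (untrained) weights are all assumptions made for illustration, and the contrastive training step is omitted entirely.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LinearHypernetwork:
    """Hypothetical hypernetwork: maps a condition embedding to projection weights."""

    def __init__(self, cond_dim, sent_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.sent_dim = sent_dim
        self.out_dim = out_dim
        # A single linear map that emits out_dim * sent_dim numbers per condition.
        # Weights are random here; a real system would learn them.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(cond_dim)]
            for _ in range(out_dim * sent_dim)
        ]

    def projection_for(self, condition_embedding):
        """Generate an (out_dim x sent_dim) projection matrix for one condition."""
        flat = matvec(self.weights, condition_embedding)
        return [
            flat[i * self.sent_dim:(i + 1) * self.sent_dim]
            for i in range(self.out_dim)
        ]


def condition_sentence(hypernet, sentence_embedding, condition_embedding):
    """Project a sentence embedding with the condition-specific projection."""
    return matvec(hypernet.projection_for(condition_embedding), sentence_embedding)


# The same sentence embedding is projected differently under two conditions.
hypernet = LinearHypernetwork(cond_dim=3, sent_dim=4, out_dim=2)
sentence = [0.5, -1.0, 0.25, 2.0]
view_a = condition_sentence(hypernet, sentence, [1.0, 0.0, 0.0])
view_b = condition_sentence(hypernet, sentence, [0.0, 1.0, 0.0])
```

Because the projection depends only on the condition, it could be generated once per condition and reused across sentences, which is one plausible reading of why pre-computed condition embeddings help efficiency.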

Analysis of Multi-Source Language Training in Cross-Lingual Transfer (ACL 2024)

Abstract

The successful adaptation of multilingual language models (LMs) to a specific language-task pair critically depends on the availability of data tailored for that condition. While cross-lingual transfer (XLT) methods have contributed to addressing this data scarcity problem, there still exists ongoing debate about the mechanisms behind their effectiveness. In this work, we focus on one of the promising assumptions about the inner workings of XLT, that it encourages multilingual LMs to place greater emphasis on language-agnostic or task-specific features. We test this hypothesis by examining how the patterns of XLT change with a varying number of source languages involved in the process. Our experimental findings show that the use of multiple source languages in XLT—a technique we term Multi-Source Language Training (MSLT)—leads to increased mingling of embedding spaces for different languages, supporting the claim that XLT benefits from making use of language-independent information. On the other hand, we discover that using an arbitrary combination of source languages does not always guarantee better performance. We suggest simple heuristics for identifying effective language combinations for MSLT and empirically prove its effectiveness.

BlendX: Complex Multi-Intent Detection with Blended Patterns (LREC-COLING 2024)

Abstract

Task-oriented dialogue (TOD) systems are commonly designed with the presumption that each utterance represents a single intent. However, this assumption may not accurately reflect real-world situations, where users frequently express multiple intents within a single utterance. While there is an emerging interest in multi-intent detection (MID), existing in-domain datasets such as MixATIS and MixSNIPS have limitations in their formulation. To address these issues, we present BlendX, a suite of refined datasets featuring more diverse patterns than their predecessors, elevating both its complexity and diversity. For dataset construction, we utilize both rule-based heuristics as well as a generative tool (OpenAI's ChatGPT), which is augmented with a similarity-driven strategy for utterance selection. To ensure the quality of the proposed datasets, we also introduce three novel metrics that assess the statistical properties of an utterance related to word count, conjunction use, and pronoun usage. Extensive experiments on BlendX reveal that state-of-the-art MID models struggle with the challenges posed by the new datasets, highlighting the need to reexamine the current state of the MID field. The dataset is available at https://github.com/HYU-NLP/BlendX.

X-SNS: Cross-Lingual Transfer Prediction through Sub-Network Similarity (EMNLP 2023 Findings)

Abstract

Cross-lingual transfer (XLT) is an emergent ability of multilingual language models that preserves their performance on a task to a significant extent when evaluated in languages that were not included in the fine-tuning process. While English, due to its widespread usage, is typically regarded as the primary language for model adaption in various tasks, recent studies have revealed that the efficacy of XLT can be amplified by selecting the most appropriate source languages based on specific conditions. In this work, we propose the utilization of sub-network similarity between two languages as a proxy for predicting the compatibility of the languages in the context of XLT. Our approach is model-oriented, better reflecting the inner workings of foundation models. In addition, it requires only a moderate amount of raw text from candidate languages, distinguishing it from the majority of previous methods that rely on external resources. In experiments, we demonstrate that our method is more effective than baselines across diverse tasks. Specifically, it shows proficiency in ranking candidates for zero-shot XLT, achieving an improvement of 4.6% on average in terms of NDCG@3. We also provide extensive analyses that confirm the utility of sub-networks for XLT prediction.
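For readers unfamiliar with the ranking metric mentioned above, NDCG@3 can be computed as in the following minimal sketch. It uses the common linear-gain, log2-discount formulation; the exact variant used in the paper is not stated here, and the language codes and transfer scores below are made-up illustration values, not results.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (rank 1 is undiscounted)."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(predicted_order, true_gain, k=3):
    """NDCG@k of a predicted ranking, given the true gain of each candidate."""
    ranked_gains = [true_gain[item] for item in predicted_order]
    ideal_gains = sorted(true_gain.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical gains: how well each source language actually transfers.
true_gain = {"de": 3, "fr": 2, "ko": 1, "ja": 0}

perfect = ndcg_at_k(["de", "fr", "ko", "ja"], true_gain)   # ideal ordering
reversed_rank = ndcg_at_k(["ja", "ko", "fr", "de"], true_gain)  # worst ordering
```

A score of 1 means the top three predicted source languages match the ideal ordering; lower scores indicate that better candidates were ranked too low.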
