๊ฐœ์š”

RRF(Reciprocal Rank Fusion, ์ƒํ˜ธ ์ˆœ์œ„ ์œตํ•ฉ)๋Š” ์—ฌ๋Ÿฌ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์ด๋‚˜ ์ฟผ๋ฆฌ๋กœ๋ถ€ํ„ฐ ์–ป์€ ๊ฒฐ๊ณผ๋ฅผ ํ•˜๋‚˜์˜ ํ†ตํ•ฉ๋œ ์ˆœ์œ„๋กœ ๊ฒฐํ•ฉํ•˜๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด๋‹ค. 2009๋…„ Gordon V. Cormack, Charles L.A. Clarke, Stefan Buettcher๊ฐ€ ๋ฐœํ‘œํ•œ ๋…ผ๋ฌธ1์—์„œ ์†Œ๊ฐœ๋˜์—ˆ๋‹ค.

RRF๋Š” ํŠนํžˆ ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์—์„œ ํ‚ค์›Œ๋“œ ๊ธฐ๋ฐ˜ ๊ฒ€์ƒ‰(BM25 ๋“ฑ)๊ณผ ๋ฒกํ„ฐ ๊ฒ€์ƒ‰(๋ฐ€์ง‘ ๋ฒกํ„ฐ ๊ฒ€์ƒ‰)์˜ ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐํ•ฉํ•  ๋•Œ ๋„๋ฆฌ ์‚ฌ์šฉ๋œ๋‹ค.

๋™์ž‘ ์›๋ฆฌ

RRF ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ๋‹ค์Œ ๊ณผ์ •์„ ํ†ตํ•ด ์—ฌ๋Ÿฌ ๊ฒฐ๊ณผ๋ฅผ ๊ฒฐํ•ฉํ•œ๋‹ค:

1. ์ƒํ˜ธ ์ˆœ์œ„ ์ ์ˆ˜ ๊ณ„์‚ฐ

๊ฐ ๋ฌธ์„œ์˜ ์ ์ˆ˜๋Š” ๋‹ค์Œ ๊ณต์‹์œผ๋กœ ๊ณ„์‚ฐ๋œ๋‹ค:

์—ฌ๊ธฐ์„œ:

  • = ์ ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•  ๋ฌธ์„œ
  • = ์ฟผ๋ฆฌ ์ง‘ํ•ฉ (๋˜๋Š” ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ ์ง‘ํ•ฉ)
  • = ์ˆœ์œ„ ์ƒ์ˆ˜ (์ผ๋ฐ˜์ ์œผ๋กœ 60)
  • = ์ฟผ๋ฆฌ ์˜ ๊ฒฐ๊ณผ์—์„œ ๋ฌธ์„œ ์˜ ์ˆœ์œ„ (1๋ถ€ํ„ฐ ์‹œ์ž‘)

2. ์ ์ˆ˜ ์ง‘๊ณ„ ๋ฐ ์žฌ์ •๋ ฌ

  • ๊ฐ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ/์ฟผ๋ฆฌ์—์„œ ์–ป์€ ์ˆœ์œ„๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ์ƒํ˜ธ ์ˆœ์œ„ ์ ์ˆ˜๋ฅผ ๊ณ„์‚ฐ
  • ๋ชจ๋“  ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์— ๊ฑธ์ณ ๊ฐ ๋ฌธ์„œ์˜ ์ ์ˆ˜๋ฅผ ํ•ฉ์‚ฐ
  • ์ตœ์ข… ์ ์ˆ˜๋ฅผ ๊ธฐ์ค€์œผ๋กœ ๋ฌธ์„œ๋ฅผ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ

์˜ˆ์‹œ

๋‘ ๊ฐœ์˜ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ(ํ‚ค์›Œ๋“œ ๊ฒ€์ƒ‰, ๋ฒกํ„ฐ ๊ฒ€์ƒ‰)์ด ์žˆ๊ณ , k=60์ผ ๋•Œ:

ํ‚ค์›Œ๋“œ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ:

  1. ๋ฌธ์„œ A (์ˆœ์œ„ 1)
  2. ๋ฌธ์„œ B (์ˆœ์œ„ 2)
  3. ๋ฌธ์„œ C (์ˆœ์œ„ 3)

๋ฒกํ„ฐ ๊ฒ€์ƒ‰ ๊ฒฐ๊ณผ:

  1. ๋ฌธ์„œ C (์ˆœ์œ„ 1)
  2. ๋ฌธ์„œ A (์ˆœ์œ„ 2)
  3. ๋ฌธ์„œ D (์ˆœ์œ„ 3)

RRF ์ ์ˆ˜ ๊ณ„์‚ฐ:

  • ๋ฌธ์„œ A:
  • ๋ฌธ์„œ B:
  • ๋ฌธ์„œ C:
  • ๋ฌธ์„œ D:

์ตœ์ข… ์ˆœ์œ„: A โ†’ C โ†’ B โ†’ D

k ์ƒ์ˆ˜์˜ ์—ญํ• 

k ๊ฐ’์˜ ์˜๋ฏธ

์ƒ์ˆ˜ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์—ญํ• ์„ ํ•œ๋‹ค:

  1. ํ‰ํ™œํ™”(Smoothing): ํŠน์ • ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์ด ๊ฒฐ๊ณผ๋ฅผ ์ง€๋ฐฐํ•˜๋Š” ๊ฒƒ์„ ๋ฐฉ์ง€
  2. ์ˆœ์œ„ ๊ฐ„ ์ฐจ์ด ์กฐ์ ˆ: ๋‚ฎ์€ ์ˆœ์œ„์˜ ๋ฌธ์„œ๋“ค์ด ์ตœ์ข… ๊ฒฐ๊ณผ์— ๋ฏธ์น˜๋Š” ์˜ํ–ฅ์„ ์กฐ์ ˆ

์™œ k=60์ธ๊ฐ€?

  • ์‹คํ—˜์ ์œผ๋กœ k=60์ด ๊ฐ€์žฅ ์ข‹์€ ์„ฑ๋Šฅ์„ ๋ณด์ž„
  • k < 60: ์ƒ์œ„ 1์œ„์— ๊ณผ๋„ํ•˜๊ฒŒ ์ง‘์ค‘, ๊ธ‰๊ฒฉํ•œ ์ ์ˆ˜ ๊ฐ์†Œ
  • k > 60: RRF ์ ์ˆ˜๊ฐ€ ๋„ˆ๋ฌด ํ‰ํ‰ํ•ด์ ธ ์ˆœ์œ„ ๊ตฌ๋ถ„์ด ๋ชจํ˜ธํ•ด์ง

๋น„์„ ํ˜•์  ๊ฐ์†Œ

๊ณต์‹์€ ์ˆœ์œ„๊ฐ€ ์ฆ๊ฐ€ํ• ์ˆ˜๋ก ์ ์ˆ˜๊ฐ€ ๋น„์„ ํ˜•์ ์œผ๋กœ ๊ฐ์†Œํ•œ๋‹ค. ์ด๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์ง๊ด€์„ ๋ฐ˜์˜ํ•œ๋‹ค:

  • 1์œ„์™€ 2์œ„์˜ ๊ด€๋ จ์„ฑ ์ฐจ์ด > 100์œ„์™€ 101์œ„์˜ ๊ด€๋ จ์„ฑ ์ฐจ์ด
  • ์—ฌ๋Ÿฌ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์—์„œ ๋ชจ๋‘ ๋†’์€ ์ˆœ์œ„๋ฅผ ๋ฐ›์€ ๋ฌธ์„œ์— ๋” ๋†’์€ ๊ฐ€์ค‘์น˜ ๋ถ€์—ฌ

ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์—์„œ์˜ ํ™œ์šฉ

RRF๋Š” ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์—์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์ด ํ™œ์šฉ๋œ๋‹ค:

ํฌ์†Œ ๋ฒกํ„ฐ + ๋ฐ€์ง‘ ๋ฒกํ„ฐ ๊ฒฐํ•ฉ

  1. ํฌ์†Œ ๋ฒกํ„ฐ ๊ฒ€์ƒ‰ (BM25, ํ•™์Šต๋œ ํฌ์†Œ ๊ฒ€์ƒ‰ ๋“ฑ)

    • ํ‚ค์›Œ๋“œ ๋งค์นญ์— ๊ฐ•์ 
    • ์ •ํ™•ํ•œ ์šฉ์–ด ์ผ์น˜๊ฐ€ ์ค‘์š”ํ•œ ๊ฒฝ์šฐ ํšจ๊ณผ์ 
  2. ๋ฐ€์ง‘ ๋ฒกํ„ฐ ๊ฒ€์ƒ‰ (์˜๋ฏธ์  ๋ฒกํ„ฐ ๊ฒ€์ƒ‰)

    • ๋ฌธ๋งฅ๊ณผ ์˜๋ฏธ ์ดํ•ด์— ๊ฐ•์ 
    • ์ •ํ™•ํ•œ ํ‚ค์›Œ๋“œ๊ฐ€ ์—†์–ด๋„ ๊ด€๋ จ ๋ฌธ์„œ ๊ฒ€์ƒ‰ ๊ฐ€๋Šฅ
  3. RRF๋ฅผ ํ†ตํ•œ ๊ฒฐํ•ฉ

    • ๋‘ ๊ฒ€์ƒ‰ ๋ฐฉ์‹์˜ ์ˆœ์œ„๋ฅผ RRF๋กœ ์œตํ•ฉ
    • ํ‚ค์›Œ๋“œ ์ •ํ™•์„ฑ๊ณผ ์˜๋ฏธ์  ๊ด€๋ จ์„ฑ์„ ๋ชจ๋‘ ๊ณ ๋ คํ•œ ๊ฒฐ๊ณผ ์ƒ์„ฑ

์ฃผ์š” ๊ฒ€์ƒ‰ ํ”Œ๋žซํผ์˜ RRF ์ง€์›

  • Elasticsearch: rrf retriever๋ฅผ ํ†ตํ•ด ๋ ‰์‹œ์ปฌ ๊ฒ€์ƒ‰๊ณผ ๋ฒกํ„ฐ ๊ฒ€์ƒ‰ ๊ฒฐํ•ฉ ์ง€์›
  • OpenSearch: ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์„ ์œ„ํ•œ RRF ์ œ๊ณต
  • Azure AI Search: ํ•˜์ด๋ธŒ๋ฆฌ๋“œ ๊ฒ€์ƒ‰์—์„œ RRF ์Šค์ฝ”์–ด๋ง ์‚ฌ์šฉ
  • Google Vertex AI: ํ† ํฐ ๊ธฐ๋ฐ˜ ๊ฒ€์ƒ‰๊ณผ ์˜๋ฏธ์  ๊ฒ€์ƒ‰ ๊ฒฐํ•ฉ์— RRF ํ™œ์šฉ

RRF์˜ ์žฅ์ 

1. ์ˆœ์œ„ ๊ธฐ๋ฐ˜ ์ ‘๊ทผ

  • ์ ์ˆ˜ ์ •๊ทœํ™” ๋ฌธ์ œ ํšŒํ”ผ: ์„œ๋กœ ๋‹ค๋ฅธ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์˜ ์ ์ˆ˜ ์Šค์ผ€์ผ์„ ํ†ต์ผํ•  ํ•„์š” ์—†์Œ
  • ์ˆœ์œ„๋งŒ ์‚ฌ์šฉํ•˜๋ฏ€๋กœ ๋‹ค์–‘ํ•œ ๋ฐ์ดํ„ฐ ์†Œ์Šค์—์„œ ์ผ๊ด€๋œ ์ฒ˜๋ฆฌ ๊ฐ€๋Šฅ

2. ๊ฐ•๊ฑด์„ฑ(Robustness)

  • ์—ฌ๋Ÿฌ ์†Œ์Šค์˜ ์ฆ๊ฑฐ๋ฅผ ํ•ฉ์‚ฐํ•˜์—ฌ ๋” ๊ฒฌ๊ณ ํ•œ ์ˆœ์œ„ ์ƒ์„ฑ
  • ๋‹จ์ผ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์˜ ํŽธํ–ฅ์ด๋‚˜ ํŠน์„ฑ์— ๋œ ๋ฏผ๊ฐ

3. ์ด์ƒ์น˜ ์ €ํ•ญ์„ฑ

  • Min-max ์ •๊ทœํ™”, L2 ์ •๊ทœํ™”๋Š” ์ด์ƒ์น˜(outlier)์— ๋ฏผ๊ฐ
  • RRF๋Š” ์ˆœ์œ„๋ฅผ ์ง‘๊ณ„ํ•˜๋ฏ€๋กœ ๊ทน๋‹จ์ ์ธ ์ ์ˆ˜ ๊ฐ’์ด ๊ฒฐ๊ณผ๋ฅผ ์™œ๊ณกํ•˜์ง€ ๋ชปํ•จ

4. ์กฐ์ • ๋ถˆํ•„์š”

  • ํŒŒ๋ผ๋ฏธํ„ฐ ํŠœ๋‹์ด ๊ฑฐ์˜ ํ•„์š” ์—†์Œ (k=60์ด ์ผ๋ฐ˜์ ์œผ๋กœ ์ž˜ ์ž‘๋™)
  • ์„œ๋กœ ๋‹ค๋ฅธ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์˜ ์ ์ˆ˜๊ฐ€ ๊ด€๋ จ์ด ์—†์–ด๋„ ๊ณ ํ’ˆ์งˆ ๊ฒฐ๊ณผ ์ƒ์„ฑ ๊ฐ€๋Šฅ

Python ๊ตฌํ˜„ ์˜ˆ์‹œ

def reciprocal_rank_fusion(search_results_list, k=60):
    """
    RRF ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๊ตฌํ˜„
 
    Args:
        search_results_list: ๊ฐ ๊ฒ€์ƒ‰ ์‹œ์Šคํ…œ์˜ ๊ฒฐ๊ณผ ๋ฆฌ์ŠคํŠธ
                            (๋ฌธ์„œ ID์˜ ๋ฆฌ์ŠคํŠธ๋“ค์„ ๋‹ด์€ ๋ฆฌ์ŠคํŠธ)
        k: ์ˆœ์œ„ ์ƒ์ˆ˜ (๊ธฐ๋ณธ๊ฐ’ 60)
 
    Returns:
        (๋ฌธ์„œ ID, RRF ์ ์ˆ˜) ํŠœํ”Œ์˜ ์ •๋ ฌ๋œ ๋ฆฌ์ŠคํŠธ
    """
    # ๋ชจ๋“  ๊ณ ์œ  ๋ฌธ์„œ ์ˆ˜์ง‘
    all_docs = set()
    for results in search_results_list:
        all_docs.update(results)
 
    # ๊ฐ ๋ฌธ์„œ์˜ RRF ์ ์ˆ˜ ๊ณ„์‚ฐ
    doc_scores = {}
    for doc in all_docs:
        score = 0.0
        for results in search_results_list:
            if doc in results:
                rank = results.index(doc) + 1  # 1๋ถ€ํ„ฐ ์‹œ์ž‘
                score += 1.0 / (k + rank)
        doc_scores[doc] = score
 
    # ์ ์ˆ˜ ๊ธฐ์ค€ ๋‚ด๋ฆผ์ฐจ์ˆœ ์ •๋ ฌ
    return sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)
 
# ์‚ฌ์šฉ ์˜ˆ์‹œ
keyword_results = ['doc_a', 'doc_b', 'doc_c']
vector_results = ['doc_c', 'doc_a', 'doc_d']
 
fused_results = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused_results)
# ์ถœ๋ ฅ: [('doc_a', 0.0325), ('doc_c', 0.0323), ('doc_b', 0.0161), ('doc_d', 0.0159)]

์ฐธ๊ณ  ์ž๋ฃŒ

Footnotes

  1. Cormack, G. V., Clarke, C. L., & Buettcher, S. (2009). โ€œReciprocal rank fusion outperforms condorcet and individual rank learning methodsโ€. SIGIR โ€˜09: Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval. โ†ฉ