Overview

OpenSearch Neural Sparse Search is a search feature based on learned sparse retrieval, introduced in OpenSearch 2.11 [1]. It performs semantic search with sparse vectors and runs on an inverted index, so it offers efficiency close to BM25 while achieving higher retrieval accuracy.

Neural Sparse Search is part of the OpenSearch Neural Search plugin. It avoids the high memory usage and compute cost of dense vector search (HNSW) while outperforming traditional lexical search.

Key features

Efficient search

  • Inverted-index based: uses the same index structure as BM25
  • Low memory footprint: 7.2% to 10.4% of the size of a dense vector index
  • No extra RAM cost at search time: uses native Lucene indexes

High retrieval accuracy

  • NDCG@10 improvement: 12.7% (doc-only) to 20% (bi-encoder) over traditional methods
  • Semantic matching: term expansion retrieves synonyms and related terms
  • Interpretability: you can see which terms contributed to a match

How it works

1. Sparse vector generation

Neural Sparse Search converts text into a sparse vector (a list of token: weight pairs):

Input text: "OpenSearch vector search"

↓ Sparse encoding model

Sparse vector:
{
  "opensearch": 0.85,
  "vector": 0.72,
  "search": 0.68,
  "semantic": 0.32,  // term expansion
  "retrieval": 0.28   // term expansion
}
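To make the token: weight representation concrete, the following is a minimal sketch of how a query vector could be scored against the document vector above. It assumes the score is a dot product over the tokens the two vectors share; the function name sparse_dot and the query weights are hypothetical and only serve the illustration.

```python
def sparse_dot(query_vec: dict, doc_vec: dict) -> float:
    """Sum the products of weights for tokens present in both vectors."""
    # Iterate over the smaller vector so fewer lookups are needed.
    small, large = sorted((query_vec, doc_vec), key=len)
    return sum(weight * large[token] for token, weight in small.items() if token in large)


# Document vector copied from the example above.
doc_vec = {
    "opensearch": 0.85,
    "vector": 0.72,
    "search": 0.68,
    "semantic": 0.32,   # term expansion
    "retrieval": 0.28,  # term expansion
}

# Hypothetical query weights, chosen only for this illustration.
query_vec = {"semantic": 1.0, "search": 1.0, "cooking": 1.0}

print(sparse_dot(query_vec, doc_vec))
```

The query never mentions "semantic" in the original input text, yet it still matches because term expansion added that token to the document vector. Tokens with no counterpart, such as "cooking", contribute nothing.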

2. Indexing

Store the sparse vector in a field of type rank_features:

{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "content_embedding": { "type": "rank_features" }
    }
  }
}

Generate the embeddings automatically through an ingest pipeline:

{
  "sparse_encoding": {
    "field_map": {
      "content": "content_embedding"
    },
    "model_id": "<model_id>"
  }
}

3. Search

Search with a neural_sparse query:

{
  "query": {
    "neural_sparse": {
      "content_embedding": {
        "query_text": "vector database search",
        "model_id": "<model_id>",
        "max_token_score": 3.5
      }
    }
  }
}

Operating modes

OpenSearch Neural Sparse Search offers two operating modes [2].

Doc-only mode (default)

Characteristics:

  • Only documents are processed by the neural encoder
  • Queries are analyzed by a tokenizer (using a DL analyzer)
  • Removing the online inference step greatly reduces latency

Advantages:

  • Fast search
  • Low compute cost
  • Suitable for most use cases

Disadvantages:

  • Slightly lower retrieval accuracy than bi-encoder mode

Bi-encoder mode

Characteristics:

  • Both documents and queries are processed by the neural encoder
  • Semantic matching on both the document and the query side

Advantages:

  • High retrieval accuracy (20% NDCG@10 improvement)
  • Richer semantic representation

Disadvantages:

  • Every query requires model inference, which adds latency
  • Higher compute cost

Pretrained models

OpenSearch publishes official sparse encoding models on Hugging Face.

v1 model

opensearch-neural-sparse-encoding-v1 [3]

  • Architecture: BERT base (12-layer transformer)
  • Parameters: 133M
  • Output: 30,522-dimensional sparse vector (the size of the BERT vocabulary)
  • Training data: MS MARCO
  • Performance: average NDCG@10 of 0.524

v2 models

The v2 series uses distillation to improve both accuracy and efficiency [4].

opensearch-neural-sparse-encoding-v2-distill

  • Architecture: DistilBERT base
  • Parameters: 67M (50% fewer than v1)
  • Output: 30,522-dimensional sparse vector
  • Training data: 14 datasets, including MS MARCO, WikiAnswers, SQuAD, and Yahoo Answers
  • Performance: average NDCG@10 of 0.528 (an improvement over v1)
  • Efficiency:
    • GPU throughput: 1.39x
    • CPU throughput: 1.74x

opensearch-neural-sparse-encoding-doc-v2-mini

  • Architecture: MiniLM base
  • Parameters: 33M (75% fewer than v1)
  • Intended use: doc-only mode only
  • Efficiency:
    • GPU throughput: 1.74x
    • CPU throughput: 4.18x

multilingual-v1 model (multilingual support)

opensearch-neural-sparse-encoding-multilingual-v1 [5]

The first multilingual neural sparse retrieval model, released with OpenSearch v3.

  • Architecture: multilingual transformer
  • Parameters: 160M
  • Output: 105,879-dimensional sparse vector (multilingual vocabulary)
  • Performance:
    • Average NDCG@10: 0.629 (higher than v2)
    • Average FLOPS: 1.3
    • Average embedding size: 138
  • Supported languages (15):
    • Arabic, Bengali, Chinese, English, Finnish, French
    • Hindi, Indonesian, Japanese, Korean, Persian
    • Russian, Spanish, Swahili, Telugu
  • Training technique: distillation from GTE and LLM teacher models
  • Compared with BM25: large gains in every language

Important constraints:

  • Token limit: only the first 512 tokens of a multilingual document are processed (8,192 tokens for the English-only models)
  • Long Korean documents need to be split into chunks (chunking)
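The chunking step mentioned above can be sketched as a sliding window over tokens. This is a minimal illustration, not the model's own preprocessing: it assumes whitespace splitting as a stand-in for the real tokenizer (which produces subword tokens and will count differently), and the function name chunk_tokens and the overlap value are hypothetical.

```python
def chunk_tokens(tokens: list, max_tokens: int = 512, overlap: int = 64) -> list:
    """Split a token list into windows of at most max_tokens, sharing `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once the current window already reaches the end of the document.
        if start + max_tokens >= len(tokens):
            break
    return chunks


# Whitespace splitting is only a placeholder for the model tokenizer (assumption).
text = " ".join(f"word{i}" for i in range(1000))
chunks = chunk_tokens(text.split())
print(len(chunks), [len(c) for c in chunks])
```

Each chunk would then be indexed as its own document or nested passage. The overlap keeps sentences that straddle a boundary visible in both neighboring chunks, at the cost of some duplicated index content.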

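Model selection guide

| Model | Parameters | Languages | Use case | Accuracy | Efficiency |
|---|---|---|---|---|---|
| v1 | 133M | English | General purpose | Baseline | Baseline |
| v2-distill | 67M | English | General purpose | Improved | 1.4~1.7x |
| v2-mini | 33M | English | Doc-only | Improved | 1.7~4.2x |
| multilingual-v1 | 160M | 15 languages | Multilingual | Highest | Medium |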

Recommendations:

  • English only:
    • Most cases: v2-distill (balances accuracy and efficiency)
    • CPU environments: v2-mini (more than 4x faster on CPU)
    • High accuracy required: v2-distill with bi-encoder mode
  • Korean or multilingual: multilingual-v1 (officially supports Korean)

Two-phase algorithm (OpenSearch 2.15+)

Starting with OpenSearch 2.15, a two-phase search algorithm can substantially speed up search [6].

How it works

Query tokens are split into two groups:

  1. High-scoring tokens: tokens with high relevance to the search

    • Score and filter against all documents
    • Select the top-k documents
  2. Low-scoring tokens: tokens with low relevance to the search

    • Rescore only the top-k documents
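The two phases can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not the OpenSearch implementation: the split threshold is taken as a fixed fraction of the largest query weight, and the names split_tokens, two_phase_search, ratio, and window are hypothetical.

```python
import heapq


def score(query_vec: dict, doc_vec: dict) -> float:
    """Dot product over the tokens shared by query and document."""
    return sum(w * doc_vec.get(t, 0.0) for t, w in query_vec.items())


def split_tokens(query_vec: dict, ratio: float = 0.4):
    """Split query tokens into high- and low-scoring groups (ratio is illustrative)."""
    if not query_vec:
        return {}, {}
    threshold = max(query_vec.values()) * ratio
    high = {t: w for t, w in query_vec.items() if w >= threshold}
    low = {t: w for t, w in query_vec.items() if w < threshold}
    return high, low


def two_phase_search(query_vec: dict, docs: dict, k: int = 2, window: int = 3):
    high, low = split_tokens(query_vec)
    # Phase 1: score every document with the high-scoring tokens only,
    # then keep a window of the best candidates.
    phase1 = {doc_id: score(high, vec) for doc_id, vec in docs.items()}
    candidates = heapq.nlargest(window, phase1, key=phase1.get)
    # Phase 2: add the low-scoring tokens, but only for the candidates.
    rescored = {d: phase1[d] + score(low, docs[d]) for d in candidates}
    return sorted(rescored.items(), key=lambda kv: kv[1], reverse=True)[:k]


docs = {
    "a": {"vector": 0.7, "search": 0.6},
    "b": {"vector": 0.6, "search": 0.6, "semantic": 0.9},
    "c": {"cooking": 1.0},
}
query_vec = {"vector": 1.0, "search": 0.9, "semantic": 0.2}
print(two_phase_search(query_vec, docs))
```

In this toy data, document "a" leads after phase 1, and "b" overtakes it once the low-scoring token "semantic" is added in phase 2. The saving comes from phase 2 touching only the candidate window instead of every document.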

Performance gains

Doc-only mode:

  • Speedup: 1.22x to 1.78x

Bi-encoder mode:

  • Speedup: 4.15x to 6.87x (the larger gain)

Setup

Create a search pipeline (for example, with PUT /_search/pipeline/two_phase_search_pipeline):

{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse-two-phase",
        "description": "Two-phase neural sparse processor",
        "enabled": true
      }
    }
  ]
}

Use the pipeline at query time:

{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "what is a Manhattan Project",
        "model_id": "<model_id>"
      }
    }
  },
  "search_pipeline": "two_phase_search_pipeline"
}

Performance comparison

Retrieval accuracy

| Method | NDCG@10 (relative to BM25) | Improvement |
|---|---|---|
| BM25 (baseline) | 1.00x | - |
| Neural Sparse (doc-only) | 1.127x | +12.7% |
| Neural Sparse (bi-encoder) | 1.200x | +20.0% |
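For readers unfamiliar with the metric, NDCG@10 can be computed as in the sketch below. It assumes the linear-gain variant (relevance divided by a log2 rank discount) and builds the ideal ordering from the judged list itself; benchmark toolkits may use an exponential gain or a full set of relevance judgments, so treat this as an illustration rather than the exact evaluation behind the figures above.

```python
import math


def dcg_at_k(relevances: list, k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: list, k: int = 10) -> float:
    """DCG normalized by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance grades of the returned documents, in ranked order (made-up example).
print(ndcg_at_k([3, 2, 1]))  # already in ideal order
print(ndcg_at_k([1, 3, 2]))  # best document is ranked second
```

A relative value such as 1.127x in the table means the method's average NDCG@10 is 12.7% higher than the BM25 average on the same queries.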

Resource usage

| Metric | Dense vector | Neural sparse |
|---|---|---|
| Index size | 100% | 7.2~10.4% |
| RAM increase at search time | +7.9% | 0% |
| Search speed | Slow (k-NN) | Fast (inverted index) |

Effect of the Lucene engine upgrade

The Lucene engine upgrade in OpenSearch 2.12 substantially improved neural sparse search performance:

  • Higher throughput
  • Lower latency

Usage

1. Register and deploy the model

POST /_plugins/_ml/models/_register
{
  "name": "opensearch-neural-sparse-encoding-v2-distill",
  "version": "1.0.0",
  "model_format": "TORCH_SCRIPT",
  "function_name": "SPARSE_ENCODING"
}

Deploy the model:

POST /_plugins/_ml/models/<model_id>/_deploy

2. Create an ingest pipeline

PUT /_ingest/pipeline/neural-sparse-pipeline
{
  "description": "Neural sparse encoding pipeline",
  "processors": [
    {
      "sparse_encoding": {
        "model_id": "<model_id>",
        "field_map": {
          "passage_text": "passage_embedding"
        }
      }
    }
  ]
}

3. Create the index and mapping

PUT /my_index
{
  "settings": {
    "index.default_pipeline": "neural-sparse-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_text": { "type": "text" },
      "passage_embedding": { "type": "rank_features" }
    }
  }
}

4. Index a document

POST /my_index/_doc
{
  "passage_text": "OpenSearch provides neural sparse search capabilities for efficient semantic retrieval."
}

5. Search

GET /my_index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_text": "semantic search in OpenSearch",
        "model_id": "<model_id>",
        "max_token_score": 3.5
      }
    }
  }
}
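When the search request is issued from application code, the body can be assembled programmatically. The sketch below only builds the JSON shown in step 5 with the standard library; the helper name neural_sparse_query is hypothetical, and sending the request (with whichever HTTP or OpenSearch client you use) is left out on purpose.

```python
import json
from typing import Optional


def neural_sparse_query(field: str, query_text: str, model_id: str,
                        max_token_score: Optional[float] = None) -> dict:
    """Build a search body shaped like the neural_sparse request in step 5."""
    params = {"query_text": query_text, "model_id": model_id}
    # max_token_score is only included when the caller sets it.
    if max_token_score is not None:
        params["max_token_score"] = max_token_score
    return {"query": {"neural_sparse": {field: params}}}


body = neural_sparse_query(
    field="passage_embedding",
    query_text="semantic search in OpenSearch",
    model_id="<model_id>",  # placeholder, as in the requests above
    max_token_score=3.5,
)
print(json.dumps(body, indent=2))
```

The printed JSON matches the body sent to GET /my_index/_search in step 5.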

Hybrid search

Neural Sparse Search can be combined with other retrieval methods for better results.

BM25 + Neural Sparse

{
  "query": {
    "hybrid": {
      "queries": [
        {
          "match": {
            "passage_text": "vector search"
          }
        },
        {
          "neural_sparse": {
            "passage_embedding": {
              "query_text": "vector search",
              "model_id": "<model_id>"
            }
          }
        }
      ]
    }
  }
}
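BM25 scores and neural sparse scores live on different scales, so a hybrid result needs some form of score normalization before the two lists are merged. The sketch below shows one common approach, assuming min-max normalization followed by a weighted arithmetic mean; the function names, weights, and sample scores are hypothetical, and the combination actually applied depends on how the cluster's search pipeline is configured.

```python
def min_max(scores: dict) -> dict:
    """Rescale scores to the range [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def combine(bm25: dict, sparse: dict, weights=(0.5, 0.5)) -> list:
    """Weighted mean of normalized scores; a missing document counts as 0."""
    norm_bm25, norm_sparse = min_max(bm25), min_max(sparse)
    doc_ids = set(norm_bm25) | set(norm_sparse)
    combined = {
        d: weights[0] * norm_bm25.get(d, 0.0) + weights[1] * norm_sparse.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Made-up raw scores from the two sub-queries.
bm25_scores = {"a": 12.0, "b": 8.0, "c": 4.0}
sparse_scores = {"b": 3.0, "c": 1.0, "d": 2.0}
print(combine(bm25_scores, sparse_scores))
```

Document "b" ranks first because it scores well in both lists, even though it tops only one of them.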

Dense + Sparse Vectors

A RAG system that combines dense and sparse vectors [7]:

  1. First-stage retrieval: filter candidates with neural sparse search
  2. Second-stage retrieval: precise matching with dense vectors
  3. Rank fusion: merge the result lists with RRF
  4. Re-ranking: decide the final order with a cross-encoder
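The rank fusion in step 3 can be sketched with reciprocal rank fusion (RRF), which scores each document by the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the conventional default and is an assumption here, as are the function name rrf and the sample rankings.

```python
from collections import defaultdict


def rrf(rankings: list, k: int = 60) -> list:
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top document contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists from the sparse and dense retrieval stages.
sparse_ranking = ["d1", "d2", "d3"]
dense_ranking = ["d3", "d1", "d4"]
print(rrf([sparse_ranking, dense_ranking]))
```

RRF uses only rank positions, so it needs no score normalization between the sparse and dense retrievers. The fused list would then be passed to the cross-encoder in step 4.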

Use cases

Enterprise search

  • Internal document search: company documents, wikis, knowledge bases
  • Code search: source code, API documentation
  • Email search: meaning-based email search

E-commerce

  • Product search: handles synonyms and similar product names
  • Recommendations: related product recommendations
  • Q&A: automatic matching of product inquiries

RAG (Retrieval-Augmented Generation)

  • Document retrieval: find relevant documents to pass to the LLM
  • Fact checking: provide accurate source documents
  • Context expansion: automatically add related information

Customer support

  • FAQ matching: automatically find similar questions
  • Ticket routing: automatically route tickets to the relevant department
  • Solution suggestions: recommend fixes based on past issues

Best practices

Choosing a mode

When to use doc-only mode:

  • Most production environments (the default recommendation)
  • When low latency matters
  • When CPU resources are limited
  • Together with the DL (deep learning) analyzer

When to use bi-encoder mode:

  • When retrieval accuracy is the top priority
  • When query volume is low
  • When enough GPU/CPU resources are available

Performance tuning

  1. Use the two-phase algorithm: on OpenSearch 2.15+
  2. Choose an appropriate model: consider v2-mini in CPU environments
  3. Tune max_token_score: adjust the threshold to the characteristics of your queries
  4. Hybrid search: combine with BM25 for robustness

Limitations

  • Language support:
    • v1 and v2 models: English only
    • multilingual-v1: 15 languages (including Korean)
    • Token limit of the multilingual model: 512 tokens (8,192 tokens for the English-only models)
  • Model size: larger models use more memory
  • Training data: fine-tuning is needed to improve domain-specific performance

Version history

OpenSearch 2.11 (2023)

  • Neural Sparse Search first introduced
  • Doc-only and bi-encoder modes supported
  • v1 model released

OpenSearch 2.13 (2024)

  • Neural Sparse Search Tool introduced
  • Agent-based workflows supported

OpenSearch 2.15 (2024)

  • Two-phase algorithm introduced
  • Boolean compound queries supported
  • 4x to 7x speedup (bi-encoder mode)

OpenSearch 3.0 (2024-2025)

  • multilingual-v1 model released
  • 15 languages officially supported (including Korean)
  • Improved training technique based on GTE and LLM teacher models
  • Better accuracy than v2 (NDCG@10: 0.629)

Footnotes

  1. Neural sparse search - OpenSearch Documentation

  2. The official OpenSearch documentation presents doc-only mode as the default recommended setup

  3. opensearch-neural-sparse-encoding-v1 - Hugging Face

  4. The v2 models show that distillation from heterogeneous teacher models is more effective than pretraining with InfoNCE loss

  5. opensearch-neural-sparse-encoding-multilingual-v1 - Hugging Face

  6. The neural sparse two-phase processor achieves up to a 9.8x speedup with only a marginal effect on search relevance

  7. Integrate sparse and dense vectors in RAG - AWS Blog