Embedchain is an open-source Python framework designed to make it easy to build RAG (Retrieval-Augmented Generation) applications.

Overview

Embedchain follows a "Conventional but Configurable" design principle and simplifies the process of connecting unstructured data to LLMs1. It automatically handles the entire pipeline, from data loading and chunking through embedding generation and storage in a vector database.
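To illustrate the "conventional but configurable" idea in general terms, here is a minimal sketch of layering user overrides on top of sensible defaults, so that a user only states what differs. The DEFAULTS values and the merge_config helper are hypothetical and are not Embedchain's actual defaults or internals.

```python
from copy import deepcopy
from typing import Any

# Hypothetical defaults for illustration only; NOT Embedchain's real defaults.
DEFAULTS: dict[str, Any] = {
    "llm": {"provider": "openai", "config": {"temperature": 0.0}},
    "chunker": {"chunk_size": 2000, "chunk_overlap": 0},
}


def merge_config(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Recursively overlay user overrides on defaults without mutating either input."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge one level deeper.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and new keys supplied by the user win outright.
            merged[key] = deepcopy(value)
    return merged


# The user changes only the chunk size; everything else keeps its default.
config = merge_config(DEFAULTS, {"chunker": {"chunk_size": 500}})
print(config["chunker"])  # {'chunk_size': 500, 'chunk_overlap': 0}
```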

Key Features

Simple API

You can build an AI application in just four lines of code:

from embedchain import App
 
app = App()
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
app.query("How many companies does Elon Musk run?")

Support for Diverse Data Sources

  • Web pages (URLs)
  • YouTube videos
  • PDF documents
  • CSV and JSON files
  • Notion pages
  • Markdown and DOCX files
  • Custom text

Broad Integration Support

LLMs (large language models)2:

  • OpenAI (GPT-3.5, GPT-4, etc.)
  • Azure OpenAI
  • Anthropic Claude
  • Hugging Face models
  • Llama2
  • Cohere

Vector databases2:

  • ChromaDB
  • Elasticsearch
  • OpenSearch
  • Pinecone
  • Weaviate
  • Qdrant
  • Zilliz
  • LanceDB

Embedding models:

  • OpenAI embeddings (text-embedding-3-small, text-embedding-3-large)
  • Google AI
  • AWS Bedrock
  • Hugging Face Sentence Transformers
  • GPT4All (CPU-optimized)
  • Google VertexAI
  • NVIDIA AI Foundation

์ž‘๋™ ์›๋ฆฌ

Embedchain์˜ RAG ํŒŒ์ดํ”„๋ผ์ธ์€ ๋‹ค์Œ ๋‹จ๊ณ„๋กœ ๊ตฌ์„ฑ๋œ๋‹ค:

  1. ๋ฐ์ดํ„ฐ ๋กœ๋”ฉ: ๋‹ค์–‘ํ•œ ์†Œ์Šค์—์„œ ์ž๋™์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ์ธ์‹ํ•˜๊ณ  ๋กœ๋“œ
  2. ์ฒญํ‚น: ๋ฐ์ดํ„ฐ๋ฅผ ์ ์ ˆํ•œ ํฌ๊ธฐ์˜ ์ฒญํฌ๋กœ ๋ถ„ํ• 
  3. ์ž„๋ฒ ๋”ฉ ์ƒ์„ฑ: ๊ฐ ์ฒญํฌ๋ฅผ ๋ฒกํ„ฐ ์ž„๋ฒ ๋”ฉ์œผ๋กœ ๋ณ€ํ™˜
  4. ๋ฒกํ„ฐ ์ €์žฅ: ์ƒ์„ฑ๋œ ์ž„๋ฒ ๋”ฉ์„ ๋ฒกํ„ฐ ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์— ์ €์žฅ
  5. ์ฟผ๋ฆฌ ์ฒ˜๋ฆฌ:
    • ์‚ฌ์šฉ์ž ์งˆ๋ฌธ์„ ์ž„๋ฒ ๋”ฉ์œผ๋กœ ๋ณ€ํ™˜
    • ๊ด€๋ จ ๋ฌธ์„œ ๊ฒ€์ƒ‰
    • LLM์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ต๋ณ€ ์ƒ์„ฑ

Installation and Usage

Installation

pip install embedchain

Python versions 3.9 through 3.13 are supported.
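Before installing, you can check that the running interpreter falls inside that range. This is a small sketch; the bounds are copied from the sentence above, and the Requires-Python metadata of the exact Embedchain release you install should be treated as authoritative.

```python
import sys
from typing import Optional, Sequence

# Range taken from the statement above (3.9 through 3.13); confirm it against
# the Requires-Python metadata of the Embedchain release you install.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 13)


def is_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if the (major, minor) version falls inside the supported range."""
    major_minor = tuple((version if version is not None else sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported():
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} may not be supported")
```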

Basic Usage Example

from embedchain import App

# Initialize the app (the default configuration uses OpenAI models,
# so set the OPENAI_API_KEY environment variable first)
app = App()

# Add data sources
app.add("https://en.wikipedia.org/wiki/Elon_Musk")
app.add("https://www.forbes.com/profile/elon-musk")

# Ask a question; query() returns the generated answer
response = app.query("How many companies does Elon Musk run and name those?")
print(response)

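Integration with LangChain

Embedchain can be used as a LangChain retriever through the EmbedchainRetriever class in langchain_community:

from langchain_community.retrievers import EmbedchainRetriever

# Create a retriever with the default Embedchain configuration
retriever = EmbedchainRetriever.create()

# Add data sources (Embedchain loads and indexes each URL)
retriever.add_texts([
    "https://en.wikipedia.org/wiki/Elon_Musk",
    "https://www.youtube.com/watch?v=RcYjXbSJBN8"
])

# Retrieve relevant documents (returns a list of LangChain Document objects)
result = retriever.invoke("How many companies does Elon Musk run?")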
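The retriever returns documents rather than a finished answer, so a LangChain chain usually joins them into one context string for a prompt. The sketch below uses a minimal Doc dataclass as a stand-in for LangChain's Document (which exposes page_content and metadata) so it runs without LangChain installed. The format_docs helper is hypothetical, and the "source" metadata key is an assumption; inspect what your version of the retriever actually returns.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def format_docs(docs: list) -> str:
    """Join retrieved documents into one numbered context string, citing sources."""
    parts = []
    for i, doc in enumerate(docs, start=1):
        # Assumed metadata key; fall back gracefully when it is missing.
        source = doc.metadata.get("source", "unknown source")
        parts.append(f"[{i}] ({source}) {doc.page_content}")
    return "\n\n".join(parts)


docs = [
    Doc("First retrieved chunk.", {"source": "https://en.wikipedia.org/wiki/Elon_Musk"}),
    Doc("Second retrieved chunk."),
]
context = format_docs(docs)
print(context)
```

Because real Document objects expose the same two attributes, format_docs(result) should work on the retriever output as well.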

Advantages

  • Abstraction: wraps the complex RAG pipeline in a simple API
  • Automation: detects and processes data types automatically
  • Flexibility: supports a wide range of LLMs, vector databases, and embedding models
  • Extensibility: customizable through YAML configuration
  • Productivity: enables rapid prototyping and development

Comparison with LangChain

Embedchain is a wrapper built on top of LangChain that provides a higher level of abstraction3. This makes it simpler to use, but it may offer less fine-grained control than LangChain.

Target Users

  • Data scientists
  • Machine learning engineers
  • Software engineers
  • AI developers
  • Independent developers and hobbyists

References

Footnotes

  1. Embedchain official documentation - Introduction

  2. Analytics Vidhya - Introduction to Embedchain: A Data Platform Tailored for LLMs

  3. Getting Started AI - What is the difference between Embedchain and LangChain