The current prevailing approach to equipping Large Language Models (LLMs) with Retrieval Augmented Generation (RAG) capability is to connect embeddings to a vector store. However, what is near in embedding space and what the user actually needs can be two entirely unrelated things.
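To make that gap concrete, here is a minimal sketch of the retrieval step in this setup; `embed()` and the document list are hypothetical stand-ins, not any particular library's API:

```python
# Minimal sketch of embedding + vector-store retrieval (the prevailing RAG setup).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model (normally an API or model call)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

documents = ["Taipei weather archive 2022",
             "Dictionary entry: 'bank'",
             "How to read a weather forecast"]
store = np.stack([embed(d) for d in documents])  # the "vector store"

query = "How's the weather today?"
scores = store @ embed(query)            # cosine similarity (unit vectors)
best = documents[int(np.argmax(scores))]
# `best` is whatever happens to be *near* the query in embedding space --
# which may have nothing to do with what the user actually needs.
```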
For example, when a user asks, "How's the weather today?", a chain of decisions is required: the current date and time, the user's location (from IP or geolocation), which external tool to query, and so on. Likewise, when a user asks about a single word, the surrounding context is needed to work out what they mean.
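As a sketch of what answering that question actually involves, under assumed helper names (resolve_location and call_tool are placeholders for real GeoIP and weather services, not a real API):

```python
from datetime import datetime, timezone

def resolve_location(ip: str) -> str:
    return "Taipei"  # placeholder for a real GeoIP lookup

def call_tool(name: str, **kwargs) -> str:
    return "22°C, light rain"  # placeholder for a real weather API call

def answer_weather(user_ip: str) -> str:
    now = datetime.now(timezone.utc)      # 1. current date and time
    location = resolve_location(user_ip)  # 2. user location (IP / Geo)
    tool = "weather_api"                  # 3. which external tool to use
    forecast = call_tool(tool, city=location, at=now)
    return f"Weather in {location} at {now:%H:%M} UTC: {forecast}"

print(answer_weather("203.0.113.7"))  # example IP from the documentation range
```

The point is that this is routing plus tool use, not a nearest-neighbor lookup.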
Relying on embeddings and a vector store essentially throws away everything search engines have accumulated, which feels like a detour.
That raises the reverse question: what would a search engine, or contextual judgment, suited to LLMs actually look like? A Knowledge Graph? I'm skeptical.
This echoes classic NLP, which had to fully capture contextual information to solve problems requiring deep understanding and reasoning. If a language model lacks sufficient context (what I often call "scene reconstruction"), or lacks the ability to prompt for, fill in, recognize, and parse the slots in the input, it will struggle; and each domain has its own intents and slots.
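As a rough illustration of that point, here is a sketch of domain-specific intents and slots; the schema, domains, and slot names are assumptions for illustration, not a standard API:

```python
from dataclasses import dataclass, field

@dataclass
class Intent:
    name: str
    required_slots: list[str]
    filled: dict[str, str] = field(default_factory=dict)

    def missing(self) -> list[str]:
        return [s for s in self.required_slots if s not in self.filled]

# The same surface question maps to different intents and slots per domain.
weather = Intent("get_forecast", ["date", "location"])
dictionary = Intent("define_word", ["word", "context"])

weather.filled["date"] = "today"
print(weather.missing())  # ['location'] -> prompt the user, or infer from Geo
```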
I'm also thinking about how to design emotional reassurance and guidance for intents where no sufficiently satisfying answer exists (have GPT imitate a warm, comforting old neighbor?).
My view is that a user's question is rarely settled by a single search.
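If that's right, retrieval looks less like one lookup and more like a loop. A sketch under assumed helpers (search, is_sufficient, reformulate, and synthesize are all placeholders; the judging step could itself be an LLM call):

```python
def search(query: str) -> list[str]:
    return [f"result for: {query}"]   # placeholder retriever: BM25, web, vectors

def is_sufficient(question: str, evidence: list[str]) -> bool:
    return len(evidence) >= 2         # placeholder judgment of coverage

def reformulate(question: str, evidence: list[str]) -> str:
    return question + " (refined)"    # placeholder query rewrite

def synthesize(question: str, evidence: list[str]) -> str:
    return f"Answer to {question!r} from {len(evidence)} snippets"

def answer(question: str, max_rounds: int = 3) -> str:
    query, evidence = question, []
    for _ in range(max_rounds):
        evidence += search(query)                # retrieve
        if is_sufficient(question, evidence):    # judge
            break
        query = reformulate(question, evidence)  # refine and retry
    return synthesize(question, evidence)

print(answer("How's the weather today?"))
```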