Quite a lot to cover here! In addition to the typical RAG pipeline, we have many other signals that help retrieval: learning from user feedback, time-based weighting, metadata handling, weighting between title and content, and custom deep learning models that run at inference and indexing time. But this is all part of the RAG component.
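To make the signal mixing concrete, here is a minimal sketch of how user feedback, time-based decay, and title-versus-content weighting could fold into one retrieval score. All weights, field names, and the half-life are illustrative assumptions, not the product's actual values.

```python
def recency_weight(doc_timestamp: float, now: float,
                   half_life_days: float = 90.0) -> float:
    """Exponential time-based decay: newer documents score higher.
    The 90-day half-life is an assumed, illustrative constant."""
    age_days = (now - doc_timestamp) / 86400.0
    return 0.5 ** (age_days / half_life_days)

def combined_score(doc: dict, query_sim_title: float,
                   query_sim_body: float, now: float) -> float:
    # Weight title matches more heavily than body matches (illustrative 2:1).
    semantic = 0.66 * query_sim_title + 0.34 * query_sim_body
    # Fold in a learned user-feedback boost (e.g. from click-through), if any.
    feedback = doc.get("feedback_boost", 1.0)
    return semantic * feedback * recency_weight(doc["timestamp"], now)
```

In a real system the similarity inputs would come from the embedding and lexical retrievers, and the feedback boost from a trained model rather than a stored scalar.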
The agent part is the loop of running the LLM over the RAG system and letting it decide which questions it wants to explore further (some similarities to retry|refuse|respond, I guess?). We also have the model do CoT over its own results, including over the sub-questions it generates.
Essentially it is the deep research paradigm with some more parallelism and a document index backing it.
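The loop described above can be sketched roughly as follows. `llm` and `rag_search` are placeholders for a real model call and index search; the prompt shapes and the `subquestions`/`answer` fields are assumptions about the control flow, not the actual implementation.

```python
from typing import Callable

def deep_research(question: str,
                  llm: Callable[[str], dict],
                  rag_search: Callable[[str], list[str]],
                  max_depth: int = 3) -> str:
    """One branch of the research loop; sibling sub-questions could run in parallel."""
    passages = rag_search(question)
    # The model reasons (CoT) over the retrieved passages and either answers
    # directly or emits follow-up sub-questions it wants to explore.
    step = llm(f"Question: {question}\nEvidence: {passages}")
    if max_depth == 0 or not step.get("subquestions"):
        return step["answer"]
    findings = [deep_research(sq, llm, rag_search, max_depth - 1)
                for sq in step["subquestions"]]  # parallelizable fan-out
    # Final CoT pass over its own intermediate results.
    return llm(f"Synthesize: {question}\nFindings: {findings}")["answer"]
```

The parallelism mentioned above would replace the list comprehension with concurrent calls over the sub-questions.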
How does the agent traverse the information? There are index-free approaches where the LLM has to use each tool's own search, but these give worse results than approaches that build a coherent index across sources. We use the latter approach: search runs over our index, which is a central place for all the knowledge across all connected tools.
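The central-index idea can be illustrated with a toy version: content from each connected tool is normalized into one store, so the agent issues a single search instead of calling every tool's API. The connector names are illustrative, and the term-overlap scoring is a stand-in for real lexical plus vector retrieval.

```python
class CentralIndex:
    """Toy unified index over documents ingested from multiple tools."""

    def __init__(self) -> None:
        self.docs: list[dict] = []

    def ingest(self, source: str, doc_id: str, text: str) -> None:
        self.docs.append({"source": source, "id": doc_id, "text": text})

    def search(self, query: str, k: int = 5) -> list[dict]:
        # Naive term overlap as a placeholder for real retrieval.
        terms = set(query.lower().split())
        scored = [(len(terms & set(d["text"].lower().split())), d)
                  for d in self.docs]
        return [d for s, d in sorted(scored, key=lambda x: -x[0]) if s > 0][:k]

index = CentralIndex()
index.ingest("slack", "msg-1", "deploy failed on staging cluster")
index.ingest("confluence", "page-9", "staging cluster runbook and deploy steps")
hits = index.search("staging deploy")
```

One query now spans both sources, which is the property the index-free, per-tool approach lacks.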
Do you have any internal evals on how different models affect the overall quality of output, especially for a "deep search" type of task? I have model-picker fatigue. — Yes, we have datasets that we use internally. They consist of "company-type" data rather than "web-type" data (short Slack messages, very technical design documents, etc.), comprising about 10K documents and 500 questions.
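A hypothetical shape for such an eval harness: a list of question/reference pairs, the system under test, and a judge. The field names and the judge are assumptions; the source only states the scale (~10K documents, 500 questions).

```python
from typing import Callable

def evaluate(dataset: list[dict],
             answer_fn: Callable[[str], str],
             judge_fn: Callable[[str, str, str], bool]) -> float:
    """Run the system on every question; return the fraction judged correct."""
    correct = sum(
        judge_fn(item["question"], item["expected"], answer_fn(item["question"]))
        for item in dataset
    )
    return correct / len(dataset)
```

In practice `judge_fn` would likely be an LLM-as-judge comparing the system answer to the reference; exact match stands in here. Swapping `answer_fn` across models is what makes per-model comparisons cheap.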
As for which model to use: it was developed primarily against gpt-4o, but we retuned the prompts to work with all the recent models like Claude 3.5, Gemini, DeepSeek, etc.
Do you plan to implement knowledge graphs in the future? Yes! We're looking into customized LLM-based knowledge graphs along the lines of LightGraphRAG (inspired by it, but not the same).
Do you think this indexing architecture would bring benefits to general web research, if implemented as: planner, searches, index webpages in chunks, search in index, response?
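The proposed web-research variant, sketched as a pipeline. Every function here is a placeholder wired by the caller; the source only names the five stages.

```python
class ChunkIndex:
    """Toy local index over page chunks built during one research run."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def add(self, chunk: str) -> None:
        self.chunks.append(chunk)

    def search(self, query: str, k: int = 3) -> list[str]:
        terms = set(query.lower().split())
        ranked = sorted(self.chunks,
                        key=lambda c: -len(terms & set(c.lower().split())))
        return ranked[:k]

def web_research(question, plan, web_search, chunk, respond):
    index = ChunkIndex()
    for query in plan(question):          # 1. planner expands the question
        for page in web_search(query):    # 2. run web searches
            for c in chunk(page):         # 3. index webpages in chunks
                index.add(c)
    evidence = index.search(question)     # 4. search in the local index
    return respond(question, evidence)    # 5. compose the response
```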
Would you ever extend your app to search the web, or specialized databases for law, finance, science, etc.?