RAGFlow Blog

RAGFlow 0.25 — Ingestion pipeline, agent sandbox, and user-level memory

Thu, 23 Apr 2026 00:00:00 GMT

As RAG applications move toward large-scale deployment, the developer's challenge has shifted from "how to build" to "how to govern":

How can complex document parsing be flexibly customized?
How can online agents be iterated without disrupting business continuity?
How can AI truly understand and remember each user's unique context?
How can all of this be seamlessly integrated into existing corporate digital ecosystems?

v0.25 addresses these by focusing on Ingestion Pipeline orchestration and Agent engineering, providing full-stack maturity for enterprise RAG applications—from document parsing to agent delivery.

Ingestion pipeline enhancement

Document parsing and chunking are core to RAG. Previously, RAGFlow's built-in methods (General, Paper, Laws, etc.) covered major scenarios but struggled with specific business needs, such as maintaining table structures in financial reports or hierarchical slicing in legal contracts.

The Ingestion Pipeline, introduced in version 0.21, was created to solve this flexibility issue.

In 0.25, RAGFlow takes this further by adding 7 out-of-the-box pipeline templates that align with built-in strategies. Developers no longer have to choose between ease of use and flexibility: they can replicate official best practices with one click or adjust processing nodes like LEGO blocks.

These templates allow developers to see the logic behind built-in data processing and modify it to meet business requirements. The image below shows a Paper template pipeline, which now offers multi-column detection in the Parser node compared to version 0.21.

The Title Chunker has also been enhanced to support both Group and Hierarchy chunking methods.

Solid data processing foundations enable more imaginative Agents. A comparison of resume parsing illustrates the difference: built-in parsing may result in incomplete segmentation.

During chat, because the built-in method summarizes info rather than retaining the original text, specific details like GPA might be missed, leading to retrieval failure.

Even with a candidate's name, the lack of original text in the built-in method prevents the retrieval of detailed experiences.

In contrast, pipeline-based resume parsing slices content by section while preserving key information.

Since the pipeline defines metadata extraction, it automatically segments key attributes.

During chat, this metadata allows for accurate candidate positioning and retrieval of detailed experiences.

Agent enhancement

Agent publishing support

Iterating on Agents usually involves risk: will a prompt change cause regression? Will a new tool break the flow? Previously, developers had to manually copy and switch agents to avoid downtime.

Now, after orchestrating an agent, you can click the Publish button.

Once published, the current version is locked as the "online version." Subsequent experiments on the canvas (e.g., changing a translation target from English to Spanish) will not affect the live agent's behavior. To access the published version via API, set the release parameter to True.

For embedded pages, simply toggle the Published switch.

As shown below, while the developer changes the canvas to Spanish, the embedded chat remains locked to the published English version until the developer chooses to publish the update.

This "gray-friendly" design aligns agent iteration with business release cycles, allowing for bold experimentation and careful deployment.

Agent sandbox execution and charting

LLMs excel at qualitative analysis but often struggle with quantitative tasks. When asked for sales trends, LLMs can make calculation errors and cannot generate downloadable charts.

Agents can now generate and execute Python code within a sandbox. A new "Data Analytics" template is available to demonstrate this.

The CodeExec tool is mounted under the Agent node.

For example, when asked to analyze a 12-month sales dataset, calculate growth, and perform linear regression, the agent doesn't just calculate the numbers—it uses matplotlib to generate a business-style chart and provides a downloadable PNG.

This allows RAGFlow Agents to produce ready-to-use business assets.

Memory orchestration with user-level isolation

Version 0.25 introduces a User ID dimension to the Memory mechanism, enabling user-level isolation. This can be configured in the Message node on the Agent canvas.

Retrieval nodes can be set to only access memory associated with a specific User ID.

Developers can use the sys.user_id system variable to capture unique identifiers from external systems, turning the Agent into a personalized consultant for every user.

OpenClaw integration

Version 0.25 integrates with OpenClaw. This allows enterprise users to manage files, parsing, and knowledge base retrieval directly within the Feishu (Lark) environment. Documents in Feishu can automatically sync to RAGFlow for deep parsing, and the Feishu chat window can directly call RAGFlow knowledge bases.

For more information on configuration, refer to: RAGFlow x OpenClaw - The Enterprise-aware Claw

Conclusion

RAGFlow 0.25 includes 558 new Pull Requests and contributions from 65 developers. Beyond bug fixes, this version marks a steady step forward in data governance and Agent engineering. Moving forward, RAGFlow will explore the boundaries of context engines to build smarter, more reliable enterprise data infrastructure.

Data foundation in the era of agent harness - why RAGFlow is changing

Wed, 22 Apr 2026 00:00:00 GMT

In our 2025 year-end review, we clearly defined RAGFlow’s product evolution direction as a context engine and explained the technical necessity of this path from the technical underlying layer. Several months have passed, and the evolution speed of the AI ecosystem is restructuring industry perceptions almost on a monthly basis. Whether the vision proposed at that time is still valid today? This article attempts to conduct a phased review and use it to respond to community concerns regarding RAGFlow's subsequent positioning.

Agent harness has undergone extremely dense practical iterations over the past few months. From the modular tools and large-scale application of skills in OpenClaw, to the engineering breakthroughs of Hermes Agent in context management and skills evolution, and the indirect open-sourcing of claw-code which allowed the community to truly touch the engineering core of an agent harness—answering the core proposition of "what a reliable general agent harness should actually look like"—harness design paradigms are being redefined at an unprecedented speed. These iterations together point to a clear trend: the harness is gradually converging from independent solutions into infrastructure.

Recently, with the release of Claude Managed Agents, agent harness has entered a productized and enterprise-level stage. Currently, the components included in an agent harness, such as task orchestration, context engineering, skills systems, and memory operations, are simultaneously presenting two trends. On one hand, their definitions and usage methods are accelerating towards standardization and engineering; on the other hand, some capabilities originally undertaken by the harness are being natively absorbed by LLMs. Anthropic, therefore, abstracts the LLM and harness together as a "Brain"—a stateless decision core responsible for task planning and distribution. The actual capability boundary of an agent is determined by the tools it can call, which are started and executed in code form within a sandbox. This is a crucial architectural abstraction: the Brain remains stateless and can be started or stopped at will, while all persistent states are stored externally (e.g., in session data). The entire ecosystem is thus irreversibly moving toward a clear separation between the state layer and the stateless layer.

This separation trend coincides perfectly with RAGFlow's positioning—supplying data for agents and serving as the agent's data foundation. For enterprise scenarios, an open-source data foundation located at the state layer is particularly critical. It not only provides high-quality, strong-semantic context data for various stateless Brains, but more importantly, it allows enterprises to avoid being bound to a specific vendor's Brain implementation, thereby establishing a unified and independently controllable data infrastructure.

Why the data foundation must, and can only, be built with retrieval capability as its core

Intuitively, there are two technical paths for building an agent data foundation. One is to expand based on various existing databases—this line of thought directly inherits existing experience from big data platforms. Supplementing vector retrieval capabilities on the basis of data lakes and data warehouses seems to logically form the prototype of an "all-powerful foundation". However, this idea precisely ignores the following key facts:

The processing of unstructured documents goes far beyond storage and retrieval. An agent's demand for unstructured data like documents is not just "storing it and finding it." It involves a complex pipeline centered on heterogenous data source access, governance, cleaning, and semantic enhancement. At the same time, there are diverse deep demands in the retrieval process—such as what indexing strategy to use, whether real-time completion and navigation are needed during retrieval, and expanding semantic associations. These logics are highly complex and dynamic, making them difficult to complete elegantly in a closed manner within a database.
The structured data system does not need to be rebuilt. Existing enterprise data platforms and business systems already carry a large amount of structured assets; re-covering them with a new database is neither realistic nor necessary. What an agent truly needs is access interfaces to these systems, along with supporting documentation—including API reference manuals, calling examples, and related skills best practices. Providing rich search capabilities for these document-like metadata is sufficient to help agents efficiently filter out the vast majority of tool options that do not match current intentions during execution, thereby initiating calls with the correct parameters and forms.
The value of conversational memory data is reflected in two dimensions. Conversation data generated during agent operation serves as the source of truth for the harness session history—a complete execution log whose primary demand is reliable storage and governance. On the other hand, it needs to provide memory supply for subsequent agent behavior, where the core demand is efficient retrieval. However, retrieval here is not a simple search of raw session content, but a whole set of control logic decided by the harness regarding "when to read, when to write, what to write, how to compress, how to replay, and how to maintain consistency across sessions." This layer of memory harness orchestration and control, along with the data management pipeline supporting it—processing, abstracting, and indexing raw session data—is essentially no different from the cleaning and semantic enhancement pipeline experienced by document-type data.

In summary, retrieval capability constitutes the capability core of the agent data foundation, but it is not a simple "database with search functions." Instead, it is a whole set of data control planes built around retrieval. From the perspective of harness usage, a simple search command-line interface seems to meet most needs, but supporting this interface is a whole set of data engineering systems far beyond the scope of vector retrieval—from the access and understanding of multimodal data to the selection and tuning of indexing strategies, to intention perception and context enhancement during the retrieval process.

What kind of retrieval does agent harness actually need?

If the original design point of RAG was to provide precise answers for human users, then the agentic era requires the retrieval system to redesign its service paradigm for intelligent agents.

In a sense, RAG itself can be understood as a built-in, simplified version of an agent: humans propose query intentions, the system goes through built-in decision logic—such as deep research, intention recognition, content navigation, etc.—assembles the complete context, and finally hands it over to the LLM to generate an answer. In a typical daily knowledge base task, this type of behavioral interaction usually converges and completes within ten-odd times. However, as RAG technology further sinks, its role is shifting from a human-oriented "Q&A engine" to a "context supply layer" serving the agent framework. Humans no longer directly initiate queries to the knowledge base but entrust the requirement as a whole to the intelligent agent, which independently completes the acquisition and assembly of answers. Under this shift, retrieval behavior will present characteristics completely different from information retrieval research over the past few decades:

Fundamental migration of behavioral patterns:
- The volume of search requests initiated by agents will far exceed that of human users, possibly even spanning two orders of magnitude. Simultaneously, the agent's query trajectory will significantly lengthen, forming continuous, exploratory retrieval sequences.
- Therefore, the focus of system optimization should shift from traditional "query expansion and rewriting" to "execution efficiency optimization" and "diversity assurance with session awareness."
Re-examination of retrieval infrastructure:
- Probability ranking principles, user models, benchmark testing systems, and even the concept of "relevance" itself may need to be redefined and calibrated in agentic scenarios.
- How to robustly deal with the impact of mixed distributions of organic and synthetic queries will become a key technical challenge in the next stage.

	Traditional IR (human user)	Intelligent agent IR (AI user)
User profile	Unskilled searcher, needs help	Skilled researcher, but might "get stuck in a rut"
Core challenge	User intention is unclear → system must "guess"	User can search too much → system must "block"
Key capability	Understanding vague intentions	Predicting behavioral patterns, preventing resource exhaustion

New opportunities for semantic caching and truncation strategies:
- Queries within the same task trajectory are highly similar at the semantic level but have extremely low exact text matching rates, which brings huge room for semantic caching, while traditional caching strategies based on exact hits are almost completely invalid.
- Intelligent agents will not naturally give up when encountering difficulties like humans do; therefore, new truncation and stop strategies need to be designed to prevent the infinite spread of invalid retrieval.

Facing these emerging propositions, RAGFlow's evolution direction also corresponds to them one by one:

Refactor the online API service layer with the Go language. RAGFlow's early use of Python for rapid verification and iteration played an irreplaceable role in the project's success, but its performance bottlenecks have become increasingly prominent with the expansion of call scale. The core service rewritten in Go can calmly handle the two orders of magnitude increase in access for agentic retrieval and fully release the performance potential of the underlying database.
Provide a unified ContextEngine CLI access entry. Intelligent agents only need to use a single search command to seamlessly retrieve all information across heterogenous data sources, greatly reducing the complexity of access and usage on the harness side.
Build-in richer data processing pipelines. The upper limit of retrieval capability essentially depends on the depth and breadth of semantic enhancement during the data writing process. Continuous maintenance and diversified expansion of data pipelines are always the most critical and complex factors determining retrieval quality.

After completing the above evolution, RAGFlow will formally release version 1.0 in a more complete posture. But 1.0 is definitely not an end point. On the contrary, when retrieval users change from humans to intelligent agents, a whole new problem domain has just surfaced: How to define "relevance" from an agent's perspective? How to evaluate the cumulative utility of the retrieval system in hundreds of continuous calls? At what granularity and with what strategy should semantic caching be embedded into the retrieval pipeline? How should truncation and retry mechanisms be designed in coordination with the agent's task planning? These questions cannot find ready-made answers in traditional information retrieval papers; they together point to a fact—agentic retrieval will become one of the core battlefields for industry innovation in the next stage.

RAGFlow is closely monitoring and actively following this frontier. From selecting more efficient index structures to building a retrieval evaluation system oriented toward agent trajectories, to exploring deep integration with open-source harness frameworks, RAGFlow’s evolution direction is always in sync with the data demands of the agentic era. We believe that a true data foundation for intelligent agents will not be a finished product achieved in one go, but an organic system that continuously resonates with the ecosystem and constantly reshapes itself.

Bibliography

RAGFlow x OpenClaw - The Enterprise-aware Claw

Fri, 20 Mar 2026 00:00:00 GMT

As agent technology races forward, OpenClaw has quickly become a favorite among developers, celebrated for its powerful execution framework and versatile action capabilities. It lets agents take on tasks, use tools, and even navigate complex business operations—almost as naturally as a human would. However, when applied to enterprise settings, a key limitation emerges: OpenClaw cannot directly access or process internal private data. This data—often buried across vast numbers of documents, databases, and knowledge repositories—is essential for building truly specialised, expert-level agents.

This is where RAGFlow comes in. As a leading enterprise RAG engine, RAGFlow specialises in deep document understanding and precise information retrieval, transforming scattered proprietary knowledge into agent-ready datasets.

Now that RAGFlow has completed its foundational Skill integration with OpenClaw, any RAGFlow user can easily connect their dataset to the OpenClaw ecosystem. Whether on Feishu, Discord, or other platforms, OpenClaw agents can tap into company-specific knowledge in real time during conversations—making the leap from a general-purpose assistant to a true business expert.

Feature Overview

The currently released version of the RAGFlow skill is a foundational release, focusing on integrating core dataset capabilities. Once this skill is connected, RAGFlow's core services can be invoked directly through OpenClaw. The following features are demonstrated using the Feishu platform.

Dataset operations

Full CRUD (Create, Read, Update, Delete) control over RAGFlow datasets, directly via Feishu:

Basic operations: Create datasets, view details, and retrieve overviews.
Attribute management: Update existing dataset names, parsing methods, and descriptions.

Automated document processing

Upload, parse, and manage RAGFlow documents directly via Feishu:

Multi-file management: Upload and parse single or multiple files in mainstream formats such as PDF, TXT, and DOCX.
Status control: Enable or disable documents, change chunking methods, and rename files.

Semantic search and scope control

Search RAGFlow dataset content directly during conversations with the OpenClaw bot on Feishu:

Multi-dimensional search: Supports cross-dataset search, targeted dataset search, and fine-grained search within specific documents.

The examples above cover the core atomic capabilities of RAGFlow—dataset management, parsing, and retrieval.

The upper limit of an agent's expertise ultimately depends on the quality of the knowledge its developer has cultivated within its dataset. The value of the RAGFlow skill lies in ensuring that this private knowledge can be called upon by the OpenClaw framework with precision and efficiency—giving agents the ability to ground their responses in solid information and operate with genuine business insights.

To help developers quickly build this knowledge-driven architecture, the skill is now available via ClawHub—enabling OpenClaw to acquire a specialised brain in just a few steps.

A quickstart guide

Download the RAGFlow Skill from ClawHub https://clawhub.ai/yingfeng/ragflow-skill.
Obtain your RAGFlow API key and URL:
Go to your RAGFlow personal homepage and click on API.

Configure API Key and URL:
Locate the .env file in the skill root directory and enter the credentials obtained in the previous step:

# RAGFlow service base url
RAGFLOW_API_URL=http://your-ragflow-ip
# RAGFlow personal API Key
RAGFLOW_API_KEY=ragflow-your-api-key-here

Restart Gateway to take effect

After modifying the configuration, restart the OpenClaw Gateway service to complete the initialisation and loading of the skill.

Once the above configuration is complete, RAGFlow's dataset management and retrieval capabilities can be called directly within the OpenClaw framework.

Finale

The foundational integration between OpenClaw and RAGFlow marks a shift for enterprise AI assistants—from general‑purpose conversation to business‑focused operation. OpenClaw handles the execution layer and user interaction, while RAGFlow provides a continuously evolving knowledge foundation.

The current release focuses on standardising integration with core dataset capabilities. Future iterations of the RAGFlow Skills will not only introduce further enhancements but also launch a dedicated ContextEngine. This can be injected into OpenClaw as a System Prompt or directly interface with OpenClaw's ContextEngine API, delivering a robust context layer and data infrastructure for a wide range of agents.

We invite you to follow and star our project as we continue to grow RAGFlow. GitHub: https://github.com/infiniflow/ragflow

RAGFlow online service domain switch

Thu, 19 Mar 2026 00:00:00 GMT

We are excited to announce an important update regarding the RAGFlow online service!

Since the open-source launch of RAGFlow, we have provided a public playground at https://demo.ragflow.io for everyone to experience the power of retrieval-augmented generation. Over the past two years, the platform has gained tremendous traction, and we are deeply grateful for the continuous support and feedback from our community.

As we look toward the future, we are transitioning our online service to a new domain: https://cloud.ragflow.io. This change marks the first step toward our upcoming commercial SaaS offering, which will bring enhanced features, better scalability, and dedicated support.

What Does This Mean for Existing Users?

All user data from demo.ragflow.io has been securely migrated to the new platform. This includes accounts registered via both GitHub SSO and email. However, due to technical constraints during the migration process, users who registered with an email and password will need to reset their passwords before they can log in to the new cloud.ragflow.io site.

We sincerely apologize for this inconvenience. Please rest assured that your data—including documents, conversation histories, and settings—remains intact and will be available once you log in after resetting your password.

How to Access Your Account?

Visit https://cloud.ragflow.io.
Click on “Sign In” and choose the appropriate method:
- If you originally used GitHub SSO, simply log in with GitHub—your account should work seamlessly.
- If you registered with an email and password, click “Forgot Password” to initiate a password reset. Follow the instructions sent to your email to set a new password.

Looking Ahead

The move to cloud.ragflow.io lays the groundwork for our upcoming commercial SaaS version, which will offer advanced capabilities, higher reliability, and priority support. We are working hard to bring you a polished, production-ready service, and we can’t wait to share more details soon.

Thank you for being part of the RAGFlow journey. Your trust and patience mean the world to us. If you encounter any issues during the transition, please reach out to our support team—we’re here to help.

Stay tuned for more updates!

The RAGFlow Team

RAGFlow 0.24.0 — Memory API, knowledge base governance and Agent chat history

Sun, 15 Feb 2026 00:00:00 GMT

RAGFlow 0.24 introduces a series of features addressing core challenges developers face in real-world deployments: how to integrate and maintain Memory, how to ensure continuous Agent usability, and how to govern knowledge base data effectively.

Observable Memory: The console now features Memory extraction status and logs, making every step of asynchronous tasks clear and traceable. Additionally, HTTP APIs and a Python API are provided, enabling engineered integration for CRUD operations and intelligent retrieval of Memories.
Agent Chat Management: A new Launch button allows users to enter a chat interface consistent with the Chat design, where all conversation histories are preserved.
Manageable Knowledge Base Metadata: Supports batch management of metadata for multiple files, enhancing the user experience when configuring application metadata.

We will now delve into each of these features and improvements.

Memory management

Output memory extraction log

We have added extraction status and comprehensive log display to the Memory console.

Each Memory extraction, embedding, and storage operation is no longer a black-box process. It is now transparently shown to developers: when it started, the processing steps involved, whether it succeeded, and when it was written to storage.

This feature significantly enhances the system's observability, allowing developers to clearly understand how Memories are generated.

Manage management API

To facilitate developer integration, Memory now supports two invocation methods: RESTful API and Python SDK. These interfaces enable comprehensive management of Memories and their entries, including:

CRUD Operations: Create, query, update, or delete Memories; add memory entries; modify the enabled/disabled status of entries within a Memory; or manage the forgetting of entries no longer needed.
Session Management: Organise and retrieve relevant memories by session.
Intelligent Retrieval: Quickly locate memories using keyword and semantic similarity searches.

Whether you are building a conversational chatbot or a complex application requiring state persistence, you can seamlessly integrate Memory through straightforward API calls. For detailed usage and parameter descriptions, please refer to the HTTP API and Python API.

New Agent chat management interface

Previously, developers primarily used RAGFlow's Agent in two ways:

Building a custom frontend application to interface with the API
Publishing it for external use via an embedded page

The first approach offers flexibility but requires frontend development effort. The second has a lower barrier to entry, yet it presents a drawback for users wishing to use the Agent continuously: it does not retain historical chat records, nor does it allow users to resume previous conversations.

This release addresses the issue. Clicking the Launch button at the top of the Agent page now opens a chat interface similar to that of a Chat assistant application.

Within this interface, multiple conversations and their chat histories are preserved.

Batch metadata management

In previous versions of RAGFlow, maintaining metadata was a rather cumbersome process for developers. The platform now supports batch metadata management, allowing multiple files to be selected and their metadata managed simultaneously.

Additionally, batch deletion of metadata for a single file is now supported directly from the interface.

Previously, configuring metadata filtering logic within a chat application required manually entering filter values. With this update, RAGFlow has replaced this with a dropdown selection box, streamlining the configuration process for developers.

Chat's Thinking mode

The chat application has introduced a thinking mode to replace the previous reasoning toggle in the configuration. The underlying algorithmic strategy has been optimised for deep research scenarios, further improving retrieval precision.

Support for multiple admin users

In previous versions, superuser privileges were concentrated by default in the admin@ragflow.io account. While this is acceptable for personal use, in an enterprise environment a single superuser often creates a bottleneck and concentrates authority.

With this update, RAGFlow now supports assigning superuser privileges to multiple users, moving beyond the limitation of a single admin account.

Finale

This update to RAGFlow 0.24 aims to further assist developers in utilising and consolidating data assets within the platform:

Memory, as a long-term system for accumulating agent data assets, needs to provide developers with more convenient access and debugging capabilities.
A knowledge base should be more than just storage for files; it should be a governable data asset, requiring low-cost metadata management capabilities.

At the ecosystem level, this release adds support for OceanBase as the primary database, which can replace MySQL in deployments. It also introduces support for PaddleOCR-VL, further enhancing document and visual text processing capabilities.

In upcoming versions, RAGFlow will begin integrating agent skills, helping developers build agent systems with higher execution accuracy and greater stability.

Reference

RAGFlow 0.23.0 — Advancing Memory, RAG, and Agent Performance

Wed, 31 Dec 2025 00:00:00 GMT

The hallmark of this release is the introduction of a new Memory module, empowering AI Agents with lasting memory. This core advancement not only enables real‑time retrieval of the most task‑relevant historical experiences, but also supports the continuous, structured accumulation and optimization of knowledge assets—laying the foundation for an intelligent core capable of autonomous evolution. This version also brings significant enhancements across multiple fronts:

Enhanced Agent Capabilities:
Refactored underlying architecture for faster performance; added support for Webhook‑triggered automation; and opened voice input/output interfaces.
Upgraded RAG Performance:
Introduced batch metadata generation during document parsing; added contextual text around extracted images and tables within slices for richer semantics; and implemented a new Parent‑Child slicing strategy.
Improved APIs:
The Agent API now returns detailed execution logs, enabling developers to display reasoning processes on the front end. The Chat API offers enhanced metadata filtering for more precise answer control.

We will now delve into each of these features and improvements.

Memory management and usage

Memory is designed to preserve dynamically generated interaction logs and derived data during Agent operation (such as user inputs, LLM outputs, potential interaction states, and summaries or reflections generated by the LLM). Its purpose is to maintain conversational continuity, enable personalization, and facilitate learning from historical experience.

Beyond simply storing raw interaction data, this module is capable of extracting semantic memory, episodic memory, and working memory. Once stored, users can effortlessly reintroduce this information as part of the context in subsequent conversations, thereby enhancing both dialogue coherence and reasoning accuracy.

Manage memory

The Memory module supports convenient centralized management. Users can specify the types of memories to extract when creating a Memory and perform operations such as renaming and team sharing on the navigation page at Overview >> Memory. The image below illustrates the process of creating a Memory:

Within an individual Memory page, you can manage saved memory entries by enabling or disabling them during Agent calls, or by forgetting memory entries that are no longer needed.

Memory entries that have been actively forgotten will no longer appear in the results of Agent calls. When the storage capacity of the Memory reaches its limit and triggers the forgetting policy, those manually forgotten entries will also be prioritized for removal.

Enhanced Agent context

Under Agent >> Retrieval and Agent >> Message components, a new Memory invocation feature has been added. Users can configure the Message component to store data into a specified Memory and set up the Retrieval component to query from Memory. The diagram below illustrates how a simple Q&A bot Agent can utilize Memory.

In Agent components where Memory is configured, a Retrieval component must be included to recall information from Memory.

Select the corresponding Memory in the Message component under Save to Memory.

The Agent component must be configured with a User prompt to enable the Agent component to access information from the Retrieval component.

Memory lets Agent know you better

When a concept exists across multiple fields, the Agent may sometimes deviate from the user's actual intent and provide a response that is out of context.

For example, if a user asks, "What is Reflexion?" the Agent might default to explaining it from the perspective of cognitive science or neuropsychology, rather than addressing the Reflexion concept within the Agent/LLM reasoning framework that the user is actually interested in.

In such cases, the user can clarify and correct the Agent's understanding by explicitly indicating that this is a term from the field of Agent systems. This clarification will be recorded in the Memory as part of the user's long-term preferences and semantic context.

Afterward, when the user inquires about the same concept again, the Agent will automatically align with the user’s domain preference based on past interaction history, providing the definition and explanation of Reflexion directly from the Agent perspective—without requiring repeated clarification.

This example illustrates how Memory transforms the Agent from a one-time answering tool into an intelligent assistant that continuously calibrates its understanding and evolves alongside the user's interactions.

Enhanced Agent

Enhanced Agent performance

In previous versions, community feedback highlighted that while RAGFlow’s Agent demonstrated strong execution accuracy and generalization capabilities for complex tasks when configured with tools, its processing speed was slower and token consumption was higher for routine, less complex business scenarios.

This is because, in earlier versions, the Agent's underlying architecture was designed with a layered structure for handling more complex tasks, incorporating LLM-based planning and reflection.

In version 0.23.0, the Agent has been further optimized for simpler tasks, reducing the processing time by approximately 50% compared to the initially released Agent in version 0.20.0. These optimizations were primarily implemented at the architectural level, refining the execution flow of Agent components and the LLM invocation pipeline.

The following example demonstrates the construction of an Agent designed to search academic content.
In versions prior to 0.23.0, answering a conceptual question such as "What is Reflexion in the Agent domain?" took about 50 seconds:

In version 0.23.0, the time required to generate a response of similar length has been reduced to 27 seconds:

Developers can also customize the Agent's workflow according to their specific business processes, such as strictly defining the sequence of tool calls, as shown in the diagram below.

The logs confirm that ArXiv was indeed invoked first:

Then followed by Tavily Search:

The total runtime of the Agent remains within the range of 30 seconds.

Workflows triggered using Webhook

In real‑world business scenarios and development needs, it is common to rely on external systems triggering operations via HTTP requests—typical use cases include user form submissions, chatbot message pushes, and similar events.

In version 0.23.0, RAGFlow’s Agent now supports receiving external HTTP requests through Webhooks, enabling automated triggering and the initiation of workflows. This enhancement significantly improves the system's flexibility and extensibility.

As shown below, the system automatically generates a unique Webhook URL for each workflow. Users can simply use this URL to seamlessly integrate external systems and achieve automated connectivity.

On the Webhook configuration page, users can directly copy the generated Webhook URL to configure third‑party systems or API calls, and select the desired HTTP request method to receive the trigger.

Security configuration

Webhook provides multi‑level security authentication and traffic control mechanisms to ensure data transmission security and system stability.

It supports three authentication methods to meet different security requirements: Token-based authentication, Basic auth, and JWT authentication.
Additionally, it includes request rate limiting, maximum request body size restrictions, and IP whitelist configuration, effectively preventing malicious access and traffic overload to maintain reliable and stable service operation.

Schema configuration

Schema is used to specify the data format of the HTTP requests that the Webhook will receive, allowing RAGFlow to parse the requests accordingly. Schema configuration includes the Content types and query parameters of the HTTP request. Currently supported content types are as follows:

Response configuration

Two execution models are provided: Accepted response and Final response:

Accepted response: Once the request passes validation, a success acknowledgment is returned immediately, and the workflow executes asynchronously in the background.
Final response: The system returns the final processing result only after the workflow execution is completed.

Accepted response is configured via the Begin component: users can customize the success response status code (within the range of 200–399) and the response body content.

Final response is configured through the Message component, allowing users to customize the success response status code within the 200–399 range.

Test run and check the log

During the test run phase, the system uses a dedicated test URL for triggering.
Users can monitor the real‑time execution status of the entire workflow and overview the overall inputs and outputs on the Test run page.
Simultaneously, the Logs module records detailed execution processes and log information for each component, facilitating troubleshooting and debugging.

Global conversation variables

In real development scenarios, there is often a need to reuse intermediate variables within an Agent process. For example, during a conversation, determining whether the user’s current question has already appeared in a previous round requires saving the result of the earlier query into a variable and reusing or comparing it later in the workflow.

In RAGFlow, a Conversation variable can be defined as a global variable to store an Agent’s intermediate output, making it available for reuse throughout the conversation.

For example, you can define a global variable named his_query to store the user's previous question.

And assign a value to the global variable using the Variable assigner component.

Structured Agent component output

In earlier versions of RAGFlow, component outputs were limited to simple text or unstructured data, which constrained the precise parsing and utilization of information by subsequent components.

RAGFlow 0.23.0 introduces support for structured component output, enabling data to be passed in a standardized, formatted manner (such as JSON objects) that can be read and processed as variables by subsequent components.

In the Agent component's structured output configuration, you can add field names, types, and descriptions, and the system will automatically generate the corresponding JSON Schema. For example, after adding an author field, the JSON Schema is as shown above.

When executed, the output will follow this structured format and be passed as a variable to downstream components, ensuring accurate reading and processing. A sample result is shown below:

Voice input and output

RAGFlow version 0.23.0 supports end-to-end voice input and output, enabling the development of voice-enabled Agent applications such as digital humans. Voice functionality is available in both the Chat and Agent modules.

The voice input and output operation entry points in the Chat module are as follows:

The operation entry points for voice input and output in the Agent module are as follows:

Enhanced RAG

Auto-metadata

In the past, RAGFlow’s support for metadata was relatively limited, offering only manual editing of metadata at the individual file level.

Therefore, RAGFlow has introduced the Auto-metadata feature. Compared to the previous method of manually adding metadata to each file, users can now leverage large language models to automatically extract and generate metadata for documents.

By utilizing metadata effectively, it plays a dual role in the RAG workflow:

During retrieval: It filters out irrelevant files, narrowing the search scope to improve recall accuracy.
During generation: If a chunk is recalled, its associated metadata is passed to the LLM along with the text, providing richer contextual information about the document and enabling more informed responses.

Configuring auto-metadata generation rules

On the Knowledge Base Configuration page, start by selecting the Indexing model. This model will be used to generate the current knowledge base’s Knowledge Graph, RAPTOR, Auto-Metadata, Auto-Keyword, and Auto-Question.

Click Auto-metadata → Settings to enter the rule configuration page for automatic metadata generation.

Click + to add new field to enter the configuration page for that field.

Enter the field name, such as Author, and provide a description and example in the Description field to guide the large language model in accurately extracting values for this field. If this field is left blank, the model will generate values based solely on the field name. If you need to restrict metadata generation to a predefined set of values, you can enable Restrict to defined values mode and manually specify the allowed values, as shown in the image. In this case, the model will only output results within the preset range. Once configured, turn on the Auto-metadata toggle on the Configuration page. All newly uploaded files will have metadata automatically generated according to this rule during parsing. For files that have already been parsed, you can trigger metadata generation by reprocessing them. Users can then use the filtering function to check the metadata generation status of each file.

Manage metadata

The system supports metadata management at both the overall knowledge base level and the individual file level.

To access the metadata management interface, click Metadata within the knowledge base and enter the Manage Metadata page.

Here, you can centrally manage all generated metadata. You can modify existing values, and if you change two values to the same name, they will be automatically merged. You can also delete specific values or entire fields—all such operations will affect all associated files.

More refined management features will be rolled out in the future, such as setting specific generation rules for selected files, or manually adding metadata to files in batches.

For individual file management, as shown in the figure below, first click the parsing method of the file (e.g., "General"), then enter Set Metadata to view and edit its metadata. Users can add, remove, or edit metadata fields for that specific file. Any edits made here will be reflected in the global statistics of the knowledge base metadata.

Filter metadata

The filtering functionality is divided into two types: filtering at the knowledge base management level and filtering during the retrieval phase. By clicking the filter button within a knowledge base, you can view the number of files associated with each existing metadata field value. Checking a specific value will then display all the corresponding linked files.

You can also use “No metadata” to filter out files that have not yet generated metadata, allowing you to trigger bulk re-parsing for automatic metadata generation.

Similarly, metadata filtering is supported during the retrieval phase. Taking Chat as an example, after configuring a knowledge base, you can set metadata filtering rules as follows:

Automatic mode filters results based on the user’s query and the existing metadata in the knowledge base.
Semi-automatic mode allows users to first define the filtering scope at the field level (e.g., “Author”) before automatic filtering is applied within that range.
Manual mode gives users full control to set precise, value-level filtering conditions and supports various operators such as: Equals, Not equals, In, Not in, and more.

Add context to images and tables

RAGFlow utilizes built-in DeepDoc and external document models such as MinerU and Docling to parse document layouts.

In previous versions, images and tables identified via document layout analysis existed as independent Chunks. If retrieval did not match the content of the image or table itself, these elements would not be recalled. However, in real documents, charts and surrounding text are often arranged together, with the text providing descriptions or explanations of the chart content. Therefore, the ability to recall charts based on this contextual text is essential.

To address this, RAGFlow 0.23.0 introduces the Image & Table Context Window feature. Drawing inspiration from the research-oriented open-source project RAG-Anything, which specializes in multimodal RAG, this function allows text and its associated chart to be placed within the same Chunk—based on a user-configurable context window size—so they can be retrieved together. This enhancement significantly improves the accuracy of chart retrieval in practice.

As illustrated above, the value in the red box indicates that approximately N tokens of text above and below the chart will be extracted and inserted into the Chunk of type image/table as contextual supplementation for the chart.
During extraction, boundaries are optimized based on punctuation to preserve semantic coherence.
Users can adjust the number of contextual tokens according to their needs.

Child chunking

In practical applications of RAG, a long-standing challenge has been the structural contradiction in the traditional “chunk-embed-retrieve” pipeline: a single text chunk is tasked with both semantic matching (recall) and contextual understanding (utilization), two inherently conflicting objectives—recall demands fine-grained, precise positioning, while answer generation requires coherent and complete contextual information.

To resolve this tension, RAGFlow previously introduced the Table of Contents (TOC) enhancement, which leverages a large language model to generate a document's table of contents. During retrieval, the TOC helps automatically supplement missing context caused by chunk segmentation.

In version 0.23.0, this capability has been systematically integrated into the ingestion pipeline. Additionally, the release introduces a Parent-Child Chunking mechanism, which also addresses this contradiction in part: under this mechanism, a document is first divided into larger parent chunks, each preserving a relatively complete semantic unit to ensure logical and contextual integrity. Each parent chunk can then be further subdivided into multiple child chunks, designed for precise recall.

During retrieval, the system first locates the most relevant text segment based on child chunks, while automatically associating and retrieving its corresponding parent chunk. This approach maintains high recall relevance while providing sufficient semantic context for the generation phase.

For example, when processing a Compliance Handbook, if a user asks about "liability for breach of contract," the system might accurately retrieve a child chunk stating "the penalty for breach is 20% of the total contract value," but lack the context to clarify whether this applies to "general breach" or "material breach."

With the parent-child chunk mechanism, while returning that child chunk, the system also brings back the parent chunk containing the complete section of that clause, enabling the LLM to make accurate judgments based on fuller context and avoid misinterpretation.

Through this dual-layer structure of “precise positioning + contextual supplementation,” RAGFlow significantly enhances the reliability and completeness of generated answers while ensuring retrieval accuracy.

To use parent-child chunking, enable the “Child chunks are used for retrieval” toggle on the Knowledge Base Configuration page and set the delimiter for child chunks.

When child chunking is enabled, the logic for Recommended chunk size switches to the size of the parent chunk. You may appropriately increase this setting (e.g., to 1000–2000 tokens) to allow the LLM to retrieve more context when generating responses. The same applies to configurations within the ingestion pipeline.

Please note that the parent-child chunking feature and the TOC enhancement address the structural tension between “recall accuracy” and “context completeness” in RAG—from the perspectives of rule-based and model-based approaches, respectively. Both are transitional solutions designed to provide developers with ready-to-use optimization tools.
In future versions, RAGFlow will continue to advance its underlying capabilities, striving to resolve this challenge in a more systematic and fundamental manner at the architectural level.

API updates

Agent API returns trace logs

The RAGFlow Agent API now supports returning detailed execution trace logs, helping developers debug and analyze the execution flow of Agents.

Supported API

POST /api/v1/agents/{agent_id}/completions

Considerations

Enable trace logs by setting return_trace: true.
Trace logs include the execution time, inputs and outputs, and error messages for each component.
In streaming responses, trace information is returned via the component_finished event.
In non‑streaming responses, trace information is available in data.data.trace.

Curl example

Converse with Agent (with Trace logs)

curl --request POST \
     --url http://{address}/api/v1/agents/{agent_id}/completions \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ' \
     --data '{
        "question": "How to install neovim?",
        "stream": true,
        "session_id": "cb2f385cb86211efa36e0242ac120005",
        "return_trace": true
     }'

Chat API supports metadata filtering

Supported APIs

POST /api/v1/chats/{chat_id}/completions
POST /api/v1/chats_openai/{chat_id}/chat/completions

Curl example

Chat with a chat assisant (with metadata filtering):

curl --request POST \
     --url http://{address}/api/v1/chats/{chat_id}/completions \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ' \
     --data '{
          "question": "Who are you",
          "stream": true,
          "session_id":"9fa7691cb85c11ef9c5f0242ac120005",
          "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "author",
                "comparison_operator": "is",
                "value": "bob"
              }
            ]
          }
     }'

OpenAI-compatible API (with metadata filtering)

curl --request POST \
     --url http://{address}/api/v1/chats_openai/{chat_id}/chat/completions \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ' \
     --data '{
        "model": "model",
        "messages": [{"role": "user", "content": "Say this is a test!"}],
        "stream": true,
        "extra_body": {
          "reference": true,
          "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "author",
                "comparison_operator": "is",
                "value": "bob"
              }
            ]
          }
        }
      }'

See the corresponding API reference for details.

Finale

As the final update of RAGFlow in 2025, version 0.23.0 stands as a significant milestone that bridges past accomplishments with future ambitions. It marks the evolution of RAGFlow from a RAG engine focused primarily on knowledge-base scenarios into a contextual engine designed to support the construction of diverse AI Agents.

This release not only continues to strengthen core RAG capabilities but also introduces the all-new Memory module—designed to store and retrieve data generated during Agent interactions, seamlessly integrating it into ongoing contexts. The data preserved in Memory enables conversational systems to accumulate experience, gradually forming a self-reinforcing cycle of continuous improvement.

In the Agent architecture, if models are responsible for reasoning, and RAG and tools represent action, then Memory serves as the soul—carrying personalization and historical continuity. Our exploration of Memory is just beginning. Moving forward, we will advance towards more granular memory management and build effective mechanisms for Skills and Guidelines centered around tool usage.

As the foundational data platform for large language models, RAGFlow remains committed to its original vision: not only providing a complete no-code platform for Agent development but also striving to become the essential data cornerstone for all Agent systems—regardless of the tools used to build them.

We invite you to continue following our progress and to star our project, as we grow and evolve together. https://github.com/infiniflow/ragflow

From RAG to Context - A 2025 year-end review of RAG

Mon, 22 Dec 2025 00:00:00 GMT

As 2025 draws to a close, the field of Retrieval-Augmented Generation (RAG) has undergone profound reflection, vigorous debate, and marked evolution. Far from fading into obsolescence as some bold predictions foresaw—amid lingering scepticism over its supposedly transient role—RAG has solidified its indispensability as a cornerstone of data infrastructure in the demanding arena of enterprise AI adoption.

Looking back, RAG's trajectory this year has been complex. On one hand, its practical effectiveness faced significant skepticism, partly due to the "easy to use, hard to master" tuning challenges inherent to RAG systems. On the other hand, its share of public attention seemed to be overshadowed by the undisputed focus of 2025's LLM applications: AI Agents.

However, an intriguing trend emerged. Despite the controversies and not being in the spotlight, enterprises genuinely committed to building core AI competencies—especially mid-to-large-sized organizations—deepened and systematized their investments in RAG. Rather than being marginalized, RAG has solidified its core role in enterprise AI architecture. Its position as critical infrastructure remains unshaken, forming the robust foundation for enterprise intelligence.

Therefore, we must first move beyond surface-level debates to examine the intrinsic vitality of RAG technology. Is it merely a transitional "band-aid" to patch LLM knowledge gaps, or is it an architecture capable of continuous evolution into a cornerstone for next-generation AI applications? To answer this, we must systematically review its technical improvements, architectural evolution, and new role in the age of Agents.

Can RAG still be improved?

The debate about long context and RAG

In 2025, the core of many RAG debates stems from a widely acknowledged contradiction: enterprises feel they "cannot live without RAG, yet remain unsatisfied." RAG lowers the barrier to accessing private knowledge, but achieving stable and accurate results—especially for complex queries—often requires extensive, fine-tuned optimization, complicating total cost of ownership assessments.

Consequently, the theoretical question heatedly discussed in 2024—"Can Long Context replace RAG?"—rapidly entered practical testing in 2025. Some scenarios less sensitive to latency and cost, with relatively fixed query patterns (e.g., certain contract reviews, fixed-format report analysis), began experimenting with directly using long-context windows. They feed entire or large batches of relevant documents into the model at once, hoping to bypass potential information loss or noise from RAG retrieval and directly address inconsistent conversational quality.

However, research since 2024 offers a clearer picture of the technical comparison. Mechanically stuffing lengthy text into an LLM's context window is essentially a "brute-force" strategy. It inevitably scatters the model's attention, significantly degrading answer quality through the "Lost in the Middle" or "information flooding" effect. More importantly, this approach incurs high costs—computational overhead for processing long context grows non-linearly.

Thus, for enterprises, the practical question is not engaging in simplistic debates like "RAG is dead," but returning to the core challenge: how to incorporate the most relevant and effective information into the model's context processing system with the best cost-performance ratio. This is precisely the original design goal of RAG technology.

Improved long-context capabilities have not signaled RAG's demise. Instead, they prompt deeper thinking about how the two can collaborate. For example, RAG systems can use long-context windows to hold more complete, semantically coherent retrieved chunks or to aggregate intermediate results for multi-step retrieval and reflection. This "retrieval-first, long-context containment" synergy is a key driver behind the emerging field of "Context Engineering." It marks a shift from optimizing single "retrieval algorithms" to the systematic design of the end-to-end "retrieval-context assembly-model reasoning" pipeline.

Currently, paradigms for providing external knowledge to LLMs mainly fall into four categories:

Relying solely on LLM's long-context capability.
Utilizing KV Cache.
Using simple search methods like Grep.
Employing a full RAG architecture.

Cost-wise, there is roughly a two-order-of-magnitude gap between option 1 and option 4. Option 2 involves processing all document data through an LLM's forward pass into tensors stored in a dedicated KV Cache system. Its cost remains at least an order of magnitude higher than RAG. Even upgrading KV Cache to an integrated database-and-inference engine (as advocated by AlayaDB [Ref 1]) still faces fundamental technical limitations:

The data volume vs. retrieval performance dilemma: If preprocessed tensor data exceeds GPU memory capacity, the system must introduce secondary storage and corresponding retrieval mechanisms for on-demand loading. This turns inference into a "generate-while-search" process, imposing extremely stringent I/O latency and retrieval speed requirements. To meet ultra-low latency, the entire dataset and LLM inference service often must be tightly co-located on the same physical machine or within a high-performance computing cluster's close network domain. This greatly sacrifices architectural flexibility and scalability, limiting deployment options.
Scenario limitations: This method primarily suits relatively static, fact-based text libraries. It struggles to flexibly and cost-effectively incorporate complex, evolving business rules, real-time knowledge updates, and diverse composite queries common in enterprises.
Technology convergence trend: This method does not conflict with RAG. Even if it becomes practical, it will likely be absorbed into the RAG architecture as an optimized component.

Option 3 (index-free RAG) gained some attention after Claude Code introduced it for its coding assistant Agent. A simple question arises: in specific domains, can basic string matching (Grep) replace complex retrieval-based RAG architecture? For well-organized, pure-format, fixed-terminology codebases or highly structured text data (e.g., log files), rule-based Grep or keyword search might deliver decent results at very low cost, saving index construction and maintenance overhead.

However, for the vast majority of enterprise multi-modal, unstructured, or semi-structured data (e.g., product manuals, meeting notes, design drawings, reports with tables and images), this method fails completely. In fact, even for seemingly regular code search, leading products like Augment Code avoid simple Grep. Instead, they fine-tune specialized Embedding models for code semantics. The reason is that providing effective context for a coding assistant requires not just exact string matching (finding function names) but also semantically similar code snippets (different implementations of similar functions), related API documentation, and code block dependencies.

Furthermore, diverse natural language queries, high concurrency, low-latency responses, and filtering/sorting with business metadata—all essential for enterprise applications—are far beyond Grep's scope. Therefore, RAG and its predecessor—enterprise search engines—remain comprehensive technical and architectural solutions for complex enterprise needs. Their value lies in providing systematic, scalable, and governable knowledge access and management capabilities.

Optimizations for RAG conversational quality

Returning to RAG's core, a common source of inaccurate or unstable answers lies in a structural conflict within the traditional "chunk-embed-retrieve" pipeline: using a single-granularity, fixed-size text chunk to perform two inherently conflicting tasks:

Semantic matching (recall): For high-precision recall in semantic similarity search, smaller chunks (e.g., 100-256 tokens) are needed. This provides clear semantic focus and reduces irrelevant information.
Context understanding (utilization): To provide sufficiently complete and coherent context for the LLM to generate high-quality answers, larger chunks (e.g., 1024+ tokens) are needed to ensure logical completeness and sufficient background.

This forces system designers into a difficult trade-off between "precise but fragmented" and "complete but vague," often sacrificing one for the other. Small chunks may retrieve fragmented information, preventing the LLM from grasping the full picture. Large chunks may introduce noise, reduce retrieval precision, and lose internal detail differentiation because the vector representation summarizes the entire chunk.

A fundamental improvement is to decouple the RAG process into two logical stages—"Search" and "Retrieve"—allowing different text granularities for each:

**Search: Analogous to "scanning" or "locating," its core goal is to quickly and precisely identify all potentially relevant "clues" from massive data. This stage should use smaller, semantically pure text units for high recall and precision.
Retrieve: Analogous to "reading" or "understanding," its core goal is to assemble "reading material" for the LLM to generate answers. Based on clues from the Search stage, this stage should dynamically aggregate, stitch, or expand into larger, more complete, and coherent context fragments.

RAGFlow's TreeRAG technology embodies this idea. It uses an LLM in an offline phase to automatically analyze documents, constructing a hierarchical tree-like directory summary structure. This cleverly bridges the gap between "fine-grained search" and "coarse-grained reading." Its core workflow reflects the wisdom of "locate precisely first, then expand to read":

Offline processing (knowledge structuring): After document splitting, chunks are sent to an LLM for analysis, generating a multi-level tree directory summary (e.g., Chapter -> Section -> Subsection -> Key Paragraph Summary). Simultaneously, each node or original chunk can be enriched with semantic enhancements like summaries, keywords, entities, potential questions, metadata, and associated image context descriptions. Research [Ref 3] shows that during offline processing, fine-grained chunking requirements can be relaxed. Overly complex splitting can disrupt semantic units; simple, overlapping chunking is often more effective and robust in practice.
Online retrieval (dynamic context assembly): Upon receiving a user query, similarity search is first performed using the finest-grained "small fragments" (original chunks or their summaries) for fast, precise initial recall. Then, leveraging the offline-built "tree directory" as a navigation map, the system quickly locates parent, sibling, and neighboring nodes of the recalled chunk nodes. It automatically combines these semantically related fragments into a logically complete "large fragment" for the LLM. This effectively mitigates context fragmentation caused by fixed-size chunks, ensuring the material provided to the model contains both precisely matched core information and the surrounding context needed to understand it.

Similar work includes PageIndex [Ref 2], which focuses more on parsing and leveraging a document's inherent physical or logical table of contents (e.g., in PDFs). This method is efficient when document structure is clear and format is standard. Its limitation is heavy reliance on source document quality; its auto-location capability diminishes when documents lack clear or accurate TOCs.

TreeRAG and similar technologies effectively address the "Lost in the Middle" pain point caused by poor chunking. However, for more complex queries—where answers are scattered across dozens of non-adjacent chunks or require synthesizing information from multiple independent documents for reasoning—tree structures alone may not capture all relevant associations. The industry naturally turned to another technical path: GraphRAG.

GraphRAG extracts entities and relationships from documents to build a knowledge graph, using graph queries and reasoning to discover indirectly related information fragments. Yet, GraphRAG has also left many practitioners with mixed feelings since its inception, due to key challenges:

Massive token consumption: Entity extraction, deduplication, and community summarization can consume several to dozens of times more tokens than the original text.
Gap between expected and actual entity extraction quality: Traditional knowledge graphs, meticulously designed and validated by domain experts, offer high quality for direct visual analysis and interaction. In contrast, GraphRAG's automatically extracted entities and relations often contain significant noise, redundancy, or errors. Judging them by visual knowledge graph standards leads to disappointment—"visually appealing but impractical."
Knowledge fragmentation: Even after graph algorithms discover related communities and generate summaries, the output remains a collection of "knowledge fragments" around specific topics. Generating final answers from these discrete fragments places high demands on the LLM's integration and coherent narration abilities, often resulting in answers lacking a logical thread or missing crucial aspects.

Therefore, combining the strengths of TreeRAG and GraphRAG holds promise for further alleviating RAG's pain points. TreeRAG excels at resolving local semantic breaks caused by physical chunking, providing coherent context. GraphRAG, based on entity-relationship networks, can use graph traversal algorithms (e.g., Personalized PageRank) to help discover content fragments semantically highly related but physically distant in the original documents, even across documents.

Thus, whether TreeRAG, GraphRAG, or their hybrid architecture (collectively termed "Long-Context RAG"), the core paradigm introduces an additional semantic enhancement and structure-building layer atop the traditional RAG pipeline. During the data ingestion stage, beyond chunking, LLMs are leveraged for deep semantic understanding, summary generation, and structure extraction. During the retrieval stage, it moves beyond mere search, incorporating navigation, relational queries, and dynamic assembly based on predefined structures (tree, graph).

RAG technology is far from stagnant. Its improvement direction is increasingly clear: leverage LLMs' own capabilities to bridge the semantic gap between raw data and final Q&A early in the data lifecycle. During ingestion, use prompts with different instructions to request LLMs to analyze text from multiple angles and rounds, extracting rich, multi-layered semantic information (metadata). During retrieval, use this enhanced semantics as a "navigation map" to intelligently filter, aggregate, and fill context, going beyond simple vector similarity search. This is the more reliable path for tackling complex, open-domain Q&A challenges.

Products like RAGFlow are deeply exploring this direction, with future evolution centered on these core points. In short, modern RAG system design philosophy is to fully coordinate "powerful retrieval and reasoning capabilities" with the "limited yet precious LLM context window." Through intelligent preprocessing and dynamic assembly, it seeks the optimal balance between effectiveness, performance, and cost.

From knowledge base to data foundation

We often emphasize that RAG is an architectural paradigm, not a specific application. Yet, knowledge bases are undoubtedly RAG's most intuitive and successful application form. With the rapid rise of AI Agent development, a clear trend is emerging: Agents' complex task execution increasingly relies on real-time access and understanding of massive, diverse enterprise data.

Consequently, enterprise-grade RAG products are evolving beyond the singular "Q&A knowledge base" role towards a more foundational, general-purpose Agent data foundation. They need to serve as the unified, efficient, and secure access service for unstructured data for all types of Agents.

To achieve this, a robust, scalable, and configurable Ingestion Pipeline has become an indispensable core component of modern RAG engines. It handles the entire process of "taking over" the complex, unstructured data within an enterprise (documents, images, audio/video, code, emails, etc.) and "digesting" it into a standard format indexable and retrievable by the RAG engine.

If ETL/ELT (represented by tools like dbt, Fivetran, Airbyte) is the industrialized standard pipeline for processing structured data in modern data stacks—providing unified, flexible, and reliable data integration for data warehouses and lakes—then the Ingestion Pipeline for unstructured data (or PTI: Parse-Transform-Index) is its equivalent, critical infrastructure for the AI era. They share similar architectural positioning and processing philosophy but differ in technical methods due to data characteristics.

Examining each stage's similarities and differences:

Extract vs. parsing: Traditional ETL/ELT's "Extract" stage primarily pulls data from structured or semi-structured sources like relational databases, APIs, and log files. The Ingestion Pipeline, building on this, emphasizes the "Parsing" stage to tackle the challenge of information extraction from unstructured data. It employs various parsers: format parsers for PDF/Word, specialized document intelligence models like RAGFlow's DeepDoc, and OCR models fine-tuned on Vision-Language Models (VLM). The goal is to convert multi-modal, multi-format raw documents into clean, structured text representations, preserving original logical structure (headings, lists, tables) and metadata (author, date) as much as possible.
Transform: This stage shows the greatest divergence. Traditional ETL/ELT "Transform" focuses on data cleaning, format standardization, and business logic computation (aggregation, joins) using SQL or code. The Ingestion Pipeline's "Transform" stage is a series of semantic understanding and enhancement components centered around the LLM. They convert the raw text stream from the Parser into the advanced "materials" needed for retrieval. Beyond basic text chunking, all processes aimed at improving retrieval effectiveness and precision—like tree structure generation, knowledge graph (entity-relation) extraction, summary generation, question generation, and keyword extraction—occur here. This stage is key to injecting "intelligence," determining the depth of data "understanding."
Load vs. indexing: Traditional ETL/ELT loads (Load) processed clean data into target data warehouse/lake tables. The RAG engine, via its "Indexing" component, builds efficient indexes from the rich content produced in the Transform stage (original text chunks, summaries, vectors, metadata, graph relations). This supports various retrieval methods. The RAG engine is essentially a high-concurrency, low-latency serving layer. It must support hybrid retrieval (vector + keyword + metadata filtering) and may integrate advanced capabilities like tree-based or graph-based queries mentioned earlier.

Whether ETL for structured data or PTI for unstructured data, the intermediate "Transform" step is the core value-creation stage. For ETL, engineers design transformation logic (SQL) based on specific business analysis needs. For PTI, because the Ingestion Pipeline must be co-designed end-to-end with final retrieval requirements, decisions on chunk granularity, types of enhanced semantics to generate, and index structure complexity in the Transform stage depend on the precision, speed, and cost requirements of the upper application (e.g., Q&A, summarization, analysis).

Therefore, building an excellent RAG system is not merely about choosing the most advanced Parser model or the highest-performance vector database. It hinges on the collaborative design and continuous tuning of the entire "Data Ingestion -> Semantic Enhancement -> Index Building -> Retrieval Service" pipeline.

Equipped with such a powerful, flexible, and intelligent Ingestion Pipeline, a RAG system truly evolves from a "Q&A system" into a unified processing and access platform for enterprise unstructured data. It can standardize and automate the digestion of scattered internal knowledge assets, transforming them into readily available "fuel" for AI Agents.

This explains why, today, when enterprises commit to building a unified AI capability platform, its core and foundation must be such a RAG-engine-centric data foundation with robust data processing capabilities. It provides the single source of truth and real-time updated knowledge for all enterprise AI applications.

From RAG to context

In 2025, the most dynamic and attention-grabbing aspect of LLM applications has undoubtedly been various AI Agents. From an Agent framework perspective, RAG can be seen as a tool providing external knowledge access—similar to tools for calling a calculator, querying a weather API, or operating a database.

This tool-centric view, coupled with the rise of "Agentic RAG" (using Agent planning, reflection, etc., to enhance the RAG process itself), fueled narratives in 2025 suggesting "RAG will be replaced by more general Agents" or "RAG is just another ordinary Agent tool."

However, this view oversimplifies and potentially misunderstands RAG's fundamental value and position within the Agent ecosystem. It overlooks the essence of RAG architecture—its core capability and value foundation lies in Retrieval, not merely "Augmented Generation."

It is precisely this core ability to "efficiently, accurately, and scalably retrieve relevant information from vast private data" that enables RAG to become, and is becoming, the most indispensable, foundational data foundation for Agents. Because no matter how intelligent an Agent is, the quality of its decisions and actions fundamentally depends on the quality and relevance of the Context it receives.

How to dynamically and intelligently assemble the most effective context for different tasks at different moments became the hottest technical exploration and guiding principle in the latter half of 2025, giving rise to Context Engineering.

Let's analyze the main data types and technologies involved in Context Engineering to understand why "retrieval" is at its absolute core:

Context Engineering has emerged as a distinct field because practice repeatedly proves: bluntly cramming all potentially relevant data into the LLM's context window is not only prohibitively costly but also severely impairs the LLM's understanding, reasoning, and tool-calling abilities due to information overload. Therefore, intelligent filtering, sorting, and stitching of context are essential.

A typical Agent context usually needs to carefully assemble three types of data:

Domain knowledge data: For the knowledge base serving as a core tool (i.e., the RAG system), retrieval result quality directly determines answer success. Returning too many irrelevant fragments introduces noise, disrupting answer generation. Missing key fragments inevitably leads to factual errors or incomplete answers. Thus, RAG itself can be seen as the earliest Context Engineering 1.0 practice for domain knowledge.
Tool Data: For other functional tools, especially the many tools packaged via standards like the Model Context Protocol (MCP), their description information (name, function, parameter specs) is itself part of the context. When only a few tools are available, their descriptions can be hardcoded in prompts. But in enterprise scenarios, internal services, APIs, and functions packaged via MCP can number in the hundreds or thousands. Manually or statically specifying which tools to call for each conversation is impractical. This "tool selection" problem was only deeply felt by some users by late 2025, even sparking clickbait articles like "MCP is Dead After Just One Year" [Ref 9]. In reality, the author misplaces the blame—the core issue is indiscriminately dumping all tool descriptions into the context, overwhelming the LLM with "choice paralysis," not the MCP protocol itself. MCP is an extremely important protocol aiming to standardize tool-calling interfaces, but it does not and cannot solve the decision problem of "which tool to choose in a specific situation." So, who should bear this decision responsibility? This is a missing key piece in most current Agent frameworks. The answer is again Retrieval. We need semantic search over all tool descriptions, combined with retrieval of "Skills," "Playbooks," or "Guidelines" formed from historical tool usage. This dynamically and precisely filters the most relevant, most likely-to-be-correctly-used tool subset and parameters for the current Agent's current task into the context. This far exceeds what current features like Anthropic Skills cover. How to close the loop on generating, retrieving, and using Skills/Playbooks/Guidelines and productize it is a core problem Context Engineering must solve. This is fundamentally a data infrastructure (Infra) problem, not merely a model capability issue.
Conversation and state data: Beyond static knowledge and tools, context needs to include dynamic data related to the current interaction: conversation history, user personalization/preferences, and some Agent internal state (e.g., human input in Human-in-the-loop). Managing and accessing this data is often termed "Memory," which became a standalone hot topic in early to mid-2025. But technically, the core capability of memory systems—storing, indexing, and retrieving historical interaction information—is not fundamentally different from RAG. It can be seen as a specialized retrieval system for a specific data source (conversation logs) and usage pattern (emphasizing temporal and conversational relevance).

Deconstructing context composition clearly shows that Context Engineering's core task is still retrieval based on the three major data sources Agents need:

Retrieval of enterprise-private, unstructured domain document data—i.e., RAG.
Retrieval of conversation history and state data generated during Agent interactions, especially LLM-generated content—i.e., Memory.
Retrieval of tool description and usage guide data from encapsulated enterprise services and APIs—termed Tool Retrieval. This data can also reside in a dedicated area like Memory.

Thus, in the AI Agent era, RAG technology will undeniably evolve. It is no longer just a step in "Retrieval-Augmented Generation." With "retrieval" as its core capability, expanding its data scope, it evolves into a Context Engine supporting all context assembly needs, becoming the unified Context Layer and data foundation serving LLM applications.

How is retrieval for Agents different from the search era?

Understanding this evolution requires contrasting the fundamental differences between retrieval paradigms in the traditional search era and the Agent era.

In the traditional search era (e.g., using Google or enterprise search), the initiator, executor, and consumer of retrieval were humans. The user poses a question, the search engine returns a list of relevant document links, and the user must open multiple links, read, compare, and synthesize information to form an answer or decision. This is a "human-computer synergy, human-led" loop.

In the Agent era, the initiator and primary consumer of retrieval is the Agent (LLM-driven). A typical workflow: the LLM first comprehends and decomposes a user's complex question (hypothesizing/planning), then automatically initiates one or multiple retrieval requests on the user's behalf. Next, it comprehends, verifies, and refines the raw retrieval results (reflection), finally stitching the processed information into the context for generating the final answer. The content of retrieval expands from single web pages/documents to include tool usage, memory segments, etc. The entire closed loop of "question understanding -> retrieval -> result processing -> context assembly" is automated by the Agent.

Therefore, retrieval systems for Agents face unprecedented new demands: extremely high request frequency (potentially one to two orders of magnitude higher than human requests in the traditional search era), diverse query types (semantic queries on documents, keyword matching for tools, parameter matching for tool usage guides, associative queries on memory), stringent latency requirements (directly impacting Agent response speed), and the need for tight coupling with the Agent's reasoning flow.

A standalone search engine or vector database is far from sufficient. It requires building an intelligent intermediate service layer atop storage and indexing. This layer understands Agent intent, dynamically coordinates retrieval requests to different underlying data sources (document stores, memory stores, tool libraries) based on context assembly strategies, and performs necessary fusion, deduplication, ranking, and formatting of results, finally packaging them into LLM-ready context.

This usage pattern means the most intricate and specialized "Context Engineering" part of Agent development—currently highly manual and hardcoded in prompts—has the potential to move towards declarative configuration or even automation. This could significantly reduce Agent development and maintenance complexity while improving consistency and reusability.

Tools also need searching

Tool diversity directly determines the breadth and depth of problems an Agent can solve. In simple demos or prototypes, a common approach is embedding descriptions of all available tools (usually natural language text) entirely into the Agent's prompt. However, when tool counts grow from a few to dozens or hundreds, the drawbacks become apparent.

First, it consumes massive precious context tokens that could otherwise hold more important task descriptions and intermediate results. Second, too many options burden the LLM's tool selection logic, increasing the probability of "hallucinated" calls or incorrect tool choices.

Especially in enterprise scenarios, tool numbers can reach stunning scale. In principle, nearly all existing internal services, databases, APIs, and workflows can be wrapped into standardized tool interfaces understandable and callable by Agents. This is what protocols like MCP aim to solve—becoming the "TCP/IP" for the Agent era, addressing connectivity.

But we must clearly realize the fact: MCP solves the protocol problem of "how to call," not the decision problem of "which one to call." Advanced models (like some SOTA models under development) might struggle to handle hundreds of tool descriptions in context, but this is hardly a cost-effective solution. Treating the context window as a scarce resource and minimizing the inclusion of ineffective or low-probability-use information is a crucial design principle for controlling costs and improving overall system effectiveness.

Consequently, Tool Retrieval began attracting attention from academia and industry by late 2025. The core idea is: establish a specialized index library (vector, keyword, or hybrid) for all tool descriptions. When an Agent needs to call a tool, it first generates a "query" targeting tool functionality based on the current conversation context and task objective. This query performs a quick search against the index library, recalling only the most relevant few (e.g., Top-3) tool descriptions. These are dynamically inserted into the current context for the LLM's final selection and calling.

Research [Ref 10] shows even simple BM25 keyword retrieval serves as a strong baseline for this task. Of course, using fine-tuned, dedicated embedding models yields more precise matches.

Beyond tool descriptions themselves, the Agent development, testing, and usage process naturally accumulate a series of meta-knowledge about "how to correctly use tools." For example: "When handling customer refund requests, sequentially call Tool A to verify the order, Tool B to check inventory, and Tool C to execute the refund, with parameter X from conversation history Y." Such "playbook" or "guide" text is richer and more instructive context material than tool descriptions.

These should also be categorized into the retrievable system: when an Agent faces a complex task, it can first retrieve relevant task guides, including them as part of the context. This allows the LLM to "take an open-book exam," planning actions following best practices.

The volume and data density of such tool usage guide data may far exceed tool descriptions themselves. Efficient utilization undoubtedly relies on retrieval technology. Early explorations like the GEPA optimizer in the Dspy framework [Ref 11] use genetic algorithms to evolve better prompts (possibly containing tool usage logic), but its focus is optimizing the prompt text itself.

More systematic work like the ACE (Agentic Context Engineering) framework [Ref 12] begins to study how to generate, manage, and leverage such instructive contexts in a structured way. While not this article's focus, we must recognize this represents an important direction for Context Engineering's deepening. The outcomes of such work will evitably need to be included into the system covered by Retrieval, providing context alongside tool descriptions.

Does memory deserve to be a separate infra?

"Memory" received far more attention than RAG in 2025's technical discussions, often listed as a core Agent infrastructure component in articles and product pitches, with even boundary-blurring views like "Memory will replace RAG." So, what exactly is Memory? How do the myriad open-source memory projects differ from and relate to RAG systems at their core?

Simply put, memory systems emerged initially to meet the need for effectively managing and reusing historical Agent interaction information. Its basic usage pattern is identical to RAG: when a new conversation occurs, the system retrieves relevant past conversation fragments (e.g., same user's prior questions, LLM's previous answers, user feedback) from stored history based on the current query. These fragments are then submitted alongside the new question to the LLM, helping the model maintain conversation coherence, remember user preferences, or avoid repetition.

Therefore, from functional interface and core technology (retrieval), Memory and RAG share the same core.

Their core distinction lies in data source and management goals:

RAG: Processes primarily pre-existing, relatively static enterprise private knowledge assets (documents, manuals, KB articles). Its goal is providing domain facts and background knowledge.
Memory: Processes primarily dynamically generated interaction logs and derived data from Agent operation (user input, LLM output, possible interaction state, LLM-generated summaries/reflections). Its goal is maintaining conversational continuity, enabling personalization, and learning from historical experience.

As research deepens, finer division of data organization within memory systems has emerged, borrowing ideas and concepts from cognitive science: Working Memory, Episodic Memory, Semantic Memory, Meta Memory, etc. These different "memory regions" can be likened to different tables or collections in a database.

Memory system complexity lies not just in storage and retrieval but more in memory management logic: When and how are newly generated raw conversations written to memory? When to trigger an LLM to summarize a conversation segment and store it in semantic memory? How to establish associations between different memory fragments?

These functions of "memory flow, processing, and lifecycle management" are identical to the Ingestion Pipeline in RAG systems, except they handle dynamically generated streaming data.

If we broaden our view, dedicating a region within the memory system (or establishing a closely linked knowledge base) to store and retrieve the tool usage guides mentioned above, then a blueprint for a complete data foundation supporting AI Agents emerges:

This blueprint clearly shows: Memory (handling dynamic interaction data) and RAG (handling static domain knowledge) are technically having the same source (both retrieval-based) and functionally complementary. Together, they constitute the complete data foundation upon which AI Agents depend.

All Agent data access needs—whether existing unstructured documents (via RAG), real-time generated interaction logs (via Memory), or structured service interfaces (packaged via MCP, with their metadata, usage guides, and tutorials retrievable in a dedicated knowledge base)—can be unified, governed, and accessed within an integrated platform.

This platform is precisely the direction systems like RAGFlow are evolving towards: a Context Engine or Context Platform. It is no longer an isolated retrieval tool but an infrastructure providing comprehensive, intelligent context assembly services for AI applications.

The forward-looking term "Context Platform" was likely first systematically elaborated by investment firm Theory Ventures in their series of articles [Ref 13, 14]. In fact, this visionary firm explicitly highlighted the fundamental importance of retrieval for LLM applications as early as 2024 [Ref 15]. Today, how to upgrade the technically-focused Retrieval Engine into a Context Engine is becoming the key determinant for whether AI Agents can achieve scalable, cost-effective enterprise adoption.

As the comprehensive analysis above shows, many so-called "Agent intelligence" problems are still essentially about finding and presenting the right information to the LLM at the right time. If an Agent has a powerful "external brain" (Context Engine) helping it efficiently search, filter, verify, and organize all relevant data, the final user experience and results will far surpass a "bare" model relying solely on vast parameters and complex prompts.

The evolution from Context Engineering to Context Engine/Platform signifies that context creation, management, and delivery are shifting from a highly expert-dependent, manual craft towards a highly automated, productized, and operational platform science.

Currently, developing custom Agents often requires significant engineering and data science effort to manually craft complex prompt templates, meticulously design context stitching logic, and hardcode them into workflow orchestration. This approach is hard to maintain, doesn't scale, and leads to chaotic context ownership and update processes.

Future Context Platforms aim to change this:

Context creation will be automated through deep integration with data sources, with continuous synchronization and updates.
Context delivery will no longer be hardcoded. Instead, based on real-time intent from each conversation, a platform's unified retrieval and orchestration layer will dynamically retrieve and assemble context from various data sources.
Context maintenance will shift from vendor-led manual services to automated, customer-managed, visually configurable processes. Context ownership and control will clearly belong to the customer.

This paradigm shift is massive. The core value of enterprise AI adoption is shifting from pursuing the "smartest model" and "most clever prompts" back to building the "richest, most accurate, most usable context."

Context quality, real-time nature, dynamic assembly capability, and productization level will directly determine the competitiveness of next-generation enterprise AI applications. Context product will be the final key to unlocking large-scale AI application. It marks the official transition of enterprise AI from the early "handcrafted customization" stage to the "platform, scalable, sustainably operable" industrial era.

In our 2024 year-end review, we predicted multimodal RAG based on "late interaction" would be a 2025 technical keyword. However, this prediction didn't fully materialize. Does this mean multimodal RAG lacks practical significance? The answer is clearly no.

Taking test results from M3Retrieve [Ref 4], a benchmark dataset designed for medical literature, as an example, the value and positioning of multimodal RAG are clearly demonstrated:

Excels in Text-Image Combined Tasks: In scenarios like visual context retrieval or image-based Q&A, multimodal RAG leveraging both text and visual information performs significantly better than text-only solutions.
Less Advantage in Text-Dominant Tasks: For tasks like summarization or pure-text patient record retrieval, mature single-modal RAG, with its efficiency and precision, still holds advantage.

Thus, production-grade multimodal RAG isn't simply stitching image and text models. Its core demand is achieving truly efficient cross-modal retrieval and understanding. Technologically, multimodal RAG is undoubtedly the inevitable direction for RAG systems to deepen and cover broader data types.

Currently, two main technical paths exist for handling images and other multi-modal documents:

Modality Conversion Path: Using OCR or VLMs to convert images, tables, etc., into pure text descriptions. This reduces the multimodal problem to a single-modal text retrieval problem. It's compatible with existing text RAG architecture but risks losing crucial raw visual layout, color, and finer characterics.
Native Multimodal Path: Directly tokenizing images visually, inputting a unified multimodal encoder alongside text tokens to generate fused multi-vector representations. During retrieval ranking, late interaction models enable fine-grained similarity calculation.

A theoretical guidance article from Google DeepMind in September this year [Ref 5] indicates: retrieval based on a single global vector has inherent semantic loss, while using multi-vectors (high-dimensional tensors) preserves document fine-grained semantic information more completely.

Given such clear theoretical advantages, why haven't mature, productized multimodal RAG solutions emerged in 2025? The fundamental reason is that engineering challenges from prototype to stable product are not fully overcome yet.

A primary challenge for engineering cross-modal retrieval is determining recall units and strategies. Using the document "page" as the recall unit is common. Specific implementation schemes include:

Hybrid Indexing Scheme: Build full-text and single-vector indices for text content, and separate tensor indices for images. During retrieval, perform hybrid search and fuse results from the three indices.
Text-First, Tensor Reranking Scheme: Use only text parsed from PDFs to build full-text and single-vector indices for initial recall. Then, use tensor representations generated from the entire page as an image to perform finer reranking of initial results. This scheme can also be used for pure text retrieval to achieve better semantic recall.
Pure Tensor Retrieval Scheme: Rely soly on tensor indices. For cross-modal pages, calculate similarity between the query and text tensors, and image tensors separately, taking the maximum as the page's final relevance score.

While technically feasible, these schemes share a tricky engineering bottleneck: exploding storage and computational costs from exploding tensor data due to massive token counts.

Assuming a model like ColPali outputs 1024 tokens per page image (each corresponding a feature vector), with vector dimension 128 and float32 precision, the tensor data for a single page occupies ~512KB. For a million-page document base, the index size expands to TB levels—a collossal challenge for storage cost, memory loading, and retrieval latency.

To advance multimodal RAG towards practical engineering, two main technical paths are currently available:

Tensor Quantization Compression: Binarize or apply low-bit quantization (e.g., 1-bit) to each vector in the tensor, compressing storage to 1/32 or less of the original. The cost is some precision loss. Feasibility depends on training specialized embedding models robust to quantization.
Token Pruning: Significantly reduce the number of tokens generated per page image, e.g., from 1024 to 128 or fewer. Specific methods include:
- Random Projection: Project vectors from many tokens into a single high-dimensional vector (e.g., 10k dimensions), as in the MUVERA algorithm [Ref 6]. Computationally efficient but loses significant fine-grained information.
- Token Clustering: Perform unsupervised clustering on vectors generated tokens, using cluster centers to represent the original set. This doesn't require model changes but also sacrifices precision.
- Model-Side Token Pruning: Modify the Embedding model (especially its underlying VLM) to actively output fewer but more informative "refined" tokens based on its internal attention mechanism. This is the most fundamental method.

Therefore, the maturation of multimodal RAG requires not only that the underlying retrieval engine natively and efficiently supports Tensor Index and Tensor Reranker components, but also relies on the broader AI community producing more next-generation multimodal Embedding models that are quantization-friendly and support adaptive token pruning.

These developments are synergistic: lacking a mature, easy-to-use Retrieval Engine hinders validation and iteration of new models in real scenarios; without efficient models, the Retrieval Engine has nothing to leverage. Fortunately, this topic is gaining widespread attention, and a workshop dedicated to late interaction is scheduled for early 2026 [Ref 7].

Looking ahead to 2026, as AI infrastructure layers improve support for tensor computation and storage, we can expect more superior multimodal models tailored for engineering to emerge, truly unlocking the practical potential of cross-modal RAG.

Its engineering implementation will naturally give rise to widespread demand for multimodal contexts. For example, "multimodal memory" systems capable of simultaneously understanding and remembering text, images, and even video are no longer merely theoretical concepts but are already in the prototyping phase [Ref 16].

Looking ahead to 2026

Entering 2026, the enterprise focus on LLM applications will inevitably shift from proof-of-concept and early experimentation toward pragmatic, large-scale adoption and ROI. In this process, RAG technology, as the foundational data layer for all AI applications, will experience a wave of more robust and systematic construction.

While the ultimate value and form of AI Agents in complex enterprise scenarios remain in broad exploration, the foundational supporting role of RAG for all Agents—and indeed for all LLM applications—has become a consensus among a growing number of technical decision-makers.

A clear trend is that many enterprises have already taken the lead in building capabilities centered around an "AI Middle Platform" or an "Intelligent Data Foundation," whose core is precisely an unstructured data processing and provisioning platform built upon RAG engines. In contrast, the concrete development and deployment of upper-layer Agents can proceed on this stable foundation with greater flexibility and a more incremental pace.

To summarize the evolution trajectory and future outlook of RAG technology from 2025 to 2026 in one sentence: RAG is undergoing its own profound metamorphosis, evolving from the specific pattern of "Retrieval-Augmented Generation" into a "Context Engine" with "intelligent retrieval" as its core capability.

This evolutionary trend is now irreversible. It will move from the technical backend to the strategic forefront, becoming an indispensable core component for enterprises constructing next-generation intelligent infrastructure.

And RAGFlow is built precisely for this grand yet definitive future vision. We are building not just an open-source, high-efficiency RAG system, but the key cornerstone and driving force toward the era of the "Context Engine."

From our deep cultivation in multimodal parsing and semantic enhancement with DeepDoc, to exploring cutting-edge architectures like TreeRAG to bridge semantic gaps, to building a Context Engine that serves Agents—every iteration of RAGFlow aims to transform the technical directions discussed here into stable, user-friendly, high-performance product capabilities, steadily advancing enterprise intelligence from concept to reality.

We firmly believe that a retrieval-centric Context Engine is key to unlocking the full potential of LLMs and Agents.

Stay tuned for RAGFlow's evolution, visit GitHub to give us a star, and join developers worldwide in witnessing and building the next generation of enterprise AI infrastructure.

References

AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference https://arxiv.org/abs/2504.10326?
https://github.com/VectifyAI/PageIndex
A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems https://arxiv.org/abs/2512.05411
M3Retrieve: Benchmarking Multimodal Retrieval for Medicine https://arxiv.org/abs/2510.06888
On the Theoretical Limitations of Embedding-Based Retrieval https://arxiv.org/abs/2508.21038
MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings https://arxiv.org/abs/2405.19504
https://www.lateinteraction.com/
https://huggingface.co/blog/QuentinJG/introducing-vidore-v3
诞生才一周年，MCP凉了 https://mp.weixin.qq.com/s/LskoLb8g6t_PCGNSvlLh6g
https://huggingface.co/datasets/bowang0911/ToolSearch
Dspy GEPA https://dspy.ai/api/optimizers/GEPA/overview/
Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models https://arxiv.org/abs/2510.04618
Why the Business Context Layer Is the Key to Making AI Work in Enterprise https://theoryvc.com/blog-posts/business-context-layer
From Context Engineering to Context Platforms https://theoryvc.com/blog-posts/from-context-engineering-to-context-platforms
https://www.linkedin.com/pulse/every-llm-company-search-hard-future-retrieval-systems-7zigc/
MemVerse: Multimodal Memory for Lifelong Learning Agents https://arxiv.org/abs/2512.03627

RAGFlow’s Seamless Upgrade - from 0.21 to 0.22 and Beyond

Wed, 19 Nov 2025 00:00:00 GMT

Background

From version 0.22.0, RAGFlow no longer ships a full Docker image with built-in embedding models. Previously, some users relied on the bundled embedding models in that image to build their datasets.

After upgrading to 0.22.0, those models are no longer available, which leads to several issues: the embedding model originally used by the dataset is missing; you cannot add new documents; retrieval in the dataset stops working properly; and you cannot switch to a new embedding model because of the old logic constraints. To address these compatibility problems after upgrade, we introduced important improvements in version 0.22.1.

0.22.1 capabilities

Datasets containing parsed data are allowed to switch embedding models

Starting from RAGFlow 0.22.1, a safer, automated embedding compatibility check is introduced, allowing users to switch embedding models on data-containing datasets. To ensure the new embedding model does not disrupt the original vector space structure, RAGFlow performs the following checks:

Sample extraction: Randomly selects a few chunks (e.g., 5–10) from the current dataset as representative samples.
Re-encoding: Generates new vectors for the sampled chunks using the new embedding model chosen by the user.
Similarity calculation: For each chunk, calculates the cosine similarity between new and old vectors.
Switch decision: If the average similarity is 0.9 or above, the new and old models are deemed sufficiently consistent in vector space, and the switch is allowed. If below 0.9, the model switch request is denied.

Why use a 0.9 threshold?

The threshold is set to 0.9 because models with the same name from different model providers can have minor version differences, and RAGFlow’s embeddings also vary with strategies and parameters, so a new model cannot perfectly reproduce the old embedding environment. These “small differences” still usually give an average similarity above 0.9, so 0.9 works well as a cut-off for “safe to swap” models. In contrast, embeddings from completely different model families (for example, MiniLM to BGE‑M3) tend to sit around 0.3–0.6 in similarity, so they fall below this threshold and are correctly blocked, preventing a scrambled vector space.

How to switch embedding model

Configure a new model in the model settings interface to replace the unusable default model.

Navigate to the dataset's Configuration page, select the same model name from the new provider, and wait for the model switch to complete.

If switching the embedding model fails, an error message will appear.

Enter Retrieval testing to self-assess the new embedding model.

Functions involving dataset retrieval in chat apps for example are now working properly.

Our future releases will feature more sophisticated upgrade tools and automation, simplifying migration from older versions and reducing the maintenance burden for users.

RAGFlow 0.22.0 Overview — Supported Data Sources, Enhanced Parser, Agent Optimizations, and Admin UI

Tue, 11 Nov 2025 00:00:00 GMT

0.22 Highlights

Building a RAGFlow dataset involves three main steps: file upload, parsing, and chunking. Version 0.21.0 made the parsing and chunking stages more flexible with the Ingestion pipeline.

This 0.22.0 release focuses on the data upload step to help developers build datasets faster.

We also added these key improvements:

The Parser component in the Ingestion pipeline now offers more model choices for better file parsing.
We optimized the Agent's Retrieval and Await response components.
A new Admin UI gives you a clearer and easier way to manage the system.

Support for Rich External Data Sources

The new Data Sources module lets you connect external data to a Dataset. You can now sync files from different places directly into RAGFlow.

Use the "Data Sources" menu in your personal center to add and set up sources like Confluence, AWS S3, Google Drive, Discord, and Notion. This lets you manage all your data in one place and sync it automatically.

Example: S3 Configuration

Make sure you have an S3 storage bucket in your AWS account.

Add your S3 details to the S3 data source form.

After you add it, click the settings icon to see the data source details.

If you set "Refresh Freq" to "1", the system will check for new files every minute.
RAGFlow watches your specified S3 Bucket (like ragflow-bucket). If it finds new files, it immediately starts syncing them.
After syncing, it waits one minute before checking again. Use the "Pause" button to turn this automatic refresh on or off anytime.

Linking Data Sources to a dataset

Create a new dataset (for example, TEST_S3).
Click Configuration and go to the bottom of the page.
Click Link Data Source and pick the data source you want (like S3).

After you link successfully, you'll see three icons:

Rebuild: Click this to delete all files and logs in the dataset and import everything again.
Settings: Check the sync logs here.
Unlink: This disconnects the data source. It keeps all the files already in the dataset but stops new syncs.

Status Messages in Logs:

Scheduled: The task is in line, waiting for its next turn to check for files.
Running: The system is moving files right now.
Success: It finished checking for new files.
Failed: The upload didn't work. Check the error message for details.
Cancel: You paused the transfer.

You can link multiple data sources to one dataset, and one data source can feed into many datasets.

Enhanced Parser

MinerU

RAGFlow now works with MinerU 2.6.3 as another option for parsing PDFs. It supports different backends like pipeline, vlm-transformers, vlm-vlm-engine, and http-client.

The idea is simple: RAGFlow asks MinerU to parse a file, reads the results, and adds them to your dataset.

Key Environment Variables:

Variable	Explanation	Default	Example
`MINERU_EXECUTABLE`	Path to MinerU on your computer	`mineru`	`MINERU_EXECUTABLE=/home/ragflow/uv_tools/.venv/bin/mineru`
`MINERU_DELETE_OUTPUT`	Keep or delete MinerU's output files?	`1` (delete)	`MINERU_DELETE_OUTPUT=0` (keep)
`MINERU_OUTPUT_DIR`	Where to put MinerU's output	System temp folder	`MINERU_OUTPUT_DIR=/home/ragflow/mineru/output`
`MINERU_BACKEND`	Which MinerU backend to use	`pipeline`	`MINERU_BACKEND=vlm-transformers`

Starting Up:

If you use the vlm-http-client backend, set the server address with MINERU_SERVER_URL.
To connect to a remote MinerU parser, use MINERU_APISERVER to give its address.

How to Start:

From Source: Install MinerU by itself (its dependencies can conflict with RAGFlow's). Then set the environment variables and start the RAGFlow server.
Using Docker: Set USE_MINERU=true in docker/.env and restart your containers.

Docling

RAGFlow also supports Docling as another PDF parser. It works the same way as MinerU.

Docling finds the text, formulas, tables, and images in a document. RAGFlow then uses what Docling finds.

What Docling Can Do:

Pull out text (paragraphs, headings, lists).
Extract math formulas.
Identify tables and images (and save them).
Mark where everything is located.

Starting Up: Set USE_DOCLING=true in docker/.env and restart your containers.

Agent Optimizations

Retrieval Now Uses Metadata

You can now add tags (metadata) to files in your dataset. During retrieval, the Agent can use these tags to filter results, so it only looks at specific files instead of the whole library.

Example: Imagine a dataset full of AI papers. Some are about AI agents, others are about evaluating AI. If you want a Q&A assistant that only answers evaluation questions, you can add a tag like "Topic": "Evaluation" to the right papers. When the Agent retrieves information, it will filter for just those tagged files.

Before, this only worked in the Chat app. Now the Agent's Retrieval component can do it too.

Agent Teamwork Gets Better

You can now use an upstream Agent's output in the Await Response component's message.

The Old Way: The message in the Await Response component was always static text.

The New Way: You can insert dynamic content from earlier in the workflow, like a plan from a Planning Agent.

This is great for "Deep Research" agents or any time you need a human to check the work before continuing. It's also a key part of future improvements to the Ingestion Pipeline.

You can find this use case as a ready-to-use template in the agent template library.

Admin UI

This version adds a new Admin UI, a visual dashboard made for system administrators.

It takes jobs you used to do with commands and puts them in a simple interface, making management much easier.

See System Status Instantly

The Service Status dashboard shows the health of all core services. It lists their name, type, host, port, and status. If something goes wrong (like Elasticsearch times out), you can find the problem fast and copy the address to test it, without logging into different servers.

The UI also shows service details. You can see detailed logs and connection info (like database passwords) without ever touching a server command line. This makes fixing problems quicker and keeps the system more transparent and secure.

Manage Users Easily

The User Management section lets you create, enable, disable, reset passwords for, and delete users. You can quickly find users by email or nickname and see what datasets and agents they own.

Finale

RAGFlow 0.21.0 gave you a powerful Ingestion pipeline for your data. Now, RAGFlow 0.22.0 connects to all the places your data lives. Together, they help you break down "data silos" and gather everything in one spot to power your LLMs.

We also improved how Agents and people work together. Now you can step into the Agent's workflow and guide it, working as a team to get better, more accurate results than full automation alone.

We will keep adding more data sources, better parsers, and smarter pipelines to make RAGFlow the best data foundation for your LLM applications.

GitHub: https://github.com/infiniflow/ragflow

Reference:

https://ragflow.io/docs/dev/faq#how-to-use-mineru-to-parse-pdf-documents

RAGFlow in Practice - Building an Agent for Deep-Dive Analysis of Company Research Reports

Thu, 30 Oct 2025 00:00:00 GMT

In the actual work of the investment research department of financial institutions, analysts are exposed to a vast amount of industry and company analysis reports, third-party research data, and real-time market dynamics on a daily basis, with diverse and scattered information sources. The job of financial analysts is to swiftly formulate clear investment recommendations based on the aforementioned information, such as specifically recommending which stocks to buy, how to adjust portfolio allocations, or predicting the next direction of an industry. Therefore, we have developed the "Intelligent Investment Research Assistant" to help financial analysts quickly organize information. It can automatically capture company data, integrate financial indicators, and compile research report viewpoints, enabling analysts to determine within minutes whether a stock is worth buying, eliminating the need to sift through piles of materials and allowing them to focus their time on genuine investment decision-making. To achieve this goal, we have designed a comprehensive technical process.

The technical solution revolves around a core business process:

When an analyst poses a question, the system identifies the company name or abbreviation from the question and retrieves the corresponding stock code with the assistance of a search engine. If identification fails, a prompt is returned directly with the company code. After successfully obtaining the stock code, the system retrieves the company's core financial indicators from data interfaces, organizes and formats the data, and generates a clear financial table. Building on this, intelligent analysis further integrates research report information: on one hand, it gathers the latest authoritative research reports and market viewpoints, and on the other hand, it retrieves relevant research report content from the internal knowledge base. Ultimately, these organized financial data and research report information are combined into a comprehensive response, facilitating analysts in quickly reviewing key indicators and core viewpoints.

The workflow after orchestration is as follows:

This case utilizes RAGFlow to implement a complete workflow, ranging from stock code extraction, to the generation of company financial statements, and finally to the integration and output of research report information.

The following sections will provide a detailed introduction to the implementation process of this solution.

1. Preparing the Dataset

1.1 Create a dataset

The dataset required for this example can be downloaded from Hugging Face Datasets1.

Create an "Internal Stock Research Report" dataset and import the corresponding dataset documents.

1.2 Parse documents

For the documents in the "Internal Stock Research Report" dataset, we have selected the parsing and slicing method called Paper.

Research report documents typically include modules such as abstracts, core viewpoints, thematic analyses, financial forecast tables, and risk warnings. The overall structure follows a more thesis-like logical progression rather than a strictly hierarchical table of contents. If sliced based on the lowest-level headings, it can easily disrupt the coherence between paragraphs and tables.

Therefore, RAGFlow is better suited to adopt the "Paper" slicing approach, using chapters or logical paragraphs as the fundamental units. This approach not only preserves the integrity of the research report's structure but also facilitates the model's quick location of key information during retrieval.

The preview of the sliced financial report is as follows:

2. Building the Intelligent Agent

2.1 Create an application.

After successful creation, the system will automatically generate a "Start" node on the canvas.

In the "Start" node, you can set the initial greeting of the assistant, for example: "Hello! I'm your stock research assistant."

2.2 Build the function of "Extract Stock Codes"

2.2.1 Agent extracts stock codes

Use an Agent node and attach a TavilySearch tool to identify stock names or abbreviations from the user's natural language input and return a unique standard stock code. When no match is found, uniformly output "Not Found."

In financial scenarios, users' natural language is often ambiguous. For example:

"Help me check the research report on Apple Inc."
"How is NVIDIA's financial performance?"
"What's the situation with the Shanghai Composite Index today?"

These requests all contain stock-related information, but the system can only further query financial reports, research reports, or market data after accurately identifying the stock code.

This is why we need an Agent with the function of "extracting stock codes."

Below is the system prompt for this Agent:

 

Your responsibility is: to identify and extract the stock name or abbreviation from the user's natural language query and return the corresponding unique stock code. 

 



 

1. Only one result is allowed: - If a stock is identified → return the corresponding stock code only; - If no stock is identified → return “Not Found” only. 

2. **Do not** output any extra words, punctuation, explanations, prefixes, suffixes, or newline prompts. 3. The output must strictly follow the . 



Output only the stock code (e.g., AAPL or 600519)
Or output “Not Found”




User input: “Please check the research report for Apple Inc.” → Output: AAPL
User input: “How is the financial performance of Moutai?” → Output: 600519
User input: “How is the Shanghai Composite Index performing today?” → Output: Not Found



 - Tavily Search: You may use this tool to query when you're uncertain about the stock code. - If you're confident, there's no need to use the tool. 

 



 - Only output the result, no explanations, prompts, or instructions allowed. - The output can only be the stock code or “Not Found,” otherwise, it will be considered an incorrect answer.

2.2.2 Conditional node for identifying stock codes

Use a conditional node to evaluate the output result of the previous Agent node and guide the process flow based on different outcomes:

If the output is a stock code: It indicates successful identification of the stock, and the process will proceed to the "Case1" branch.
If the output contains "Not Found": It indicates that no valid stock name was identified from the user's input, and the process will proceed to the "Else" branch, where it will execute a node for replying with an irrelevant message, outputting "Your query is not supported."

2.3 Build the "Company Financial Statements" feature

The data for this feature is sourced from financial data provided by Yahoo Finance. By calling this API, we obtain core financial data for specified stocks, including operating revenue, net profit, etc., which drives the generation of the "Company Financial Statements."

2.3.1 Yahoo Finance Tools: Request for Financial Data

By using the "Yahoo Finance Tools" node, select "Balance sheet" and pass the stockCode output by the upstream Agent as a parameter. This allows you to fetch the core financial indicators of the corresponding company.

The returned results contain key data such as total assets, total equity, and tangible book value, which are used to generate the "Company Financial Statements" feature.

2.3.2 Financial table generation by Code node

Utilize the Code node to perform field mapping and numerical formatting on the financial data returned by Yahoo Finance Tools through Python scripts, ultimately generating a Markdown table with bilingual indicator comparisons, enabling a clear and intuitive display of the "Company Financial Statements."

Code:

import re

def format_number(value: str) -> str:
    """Convert scientific notation or floating-point numbers to comma-separated numbers"""
    try:
        num = float(value)
        if num.is_integer():
            return f"{int(num):,}"  # If it's an integer, format without decimal places
        else:
            return f"{num:,.2f}"  # Otherwise, keep two decimal places and add commas
    except:
        return value  # Return the original value if it's not a number (e.g., — or empty)

def extract_md_table_single_column(input_text: str) -> str:
    # Use English indicators directly
    indicators = [
        "Total Assets", "Total Equity", "Tangible Book Value", "Total Debt", 
        "Net Debt", "Cash And Cash Equivalents", "Working Capital", 
        "Long Term Debt", "Common Stock Equity", "Ordinary Shares Number"
    ]
  
    # Core indicators and their corresponding units
    unit_map = {
        "Total Assets": "USD",
        "Total Equity": "USD",
        "Tangible Book Value": "USD",
        "Total Debt": "USD",
        "Net Debt": "USD",
        "Cash And Cash Equivalents": "USD",
        "Working Capital": "USD",
        "Long Term Debt": "USD",
        "Common Stock Equity": "USD",
        "Ordinary Shares Number": "Shares"
    }

    lines = input_text.splitlines()

    # Automatically detect the date column, keeping only the first one
    date_pattern = r"\d{4}-\d{2}-\d{2}"
    header_line = ""
    for line in lines:
        if re.search(date_pattern, line):
            header_line = line
            break

    if not header_line:
        raise ValueError("Date column header row not found")

    dates = re.findall(date_pattern, header_line)
    first_date = dates[0]  # Keep only the first date
    header = f"| Indicator | {first_date} |"
    divider = "|------------------------|------------|"

    rows = []
    for ind in indicators:
        unit = unit_map.get(ind, "")
        display_ind = f"{ind} ({unit})" if unit else ind

        found = False
        for line in lines:
            if ind in line:
                # Match numbers and possible units
                pattern = r"(nan|[0-9\.]+(?:[eE][+-]?\d+)?)"
                values = re.findall(pattern, line)
                # Replace 'nan' with '—' and format the number
                first_value = values[0].strip() if values and values[0].strip().lower() != "nan" else "—"
                first_value = format_number(first_value) if first_value != "—" else "—"
                rows.append(f"| {display_ind} | {first_value} |")
                found = True
                break
        if not found:
            rows.append(f"| {display_ind} | — |")

    md_table = "\n".join([header, divider] + rows)
    return md_table

def main(input_text: str):
    return extract_md_table_single_column(input_text)

We have also received requests from everyone expressing a preference not to extract JSON fields through coding, and we will gradually provide solutions in future versions.

2.4 Build the "Research Report Information Extraction" function

Utilize an information extraction agent, which, based on the stockCode, calls the AlphaVantage API to extract the latest authoritative research reports and insights. Meanwhile, it invokes the internal research report retrieval agent to obtain the full text of the complete research reports. Finally, it outputs the two parts of content separately in a fixed structure, thereby achieving an efficient information extraction function.

System prompt:

 

You are the information extraction agent. You understand the user’s query and delegate tasks to alphavantage and the internal research report retrieval agent. 

 



 1. Based on the stock code output by the "Extract Stock Code" agent, call alphavantage's EARNINGS_CALL_TRANSCRIPT to retrieve the latest information that can be used in a research report, and store all publicly available key details.


2. Call the "Internal Research Report Retrieval Agent" and save the full text of the research report output. 

3. Output the content retrieved from alphavantage and the Internal Research Report Retrieval Agent in full. 





The output must be divided into two sections:
#1. Title: “alphavantage”
Directly output the content collected from alphavantage without any additional processing.
#2. Title: "Internal Research Report Retrieval Agent"
Directly output the content provided by the Internal Research Report Retrieval Agent.

2.4.1 Configure the MCP tool

Add the MCP tool:

Add the MCP tool under the agent and select the required method, such as "EARNINGS_CALL_TRANSCRIPT".

2.4.2 Internal Research Report Retrieval Agent

The key focus in constructing the internal research report retrieval agent lies in accurately identifying the company or stock code in user queries. It then invokes the Retrieval tool to search for research reports from the dataset and outputs the full text, ensuring that information such as data, viewpoints, conclusions, tables, and risk warnings is not omitted. This enables high-fidelity extraction of research report content.

System Prompt:

 

Read user input → Identify the involved company/stock (supports abbreviations, full names, codes, and aliases) → Retrieve the most relevant research reports from the dataset → Output the full text of the research report, retaining the original format, data, chart descriptions, and risk warnings. 





 

1. Exact Match: Prioritize exact matches of company full names and stock codes. 

2. Content Fidelity: Fully retain the research report text stored in the dataset without deletion, modification, or omission of paragraphs. 

3. Original Data: Retain table data, dates, units, etc., in their original form. 

4. Complete Viewpoints: Include investment logic, financial analysis, industry comparisons, earnings forecasts, valuation methods, risk warnings, etc. 

5. Merging Multiple Reports: If there are multiple relevant research reports, output them in reverse chronological order. 

6. No Results Feedback: If no matching reports are found, output “No related research reports available in the dataset.”

2.5 Add a Research Report Generation Agent

The research report generation agent automatically extracts and structurally organizes financial and economic information, generating foundational data and content for investment bank analysts that are professional, retain differentiation, and can be directly used in investment research reports.

 

You are a senior investment banking (IB) analyst with years of experience in capital market research. You excel at writing investment research reports covering publicly listed companies, industries, and macroeconomics. You possess strong financial analysis skills and industry insights, combining quantitative and qualitative analysis to provide high-value references for investment decisions. 

**You are able to retain and present differentiated viewpoints from various reports and sources in your research, and when discrepancies arise, you do not merge them into a single conclusion. Instead, you compare and analyze the differences.** 


 




 

You will receive financial information extracted by the information extraction agent.

 



Based on the content returned by the information extraction agent (no fabrication of data), write a professional, complete, and structured investment research report. The report must be logically rigorous, clearly organized, and use professional language, suitable for reference by fund managers, institutional investors, and other professional readers.
When there are differences in analysis or forecasts between different reports or institutions, you must list and identify the sources in the report. You should not select only one viewpoint. You need to point out the differences, their possible causes, and their impact on investment judgments.




##1. Summary
Provide a concise overview of the company’s core business, recent performance, industry positioning, and major investment highlights.
Summarize key conclusions in 3-5 sentences.
Highlight any discrepancies in core conclusions and briefly describe the differing viewpoints and areas of disagreement.
##2. Company Overview
Describe the company's main business, core products/services, market share, competitive advantages, and business model.
Highlight any differences in the description of the company’s market position or competitive advantages from different sources. Present and compare these differences.
##3. Recent Financial Performance
Summarize key metrics from the latest financial report (e.g., revenue, net profit, gross margin, EPS).
Highlight the drivers behind the trends and compare the differential analyses from different reports. Present this comparison in a table.
##4. Industry Trends & Opportunities
Overview of industry development trends, market size, and major drivers.
If different sources provide differing forecasts for industry growth rates, technological trends, or competitive landscape, list these and provide background information. Present this comparison in a table.
##5. Investment Recommendation
Provide a clear investment recommendation based on the analysis above (e.g., "Buy/Hold/Neutral/Sell"), presented in a table.
Include investment ratings or recommendations from all sources, with the source and date clearly noted.
If you provide a combined recommendation based on different viewpoints, clearly explain the reasoning behind this integration.
##6. Appendix & References
List the data sources, analysis methods, important formulas, or chart descriptions used.
All references must come from the information extraction agent and the company financial data table provided, or publicly noted sources.
For differentiated viewpoints, provide full citation information (author, institution, date) and present this in a table.




Language Style: Financial, professional, precise, and analytical.
Viewpoint Retention: When there are multiple viewpoints and conclusions, all must be retained and compared. You cannot choose only one.
Citations: When specific data or viewpoints are referenced, include the source in parentheses (e.g., Source: Morgan Stanley Research, 2024-05-07).
Facts: All data and conclusions must come from the information extraction agent or their noted legitimate sources. No fabrication is allowed.
Readability: Use short paragraphs and bullet points to make it easy for professional readers to grasp key information and see the differences in viewpoints.




Generate a complete investment research report that meets investment banking industry standards, which can be directly used for institutional investment internal reference, while faithfully retaining differentiated viewpoints from various reports and providing the corresponding analysis.





All section headings in the investment research report must be formatted as N. Section Title (e.g., 1. Summary, 2. Company Overview), where:
The heading number is followed by a period and the section title.
The entire heading (number, period, and title) is rendered in bold text (e.g., using  in HTML or equivalent bold formatting, without relying on Markdown ** syntax).
Do not use ##, **, or any other prefix before the heading number.
Apply this format consistently to all section headings (Summary, Company Overview, Recent Financial Performance, Industry Trends & Opportunities, Investment Recommendation, Appendix & References).

2.6 Add a Reply Message Node

The reply message node is used to output the "financial statements" and "research report content" that are the final outputs of the workflow.

2.7 Save and Test

Click "Save" - "Run" - and view the execution results. The entire process takes approximately 5 minutes to run. Execution Results:

Log: The entire process took approximately 5 minutes to run.

Summary and Outlook

This case study has constructed a complete workflow for stock research reports using RAGFlow, encompassing three core steps:

Utilizing an Agent node to extract stock codes from user inputs.

Acquiring and formatting company financial data through Yahoo Finance tools and Code nodes to generate clear financial statements.

Invoking information extraction agents and an internal research report retrieval agent, and using a research report generation agent to output the latest research report insights and the full text of complete research reports, respectively.

The entire process achieves automated handling from stock code identification to the integration of financial and research report information.

We observe several directions for sustainable development: More data sources can be incorporated to make analytical results more comprehensive, while providing a code-free method for data processing to lower the barrier to entry. The system also has the potential to analyze multiple companies within the same industry, track industry trends, and even cover a wider range of investment instruments such as futures and funds, thereby assisting analysts in forming superior investment portfolios. As these features are gradually implemented, the intelligent investment research assistant will not only help analysts make quicker judgments but also establish an efficient and reusable research methodology, enabling the team to consistently produce high-quality analytical outputs.

RAGFlow Named Among GitHub’s Fastest-Growing Open Source Projects, Reflecting Surging Demand for Production-Ready AI

Tue, 28 Oct 2025 00:00:00 GMT

The release of GitHub’s 2025 Octoverse report marks a pivotal moment for the open source ecosystem—and for projects like RAGFlow, which has emerged as one of the fastest-growing open source projects by contributors this year. With a remarkable 2,596% year-over-year growth in contributor engagement, RAGFlow isn’t just gaining traction—it’s defining the next wave of AI-powered development.

The Rise of Retrieval-Augmented Generation in Production

As the Octoverse report highlights, AI is no longer experimental—it’s foundational. More than 4.3 million AI-related repositories now exist on GitHub, and over 1.1 million public repos import LLM SDKs, a 178% YoY increase. In this context, RAGFlow’s rapid adoption signals a clear shift: developers are moving beyond prototyping and into production-grade AI workflows.

RAGFlow—an end-to-end retrieval-augmented generation engine with built-in agent capabilities—is perfectly positioned to meet this demand. It enables developers to build scalable, context-aware AI applications that are both powerful and practical. As the report notes, “AI infrastructure is emerging as a major magnet” for open source contributions, and RAGFlow sits squarely at the intersection of AI infrastructure and real-world usability.

Why RAGFlow Resonates in the AI Era

Several trends highlighted in the Octoverse report align closely with RAGFlow’s design and mission:

From Notebooks to Production: The report notes a shift from Jupyter Notebooks (+75% YoY) to Python codebases, signaling that AI projects are maturing. RAGFlow supports this transition by offering a structured, reproducible framework for deploying RAG systems in production.

Agentic Workflows Are Going Mainstream: With the launch of GitHub Copilot coding agent and the rise of AI-assisted development, developers are increasingly relying on tools that automate complex tasks. RAGFlow’s built-in agent capabilities allow teams to automate retrieval, reasoning, and response generation—key components of modern AI apps.

Security and Scalability Are Top of Mind: The report also highlights a 172% YoY increase in Broken Access Control vulnerabilities, underscoring the need for secure-by-design AI systems. RAGFlow’s focus on enterprise-ready deployment helps teams address these challenges head-on.

A Project in Active Development

RAGFlow's evolution mirrors a deliberate journey—from solving foundational RAG challenges to shaping the next generation of enterprise AI infrastructure.

The project first made its mark by systematically addressing core RAG limitations through integrated technological innovation. With features such as deep document understanding for parsing complex formats, hybrid retrieval that blends multiple search strategies, and built-in advanced tools like GraphRAG and RAPTOR, RAGFlow established itself as an end-to-end solution that dramatically enhances retrieval accuracy and reasoning performance.

Now, building on this robust technical foundation, RAGFlow is steering toward a bolder vision: to become the superior context engine for enterprise-grade Agents. Evolving from a specialized RAG engine into a unified, resilient context layer, RAGFlow is positioning itself as the essential data foundation for LLMs in the enterprise—enabling Agents of any kind to access rich, precise, and secure context, ensuring reliable and effective operation across all tasks.

RAGFlow is an open source retrieval-augmented generation engine designed for building production-ready AI applications. To learn more or contribute, visit the RAGFlow GitHub repository.

This post was inspired by insights from the GitHub Octoverse 2025 Report. Special thanks to the GitHub team for amplifying the voices of open source builders everywhere.

Is data processing like building with lego? Here is a detailed explanation of the ingestion pipeline.

Thu, 23 Oct 2025 00:00:00 GMT

Since its open-source release, RAGFlow has consistently garnered widespread attention from the community. Its core module, DeepDoc, leverages built-in document parsing models to provide intelligent document-sharding capabilities tailored for multiple business scenarios, ensuring that RAGFlow can deliver accurate and high-quality answers during both the retrieval and generation phases. Currently, RAGFlow comes pre-integrated with over a dozen document-sharding templates, covering various business scenarios and file types.

However, as RAGFlow becomes increasingly widely adopted in production environments, the original dozen-plus fixed sharding methods have struggled to keep pace with the complex and diverse array of data sources, document structures, and file types encountered. Specific challenges include:

The need to flexibly configure different parsing and sharding strategies based on specific business scenarios to accommodate varied document structures and content logic.

Document parsing and ingestion involve not only segmenting unstructured data into text blocks but also encompass a series of critical preprocessing steps to bridge the "semantic gap" during RAG retrieval. This often requires leveraging models to enrich raw content with semantic information such as summaries, keywords, and hierarchical structures.

In addition to locally uploaded files, a significant amount of data, files, and knowledge originate from various sources, including cloud drives and online services.

With the maturation of multimodal vision-language models (VLMs), models like MinerU and Docling, which excel in parsing documents with complex layouts, tables, and mixed text-image arrangements, have emerged. These models demonstrate unique advantages across various application scenarios.

To address these challenges, RAGFlow 0.21.0 has introduced a groundbreaking Ingestion pipeline. This pipeline restructures the cleaning process for unstructured data, allowing users to construct customized data-processing pipelines tailored to specific business needs and enabling precise parsing of heterogeneous documents.

The Ingestion Pipeline is essentially a visual ETL process tailored for unstructured data. Built upon an Agent foundation, it restructures a typical RAG data ingestion workflow—which usually encompasses key stages such as document parsing, text chunking, vectorization, and index construction—into three distinct phases: Parser, Transformer, and Indexer. These phases correspond to document parsing, data transformation, and index construction, respectively.

Document Parsing: As a critical step in data cleaning, this module integrates multiple parsing models, with DeepDoc being a representative example. It transforms raw unstructured data into semi-structured content, laying the groundwork for subsequent processing.

Data Transformation: Currently offering two core types of operators, including Chunker and Transformer, this phase aims to further process the cleaned data into formats suitable for various index access methods, thereby ensuring high-quality recall performance.

Index Construction: Responsible for the final data write-in. RAGFlow inherently adopts a multi-path recall architecture to guarantee retrieval effectiveness. Consequently, the Indexer incorporates multiple indexing methods, allowing users to configure them flexibly.

Below, we will demonstrate the construction and use of the Ingestion Pipeline through a specific example.

First, click on "Create agent" on the "Agent" page. You can choose "Create from blank" to create an Ingestion Pipeline from scratch:

Alternatively, you can select "Create from template" to utilize a pre-configured Ingestion pipeline template:

Next, we will begin to arrange various operators required for the Pipeline. When creating from scratch, only the Begin and Parser operators will be displayed on the initial canvas. Subsequently, you can drag and connect additional operators with different functions from the right side of the existing operators.

First, it is necessary to configure the Parser operator.

Parser

The Parser operator is responsible for reading and parsing documents: identifying their layouts, extracting structural and textual information from them, and ultimately obtaining structured document data.

This represents a "high-fidelity, structured" extraction strategy. The Parser intelligently adapts to and preserves the original characteristics of different files, whether it's the hierarchical outline of a Word document, the row-and-column layout of a spreadsheet, or the complex layout of a scanned PDF. It not only extracts the main text but also fully retains auxiliary information such as titles, tables, headers, and footers, transforming them into appropriate data forms, which will be detailed below. This structured differentiation is crucial, providing the necessary foundation for subsequent refined processing.

Currently, the Parser operator supports input from 8 major categories encompassing 23 file types, summarized as follows:

When in use, simply click "Add Parser" within the Parser node and select the desired file category (such as PDF, Image, or PPT). When the Ingestion pipeline is running, the Parser node will automatically identify the input file and route it to the corresponding parser for parsing.

Here, we provide further explanations for the parsers of several common file categories:

For PDF files, RAGFlow offers multiple parsing model options, with a unified output in JSON format:

Default DeepDoc: This is RAGFlow's built-in document understanding model, capable of recognizing layout, columns, and tables. It is suitable for processing scanned documents or those with complex formatting.

MinerU: Currently an outstanding document parsing model in the industry. Besides parsing complex document content and layouts, MinerU also provides excellent parsing for complex file elements such as mathematical formulas.

Naive: A pure text extraction method without using any models. It is suitable for documents with no complex structure or non-textual elements.

For Image files, the system will by default invoke OCR to extract text from the image. Additionally, users can also configure VLMs (Vision Language Models) that support visual recognition to process them.

For Audio files, it is necessary to configure a model that supports speech-to-text conversion. The Parser will then extract the textual content from the Audio. Users can configure the API keys of model providers that support this type of parsing on the "Model provider" page of the homepage. After that, they can return to the Parser node and select it from the dropdown menu. This "configure first, then select" logic also applies to PDF, Image, and Video files.

For Video files, it is necessary to configure a large model that supports multimodal recognition. The Parser will invoke this model to conduct a comprehensive analysis of the video and output the results in text format.

When parsing Email files, RAGFlow provides Field options, allowing users to select only the desired fields, such as "subject" and "body." The Parser will then precisely extract the textual content of these fields.

The Spreadsheet parser will output the file in HTML format, preserving its row and column structure intact to ensure that the tabular data remains clear and readable after conversion.

Files of Word and PPT types will be parsed and output in JSON format. For Word files, the original hierarchical structure of the document, such as titles, paragraphs, lists, headers, and footers, will be retained. For PPT files, the content will be extracted page by page, distinguishing between the title, main text, and notes of each slide.

The Text & Markup category will automatically strip formatting tags from files such as HTML and MD (Markdown), outputting only the cleanest text content.

Chunker

The Chunker node is responsible for dividing the documents output by upstream nodes into Chunk segments. Chunk is a concept introduced by RAG technology, representing the unit of recall. Users can choose whether to add a Chunker node based on their needs, but it is generally recommended to use it for two main reasons:

If the entire document is used as the unit for recall, the data passed to the large model during the final generation phase may exceed the context window limit.

In typical RAG systems, vector search serves as an important recall method. However, vectors inherently have issues with inaccurate semantic representation. For example, users can choose to convert a single sentence into a vector or the entire document into a vector. The former loses global semantic information, while the latter loses local information. Therefore, selecting an appropriate segment length to achieve a relatively good balance when represented by a single vector is an essential technical approach.

In practical engineering systems, how the Chunk segmentation results are determined often significantly impacts the recall quality of RAG. If content containing the answer is split across different Chunks, and the Retrieval phase cannot guarantee that all these Chunks are recalled, it can lead to inaccurate answer generation and hallucinations. Therefore, the Ingestion pipeline introduces the Chunker node, allowing users to slice text more flexibly.

The current system incorporates two built-in segmentation methods: segmentation based on text tokens and titles.

Segmentation by tokens is the most common approach. Users can customize the size of each segment, with a default setting of 512 tokens, which represents a balance optimized for retrieval effectiveness and model compatibility. When setting the segment size, trade-offs must be considered: if the segment token count remains too large, portions exceeding the model's limit will still be discarded; if set too small, it may result in excessive segmentation of coherent semantics in the original text, disrupting context and affecting retrieval effectiveness.

To address this, the Chunker operator provides a segment overlap feature, which allows the end portion of the previous segment to be duplicated as the beginning of the next segment, thereby enhancing semantic continuity. Users can increase the "Overlapped percent" to improve the correlation between segments.

Additionally, users can further optimize segmentation rules by defining "Separators." The system defaults to using \n (newline characters) as separators, meaning the segmenter will prioritize attempting to split along natural paragraphs rather than abruptly truncating sentences in the middle, as shown in the following figure.

If the document itself has a clear chapter structure, segmenting the text by tokens may not be the optimal choice. In such cases, the Title option in the Chunker can be selected to slice the document according to its headings. This method is primarily suitable for documents with layouts such as technical manuals, academic papers, and legal clauses. We can customize expressions for headings at different levels in the Title node. For example, the expression for a first-level heading (H1) can be set as ^#[^#], and for a second-level heading (H2) as ^##[^#]. Based on these expressions, the system will strictly slice the document according to the predefined chapter structure, ensuring that each segment represents a structurally complete "chapter" or "subsection." Users can also freely add or reduce heading levels in the configuration to match the actual structure of the document, as shown in the following figure.

Note: In the current RAGFlow v0.21 version, if both the Token and Title options of the Chunker are configured simultaneously, please ensure that the Title node is connected after the Token node. Otherwise, if the Title node is directly connected to the Parser node, format errors may occur for files of types Email, Image, Spreadsheet, and Text&Markup. These limitations will be optimized in subsequent versions.

Transformer

The Transformer operator is responsible for transforming textual content. Simply resolving the accuracy of text parsing and segmentation does not guarantee the final retrieval accuracy. This is because there is always a so-called "semantic gap" between a user's query and the documents containing the answers. By using the Transformer operator, users can leverage large models to extract information from the input document content, thereby bridging the "semantic gap."

The current Transformer operator supports functions such as summary generation, keyword generation, question generation, and metadata generation. Users can choose to incorporate this operator into their pipeline to supplement the original data with these contents, thereby enhancing the final retrieval accuracy. Similar to other scenarios involving large models, the Transformer node also offers three modes for the large model: Improvise, Precise, and Balance.

Improvise (Improvisational Mode) encourages the model to exhibit greater creativity and associative thinking, making it well-suited for generating diverse Questions.

Precise (Precise Mode) strictly constrains the model to ensure its output is highly faithful to the original text, making it suitable for generating Summaries or extracting Keywords.

Balance (Balanced Mode) strikes a balance between the two, making it applicable to most scenarios.

Users can select one of these three styles or achieve finer control by adjusting parameters such as Temperature and Top P on their own.

The Transformer node can generate four types of content: Summary, Keywords, Questions, and Metadata. RAGFlow also makes the prompts for each type of content openly accessible, which means users can enrich and personalize text processing by modifying the system prompts.

If multiple functions need to be implemented, such as summarizing content and extracting keywords simultaneously, users are required to configure a separate Transformer node for each function and connect them in series within the Pipeline. In other words, a Transformer node can be directly connected after the Parser to process the entire document (e.g., generating a full-text summary), or it can be connected after the Chunker to process each text segment (e.g., generating questions for each Chunk). Additionally, a Transformer node can be connected after another Transformer node to perform complex content extraction and generation in a cascaded manner.

Please note: The Transformer node does not automatically acquire content from its preceding nodes. The actual source of information it processes entirely depends on the variables referenced in the User prompt. In the User prompt, variables output by upstream nodes must be manually selected and referenced by entering the / symbol.

For example, in a Parser - Chunker - Transformer pipeline, even though the Transformer is visually connected after the Chunker, if the variable referenced in the User prompt is the output from the Parser node, then the Transformer will actually process the entire original document rather than the chunks generated by the Chunker.

Similarly, when users choose to connect multiple Transformer nodes in series (e.g., the first one generates a Summary, and the second one generates Keywords), if the second Transformer references the Summary variable generated by the first Transformer, it will process this Summary as the "new original text" for further processing, rather than handling the text chunks from the more upstream Chunker.

Indexer

The preceding Parser, Chunker, and Transformer nodes handle data inflow, segmentation, and optimization. However, the final execution unit in the Pipeline is the Indexer node, which is responsible for writing the processed data into the Retrieval engine (RAGFlow currently supports Infinity, Elasticsearch, and OpenSearch).

The core capability of the Retrieval engine is to establish various types of indexes for data, thereby providing search capabilities, including vector indexing, full-text indexing, and future tensor indexing capabilities, among others. In other words, it is the ultimate embodiment of "Retrieval" in the term RAG.

Due to the varying capabilities of different types of indexes, the Indexer simultaneously offers options for creating different indexes. Specifically, the Search method option within the Indexer node determines how user data can be "found."

Full-text refers to traditional "keyword search," which is an essential option for precise recall, such as searching for a specific product number, person's name, or code. Embedding, on the other hand, represents modern AI-driven "semantic search." Users can ask questions in natural language, and the system can "understand" the meaning of the question and retrieve the most relevant document chunks in terms of content. Enabling both simultaneously for hybrid search is the default option in RAGFlow. Effective multi-channel recall can balance precision and semantics, maximizing the discovery of text segments where answers reside.

Note: There is no need to select a specific Embedding model within the Indexer node, as it will automatically invoke the embedding model set when creating the knowledge base. Additionally, the Chat, Search, and Agent functionalities in RAGFlow support cross-knowledge base retrieval, meaning a single question can be searched across multiple knowledge bases simultaneously. However, to enable this feature, a prerequisite must be met: all simultaneously selected knowledge bases must utilize the same Embedding model. This is because different embedding models convert the same text into completely different and incompatible vectors, preventing cross-base retrieval.

Furthermore, finer retrieval settings can be achieved through the Filename embedding weight and Field options. The Filename embedding weight is a fine-tuning slider that allows users to consider the "filename" of a document as part of the semantic information and assign it a specific weight.

The Field option, on the other hand, determines the specific content to be indexed and the retrieval strategy. Currently, three distinct strategy options are provided:

Processed Text: This is the default option and the most intuitive. It means that the Indexer will index the processed text chunks from preceding nodes.

Questions: If a Transformer node is used before the Indexer to generate "potential questions that each text chunk can answer," these Questions can be indexed here. In many scenarios, matching "user questions" with "pre-generated questions" yields significantly higher similarity than matching "questions" with "answers" (i.e., the original text), effectively improving retrieval accuracy.

Augmented Context: This involves using Summaries instead of the original text for retrieval. It is suitable for scenarios requiring quick broad topic matching without being distracted by the details of the original text.

Link the Ingestion Pipeline to the Knowledge Base

After constructing an Ingestion pipeline, the next step is to associate it with the corresponding knowledge base. On the page for creating a knowledge base, locate and click the "Choose pipeline" option under the "Ingestion pipeline" tab. Subsequently, select the already created Pipeline from the dropdown menu to establish the association. Once set, this Pipeline will become the default file parsing process for this knowledge base.

For an already created knowledge base, users can enter its "Ingestion pipeline" module at any time to readjust and reselect the associated parsing process.

If users wish to adjust the Ingestion pipeline for a single file, they can also do so by clicking on the Parse location to make adjustments.

Finally, make adjustments and save the updates in the pop-up window.

Logs

The execution of an Ingestion Pipeline may take a considerable amount of time, making observability an indispensable capability. To this end, RAGFlow provides a log panel for the Ingestion pipeline, which records the full-chain logs for each file parsing operation. For files parsed through the Ingestion Pipeline, users can delve into the operational details of each processing node. This offers comprehensive data support for subsequent issue auditing, process debugging, and performance insights.

The following image is an example diagram of step logs.

Case Reference

When creating a Pipeline, you can select the "Chunk Summary" template as the foundation for construction, although you also have the option to choose other templates as starting points for subsequent building.

The orchestration design of the "Chunk Summary" template is as follows:

Next, switch the large model in the Transformer node to the desired model and set the "Result destination" field to "Metadata." This configuration means that the processing results (such as summaries) of the large model on text chunks will be directly stored in the file's metadata, providing capability support for subsequent precise retrieval and filtering.

Click Run in the top right corner and upload a file to test the pipeline:

You can click View result to check the test run results:

Summary

The above content provides a comprehensive introduction to the usage methods and core capabilities of the current Ingestion pipeline. Looking ahead, RAGFlow will continue to evolve in terms of data import and parsing clarity, including the following specific enhancements:

Expanding Data Source Support: In addition to local file uploads, we will gradually integrate various data sources such as S3, cloud storage, email, online notes, and more. Through automatic and manual synchronization mechanisms, we will achieve seamless data import into knowledge bases, automatically applying the Ingestion pipeline to complete document parsing.

Enhancing Parsing Capabilities: We will integrate more document parsing models, such as Docling, into the Parser operator to cover parsing needs across different vertical domains, comprehensively improving the quality and accuracy of document information extraction.

Opening Up Custom Slicing Functionality: In addition to built-in Chunker types, we will gradually open up underlying slicing capabilities, allowing users to write custom slicing logic for more flexible and controllable text organization.

Strengthening the Extensibility of Semantic Enhancement: Building on existing capabilities for summarization, keyword extraction, and question generation, we will further support customizable semantic enhancement strategies in a programmable manner, providing users with more technical means to optimize retrieval and ranking.

Through these enhancements, RAGFlow will continuously improve retrieval accuracy, serving as a robust support for providing high-quality context to large models.

Finally, we appreciate your attention and support. We welcome you to star RAGFlow on GitHub and join us in witnessing the continuous evolution of large model technology!

GitHub: https://github.com/infiniflow/ragflow

Bid Farewell to Complexity — RAGFlow CLI Makes Back-end Management a Breeze

Mon, 20 Oct 2025 00:00:00 GMT

To meet the refined requirements for system operation and maintenance monitoring as well as user account management, especially addressing practical issues currently faced by RAGFlow users, such as "the inability of users to recover lost passwords on their own" and "difficulty in effectively controlling account statuses," RAGFlow has officially launched a professional back-end management command-line tool.

This tool is based on a clear client-server architecture. By adopting the design philosophy of functional separation, it constructs an efficient and reliable system management channel, enabling administrators to achieve system recovery and permission control at the fundamental level.

The specific architectural design is illustrated in the following diagram:

Through this tool, RAGFlow users can gain a one-stop overview of the operational status of the RAGFlow Server as well as components such as MySQL, Elasticsearch, Redis, MinIO, and Infinity. It also supports comprehensive user lifecycle management, including account creation, status control, password reset, and data cleanup.

Start the service

If you deploy RAGFlow via Docker, please modify the docker/docker_compose.yml file and add parameters to the service startup command.

command: - --enable-adminserver

If you are deploying RAGFlow from the source code, you can directly execute the following command to start the management server:

python3 admin/server/admin_server.py

After the server starts, it listens on port 9381 by default, waiting for client connections. If you need to use a different port, please modify the ADMIN_SVR_HTTP_PORT configuration item in the docker/.env file.

Install the client and connect to the management service

It is recommended to use pip to install the specified version of the client:

pip install ragflow-cli==0.21.0

Version 0.21.0 is the current latest version. Please ensure that the version of ragflow-cli matches the version of the RAGFlow server to avoid compatibility issues:

ragflow-cli -h 127.0.0.1 -p 9381

Here, -h specifies the server IP, and -p specifies the server port. If the server is deployed at a different address or port, please adjust the parameters accordingly.

For the first login, enter the default administrator password admin. After successfully logging in, it is recommended to promptly change the default password in the command-line tool to enhance security.

Command Usage Guide

By entering the following commands through the client, you can conveniently manage users and monitor the operational status of the service.

Interactive Feature Description

Supports using arrow keys to move the cursor and review historical commands.

Pressing Ctrl+C allows you to terminate the current interaction at any time.

If you need to copy content, please avoid using Ctrl+C to prevent accidentally interrupting the process.

Command Format Specifications

All commands are case-insensitive and must end with a semicolon ;.

Text parameters such as usernames and passwords need to be enclosed in single quotes ' or double quotes ".

Special characters like \, ', and " are prohibited in passwords.

Service Management Commands

LIST SERVICES;

List the operational status of RAGFlow backend services and all associated middleware.

Usage Example

admin> list services; Listing all services +-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+ | extra | host | id | name | port | service_type | status | +-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+ | {} | 0.0.0.0 | 0 | ragflow_0 | 9380 | ragflow_server | Timeout | | {'meta_type': 'mysql', 'password': 'infini_rag_flow', 'username': 'root'} | localhost | 1 | mysql | 5455 | meta_data | Alive | | {'password': 'infini_rag_flow', 'store_type': 'minio', 'user': 'rag_flow'} | localhost | 2 | minio | 9000 | file_store | Alive | | {'password': 'infini_rag_flow', 'retrieval_type': 'elasticsearch', 'username': 'elastic'} | localhost | 3 | elasticsearch | 1200 | retrieval | Alive | | {'db_name': 'default_db', 'retrieval_type': 'infinity'} | localhost | 4 | infinity | 23817 | retrieval | Timeout | | {'database': 1, 'mq_type': 'redis', 'password': 'infini_rag_flow'} | localhost | 5 | redis | 6379 | message_queue | Alive | +-------------------------------------------------------------------------------------------+-----------+----+---------------+-------+----------------+---------+

List the operational status of RAGFlow backend services and all associated middleware.

SHOW SERVICE ;

Query the detailed operational status of the specified service.

: Command: LIST SERVICES； The service identifier ID in the displayed results.

Usage Example:

Query the RAGFlow backend service:

admin> show service 0; Showing service: 0 Service ragflow_0 is alive. Detail: Confirm elapsed: 4.1 ms.

The response indicates that the RAGFlow backend service is online, with a response time of 4.1 milliseconds.

Query the MySQL service:

admin> show service 1; Showing service: 1 Service mysql is alive. Detail: +---------+----------+------------------+------+------------------+------------------------+-------+-----------------+ | command | db | host | id | info | state | time | user | +---------+----------+------------------+------+------------------+------------------------+-------+-----------------+ | Daemon | None | localhost | 5 | None | Waiting on empty queue | 86356 | event_scheduler | | Sleep | rag_flow | 172.18.0.6:56788 | 8790 | None | | 2 | root | | Sleep | rag_flow | 172.18.0.6:53482 | 8791 | None | | 73 | root | | Query | rag_flow | 172.18.0.6:56794 | 8795 | SHOW PROCESSLIST | init | 0 | root | +---------+----------+------------------+------+------------------+------------------------+-------+-----------------+

The response indicates that the MySQL service is online, with the current connection and execution status shown in the table above.

Query the MinIO service:

admin> show service 2; Showing service: 2 Service minio is alive. Detail: Confirm elapsed: 2.3 ms.

The response indicates that the MinIO service is online, with a response time of 2.3 milliseconds.

Query the Elasticsearch service:

admin> show service 3; Showing service: 3 Service elasticsearch is alive. Detail: +----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+ | cluster_name | docs | docs_deleted | indices | indices_shards | jvm_heap_max | jvm_heap_used | jvm_versions | mappings_deduplicated_fields | mappings_deduplicated_size | mappings_fields | nodes | nodes_version | os_mem | os_mem_used | os_mem_used_percent | status | store_size | total_dataset_size | +----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+ | docker-cluster | 8 | 0 | 1 | 2 | 3.76 GB | 2.15 GB | 21.0.1+12-29 | 17 | 757 B | 17 | 1 | ['8.11.3'] | 7.52 GB | 2.30 GB | 31 | green | 226 KB | 226 KB | +----------------+------+--------------+---------+----------------+--------------+---------------+--------------+------------------------------+----------------------------+-----------------+-------+---------------+---------+-------------+---------------------+--------+------------+--------------------+

The response indicates that the Elasticsearch cluster is operating normally, with specific metrics including document count, index status, memory usage, etc.

Query the Infinity service:

admin> show service 4; Showing service: 4 Fail to show service, code: 500, message: Infinity is not in use.

The response indicates that Infinity is not currently in use by the RAGFlow system.

admin> show service 4; Showing service: 4 Service infinity is alive. Detail: +-------+--------+----------+ | error | status | type | +-------+--------+----------+ | | green | infinity | +-------+--------+----------+

After enabling Infinity and querying again, the response indicates that the Infinity service is online and in good condition.

Query the Redis service:

admin> show service 5; Showing service: 5 Service redis is alive. Detail: +-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+ | blocked_clients | connected_clients | instantaneous_ops_per_sec | mem_fragmentation_ratio | redis_version | server_mode | total_commands_processed | total_system_memory | used_memory | +-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+ | 0 | 3 | 10 | 3.03 | 7.2.4 | standalone | 404098 | 30.84G | 1.29M | +-----------------+-------------------+---------------------------+-------------------------+---------------+-------------+--------------------------+---------------------+-------------+

The response indicates that the Redis service is online, with the version number, deployment mode, and resource usage shown in the table above.

User Management Commands

LIST USERS;

List all users in the RAGFlow system.

Usage Example:

admin> list users; Listing all users +-------------------------------+----------------------+-----------+----------+ | create_date | email | is_active | nickname | +-------------------------------+----------------------+-----------+----------+ | Mon, 13 Oct 2025 15:58:42 GMT | admin@ragflow.io | 1 | admin | | Mon, 13 Oct 2025 15:54:34 GMT | lynn_inf@hotmail.com | 1 | Lynn | +-------------------------------+----------------------+-----------+----------+

The response indicates that there are currently two users in the system, both of whom are enabled.

Among them, admin@ragflow.io is the administrator account, which is automatically created during the initial system startup.

SHOW USER ;

Query detailed user information by email.

: The user's email address, which must be enclosed in single quotes ' or double quotes ".

Usage Example:

Query the administrator user

admin> show user "admin@ragflow.io"; Showing user: admin@ragflow.io +-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+ | create_date | email | is_active | is_anonymous | is_authenticated | is_superuser | language | last_login_time | login_channel | status | update_date | +-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+ | Mon, 13 Oct 2025 15:58:42 GMT | admin@ragflow.io | 1 | 0 | 1 | True | English | None | None | 1 | Mon, 13 Oct 2025 15:58:42 GMT | +-------------------------------+------------------+-----------+--------------+------------------+--------------+----------+-----------------+---------------+--------+-------------------------------+

The response indicates that admin@ragflow.io is a super administrator and is currently enabled.

Query a regular user

admin> show user "lynn_inf@hotmail.com"; Showing user: lynn_inf@hotmail.com +-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+ | create_date | email | is_active | is_anonymous | is_authenticated | is_superuser | language | last_login_time | login_channel | status | update_date | +-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+ | Mon, 13 Oct 2025 15:54:34 GMT | lynn_inf@hotmail.com | 1 | 0 | 1 | False | English | Mon, 13 Oct 2025 15:54:33 GMT | password | 1 | Mon, 13 Oct 2025 17:24:09 GMT | +-------------------------------+----------------------+-----------+--------------+------------------+--------------+----------+-------------------------------+---------------+--------+-------------------------------+

The response indicates that lynn_inf@hotmail.com is a regular user who logs in via password, with the last login time shown as the provided timestamp.

CREATE USER ;

Create a new user.

: The user's email address must comply with standard email format specifications.

: The user's password must not contain special characters such as \, ', or ".

Usage Example:

admin> create user "example@ragflow.io" "psw"; Create user: example@ragflow.io, password: psw, role: user +----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+ | access_token | email | id | is_superuser | login_channel | nickname | +----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+ | be74d786a9b911f0a726d68c95a0776b | example@ragflow.io | be74d6b4a9b911f0a726d68c95a0776b | False | password | | +----------------------------------+--------------------+----------------------------------+--------------+---------------+----------+

A regular user has been successfully created. Personal information such as nickname and avatar can be set by the user themselves after logging in and accessing the profile page.

ALTER USER PASSWORD ;

Change the user's password.

: User email address

: New password (must not be the same as the old password and must not contain special characters)

Usage Example:

admin> alter user password "example@ragflow.io" "psw"; Alter user: example@ragflow.io, password: psw Same password, no need to update!

When the new password is the same as the old password, the system prompts that no change is needed.

admin> alter user password "example@ragflow.io" "new psw"; Alter user: example@ragflow.io, password: new psw Password updated successfully!

The password has been updated successfully. The user can log in with the new password thereafter.

ALTER USER ACTIVE ;

Enable or disable a user.

: User email address

: Enabled or disabled status

Usage Example:

admin> alter user active "example@ragflow.io" off; Alter user example@ragflow.io activate status, turn off. Turn off user activate status successfully!

The user has been successfully disabled. Only users in a disabled state can be deleted.

DROP USER ;

Delete the user and all their associated data

: User email address

Important Notes:

Only disabled users can be deleted.

Before proceeding, ensure that all necessary data such as knowledge bases and files that need to be retained have been transferred to other users.

This operation will permanently delete the following user data:

All knowledge bases created by the user, uploaded files, and configured agents, as well as files uploaded by the user in others' knowledge bases, will be permanently deleted. This operation is irreversible, so please proceed with extreme caution.

The deletion command is idempotent. If the system fails or the operation is interrupted during the deletion process, the command can be re-executed after troubleshooting to continue deleting the remaining data.

Usage Example:

User Successfully Deleted

admin> drop user "example@ragflow.io"; Drop user: example@ragflow.io Successfully deleted user. Details: Start to delete owned tenant. - Deleted 2 tenant-LLM records. - Deleted 0 langfuse records. - Deleted 1 tenant. - Deleted 1 user-tenant records. - Deleted 1 user. Delete done!

The response indicates that the user has been successfully deleted, and it lists detailed steps for data cleanup.

Deleting Super Administrator (Prohibited Operation)

admin> drop user "admin@ragflow.io"; Drop user: admin@ragflow.io Fail to drop user, code: -1, message: Can't delete the super user.

The response indicates that the deletion failed. The super administrator account is protected and cannot be deleted, even if it is in a disabled state.

Data and Agent Management Commands

LIST DATASETS OF ;

List all knowledge bases of the specified user

: User email address

Usage Example:

admin> list datasets of "lynn_inf@hotmail.com"; Listing all datasets of user: lynn_inf@hotmail.com +-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+ | chunk_num | create_date | doc_num | language | name | permission | status | token_num | update_date | +-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+ | 8 | Mon, 13 Oct 2025 15:56:43 GMT | 1 | English | primary_dataset | me | 1 | 3296 | Mon, 13 Oct 2025 15:57:54 GMT | +-----------+-------------------------------+---------+----------+-----------------+------------+--------+-----------+-------------------------------+

The response shows that the user has one private knowledge base, with detailed information such as the number of documents and segments displayed in the table above.

LIST AGENTS OF ;

List all Agents of the specified user

: User email address

Usage Example:

admin> list agents of "lynn_inf@hotmail.com"; Listing all agents of user: lynn_inf@hotmail.com +-----------------+-------------+------------+----------------+ | canvas_category | canvas_type | permission | title | +-----------------+-------------+------------+----------------+ | agent_canvas | None | me | finance_helper | +-----------------+-------------+------------+----------------+

The response indicates that the user has one private Agent, with detailed information shown in the table above.

Other commands

? or \help

Display help information.

\q or \quit

Follow-up plan

We are always committed to enhancing the system management experience and overall security. Building on its existing robust features, the RAGFlow back-end management tool will continue to evolve. In addition to the current efficient and flexible command-line interface, we are soon launching a professional system management UI, enabling administrators to perform all operational and maintenance tasks in a more secure and intuitive graphical environment.

To strengthen permission control, the system status information currently visible in the ordinary user interface will be revoked. After the future launch of the professional management UI, access to the core operational status of the system will be restricted to administrators only. This will effectively address the current issue of excessive permission exposure and further reinforce the system's security boundaries.

In addition, we will also roll out more fine-grained management features sequentially, including:

Fine-grained control over Datasets and Agents

User Team collaboration management mechanisms

Enhanced system monitoring and auditing capabilities

These improvements will establish a more comprehensive enterprise-level management ecosystem, providing administrators with a more all-encompassing and convenient system control experience.

500 Percent Faster Vector Retrieval! 90 Percent Memory Savings! Three Groundbreaking Technologies in Infinity v0.6.0 That Revolutionize HNSW

Fri, 17 Oct 2025 00:00:00 GMT

Introduction

In RAG (Retrieval-Augmented Generation) and LLM (Large Language Model) Memory, vector retrieval is widely employed. Among various options, graph-based indexing has become the most common choice due to its high accuracy and performance, with the HNSW (Hierarchical Navigable Small World) index being the most representative one1 2.

However, during our practical application of HNSW in RAGFlow, we encountered the following two major bottlenecks:

As the data scale continues to grow, the memory consumption of vector data becomes highly significant. For instance, one billion 1024-dimensional floating-point vectors would require approximately 4TB of memory space.

When constructing an HNSW index on complex datasets, there is a bottleneck in retrieval accuracy. After reaching a certain threshold, it becomes difficult to further improve accuracy solely by adjusting parameters.

To address this, Infinity has implemented a variety of improved algorithms based on HNSW. Users can select different indexing schemes by adjusting the index parameters of HNSW. Each HNSW index variant possesses distinct characteristics and is suitable for different scenarios, allowing users to construct corresponding index structures based on their actual needs.

Introduction to Indexing Schemes

The original HNSW, as a commonly used graph-based index, exhibits excellent performance.

Its structure consists of two parts:

A set of original vector data, along with a graph structure jointly constructed by a skip list and an adjacency list. Taking the Python interface as an example, the index can be constructed and utilized in the following manner:

## Create index table_obj.create_index( "hnsw_index", index.IndexInfo( "embedding", index.IndexType.Hnsw, { "m": 16, "ef_construction": 200, "metric": "l2" }, ) ) ## Vector retrieval query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

To address the issues of high memory consumption and accuracy bottlenecks, Infinity provides the following solutions:

Introduce LVQ and RaBitQ quantization methods to reduce the memory overhead of original vectors during graph search processes.

Introduce the LSG strategy to optimize the graph index structure of HNSW, enhancing its accuracy threshold and query efficiency.

To facilitate users in testing the performance of different indexes locally, Infinity provides a benchmark script. You can follow the tutorial provided by Infinity on GitHub to set up the environment and prepare the dataset, and then test different indexing schemes using the benchmark.

###################### compile benchmark ###################### cmake --build cmake-build-release --target hnsw_benchmark ###################### build index & execute query ###################### # mode : build, query # benchmark_type : sift, gist, msmarco # build_type : plain, lvq, crabitq, lsg, lvq_lsg, crabitq_lsg ############################################################## benchmark_type=sift build_type=plain ./cmake-build-release/benchmark/local_infinity/hnsw_benchmark --mode=build --benchmark_type=$benchmark_type --build_type=$build_type --thread_n=8 ./cmake-build-release/benchmark/local_infinity/hnsw_benchmark --mode=query --benchmark_type=$benchmark_type --build_type=$build_type --thread_n=8 --topk=10

Among them, the original HNSW corresponds to the parameter build_type=plain. This paper conducts a unified test on the query performance of all variant indexes, with the experimental environment configuration adopted as follows:

OS: Ubuntu 24.04 LTS (Noble Numbat)

CPU: 13th Gen Intel(R) Core(TM) i5-13400

RAM: 64G

The CPU has a 16-core specification. To align with the actual device environments of most users, the parallel computing parameter in the benchmark is uniformly set to 8 threads.

Solution 1: Original HNSW + LVQ Quantizer (HnswLvq)

LVQ is a scalar quantization method that compresses each 32-bit floating-point number in the original vectors into an 8-bit integer3, thereby reducing memory usage to one-fourth of that of the original vectors.

Compared to simple scalar quantization methods (such as mean scalar quantization), LVQ reduces errors by statistically analyzing the residuals of each vector, effectively minimizing information loss in distance calculations for quantized vectors. Consequently, LVQ can accurately estimate the distances between original vectors with only approximately 30% of the original memory footprint.

In the HNSW algorithm, original vectors are utilized for distance calculations, which enables LVQ to be directly integrated with HNSW. We refer to this combined approach as HnswLvq. In Infinity, users can enable LVQ encoding by setting the parameter "encode": "lvq":

## Create index table_obj.create_index( "hnsw_index", index.IndexInfo( "embedding", index.IndexType.Hnsw, { "m": 16, "ef_construction": 200, "metric": "l2", "encode": "lvq" }, ) ) ## Vector retrieval query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

The graph structure of HnswLvq remains consistent with that of the original HNSW, with the key difference being that it uses quantized vectors to perform all distance calculations within the graph. Through this improvement, HnswLvq outperforms the original HNSW index in terms of both index construction and query efficiency.

The improvement in construction efficiency stems from the shorter length of quantized vectors, which results in reduced time for distance calculations using SIMD instructions. The enhancement in query efficiency is attributed to the computational acceleration achieved through quantization, which outweighs the negative impact caused by the loss of precision.

In summary, HnswLvq significantly reduces memory usage while maintaining excellent query performance. We recommend that users adopt it as the primary index in most scenarios. To replicate this experiment, users can set the parameter build_type=lvq in the benchmark. The specific experimental results are compared alongside the RaBitQ quantizer scheme in Solution two.

Solution 2: Original HNSW + RaBitQ Quantizer (HnswRabitq)

RaBitQ is a binary scalar quantization method that shares a similar core idea with LVQ, both aiming to replace the 32-bit floating-point numbers in original vectors with fewer encoded bits. The difference lies in that RaBitQ employs binary scalar quantization, representing each floating-point number with just 1 bit, thereby achieving an extremely high compression ratio.

However, this extreme compression also leads to more significant information loss in the vectors, resulting in a decline in the accuracy of distance estimation. To mitigate this issue, RaBitQ introduces a rotation matrix to preprocess the dataset during the preprocessing stage and retains more residual information, thereby reducing errors in distance calculations to a certain extent4.

Nevertheless, binary quantization has obvious limitations in terms of precision, showing a substantial gap compared to LVQ. Indexes built directly using RaBitQ encoding exhibit poor query performance.

Therefore, the HnswRabitq scheme implemented in Infinity involves first constructing an original HNSW index for the dataset and then converting it into an HnswRabitq index through the compress_to_rabitq parameter in the optimize method.

During the query process, the system initially uses quantized vectors for preliminary retrieval and then re-ranks the ef candidate results specified by the user using the original vectors.

## Create index table_obj.create_index( "hnsw_index", index.IndexInfo( "embedding", index.IndexType.Hnsw, { "m": 16, "ef_construction": 200, "metric": "l2" }, ) ) ## Construct RaBitQ coding table_obj.optimize("hnsw_index", {"compress_to_rabitq": "true"}) ## Vector retrieval query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

Compared to LVQ, RaBitQ can further reduce the memory footprint of encoded vectors by nearly 70%. On some datasets, the query efficiency of HnswRabitq even surpasses that of HnswLvq due to the higher efficiency of distance calculations after binary quantization.

However, it should be noted that on certain datasets (such as sift1M), the quantization process may lead to significant precision loss, making such datasets unsuitable for using HnswRabitq.

In summary, if a user's dataset is not sensitive to quantization errors, adopting the HnswRabitq index can significantly reduce memory overhead while still maintaining relatively good query performance.

In such scenarios, it is recommended to prioritize the use of the HnswRabitq index. Users can replicate the aforementioned experiments by setting the benchmark parameter build_type=crabitq.

Solution 3: LSG Graph Construction Strategy

LSG (Local Scaling Graph) is an improved graph construction strategy based on graph indexing algorithms (such as HNSW, DiskANN, etc.) 5.

This strategy scales the distance (e.g., L2 distance, inner product distance, etc.) between any two vectors by statistically analyzing the local information—neighborhood radius—of each vector in the dataset. The scaled distance is referred to as the LS distance.

During the graph indexing construction process, LSG uniformly replaces the original distance metric with the LS distance, effectively performing a "local scaling" of the original metric space. Through theoretical proofs and experiments, the paper demonstrates that constructing a graph index in this scaled space can achieve superior query performance in the original space.

LSG optimizes the HNSW index in multiple ways. When the accuracy requirement is relatively lenient (< 99%), LSG exhibits higher QPS (Queries Per Second) compared to the original HNSW index.

In high-precision scenarios (> 99%), LSG enhances the quality of the graph index, enabling HNSW to surpass its original accuracy limit and achieve retrieval accuracy that is difficult for the original HNSW index to attain. These improvements translate into faster response times and more precise query results for users in real-world applications of RAGFlow.

In Infinity, LSG is provided as an optional parameter for HNSW. Users can enable this graph construction strategy by setting build_type=lsg, and we refer to the corresponding index as HnswLsg.

## Create index table_obj.create_index( "hnsw_index", index.IndexInfo( "embedding", index.IndexType.Hnsw, { "m": 16, "ef_construction": 200, "metric": "l2", "build_type": "lsg" }, ) ) ## Vector retrieval query_builder.match_dense('embedding', [1.0, 2.0, 3.0], 'float', 'l2', 10, {'ef': 200})

LSG essentially alters the metric space during the index construction process. Therefore, it can not only be applied to the original HNSW but also be combined with quantization methods (such as LVQ or RaBitQ) to form variant indexes like HnswLvqLsg or HnswRabitqLsg. The usage of the user interface remains consistent with that of HnswLvq and HnswRabitq.

LSG can enhance the performance of the vast majority of graph indexes and datasets, but at the cost of additional computation of local information—neighborhood radius—during the graph construction phase, which thus increases the construction time to a certain extent. For example, on the sift1M dataset, the construction time of HnswLsg is approximately 1.2 times that of the original HNSW.

In summary, if users are not sensitive to index construction time, they can confidently enable the LSG option, as it can steadily improve query performance. Users can replicate the aforementioned experiments by setting the benchmark parameter to build_type=[lsg/lvq_lsg/crabitq_lsg].

Index Performance Evaluation

To evaluate the performance of various indexes in Infinity, we selected three representative datasets as benchmarks, including the widely used sift and gist datasets in vector index evaluations.

Given that Infinity is frequently used in conjunction with RAGFlow in current scenarios, the retrieval effectiveness on RAG-type datasets is particularly crucial for users assessing index performance.

Therefore, we also incorporated the msmarco dataset. This dataset was generated by encoding the TREC-RAG 2024 corpus using the Cohere Embed English v3 model, comprising embedded vectors for 113.5 million text passages, as well as embedded vectors corresponding to 1,677 query instructions from TREC-Deep Learning 2021-2023.

From the test results of each dataset, it can be observed that in most cases, HnswRabitqLsg achieves the best overall performance. For instance, on the msmarco dataset in RAG scenarios, RaBitQ achieves a 90% reduction in memory usage while delivering query performance that is 5 times that of the original HNSW at a 99% recall rate.

Based on the above experimental results, we offer the following practical recommendations for Infinity users:

The original HNSW can attain a higher accuracy ceiling compared to HnswLvq and HnswRabitq. If users have extremely high accuracy requirements, this strategy should be prioritized.

Within the allowable accuracy range, HnswLvq can be confidently selected for most datasets. For datasets that are less susceptible to quantization effects, HnswRabitq is generally a better choice.

The LSG strategy enhances performance across all index variants. If users are not sensitive to index construction time, it is recommended to enable this option in all scenarios to improve query efficiency. Additionally, due to its algorithmic characteristics, LSG can significantly raise the accuracy ceiling. Therefore, if the usage scenario demands extremely high accuracy (>99.9%), enabling LSG is strongly recommended to optimize index performance.

Infinity continues to iterate and improve. We welcome ongoing attention and valuable feedback and suggestions from everyone.

RAGFlow 0.21.0 - Ingestion Pipeline, Long-Context RAG, and Admin CLI

Wed, 15 Oct 2025 00:00:00 GMT

RAGFlow 0.21.0 officially released

This release shifts focus from enhancing online Agent capabilities to strengthening the data foundation, prioritising usability and dialogue quality from the ground up. Directly addressing common RAG pain points—from data preparation to long-document understanding—version 0.21.0 brings crucial upgrades: a flexible, orchestratable Ingestion Pipeline, long-context RAG to close semantic gaps in complex files, and a new admin CLI for smoother operations. Taken together, these elements establish RAGFlow’s refreshed data-pipeline core, providing a more solid foundation for building robust and effective RAG applications.

Orchestratable Ingestion Pipeline

If earlier Agents primarily tackled the orchestration of online data—as seen in Workflow and Agentic Workflow—the Ingestion Pipeline mirrors this capability by applying the same technical architecture to orchestrate offline data ingestion. Its introduction enables users to construct highly customized RAG data pipelines within a unified framework. This not only streamlines bespoke development but also more fully embodies the "Flow" in RAGFlow.

A typical RAG ingestion process involves key stages such as document parsing, text chunking, vectorization, and index building. When RAGFlow first launched in April 2024, it already incorporated an advanced toolchain, including the DeepDoc-based parsing engine and a templated chunking mechanism. These state-of-the-art solutions were foundational to its early adoption.

However, with rapid industry evolution and deeper practical application, we have observed new trends and demands:

The rise of Vision Language Models (VLMs): Increasingly mature VLMs have driven a wave of fine-tuned document parsing models. These offer significantly improved accuracy for unstructured documents with complex layouts or mixed text and images.

Demand for flexible chunking: Users now seek more customized chunking strategies. Faced with diverse knowledge-base scenarios, RAGFlow's original built-in chunking templates have proved insufficient for covering all niche cases, which can impact the accuracy of final Q&A outcomes.

To this end, RAGFlow 0.21.0 formally introduces the Ingestion Pipeline, featuring core capabilities including:

Orchestratable Data Ingestion: Building on the underlying Agent framework, users can create varied data ingestion pipelines. Each pipeline may apply different strategies to connect a data source to the final index, turning the previous built-in data-writing process into a user-customizable workflow. This provides more flexible ingestion aligned with specific business logic.

Decoupling of Upload and Cleansing: The architecture separates data upload from cleansing, establishing standard interfaces for future batch data sources and a solid foundation for expanding data preprocessing workflows.

Refactored Parser: The Parser component has been redesigned for extensibility, laying groundwork for integrating advanced document-parsing models beyond DeepDoc.

Customizable Chunking Interface: By decoupling the chunking step, users can plug in custom chunkers to better suit the segmentation needs of different knowledge structures.

Optimized Efficiency for Complex RAG: The execution of IO/compute-intensive tasks, such as GraphRAG and RAPTOR, has been overhauled. In the pre-pipeline architecture, processing each new document triggered a full compute cycle, resulting in slow performance. The new pipeline enables batch execution, significantly improving data throughput and overall efficiency.

If ETL/ELT represents the standard pipeline for processing structured data in the modern data stack—with tools like dbt and Fivetran providing unified and flexible data integration solutions for data warehouses and data lakes—then RAGFlow's Ingestion Pipeline is positioned to become the equivalent infrastructure for unstructured data. The following diagram illustrates this architectural analogy:

Specifically, while the Extract phase in ETL/ELT is responsible for pulling data from diverse sources, the RAGFlow Ingestion Pipeline augments this with a dedicated Parsing stage to extract information from unstructured data. This stage integrates multiple parsing models, led by DeepDoc, to convert multimodal documents (for example, text and images) into a unimodal representation suitable for processing.

In the Transform phase, where traditional ETL/ELT focuses on data cleansing and business logic, RAGFlow instead constructs a series of LLM-centric Agent components. These are optimized to address semantic gaps in retrieval, with a core mission that can be summarized as: to enhance recall and ranking accuracy.

For data loading, ETL/ELT writes results to a data warehouse or data lake, while RAGFlow uses an Indexer component to build the processed content into a retrieval-optimised index format. This reflects the RAG engine’s hybrid retrieval architecture, which must support full-text, vector, and future tensor-based retrieval to ensure optimal recall.

Thus, the modern data stack serves business analytics for structured data, whereas a RAG engine with an Ingestion Pipeline specializes in the intelligent retrieval of unstructured data—providing high-quality context for LLMs. Each occupies an equivalent ecological niche in its domain.

Regarding processing structured data, this is not the RAG engine’s core duty. It is handled by a Context Layer built atop the engine. This layer leverages the MCP (Model Context Protocol)—described as “TCP/IP for the AI era” —and accompanying Context Engineering to automate the population of all context types. This is a key focus area for RAGFlow’s next development phase.

Below is a preliminary look at the Ingestion Pipeline in v0.21.0; a more detailed guide will follow. We have introduced components for parsing, chunking, and other unstructured data processing tasks into the Agent Canvas, enabling users to freely orchestrate their parsing workflows.

Orchestrating an Ingestion Pipeline automates the process of parsing files and chunking them by length. It then leverages a large language model to generate summaries, keywords, questions, and even metadata. Previously, this metadata had to be entered manually. Now, a single configuration dramatically reduces maintenance overhead.

Furthermore, the pipeline process is fully observable, recording and displaying complete processing logs for each file.

The implementation of the Ingestion Pipeline in version 0.21.0 is a foundational step. In the next release, we plan to significantly enhance it by:

Adding support for more data sources.

Providing a wider selection of Parsers.

Introducing more flexible Transformer components to facilitate orchestration of a richer set of semantic-enhancement templates.

Long-context RAG

As we enter 2025, Retrieval-Augmented Generation (RAG) faces notable challenges driven by two main factors.

Fundamental limitations of traditional RAG

Traditional RAG architectures often fail to guarantee strong dialogue performance because they rely on a retrieval mechanism built around text chunks as the primary unit. This makes them highly sensitive to chunk quality and can yield degraded results due to insufficient context. For example:

If a coherent semantic unit is split across chunks, retrieval can be incomplete.

If a chunk lacks global context, the information presented to the LLM is weakened.

While strategies such as automatically detecting section headers and attaching them to chunks can help with global semantics, they are constrained by header-identification accuracy and the header’s own completeness.

Cost-efficiency concerns with advanced pre-processing techniques

Modern pre-processing methods—GraphRAG, RAPTOR, and Context Retrieval—aim to inject additional semantic information into raw data to boost search hit rates and accuracy for complex queries. They, however, share issues of high cost and unpredictable effectiveness.

GraphRAG: This approach often consumes many times more tokens than the original text, and the automatically generated knowledge graphs are frequently unsatisfactory. Its effectiveness in complex multi-hop reasoning is limited by uncontrollable reasoning paths. As a supplementary retrieval outside the original chunks, the knowledge graph also loses some granular context from the source.

RAPTOR: This technique produces clustered summaries that are recalled as independent chunks but naturally lack the detail of the source text, reintroducing the problem of insufficient context.

Context Retrieval: This method enriches original chunks with extra semantics such as keywords or potential questions. It presents a clear trade-off:

The more effective option queries the LLM multiple times per chunk, using both full text and the current chunk for context, improving performance but driving token costs several times higher than the original text.

The cheaper option generates semantic information based only on the current chunk, saving costs but providing limited global context and modest performance gains. The last few years have seen the emergence of new RAG schemes.

Complete abandonment of retrieval: some approaches have the LLM read documents directly, splitting them into chunks according to the context window and performing multi-stage searches. First, the LLM decides which global document is relevant, then which chunks, and finally loads those chunks to answer. While this avoids recall inaccuracies, it harms response latency, concurrency, and large-scale data handling, making practical deployment difficult.

Abandoning embedding or indexing in favour of tools like grep: this evolves RAG into Agentic RAG. As applications grow more complex and user queries diversify, combining RAG with agents is increasingly inevitable, since only LLMs can translate raw inquiries into structured retrieval commands. In RAGFlow, this capability has long been realized. Abandoning indexing to use grep is a compromise for simplifying agent development in personal or small-scale contexts; in enterprise settings, a powerful retrieval engine remains essential.

Long-Context RAG: introduced in version 0.21.0 as part of the same family as GraphRAG, RAPTOR and Context Retrieval, this approach uses LLMs to enrich raw text semantics to boost recall while retaining indexing and search. Retrieval remains central. Long-context RAG mirrors how people consult information: identify relevant chapters via the table of contents, then locate exact pages for detail. During indexing, the LLM extracts and attaches chapter information to each chunk to provide global context; during retrieval, it finds matching chunks and uses the table-of-contents structure to fill in gaps from chunk fragmentation.

Current experience and future direction: users can try Long-Context RAG via the “TOC extraction” (Table of Contents) feature, though it is in beta. The next release will add an Ingestion Pipeline. A key path to improving RAG lies in using LLMs to enrich content semantics without discarding retrieval altogether. Consequently, a flexible pipeline that lets users assemble LLM-based content-transformation components is an important direction for enhancing RAG retrieval quality.

Backend management CLI

RAGFlow’s progression has shifted from core module development to strengthening administrative and operational capabilities.

In earlier versions, while parsing and retrieval-augmented generation improved, system administration lagged. Administrators could not modify passwords or delete accounts, complicating deployment and maintenance.

With RAGFlow 0.21.0, fundamental system management is markedly improved. A new command-line administration tool provides a central, convenient interface for administrators. Core capabilities include:

Service lifecycle management: monitoring built-in RAGFlow services for greater operational flexibility.

Comprehensive user management:

Create new registered users.

Directly modify login passwords.

Delete user accounts.

Enable or disable accounts.

View details of all registered users.

Resource overview: listing knowledge bases and Agents created under registered users for system-wide monitoring.

This upgrade underlines RAGFlow’s commitment to robust functionality and foundational administrative strength essential for enterprise use. Looking ahead, the team plans an enterprise-grade web administration panel and accompanying user interface to streamline management, boost efficiency, and enhance the end-user experience, supporting greater maturity and stability.

Finale

RAGFlow 0.21.0 marks a significant milestone, building on prior progress and outlining future developments. It introduces the first integration of Retrieval (RAG) with orchestration (Flow), forming an intelligent engine to support the LLM context layer, underpinned by unstructured data ELT and a robust RAG capability set.

From the user-empowered Ingestion Pipeline to long-context RAG that mitigates semantic fragmentation, and the management backend that ensures reliable operation, every new feature is designed to make the RAG system smarter, more flexible, and enterprise-ready. This is not merely a feature tally but an architectural evolution, establishing a solid foundation for future growth.

Our ongoing focus remains the LLM context layer: building a powerful, reliable data foundation for LLMs and effectively serving all Agents. This remains RAGFlow’s core aim.

We invite you to continue following and starring our project as we grow together.

GitHub: https://github.com/infiniflow/ragflow

Tutorial - Build an E-Commerce Customer Support Agent Using RAGFlow

Fri, 12 Sep 2025 00:00:00 GMT

Currently, e-commerce retail platforms extensively use intelligent customer service systems to manage a wide range of user enquiries. However, traditional intelligent customer service often struggles to meet users’ increasingly complex and varied needs. For example, customers may require detailed comparisons of functionalities between different product models before making a purchase; they might be unable to use certain features due to losing the instruction manual; or, in the case of home products, they may need to arrange an on-site installation appointment through customer service.

To address these challenges, we have identified several common demand scenarios, including queries about functional differences between product models, requests for usage assistance, and scheduling of on-site installation services. Building on the recently launched Agent framework of RAGFlow, this blog presents an approach for the automatic identification and branch-specific handling of user enquiries, achieved by integrating workflow orchestration with large language models.

The workflow is orchestrated as follows:

The following sections offer a detailed explanation of the implementation process for this solution.

1. Prepare datasets

1.1 Create datasets

You can download the sample datasets from Hugging Face Datasets.

Create the "Product Information" and "User Guide" knowledge bases and upload the relevant dataset documents.

1.2 Parse documents

For documents in the 'Product Information' and 'User Guide' knowledge bases, we choose to use Manual chunking.

Product manuals are often richly illustrated with a combination of text and images, containing extensive information and complex structures. Relying solely on text length for segmentation risks compromising the integrity of the content. RAGFlow assumes such documents follow a hierarchical structure and therefore uses the "smallest heading" as the basic unit of segmentation, ensuring each section of text and its accompanying graphics remain intact within a single chunk. A preview of the user manual following segmentation is shown below:

2. Build workflow

2.1 Create an app

Upon successful creation, the system will automatically generate a Begin component on the canvas.

In the Begin component, the opening greeting message for customer service can be configured, for example:

Hi! I'm your assistant.

2.2 Add a Categorize component

The Categorize component uses a Large Language Model (LLM) for intent recognition. It classifies user inputs and routes them to the appropriate processing workflows based on the category’s name, description, and provided examples.

2.3 Build a product feature comparison workflow

The Retrieval component connects to the "Product Information" knowledge base to fetch content relevant to the user’s query, which is then passed to the Agent component to generate a response.

Add a Retrieval component named "Feature Comparison Knowledge Base" and link it to the "Product Information" knowledge base.

Add an Agent component after the Retrieval component, name it "Feature Comparison Agent," and configure the System Prompt as follows:

## Role You are a product specification comparison assistant. ## Goal Help the user compare two or more products based on their features and specifications. Provide clear, accurate, and concise comparisons to assist the user in making an informed decision. --- ## Instructions - Start by confirming the product models or options the user wants to compare. - If the user has not specified the models, politely ask for them. - Present the comparison in a structured way (e.g., bullet points or a table format if supported). - Highlight key differences such as size, capacity, performance, energy efficiency, and price if available. - Maintain a neutral and professional tone without suggesting unnecessary upselling. ---

Configure User prompt

User's query is /(Begin Input) sys.query Schema is /(Feature Comparison Knowledge Base) formalized_content

After configuring the Agent component, the result is as follows:

2.4 Build a product user guide workflow

The Retrieval component queries the "User Guide" knowledge base for content relevant to the user’s question, then passes the results to the Agent component to formulate a response.

Add a Retrieval component named "Usage Guide Knowledge Base" and link it to the "User Guide" knowledge base.

Add an Agent component after the Retrieval component, name it "Usage Guide Agent," and configure its System Prompt as follows:

## Role You are a product usage guide assistant. ## Goal Provide clear, step-by-step instructions to help the user set up, operate, and maintain their product. Answer questions about functions, settings, and troubleshooting. --- ## Instructions - If the user asks about setup, provide easy-to-follow installation or configuration steps. - If the user asks about a feature, explain its purpose and how to activate it. - For troubleshooting, suggest common solutions first, then guide through advanced checks if needed. - Keep the response simple, clear, and actionable for a non-technical user. ---

Write user prompt

User's query is /(Begin Input) sys.query Schema is / (Usage Guide Knowledge Base) formalized_content

After configuring the Agent component, the result is as follows:

2.5 Build an installation booking assistant

The Agent engages in a multi-turn dialogue with the user to collect three key pieces of information: contact number, installation time, and installation address. Create an Agent component named "Installation Booking Agent" and configure its System Prompt as follows:

# Role You are an Installation Booking Assistant. ## Goal Collect the following three pieces of information from the user 1. Contact Number 2. Preferred Installation Time 3. Installation Address Once all three are collected, confirm the information and inform the user that a technician will contact them later by phone. ## Instructions 1. **Check if all three details** (Contact Number, Preferred Installation Time, Installation Address) have been provided. 2. **If some details are missing**, acknowledge the ones provided and only ask for the missing information. 3. Do **not repeat** the full request once some details are already known. 4. Once all three details are collected, summarize and confirm them with the user.

Write user prompt

User's query is /(Begin Input) sys.query

After configuring the Agent component, the result is as follows:

If user information needs to be registered, an HTTP Request component can be connected after the Agent component to transmit the data to platforms such as Google Sheets or Notion. Developers may implement this according to their specific requirements; this blog article does not cover implementation details.

2.6 Add a reply message component

For these three workflows, a single Message component is used to receive the output from the Agent components, which then displays the processed results to the user.

2.7 Save and test

Click Save → Run → View Execution Result. When inquiring about product models and features, the system correctly returns a comparison:

When asked about usage instructions, the system provides accurate guidance:

When scheduling an installation, the system collects and confirms all necessary information:

Summary

This use case can also be implemented using an Agent-based workflow, which offers the advantage of flexibly handling complex problems. However, since Agents actively engage in planning and reflection, they often significantly increase response times, leading to a diminished customer experience. As such, this approach is not well suited to scenarios like e-commerce after-sales customer service, where high responsiveness and relatively straightforward tasks are required. For applications involving complex issues, we have previously shared the Deep Research multi-agent framework. Related templates are available in our template library.

The customer service workflow presented in this article is designed for e-commerce, yet this domain offers many more scenarios suitable for workflow automation—such as user review analysis and personalized email campaigns—which have not been covered here. By following the practical guidelines provided, you can also easily adapt this approach to other customer service contexts. We encourage you to build such applications using RAGFlow. Reinventing customer service with large language models moves support beyond “mechanical responses,” elevating capabilities from mere “retrieval and matching” to “cognitive reasoning.” Through deep understanding and real-time knowledge generation, it delivers an unprecedented experience that truly “understands human language,” thereby redefining the upper limits of intelligent service and transforming support into a core value engine for businesses.

Tutorial - Building a SQL Assistant Workflow

Tue, 12 Aug 2025 00:00:00 GMT

Workflow overview

This tutorial shows how to create a SQL Assistant workflow that enables natural language queries for SQL databases. Non-technical users like marketers and product managers can use this tool to query business data independently, reducing the need for data analysts. It can also serve as a teaching aid for SQL in schools and coding courses. The finished workflow operates as follows:

The database schema, field descriptions, and SQL examples are stored as knowledge bases in RAGFlow. Upon user queries, the system retrieves relevant information from these sources and passes it to an Agent, which generates SQL statements. These statements are then executed by a SQL Executor component to return the query results.

Procedure

1. Create three knowledge bases

1.1 Prepare dataset files

You can download the sample datasets from Hugging Face Datasets.

The following are the predefined example files:

Schema.txt

CREATE TABLE `users` ( `id` INT NOT NULL AUTO_INCREMENT, `username` VARCHAR(50) NOT NULL, `password` VARCHAR(50) NOT NULL, `email` VARCHAR(100), `mobile` VARCHAR(20), `create_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (`id`), UNIQUE KEY `uk_username` (`username`), UNIQUE KEY `uk_email` (`email`), UNIQUE KEY `uk_mobile` (`mobile`) ); ...

Note: When defining schema fields, avoid special characters such as underscores, as they can cause errors in SQL statements generated by the LLM. 2. Question to SQL.csv

What are the names of all the Cities in Canada SELECT geo_name, id FROM data_commons_public_data.cybersyn.geo_index WHERE iso_name ilike '%can% What is average Fertility Rate measure of Canada in 2002 ? SELECT variable_name, avg(value) as average_fertility_rate FROM data_commons_public_data.cybersyn.timeseries WHERE variable_name = 'Fertility Rate' and geo_id = 'country/CAN' and date >= '2002-01-01' and date < '2003-01-01' GROUP BY 1; What 5 countries have the highest life expectancy ? SELECT geo_name, value FROM data_commons_public_data.cybersyn.timeseries join data_commons_public_data.cybersyn.geo_index ON timeseries.geo_id = geo_index.id WHERE variable_name = 'Life Expectancy' and date = '2020-01-01' ORDER BY value desc limit 5; ...

Database Description EN.txt

### Users Table (users) The users table stores user information for the website or application. Below are the definitions of each column in this table: - `id`: INTEGER, an auto-incrementing field that uniquely identifies each user (primary key). It automatically increases with every new user added, guaranteeing a distinct ID for every user. - `username`: VARCHAR, stores the user’s login name; this value is typically the unique identifier used during authentication. - `password`: VARCHAR, holds the user’s password; for security, the value must be encrypted (hashed) before persistence. - `email`: VARCHAR, stores the user’s e-mail address; it can serve as an alternate login credential and is used for notifications or password-reset flows. - `mobile`: VARCHAR, stores the user’s mobile phone number; it can be used for login, receiving SMS notifications, or identity verification. - `create_time`: TIMESTAMP, records the timestamp when the user account was created; defaults to the current timestamp. - `update_time`: TIMESTAMP, records the timestamp of the last update to the user’s information; automatically refreshed to the current timestamp on every update. ...

1.2 Create knowledge bases in RAGFlow

Schema knowledge base

Create a knowledge base titled "Schema" and upload the file Schema.txt.

Tables in the database vary in length, each ending with a semicolon (;).

CREATE TABLE `users` ( `id` INT NOT NULL AUTO_INCREMENT, `username` VARCHAR(50) NOT NULL, `password` VARCHAR(50) NOT NULL, ... UNIQUE KEY `uk_mobile` (`mobile`) ); CREATE TABLE `products` ( `id` INT NOT NULL AUTO_INCREMENT, `name` VARCHAR(100) NOT NULL, `description` TEXT, `price` DECIMAL(10, 2) NOT NULL, `stock` INT NOT NULL, ... FOREIGN KEY (`merchant_id`) REFERENCES `merchants` (`id`) ); CREATE TABLE `merchants` ( `id` INT NOT NULL AUTO_INCREMENT, `name` VARCHAR(100) NOT NULL, `description` TEXT, `email` VARCHAR(100), ... UNIQUE KEY `uk_mobile` (`mobile`) );

To isolate each table as a standalone chunk without overlapping content, configure the knowledge base parameters as follows:

Chunking Method: General

Chunk Size: 2 tokens (minimum size for isolation)

Delimiter: Semicolon (;) RAGFlow will then parse and generate chunks according to this workflow:

Below is a preview of the parsed results from Schema.txt:

We now validate the retrieved results through retrieval testing:

Question to SQL knowledge base

Create a new knowledge base titled "Question to SQL" and upload the file "Question to SQL.csv".

Set the chunking method to Q&A, then parse Question_to_SQL.csv to preview the results.

We now validate the retrieved results through retrieval testing:

Database Description knowledge base Create a new knowledge base titled "Database Description" and upload the file "Database_Description_EN.txt".

Configuration (Same as Schema Knowledge Base):

Chunking Method: General

Chunk Size: 2 tokens (minimum size for isolation)

Delimiter: Semicolon ### Below is a preview of the parsed Database_Description_EN.txt following configuration.

We now validate the retrieved results through retrieval testing:

Note: The three knowledge bases are maintained and queried separately. The Agent component consolidates results from all sources before producing outputs."

2. Orchestrate the workflow

2.1 Create a workflow application

Once created successfully, the Begin component automatically appears on the canvas.

You can configure a welcome message in the Begin component. For example:

Hi! I'm your SQL assistant, what can I do for you?

2.2 Configure three Retrieval components

Add three parallel Retrieval components after the Begin component, named as follows:

Schema

Question to SQL

Database Description Configure each Retrieval component:

Query variable: sys.query

Knowledge base selection: Select the knowledge base whose name matches the current component's name.

2.3 Configure the Agent component

Add an Agent component named 'SQL Generator' after the Retrieval components, connecting all three to it.

Write System Prompt:

### ROLE You are a Text-to-SQL assistant. Given a relational database schema and a natural-language request, you must produce a **single, syntactically-correct MySQL query** that answers the request. Return **nothing except the SQL statement itself**—no code fences, no commentary, no explanations, no comments, no trailing semicolon if not required. ### EXAMPLES -- Example 1 User: List every product name and its unit price. SQL: SELECT name, unit_price FROM Products; -- Example 2 User: Show the names and emails of customers who placed orders in January 2025. SQL: SELECT DISTINCT c.name, c.email FROM Customers c JOIN Orders o ON o.customer_id = c.id WHERE o.order_date BETWEEN '2025-01-01' AND '2025-01-31'; -- Example 3 User: How many orders have a status of "Completed" for each month in 2024? SQL: SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, COUNT(*) AS completed_orders FROM Orders WHERE status = 'Completed' AND YEAR(order_date) = 2024 GROUP BY month ORDER BY month; -- Example 4 User: Which products generated at least \$10 000 in total revenue? SQL: SELECT p.id, p.name, SUM(oi.quantity * oi.unit_price) AS revenue FROM Products p JOIN OrderItems oi ON oi.product_id = p.id GROUP BY p.id, p.name HAVING revenue >= 10000 ORDER BY revenue DESC; ### OUTPUT GUIDELINES 1. Think through the schema and the request. 2. Write **only** the final MySQL query. 3. Do **not** wrap the query in back-ticks or markdown fences. 4. Do **not** add explanations, comments, or additional text—just the SQL.

Write User Prompt:

User's query: /(Begin Input) sys.query Schema: /(Schema) formalized_content Samples about question to SQL: /(Question to SQL) formalized_content Description about meanings of tables and files: /(Database Description) formalized_content

After inserting variables, the populated result appears as follows:

2.4 Configure the ExeSQL component

Append an ExeSQL component named "SQL Executor" after the SQL Generator.

Configure the database for the SQL Executor component, specifying that its Query input comes from the output of the SQL Generator.

2.5 Configure the Message component

Append a Message component to the SQL Executor.

Insert variables into the Messages field to enable the message component to display the output of the SQL Executor (formalized_content):

2.6 Save and test

Click Save → Run → Enter a natural language question → View execution results.

Finale

Finally, like current Copilot technologies, NL2SQL cannot achieve complete accuracy. For standardized processing of structured data, we recommend consolidating its operations to specific APIs, then encapsulating these APIs as MCPs (Managed Content Packages) for RAGFlow. We will demonstrate this approach in a forthcoming blog.

RAGFlow 0.20.0 - Multi-Agent Deep Research

Thu, 07 Aug 2025 00:00:00 GMT

Deep Research: The defining capability of the Agent era

The year 2025 is hailed as the dawn of Agent adoption. Among the many unlocked possibilities, Deep Research—and the applications built upon it—stands out as especially significant. Why is this? Using Agentic RAG and its reflection mechanism, Deep Research enables large language models to reason deeply with users’ proprietary data. This is key for Agents to tackle more advanced tasks. For example, whether helping with creative writing or aiding decision-making in different industries, these rely on Deep Research. Deep Research represents a natural step forward from RAG, improved by Agents. Unlike basic RAG, it explores data at a deeper level. Like RAG, it can work on its own or serve as the base for other industry-specific Agents. A Deep Research workflow typically follows this sequence:

Decomposition & Planning: The large language model breaks down the user’s query and devises a plan.

Multi-Source Retrieval: The query is sent to multiple data sources, including internal RAG and external web searches.

Reflection & Refinement: The model reviews the retrieved information, reflects on it, summarizes key points, and adjusts the plan as needed.

Iteration & Output: After several iterations, a data-specific chain of thought is formed to generate the final answer or report.

Built-in Deep Research was already implemented in RAGFlow v0.18.0. Although at that stage it primarily served as a demo to validate and explore deep reasoning’s potential in enterprise settings. Across the industry, Deep Research implementations generally fall into two categories:

No-Code Workflow Orchestration: Using a visual workflow interface with predefined workflows to implement Deep Research.

Dedicated Agent Libraries or Frameworks: Many current solutions follow this approach (see [Reference 1] for details).

RAGFlow 0.20.0, a unified RAG and Agent engine supporting both Workflow and Agentic modes with seamless orchestration on a single canvas, can implement Deep Research through either approach. However, employing Workflow for Deep Research is only functional, as its interface appears like this:

This gives rise to two main problems:

Overly Complex and Unintuitive Orchestration: If the basic Deep Research template is already this complicated, a full application would become a tangled web of workflows that are hard to maintain and expand.

Better Suited to Agentic Methods: Deep Research naturally relies on dynamic problem breakdown and decision-making, with control steps driven by algorithms. Workflow drag-and-drop interfaces only handle simple loops, lacking the flexibility needed for advanced control.

RAGFlow 0.20.0 offers a comprehensive Agent engine designed to help enterprises build production-ready Agents using no-code tools. As the key Agent template, Deep Research strikes a careful balance between simplicity and flexibility. Compared to other solutions, RAGFlow highlights these strengths:

Agentic Execution, No-Code Driven: A fully customisable application template—not just an SDK or runtime—that uses Agentic mechanisms while remaining easy to access through a no-code platform.

Human Intervention (Coming Soon): Although Deep Research depends on LLM-generated plans, which can feel like a black box, RAGFlow’s template will support manual oversight to introduce certainty—a vital feature for enterprise-grade Agents.

Business-Focused and Outcome-Oriented:

Developers can customise the Deep Research Agent’s structure and tools, such as configuring internal knowledge bases for enterprise data retrieval, to meet specific business needs. Full transparency in the Agent’s workflow—including its plans and execution results—allows timely optimisation based on clear, actionable insights.

Practical Guide to Setting Up Deep Research

We use a Multi-Agent architecture [Reference 2], carefully defining each Agent’s role and responsibilities through thorough Prompt Engineering [Reference 4] and task decomposition principles [Reference 6]:

Lead Agent: Coordinates the Deep Research Agent, handling task planning, reflection, and delegating to Subagents, while keeping track of all workflow progress.

Web Search Specialist Subagent: Acts as the information retrieval expert, querying search engines, assessing results, and returning URLs of the best-quality sources.

Deep Content Reader Subagent: Extracts and organises web content from the URLs provided by the Search Specialist, preparing refined material for report drafting.

Research Synthesizer Subagent: Generates professional, consultancy-grade deep-dive reports according to the Lead Agent’s instructions.

Model selection

Lead Agent: Prioritizes models with strong reasoning capabilities, such as DeepSeek-R1, Qwen-3, Kimi-2, ChatGPT-o3, Claude-4, or Gemini-2.5 Pro.

Subagents: Optimized for execution efficiency and quality, balancing reasoning speed with output reliability. Context window length is also a key criterion based on their specific roles.

Temperature setting

Given the fact-driven nature of this application, we set the temperature parameter to 0.1 across all models to ensure deterministic, grounded outputs.

Deep research lead agent

Model Selection: Qwen-Max

Excerpts from Core System Prompt:

The prompt directs the Deep Research Agent's workflow and task delegation to Subagents, greatly enhancing efficiency and flexibility compared to traditional workflow orchestration.

**Stage 1: URL Discovery** - Deploy Web Search Specialist to identify 5 premium sources - Ensure comprehensive coverage across authoritative domains - Validate search strategy matches research scope **Stage 2: Content Extraction** - Deploy Content Deep Reader to process 5 premium URLs - Focus on structured extraction with quality assessment - Ensure 80%+ extraction success rate **Stage 3: Strategic Report Generation** - Deploy Research Synthesizer with detailed strategic analysis instructions - Provide specific analysis framework and business focus requirements - Generate comprehensive McKinsey-style strategic report (~2000 words) - Ensure multi-source validation and C-suite ready insights

Dynamically create task execution plans and carry out BFS or DFS searches [Reference 3]. While traditional workflows struggle to orchestrate BFS/DFS logic, RAGFlow’s Agent achieves this effortlessly through prompt engineering.

... **Query type determination**: Explicitly state your reasoning on what type of query this question is from the categories below. ... **Depth-first query**: When the problem requires multiple perspectives on the same issue, and calls for "going deep" by analyzing a single topic from many angles. ... **Breadth-first query**: When the problem can be broken into distinct, independent sub-questions, and calls for "going wide" by gathering information about each sub-question. ... **Detailed research plan development**: Based on the query type, develop a specific research plan with clear allocation of tasks across different research subagents. Ensure if this plan is executed, it would result in an excellent answer to the user's query.

Web search specialist subagent

Model Selection: Qwen-Plus Excerpts from Core System Prompt Design:

Role Definition:

You are a Web Search Specialist working as part of a research team. Your expertise is in using web search tools and Model Context Protocol (MCP) to discover high-quality sources. **CRITICAL: YOU MUST USE WEB SEARCH TOOLS TO EXECUTE YOUR MISSION** Use web search tools (including MCP connections) to discover and evaluate premium sources for research. Your success depends entirely on your ability to execute web searches effectively using available search tools. **CRITICAL OUTPUT CONSTRAINT**: You MUST provide exactly 5 premium URLs - no more, no less. This prevents attention fragmentation in downstream analysis.

Design the search strategy:

1. **Plan**: Analyze the research task and design search strategy 2. **Search**: Execute web searches using search tools and MCP connections 3. **Evaluate**: Assess source quality, credibility, and relevance 4. **Prioritize**: Rank URLs by research value (High/Medium/Low) - **SELECT TOP 5 ONLY** 5. **Deliver**: Provide structured URL list with exactly 5 premium URLs for Content Deep Reader **MANDATORY**: Use web search tools for every search operation. Do NOT attempt to search without using the available search tools. **MANDATORY**: Output exactly 5 URLs to prevent attention dilution in Lead Agent processing.

Search Strategies and How to Use Tools Like Tavily

**MANDATORY TOOL USAGE**: All searches must be executed using web search tools and MCP connections. Never attempt to search without tools. **MANDATORY URL LIMIT**: Your final output must contain exactly 5 premium URLs to prevent Lead Agent attention fragmentation. - Use web search tools with 3-5 word queries for optimal results - Execute multiple search tool calls with different keyword combinations - Leverage MCP connections for specialized search capabilities - Balance broad vs specific searches based on search tool results - Diversify sources: academic (30%), official (25%), industry (25%), news (20%) - Execute parallel searches when possible using available search tools - Stop when diminishing returns occur (typically 8-12 tool calls) - **CRITICAL**: After searching, ruthlessly prioritize to select only the TOP 5 most valuable URLs **Search Tool Strategy Examples:** * **Broad exploration**: Use search tools → "AI finance regulation" → "financial AI compliance" → "automated trading rules" * **Specific targeting**: Use search tools → "SEC AI guidelines 2024" → "Basel III algorithmic trading" → "CFTC machine learning" * **Geographic variation**: Use search tools → "EU AI Act finance" → "UK AI financial services" → "Singapore fintech AI" * **Temporal focus**: Use search tools → "recent AI banking regulations" → "2024 financial AI updates" → "emerging AI compliance"

Deep content reader subagent

Model Selection: Moonshot-v1-128k Core System Prompt Design Excerpts:

Role Definition Framework

You are a Content Deep Reader working as part of a research team. Your expertise is in using web extracting tools and Model Context Protocol (MCP) to extract structured information from web content. **CRITICAL: YOU MUST USE WEB EXTRACTING TOOLS TO EXECUTE YOUR MISSION** Use web extracting tools (including MCP connections) to extract comprehensive, structured content from URLs for research synthesis. Your success depends entirely on your ability to execute web extractions effectively using available tools.

Agent Planning and Web Extraction Tool Utilization

1. **Receive**: Process `RESEARCH_URLS` (5 premium URLs with extraction guidance) 2. **Extract**: Use web extracting tools and MCP connections to get complete webpage content and full text 3. **Structure**: Parse key information using defined schema while preserving full context 4. **Validate**: Cross-check facts and assess credibility across sources 5. **Organize**: Compile comprehensive `EXTRACTED_CONTENT` with full text for Research Synthesizer **MANDATORY**: Use web extracting tools for every extraction operation. Do NOT attempt to extract content without using the available extraction tools. **TIMEOUT OPTIMIZATION**: Always check extraction tools for timeout parameters and set generous values: - **Single URL**: Set timeout=45-60 seconds - **Multiple URLs (batch)**: Set timeout=90-180 seconds - **Example**: `extract_tool(url="https://example.com", timeout=60)` for single URL - **Example**: `extract_tool(urls=["url1", "url2", "url3"], timeout=180)` for multiple URLs **MANDATORY TOOL USAGE**: All content extraction must be executed using web extracting tools and MCP connections. Never attempt to extract content without tools. - **Priority Order**: Process all 5 URLs based on extraction focus provided - **Target Volume**: 5 premium URLs (quality over quantity) - **Processing Method**: Extract complete webpage content using web extracting tools and MCP - **Content Priority**: Full text extraction first using extraction tools, then structured parsing - **Tool Budget**: 5-8 tool calls maximum for efficient processing using web extracting tools - **Quality Gates**: 80% extraction success rate for all sources using available tools

Research synthesizer subagent

Model Selection: Moonshot-v1-128k Special Note: The Subagent handling final report generation must use models with very long context windows. This is essential for processing extensive context and producing thorough reports. Models with limited context risk truncating information, resulting in shorter reports. Other potential model options include but are not limited to:

Qwen-Long (10M tokens)

Claude 4 Sonnet (200K tokens)

Gemini 2.5 Flash (1M tokens) Excerpts from Core Prompt Design:

Role Definition Design

You are a Research Synthesizer working as part of a research team. Your expertise is in creating McKinsey-style strategic reports based on detailed instructions from the Lead Agent. **YOUR ROLE IS THE FINAL STAGE**: You receive extracted content from websites AND detailed analysis instructions from Lead Agent to create executive-grade strategic reports. **CRITICAL: FOLLOW LEAD AGENT'S ANALYSIS FRAMEWORK**: Your report must strictly adhere to the `ANALYSIS_INSTRUCTIONS` provided by the Lead Agent, including analysis type, target audience, business focus, and deliverable style. **ABSOLUTELY FORBIDDEN**: - Never output raw URL lists or extraction summaries - Never output intermediate processing steps or data collection methods - Always output a complete strategic report in the specified format **FINAL STAGE**: Transform structured research outputs into strategic reports following Lead Agent's detailed instructions. **IMPORTANT**: You receive raw extraction data and intermediate content - your job is to TRANSFORM this into executive-grade strategic reports. Never output intermediate data formats, processing logs, or raw content summaries in any language.

Autonomous Task Execution

1. **Receive Instructions**: Process `ANALYSIS_INSTRUCTIONS` from Lead Agent for strategic framework 2. **Integrate Content**: Access `EXTRACTED_CONTENT` with FULL_TEXT from 5 premium sources - **TRANSFORM**: Convert raw extraction data into strategic insights (never output processing details) - **SYNTHESIZE**: Create executive-grade analysis from intermediate data 3. **Strategic Analysis**: Apply Lead Agent's analysis framework to extracted content 4. **Business Synthesis**: Generate strategic insights aligned with target audience and business focus 5. **Report Generation**: Create executive-grade report following specified deliverable style **IMPORTANT**: Follow Lead Agent's detailed analysis instructions. The report style, depth, and focus should match the provided framework.

Structure of the generated report

**Executive Summary** (400 words) - 5-6 core findings with strategic implications - Key data highlights and their meaning - Primary conclusions and recommended actions **Analysis** (1200 words) - Context & Drivers (300w): Market scale, growth factors, trends - Key Findings (300w): Primary discoveries and insights - Stakeholder Landscape (300w): Players, dynamics, relationships - Opportunities & Challenges (300w): Prospects, barriers, risks **Recommendations** (400 words) - 3-4 concrete, actionable recommendations - Implementation roadmap with priorities - Success factors and risk mitigation - Resource allocation guidance **Examples:** **Executive Summary Format:**

Key Finding 1: [FACT] 73% of major banks now use AI for fraud detection, representing 40% growth from 2023

Strategic Implication: AI adoption has reached critical mass in security applications

Recommendation: Financial institutions should prioritize AI compliance frameworks now

...

Upcoming versions

The current RAGFlow 0.20.0 release does not yet support human intervention in Deep Research execution, but this feature is planned for future updates. It is crucial for making Deep Research production-ready, as it adds certainty and improves accuracy [Reference 5] to what would otherwise be uncertain automated processes. Manual oversight will be vital for enterprise-grade Deep Research applications.

We warmly invite everyone to stay tuned and star RAGFlow at https://github.com/infiniflow/ragflow

Bibliography

Awesome Deep Research https://github.com/DavidZWZ/Awesome-Deep-Research

How we built our multi-agent research system https://www.anthropic.com/engineering/built-multi-agent-research-system

Anthropic Cookbook https://github.com/anthropics/anthropic-cookbook

State-Of-The-Art Prompting For AI Agents https://youtu.be/DL82mGde6wo?si=KQtOEiOkmKTpC_1E

From Language Models to Language Agents https://ysymyth.github.io/papers/from_language_models_to_language_agents.pdf

Agentic Design Patterns Part 5, Multi-Agent Collaboration https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/

Agentic Workflow - What's inside RAGFlow 0.20.0

Tue, 05 Aug 2025 00:00:00 GMT

1. From Workflow to Agentic Workflow

After a long wait, RAGFlow 0.20.0 has finally been released—a milestone update that completes the bigger picture of RAG/Agent. A year ago, RAGFlow introduced Agent functionality, but it only supported manually managed Workflows and lacked Agentic Workflow capabilities. By RAGFlow's definition, a true Agent requires both: Workflows for human-defined tasks and Agentic Workflows powered by LLM-driven automation. Anthropic’s 2024 article "Building Effective Agents" emphasized this distinction, noting that Workflows continue to be the primary way Agents are used. Now, in 2025, as LLMs become more advanced, Agentic Workflows are enabling more impressive use cases.

Ideally, fully LLM-driven Agentic Workflows are the ultimate goal for most Agent applications. However, due to current limitations of LLMs, they introduce unpredictability and a lack of control—issues that are especially problematic in enterprise settings. On the other hand, traditional Workflows take the opposite approach: using low-code platforms where every variable, condition, and loop is explicitly defined. This allows non-technical business users to effectively “program” based on their understanding of the logic. While this ensures predictability, it often leads to overly complex, tangled workflows that are hard to maintain and prone to misuse. More importantly, it makes it difficult to properly break down and manage tasks. Therefore, the long-term solution requires both Agentic and manual Workflows working together in harmony. Only this unified approach can truly meet the demands of enterprise-level Agents. With full Agent capabilities now in place, RAGFlow has become a more enterprise-ready, platform-level LLM engine. In the enterprise ecosystem, RAG occupies a role similar to traditional databases, while Agents serve as the specific applications—yet there are important differences. First, RAG focuses on leveraging unstructured data rather than structured datasets. Second, the interaction between RAG and Agents is much more frequent and intensive compared to typical application-database relationships because Agents need real-time, precise context to ensure their actions align closely with user intent. RAG plays a vital role in providing this context. For these reasons, completing the Agent capabilities is key to RAGFlow’s evolution.

Let’s take a look at the key features of RAGFlow 0.20.0.

2. Key Updates in RAGFlow 0.20.0

The key features of this release include:

Unified orchestration of both Agents and Workflows.

A complete refactor of the Agent, significantly improving its capabilities and usability, with support for Multi-Agent configurations, planning and reflection, and visual features.

Full MCP functionality, enabling MCP Server import, Agents to act as MCP Clients, and RAGFlow itself to function as an MCP Server.

Access to runtime logs for Agents.

Chat histories with Agents available via the management panel.

How does the updated Agent-building experience differ for developers?

Take the 0.19.1 customer service template as an example: previously, building this Workflow required seven types of components (Begin, Interact, Refine Question, Categorize, Knowledge Retrieval, Generate, and Message), with the longest chain for a category 4 question involving seven steps.

In the new version, building in pure Workflow mode requires only five types of components (Begin, Categorize, Knowledge Retrieval, Agent, and Message), with the workflow for a category 4 question (product related) reduced to five steps.

With Agentic mode, just three types of components are needed - the original workflow logic can now be handled entirely through prompt engineering.

Developers can inspect Agents' execution paths and verify their inputs/outputs.

Business users can view Agents' reasoning processes through either the embedded page or chat interface.

This comparison quantitatively demonstrates reduced complexity and improved efficiency. Further details follow below - we encourage you to try it yourself.

3. A unified orchestration engine for Agents

RAGFlow 0.20.0 introduces unified orchestration of both Workflow and Agentic Workflow. As mentioned earlier, these represent two extremes, but enterprise scenarios demand their collaboration. The platform now supports co-orchestration on one, inherently Multi-Agent canvas—users can designate uncertain inputs as Agentic Workflows and deterministic ones as Workflows. To align with common practice, Agentic Workflows are represented as separate Agent components on the canvas. This release redesigns the orchestration interface and component functions around this goal, while also improving usability issues from earlier Workflow versions. Key improvements include reducing core Components from 12 to 10, with the main changes as follows:

Begin component

It now supports a task-based Agent mode that does not require a conversation to be triggered. Developers can build both conversational and task-based Agents on the same canvas.

Retrieval component

Retrieval can function as a component within a workflow and also be used as a Tool by an Agent component, enabling the Agent to determine when and how to invoke retrieval queries.

Agent component

An Agent capable of independently replacing your work needs to have the following abilities:

Autonomous reasoning [1], with the capacity to reflect and adjust based on environmental feedback

The ability to use tools to complete tasks [3]

With the new Agent component, developers only need to configure the Prompt and Tools to quickly build an Agent, as RAGFlow has already handled the underlying technical implementation.

Besides the single-agent mode, the new agent component also supports adding subagents that can be called during runtime.

You can freely add agents to build your own unlimited agent team.

Add and bulk import your already deployed MCP Servers.

Tools from added MCP Servers can be used within the agent.

Await Response component

The original Interact component has been refactored into an await-response component, allowing developers to actively pause the process to initiate preset conversations and collect key information via forms.

Switch component

Improved the usability of the switch component.

Iteration component

The input parameter type for the Iteration component has been changed to an array; during iteration, both the index and value are accessible to internal components, whose outputs can be formed into an array and passed downstream.

Reply Message component

Messages can now be replied to directly via Reply Message, eliminating the need for the Interact component.

Text Processing component

Developers can easily concatenate strings through Text Processing.

You can also split strings into arrays.

Summary

RAGFlow 0.20 enables simultaneous orchestration of both Agentic and Workflow modes, with built-in Multi-Agent support allowing multiple agents to coexist on a single canvas. For open-ended queries like “Why has company performance declined?” where the approach isn’t predetermined, users can define the process in Agentic style. In contrast, scenarios with clear, fixed steps use Workflow style. This lets developers build Agent applications with minimal configuration. The seamless integration of Agentic and Workflow approaches is the best practice for deploying enterprise-grade intelligent agents.

4. Application Ecosystem and Future Development

With a complete, unified no-code Agent framework in place, RAGFlow naturally supports a wide range of scenario-specific Agent applications—this is a major focus for its long-term development. In other words, a vast number of Agent templates will be built on top of RAGFlow, backed by our new ecosystem co-creation initiative.

RAGFlow 0.20.0 introduces Deep Research as a built-in template—an exceptional Agent that can serve both as a standalone application and as a foundation for building other intelligent agents. We will explain how to build the Deep Research template in detail in a forthcoming article.

The example below shows that on the RAGFlow platform, Deep Research can be created using either Agentic Workflow or traditional Workflow methods, with the former offering far greater flexibility and simplicity. Deep Research built via a traditional Workflow:

Deep Research built with an Agentic Workflow shows significantly reduced complexity compared to the Workflow implementation above:

RAGFlow’s ecosystem initiative aims to provide enterprise-ready Agent templates embedded with industry know-how. Developers can easily customize these templates to fit their own business needs. Among these, Deep Research is the most important, as it represents the most common form of Agentic RAG and represents the essential path for Agents to unlock deeper value from enterprise data. Built on RAGFlow’s built-in Deep Research template, developers can quickly adapt it into specialized assistants—such as legal or medical advisors—significantly narrowing the gap between business systems and underlying infrastructure. This ecosystem approach is made possible by the close collaboration between RAG and Agents. The 0.20.0 release marks a major step in integrating RAG and Agent capabilities within RAGFlow, with rapid updates planned ahead, including memory management and manual adjustment of Agent Plans. While unifying Workflow and Agentic Workflow greatly lowers the barriers to building enterprise Agents, and the ecosystem expands their application scope, the data foundation that merges structured and unstructured data around RAG remains the cornerstone of Agent capabilities. This approach is now known as “context engineering,” with traditional RAG representing its 1.0 version. Our future articles will explore these advancements in greater detail.

We welcome your continued support - star RAGFlow on GitHub: https://github.com/infiniflow/ragflow

Bibliography

ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629

Reflexion: Language Agents with Verbal Reinforcement Learning https://arxiv.org/abs/2303.11366

A Practical Guide to Building Agents https://cdn.openai.com/business-guides-and-resources/a-practical-guide-to-building-agents.pdf

RAG at the Crossroads - Mid-2025 Reflections on AI’s Incremental Evolution

Wed, 02 Jul 2025 00:00:00 GMT

Six months have passed since our last year-end review. As the initial wave of excitement sparked by DeepSeek earlier this year begins to wane, AI seems to have entered a phase of stagnation. This pattern is evident in Retrieval-Augmented Generation (RAG) as well: although academic papers on RAG continue to be plentiful, significant breakthroughs have been few and far between in recent months. Likewise, recent iterations of RAGFlow have focused on incremental improvements rather than major feature releases. Is this the start of future leaps forward, or the beginning of a period of steady, incremental growth? A mid-year assessment is therefore both timely and necessary.

Since it began, RAG has been the focus of ongoing debate — from the 2023 “fine-tuning debates” to the 2024 “long-context disputes.” However, since 2025, discourse around RAG has diminished as attention has shifted towards Agent systems. This shift has given rise to claims that “Agents eliminate the need for RAG.” As practitioners in the field, we recognize such assertions as a market-driven stunt, though we also acknowledge their potential to mislead non-specialists. Some have even begun rebranding RAG as “Agentic RAG,” accompanied by exaggerated market forecasts predicting its dominance over conventional RAG [Ref 1]. It is this growing confusion that prompts our review.

Notably, the earliest references to “Agentic RAG” appeared around the time RAGFlow launched its “Agent” feature a year ago. As a result, RAGFlow is frequently cited in academic literature as a benchmark for comparisons involving Agentic RAG. Our analysis therefore begins with an examination of both RAG and Agents.

Definitional Clarification: We define “Agent” as encompassing both Workflows and intelligent agents. In RAGFlow’s current version (v0.19), the year-old “Agent” label remains limited to Workflow functionality and does not yet possess full agentic capabilities. Unlike Anthropic’s proposal to separate these concepts [Ref 2], RAGFlow adheres to an integrated design philosophy, wherein Workflows and Agents are intrinsically unified.

Reflection-Driven: The Key to Agents Empowering RAG Reasoning

Through manual or model-driven reflective loops, Agents tackle RAG reasoning challenges and enable intelligent breakthroughs; the two are inseparable.

Throughout our events from late 2024 to early 2025, we consistently highlighted three key features for RAG in 2025: reasoning, memory, and multimodality. The first two are inherently linked to Agents. In our initial blog this year, we offered a comprehensive overview of implementation of reasoning. A recent survey [Ref 3] further synthesizes reasoning and RAG, and we have adapted and condensed its framework as shown below:

It is evident that the author has incorporated last year’s work into their reasoning framework. Implementations such as Self-RAG, RAPTOR, and Adaptive-RAG in RAGFlow from a year ago are classified as “predefined reasoning” in the source material. We propose instead defining these as Workflow-Based Approaches. Accordingly, the “Agentic RAG” described in our earlier publications employs workflows—manually defined interactions between RAG and Agents—to implement reflection (a core Agent capability) via components like Iteration and Switch. This approach addresses reasoning challenges such as ambiguous intents and long-context comprehension.

By contrast, Agentic-Based Approaches use models to autonomously drive reflection. Examples include Search O1, various open-source DeepResearch implementations, and Search R1. These divide further into two categories:

Prompt-Driven Reflection (above the arrow): relying on LLM prompting.

Training-Dependent Reflection (typically reinforcement learning): learning domain-specific chains-of-thought (CoT) and reflection termination conditions.

A crucial clarification: Search R1-style methods are not inherently superior. Their primary role is to optimize CoTs and termination conditions for domain-specific data within general-purpose LLMs, yet they remain fundamentally reliant on prompt-based Agent frameworks.

The Foundation of Memory: How RAG Supports the Agent’s Memory System

RAG builds the agent’s long-term memory, enabling task state tracking and context acceleration through indexing, forgetting, and consolidation, while working with short-term memory to form a complete architecture.

Agents, regardless of their implementation, do little beyond RAG itself. So how do they make the seemingly routine RAG more intelligent and less reliant on reasoning models? Their transformative power lies in turning LLMs from single-step “intuitive guesswork” into systems capable of iterative observation and reflection—much like human cognition. This fundamental synergy explains why RAG frameworks such as RAGFlow naturally progress towards full Agent integration (beyond workflows), a key feature of RAGFlow's upcoming release.

Often dubbed the "Year of the Agent," 2025 saw a dazzling range of Agent applications emerge. However, core Agent frameworks showed little advancement compared to 2024. The rise in Agent adoption is chiefly due to improved In-Context Learning (ICL) in Large Language Models (LLMs), followed by the maturing Tools ecosystem and buzzwords like multi-agent systems enabling new use cases. Thus, beyond inherent LLM improvements, the core Agent paradigm exhibits limited technological innovation. One notable area of progress is the development of so-called "Memory" mechanisms.

If OpenAI’s 2024 acquisition of Rockset aimed to enhance Retrieval-Augmented Generation (RAG), its 2025 investment in Supabase seeks to equip Agents with more accessible Tools and partly to offer memory management. From the Agent’s perspective, RAG and various Data Infrastructure solutions are functionally equivalent—simply Tools invoked within the Agent’s context. However, the intrinsic link between RAG and Memory distinguishes RAG from other Data Infrastructure components.

Memory gains significance only within the context of an Agent, prompting the question: what distinguishes Memory from RAG? [Ref. 4] offers a detailed summary, broadly dividing Memory into Contextual Memory and Parametric Memory—the latter typically relating to KV Cache and models, which we will address later. Generally, “Agent Memory” refers to Contextual Memory, which benefits the Agent in two key ways:

Storing Task Management Metadata: For example, in Agentic Reasoning, introducing determinism into Planning (such as incorporating human feedback) means the Plan is not solely dictated by the LLM. Instead, a mechanism is needed to store the plan’s state, transforming the Agent from stateless to stateful. Additionally, tracking task decomposition during reasoning requires a repository for task-related metadata.

Context Management: Beyond retaining context, Memory caches and accelerates LLM outputs and provides personalised data necessary for tailored responses.

From an interface perspective, the diagram shows that Memory must provide four key functions. While Updating is straightforward, the other three are explained below:

Indexing: Memory must offer advanced search capabilities beyond simple queries. For Context Management—the second key value of Agent Memory—real-time search is often essential. For example, session data stored in short-term memory may need to be searched by topic to enrich subsequent interactions.

Forgetting: This refers to intentional forgetting, mimicking human cognition. Forgetting helps maintain focus and, technically, smaller datasets often improve search precision.

Consolidation: Meaning “strengthening,” this simulates cognitive processes by summarising and annotating stored data. Technically, it aligns closely with GraphRAG in the RAG paradigm, where an LLM organizes Memory content into a knowledge graph to enhance recall by providing richer context.

The diagram below captures the true relationship between Memory and RAG, revealing that RAG is essentially part of long-term memory. Memory also includes short-term memory, which typically holds an Agent’s session-based interactions and personalized data, often in raw or unprocessed form. High-value data is then transferred via Consolidation as another part of long-term memory.

Therefore, Memory without strong RAG support is fundamentally unsustainable. Beyond this reliance, other aspects of Memory remain limited. Regarding Parametric Memory, though it may seem closer to the essence of “memory,” its core principle offers no inherent technical advantage: it is a computationally intensive method based on KV Cache and Attention operations, tightly integrated with the LLM’s inference engine, essentially a dense attention mechanism. In contrast, long-term memory built on RAG provides filtered, supplementary material for reasoning within an effectively infinite context—also an attention mechanism, but a sparse one. What would be the implications of implementing KV Cache with sparse attention? We will explore this question later.

RAG 2025: Overcoming the Plateau of Technological Challenges

Long-context reasoning depends on hierarchical indexing; multimodal data struggles with storage inflation; and slow infrastructure limits innovation.

Having examined the relationship between RAG and Agents, let us now refocus on RAG itself. Although RAG-related papers continued to be published steadily in 2025, genuine innovation in concepts and systems was notably scarce. Has RAG technology reached a critical plateau? At its core, RAG relies on information retrieval (IR), a well-established field. Yet RAG presents new challenges beyond traditional IR, including query diversity and multimodal data.

Query diversity remains a perennial challenge in information retrieval (IR), bridging the semantic gap between queries and answers. Numerous methods address this, including notable 2024 works such as GraphRAG, Contextual Retrieval, RAPTOR, and RAGFlow’s approach using automated tag libraries informed by domain experts. These methods essentially employ forms of sparse attention: complex queries require longer contexts and the identification of relevant attention within them. For simple queries, effective solutions exist, relying on good chunking and efficient multi-path recall. However, truly effective implementations for complex queries remain elusive.

Consequently, some argue that if bridging the semantic gap depends largely on LLMs generating auxiliary data, why not inject knowledge directly into the LLM’s working memory, bypassing such workarounds? This idea originated with CAG [Ref. 5], which proposed using KV Cache to store all data converted into KV format by the LLM. Later efforts sought to reduce the heavy bandwidth and computational costs of dense attention by combining KV data with database techniques to achieve sparse attention. Examples include RetrievalAttention [Ref. 6], RetroInfer [Ref. 7], and AlayaDB [Ref. 8]. These solutions split KV Cache data between two regions: a portion remains in the traditional KV Cache, while the bulk is stored in vector indexes or databases. During generation—specifically the Decoder phase of LLM inference—the current query vector (Q) retrieves relevant value vectors (V) from the index or database. These V vectors are then loaded into the KV Cache to complete the final attention computation, as illustrated below.

While this technology shows promise in addressing current RAG challenges, it still faces significant hurdles. The main aim of such schemes is often to reduce LLM inference costs. Traditional inference, using Prefill/Decoder separation, relies on dense attention mechanisms that deliver high accuracy but at considerable cost and heavy GPU memory demands. In contrast, sparse attention schemes utilize CPU memory, disk storage, and Approximate Nearest Neighbour (ANN) vector search to lower costs.

These solutions require deep integration with the LLM inference engine, necessitating modifications to handle both text and vector data, which effectively limits their use to open-source models. Moreover, frequent vector retrievals during the Decoder phase demand co-located deployment of the retrieval system and inference engine to reduce network latency, restricting applicability mainly to private or on-premises setups.

Paradoxically, this integrated “Attention Engine” approach may not fully resolve core RAG issues, especially with lengthy documents. In long-context LLMs, overly verbose input can impair performance, causing key details to be overlooked or misinterpreted. For precise detail retrieval, traditional RAG methods still hold the edge.

Therefore, while we must keep a close eye on the “Attention Engine” approach, the practical focus remains on RAG outside the LLM, improving support for reasoning over long texts. Whether it’s an Attention Engine or a Search Engine, their strengths do not fully overlap—the former excels at rapid inference over smaller datasets, the latter at fast retrieval across vast data. They remain largely complementary, even as the scope of RAG continues to evolve and expand.

Currently, aside from methods like GraphRAG and RAPTOR that support cross-chunk reasoning, few solutions for retrieval and reasoning over very long texts demonstrate strong engineering viability. The main approaches can be summarized as follows:

No Chunking, Whole Document Retrieval: Skip chunking and recall entire documents based on brief queries, feeding them directly into the context. This works for a small number of documents but struggles at scale due to poor understanding of global document context, resulting in low recall relevance.

Hierarchical Indexing & In-Document Agentic RAG: Construct a tree-like index during ingestion reflecting document structure (e.g., sections, subsections). Recall happens at the document level, followed by structured traversal within the document using the hierarchical index to locate relevant chunks, enabling “Agentic RAG” within documents.

Overlapped Chunking & Multi-Granular Retrieval: Use chunking with significant overlap and build a multi-layered index (e.g., document, section, paragraph levels). This employs a combined retrieval strategy leveraging both coarse and fine granularities. Though conceptually straightforward, each approach poses unique challenges. As a tool provider, RAGFlow plans to offer similar functionalities in due course.

Turning to the second aspect: multimodal data. In our year-end review, we highlighted Multimodal RAG (MM-RAG) as a key trend for 2025. Yet, by mid-year, this trend has failed to gain momentum. The primary obstacle remains the immaturity of the supporting infrastructure. As noted, late interaction models continue to dominate MM-RAG pipelines, meaning embedding models produce Tensors, or multi-vectors. For instance, a single image may be represented by 1,024 vectors, each comprising 128-dimensional floats, as illustrated below.

Several vector databases now claim to offer native Tensor support; however, comprehensive solutions for practical Tensor utilization remain scarce. This scarcity stems from the dramatic data expansion caused by Tensors, which can increase storage demands by up to two orders of magnitude. Consequently, beyond native Tensor support, holistic approaches are required to mitigate storage bloat. These include:

Binary quantization at the database level: representing each vector dimension with a single bit, thereby reducing storage to approximately one thirty-second of the original size.

Index support for quantised multi-vectors or Tensor indexes: ensuring vector indexes can efficiently manage these binary-quantised multi-vectors.

Reranker compensation: to minimise precision loss from quantisation, binary vectors are de-quantised back into floats during the reranking phase to recalculate similarity scores, thus preserving accuracy.

At the model level, efforts are needed to reduce overhead from Tensor storage growth. This includes:

Using Multi-Representation Learning (MRL) to lower the dimensionality of each vector, for example, cutting dimensions to 64 could halve storage but slightly reduce recall accuracy.

Applying Token or Patch merging to reduce the number of vectors, such as shrinking from 1,024 patches to 128.

While some progress has been made in optimizing models for text ranking, much more work is needed to meet the demands of Multimodal RAG. As a result, widespread adoption of MM-RAG depends on the development of its supporting infrastructure.

End

In summary, our analysis shows that core RAG technology saw little significant progress in 2025. Meanwhile, the interdependence between RAG and Agents has deepened considerably—whether as the foundation of Agent Memory or enabling DeepResearch capabilities. From an Agent’s perspective, RAG may be just one Tool among many, but by managing unstructured data and Memory, it stands as one of the most fundamental and critical Tools. It is fair to say that without robust RAG, practical enterprise deployment of Agents would be unfeasible. Consequently, RAG’s value as a distinct architectural layer is now more pronounced than ever. These insights will directly inform the flagship features of the next RAGFlow release.

As for the complex challenges in RAG’s evolution, let's leave it to time for solution. After all, RAG is fundamentally an architectural framework; its true potential will be realized through the co-evolution of Infrastructure and models. Stay tuned and welcome to star RAGFlow: https://github.com/infiniflow/ragflow

Bibligraphy

https://market.us/report/agentic-retrieval-augmented-generation-market/

https://www.anthropic.com/engineering/building-effective-agents

Reasoning RAG via System 1 or System 2: A Survey on Reasoning Agentic Retrieval-Augmented Generation for Industry Challenges https://arxiv.org/abs/2506.10408

Rethinking Memory in AI: Taxonomy, Operations, Topics and Future Directions https://arxiv.org/abs/2505.00675

Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks https://arxiv.org/abs/2412.15605

RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval https://arxiv.org/abs/2409.10516

RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference https://arxiv.org/abs/2505.02922

AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference https://arxiv.org/abs/2504.10326

RAGFlow Blog

RAGFlow 0.25 — Ingestion pipeline, agent sandbox, and user-level memory

Ingestion pipeline enhancement

Agent enhancement

Agent publishing support

Agent sandbox execution and charting

Memory orchestration with user-level isolation

OpenClaw integration

Conclusion

Data foundation in the era of agent harness - why RAGFlow is changing

Why the data foundation must, and can only, be built with retrieval capability as its core

What kind of retrieval does agent harness actually need?

Bibliography

RAGFlow x OpenClaw - The Enterprise-aware Claw

Feature Overview

Dataset operations

Automated document processing

Semantic search and scope control

A quickstart guide

Finale

RAGFlow online service domain switch

What Does This Mean for Existing Users?

How to Access Your Account?

Looking Ahead

RAGFlow 0.24.0 — Memory API, knowledge base governance and Agent chat history

Memory management

Output memory extraction log

Manage management API

New Agent chat management interface

Batch metadata management

Chat's Thinking mode

Support for multiple admin users

Finale

Reference

RAGFlow 0.23.0 — Advancing Memory, RAG, and Agent Performance

Memory management and usage

Manage memory

Enhanced Agent context

Memory lets Agent know you better

Enhanced Agent

Enhanced Agent performance

Workflows triggered using Webhook

Security configuration

Schema configuration

Response configuration

Test run and check the log

Global conversation variables

Structured Agent component output

Voice input and output

Enhanced RAG

Auto-metadata

Configuring auto-metadata generation rules

Manage metadata

Filter metadata

Add context to images and tables

Child chunking

API updates

Agent API returns trace logs

Supported API

Considerations

Curl example

Chat API supports metadata filtering

Supported APIs

Curl example

Finale

From RAG to Context - A 2025 year-end review of RAG

Can RAG still be improved?

The debate about long context and RAG

Optimizations for RAG conversational quality

From knowledge base to data foundation

From RAG to context

How is retrieval for Agents different from the search era?

Tools also need searching

Does memory deserve to be a separate infra?

How far has multi-modal RAG progressed?

Looking ahead to 2026

References

RAGFlow’s Seamless Upgrade - from 0.21 to 0.22 and Beyond

Background

0.22.1 capabilities