Real-time Enterprise RAG with SharePoint

Retrieval Augmented Generation (RAG) applications let you deliver context-specific answers from private knowledge bases using LLMs.
SharePoint, offered as part of Microsoft 365, is a common data source for RAG applications. It combines workflow applications, "list" databases, web parts, and security features to help business teams collaborate, and Microsoft Office users widely rely on SharePoint document libraries for sharing files.
Pathway is built for Enterprise RAG systems that work with live enterprise data: it manages dynamic data sources like Microsoft SharePoint while maintaining high accuracy, real-time synchronization, and reliability.
What is Real-time RAG?
In practical scenarios, files in data repositories are dynamic, i.e., frequently added, deleted, or modified. These ongoing changes require real-time synchronization and efficient incremental indexing to ensure the most current information is always available.
Real-time Enterprise RAG Applications stay in permanent sync with your dynamic data sources.
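To make the idea of incremental synchronization concrete, here is a minimal sketch in plain Python. It is not how Pathway implements its connectors; it only illustrates the bookkeeping involved: comparing two snapshots of a repository and deriving which files were added, modified, or deleted, so that only those need re-indexing. The `FileState` class, the `diff_snapshots` function, and the file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Minimal per-file state; `modified_at` is a Unix timestamp."""
    path: str
    modified_at: int


def diff_snapshots(previous, current):
    """Compare two {path: FileState} snapshots and report what changed."""
    added = sorted(p for p in current if p not in previous)
    deleted = sorted(p for p in previous if p not in current)
    modified = sorted(
        p for p in current
        if p in previous and current[p].modified_at != previous[p].modified_at
    )
    return {"added": added, "modified": modified, "deleted": deleted}


# Hypothetical file names, for illustration only.
before = {
    "report.pdf": FileState("report.pdf", 100),
    "notes.docx": FileState("notes.docx", 100),
}
after = {
    "report.pdf": FileState("report.pdf", 250),    # edited since last scan
    "budget.xlsx": FileState("budget.xlsx", 240),  # newly uploaded
}

changes = diff_snapshots(before, after)
print(changes)
```

Only the files listed under `added` and `modified` need to be parsed and embedded again, and entries under `deleted` can be dropped from the index, which is what keeps incremental indexing cheap compared with rebuilding everything.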
This app template helps you build a Real-time Enterprise RAG application that uses Microsoft SharePoint as a data source. The application always provides up-to-date knowledge, synchronized with file insertions, deletions, and changes as they happen. It also removes the need for constant ETL (Extract, Transform, Load) adjustments to keep up with such changes.
You can easily run this app template in minutes using Docker containers while ensuring the best practices needed in an enterprise setup.
Real-time RAG with SharePoint
Real-time RAG with SharePoint refers to an approach where RAG is integrated with Microsoft SharePoint as the data source and enhanced by real-time AI capabilities. In this setup, you can:
- Continuously index documents as they're added, changed, or removed in SharePoint.
- Securely authenticate and manage documents behind enterprise-grade permissions and certificate-based authentication.
- Provide up-to-date answers with minimal latency, thanks to real-time synchronization.
Features of Real-time Enterprise RAG with SharePoint
Real-Time Synchronization
Real-time RAG Apps must stay in sync with your data repositories to provide relevant responses.
- Pathway's SharePoint connector supports both static and streaming modes.
- In streaming mode, your app continuously indexes documents from SharePoint, maintaining an up-to-date knowledge base.
Imagine senior executives making strategic decisions based on last month's financial reports or outdated project statuses. This lag in information can lead to misinformed decisions, missed opportunities, or significant financial losses. Real-time synchronization ensures your app delivers current and accurate information, preventing such scenarios.
Detailed Metadata Handling
Enterprise RAG applications include comprehensive metadata such as file paths, modification times, and creation times in the output table. This additional context is crucial for effectively tracking and managing documents.
- Pathway's streaming mode ensures that this metadata is always up-to-date.
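As a small illustration of working with this metadata, the sketch below formats the timestamps of one metadata record. The record mirrors the sample `list_documents` response shown in the querying step of this guide; treating `modified_at` and `seen_at` as Unix timestamps in seconds is an assumption, and the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Record shape copied from the sample response in the querying step.
record = {
    "created_at": None,
    "modified_at": 1718810417,
    "owner": "root",
    "path": "data/IdeanomicsInc_20160330_10-K_EX-10.26_9512211_EX-10.26_Content License Agreement.pdf",
    "seen_at": 1718902304,
}


def to_utc_iso(epoch_seconds):
    """Render a Unix timestamp (assumed to be in seconds) as ISO 8601 UTC."""
    if epoch_seconds is None:
        return None
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).isoformat()


def seconds_between_change_and_seen(rec):
    """Time from the last modification to when the pipeline saw the file."""
    return rec["seen_at"] - rec["modified_at"]


print(to_utc_iso(record["modified_at"]))
print(to_utc_iso(record["created_at"]))
print(seconds_between_change_and_seen(record))
```

Fields such as `created_at` may be null, so any code consuming the metadata should handle missing values explicitly, as `to_utc_iso` does here.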
High Security with Certificate-Based Authentication
Enterprise workflows must ensure high security and compliance with enterprise standards.
- Pathway's certificate-based authentication future-proofs your system against the potential deprecation of simpler authentication methods by SharePoint.
- For enhanced security, locally deployed LLMs can be set up within an isolated environment, like a Faraday cage, that protects against external interference. This setup ensures that sensitive data remains secure and private, adhering to the highest security standards.
While this template uses the OpenAI API as an example, you can easily swap it with private RAG setups using the additional resources provided at the end.
Scalable and Production-Ready Deployment
Enterprise applications handle vast data sources that keep growing as more users within a company work on them.
- Pathway provides fast, built-in, and persistent vector indexing for up to millions of pages of documents, eliminating the need for complex ETL processes.
- Pathway is built for scale, and it offers an integrated solution where the server and endpoints are part of the same application.
- The easy Docker setup ensures consistency across different environments.
High Accuracy and Enhanced Query Capabilities
Pathway's SharePoint connector allows you to easily query and manage your datasets stored in SharePoint, providing flexible and powerful options for accessing your data.
- You can configure the connector to read data from specific directories or entire subsites, with options for both recursive and non-recursive scans.
- Starting with a basic RAG pipeline provides initial accuracy, but leveraging more advanced methods such as hybrid indexing and multimodal search can significantly increase accuracy.
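To give a feel for what hybrid retrieval involves, here is a minimal sketch of reciprocal rank fusion (RRF), a common way to merge a vector-search ranking with a keyword-search ranking. It is a generic illustration and not a description of Pathway's own implementation; the function name, the document IDs, and the constant `k=60` (a conventional default for RRF) are assumptions made for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in (rank
    starts at 1), and scores are summed across lists.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two retrievers for the same query.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why a hybrid index can recover results that either retriever alone would rank too low.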
Step-by-Step Process to Implement a Production-Ready RAG with SharePoint Connector
This template guides you in connecting Pathway with SharePoint to build a real-time Enterprise RAG app.
Important: The SharePoint connector requires a Pathway license key. If you haven’t already, request your free license key to unlock the SharePoint connector and other enterprise features. You will add this key to the application in Step 5.
Prerequisites for the Enterprise RAG App Template
- Docker Desktop: You can download it from the Docker website.
- OpenAI API Key: Sign up on the OpenAI website and generate an API key from the API Key Management page. Keep this key secure as you will need to use it in your configuration.
- Pathway License Key: Get your free license key here.
- Certificate-Based Authentication Setup for SharePoint Integration
For better security, we use certificate-based authentication to access data from SharePoint. This relies on Azure AD, now renamed Microsoft Entra ID.
You can follow the steps in the video below to create and upload your SSL certificate to obtain necessary parameters for Pathway's SharePoint connector.

Once done, you will use these parameters to update the app.yaml file to successfully build and deploy your Real-time Enterprise RAG with Microsoft SharePoint and Pathway.
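One of those parameters is the certificate thumbprint. Microsoft Entra ID displays it after upload, but you can also cross-check it locally: the thumbprint is the SHA-1 digest of the DER-encoded certificate, written as uppercase hex. The sketch below uses only the Python standard library; the function name is hypothetical, and it assumes you pass the text of the public certificate in PEM form (a file containing only a private key will not work).

```python
import hashlib
import ssl


def certificate_thumbprint(pem_text):
    """Return the SHA-1 thumbprint (uppercase hex) of a PEM certificate.

    `pem_text` must be a single certificate delimited by the usual
    BEGIN CERTIFICATE / END CERTIFICATE lines.
    """
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return hashlib.sha1(der_bytes).hexdigest().upper()


def thumbprint_from_file(path):
    """Convenience wrapper: read a PEM certificate file and hash it."""
    with open(path, "r", encoding="ascii") as handle:
        return certificate_thumbprint(handle.read())
```

If the value printed by `thumbprint_from_file` for your public certificate differs from the thumbprint in your configuration, authentication is likely to fail, so this is a quick check to run before deploying.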
Components of your Real-time RAG Pipeline
This folder contains several objects:
- app.py, the application code using Pathway and written in Python;
- app.yaml, the file containing the configuration of the pipeline, such as LLM models, data sources, or server address;
- requirements.txt, the dependencies for the pipeline. It can be passed to pip install -r ... to install everything needed to launch the pipeline locally;
- Dockerfile, the Docker configuration for running the pipeline in a container;
- .env, a short environment variable configuration file where the OpenAI key must be stored;
- ui/, a simple UI written in Streamlit for asking questions.
Step 1: Clone the Pathway LLM App Repository
Clone the llm-app repository from GitHub. This repository contains all the files you’ll need.
git clone https://github.com/pathwaycom/llm-app.git
If you have previously cloned an older version, update it using a pull command.
git pull
Step 2: Navigate to the Demo-Question-Answering Directory
Change to the directory where the example is located:
cd llm-app/examples/pipelines/demo-question-answering
Step 3: Create a .env File and Add Your OpenAI API Key
Rename the .env.example file in the project directory to .env and update it with your OpenAI API key:
OPENAI_API_KEY=sk-*******
Save the file after making the changes.
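Before starting the app, it can help to confirm that the .env file no longer holds the masked placeholder. The following is a minimal sketch using only the standard library; the helper names are hypothetical, the parser handles only simple KEY=VALUE lines, and it is not the loader the template itself uses.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_openai_key(values):
    """True if OPENAI_API_KEY is set and is not the masked placeholder."""
    key = values.get("OPENAI_API_KEY", "")
    return bool(key) and "*" not in key


# The placeholder from this step is rejected.
placeholder = parse_env_text("# OpenAI credentials\nOPENAI_API_KEY=sk-*******\n")
print(has_real_openai_key(placeholder))
```

To check your own file, read it with `open(".env").read()` and pass the text to `parse_env_text`.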
Step 4: Modify the app.yaml File
By default, the YAML configuration reads documents from a local data folder. If files need to be pulled from external repositories, such as SharePoint, Google Drive, or Amazon S3, Pathway provides dedicated connectors. In this configuration, the !pw.xpacks.connectors.sharepoint.read block replaces the default local source, so documents are ingested directly from SharePoint with metadata enrichment and a periodic refresh interval.
For the LLM service, the configuration shown below uses gpt-4o, but you can switch to other OpenAI models as needed. Additionally, Pathway supports 300+ LLMs through the LiteLLM class, offering flexibility in model selection. You can also integrate open-source models hosted locally, which gives you full control over inference and deployment and helps with privacy and cost efficiency.
$sources:
  - !pw.xpacks.connectors.sharepoint.read
    url: $SHAREPOINT_URL
    tenant: $SHAREPOINT_TENANT
    client_id: $SHAREPOINT_CLIENT_ID
    cert_path: sharepointcert.pem
    thumbprint: $SHAREPOINT_THUMBPRINT
    root_path: $SHAREPOINT_ROOT
    with_metadata: true
    refresh_interval: 30

$llm: !pw.xpacks.llm.llms.OpenAIChat
  model: "gpt-4o"
  retry_strategy: !pw.udfs.ExponentialBackoffRetryStrategy
    max_retries: 6
  cache_strategy: !pw.udfs.DefaultCache
  temperature: 0
  capacity: 8

$embedder: !pw.xpacks.llm.embedders.OpenAIEmbedder
  model: "text-embedding-ada-002"
  cache_strategy: !pw.udfs.DefaultCache

$splitter: !pw.xpacks.llm.splitters.TokenCountSplitter
  max_tokens: 400

$parser: !pw.xpacks.llm.parsers.UnstructuredParser
  cache_strategy: !pw.udfs.DefaultCache

$retriever_factory: !pw.stdlib.indexing.BruteForceKnnFactory
  reserved_space: 1000
  embedder: $embedder
  metric: !pw.stdlib.indexing.BruteForceKnnMetricKind.COS
  dimensions: 1536

$document_store: !pw.xpacks.llm.document_store.DocumentStore
  docs: $sources
  parser: $parser
  splitter: $splitter
  retriever_factory: $retriever_factory

question_answerer: !pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer
  llm: $llm
  indexer: $document_store
  # You can set the number of documents to be included as the context of the query
  # search_topk: 6
  # You can use your own prompt for querying.
  # For that set prompt_template to string with `{query}` used as a placeholder for the question,
  # and `{context}` as a placeholder for context documents.
  # prompt_template: "Given these documents: {context}, please answer the question: {query}"

# Change host and port by uncommenting these lines
# host: "0.0.0.0"
# port: $PATHWAY_PORT

# Cache configuration
# with_cache: true

# If `terminate_on_error` is true then the program will terminate whenever any error is encountered.
# Defaults to false, uncomment the following line if you want to set it to true
# terminate_on_error: true
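The retriever in this configuration is a brute-force k-nearest-neighbor index with a cosine metric. As a rough illustration of what that means, the sketch below scores every stored vector against the query and keeps the top k. It is a toy version in plain Python with three-dimensional vectors instead of the 1536 dimensions configured above; the function names and chunk IDs are hypothetical, and it is not Pathway's implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_knn(query, vectors, k):
    """Score every stored vector against the query and keep the top k."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Toy embeddings for three hypothetical chunks.
toy_index = {
    "chunk_1": [1.0, 0.0, 0.0],
    "chunk_2": [0.9, 0.1, 0.0],
    "chunk_3": [0.0, 1.0, 0.0],
}

top = brute_force_knn([1.0, 0.0, 0.0], toy_index, k=2)
print(top)
```

Brute-force search compares the query with every vector, so it is exact; the trade-off is that cost grows linearly with the number of indexed chunks.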
Mandatory Parameters:
- url: The SharePoint site URL, including the site's path. For example: https://company.sharepoint.com/sites/MySite.
- tenant: The ID of the SharePoint tenant, typically a GUID.
- client_id: The Client ID of the SharePoint application with the required grants to access the data.
- cert_path: The path to the certificate (typically a .pem file) added to the application for authentication.
- thumbprint: The thumbprint for the specified certificate.
- root_path: The path for a directory or file within the SharePoint space to be read.
- refresh_interval: Time in seconds between scans if the mode is set to "streaming".
For more details on additional configurations, visit Pathway's SharePoint Connector page.
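Since a typo in any of these values usually only surfaces as an authentication error at runtime, a quick local sanity check can save time. The sketch below is a hypothetical helper, not part of Pathway: it checks that the values are present, that the URL is an https address, that the tenant looks like a GUID (it is "typically" one, so treat that message as a warning), and that the thumbprint is 40 hexadecimal characters, the length of a SHA-1 digest. The sample values are the placeholder ones used in the example configuration in this guide.

```python
import re
from urllib.parse import urlparse

GUID_RE = re.compile(r"^[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12}$")
THUMBPRINT_RE = re.compile(r"^[0-9a-fA-F]{40}$")
REQUIRED = ("url", "tenant", "client_id", "cert_path", "thumbprint", "root_path")


def validate_sharepoint_settings(settings):
    """Return a list of human-readable problems; empty means nothing obvious."""
    problems = [f"{name} is missing" for name in REQUIRED if not settings.get(name)]

    url = settings.get("url", "")
    if url:
        parsed = urlparse(url)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append("url should be an https site URL")

    tenant = settings.get("tenant", "")
    if tenant and not GUID_RE.match(tenant):
        problems.append("tenant does not look like a GUID")

    thumbprint = settings.get("thumbprint", "")
    if thumbprint and not THUMBPRINT_RE.match(thumbprint):
        problems.append("thumbprint should be 40 hexadecimal characters")

    return problems


example = {
    "url": "https://company.sharepoint.com/sites/Datasets",
    "tenant": "c2efaf1f-8add-4334-b1ca-32776acb61ea",
    "client_id": "f521a53a-0b36-4f47-8ef7-60dc07587eb2",
    "cert_path": "certificate.pem",
    "thumbprint": "33C1B9D17115E848B1E956E54EECAF6E77AB1B35",
    "root_path": "Shared Documents/Data",
}

print(validate_sharepoint_settings(example))
```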
Example Configuration:
To illustrate the utility of this connector, consider a scenario where you need to access a dataset stored in the Shared Documents/Data directory of the SharePoint site Datasets. Below is a basic example demonstrating how to configure the connector for reading this dataset in streaming mode:
import pathway as pw

# Read everything under "Shared Documents/Data" on the Datasets site.
t = pw.xpacks.connectors.sharepoint.read(
    url="https://company.sharepoint.com/sites/Datasets",
    tenant="c2efaf1f-8add-4334-b1ca-32776acb61ea",
    client_id="f521a53a-0b36-4f47-8ef7-60dc07587eb2",
    cert_path="certificate.pem",
    thumbprint="33C1B9D17115E848B1E956E54EECAF6E77AB1B35",
    root_path="Shared Documents/Data",
)
In this setup, the connector targets the Shared Documents/Data directory and recursively scans all subdirectories. This method ensures that no file is overlooked, providing comprehensive access to all pertinent data within the specified path.
Step 5: Obtain and Update the Pathway License Key in app.py
Pathway is an open-source framework that provides core functionalities for free. However, to use advanced features like the SharePoint connector, you need a Pathway license key. This key unlocks additional enterprise-grade capabilities such as enhanced RAM limits, enterprise connectors (e.g., SharePoint, Delta Table, Iceberg), full persistence, and monitoring.
To obtain your free license key, visit Pathway License Key Page and follow the instructions.
Once you have the key, update it in app.py by replacing the existing demo key:
# Set up license key for using Sharepoint feature
pw.set_license_key("demo-license-key-with-telemetry")
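If you prefer not to commit the key to source control, one option is to read it from an environment variable and pass the result to pw.set_license_key. The sketch below shows only the lookup, in plain Python; the helper name and the variable name PATHWAY_LICENSE_KEY are assumptions made for this example, so adapt them to your deployment.

```python
import os


def resolve_license_key(env=None, variable="PATHWAY_LICENSE_KEY"):
    """Fetch the license key from the environment, failing loudly if absent.

    `variable` is an assumed name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(variable, "").strip()
    if not key:
        raise RuntimeError(f"Set {variable} before starting the app")
    return key


# In app.py, this could replace the hard-coded demo key, for example:
#   pw.set_license_key(resolve_license_key())
```

Failing at startup with a clear message is usually easier to diagnose than a connector error later in the pipeline.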
Step 6: Running the Project
Locally
If you are using Windows, refer to the Docker instructions in the next section. For a local run, first install the dependencies:
pip install -r requirements.txt
Then, start the app:
python app.py
With Docker
Build the Docker image with:
docker compose build
Then run it with:
docker compose up
This will start the pipeline and the UI for asking questions.
Step 7: Querying the Pipeline
Check the Indexed Files
Check whether your SharePoint files have been indexed and are available for retrieval. To do so, request the list of indexed inputs and their metadata using curl:
curl -X 'POST' 'http://localhost:8000/v2/list_documents' -H 'accept: */*' -H 'Content-Type: application/json'
This returns the list of indexed files. For example, if you start with a single contract PDF uploaded to your SharePoint, the response will look as follows:
[{"created_at": null, "modified_at": 1718810417, "owner": "root", "path":"data/IdeanomicsInc_20160330_10-K_EX-10.26_9512211_EX-10.26_Content License Agreement.pdf", "seen_at": 1718902304}]
If you add or remove files from the connected folder, repeat the request to see the updated index. The service logs will display the progress of indexing new and modified files.
Ask a Question
You can now run the RAG service. Start by asking a simple question. For example:
curl -X 'POST' \
  'http://localhost:8000/v2/answer' \
  -H 'accept: */*' \
  -H 'Content-Type: application/json' \
  -d '{
  "prompt": "What is the start date of the contract?"
}'
This will return the following answer:
{"response": "The start date of the contract is December 21, 2015."}
If the answer is in any of your indexed documents, the pipeline returns an accurate, up-to-date response, powered by real-time AI.
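If you would rather call the service from Python than from curl, the two requests above can be sketched with the standard library alone. The endpoints, headers, and prompt come from the curl commands in this step; the helper names, the use of http://localhost:8000 as the base URL, and sending an empty JSON object as the body for list_documents are assumptions of this sketch.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local deployment


def build_post(path, payload=None, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the RAG service."""
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        base_url + path,
        data=body,
        headers={"accept": "*/*", "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


list_request = build_post("/v2/list_documents")
answer_request = build_post(
    "/v2/answer", {"prompt": "What is the start date of the contract?"}
)

# With the service running, uncomment to perform the calls:
# documents = send(list_request)
# answer = send(answer_request)
```

Separating request construction from sending keeps the network call in one place, which makes it straightforward to add retries or timeouts later.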
Conclusions
In this app template, you:
- Learned about Real-time RAG and key considerations for Enterprise RAG applications.
- Successfully created and deployed an Enterprise RAG application using Pathway with Microsoft SharePoint as a data source.
By leveraging the combined power of Pathway and Microsoft SharePoint, you built a secure, efficient and scalable Enterprise RAG system tailored to your specific needs. This traditional RAG setup can be refined with rerankers, adaptive RAG, multimodal RAG, and other techniques.
Additional Resources on Enterprise RAG
- Slides AI Search: Set up high accuracy multimodal RAG pipelines for presentations and PDFs on the Slides AI Search GitHub repo. This template helps you build a multi-modal search service using GPT-4o with Metadata Extraction and Vector Index. You can also try out the hosted demo here.
- Private RAG with Connected Data Sources using Mistral, Ollama, and Pathway: Set up a private RAG pipeline with adaptive retrieval using Pathway, Mistral, and Ollama. This app template allows you to run the entire application locally while ensuring low costs without compromising on accuracy, making it ideal for production use-cases with sensitive data and explainable AI needs. Get started with the app template here.
- Multimodal RAG for PDFs with Text, Images, and Charts: This showcase demonstrates how you can launch a MultiModal RAG pipeline that utilizes GPT-4o in the parsing stage. Pathway extracts information from unstructured financial documents in your folders, updating results as documents change or new ones arrive. Learn more here.
Are you looking to build an Enterprise RAG app?
Pathway is trusted by industry leaders such as NATO and Intel, and is natively available on both AWS and Azure Marketplaces. If you'd like to explore how Pathway can support your RAG and Generative AI initiatives, we invite you to schedule a discovery session with our team.
Schedule a 15-minute demo with one of our experts to see how Pathway can be the right solution for your enterprise needs.
Troubleshooting
To provide feedback or report a bug, please raise an issue on our issue tracker. You can also join the Pathway Discord server (#get-help) and let us know how the Pathway community can help you.