Problem Statements for Code Cubicle 5.0 by Pathway

Empower AI with Real-Time Data – A Hackathon Challenge for Builders

Background: The Power of Real-Time RAG

Large Language Models (LLMs) are incredibly powerful, but they have a weakness—their knowledge can become stale.

Enter Retrieval-Augmented Generation (RAG): a technique where an AI retrieves up-to-date information from external sources before generating answers. This means an AI assistant could fetch the latest data on the internet or even your private document store, and then summarize or answer questions based on current information, bypassing the static limits of its training data.

Real-Time RAG takes this a step further by ensuring that as soon as data changes, the AI’s knowledge and responses change with it. The result? Answers that are not only contextually accurate but also instantly updated with live information, greatly reducing outdated responses and AI hallucinations.

About Pathway

Meet Pathway, the open-source engine (Pathway Engine GitHub | Pathway LLM App GitHub) powering this hackathon’s real-time magic. Pathway is a Python-based framework for high-performance streaming data processing and AI integration. In simple terms, Pathway lets you build pipelines that ingest data continuously (from APIs, files, databases, or sensors) and transform or index it on the fly. Under the hood, it uses a powerful Rust engine and incremental computation to handle streams and batches in one unified platform. It’s trusted by the likes of F1 team(s), NATO, and Intel.

The Pathway team (a crack squad of researchers and advisors from places like Google Brain and OpenAI) built this tool to make real-time ETL and live indexing easy for developers—no separate Kafka or Flink stack needed. Pathway’s advisor panel includes Lukasz Kaiser (co-creator of TensorFlow and co-author of foundational models behind ChatGPT), while the full-time research team features global programming champions and leaders such as Jan Chorowski (ex-Google Brain researcher and co-author with Nobel Prize winner Geoffrey Hinton).

What does this mean for you? It means you can focus on your application logic while Pathway handles the heavy lifting of keeping data flows synchronized. Pathway can, for example, ingest updates from 300+ sources with built-in connectors, and automatically maintain up-to-date data tables or vector indexes in memory. It’s already trusted in domains from logistics to finance, but it’s also perfect for hackathon projects—you get an always-on data backbone for your idea. Crucially for RAG, Pathway supports real-time vector/hybrid search and indexing: as new text or documents arrive, it can embed and index them immediately, so your LLM queries always hit the latest info. In short, Pathway is the real-time brain behind dynamic RAG apps.

You can check out the two repositories named above (Pathway Engine and Pathway LLM App) for a bit more context.

Problem Statement Tracks

Track 1: Open Innovation: Real-time RAG Playground

This track is all about building something cool with real-time AI.

Your challenge: create an AI that keeps learning from live data as it comes in.

Here’s the only rule:

👉 Use Pathway to pull in data live (from a file, an API, or any stream) and make sure your AI updates instantly when new data arrives.

Think of it like a playground for real-time ideas:

A chatbot that follows breaking news, pulling in tweets or articles the moment they’re published, and answering questions with the latest info.
A team knowledge assistant where people can keep adding notes/documents and everyone can query it — answers change the moment new info is added.
An AI control panel that monitors live sensor data (like temperature, machines, or traffic) and explains what’s happening or gives smart alerts.

Basically: surprise us with something nobody expects. It can be any domain, any use case — as long as it shows how AI + Pathway can handle fresh, streaming data in real-time.

Track 2: Stale AI in a Real-Time Financial World

Financial markets move in the blink of an eye, but too often our AI tools are a step behind. Traders, analysts, and compliance officers face an onslaught of information – from rapid-fire news feeds and price ticks to regulatory filings – yet many AI systems respond with static, outdated insights. Traditional large language models (LLMs) lack real-time awareness, relying on stale training data and missing the latest context. This leads to critical pain points in finance and investing:

Information Overload & Delays: Markets generate more data than any human or static system can handle. Key signals from APIs, streams, or message queues arrive continuously, but legacy tools struggle to ingest and act on them instantly. Opportunities are lost in the lag.
Outdated AI Responses: Whether in trading algorithms, compliance tools, or robo-advisors, AI that doesn’t update with new data can give yesterday’s answers to today’s problems. A static trading bot might ignore this morning’s earnings surprise, and a compliance assistant might miss a new regulation – with costly consequences.
Crowded Insights & Groupthink: In hyper-competitive markets, everyone has access to similar data. Without real-time, unique signal integration, analysts fall victim to groupthink, lacking the edge to uncover novel opportunities. The winners are those who can synthesize fresh, alternative data faster than the crowd.
Keeping Up with Change: From surprise economic news to fast-evolving regulations, the financial domain is in constant flux. Humans can’t reliably keep up with hundreds of pages of filings, news articles, and social media buzz each day. Delayed awareness means compliance risks and missed market moves.
Missed Signals, Missed Opportunities: When systems can’t integrate new information on the fly, critical signals (a sudden spike in credit card fraud, an anomaly in portfolio performance, a geopolitical headline) slip through the cracks. The result? Missed opportunities, or worse – unseen risks.

The Solution: Build a Live Fintech AI Solution (Hint for Track 2)

💡 Your Mission

Build an AI app for finance that thinks in real-time.

Normally, AI tools look at old data and give answers that can go stale. But in this track, your job is to create an app that keeps learning every second as new data comes in — whether it’s stock prices, payments, or breaking news.

You’ll use Pathway (a Python tool that handles live data) so your AI updates instantly whenever new info arrives.

🔎 Think of it like this:

If stock prices jump, your app should notice right away and update the insights.
If breaking news comes in about a company, your app should instantly show how it affects an investor’s portfolio.
If a customer suddenly does something unusual (like spending 50× more than usual), your app should alert the team and explain why it looks suspicious.

🛠 Examples you could build

Trading Buddy: A chatbot where you ask, “Why is Tesla’s stock moving?” and it pulls in live market data + news to answer in plain English.
Payment Guard: If a transfer matches a newly sanctioned person, your AI immediately pauses it and says, “This name was added to the list 3 minutes ago.”
Customer Behavior Watchdog: Imagine a small shop suddenly wiring ₹40 lakh abroad. The AI flags it: “This is 25× their usual daily spend and the country is new.”
News-to-Portfolio Tool: When a tweet or news breaks (e.g. “Apple delays iPhone launch”), the app instantly tells an investor, “Your Apple shares may drop, here’s the possible impact.”
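The Payment Guard and Customer Behavior Watchdog ideas above come down to a few checks that are re-evaluated on every incoming event. The following is a minimal plain-Python sketch of that logic, not Pathway code: the names `SanctionsList` and `check_transfer`, the 10× threshold, and the sample values are all assumptions made for illustration. In an actual submission the events and list updates would arrive through a Pathway connector rather than direct function calls.

```python
from dataclasses import dataclass, field


@dataclass
class SanctionsList:
    """Hypothetical live sanctions list: name -> time (in seconds) it was added."""
    added_at: dict = field(default_factory=dict)

    def add(self, name, now):
        self.added_at[name.lower()] = now


def check_transfer(name, amount, country, usual_daily_spend, known_countries,
                   sanctions, now, ratio_threshold=10.0):
    """Return human-readable reasons to hold a transfer (empty list = no flags)."""
    reasons = []
    added = sanctions.added_at.get(name.lower())
    if added is not None:
        minutes = (now - added) / 60
        reasons.append(f"name was added to the sanctions list {minutes:.0f} minutes ago")
    if usual_daily_spend > 0:
        ratio = amount / usual_daily_spend
        if ratio >= ratio_threshold:
            reasons.append(f"amount is {ratio:.0f}x the usual daily spend")
    if country not in known_countries:
        reasons.append(f"destination country {country!r} is new")
    return reasons


sanctions = SanctionsList()
# Before the list changes, an ordinary transfer raises no flags.
before = check_transfer("A. Example", 1_000, "IN", 1_000, {"IN"}, sanctions, now=0)
# The list is updated at t=0; a transfer 180 seconds later must already see it.
sanctions.add("A. Example", now=0)
after = check_transfer("A. Example", 4_000_000, "XX", 160_000, {"IN"}, sanctions, now=180)
print(before)
print(after)
```

The reasons list is deliberately plain text so it can be handed to an LLM as context for a friendlier explanation.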

✅ The Key

Don’t worry about building a full financial product. The goal is simple:

👉 Show how AI + Pathway can work together to handle live data and react instantly instead of being stuck with old info.

Think of it as building an AI that’s always awake, always updated, and always explaining what’s going on right now.

Track 3: Real-Time Cybersecurity Anomaly Detection 🛡️

Cyberattacks don’t wait — suspicious logins, data exfiltration, or malware can spread in seconds. Traditional monitoring tools often detect issues too late.

We need a system that can spot anomalies in security data in real-time and explain them clearly, so analysts know why something looks suspicious.

🎯 What is Expected

Ingest live security-like data streams (mock logs, network traffic, login attempts).
Detect anomalies (unusual patterns, suspicious spikes, access from unusual locations).
Send an instant alert with a short AI-generated explanation.

🛠 What You Can Build (Examples / Hacks)

Login Watchdog
- Normal: user logs in from Delhi daily.
- Anomaly: same account logs in from Russia at 3 AM.
- Alert: “Unusual login detected: first-time login from Russia, outside normal hours.”
Network Traffic Monitor
- Normal: 50 requests/minute.
- Anomaly: sudden spike to 5000 requests (possible DDoS).
- Alert: “Traffic 100× higher than baseline — potential attack.”
Data Theft Detector
- Normal: 20MB daily file transfer.
- Anomaly: 2GB downloaded in 5 minutes.
- Alert: “File transfer volume 100× higher than usual, suspicious data exfiltration attempt.”
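The three examples above share one pattern: compare the latest observation with a known baseline and alert when it deviates. A rough plain-Python sketch of that rule follows. The function names, the 10× threshold, and the "usual hours" range are assumptions for illustration, and 2 GB is taken as 2000 MB to match the 100× figure in the example.

```python
from typing import Optional


def ratio_alert(metric, observed, baseline, threshold=10.0) -> Optional[str]:
    """Alert when `observed` is at least `threshold` times the baseline."""
    if baseline <= 0:
        return None  # no usable baseline yet
    ratio = observed / baseline
    if ratio < threshold:
        return None
    return f"{metric} {ratio:.0f}x higher than baseline"


def login_alert(seen_locations, usual_hours, location, hour) -> Optional[str]:
    """Alert on a first-time location and/or a login outside the usual hours."""
    reasons = []
    if location not in seen_locations:
        reasons.append(f"first-time login from {location}")
    if hour not in usual_hours:
        reasons.append("outside normal hours")
    if not reasons:
        return None
    return "Unusual login detected: " + ", ".join(reasons)


traffic = ratio_alert("Traffic", 5000, 50)                  # requests per minute
transfer = ratio_alert("File transfer volume", 2000, 20)    # megabytes
login = login_alert({"Delhi"}, range(8, 23), "Russia", 3)   # 3 AM login
print(traffic)
print(transfer)
print(login)
```

Rule-based checks like these are enough for a first working demo; the alert strings can later be replaced or enriched by an LLM-generated explanation.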

🔗 How Pathway Can Be Used

Ingest Streams: Continuously read system logs, API calls, or mock login events (via CSV, API, Kafka, etc.).
Detect Anomalies: Use Pathway rules/statistics (e.g. thresholds, drift detection) to flag unusual patterns.
Real-Time Processing: Pathway ensures as soon as data arrives, it’s processed — no waiting for batch jobs.
Explain with LLM: For every anomaly, generate a human-friendly message so analysts don’t just see numbers but understand why it’s risky.
Alerts: Output can be shown on a dashboard, or pushed to Slack/Discord.
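To make the "rules/statistics" step above concrete, here is one possible statistical detector that processes values one at a time instead of in batches. It is a plain-Python sketch, not Pathway's API: it uses Welford's online algorithm for the running mean and variance plus a z-score threshold, and the `detect` function, the warm-up length, and the threshold of 3 are illustrative assumptions.

```python
import math


class RunningStats:
    """Welford's online algorithm: mean and variance updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def detect(stream, z_threshold=3.0, warmup=5):
    """Yield (index, value, z_score) for values far from the running mean."""
    stats = RunningStats()
    for i, x in enumerate(stream):
        if stats.n >= warmup and stats.std > 0:
            z = (x - stats.mean) / stats.std
            if abs(z) >= z_threshold:
                yield i, x, z
                continue  # keep the outlier out of the baseline
        stats.update(x)


requests_per_minute = [50, 52, 48, 51, 49, 50, 5000, 51]
anomalies = list(detect(requests_per_minute))
print(anomalies)
```

Each yielded tuple carries enough context (position, value, deviation) to build the prompt for the LLM explanation and the alert message.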

✅ Why it’s a great hackathon problem:

Easy to simulate logs with CSV or mock APIs.
Clear wow factor: “Our AI caught a fake login in real-time.”
Beginners can start simple (rule-based + alerts).
Advanced teams can push further (combine Pathway + RAG + LLM for deep explanations).

Getting Started and What to Aim For

This challenge is beginner-friendly – you don’t need prior experience with streaming data or Pathway to participate. Pathway’s high-level API, extensive connectors, and example templates will help you hit the ground running. Over the next 5-7 days, focus on delivering an MVP (minimum viable product) that demonstrates real-time AI capabilities. Judges will be looking for:

Real-Time Functionality: Does your application truly update and respond live as new data arrives? Show off that dynamic behavior – e.g., an evolving dashboard, continuous alerts, or AI responses that change over time.
Effective Use of Pathway: Are you leveraging Pathway’s unique strengths (stream ingestion, live indexing, etc.) to solve the problem? We want to see you unlock Pathway’s “LiveAI™” superpowers in your design.
Innovation & Impact: How creatively does your solution address the chosen problem? Will it make a meaningful difference for users (traders, analysts, consumers, etc.) by alleviating information overload or providing a timely edge?
Clarity & Polish: Since this is a hackathon, your prototype doesn’t need to be perfect, but a clear presentation of how it works and what it achieves is essential. Bonus for intuitive UIs or visualizations that help showcase the live aspect of your project.

Imagine a future where trading desks, investment apps, or compliance departments run on AI co-pilots that are never out of date, always learning from the latest data. That future starts now – with you. This hackathon is your chance to explore how Pathway’s real-time AI platform can transform finance, turning data deluge into actionable insight in the moments that matter. We can’t wait to see the live Fintech innovations you create. Good luck, and happy hacking!

Solution Requirements

No matter which track you choose, every project must hit a few key requirements (think of these as the judging criteria basics):

Pathway-Powered Streaming ETL: Use the Pathway framework to handle your data ingestion and processing. Your pipeline should continuously ingest and process data in real-time (e.g., reading a file directory, listening to an API or webhook, etc.) – this forms the backbone of your solution.
Live Indexing (No Rebuilds): New or updated data should be indexed or integrated on-the-fly, with no manual reloading of your AI’s knowledge base. In other words, show off Pathway’s real-time indexing engine – data changes flow through to answers immediately.
Live Retrieval/Generation Interface: Provide an interface for users (or an API) to query or get outputs from your system. This could be a chatbot, a search bar, a Q&A endpoint, or a generative text/insight output. The crucial part: the responses reflect the latest data. If the underlying data updates at T+0, a query at T+1 should already include that update.
Demo Video: Prepare a short video (e.g. 2-5 minutes) demonstrating your project in action. This should showcase live updates – for instance, you might screen-record your app, first showing it answer a query, then introduce a data change (add a file, trigger an update), and finally show the app responding to the same query with the new info. Prove to us in the video that your solution truly works in real time!
Optional: Want multi-step logic or escalations? Orchestrate agents with LangGraph, Crew, etc., and use Pathway for connectors, vector stores, and ETL for RAG, as far as your bandwidth allows. If time runs short, you can label this as future scope.
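The "no rebuilds" and "T+0 / T+1" requirements above can be illustrated with a toy example. The sketch below is a plain-Python keyword index, not Pathway's indexing engine, and `LiveIndex` with its methods is a hypothetical name. It only shows the behavior the judges will look for: each update touches a single document, and the very next query already reflects it.

```python
from collections import defaultdict


class LiveIndex:
    """Toy incremental inverted index: each upsert touches only one document."""

    def __init__(self):
        self.docs = {}
        self.postings = defaultdict(set)  # token -> ids of documents containing it

    @staticmethod
    def _tokens(text):
        return set(text.lower().split())

    def remove(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for tok in self._tokens(old):
                self.postings[tok].discard(doc_id)

    def upsert(self, doc_id, text):
        self.remove(doc_id)  # drop stale postings instead of rebuilding everything
        self.docs[doc_id] = text
        for tok in self._tokens(text):
            self.postings[tok].add(doc_id)

    def search(self, query):
        """Return document ids ranked by the number of matching query tokens."""
        scores = defaultdict(int)
        for tok in self._tokens(query):
            for doc_id in self.postings.get(tok, ()):
                scores[doc_id] += 1
        return sorted(scores, key=lambda d: (-scores[d], d))


index = LiveIndex()
index.upsert("q1.txt", "quarterly revenue grew")
first = index.search("revenue guidance")
index.upsert("news.txt", "company cuts revenue guidance")   # new data at T+0
second = index.search("revenue guidance")                    # query at T+1
index.upsert("q1.txt", "quarterly report withdrawn")         # existing doc changes
third = index.search("revenue guidance")
print(first, second, third)
```

A demo video can follow the same three beats: query, change the data, repeat the query.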

You can use any additional libraries or models you like (LangChain, LlamaIndex, OpenAI API, etc.), but Pathway must be the core engine for streaming data and incremental processing. Also, as mentioned, agentic RAG is optional – if you want to incorporate an autonomous agent that decides how to route queries or when to fetch new data, go for it, but it’s not required for a successful submission.

Non-Negotiable: Participants integrating an AI agent framework must deploy their custom agentic workflow either by exposing the agent logic via a REST API endpoint or by ensuring it interacts seamlessly with the real-time RAG pipeline powered by Pathway.
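As a rough idea of what "exposing agent logic via a REST API endpoint" can look like, here is a small sketch built only on Python's standard library. Everything in it is an assumption for illustration: the `/v1/ask` path, the JSON shape, and the `answer` stub, which stands in for a real call into your agent and Pathway-backed pipeline. A production setup would more likely use Pathway's own serving utilities or a web framework.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(question):
    """Hypothetical stand-in for the agent / RAG call."""
    return f"(stub) you asked: {question}"


class AgentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/v1/ask":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"answer": answer(payload.get("question", ""))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


def ask(port, question):
    """Client helper: POST a question to the local endpoint, return parsed JSON."""
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/ask",
        data=json.dumps({"question": question}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # An empty ProxyHandler bypasses any system proxy for this local call.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(req, timeout=5) as resp:
        return json.loads(resp.read())


server = HTTPServer(("127.0.0.1", 0), AgentHandler)  # port 0 = pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
reply = ask(server.server_port, "Why is the stock moving?")
server.shutdown()
server.server_close()
print(reply)
```

Swapping the stub for a function that queries your live index is the only change needed to make the endpoint reflect fresh data.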

Deliverables

Each team should submit the following:

Working Prototype – A link to a GitHub repository containing a runnable project. It should include clear setup and run instructions (in a README) along with a working demo of your solution. We will run your Pathway pipeline and interface to test the real-time behavior and the presented solution.
Code Repository – Your source code, preferably on GitHub itself. Make sure to document how Pathway is used in your solution (e.g., which module handles data ingestion, how the indexing is done, how the query interface works). If you used any pre-built components or templates, note that as well.
Demo Video – As described in requirements, a short screen-recorded video demonstrating the live update flow. This is crucial for us to experience your hack without needing to run it from scratch immediately. Ensure the video highlights the before-and-after of a data update clearly.
Brief Presentation or Write-up – (Optional but encouraged). You might include a few slides or a markdown document summarizing your project’s architecture and the problem it solves. Emphasize how data moves through Pathway, and how the LLM/RAG component produces results. This helps judges appreciate your design and any creative choices you made.

Evaluation Criteria

We’re keeping the judging light and fun, but with an eye on key aspects of your hack. Here’s what we’ll be looking for:

Real-Time Functionality: Does your project truly achieve real-time updates? We’ll check that data changes propagate to the user-facing results with minimal latency. Using Pathway effectively here is a big plus.
Technical Implementation: How well did you integrate Pathway and build your pipeline? Clever use of Pathway’s features (connectors, streaming joins, vector indexing, etc.) will be noted. Also, overall code quality and project completeness matter (but remember, this is a hackathon – scrappy is okay as long as it works!).
Creativity & Innovation: We reward originality. Did you tackle a unique problem or combine tools in an interesting way? Is your solution something that could be extended into a real product or open-source project?
Impact & Usefulness: Think about the “so what” factor – does your hack demonstrate a compelling use-case for real-time RAG? Would a user or business find it genuinely useful or cool? We’ll favor hacks that showcase why live data integration makes a difference.
User Experience & Demo Quality: We don’t expect polished UIs in a hackathon, but a clear presentation helps. If there’s a user interface, is it intuitive to follow? Does your demo (and any write-up) explain the project well? Make it easy for us to understand what’s happening and why it’s awesome.

All criteria will be weighed collectively – this isn’t a strict point system, but rather a holistic assessment. A simple hack that nails the real-time aspect and is well-presented can beat a complex hack that only half works. Aim for a working proof-of-concept that highlights the core idea effectively.

Starter Resources

To help you get going with Pathway and dynamic RAG, we’ve compiled some useful resources:

Pathway Documentation – The official docs for the Pathway framework (installation, API, guides). Start here to learn how to set up data sources, create processing pipelines, and use Pathway’s features (like its vector store, streaming joins, etc.).
Pathway GitHub Repo – Explore Pathway’s source code, examples, and README for insights into usage. The GitHub examples folder contains ready-to-run notebooks and templates for common scenarios.
RAG Beginner’s Guide – New to Retrieval-Augmented Generation? Check out “Retrieval Augmented Generation: Beginner’s Guide to RAG Apps” on the Pathway blog. It covers why RAG is useful and how real-time data integration changes the game (great background reading).
LangChain Integration Guide – If you plan to use LangChain or LlamaIndex with Pathway, see “LangChain and Pathway: RAG Apps with always-up-to-date knowledge”. This guide shows how Pathway can serve as a live data backend for LangChain pipelines, enabling up-to-date document search from LLMs.
Pathway Discord & Forums – Got questions or need help debugging? The Pathway community is there for you (check the Discord link in the docs). While we can’t give away solutions, folks can often point you in the right direction if you’re stuck.

And of course, don’t hesitate to use your own ingenuity and other tools. Stack Overflow, AI forums, and Pathway’s examples are your friends. We’ve given you the building blocks – now it’s up to you to build something amazing!

All the resources you need to get started

RAG Introductory Blog
Building your first Realtime RAG pipeline with Pathway
Building a Realtime Agentic RAG pipeline using LangGraph and Pathway
LLM Tooling (Pathway’s core software development kit for building a custom RAG pipeline, integrating Pathway into your existing codebase, or doing deep customizations)
API Documentation for Pathway LLM xpack
Pathway Developer Documentation
Learn how to expose Pathway’s Document Store as an MCP server and connect it to AI assistants like Claude Desktop.
How to deploy agents with Pathway?
- Here you will see how you can build custom endpoints using Pathway RAG classes. There are two ways to serve agents: using the serve_callable API (which is easier to manage and recommended) or with an external web server like FastAPI. If you prefer, you can start with an external web server and move the endpoint to Pathway later.
Tips for resolving doubts?
- Leverage Gen AI wisely. If you hit a hard-to-understand error message, at the very least ask Gemini, ChatGPT, or a similar assistant about it.
- Use the #get-help channel on Pathway’s Discord if needed. However, given the nature of the competition, we won’t be able to share direct answers.