CrewAI Tutorial (Python): building a RAG pipeline for beginners

By Cyprian Aarons · Updated 2026-04-21

This tutorial shows you how to build a small Retrieval-Augmented Generation (RAG) pipeline in Python using CrewAI. You’ll load documents, retrieve the most relevant chunks for a user question, and have an LLM answer using only that context.

What You'll Need

  • Python 3.10+
  • A virtual environment
  • crewai
  • crewai-tools
  • langchain-openai
  • langchain-community
  • faiss-cpu
  • OpenAI API key set as OPENAI_API_KEY
  • A few local .txt documents to index

Install everything with:

pip install crewai crewai-tools langchain-openai langchain-community faiss-cpu

Set your API key:

export OPENAI_API_KEY="your_key_here"
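A missing key tends to surface as a confusing error deep inside the pipeline, so it can save debugging time to fail fast up front. A minimal check (the `require_api_key` helper is just for illustration, not part of any library):

```python
import os

def require_api_key(name: str = "OPENAI_API_KEY") -> bool:
    """Return True if the key is set and non-empty; otherwise print a hint."""
    if os.environ.get(name):
        return True
    print(f'Missing {name}; run: export {name}="your_key_here"')
    return False

if __name__ == "__main__":
    require_api_key()
```

Call it at the top of your script before building anything expensive.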

Step-by-Step

  1. Start by creating a tiny document corpus and loading it into LangChain documents. For beginners, plain text files are enough; the point is to get the retrieval loop working before adding PDFs, databases, or production-grade vector databases.
from pathlib import Path
from langchain_community.document_loaders import TextLoader

docs_dir = Path("docs")
docs_dir.mkdir(exist_ok=True)

(docs_dir / "policy.txt").write_text(
    "Claims must be filed within 30 days. "
    "Escalate fraud cases to the investigations team."
)

(docs_dir / "benefits.txt").write_text(
    "Premium support includes 24/7 phone access. "
    "Standard support is email only."
)

documents = []
for file_path in docs_dir.glob("*.txt"):
    loader = TextLoader(str(file_path), encoding="utf-8")
    documents.extend(loader.load())

print(f"Loaded {len(documents)} documents")
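Under the hood, TextLoader does little more than read each file and attach its path as metadata. A rough stdlib-only equivalent, useful for seeing the shape of the data (the `load_texts` helper is hypothetical, not part of LangChain):

```python
from pathlib import Path

def load_texts(directory: str) -> list[dict]:
    """Read every .txt file and pair its content with source metadata,
    mirroring what TextLoader produces as Document objects."""
    docs = []
    for path in sorted(Path(directory).glob("*.txt")):
        docs.append({
            "page_content": path.read_text(encoding="utf-8"),
            "metadata": {"source": str(path)},
        })
    return docs
```

Each "document" is just text plus a source path; everything downstream builds on that pairing.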
  2. Next, split those documents into chunks and build a FAISS vector store. This gives you semantic retrieval, which is the core of RAG: find the right context first, then ask the model to answer from it.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = splitter.split_documents(documents)

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
print(f"Indexed {len(chunks)} chunks")
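If the vector store feels like magic, the core idea fits in a few lines: represent texts as vectors and rank chunks by cosine similarity to the query. A toy sketch with hand-made vectors (real embeddings come from the OpenAI model above; `top_k` is an illustrative helper, not the FAISS API):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return indices of the k chunk vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

FAISS does exactly this ranking, just with optimized index structures instead of a Python sort.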
  3. Now define two CrewAI agents: one retrieves context, the other answers the question using that context only. In real systems, this separation makes the pipeline easier to test and easier to swap out later. Giving both agents an explicit model with temperature zero keeps the results reproducible.
from crewai import Agent, LLM

llm = LLM(model="gpt-4o-mini", temperature=0)

retriever_agent = Agent(
    role="Retriever",
    goal="Find the most relevant document chunks for a user question",
    backstory="You are precise and return only supporting context.",
    llm=llm,
    verbose=True,
)

answer_agent = Agent(
    role="Answerer",
    goal="Answer questions using only retrieved context",
    backstory="You cite only what is present in the provided context.",
    llm=llm,
    verbose=True,
)
  4. Create a simple retrieval function and wire it into a CrewAI task flow. CrewAI handles orchestration; your code handles fetching the right evidence and formatting it cleanly for generation.
from crewai import Task, Crew, Process

def retrieve_context(question: str) -> str:
    """Fetch the top-k chunks and label each with its source file."""
    docs = retriever.invoke(question)  # get_relevant_documents() is deprecated
    return "\n\n".join(
        f"[Source: {doc.metadata.get('source', 'unknown')}]\n{doc.page_content}"
        for doc in docs
    )

question = "How long do customers have to file claims?"
context = retrieve_context(question)

retrieve_task = Task(
    description=(
        "Select the passages that best support this question:\n\n"
        f"{context}\n\nQuestion: {question}"
    ),
    expected_output="Relevant context passages",
    agent=retriever_agent,
)

answer_task = Task(
    description=(
        "Answer the question using only this context:\n\n"
        f"{context}\n\nQuestion: {question}"
    ),
    expected_output="A concise grounded answer",
    agent=answer_agent,
)
  5. Finally, run the crew and print the result. The important part here is that the answer task receives retrieved text instead of letting the model guess from memory.
crew = Crew(
    agents=[retriever_agent, answer_agent],
    tasks=[retrieve_task, answer_task],
    process=Process.sequential,
    verbose=True,
)

result = crew.kickoff()
print("\n--- ANSWER ---\n")
print(result)

Testing It

Run the script with a few different questions and check whether the answer changes based on retrieved content. Good test questions are ones that clearly map to your sample docs, like “What support plan includes phone access?” or “When should fraud cases be escalated?”

If retrieval is working, you should see relevant chunks pulled from policy.txt or benefits.txt before the final answer is generated. If answers start drifting beyond the source material, tighten your prompt and keep temperature at zero.

For a stronger test, add a third document with conflicting information and verify that retrieval returns the closest match rather than every chunk in the index. That tells you your vector search is doing real work instead of acting like keyword matching.
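One way to automate the drift check described above is a rough grounding heuristic: flag answers whose content words rarely appear in the retrieved context. This is a blunt instrument, not a real faithfulness metric, and the `looks_grounded` helper is illustrative only:

```python
import re

def looks_grounded(answer: str, context: str, threshold: float = 0.6) -> bool:
    """Return True if most content words in the answer also occur in
    the context. Crude, but it catches obviously hallucinated answers."""
    def words(text: str) -> set[str]:
        # Keep lowercase alphabetic words longer than 3 chars as "content" words.
        return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3}
    answer_words = words(answer)
    if not answer_words:
        return True
    overlap = len(answer_words & words(context))
    return overlap / len(answer_words) >= threshold
```

Run it over `(result, context)` pairs after each kickoff; a failing check is a cue to tighten the answer prompt.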

Next Steps

  • Add PDF ingestion with PyPDFLoader and keep the same retrieval layer.
  • Replace FAISS with Pinecone or Weaviate when you need persistence and multi-instance access.
  • Add citations to your final answer so reviewers can trace every claim back to source text.
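The citations bullet can be as simple as carrying each chunk's source metadata through to the final output. A minimal sketch, assuming chunks shaped like LangChain Documents serialized to dicts (`format_with_citations` is a hypothetical helper):

```python
def format_with_citations(answer: str, chunks: list[dict]) -> str:
    """Append a numbered, deduplicated source list so every claim
    can be traced back to a file."""
    sources: list[str] = []
    for chunk in chunks:
        src = chunk.get("metadata", {}).get("source", "unknown")
        if src not in sources:
            sources.append(src)
    lines = [f"[{i}] {src}" for i, src in enumerate(sources, start=1)]
    return answer + "\n\nSources:\n" + "\n".join(lines)
```

Because `retrieve_context` already labels each chunk with its source, wiring this in only requires keeping the raw chunk list around instead of discarding it after formatting.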

Keep learning

By Cyprian Aarons, AI Consultant at Topiax.

Want the complete 8-step roadmap?

Grab the free AI Agent Starter Kit — architecture templates, compliance checklists, and a 7-email deep-dive course.

