Introduction

Welcome to the final lesson of Managing Data for GenAI with Bedrock Knowledge Bases! Having successfully built your vector storage infrastructure, populated it with document embeddings, and created a Bedrock Knowledge Base, you have now arrived at the culmination of your GenAI data management journey: implementing powerful querying capabilities that transform your stored knowledge into actionable intelligence.

In this fourth and final lesson, we'll explore two distinct yet complementary approaches to retrieving information from your carefully constructed data pipeline. You'll master both direct vector similarity search using S3 Vectors for precise mathematical matching and the complete RAG (Retrieval-Augmented Generation) workflow, which combines intelligent document retrieval with response generation. By comparing these approaches side by side, you'll develop a nuanced understanding of when to use each method and how they work together to create truly intelligent question-answering systems that can provide both precise document matches and contextually rich, human-readable responses.

RAG Versus Direct Vector Search

Let's begin by understanding the fundamental distinction between direct vector similarity search and full RAG workflows. These two approaches serve different purposes in the GenAI ecosystem and offer complementary strengths for various use cases.

Direct vector similarity search operates at the mathematical level, transforming your query into an embedding vector and finding the most similar document vectors through geometric distance calculations. This approach excels when you need precise, deterministic matches and want to examine the raw similarity scores between your query and stored documents. It's particularly valuable for debugging your vector index, analyzing the quality of your embeddings, or building custom retrieval logic where you need direct access to similarity metrics.

Full RAG workflows, in contrast, orchestrate multiple AI operations to provide complete question-answering experiences. The RAG process begins with the same vector similarity search but then feeds the retrieved documents into a large language model that synthesizes the information into coherent, contextual responses. This approach transforms raw document matches into human-readable answers complete with proper citations, making it ideal for end-user applications, customer support systems, or any scenario where you need intelligent, conversational responses rather than just document references.

It's important to note that RAG workflows are significantly more expensive than direct vector search due to the additional large language model inference required for response generation. While direct search only incurs the computational cost of vector similarity calculations, RAG workflows also consume substantial tokens for both input context (retrieved documents) and output generation (synthesized responses), making cost optimization an important consideration when choosing between these approaches for your specific use case.

Setting Up Client Connections and Configuration

Our implementation begins, as in previous lessons, by establishing the necessary AWS service clients and defining the configuration parameters that will govern both our direct search and RAG operations:
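A minimal sketch of this setup might look like the following; the region, bucket and index names, knowledge base ID, and specific model identifiers shown here are placeholder assumptions you would replace with your own values:

```python
import boto3

# Placeholder values: substitute your own region, resource names, and IDs
REGION = "us-east-1"
VECTOR_BUCKET_NAME = "my-vector-bucket"    # S3 Vectors bucket created earlier
VECTOR_INDEX_NAME = "my-document-index"    # vector index populated in lesson two
KNOWLEDGE_BASE_ID = "XXXXXXXXXX"           # your Bedrock Knowledge Base ID

# The embedding model must match the one used to build the vector index;
# the generation model synthesizes retrieved content into answers.
EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v2:0"
GENERATION_MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

# Direct similarity search against the S3 Vectors index
s3_vectors_client = boto3.client("s3vectors", region_name=REGION)

# Embedding generation: turns text queries into searchable vectors
bedrock_runtime_client = boto3.client("bedrock-runtime", region_name=REGION)

# Orchestrates the complete retrieve-and-generate (RAG) workflow
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime", region_name=REGION)
```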

This configuration establishes three distinct AWS service clients, each serving a specific role in our querying architecture. The s3_vectors_client provides direct access to your vector index for mathematical similarity searches, the bedrock_runtime_client handles embedding generation to transform text queries into searchable vectors, and the bedrock_agent_runtime_client orchestrates the complete RAG workflow by coordinating retrieval and generation operations. The model specifications ensure consistency between our query processing and the original document embeddings: EMBEDDING_MODEL_ID must match exactly what we used when populating our vector index, while GENERATION_MODEL_ID specifies the large language model that will synthesize retrieved information into coherent responses.

Generating Query Embeddings for Direct Search

To perform direct vector similarity search, we first need to transform our text query into the same vector representation used for our stored documents:
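A sketch of this step, assuming the Titan Text Embeddings request format and an example question about Nimbus Assist, could look like this:

```python
import json

query_text = "What is Nimbus Assist?"  # example query; any natural-language question works

# Invoke the embedding model with the same settings used when indexing documents
embedding_response = bedrock_runtime_client.invoke_model(
    modelId=EMBEDDING_MODEL_ID,
    contentType="application/json",
    accept="application/json",
    body=json.dumps({
        "inputText": query_text,
        "normalize": True,  # standardize magnitudes so similarity reflects direction only
    }),
)

# Parse the JSON response body and convert the embedding to a list of Python floats
response_body = json.loads(embedding_response["body"].read())
query_embedding = [float(value) for value in response_body["embedding"]]
```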

The embedding generation process mirrors exactly what we performed when populating our vector index in lesson two, ensuring mathematical consistency between our query vectors and stored document vectors. The "normalize": True parameter is particularly important for similarity search operations, as it standardizes vector magnitudes to focus purely on directional relationships rather than magnitude differences. The response processing involves JSON parsing to extract the raw numerical vector, which we convert to a list of Python floats for compatibility with the S3 Vectors API. This query embedding becomes the mathematical representation that will be compared against all stored document vectors to identify the most semantically similar content.

Performing Direct Vector Similarity Search

With our query vector prepared, we can now execute the direct similarity search against our S3 Vectors index:
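A sketch of the search call, reusing the client and configuration values defined above (the returnMetadata flag and the printing loop are illustrative additions), could look like this:

```python
# Run the similarity search against the S3 Vectors index
search_response = s3_vectors_client.query_vectors(
    vectorBucketName=VECTOR_BUCKET_NAME,
    indexName=VECTOR_INDEX_NAME,
    queryVector={"float32": query_embedding},
    topK=3,               # return only the three most similar documents
    returnDistance=True,  # include the numerical distance for each match
    returnMetadata=True,  # include any stored metadata, such as the source filename
)

# Print each match, using get() to handle potentially missing fields gracefully
for vector in search_response.get("vectors", []):
    key = vector.get("key", "unknown")
    distance = vector.get("distance", float("nan"))
    print(f"{key}: distance={distance:.4f}")
```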

The query_vectors operation performs the core mathematical computation that drives semantic search, calculating similarity scores between your query embedding and every vector in your index. The topK parameter controls result quantity, returning only the three most similar documents rather than exhaustive results, while returnDistance=True provides the numerical similarity scores that quantify how closely each result matches your query. The results processing demonstrates how to safely extract information from the S3 Vectors response structure, using the get() method to handle potential missing fields gracefully.

Examining Direct Search Results

When executed with our test query, the direct search produces output that reveals the mathematical relationships within your vector space:
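The printed output might look roughly like the following, where the document keys are illustrative placeholders:

```
business_requirements.md: distance=0.4789
product_requirements.md: distance=0.5374
design_document.md: distance=0.5871
```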

These distance scores provide valuable insights into your vector index's behavior: smaller distances indicate higher semantic similarity, with the business requirements document showing the closest match at 0.4789 distance, followed by the product requirements at 0.5374, and the design document at 0.5871. The consistent ranking demonstrates that our vector embeddings successfully capture semantic relationships, correctly identifying documents about Nimbus Assist as most relevant to queries about this topic.

Implementing Full RAG Workflow

Moving beyond direct search, we now implement the complete RAG workflow that transforms document retrieval into intelligent question-answering:
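A sketch of that call, reusing the clients and identifiers defined earlier and the same example question (the modelArn is assembled from GENERATION_MODEL_ID using the standard foundation-model ARN format), could look like this:

```python
rag_query = "What is Nimbus Assist?"  # example question; swap in any query

rag_response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": rag_query},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KNOWLEDGE_BASE_ID,
            # ARN of the generation model that synthesizes the final answer
            "modelArn": f"arn:aws:bedrock:{REGION}::foundation-model/{GENERATION_MODEL_ID}",
        },
    },
)
```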

The retrieve_and_generate operation represents the sophisticated orchestration that makes RAG workflows so powerful. This single API call coordinates multiple AI operations: it embeds your query using the same model configured in your knowledge base, performs vector similarity search to identify relevant documents, ranks and selects the most appropriate content, and feeds this context into a large language model for response generation.

The input query is represented as a dict containing the text key. The retrieveAndGenerateConfiguration parameter provides a hierarchical structure that organizes all workflow settings:

  • Top-level configuration: Specifies "type": "KNOWLEDGE_BASE" to indicate we're using a Bedrock Knowledge Base as our source.
  • Knowledge base configuration: The nested knowledgeBaseConfiguration supplies the knowledgeBaseId to query and the modelArn of the generation model (built from GENERATION_MODEL_ID) that will synthesize the final answer.

Processing RAG Responses and Citations

After the RAG workflow completes, we need to extract and display both the generated answer and the source document citations:
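A sketch of that extraction, assuming the rag_response object returned by the call above, could look like this:

```python
# The synthesized answer produced by the generation model
answer_text = rag_response["output"]["text"]
print("Answer:")
print(answer_text)

# Walk the citations to show which source documents backed the answer
print("\nSources:")
for citation in rag_response.get("citations", []):
    for reference in citation.get("retrievedReferences", []):
        metadata = reference.get("metadata", {})
        # Fallback hierarchy: full source URI, then filename, then "Unknown"
        source = (
            metadata.get("x-amz-bedrock-kb-source-uri")
            or metadata.get("filename")
            or "Unknown"
        )
        print(f"- {source}")
```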

This response processing code demonstrates proper extraction of both the generated text and the critical citation information that makes RAG responses trustworthy and verifiable. The citation processing implements a fallback hierarchy for source identification: it first looks for x-amz-bedrock-kb-source-uri, which provides the full source URI, then falls back to filename from the metadata, and finally defaults to "Unknown" if neither is available, ensuring robust handling of various document metadata configurations.

Analyzing Complete RAG Results

When executed, our RAG workflow produces comprehensive results that showcase the system's ability to handle diverse query types and provide detailed, cited responses.

These results demonstrate the capabilities of well-configured RAG systems. The first query produces a comprehensive, well-structured answer that synthesizes information from multiple source documents, providing detailed context about Nimbus Assist's purpose, capabilities, and integration points. The second query showcases the system's ability to handle technical, detailed questions by extracting specific configuration parameters and organizing them clearly with proper formatting. Most importantly, the third query demonstrates responsible AI behavior: when the knowledge base lacks relevant information (in this specific case, the PTO policy), the system explicitly states this limitation rather than fabricating an answer, maintaining trust and reliability by acknowledging the boundaries of its knowledge.

Conclusion and Next Steps

Congratulations on completing the final lesson of Managing Data for GenAI with Bedrock Knowledge Bases! You've successfully mastered the complete spectrum of GenAI data management, from initial vector storage setup through RAG querying implementations. Throughout this course, you've built production-ready infrastructure for intelligent document retrieval, learned to balance direct mathematical search with AI-powered response generation, and developed the skills to create trustworthy, citation-backed question-answering systems. Your achievement in reaching this point represents a significant milestone in your GenAI expertise.

Your journey through this foundational course has equipped you with essential knowledge of AWS's GenAI ecosystem and practical experience implementing the core components that power modern AI applications, which you will apply in the upcoming practice section. Then, as you move forward to the next course, "Putting Bedrock Models to Action with Strands Agents", you'll build upon this solid foundation to create even more sophisticated conversational agents that can take real-world actions. You'll explore advanced capabilities, including tool integration, knowledge base connections, and cutting-edge Model Context Protocol features that will take your GenAI expertise to the next level of practical application and real-world impact. Keep learning!
