This page shows you how to connect your RAG Engine to Vertex AI Vector Search.
RAG Engine uses a built-in vector database powered by Spanner to store and manage vector representations of text documents. The vector database enables efficient retrieval of relevant documents based on their semantic similarity to a given query. By integrating Vertex AI Vector Search as an additional vector database with RAG Engine, you can use the capabilities of Vector Search to handle large data volumes with low latency, which improves the performance and scalability of your RAG applications.
Vertex AI Vector Search setup
Vertex AI Vector Search is based on vector search technology developed by Google Research. With Vector Search, you can use the same infrastructure that provides a foundation for Google products such as Google Search, YouTube, and Google Play.
To integrate with RAG Engine, an empty Vector Search index is required.
Set up Vertex AI SDK
To prepare Vertex AI Vector Search instances for the RAG application, follow these steps:
To set up Vertex AI SDK, see Setup.
Set your environment variables to the following:
PROJECT_ID=YOUR_PROJECT_ID
LOCATION=YOUR_LOCATION_ID
Optional: If you are using Vertex AI Workbench, the notebook is pre-authenticated, and this step isn't required. Otherwise, to run the notebook, run the following cell to authenticate:
# If this is a Colab runtime, authenticate the user with Google Cloud.
import sys

if "google.colab" in sys.modules:
    from google.colab import auth

    auth.authenticate_user()
Enable your APIs by entering this command:
! gcloud services enable compute.googleapis.com aiplatform.googleapis.com --project "{PROJECT_ID}"
Initialize the aiplatform SDK
To initialize the aiplatform SDK, do the following:
# init the aiplatform package
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)
Create Vector Search index
To create a Vector Search index that's compatible with your RAG corpus, the index must meet the following criteria:
- IndexUpdateMethod must be set to STREAM_UPDATE. See Create stream index.
- The distance measure type must be explicitly set to one of the following:
  - DOT_PRODUCT_DISTANCE
  - COSINE_DISTANCE
- The dimension of the vector must be consistent with the embedding model you plan to use in the RAG corpus. Other parameters can be tuned based on your requirements.
# create the index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
display_name="your-display-name",
description="your-description",
dimensions=768,
approximate_neighbors_count=10,
leaf_node_embedding_count=500,
leaf_nodes_to_search_percent=7,
distance_measure_type="DOT_PRODUCT_DISTANCE",
feature_norm_type="UNIT_L2_NORM",
index_update_method="STREAM_UPDATE",
)
Create Vector Search index endpoint
RAG Engine supports public endpoints.
# create IndexEndpoint
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
display_name="your-display-name", public_endpoint_enabled=True
)
Deploy an index to an index endpoint
Before you can run a nearest neighbor search, the index must be deployed to an index endpoint.
DEPLOYED_INDEX_ID="YOUR_DEPLOYED_INDEX_ID"
my_index_endpoint.deploy_index(index=my_index, deployed_index_id=DEPLOYED_INDEX_ID)
If this is the first time that you're deploying an index to an index endpoint, it takes approximately 30 minutes to automatically build and initiate the backend before the index can be stored. After the first deployment, the index is ready in seconds. To see the status of the index deployment, open the Vector Search console, select the Index endpoints tab, and choose your index endpoint.
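If you prefer to check from your notebook instead of the console, a minimal sketch (assuming the my_index_endpoint and DEPLOYED_INDEX_ID values from the previous step) inspects the endpoint's deployed indexes:
# Sketch: check whether the index appears among the endpoint's deployed indexes.
deployed_ids = [deployed.id for deployed in my_index_endpoint.deployed_indexes]
if DEPLOYED_INDEX_ID in deployed_ids:
    print("Index is deployed and ready to serve queries.")
else:
    print("Index deployment is still in progress.")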
Identify the resource names of your index and index endpoint, which have the following formats:
projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}
If you aren't sure about the resource name, you can use the following command to check:
print(my_index_endpoint.resource_name)
print(my_index.resource_name)
Use Vertex AI Vector Search in RAG Engine
After the Vector Search instance is set up, follow the steps in this section to set the Vector Search instance as the vector database for the RAG application.
Set the vector database to create a RAG corpus
When you create the RAG corpus, specify only the full INDEX_ENDPOINT_NAME and INDEX_NAME. The RAG corpus is created and automatically associated with the Vector Search index. Validations are performed against the criteria listed in Create Vector Search index. If any of the requirements aren't met, the request is rejected.
Python
from vertexai.preview import rag  # rag module used in the following samples

CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
index_resource_name = my_index.resource_name
endpoint_resource_name = my_index_endpoint.resource_name
vector_db = rag.VertexVectorSearch(index=index_resource_name, index_endpoint=endpoint_resource_name)
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME, vector_db=vector_db)
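The corpus resource name, which later steps use as RAG_CORPUS_RESOURCE, is available on the returned object. As a minimal sketch, you can print it and list the corpora in your project to confirm creation:
# The full resource name has the format
# projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/{RAG_CORPUS_ID}.
print(rag_corpus.name)

# Sketch: confirm the new corpus appears when listing corpora in the project.
for corpus in rag.list_corpora():
    print(corpus.name, corpus.display_name)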
REST
# TODO(developer): Update and un-comment the following lines:
# CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
# Full index/indexEndpoint resource name
# Index: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexes/${INDEX_ID}
# IndexEndpoint: projects/${PROJECT_ID}/locations/${LOCATION_ID}/indexEndpoints/${INDEX_ENDPOINT_ID}
# INDEX_NAME = "YOUR_INDEX_RESOURCE_NAME"
# INDEX_ENDPOINT_NAME = "YOUR_INDEX_ENDPOINT_RESOURCE_NAME"
# Call CreateRagCorpus API to create a new RagCorpus
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora -d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
"rag_vector_db_config" : {
"vertex_vector_search": {
"index": '\""${INDEX_NAME}"\"',
"index_endpoint": '\""${INDEX_ENDPOINT_NAME}"\"'
}
}
}'
# Call ListRagCorpora API to verify the RagCorpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora"
Optional: Create RAG corpus without Vector Search information
To create an empty RAG corpus without Vector Search information that you can update later, select one of the code samples:
Python
CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
vector_db = rag.VertexVectorSearch()
rag_corpus = rag.create_corpus(display_name=CORPUS_DISPLAY_NAME, vector_db=vector_db)
REST
# TODO(developer): Update and un-comment the following lines:
# CORPUS_DISPLAY_NAME = "YOUR_CORPUS_DISPLAY_NAME"
# Call CreateRagCorpus API to create a new RAG corpus without the Vector Search information.
curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" -H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora -d '{
"display_name" : '\""${CORPUS_DISPLAY_NAME}"\"',
"rag_vector_db_config" : {
"vertex_vector_search": {}
}
}'
# Call ListRagCorpora API to verify the RagCorpus is created successfully
curl -sS -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
"https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora"
After Vector Search resources have been set up, you can update the RAG corpus with the corresponding information.
Python
index_resource_name = my_index.resource_name
endpoint_resource_name = my_index_endpoint.resource_name
vector_db = rag.VertexVectorSearch(index=index_resource_name, index_endpoint=endpoint_resource_name)
updated_rag_corpus = rag.update_corpus(corpus_name=rag_corpus.name, vector_db=vector_db)
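To confirm that the corpus now references your Vector Search index, you can print the returned object; for example:
# Confirm the updated corpus references the Vector Search index and endpoint.
print(updated_rag_corpus)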
REST
curl -X PATCH \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID} -d '{
"rag_vector_db_config" : {
"vertex_vector_search": {
"index": '\""${INDEX_NAME}"\"',
"index_endpoint": '\""${INDEX_ENDPOINT_NAME}"\"'
}
}
}'
Import files using the RAG API
Use the ImportRagFiles API to import files from Cloud Storage or Google Drive into the Vector Search index. The files are embedded and stored in the Vector Search index.
Python
RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"
GCS_BUCKET = "YOUR_GCS_BUCKET"
response = rag.import_files(
corpus_name=RAG_CORPUS_RESOURCE,
paths=[GCS_BUCKET],
chunk_size=512, # Optional
chunk_overlap=100, # Optional
)
REST
# TODO(developer): Update and un-comment the following lines:
# RAG_CORPUS_ID = "YOUR_RAG_CORPUS_ID"
# Google Cloud Storage bucket and file location.
# For example, "gs://rag-fos-test/"
# GCS_URIS= "YOUR_GCS_URIS"
# Call ImportRagFiles API to embed files and store them in the Vector Search index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
"import_rag_files_config": {
"gcs_source": {
"uris": '\""${GCS_URIS}"\"'
},
"rag_file_chunking_config": {
"chunk_size": 512
}
}
}'
# Call ListRagFiles API to verify that the files are imported successfully
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}/ragFiles
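A minimal Python sketch of the same verification (assuming the rag module and RAG_CORPUS_RESOURCE defined earlier) lists the files in the corpus; these entries are also the source of the rag_file_ids used for optional filtering in the next section:
# Sketch: confirm the import by listing the files in the RAG corpus.
for rag_file in rag.list_files(corpus_name=RAG_CORPUS_RESOURCE):
    print(rag_file.name, rag_file.display_name)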
Retrieve relevant contexts using the RAG API
After the file import completes, you can retrieve relevant contexts from the Vector Search index by using the RetrieveContexts API.
Python
RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"
RETRIEVAL_QUERY = "YOUR_RETRIEVAL_QUERY"
response = rag.retrieval_query(
rag_resources=[
rag.RagResource(
rag_corpus=RAG_CORPUS_RESOURCE,
# Optional: supply IDs from `rag.list_files()`.
# rag_file_ids=["rag-file-1", "rag-file-2", ...],
)
],
text=RETRIEVAL_QUERY,
similarity_top_k=10, # Optional
vector_distance_threshold=0.3, # Optional
)
print(response)
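To inspect the retrieved chunks individually, a minimal sketch follows; the field names assume the v1beta1 RetrieveContextsResponse shape and might differ in other SDK versions:
# Sketch: print each retrieved context's source and text (assumed response shape).
for context in response.contexts.contexts:
    print(context.source_uri)
    print(context.text)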
REST
# TODO(developer): Update and un-comment the following lines:
# RETRIEVAL_QUERY="YOUR_RETRIEVAL_QUERY"
# Full RagCorpus resource name
# Format:
# "projects/${PROJECT_ID}/locations/${LOCATION_ID}/ragCorpora/${RAG_CORPUS_ID}"
# RAG_CORPUS_RESOURCE="YOUR_RAG_CORPUS_RESOURCE"
# Call RetrieveContexts API to retrieve relevant contexts
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}:retrieveContexts \
-d '{
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"'
},
"vector_distance_threshold": 0.3
},
"query": {
"text": '\""${RETRIEVAL_QUERY}"\"',
"similarity_top_k": 10
}
}'
Generate content using Vertex AI Gemini API
To generate content using Gemini models, make a call to the Vertex AI GenerateContent API. By specifying RAG_CORPUS_RESOURCE in the request, the API automatically retrieves data from the Vector Search index.
Python
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
RAG_CORPUS_RESOURCE = "projects/{PROJECT_ID}/locations/{LOCATION_ID}/ragCorpora/YOUR_RAG_CORPUS_ID"
rag_retrieval_tool = Tool.from_retrieval(
retrieval=rag.Retrieval(
source=rag.VertexRagStore(
rag_resources=[
rag.RagResource(
rag_corpus=RAG_CORPUS_RESOURCE,
# Optional: supply IDs from `rag.list_files()`.
# rag_file_ids=["rag-file-1", "rag-file-2", ...],
)
],
similarity_top_k=10, # Optional
vector_distance_threshold=0.3, # Optional
),
)
)
rag_model = GenerativeModel(
model_name="gemini-1.5-flash-001", tools=[rag_retrieval_tool]
)
GENERATE_CONTENT_PROMPT="YOUR_GENERATE_CONTENT_PROMPT"
response = rag_model.generate_content(GENERATE_CONTENT_PROMPT)
print(response.text)
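The same retrieval tool also works in a multi-turn chat session; a minimal sketch reusing the rag_model created above (the prompts are placeholders):
# Sketch: reuse the RAG retrieval tool across a multi-turn chat session.
chat = rag_model.start_chat()
print(chat.send_message("YOUR_FIRST_QUESTION").text)
print(chat.send_message("YOUR_FOLLOW_UP_QUESTION").text)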
REST
# TODO(developer): Update and un-comment the following lines:
# MODEL_ID=gemini-pro
# RAG_CORPUS_RESOURCE="YOUR_RAG_CORPUS_RESOURCE"
# GENERATE_CONTENT_PROMPT="YOUR_GENERATE_CONTENT_PROMPT"
# GenerateContent with contexts retrieved from the Vector Search index
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" https://${LOCATION_ID}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/publishers/google/models/${MODEL_ID}:generateContent \
-d '{
"contents": {
"role": "user",
"parts": {
"text": '\""${GENERATE_CONTENT_PROMPT}"\"'
}
},
"tools": {
"retrieval": {
"vertex_rag_store": {
"rag_resources": {
"rag_corpus": '\""${RAG_CORPUS_RESOURCE}"\"'
},
"similarity_top_k": 8,
"vector_distance_threshold": 0.32
}
}
}
}'
What's next
- To learn more about choosing embedding models, see Use embedding models with RAG Engine.
- To learn more about importing files, see Import RAG files.