
In today’s data-driven world, the ability to efficiently search and retrieve information is paramount. Google Cloud’s Vertex AI Agent Builder is a powerful tool that leverages Google’s expertise in search and conversational AI to create Gen AI applications. One of its newest features is Search Tuning (still in Preview at the time of writing), which lets you fine-tune the underlying search model to better fit your industry or company needs.
In this post, we’ll briefly explore this service (and point to the relevant documentation to dig deeper) and provide a script to perform preliminary data checks to ensure you can hit the ground running and your search tuning efforts stay on the right track.
What is Vertex AI Agent Builder?
First things first: as developers, we care about naming, and it might have been hard to keep up with the product names for these features, so let’s clarify. Initially introduced as Gen App Builder in early 2023, the product was rebranded as Vertex AI Search and Conversation not much later, and is today called Vertex AI Agent Builder.
Now that we got that out of the way and know what to call the beast, let’s dive into what it can do for us!
Vertex AI Agent Builder is Google Cloud’s way of making it very easy for you and your business to leverage Gen AI. It abstracts away a lot of the legwork you need to do to build a conversational AI, a semantic search experience, or even a Retrieval Augmented Generation (RAG) system and helps you leverage their state-of-the-art foundation models and their world-renowned search technology that has been 25 years in the making.
Compared to other offerings in the Vertex AI suite (e.g. the Gemini 1.5 Pro API or Vector Search), it sits on the more managed end of the spectrum: it provides less customisability and flexibility, which you trade for a faster time-to-market and less need for scarce in-house talent.
Within Vertex AI Agent Builder we have access to Vertex AI Agents and Vertex AI Search:
- Vertex AI Agents provides an easy way to build conversational user interfaces using a natural language understanding platform built on large language models (LLMs).
- Vertex AI Search, on the other hand, helps you build AI-enabled search and recommendation experiences.
For today, we will focus on Vertex AI Search, since this is the product that provides Search Tuning and for which we will present a data-checking script.
For a comprehensive introduction, please refer to the documentation here.
What is Vertex AI Search?
Within Vertex AI Search, we have the option to build search apps and recommendation apps. Our focus today is on the search apps.
Some of the key features are:
- Out-of-the-box natural language understanding and semantic search. Semantic search understands the context and intent behind a search query. In contrast, more traditional approaches like keyword search rely on exact matching of specific keywords without understanding the reasoning or context behind them. For example, when we search for “When did Apple announce the last iPhone?”, semantic search understands we are talking about Apple Inc. and not your favourite fruit.
- Out-of-the-box capabilities to understand synonyms, correct spellings, and auto-suggest searches.
- Generative AI-powered summarization and conversational search for unstructured documents. For example, we can decide what replies get served:
  - Search (single-turn)
  - Search with an answer (single-turn search with summarization)
  - Search with follow-ups (multi-turn search)
At a high level, setting up Vertex AI Search consists of two stages:
- Preparation: we set up a data store, ingesting our structured or unstructured data. The data is processed, chunked, and embedded along with its metadata, and stored for retrieval in Vector Search, Google Cloud’s highly performant vector database. You still have some customisation options: you can bring your own schema, bring your own embeddings (in Preview at the time of writing), and even customise parsing and chunking.
- Runtime: this is when we do the actual querying using our user input. It will retrieve the relevant documents for you and generate an answer using those documents (if that’s what you want). Also here there is room for customisation: you obviously bring your own prompts, you can specify certain controls to e.g. filter the returned results, and you get to decide if you want short answers (snippets) or longer paragraphs (extractive answers or segments) in your responses. You don’t get to bring your own ranker, but more on that below!
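To make the runtime stage concrete, here is a minimal sketch of querying a search app with the `google-cloud-discoveryengine` Python client library. The project, location, and data store IDs are placeholders you would replace with your own, and `page_size` is just one of the controls available on the request.

```python
# Sketch only: requires `pip install google-cloud-discoveryengine` and
# application-default credentials. IDs below are placeholders.


def serving_config_path(project: str, location: str, data_store: str) -> str:
    """Build the resource name of the default serving config for a data store."""
    return (
        f"projects/{project}/locations/{location}"
        f"/collections/default_collection/dataStores/{data_store}"
        f"/servingConfigs/default_search"
    )


def search(query: str, project: str, location: str, data_store: str, page_size: int = 5):
    """Run a search query against a Vertex AI Search data store."""
    from google.cloud import discoveryengine_v1 as discoveryengine  # deferred import

    client = discoveryengine.SearchServiceClient()
    request = discoveryengine.SearchRequest(
        serving_config=serving_config_path(project, location, data_store),
        query=query,
        page_size=page_size,
    )
    return client.search(request)


if __name__ == "__main__":
    for result in search(
        "When did Apple announce the last iPhone?",
        project="my-project", location="global", data_store="my-data-store",
    ):
        print(result.document.id)
```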

Example of an extractive segment on an unstructured data store (PDF). Note that the page number from the original file gets included in the reference. (Screenshot adjusted for space requirements.)
What is Search Tuning?
So what if you’re not 100% happy with the quality of the search results you are getting? You can turn to the Search Tuning feature (in Preview at the time of writing) for your unstructured data stores. Say the data you want to search over is very niche to your company; that can make it difficult for a vanilla model to do accurate retrieval. But we can help, and for that we just need to feed it some data!
How to use Search Tuning?
More specifically, you will need to bring three, and potentially four, training datasets:
1. Training queries: these are the search queries you expect your users to bring to your system. E.g.:
What is the maximum student-to-instructor ratio for confined water dives?
2. Extractive segments: these are snippets taken from the documents in the data store (your corpus). Some segments need to answer some of the queries defined above, while others should not (both serve a purpose: they positively and negatively reinforce the model). The segments also need to be sufficiently long. E.g.:
## Ratios
### Confined Water 10:1 — May add four student divers per certified assistant.
### Open Water 8:1 — May add two student divers per certified assistant
3. Relevance scores: these training labels relate the queries to extractive segments using a score (a non-negative integer), where 0 signifies that the segment is not relevant to the query, and the higher the score, the more relevant the segment is for a given query. E.g.:
The relevance score in this case would be 1, since the segment is relevant to the question asked.
4. (Optional) Test labels: this data is similar to the relevance scores, but is used to evaluate the performance of the tuned model. If we don’t supply it ourselves, Search Tuning will use 20% of the queries from the training labels defined in 3.
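Putting the diving example above into files could look like the sketch below: JSONL for the corpus and queries and a TSV for the relevance labels. This mirrors the layout described in the Search Tuning documentation at the time of writing, but since the feature is in Preview, double-check the current docs before relying on these exact field names.

```python
import csv
import json
from pathlib import Path

# Assumed file layout (verify against the current Search Tuning docs):
# corpus.jsonl and queries.jsonl with "_id"/"text" fields, scores.tsv
# with query-id / corpus-id / score columns.

out = Path("tuning_data")
out.mkdir(exist_ok=True)

corpus = [{"_id": "doc1", "title": "Ratios",
           "text": "Confined Water 10:1 - May add four student divers "
                   "per certified assistant."}]
queries = [{"_id": "q1",
            "text": "What is the maximum student-to-instructor ratio "
                    "for confined water dives?"}]
labels = [("q1", "doc1", 1)]  # score 1: the segment answers the query

with open(out / "corpus.jsonl", "w") as f:
    for row in corpus:
        f.write(json.dumps(row) + "\n")

with open(out / "queries.jsonl", "w") as f:
    for row in queries:
        f.write(json.dumps(row) + "\n")

with open(out / "scores.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["query-id", "corpus-id", "score"])
    writer.writerows(labels)
```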
What could possibly go wrong?
This data is supplied as files that need specific formatting and have to meet some requirements. Some of our customers were experiencing issues when trying to use the service, getting an Error Code 13 with the message “Internal error encountered. Please try again. If the issue persists, please contact our support team.”
The root cause turned out to be that not all files adhered to the requirements laid out in the documentation, but unfortunately, the error message didn’t reveal that. Don’t despair; DoiT to the rescue! We have coded up some simple checks that help you catch malformatted files early and avoid the error. The script is available on GitHub and as easy to run as:
```shell
python search_tuning_checks.py <corpus_path> <query_path> <scoring_path>
```
An example output could look something like this:
```
General dataset checks
- - - - - - - - - - -
Number of segments in Corpus file that don't have a match in Scoring file: 6030
Number of segments in Scoring file that don't have a match in Corpus file: 1551
Number of queries in Query file that don't have a match in Scoring file: 0
Number of queries in Scoring file that don't have a match in Query file: 0

Documentation dataset checks
- - - - - - - - - - - - - -
Training query requirements met: ✅ met
|___ Subcheck: At least one extractive segment per query: ✅ met
|___ Subcheck: At least 10 000 additional extractive segments: ✅ met
Extractive segment requirements met: ✅ met
Relevance score requirements met: ✅ met
|___ Subcheck: At least 100 segments that contain query answers: ✅ met
|___ Subcheck: At least 10 000 random segments: ❌ not met
|___ Subcheck: At least 10 000 segments with 0 as score: ✅ met
Corpus file requirements met: ❌ not met
Query file requirements met: ✅ met
|___ Subcheck: Same ids in query and scoring data: ✅ met
|___ Subcheck: Column 'score' contains non-negative integer values: ✅ met
Training labels requirements met: ✅ met
```
The `general dataset checks` simply verify the integrity of the provided data: do all IDs of the provided segments correspond to a score, and do all IDs of the provided queries correspond to a score.
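That integrity check boils down to set differences between the ID columns of the files. A simplified version of the idea (the function and field names here are illustrative, not the script’s actual API):

```python
def integrity_report(corpus_ids, query_ids, scored_corpus_ids, scored_query_ids):
    """Count IDs that appear on one side of the join but not the other."""
    corpus_ids, query_ids = set(corpus_ids), set(query_ids)
    scored_corpus_ids = set(scored_corpus_ids)
    scored_query_ids = set(scored_query_ids)
    return {
        # segments without a score are fine in moderation (random negatives),
        # but scores pointing at unknown IDs indicate malformed data
        "corpus_without_score": len(corpus_ids - scored_corpus_ids),
        "score_without_corpus": len(scored_corpus_ids - corpus_ids),
        "query_without_score": len(query_ids - scored_query_ids),
        "score_without_query": len(scored_query_ids - query_ids),
    }


report = integrity_report(
    corpus_ids=["doc1", "doc2", "doc3"],
    query_ids=["q1", "q2"],
    scored_corpus_ids=["doc1", "doc4"],
    scored_query_ids=["q1", "q2"],
)
print(report)
# → {'corpus_without_score': 2, 'score_without_corpus': 1,
#    'query_without_score': 0, 'score_without_query': 0}
```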
The `documentation dataset checks` run the following checks laid out in the Google Cloud documentation:
- For training data in general (first 3 checks)
- For the Corpus file
- For the Query file
- For the Training labels file
It’s worth making sure you get ‘✅’ all around before submitting your files, since the feedback loop can take a few hours!
This article explored Google Cloud’s Vertex AI Agent Builder, focusing on the Search Tuning feature within Vertex AI Search. Search Tuning allows fine-tuning of search models for better accuracy and relevance. The process requires specific datasets: training queries, extractive segments, and relevance scores. To address common formatting issues that can lead to errors, we introduced a Python script that helps with preliminary data checks. This tool can help users ensure their datasets meet the necessary requirements before submission, potentially saving time and improving the effectiveness of their search tuning efforts.
Running into any other issues? Visit doit.com/services and learn how we can help!