6 Lessons from Building TeqMate AI’s Reasoning Architecture

Today, TeqMate AI is a multi-agent system capable of handling data retrieval, detecting anomalies, analyzing root causes, independently validating decisions, and passing context between systems. But it didn't start that way.

We began with a single reasoning layer: one model, one retrieval pipeline, and one response, meant to reduce manual investigation and help AdOps teams work with operational context more reliably. The lessons from that first iteration (what failed, what we had to rebuild, and what grounded the system) directly shaped how we later split responsibilities among agents.

Lesson 1: Basic RAG was only the first grounding layer

AdOps questions tend to be highly specific and operational: how a pricing rule behaves in a given setup, or why tweaking a configuration leads to a performance shift. The challenge lies in correctly understanding what the user is actually asking and identifying the exact operational context behind the question, not something adjacent to it.

That’s why the initial version of TeqMate AI started as a simple RAG-based chat assistant. When a user asked a question, the system first looked through internal documentation in Confluence and Jira to find the most relevant context. The language model then used this retrieved context to generate an answer.
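The retrieve-then-generate flow described above can be sketched in a few lines. This is only an illustration, not TeqMate AI's implementation: the sample documents are made up, the bag-of-words cosine similarity stands in for a real embedding model, and the function names `retrieve` and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and strip basic punctuation; a real system would use an embedding model.
    tokens = (t.strip(".,?!:;()").lower() for t in text.split())
    return [t for t in tokens if t]


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the question and keep the top k.
    q = Counter(tokenize(question))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    # The language model is asked to answer from the retrieved context only.
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Made-up documentation snippets, for illustration only.
DOCS = [
    "Floor price rules apply per endpoint and override the default pricing rule.",
    "Timeout settings control how long the auction waits for bid responses.",
    "Release notes: the reporting dashboard received a new color theme.",
]

question = "How does the floor price rule behave for an endpoint?"
top = retrieve(question, DOCS, k=1)
prompt = build_prompt(question, top)
```

The resulting `prompt` would then be sent to the language model. Note that nothing here checks whether the retrieved passage actually answers the question, only that it is the most similar one.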

This already improved reliability compared to a standalone LLM. But as soon as we started testing the system with real questions, it became clear that it was still making mistakes — missing context, returning partial matches, and giving incorrect answers.

What we expected vs what actually happened

Expectation	Reality
The model finds the correct information	The model finds something related, but not an exact match
The answer is fully grounded	The answer sounds correct but can still be wrong
Finding the right information solves the problem	Finding information is not the same as understanding the answer

This was the first important shift in TeqMate AI’s evolution: retrieval could ground the system, but it could not yet support operational reasoning on its own.

Lesson 2: Operational troubleshooting requires context continuity

The problem we identified early was that the system treated every question as a completely new request. If a user asked, “Why did CPM drop?” and then followed up with “What changed in auction performance?” the system didn’t connect these steps. Each query was processed in isolation, without understanding the broader troubleshooting flow. In AdOps, that continuity is critical — root cause analysis depends on linking signals across multiple layers of data rather than evaluating them in isolation.

[Figure: Without Conversational RAG (isolated questions)]

[Figure: With Conversational RAG (conversation-aware system)]

At this stage, the challenge was no longer simple retrieval but maintaining operational continuity across multi-step troubleshooting workflows. With conversational RAG, the system stopped treating follow-ups as new questions and instead preserved the analytical context across the entire troubleshooting process, turning separate searches into a continuous investigation. This made the analysis more consistent and allowed the system to act less like a search tool and more like an operational assistant that helps AdOps teams connect signals and identify root causes.
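One simple way to picture this kind of continuity is a session object that folds earlier questions into the retrieval query. The sketch below is an assumption for illustration: the `Investigation` class and its `contextual_query` method are hypothetical, and a production system would more likely ask a language model to rewrite the follow-up into a standalone question than concatenate strings.

```python
from dataclasses import dataclass, field


@dataclass
class Investigation:
    """Hypothetical session that carries context across follow-up questions."""

    max_turns: int = 3
    turns: list[str] = field(default_factory=list)

    def contextual_query(self, question: str) -> str:
        # Prepend the most recent questions so retrieval sees the whole thread,
        # not just the latest follow-up.
        recent = self.turns[-self.max_turns:] if self.max_turns > 0 else []
        query = " | ".join(recent + [question])
        self.turns.append(question)
        return query


session = Investigation()
first = session.contextual_query("Why did CPM drop?")
second = session.contextual_query("What changed in auction performance?")
```

Here `second` carries the CPM question along with the follow-up, so the retrieval step can look for auction changes that relate to the CPM drop rather than auction changes in general.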

Our key takeaway was that AdOps troubleshooting is a continuous reasoning flow, not a set of isolated questions. Preserving that flow became an early foundation for TeqMate AI’s later multi-agent workflows.

Lesson 3: Fragmented context breaks operational reasoning

Another practical challenge we faced was handling large documents and performance reports. Language models aren’t always able to extract the exact answer from a long document and correctly map it to a user’s question when too much unrelated context is present.

To ensure TeqMate provides relevant answers, we decided to divide the documents into smaller chunks. These fragments were then indexed and retrieved individually. But after splitting, important context was often spread across different fragments. One fragment could describe a rule, while another explained a condition that changes how that rule behaves. When we separated these parts, the system sometimes produced answers that looked correct on the surface but missed key dependencies and exceptions.

At first, we thought the issue was mostly about size — small chunks lose context, while large ones cause the system to retrieve too much unrelated information. Later, it became clear that the structure also matters. Once related parts were split incorrectly, the system stopped understanding how they depend on each other.

[Figure: Overlapping chunks]

This is where we learned that there is no universal chunking strategy. Technical manuals, contracts, FAQs, and support documentation all behave differently and require different approaches. We shifted the focus from “splitting documents” to preserving structure. Hierarchical chunking helped keep related parts together with their headings, so the system could understand how sections are connected. By preserving context at the chunk level, the system could return answers with fuller context, reducing misinterpretation of complex setups and helping prevent costly configuration errors.
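As a rough illustration of hierarchical chunking, the sketch below splits a document by headings and stores the full heading path with every chunk. It assumes headings are marked with leading `#` characters, which may not match the real source documents, and the sample text and the `hierarchical_chunks` function are made up for the example.

```python
def hierarchical_chunks(text: str) -> list[dict]:
    """Split text on '#'-style headings and keep each chunk's heading path."""
    path: list[str] = []   # current heading trail, e.g. ["Pricing rules", "Exceptions"]
    body: list[str] = []   # lines collected under the current heading
    chunks: list[dict] = []

    def flush() -> None:
        content = " ".join(body).strip()
        if content:
            chunks.append({"path": " > ".join(path), "text": content})
        body.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()  # close the previous section under its own heading path
            level = len(stripped) - len(stripped.lstrip("#"))
            title = stripped.lstrip("#").strip()
            # Keep the ancestors above this level, then add the new heading.
            del path[level - 1:]
            path.append(title)
        elif stripped:
            body.append(stripped)
    flush()
    return chunks


# Made-up document, for illustration only.
DOC = """# Pricing rules
## Floor price
The floor price rule sets a minimum bid.
## Exceptions
The rule is skipped for private deals.
# Timeouts
Timeouts apply per endpoint.
"""

chunks = hierarchical_chunks(DOC)
```

Because the exception chunk is stored under the path `Pricing rules > Exceptions`, a retriever can still tell which rule the exception belongs to even when the two passages are indexed separately.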

This lesson was not only about better chunking. It showed that TeqMate AI needed to preserve relationships between rules, conditions, and configurations before it could reason reliably across AdOps workflows.

Lesson 4: Operational intent does not always match system terminology

When looking for answers, people rarely use the same wording found in documentation. They use platform-specific references, shortcuts, abbreviations, and sometimes incomplete wording, and they follow their own naming conventions for partners, feature toggles, deal IDs, or reporting fields. As a result, the system may miss the correct signal even when the relevant information is present. The problem is not missing data, but the gap between operational intent and the way that intent is represented in documentation, reports, and configurations.

A single concept in AdOps can have several valid expressions, and without understanding these variations, the query loses precision before retrieval even begins.

From a system perspective, this exposed a key limitation of embedding-based retrieval. Vector search still depends heavily on how concepts are represented in text. It does not inherently understand that one operational concept can have multiple valid expressions or context-dependent meanings. We realized that no single retrieval approach can cover the full range of AdOps queries. Embedding-based search helped match similar meanings, but it could miss exact terms such as IDs and names.

Why one search method is not enough

Method	Limitation
Embeddings	miss exact terms (IDs, names)
Single query	misses variations

To make retrieval more resilient, we moved toward a hybrid retrieval approach. We combined semantic vector search with traditional keyword-based ranking (BM25) and added query expansion to generate multiple variations of the same question. This allowed the system to cover both sides of the problem: understanding the meaning and matching exact terminology.
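The sketch below shows one common way to wire these pieces together. It is an assumption, not a description of TeqMate AI's internals: the article does not say how the keyword and vector results are merged, so reciprocal rank fusion (RRF) is swapped in as a plausible choice, and the alias table, document IDs, and function names are hypothetical. The two input rankings stand in for the output of a BM25 index and a vector index.

```python
from collections import defaultdict

# Hypothetical alias table; real aliases would come from the team's own vocabulary.
ALIASES = {"cpm": ["cost per mille"], "ssp": ["supply-side platform"]}


def expand_query(query: str) -> list[str]:
    """Return the original query plus variants with known abbreviations spelled out."""
    variants = [query]
    words = [w.strip("?.,!").lower() for w in query.split()]
    for short, longs in ALIASES.items():
        if short in words:
            for long in longs:
                variants.append(" ".join(long if w == short else w for w in words))
    return variants


def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per document."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


variants = expand_query("Why did CPM drop?")

# Stand-ins for real BM25 and vector search results (made-up document IDs).
keyword_ranking = ["deal-123", "doc-b", "doc-a"]
vector_ranking = ["doc-a", "doc-b", "doc-c"]
fused = rrf([keyword_ranking, vector_ranking])
```

Documents that appear in both rankings rise to the top of `fused`, while an exact-match hit such as `deal-123`, which the vector ranking missed, still stays in the candidate set. In a fuller version, each query variant would be run through both indexes and all resulting rankings passed to `rrf`.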

This shift changed how we approached retrieval entirely.

Building a reliable AI assistant requires understanding not only the data, but also how end users formulate problems in real operational workflows.

Lesson 5: Similar signals are not always causal signals

In early tests, the system selected documents based on their similarity to the question. But “similar” doesn’t always mean “relevant.” What we saw was that operational data is often distributed across multiple signals, described in different ways, and applied differently depending on context. Different wording in the question could prevent the system from finding the right answer at all.

This becomes especially visible when users ask why performance dropped for a specific endpoint or why auction behavior changed under a particular setup. The system could find related information about performance analysis, auction dynamics, or endpoint configuration. However, it could still fail to connect those pieces into an answer relevant to the exact issue. That's because all related signals were treated as equally important.

We introduced a reranking layer on top of the initial retrieval. Instead of relying solely on surface-level similarity, the system began evaluating which pieces of information are most relevant to the question's specific operational context. For questions without direct one-to-one answers, the system had to combine general operational logic with the user’s specific context. It also needed to determine which signals were most useful for understanding the situation.
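A toy version of such a reranking step is sketched below. Everything in it is an assumption made for illustration: the `Candidate` structure, the metadata tags, and the 50/50 weighting are hypothetical, and a real reranker would more likely use a trained model (for example a cross-encoder) than a hand-written score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    similarity: float        # score from the first-stage retrieval
    tags: frozenset[str]     # hypothetical metadata, e.g. endpoint or topic labels


def rerank(
    candidates: list[Candidate],
    context_tags: frozenset[str],
    context_weight: float = 0.5,
) -> list[Candidate]:
    """Reorder candidates by blending similarity with operational-context match."""

    def score(c: Candidate) -> float:
        # Fraction of the question's context tags that this candidate covers.
        overlap = len(c.tags & context_tags) / len(context_tags) if context_tags else 0.0
        return (1 - context_weight) * c.similarity + context_weight * overlap

    return sorted(candidates, key=score, reverse=True)


# Made-up candidates for a question about auction behavior on one endpoint.
candidates = [
    Candidate("General guide to auction dynamics", 0.82, frozenset({"auction"})),
    Candidate("Endpoint ep-7 timeout change and bid rate", 0.74,
              frozenset({"endpoint:ep-7", "auction"})),
]
ranked = rerank(candidates, frozenset({"endpoint:ep-7", "auction"}))
```

The general guide has the higher similarity score, but the passage about the specific endpoint moves to the top once the operational context is taken into account, which is the behavior the reranking layer is meant to produce.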

This became one of the earliest reasoning mechanisms that moved the system beyond retrieval and toward operational interpretation of competing signals.

Lesson 6: Operational reasoning needs measurable validation

As the TeqMate AI architecture evolved, it became clear that human intuition was too subjective a basis for judging quality. Small changes in retrieval, ranking, or prompting affected the system's interpretation of operational issues. Sometimes an update improved one type of answer while making another type less accurate.

When we started iterating on the system, changes often looked like improvements at first glance, but we couldn’t reliably separate real progress from subjective perception. Without a clear way to measure quality, regressions were difficult to notice before they affected real troubleshooting workflows, where they could lead teams to the wrong conclusions rather than help them resolve issues faster.

[Figure: TeqMate AI Workflow Overview]

To solve this, we moved away from subjective evaluation and adopted a structured measurement approach:

A set of real questions with verified answers helped us measure whether the system could consistently reason through specific cases rather than simply return related information.
Automated evaluation using LLM-as-judge techniques enabled us to assess whether answers were complete, clearly structured, and grounded in the provided context.
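The two pieces above, a verified question set and an automated judge, can be combined into a small evaluation harness. The sketch below is a simplified assumption: the test cases and the two "system" functions are made up, and `keyword_judge` is a plain string check standing in for a real LLM-as-judge call.

```python
from typing import Callable

GoldenCase = tuple[str, str]  # (question, key phrase from the verified answer)


def keyword_judge(question: str, answer: str, expected: str) -> bool:
    # Stand-in for an LLM judge: passes if the answer contains the verified key phrase.
    return expected.lower() in answer.lower()


def evaluate(
    system: Callable[[str], str],
    judge: Callable[[str, str, str], bool],
    cases: list[GoldenCase],
) -> float:
    """Return the share of golden cases that the judge accepts."""
    passed = sum(judge(q, system(q), expected) for q, expected in cases)
    return passed / len(cases) if cases else 0.0


# Made-up golden set, for illustration only.
CASES: list[GoldenCase] = [
    ("Why did CPM drop on endpoint A?", "floor price"),
    ("Why did fill rate change?", "timeout"),
]


def baseline(question: str) -> str:
    return "Performance can vary for many reasons."


def candidate(question: str) -> str:
    return "The floor price was raised." if "CPM" in question else "Traffic shifted."


baseline_score = evaluate(baseline, keyword_judge, CASES)
candidate_score = evaluate(candidate, keyword_judge, CASES)
```

Running the same golden set against the version before and after a change gives a number to compare, so an update that helps one type of question while hurting another shows up in the scores instead of going unnoticed.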

Nevertheless, we still needed to measure not just whether the answer looks good, but whether the system correctly identifies and explains the root cause based on the retrieved context. At this stage, evaluation was no longer only about answer quality.

It became a way to validate whether the system could reason through operational context consistently enough to support later agentic workflows.

How these early lessons shaped TeqMate AI

The early retrieval and reasoning lessons formed the foundation of TeqMate AI’s broader multi-agent operational workflows. Real-world testing showed that the right answer is rarely the result of a single step. It emerges from a sequence of decisions across context, retrieval, and interpretation. What made the difference:

Hybrid retrieval, query expansion, and reranking can connect real operational intent with the right signal in the system.
Structured evaluation replaced intuition with clear signals, allowing us to see whether changes actually improved performance.
Smart assistants must be trained on the right parameters and optimized to behave in a more human-like way, so that their responses are natural, context-aware, and aligned with how AdOps teams reason through problems.

What started as an early retrieval and reasoning layer later became one of the foundations for TeqMate AI’s broader multi-agent architecture. We understood that reliable AdOps assistance requires more than just retrieving information: the system has to preserve context, connect signals, validate the quality of reasoning, and support multi-step operational workflows.
