Introduction to evaluating agents as digital workers.
00:16
Agent Performance Dashboard
Dashboard overview of agent performance metrics.
00:33
Individual Agent Metrics
Drilling down into individual agent performance metrics.
00:49
Performance Over Time
Tracking agent performance over time and by version.
01:17
Conversation Analysis
Analyzing specific conversations and their ratings.
02:02
Lessons Learned and Fixes
Reviewing lessons learned and suggested fixes for agents.
02:22
Improving Agent Instructions
Implementing suggested improvements to agent prompts.
Transcript
00:00
So we've created something that's rather rare in the industry, where we're treating the agents as digital workers.
00:08
And we're evaluating their performance a lot like you would evaluate any other worker within your organization.
00:16
So you can see here this dashboard that gives you a bird's-eye view of how the agents are performing across the system.
00:23
We're tracking these things over time.
00:24
This is a report that shows you your top-performing agents, agents that need some attention, and recent alerts pertaining to agent performance.
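As a rough sketch of how a summary report like that might be assembled, here's a minimal example; the `AgentMetrics` fields, names, and thresholds are all hypothetical, not taken from the actual product.

```python
from dataclasses import dataclass

# Hypothetical per-agent rollup; the field names are illustrative only.
@dataclass
class AgentMetrics:
    name: str
    avg_score: float        # overall rating on a 0-10 scale
    tool_error_rate: float  # fraction of tool calls that failed

def summarize(agents: list[AgentMetrics], attention_below: float = 6.0) -> dict:
    """Split the fleet into top performers, agents needing attention,
    and alerts, mirroring the three dashboard sections."""
    ranked = sorted(agents, key=lambda a: a.avg_score, reverse=True)
    return {
        "top_performers": [a.name for a in ranked[:3]],
        "needs_attention": [a.name for a in ranked if a.avg_score < attention_below],
        "alerts": [f"{a.name}: tool error rate {a.tool_error_rate:.0%}"
                   for a in ranked if a.tool_error_rate > 0.2],
    }

if __name__ == "__main__":
    fleet = [
        AgentMetrics("support-bot", 8.7, 0.05),
        AgentMetrics("billing-bot", 5.4, 0.31),
    ]
    print(summarize(fleet))
```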
00:33
And you can of course drill down and look at how any particular agent is doing.
00:38
Take this agent here, where we can see its performance metrics.
00:42
It's being rated on several dimensions.
00:44
Some of them are displayed here.
00:46
And we're tracking that performance over time.
00:49
So a new data point gets created for pretty much every version of the agent.
00:56
We mark those as milestones, which helps us track how the agent's performance differs from one version to the next.
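A minimal sketch of recording one data point per agent version, with milestones marked whenever the version changes; the record shape here is an assumption for illustration.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class VersionDataPoint:
    version: str             # agent version, e.g. "v12"
    day: date                # when the data point was recorded
    avg_score: float         # overall rating for that period
    milestone: bool = False  # True when a new version first appears

@dataclass
class PerformanceHistory:
    points: list[VersionDataPoint] = field(default_factory=list)

    def record(self, version: str, avg_score: float, day: date | None = None):
        # Mark a milestone whenever the version differs from the last point.
        prev = self.points[-1].version if self.points else None
        self.points.append(VersionDataPoint(
            version=version,
            day=day or date.today(),
            avg_score=avg_score,
            milestone=(version != prev),
        ))

history = PerformanceHistory()
history.record("v12", avg_score=7.8)
history.record("v13", avg_score=6.1)  # new version -> milestone data point
```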
01:05
And you can see that something's changed recently here.
01:07
As of January 27th, it looks like we're seeing a few more tool errors and some knowledge gaps.
01:15
So we can actually drill down into that.
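One simple way a shift like that could be flagged is by comparing failure counts against a baseline window; the category names and threshold below are assumed for illustration.

```python
def regression_flags(baseline: dict, current: dict, tolerance: float = 1.5) -> list[str]:
    """Flag any failure category (e.g. tool_errors, knowledge_gaps) that has
    grown by more than `tolerance` times its baseline count."""
    flags = []
    for category, base_count in baseline.items():
        now = current.get(category, 0)
        if now > base_count * tolerance:
            flags.append(f"{category}: {base_count} -> {now}")
    return flags

# e.g. comparing the week before January 27th to the week after:
print(regression_flags({"tool_errors": 4, "knowledge_gaps": 2},
                       {"tool_errors": 9, "knowledge_gaps": 5}))
```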
01:17
We can look at individual conversations.
01:20
We can see the ratings for each one.
01:22
I can click into this particular conversation, which had a low score, and review the evaluation of that specific conversation.
01:29
It's got a score of 5.4 out of 10.
01:32
And we can see what the conversation was about, the outcomes that were achieved, and how the exact rating was composed.
01:41
The agent scored well on helpfulness, accuracy, and clarity.
01:45
But user satisfaction and tool success rate were both low.
01:50
That's what brought the overall performance down.
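For illustration, an overall conversation score like the 5.4 here could be a weighted average of the rated dimensions; the numbers and equal weights below are invented, not the evaluator's actual formula.

```python
# Hypothetical dimension ratings (0-10) for a low-scoring conversation:
# strong on helpfulness, accuracy, and clarity, weak on the other two.
ratings = {
    "helpfulness": 8.0,
    "accuracy": 8.0,
    "clarity": 8.0,
    "user_satisfaction": 3.0,
    "tool_success_rate": 2.0,
}

# Invented equal weighting; the real evaluator's weights are not shown.
weights = {dim: 1.0 for dim in ratings}

overall = sum(ratings[d] * weights[d] for d in ratings) / sum(weights.values())
print(f"overall score: {overall:.1f} / 10")  # -> 5.8 with these made-up numbers
```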
01:53
The evaluator also computes lessons learned, which are then fed back into the agent for potential improvement.
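Conceptually, that feedback loop might look something like this sketch; the `Evaluation` shape and the score threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    conversation_id: str
    score: float                # overall rating, 0-10
    lessons_learned: list[str]  # e.g. "verify the account ID before refunds"

def collect_lessons(evaluations: list[Evaluation], below: float = 6.0) -> list[str]:
    """Gather lessons from low-scoring conversations so they can be
    fed back into the agent as candidate improvements."""
    lessons = []
    for ev in evaluations:
        if ev.score < below:
            lessons.extend(ev.lessons_learned)
    return lessons
```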
02:02
Every agent gets evaluated like this.
02:05
So going back to the performance evaluation for our agent, we can look at the knowledge gaps that were discovered.
02:12
And for any given knowledge gap, we also have suggested potential fixes.
02:16
I can click on one of those and it will tell me what I need to do in order to improve the agent's performance.
02:22
In this case, it is suggesting that we add some instructions to the agent system prompt.
02:27
I can review this, and based on what I'm seeing, if I approve it, it will be added to the agent's instructions.
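A minimal sketch of that approve-and-apply step, assuming a suggested fix is simply a block of text appended to the system prompt:

```python
def apply_fix(system_prompt: str, suggested_instructions: str, approved: bool) -> str:
    """Append a suggested fix to the agent's system prompt, but only
    after a human reviewer has approved it."""
    if not approved:
        return system_prompt  # leave the prompt untouched
    return system_prompt.rstrip() + "\n\n" + suggested_instructions

# Hypothetical example of an approved fix being applied:
prompt = "You are a billing support agent."
fix = "Always confirm the customer's account ID before using the refund tool."
print(apply_fix(prompt, fix, approved=True))
```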
02:35
And then, later on, we can measure how the agent's performance changes as a result of those updates.
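Measuring that effect could be as simple as comparing average scores on either side of a version milestone; again, a sketch with an assumed data shape.

```python
from statistics import mean

def version_delta(history: list[dict], before: str, after: str) -> float:
    """Compare average scores for two agent versions to see whether
    an approved change actually moved the metrics."""
    def avg(version: str) -> float:
        scores = [p["score"] for p in history if p["version"] == version]
        return mean(scores) if scores else float("nan")
    return avg(after) - avg(before)

history = [
    {"version": "v12", "score": 5.4},
    {"version": "v12", "score": 6.0},
    {"version": "v13", "score": 7.1},
]
print(f"change after update: {version_delta(history, 'v12', 'v13'):+.1f}")
```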