Learn how to build effective metrics on Cekura by creating a basic metric -> providing feedback -> using Auto-Improve
Chapters
00:00
Intro to Cekura Metrics
Introduction to building metrics on Cekura
00:06
What to Track
Understanding what to track in conversations
00:12
Trackable Items
Examples of trackable items and custom needs
Transcript
00:00
This video is about how to build good metrics on Cekura.
00:05
Before starting, you first need to understand clearly what you're trying to track.
00:09
Some good things to track in a conversation are topic, latency, and instruction-following.
00:16
While some of these might already be defined by us, like topic and latency found under the predefined metrics section, we do recommend custom metrics to track specific issues particular to your agent.
00:31
For example, we build testing agents that go and talk to your AI agent.
00:37
We want to track how accurately our testing agents end those conversations.
00:43
Are our testing agents ending too soon?
00:45
Are they not completing all the steps, and so on?
00:53
Let's now go to the metric section from the left panel and let's create a new agent metric.
01:00
We can name it "correct call end by main agent".
01:04
Note, when you send calls to us for monitoring, we refer to your AI agent as the main agent and the other party as testing agent.
01:14
Next, we'll select a metric type: Boolean here, since we either correctly end the call or we don't.
01:20
The next field is "does it affect call success?"
01:24
This is subjective and up to you.
01:27
If the metric fails, we'll mark those calls as failures.
01:30
This metric is important to me, so I'll mark it as affecting call success.
01:35
Now let's give it a quick description.
01:37
What does this metric even mean?
01:40
So this metric checks if the main agent is ending the call appropriately.
01:51
If all steps mentioned in the metadata instructions are completed before ending the call, it is considered a success.
02:19
Note that I used the metadata instructions field.
02:22
You'll often find yourself wanting to give more context to your metric. Metadata and the other available variables are built precisely for that purpose.
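As a sketch, a metric definition with metadata context might look like the following. The field names here are illustrative assumptions, not Cekura's actual configuration schema:

```python
# Illustrative sketch of a metric definition; the field names are
# assumptions for this example, not Cekura's real schema.
metric = {
    "name": "correct call end by main agent",
    "type": "boolean",             # pass/fail, no partial scores
    "affects_call_success": True,  # a failure marks the whole call failed
    "description": (
        "Checks if the main agent is ending the call appropriately. "
        "If all steps mentioned in {metadata.instructions} are "
        "completed before ending the call, it is a success."
    ),
}

# The {metadata.instructions} placeholder stands for per-call context
# that would be filled in when the metric runs on a conversation.
print(metric["type"])  # boolean
```

The point is that the description stays short and the variable carries the call-specific context, so one metric generalizes across calls.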
02:33
Once done, you can click Improve to expand this very basic version.
02:39
You can see it expands the metric description according to the guidelines mentioned here. The next step is the trigger.
02:48
I want this metric to run on every call.
02:50
Hence it's Always. Alternatively, you can pick Custom and write a trigger.
02:55
For example, there are some metrics you only want checked if an appointment is booked.
03:01
So you can add that here.
03:03
Return true if the main agent is trying to book an appointment. Let's switch back to Always.
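Conceptually, a custom trigger is just a predicate over the call: the metric runs only when it returns true. A minimal sketch, where the call/transcript shape is an assumption for illustration, not Cekura's real call object:

```python
# Hypothetical trigger: return True only when the main agent is
# trying to book an appointment. The transcript structure below is
# an assumption made for this example.
def appointment_trigger(call: dict) -> bool:
    keywords = ("book an appointment", "schedule an appointment")
    return any(
        turn["speaker"] == "main_agent"
        and any(k in turn["text"].lower() for k in keywords)
        for turn in call["transcript"]
    )

call = {
    "transcript": [
        {"speaker": "testing_agent", "text": "Hi, I need some help."},
        {"speaker": "main_agent", "text": "Sure, let's book an appointment."},
    ]
}
print(appointment_trigger(call))  # True
```

With Always, the predicate is effectively replaced by one that returns true for every call.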
03:10
This step is interesting.
03:11
We are looking to test a metric.
03:13
This means we can quickly run it on some of our old calls and see if it is working as expected.
03:24
This is helpful for quickly checking that you're using the right metadata variables.
03:29
As you can see, it works correctly and I get a score and an explanation.
03:34
Once I'm happy, I can click Create Metric.
03:38
So you see the metric here.
03:41
Now let's go to the Observability section on the left and run this metric on this page.
03:51
So the workflow to follow here is build a metric and run it on a small set of calls to see how it is performing.
04:01
Now let's look into one of these calls.
04:05
As you can see, the metric failed here and the reason is the main agent ended the call prematurely without the testing agent completing the scenario steps.
04:14
Now this is incorrect: the main agent should complete the scenario steps, not the testing agent.
04:37
I would call this a successful call, because all the steps were followed, as can be seen from the instructions here.
04:47
The testing agent agreed to the last step, and we ended the call.
04:51
Let's click Add to Lab. This is a very good way to leave feedback on calls for your metric.
05:03
Next we can review a few other calls and repeat the same exercise.
05:09
We do this to teach our metric what counts as success and what counts as failure.
05:16
There are many edge cases: if the testing agent gets stuck in a loop, the main agent should still end the call.
05:25
If the testing agent continues the conversation after all the steps are completed, the main agent should still end the call, and so on.
05:36
Once you leave feedback on at least six calls, you can head to the Lab section.
05:44
The Lab section is where we optimize these metrics.
05:49
We use a DSPy-based metric optimizer, which uses all the feedback you have left.
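The improvement loop can be thought of as: score the current metric against your annotated calls, let the optimizer rewrite the metric prompt, and re-score. A toy sketch of the scoring half only (not the DSPy internals; `predict` stands in for the metric's LLM-driven judgment):

```python
# Toy sketch of scoring a metric against annotated feedback.
# 'predict' stands in for the metric's judgment on a call; in
# reality this would be an LLM call driven by the metric prompt.
def accuracy(predict, annotated_calls):
    hits = sum(
        1 for call, expected in annotated_calls
        if predict(call) == expected
    )
    return hits, len(annotated_calls)

# Six annotated calls: (call, human verdict from your feedback).
annotated = [({"id": i}, True) for i in range(6)]

def old_metric(call):
    return False  # before Auto-Improve: disagrees with every annotation

def new_metric(call):
    return True   # after Auto-Improve: matches every annotation

print(accuracy(old_metric, annotated))  # (0, 6)
print(accuracy(new_metric, annotated))  # (6, 6)
```

This is why detailed feedback matters: the annotations are the training signal the optimizer uses to rewrite the metric.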
05:57
It is important to leave good, detailed feedback here.
06:02
Once done, you can click Auto-Improve and let it do its magic.
06:10
Once it is done, you'll see a notification on the bottom left and you can click on view changes.
06:19
You see the metric is completely redone here.
06:23
It's very detailed, and you'll see the new score, which is 6 out of 6 here, compared to 0 out of 6.
06:30
If you're happy, you can just hit save and the metric will get saved.
06:35
Note that you still have your old annotated data set and can keep adding to it in the future. To summarize: create a metric by providing a very basic description and the right context.
06:52
Run it on a small set of calls and provide feedback on cases where you disagree.
07:02
Head to the Lab, hit Auto-Improve, and let us handle the rest.
07:08
You'll notice that within two iterations of providing feedback and running Auto-Improve, the metric accuracy improves significantly.
07:20
Also, in the future, maintenance of this metric becomes extremely easy for you.