Introduction to the 'building public' blog and an overview of the projects discussed.
00:45
Alfred: AI Butler Project
Details on the Alfred AI butler project, its evolution, challenges, and current state.
21:30
Dovetail: Dev Workflow
Exploring Dovetail, an AI-powered tool for automating developer workflows and context management.
39:03
Joe: Proactive Voice Agent
Insights into Joe, a proactive voice agent for construction, focusing on interaction and performance.
51:14
Jig: Cognitive Assembly Line
Understanding Jig, a cognitive assembly line for deterministic and agentic task orchestration.
01:08:50
Other Projects & Conclusion
A look at the 'No Code No Clue' project and concluding remarks on future updates.
Transcript
00:00
This is my very first Build in Public blog, but I wanted to go away from just very well-researched blog articles because I'm not a pundit, I'm a builder.
00:09
So I want to show you what I'm building because if I did this, I would have a lot more content, and I'm pretty sure that you guys are more interested in this one.
00:16
So today I want to talk about 4 things: Alfred, Dovetail, Joe, and Jig.
00:22
Alfred is the project that you guys know about.
00:24
Dovetail is a new project that I teased in my latest blog article, which helps me be not just more productive but also be a better engineer without having to be a better engineer.
00:35
And then Joe and Jig are client projects, which I'm going to talk about in just a few minutes.
00:39
And then there's a couple of other fun things that I'm going to show you.
00:42
So, Alfred. I started working on Alfred last August, when I said I'm building my own butler.
00:51
Because basically what I wanted was a butler, and everybody was promising that with AI, but it's not doing that.
00:56
And it's pretty annoying because of hallucinations.
01:00
And then even a couple months later, MCP was introduced in November, and I'm like, yeah, that's what we need.
01:07
But even that is very brittle.
01:09
So anyway, I started building n8n workflows, because that's actually easier than coding.
01:15
At least that's what I thought.
01:16
And then basically I figured, hey, I want Alfred to just run these n8n workflows for me and build these n8n workflows for me.
01:27
And then really quickly I realized that a big problem of this is connection, like having MCP connection and then not paying a shit ton of money to Make.com or Zapier or n8n.
01:39
So I need self-hosting, and then I went into this whole self-hosting idea.
01:42
So I started building on Alfred OS, which was supposed to host a bunch of open source applications for you, just run a bunch of apps, but the open source alternatives for them.
01:55
So like instead of Make.com, let's use n8n because you can self-host it.
01:59
Instead of Zoom, Jitsi because it's the same thing but open source.
02:03
And then I would have skills for Alfred, n8n workflows that can really do stuff well.
02:08
And then, you know, if I want to use MCP servers, I can just add them to skills.
02:12
But the point is that an Alfred skill would be deterministic, right?
02:17
That's the big problem that I've had because you go into Claude and you create a bunch of connectors and then it starts doing stuff with the MCP servers, but then it starts hallucinating and I need someone to hold the agent's hand to not drift off because the moment it starts working with lots of data points, it starts to drift off.
02:36
And what happens if I'm like, hey, get me data from this, from this, from this, from this, and then prepare a report for me and then— then— it's just so much bloat and data that it, it drifts off.
02:46
So that was one big problem.
02:48
And the second big problem was again, I'm shit at DevOps, so I needed to figure this out somehow.
02:54
And ultimately I ended up creating a Railway template, which, you know, you click once and it deploys a bunch of apps, like n8n and NocoDB and cal.com.
03:04
Some of them work, some of them don't.
03:06
And then it also deploys a custom version of LibreChat, which is just a ChatGPT clone that would allow you to use your own API keys to have a chat with Alfred.
03:16
And then, inside Alfred, Alfred would be connected to all the apps that you had.
03:20
And I'm like, yeah, that's pretty cool, but I still want the n8n workflows to be running automatically.
03:26
And for that, I would want it to be, you know, hosted locally, because I want the hosting part to be done.
03:31
So I'm not paying for the cloud version.
03:33
And then the n8n licensing issue was a big problem, because I cannot do that unless I pay up $50 grand a year, and I don't want to.
03:40
So I was kind of stuck, right?
03:42
I wanted Alfred to just know things and do things and save money.
03:47
And then the ultimate idea was that I would have Alfred, and then boom, it installs the whole Alfred OS and the agent and the skills and everything on a server, and then Alfred can do stuff for you.
04:00
Now the problem is, even if I solve that, I still cannot work with n8n because I don't have the embed license.
04:07
So instead of doing that, I needed to figure out some other ways, and there are a couple other ways.
04:12
Like for example, what's really promising is Manus.
04:15
It now has a thing.
04:18
It has an API, right?
04:19
I don't even know where the API key is, or maybe here.
04:23
So Manus has an API, which means that I can just give tasks to Manus.
04:28
But the problem still remains.
04:29
It's like, okay, I don't have to do the n8n thing.
04:32
I can still probably figure out how to build the cloud bit, but I've never done that before.
04:36
But even then, this is not really going to be entirely free because you still need to pay for credits.
04:41
So I decided to kind of streamline the whole thing. What I'm showing you is something that I built in the last 3 days, and this is how powerful my new workflow has become: I was able to solve a problem that I don't have the skills for in 3 days.
04:56
So the idea of Alfred is basically that it's an execution layer for AI, right?
05:00
You get an MCP server, you connect it to Claude or ChatGPT or whatever, and then it just knows things, does things, and saves money for you, and that's it.
05:09
And how it works, it's not technical.
05:12
It is technical, but your experience of it is not technical.
05:16
So there is a big, big problem with AI execution, which is: these things are very smart, but they're not logical.
05:22
They cannot employ actual reasoning, and it's reasoning that unlocks deterministic workflows.
05:28
I tell you to follow step 1, 2, 3, 4 in this order, and the AI may or may not do that.
05:34
There is no validation, there's no enforcement, there's no deterministic stuff.
05:38
For that, you need to write code, and that's kind of the plumbing that's missing.
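To make that concrete, here's a minimal sketch of the plumbing I mean, in TypeScript. The `Step` shape and the sample steps are my own illustration, not Alfred's actual API: each step runs, its output is validated by plain code, and the pipeline fails loudly instead of drifting.

```typescript
// A runner that enforces step order and validates every step's output.
// Step shapes and the sample steps are illustrative assumptions.

type Step = {
  name: string;
  run: (input: unknown) => unknown;        // could wrap an LLM or MCP tool call
  validate: (output: unknown) => boolean;  // deterministic gate, plain code
};

function runPipeline(steps: Step[], input: unknown): unknown {
  let current = input;
  for (const step of steps) {
    current = step.run(current);
    if (!step.validate(current)) {
      // Fail loudly instead of letting the agent drift off.
      throw new Error(`Step "${step.name}" produced invalid output`);
    }
  }
  return current;
}

// Dry run: steps execute in this exact order, every single time.
const report = runPipeline(
  [
    { name: "fetch", run: () => [120, 80, 40], validate: Array.isArray },
    {
      name: "sum",
      run: (nums) => (nums as number[]).reduce((a, b) => a + b, 0),
      validate: (out) => typeof out === "number",
    },
  ],
  null
);
console.log(report); // 240
```

The validation gate is the point: the agent can do the flexible work inside a step, but whether the pipeline advances is decided by code, not by the model.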
05:43
So basically, we would need separation of concerns.
05:45
We want the agentic work, which is flexible and scalable, but can also hallucinate and be unreliable.
05:51
But it's really good with unstructured information.
05:54
And that unstructured information is the idea I have in my head.
05:58
But once it's clear enough, I want the AI to kind of lock it down.
06:03
It's like, hey, here's the deterministic way of getting things done, and here's how I'm going to do it again and again and again.
06:08
Claude skills are a pretty good move in that direction, right?
06:13
There are agent skills inside Claude, but these are just mega prompts and short scripts, and it's not entirely what I would want.
06:21
So it's like, okay, we're getting there, but the flow I'm expecting, and the original flow I was expecting, is: I talk to Alfred via chat, and then once I figure out what I want, Alfred builds an n8n workflow for me, and then I can set when it should run, or I can just say, hey Alfred, run this for me.
06:40
And that would turn Alfred into an operator, like a chief of staff, but AI.
06:45
And it doesn't work, because all the n8n workflows Claude or any other AI tool can build are not very good right now, and it's just brittle.
06:54
So we need something better.
06:56
And then MCP was an idea that, you know, with MCP servers, I just give the tasks to Claude, and then it just calls the MCP tools.
07:05
But then it's unreliable and hallucinates.
07:08
So I either have spaghetti inside n8n that I need to plumb, or I use an agent that tries to kind of cosplay as Dexter from Dexter's Laboratory, but really it's DeeDee.
07:20
And that, that's the big problem.
07:22
So in order to solve that, I came up with two things called Talk Mode and Work Mode.
07:27
And the version of Alfred that I'm building right now, basically what you get is that the moment you log in, you get provisioned a new server.
07:37
So let me show you.
07:38
This is the current version.
07:40
It's very early, but I'm using the OpenSaaS template.
07:45
It's running on the Wasp framework, and I definitely recommend you take a look at it if you want to build apps, because it's got the complete stuff: Stripe integration, authentication, everything.
07:55
So when you're in Alfred, you know, the namings and everything are not done yet.
08:00
It's all running locally, but the idea was I log in, and then when I log in, I would just click on Launch My Alfred OS and then it starts getting created.
08:10
And then when it gets created, it's pending, which means there is a server remotely that's currently setting things up for me.
08:19
I'm currently using Contabo because I'm going through the verification process with Hetzner, which, I don't know, I've opened bank accounts with less scrutiny.
08:29
But if you look at it now, it's provisioning, and now I got a specific URL, dynamicpheasant.alfredoess.site.
08:37
This is set, the DNS configuration is set, but you also get the IP address.
08:43
Now currently nothing's installed on it.
08:46
But now there's an automatically generated virtual machine that sets itself up with a setup script.
08:54
And then inside that setup script, I can add whatever app I want to add.
08:59
So that's the next step.
09:00
So I solved this, and we're getting pretty close to the Alfred Cloud, because the big change is that I let go of the idea of installing n8n on this.
09:10
Okay, so once you have that installed, you will have basically two apps that come with the install: LibreChat and NocoDB.
09:18
And that's very intentional, because you're probably never going to have to open that.
09:22
What you're gonna have as a quick start is a simple URL and an API key, and all you need to do is go to Claude or ChatGPT and add Alfred as a connector.
09:33
And once you add Alfred as a connector, then you can open Claude and say, hey Alfred, do this and do that, and it will actually use Alfred instead of Claude.
09:42
Like Claude turns into Alfred for that session.
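For what it's worth, in Claude Code a remote MCP server is registered through a `.mcp.json` entry, so the quick start could look roughly like this. The server name, URL, and auth header here are placeholders, not Alfred's real endpoints:

```json
{
  "mcpServers": {
    "alfred": {
      "type": "http",
      "url": "https://your-instance.example.com/mcp",
      "headers": { "Authorization": "Bearer YOUR_ALFRED_API_KEY" }
    }
  }
}
```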
09:44
this is the chat part.
09:45
If you want to go to LibreChat because you don't want to use Claude, you can still open the LibreChat.
09:50
That's why it's there.
09:51
You can open LibreChat and use Alfred there.
09:55
It will have the MCP connections to NocoDB.
09:58
It will have all the MCP connections you want to, and then the Alfred MCP will do the same thing that LibreChat can do.
10:04
So there is the same thing.
10:06
But the, the use case here is not that.
10:09
And that was my biggest pet peeve, is that I was getting distracted with the Alfred OS.
10:13
This is just an infrastructure thing.
10:15
And now, now that I figured out how to run this, I will just create Alfred Cloud where you pay, I don't know, probably $40 or something per month, and then you get a server, full access to it if you want to.
10:27
And then it runs Alfred, it runs all the apps, it runs LibreChat, and then you have your own little thing and all the data is yours.
10:35
But the interesting thing is happening inside Work Mode, because the idea I have is that I want to teach Alfred a skill once.
10:44
And then I want Alfred to just remember how to run it forever.
10:48
So I would just describe something like, hey, process my emails and organize them into labels by topic of importance.
10:54
And then I want Alfred to automatically look at my connections and see what's what, like get the context on my little world, my reality, and then come up with an idea of how this should actually work.
11:11
And then it comes up with a solution.
11:13
It's like, hey, I think what we should do is connect to Gmail and then fetch unread emails from the last 24 hours, etc., etc.
11:22
And then also give me a couple ideas like, hey, what if we do a sentiment analysis from another MCP server as step 4?
11:30
Or what if I do this and that?
11:32
And you say, okay, you know what, I want you to do option C.
11:36
And then I just say learn it.
11:38
And then when I say learn it, then Alfred— it doesn't save the skill into memory.
11:45
It actually creates deterministic code that will trigger Alfred running every time, the exact same time, the exact same way, the exact same tools are being used.
11:57
It's deterministic and reliable and does the exact same thing all the time, much like an n8n workflow, but you just built it in 4 steps because you let
12:04
Alfred do the heavy lifting for you.
12:07
So talk mode and work mode is really just separation of concerns, right?
12:10
Talk mode is about authoring.
12:12
It's like ad hoc conversations, exploration.
12:15
Hey Alfred, do this, do that.
12:16
And then whenever something happens, it's like, hey, what we just did, just give me a few-step walkthrough of what you did and what tools you used.
12:23
And then when Alfred gives you something, it's like: this is a really good workflow, I want you to learn it.
12:27
And I want you to run it every Monday at 7 AM.
12:31
And then Alfred actually transforms that into a skill, and that replaces an n8n workflow.
12:36
And, and that's it.
12:39
Inside Work Mode, you will have a trigger, you will have the actual skill which has the actual steps, and then Alfred always runs it completely deterministically, all the time.
12:49
And then the idea here is that, you know, Claude skills are supposed to kind of do that, but the authoring bit is missing and the orchestration bit is missing.
12:58
The execution layer is missing.
12:59
So let me give you a couple examples.
13:02
Let's say I just type in, hey, every Monday I want you to pull revenue from Stripe, support tickets, project updates, Slack mentions, and then send me a summary on Slack.
13:12
And then Alfred understands what I want, and it creates a workflow suggestion. And it's not like a 45-node spaghetti.
13:20
It's really just 6 steps, right?
13:23
Because it's like agentic in execution, but deterministic in orchestration.
13:27
And then once I say, yeah, that's it, learn it, then it saves it for itself and it will do those steps exactly the same way.
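Here's a sketch of what that learned skill might look like once it's locked down: a trigger plus six ordered steps, with exactly one agentic step in the middle. The field names are my guesses, not Alfred's actual schema.

```typescript
// Sketch of a "learned" Work Mode skill: a trigger plus an ordered list
// of steps that always run in this order. Field names are assumptions.

type Skill = {
  name: string;
  trigger: { kind: "cron" | "manual" | "webhook"; spec?: string };
  steps: { tool: string; action: string }[]; // always run in this order
};

const mondayReport: Skill = {
  name: "weekly-summary",
  trigger: { kind: "cron", spec: "0 7 * * MON" }, // every Monday, 7 AM
  steps: [
    { tool: "stripe", action: "pull-revenue" },
    { tool: "zendesk", action: "pull-support-tickets" },
    { tool: "linear", action: "pull-project-updates" },
    { tool: "slack", action: "pull-mentions" },
    { tool: "alfred", action: "summarize" }, // the one agentic step
    { tool: "slack", action: "post-summary" },
  ],
};

console.log(mondayReport.steps.length); // 6
```

Six steps, not a 45-node graph: the orchestration is a plain data structure, and only the summarize step involves the model.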
13:35
Or another thing, it's like, hey Alfred, I want you to analyze my support tickets, find buying motives, build our ICP, research leads in Apollo, and then save them into my database.
13:45
Like, that's a pretty big piece of work.
13:48
So I'm like, okay, let's see, let's— we trigger this manually and pull support tickets via Zendesk.
13:54
Then I'm gonna analyze. It says it's via Alfred OS, which means that Alfred is thinking, instead of using external tools.
14:01
So analyze buying motives, build ICP.
14:04
Then based on that, I'm going to use the Apollo MCP to research matching leads and then save them to the database.
14:10
And then I can also do stuff like self-maintenance, right?
14:13
I can just say: Alfred, every Monday, I want you to analyze our conversations from the last week and suggest skills to add or remove or improve.
14:22
And because it can examine itself, it will pull conversations from the history from last week using its own MCP, then analyze skill usage patterns, identify gaps, generate improvement suggestions, and then send you a report via Slack.
14:36
And then Alfred might suggest to relearn some skills if there are some errors or something.
14:41
So you don't need to deal with any of that.
14:43
Another example, something that I started working on: I have a Home Assistant unit here at home, and I started connecting everything.
14:51
And then there is the Home Assistant Voice Preview Edition.
14:57
Which is basically like Alexa, but for Home Assistant.
15:01
And then you can run anything on it.
15:03
So I have a Home Assistant voice preview.
15:05
It's downstairs in the kitchen.
15:07
It's connected to OpenAI.
15:10
I think it's using GPT-5.
15:12
And it also has the iconic Alfred voice from ElevenLabs.
15:18
And, and that's it.
15:19
So then I have cameras, I have smart switches and stuff.
15:23
So what I can do is I can say, hey, if someone's at the door and I don't reply on Slack in 60 seconds, call my phone.
15:29
So then Alfred is like, okay, first I need to have a workflow created.
15:33
Trigger would be a doorbell event.
15:34
I need to figure out what that is.
15:36
I just connect to Home Assistant through an MCP connection and find that event.
15:40
And then I will just— okay, that's the webhook that I need to set.
15:43
So you don't need to deal with any of that.
15:46
And then it's like, okay: step 1, detect the doorbell ring and send a Slack alert, then wait 60 seconds for a response.
15:54
If I reply, then, you know, it can see that on Slack, so it doesn't do anything.
15:58
If no reply, then I'm going to initiate a phone call, which can be either via Twilio, or it can even be using Vapi, which, you know, just has a voice agent.
16:07
So Alfred can actually call me and say, David, someone's at the door.
16:10
Like, go check it out.
16:11
You're not looking at it right now.
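As a sketch, the doorbell skill boils down to three deterministic steps with stubbed-out effects. None of these function names are a real Alfred, Slack, or Twilio API; they just stand in for the calls described above.

```typescript
// Doorbell skill as deterministic steps: Slack alert, wait up to 60 s
// for a reply, then fall back to a phone call. The Effects functions are
// stubs standing in for Slack / Home Assistant / Twilio or Vapi calls.

type Effects = {
  sendSlackAlert: (msg: string) => void;
  waitForSlackReply: (timeoutSeconds: number) => boolean; // true if replied
  placePhoneCall: (msg: string) => void;
};

function onDoorbell(effects: Effects): "acknowledged" | "called" {
  effects.sendSlackAlert("Someone is at the door");
  if (effects.waitForSlackReply(60)) return "acknowledged";
  effects.placePhoneCall("David, someone's at the door. Go check it out.");
  return "called";
}

// Dry run with stubs: no reply on Slack, so the call step fires.
const outcome = onDoorbell({
  sendSlackAlert: () => {},
  waitForSlackReply: () => false,
  placePhoneCall: () => {},
});
console.log(outcome); // "called"
```

Injecting the effects as functions keeps the branching logic testable without touching real services; swapping in the live integrations doesn't change the control flow.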
16:13
So, the idea here is that this is the kind of stuff that I want to use, and it's pretty annoying to me that I can use Claude skills and everything, but it's not doing the stuff that I would want to do, because it's hallucinating, or I can build n8n workflows, which are again not doing what I want because they're so complicated and spaghetti and they're brittle and fragile and unreliable.
16:35
So the idea here is that Alfred will bring everything together, right?
16:40
You just click on deploy once, like you did here.
16:44
If I refresh, you can see it's running, it got an IP address, it got a host, and that's it, and everything's there.
16:51
So log in, sign up for Alfred, then you get an MCP server and API key so you can start immediately using it.
17:00
Then, Alfred OS gets deployed into that server for you.
17:04
So Alfred has instant connection to open source tools, and then you will have apps, like open source apps.
17:10
I'm probably going to build or add more apps to the App Store, or you can also add your own MCP connections.
17:17
Like let's say you love Zoom, you don't want to use Jitsi, so instead of installing Jitsi on Alfred OS, you're going to just create an MCP connection for Zoom to Alfred.
17:29
And from that moment onwards, Alfred understands what it's about.
17:33
For the Alfred Cloud, I'm probably going to have a solo version, which is going to be the lowest tier, and it's going to include, you know, the whole thing: the very basic LibreChat and NocoDB. And then whenever there are new apps coming up in the App Store, you immediately get access to them, but you can start using Alfred immediately.
17:55
I'm not entirely sure about the pricing.
17:57
I don't know, it may be usage-based, like skill runs, or it may be a simple monthly fee.
18:05
The Alfred OS bit is just, you know, flat.
18:08
You have full root access to the server, and on top of that you have Alfred as the agent.
18:16
The way Alfred OS gets deployed, I'm still going to make that open source.
18:20
So if you don't want to use Alfred as the agent and just want the self-hosting stuff, you get to do that.
18:28
Then, I'm also thinking about another tier; I'm not entirely sure how this would work.
18:34
I may just have the Alfred solo and that's it, and then, you know, you can add as many MCP connections as you want.
18:40
But the whole point is, you know, collaboration and, and skills library, because the real value is going to be in skills.
18:46
And then for those who have existing businesses and they're like, hey, I have a bunch of stuff that I need here, there is the Alfred Pro subscription, which would come with a white glove onboarding, and I will come on board and help you create the skills, help you set everything up for yourself, and then you just get a URL to log in, and you also get the Alfred MCP setup as a one-pager.
19:09
And then, you know, we would monitor the skills and the skill development.
19:14
So you really get sort of a performance supervisor for your Alfred AI chief of staff.
19:20
I don't know about pricing yet, I'm still thinking about it.
19:23
There's a lot of stuff to do.
19:24
I just want this to work first.
19:26
And finally, everything sort of clicked together, and it works, and the skill gaps are closed.
19:32
So yeah, I'm going to launch this website soon and probably give some more updates.
19:40
But also because I don't want to just, you know, show you and talk to you about theoretical stuff.
19:47
I have started using Linear for tracking my work, and inside Linear I have a bunch of projects.
19:53
Some of them are, are, you know, client projects, some of them are my own projects.
19:58
And here is Alfred, which is connected to my project management tool, again Linear, and it has all the stuff that I'm working on.
20:07
So, full build in public mode, right?
20:08
Full build in public mode.
20:10
You can see all the stuff that I'm working on.
20:12
you're going to see how these are created.
20:16
I'm building everything with Claude Code.
20:19
So this is public and I'm going to share the links to the different roadmaps, to the different products I have below.
20:25
And, and you will have the ability to, to keep track of things and ask questions and whatnot.
20:30
So again, I want to build fully in public.
20:34
Yeah, so this is Alfred, and probably the first question most people are gonna ask is: when is this gonna be ready?
20:41
honestly, I don't know.
20:43
I keep building it, and now I feel like it's probably going to be closer than I figured.
20:47
There are a couple people who paid for the original Alfred, and I will give them the option to either have the Alfred Cloud, or use it for sort of a white glove onboarding session, and then, you know, keep them on.
21:04
So I don't know, we'll figure things out.
21:06
A couple months ago I suggested that if you paid for it, you can just ping me and I will help you set it up for yourself however you want.
21:13
That still stands.
21:14
Okay, so I'm going to keep posting about what I do.
21:18
The first video is pretty long, but I'm going to start releasing more updates to you on YouTube and here on the Lumberjack.
21:27
And it's just, you know, immediacy.
21:28
Hey, here's what I'm building, here's where I'm at.
21:30
So you will be able to see that.
21:32
Now, how and why can I do this?
21:35
So then we get to Dovetail.
21:37
Dovetail is a really interesting project because First, here's the evolution, right?
21:43
I started with IFTTT in 2014.
21:46
That was the first time I built a no-code automation, and it was— I think it was archiving photos from my iPhone to Dropbox.
21:54
And back then, I was running a software company.
21:57
I was designing and launching products, and I was raising VC funds and doing stuff. And I did a lot of technical stuff in my life, but I never really coded.
22:06
So this whole no-code idea was pretty cool.
22:08
And then, as, you know, life happened and the AI thing came, and then there was the Promaster era where I really got into Make.
22:18
So Make.com: I started using it in 2023, and I got really, really big on that.
22:23
But the problem was it got really expensive really quickly.
22:27
So in 2024, I moved over to N8N.
22:30
And the interesting thing about here is that these are progressively more and more complicated.
22:35
Somewhere in between these two, I was using Zapier, but that's like super expensive.
22:40
So with n8n, it wasn't enough, so I had to add JetBrains, which was, not JetBrains, sorry, let me check it.
22:47
So I started using JetAdmin for creating a UI.
22:51
And then there are a couple solutions like that, like Bubble and Softr and Framer and a couple other things.
22:58
So, I was looking around and then I started getting into, using Airtable a bit more and then I moved over to Supabase.
23:06
So, you know, my stack started expanding, getting more complicated as I was building more and more complicated stuff.
23:13
And then, in 2025, I started using Lovable.
23:19
And basically, Lovable plus Supabase replaced everything else.
23:21
So I didn't need to use n8n, I could just create Supabase Edge Functions.
23:25
I didn't need to use JetAdmin because Lovable was building the UI.
23:28
I didn't need to use Airtable because Lovable had a native Supabase integration.
23:34
I felt powerful.
23:35
It was really good.
23:36
But then the problem is the same thing that we discussed with the Alfred problem, which is it's brittle.
23:42
Even the Lovable UI gets really, really complex and brittle very quickly.
23:48
So it wasn't very reliable.
23:50
So I started focusing more and more on dev work.
23:52
I was learning a lot of stuff, and then ultimately I ended up using Claude Code.
23:59
Claude Code, a lot, right?
24:01
So with Claude Code, I tried Warp, I tried Cursor.
24:04
I even used Claude Code in the simple terminal or PowerShell, and, you know, it works perfectly.
24:11
It's fine.
24:12
So Claude Code is pretty cool.
24:14
And, then I started going through like a productivity boom, right?
24:19
So then I went into Claude Flow.
24:23
Which is built by Reuven Cohen, and it basically launches a bunch of Claude Code instances, and then you just, you know, have an agent swarm that builds stuff for you. And I don't know, man, it got overwhelming really quickly, because the hallucination problem was still there, right?
24:41
So Claude Code was really great, but the— it was still hallucinating and making stuff up.
24:45
Claude Flow is the same.
24:47
I love it, it's pretty cool, but there is a lot of it that's just smoke and mirrors. There's a lot of, you know, tricks in there, I think.
24:56
So after a while I realized that basically I have the same stack every time I work, right?
25:04
Almost all of my projects are almost always built on the same stack.
25:08
I use Supabase for DB, I use Supabase for authentication, I use Supabase for serverless functions, and for APIs as well.
25:17
It just gives you everything: you create a project in Supabase and then you're done.
25:22
I started using Linear for documentation and project management, because I wasn't doing anything correctly.
25:31
So I had to start forcing myself to document stuff.
25:37
And then I started using Fly.io for hosting.
25:40
It's pretty cool, pretty lightweight, pretty thin, and you just create an app on Fly.io, deploy it, and it runs.
25:46
And Cloud Code can deal with it pretty easily.
25:49
So I'm like: Supabase, Linear, Fly.io. And then I was still using Lovable for UI, but immediately once I have the UI more or less nailed down, I will move it to GitHub and then pull the GitHub repo into Claude Code and keep working from there.
26:06
But again, there were a couple problems, namely around code changes, commits, and auditability, etc.
26:15
So the big problem I had with this is that I just kept working and working, and then in the beginning I was looking at the stuff that Claude Code was changing, but then after a while I got complacent, and then, you know, it started replacing entire working features with placeholders for no particular reason.
26:33
So I needed to learn a bit more about branches and pull requests and everything in GitHub.
26:39
But then all of a sudden I had this whole picture: okay, so I have an idea, I go to Claude Code and I say, hey, do this and change that.
26:48
And then Claude Code, out of context, just pulls stuff from its memory, and then it starts building something, and then it pushes it to GitHub, which then pushes stuff to Fly.io and it gets deployed.
27:01
And then if it deleted something from the database, I need to go back, but I don't actually realize it until like an hour later.
27:08
So in order to make this work, I needed to have a pretty strict flow, right?
27:14
First of all, when I start working on a project, I don't want to deal with any of this shit, because everything needed to be connected.
27:22
But let's say I have this set up, everything is set up. Then what I do is I send the prompt.
27:30
Let's say I start Claude Code, and then Claude Code, or me,
27:39
somebody needs to check the Git repo status, latest commits, latest updates in Linear, connections, and basically get context on what's what, right?
27:52
And then I say, okay, send the prompt on what to build.
27:57
But if I'm not doing it, right, if it's manual, then inside the prompt I would need to do stuff like: check Linear, check the DB, do this, do that.
28:10
And even then, in order to do everything properly, I would need to say: check Linear to see if my request is related to any issues, then check the latest commits on GitHub to see if we worked on this before, then figure out what I want, right?
28:31
And then if it figures out what I want, instead of just starting to build it, I would need to say: create a new branch, or move to a new branch, in Git.
28:44
And start working on that.
28:46
Once done, create a pull request and update Linear.
28:49
So if it's manual, then I do all of this before I send the prompt, or I could just say Claude Code does it.
28:56
Claude Code does it, and then we could have skills, which are automatically invoked or not.
29:01
So I don't have control over that.
29:04
That's not really good.
29:06
I can use a CLAUDE.md prompt, which again works or not.
29:11
So then we have the new Claude plugins, which was pretty cool.
29:15
You know, it kind of does what I want.
29:20
But I was like, okay, so Claude plugins are basically commands and hooks.
29:24
But what is the hook that we're running here, right?
29:25
how do we, how do we make it work?
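For context, Claude Code hooks are configured in `.claude/settings.json`, so one answer to "what is the hook we're running here" could look roughly like this. The script path is hypothetical; the point is that a plain command runs before a matched tool call, deterministically, every time:

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "./scripts/require-feature-branch.sh" }
        ]
      }
    ]
  }
}
```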
29:27
So I had the idea of launching sub-agents, but then again, it works or it doesn't, because then the agent is doing the orchestration.
29:37
So anyway, so I decided that none of the current solutions solved this, so I started working on Dovetail.
29:45
And what Dovetail does is, first off, I can just say dovetail init.
29:50
Let me show you.
29:52
So basically the way you install Dovetail is, it's on npm, so npm install -g numberjankso/dovetail, and then you install it, and it pulls the package from npm, it pulls the code. And then you can check the version, this is the current version, and you can take a look at all the different commands Dovetail currently has.
30:17
If it's the first time you're running it, you can go to dovetail onboard, or if it's not the first time, you can go to dovetail config.
30:27
And in the config, you can see that there are a bunch of different connections put together.
30:34
There is GitHub, the Linear CLI, and the Linear API.
30:38
Linear doesn't have a command line interface.
30:41
Supabase connection, Fly.io connection, and then also a couple of defaults, like what's the default organization, what's the Linear team key, what's the Supabase default organization.
30:51
It just walks you through that, right?
30:53
And then that's it.
30:55
Now the interesting thing about all this is, if something's not working, you can also just launch dovetail doctor, and that's it.
31:02
Now if I want to create a new project from scratch, what I have to do is I just say dovetail init and then project name.
31:10
I'm not going to start it now because I just started— I just ran it before.
31:15
So it starts it.
31:16
Let's say the project is dovetail live.
31:19
And then what happens, it runs the authentication, right?
31:22
It asks me to confirm if this is the repository, do I want to make this public, what's the Linear team key, what's the Fly.io region, etc.
31:31
And then it configures the project and creates the scaffolding.
31:37
So it creates simple PERN-stack application code, like the basics are set up for running an app that can be hosted on Fly.io and whatever.
31:45
It's a container.
31:46
You can build whatever you want in it.
31:49
Then it creates a GitHub repository.
31:51
So it creates the scaffolding, creates a GitHub repo, creates a Linear project, creates a Supabase project, creates a Fly.io app, then installs all the dependencies, wires everything together, and sets everything up.
32:03
And then once the whole thing is put together, the last step is it installs a bunch of hooks into Claude, right?
32:10
And then if I want to, I can go ahead and take a look and see that here is my scaffolded project that I just created.
32:20
I can go to Linear and see my Linear project was just created, right, with a couple of basic issues.
32:26
Or I can go to Supabase, and I don't think this is going to load because that's the project URL.
32:35
But let me show it for you.
32:37
So I go to my project here, and as you can see, the Dovetail Live project, which ends with XYLE.
32:48
So this was just created.
32:50
And then, if I go to fly.io, you can see that there's Dovetail Live staging and Dovetail Live production.
32:58
It's still pending because it hasn't been committed yet.
33:01
Now I have this here, and if I want to, I can just go over to Dovetail Live, and then let's see what's what.
33:11
So I just go cd dovetail-live.
33:11
Let's go over here.
33:12
There you go.
33:14
And I can just say dovetail status.
33:17
And I can see we don't have an active issue, so I can start running stuff here like dovetail check issue, and then it checks all the issues.
33:25
Are there any open?
33:26
No.
33:26
So I'm going to create a new one.
33:29
What's the issue title?
33:32
And let's say I want to create a DIY gardening landing page.
33:37
—no description, medium priority.
33:38
So here is the manual mode, right?
33:41
And it doesn't even work, right?
33:42
Okay, so what we can do is I just cd into dovetail-live, and then what I can do is, you know, check dovetail status.
33:53
Status— it shows no active issues.
33:54
So there are a couple stuff that I can do here.
33:57
Or if I don't want to do stuff manually, what I can do is I can just launch Claude.
34:02
And then when I launch Claude, what actually happens is that because I installed a bunch of hooks, Claude now automatically runs those hooks every time I want them to run.
34:11
So for example, every time a session starts up, there is a Dovetail command that runs, which gets us all the context: what's the project, what branch are we on, what are the services, are there any issues, what commits are there, what are the latest updates on Linear, and that's it, right?
34:29
And then, no MCP servers are configured.
34:31
That's something that I will need to add to Dovetail so it automatically configures the Linear MCP server and the Supabase MCP server and everything for you, so you don't have to deal with that.
34:43
But the idea here is that when I say build a DIY gardening blog page, before I actually send the prompt, it first checks if there is an existing issue. There is none, so the first hook fails.
35:03
So now it starts looking at the actual code inside the codebase, right?
35:09
And then it fills its own context, but it's not doing anything.
35:14
It's not changing anything.
35:15
It's just learning about your code.
35:16
It's learning about the environment it's in.
35:18
And it says this is a PERN stack with Vite and React, etc.
35:24
So now it understands what's up.
35:26
And then the idea is that all of these things— this is done automatically by a hook.
35:33
So the idea here is that Dovetail has a session-start hook, a user-prompt-submit hook, a pre-tool-use hook, and a post-tool-use hook.
35:45
And that means that every time a session starts, it checks what's up with the project and gets context on the last work.
35:53
When I send the prompt, it double-checks if we are on an issue.
35:56
Is there an issue?
35:57
Is it the project?
35:58
Like, do we have anything like that?
36:01
And then, look at this, there is a pre-tool-use hook that runs before a write action and launches a sub-agent. It shows an error, but it's not an error.
36:16
Or sorry, it first checks: are we trying to use a restricted tool?
36:21
Yes, the write tool, changing a file, that's restricted.
36:23
So let's see, what's the project, what's the main— oops, no active issue.
36:27
So we're going to block the task.
36:31
And instead we're going to launch a Dovetail Sync Agent.
36:34
Its job is to say: here's what the user wants, here is the project, here is everything about the project, and then figure out, do we need to create a new issue?
36:42
Do we need to create a new branch in GitHub?
36:45
Like, what do we need to do?
36:46
And it just gets everything done for you together.
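The hook wiring described above can be sketched as a Claude Code settings file. The hook events (SessionStart, UserPromptSubmit, PreToolUse, PostToolUse) and the settings shape come from Claude Code itself; the `dovetail hook …` subcommands are my guess at what Dovetail installs, not its real CLI surface. A PreToolUse command that exits non-zero with a blocking signal is what stops the restricted Write before the sync agent takes over.

```json
{
  "hooks": {
    "SessionStart": [
      { "hooks": [{ "type": "command", "command": "dovetail hook session-start" }] }
    ],
    "UserPromptSubmit": [
      { "hooks": [{ "type": "command", "command": "dovetail hook prompt-submit" }] }
    ],
    "PreToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "dovetail hook pre-tool-use" }]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit",
        "hooks": [{ "type": "command", "command": "dovetail hook post-tool-use" }]
      }
    ]
  }
}
```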
36:49
There is a current issue with the hooks.
36:51
That's what I'm working on right now.
36:54
It's not very stable at the moment, because I think the agents weren't installed for some reason.
37:01
But then the agent gets everything done for you and then creates a new branch on GitHub related to the issue, so the two are synced together, and moves to that branch.
37:10
So you're not going to change anything that's working.
37:14
And then once it's done working, it triggers the post-tool-use hook, which basically asks: okay, we just did this, is this issue done?
37:21
If not, then we keep working on this.
37:23
If it's done, I'm going to create a pull request, document everything in Linear, and update or create new issues if needed.
37:31
And then, and only then, I'm done, right?
37:35
And that means that I move forward, I keep building stuff, and Dovetail sort of clears the pathway in front of me, gets me the context I need, and then it also cleans up after me.
37:47
And because of the branch protections on GitHub, and the pull requests and the heavy documentation and the automatic deployment, everything's pretty automatic.
37:57
And there is literally zero chance that Claude Code, this way, can actually fuck stuff up for real during development.
38:08
And the interesting thing is, because this way the whole context is now externalized inside the commit messages on GitHub and the Linear documentation, the agent always has really good context even if it doesn't have that in its memory.
38:20
So I can just say, hey, tell me what to do.
38:24
I want to build this Alfred OS Cloud, and tell me what are the things we should do.
38:29
And then it says, hey, so we have 5 milestones.
38:32
Milestone 1 requires 6 steps.
38:34
And I actually said to Claude, fine, in that case, launch 6 agents for all of the milestone steps, and make them write whatever needed to be built, independently.
38:45
And then launch another agent that orchestrates it and tests everything.
38:49
And then because the hooks deterministically enforce what to do and how to go through that, it always works the same way.
38:55
So yeah, that's what I'm working on right now with Dovetail, and I'm pretty sure that I can release it soon.
39:01
Again, it's not stable, it's not fully working, but I'm going to release it anyway.
39:08
Yeah, so where are we?
39:10
Okay, so that was Dovetail.
39:12
And so you go in here, and then you can actually look at the issues and everything that's in here.
39:22
So everything that I'm building is documented in here.
39:25
This documentation is done through an earlier version of Dovetail.
39:29
So the whole Linear is fully automatically built.
39:32
I never open Linear.
39:34
Okay, so let's go back.
39:36
So the next thing is Joe.
39:38
Now Joe is interesting because it's a voice agent that I'm building for a construction company, and we have a couple of requirements that we need to adhere to.
39:48
One, Joe needs to have— there you go.
39:52
So Joe needs to have a less-than-10-cents-per-minute cost, it also has to have less than 500 milliseconds of latency, and we have about 30 different tasks that the client said they want covered.
40:09
We need to have at least 60% coverage and we need to have at least 80% success rate, right?
40:17
So again, the problem is that because agents are doing stuff on their own, achieving this is actually not very easy.
40:26
So at first, there was also an evolution to this, right?
40:32
So at first, what I was doing is, let's go here.
40:36
So at first I had an n8n chat agent, before we said we would want a voice agent, and I moved the n8n chat agent over to a custom-built app.
40:52
So I actually built a voice agent using the OpenAI Realtime API, which was expensive, but it did the job really well.
41:00
But there was one big problem that we couldn't really do anything about, which is Joe has to be able to do stuff but also has to be able to interrupt me.
41:13
So this is actually an interesting problem with voice agents, right?
41:17
Because it's usually one-directional.
41:17
I call the voice agent, I say, hey, I want you to do this and that.
41:21
Okay, I'm on it, wait a second.
41:23
And then it keeps doing it, or it gives me a call and we start having a conversation, but it's all happening synchronously.
41:31
And what I want to do is, let's say I'm mid-call, right?
41:33
And what I can do is I can say, okay, hey Joe, I want you to do this and that.
41:38
Let me know when you're done.
41:40
Bam, hang up.
41:41
And then when Joe finishes, that could start another trigger that says Joe gives me a call and says, hey David, I'm done.
41:48
But what happens if I want that to happen mid-call, right?
41:51
What happens if what I want to do is say, hey Joe, I want you to do this and that, and then it says, okay, I'm doing that in the background.
41:57
Is there anything else you want to discuss?
42:01
And while we're having a chat, if Joe's background task is completed, then whenever I stop talking, Joe's like, okay, thanks for that.
42:08
By the way, I just got the message.
42:10
Here is the result.
42:11
Do you want to hear the result of the previous query?
42:16
That kind of interaction makes the whole thing really, really realistic, and it's very, very hard to make it work.
42:22
So I started working on a custom-built app, but then the whole logic was a bit challenging.
42:31
And then I moved the whole thing to the ElevenLabs conversation agent, which does everything perfectly except for this bit.
42:41
The ElevenLabs agent cannot deal with the interruption stuff.
42:43
It cannot have a proactive mode for any voice. And then I moved over to VAPI, where, through the VAPI API, it is possible, but you need some extra plumbing, right?
42:58
But you need plumbing, because basically what we had in here was Joe. Let me just explain.
43:06
So Joe was a VAPI agent, and then what I built first was: Joe had an n8n workflow, right?
43:17
And that n8n workflow was called through an MCP call.
43:20
I also experimented with an MCP call versus a direct function call.
43:24
It doesn't really matter, whatever works.
43:27
But what happens is that I'm having the conversation with Joe.
43:31
So user says something, and then Joe says something back to the user.
43:37
But it also routes the task back to this n8n workflow.
43:42
And then whenever the n8n workflow is done, that workflow calls the Joe agent, and Joe proactively starts talking to the user, like mid-sentence, right?
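The proactive mid-call plumbing can be sketched like this. It's a minimal sketch, assuming a made-up `Call` class: in the real setup the background work is an n8n workflow and the injection goes back through the VAPI API, but the shape is the same, fire the task off, keep talking, and drain finished results the next time the user pauses.

```python
import queue
import threading
import time

class Call:
    """Toy stand-in for a live voice call with a background-task channel."""

    def __init__(self):
        self.results = queue.Queue()   # completed background tasks land here
        self.transcript = []

    def user_says(self, text):
        self.transcript.append(("user", text))
        # Fire the task off in the background and answer immediately,
        # instead of blocking the conversation until it finishes.
        threading.Thread(target=self._run_task, args=(text,)).start()
        self.agent_says("I'm on it, doing that in the background. Anything else?")

    def _run_task(self, task):
        time.sleep(0.05)               # stand-in for the real workflow running
        self.results.put(f"done: {task}")

    def agent_says(self, text):
        self.transcript.append(("agent", text))

    def on_user_pause(self):
        # Whenever the user stops talking, drain any finished tasks and
        # proactively report them mid-call.
        while not self.results.empty():
            self.agent_says(f"By the way, I just got the result. {self.results.get()}")
```

The design point is that the conversation loop and the task never block each other; the queue is the only meeting point.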
43:53
Now the challenge with this was, as always, n8n. So I was looking at different ways of making it work, and even inside n8n, you know, it works really well, but it took a lot of debugging.
44:10
But the really, really big problem was Joe's task list here, right: we needed to have 60% coverage and an 80% success rate.
44:17
So what I would do is take task number 1, and then actually take a look at all the tools Joe can have, which is just, you know, business data via MCP connections.
44:34
There was an ERP system the company uses.
44:37
And basically a task needs to have a set of instructions, in sequence, to complete.
44:45
Like, for example, step 1, tool call 1, step 2, tool call 5, and so on.
44:49
And I want that to happen exactly like that.
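Pinning a task to an exact tool-call sequence can be sketched as data rather than prose, so the order cannot drift. This is a minimal sketch with hypothetical tool names; the real tools are the ERP MCP actions.

```python
# Each tool is a small function; the task is just an ordered list of tool
# names, so execution is deterministic every time.
TOOLS = {
    "tool_1": lambda ctx: ctx + ["fetched project data"],
    "tool_5": lambda ctx: ctx + ["notified stakeholders"],
}

TASKS = {
    # step 1 -> tool call 1, step 2 -> tool call 5, exactly as described
    "task_1": ["tool_1", "tool_5"],
}

def run_task(task_id):
    ctx = []
    for tool_name in TASKS[task_id]:   # fixed order, no agent improvisation
        ctx = TOOLS[tool_name](ctx)
    return ctx
```

The point of the sketch: the sequence lives in a table, not in a prompt, so the model cannot "sometimes do it that way, sometimes not."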
44:52
And that's when we had the same challenge that I was dealing with in Alfred, which is: if I just give it as a prompt, then sometimes Joe will do it that way.
45:02
Sometimes Joe will not do it that way.
45:04
Which is a problem, right?
45:05
Especially because what if this tool call 1, tool call 5, what if that's not actually how it works?
45:12
The first idea I had was that we had a SQL database connection, so every tool call would be an execute-SQL tool call.
45:20
And the idea was that I would send the prompt, and then Joe the agent would translate the prompt to a SQL query, run the query, and return the answer.
45:36
Which seemed nice, but what if I want the financial projection based on the last 2 quarters of data and I want to know how much bonus I can get out of the company for Christmas?
45:45
That's not a simple solution.
45:47
There are a lot of very complicated SQL queries that need to run, and I was just kind of waiting for stuff to happen for minutes, and that wasn't really useful.
45:57
That's when the client said that actually we should have less than 2 minutes to completion in order to make a live conversation worth it.
46:08
So then I started translating these tasks into tool calls.
46:10
But then the question is, how do I do that?
46:15
Do I create an n8n workflow per task?
46:17
That's not very scalable, right?
46:19
Every new task would need a new workflow.
46:22
So that's not what we want.
46:25
If I just use the prompt, it's unreliable.
46:29
So what can we do?
46:32
And then it turns out that we needed a new MCP connection for the ERP system.
46:39
And that was really interesting.
46:40
That was not a simple problem, because we had the old API for the ERP system, which was not built for agentic execution.
46:51
And also it was not built for human work, right?
46:53
It was a technical API, so it was following a technical logic.
46:58
I needed to build a new API for operations.
47:02
So instead of having an API for different database operations, I would have an API for, I don't know, creating a status update or notifying stakeholders in a project, or something like that, right?
47:15
Or creating a financial projection.
47:17
And then I ended up creating, I think, 48 different API endpoints, and turned every endpoint into an MCP tool.
47:28
And that means that our Joe MCP connection can now understand those 48 actions with schema.
47:34
So it doesn't have to always figure out what the query is.
47:37
It just calls that API endpoint.
47:40
But in order for that to work, I would need to basically translate task number 1 to a sequence of tool calls from that 48-tool bucket.
47:52
Like, how do you express task number 1 as a sequence of these 48 tool calls?
47:57
And once we had that, the system now works; I just ran the evaluations yesterday, and we're exactly there.
48:07
So out of the 30 tasks, we have 18-task coverage.
48:12
If we add 3 extra MCP servers, stuff like Gmail, a weather API, and that sort of thing, we actually go up to 27-task coverage.
48:25
So 60% goes up to what, 90%?
48:28
27 out of 30.
48:33
Anyway, so we're almost there.
48:35
And then we also had a median 80% success rate.
48:39
That's the success rate in the evals.
48:41
I think it's actually higher, but there were some logging issues, right?
48:45
Like, not all the MCP tool calls were actually logged in the evals.
48:49
And the way it works: Joe has a VAPI custom system prompt.
48:57
And there are two more interesting aspects of it.
49:03
One is that it runs on Groq infrastructure, Groq inference.
49:06
It runs Llama Maverick, 17 billion parameters.
49:14
And this gives us exactly a $0.10-per-minute cost and 475 milliseconds of latency.
49:21
So if I'm looking at these parameters, This works, this works, this works, this works.
49:26
This also kind of works.
49:28
I think I'm not logging execution time yet.
49:31
So that's where I'm at right now.
49:33
So yeah, there is one more thing that makes things faster, and this is this part: the memory layer.
49:44
So looking at memory: I said user prompt, Joe calls the Joe MCP, done.
49:51
What if, after that, I save what happened to memory: both what the user asked, whether that was good, and the actual output?
50:01
And then I would change the process as well.
50:05
So I would do this, and then also get memory, right?
50:09
So instead of trying to do everything right away, I would want Joe to just get stuff from memory.
50:15
And for that, I'm using Mem0, which is pretty cool.
50:17
And they just got a bunch of funding, so I'm pretty sure they're going to be raising prices soon.
50:23
There is a startup package.
50:25
And yeah, basically, you go to Smithery, and inside Smithery there are a bunch of Mem0 MCP servers.
50:33
And there's also an official one, but honestly, it wasn't working.
50:35
At least it wasn't working for me, and I don't know what the issue was, but all the Mem0 MCP servers had the same issue.
50:43
So I built one for myself and you can find it as David AI.
50:46
That's what I built, and it basically only has two tools.
50:50
It adds a memory and it searches memories, and that's it.
50:53
So it does exactly what we want it to do.
50:55
I say something, it searches my memories to see if there is a relevant bit, and if not, it calls the Joe MCP and then says what happened.
51:04
If there is an actual memory, it just immediately answers without moving forward.
51:09
That's actually a sort of memory-caching logic that makes things work a lot faster.
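The memory-caching flow can be sketched like this: search memory first, only call the slow Joe MCP on a miss, then save the result. The two-tool surface (add a memory, search memories) matches what's described above; the function bodies are stand-ins, since the real store is Mem0 and the real search is semantic, not exact-match.

```python
MEMORY = []   # stand-in for the Mem0 store

def add_memory(query, answer):
    MEMORY.append({"query": query, "answer": answer})

def search_memories(query):
    # Real search would be semantic similarity; exact match keeps the
    # sketch self-contained.
    for m in MEMORY:
        if m["query"] == query:
            return m["answer"]
    return None

def call_joe_mcp(query):
    # Stand-in for the slow tool call into the ERP via MCP.
    return f"computed answer for: {query}"

def ask_joe(query):
    cached = search_memories(query)
    if cached is not None:
        return cached                 # hit: answer immediately
    answer = call_joe_mcp(query)      # miss: do the real work
    add_memory(query, answer)         # remember for next time
    return answer
```

Asking the same thing twice hits the cache the second time, which is exactly where the latency win comes from.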
51:14
So that's the Joe project, and I'm hoping to hand it over soon, because we are at the last stages of this work.
51:23
And then there is the last project, which is Jig, and that's the most interesting one.
51:31
Right, because Jig is nothing else than a cognitive assembly line.
51:37
Now this is interesting because there is again the same problem of n8n versus agents, right?
51:43
I've been talking about it a lot, because with n8n, you have deterministic plumbing.
51:51
You have a deterministic workflow and manual plumbing, which means that if I want to create a 5-step process, I actually need to create a 50-node workflow, because of JSON expressions, aggregate, merge, custom JavaScript, whatever.
52:07
And then if anything breaks, the whole thing breaks.
52:11
And n8n also cannot just resume from that step unless you're saving executions properly.
52:16
And it's just very complicated. But if it works, right, if it works, it's great.
52:22
If it doesn't, you're debugging at 2 AM and you don't know why.
52:26
And that's, that's the big problem.
52:28
And also it gets very, very complicated.
52:30
Like, like the graphs are so complicated.
52:32
And then you look at agents, which is, you know, ChatGPT agent, Manus, whatever.
52:41
And they are probabilistic, so they drift off and hallucinate, but they can be resourceful too.
52:48
Right?
52:48
So let's say your JSON is broken and n8n breaks; Manus or ChatGPT agent or Claude Code would just figure it out.
52:54
So hey, it looks like it's broken.
52:55
It looks like there is an extra comma there that I need to remove, and that's it.
52:59
And you know, I don't want to be debugging at 2 AM because of an extra comma.
53:04
I want the agent to automatically figure it out.
53:07
So what happens here is, you have a bunch of nodes.
53:11
This is how n8n works, right?
53:13
You have a bunch of nodes, and then you also have a trigger, and that's it.
53:18
That's how n8n works.
53:20
And then you have an output at the end.
53:24
Okay, so the agent works differently, right?
53:27
The agent just sits in the middle and just does things as a black box.
53:32
And you have zero control over what's happening.
53:35
Or you can build an n8n workflow in a way that it does stuff similarly, like this.
53:40
It triggers an agent, then it triggers another agent, then it triggers another agent, and then it generates the output.
53:48
That's actually not a bad idea.
53:50
But then the problem is basically the payload.
53:54
So let's just add a bag, right?
53:56
Here's our payload.
53:58
So all the information that we send over here, these are all completely custom designed, right?
54:04
Inside n8n, you need to tell it what you are sending off to the next agent.
54:09
What's the context that it should be getting, right?
54:13
So you need to be manually designing every single bit.
54:16
However, an agent just gets an idea and then figures everything out on its own.
54:21
But if you want to build AI agents inside n8n, you still have to do the same plumbing thing.
54:26
So that means that even though it might work, you still need to, you know, deal with JSONs and commas and whatnot.
54:35
And my idea was that what if we had this logic, right?
54:40
What if we had this logic, but whatever the payload is, it would be managed differently, with purpose, with intent.
54:50
Like, what would that look like?
54:52
What would it look like if... oops.
54:56
What would it look like if I wanted to intelligently pass on context, so the agent can actually understand what the previous agent did, but in a very compressed way, so it doesn't get bloated?
55:11
And that's when I went to Claude Code, because I figured, okay, what happens if I start Claude Code, give the agent MCP tool number 1, and also give it a simple prompt for a single task, right?
55:27
And then whatever the output is, I don't give a shit what the output is, just send everything over to the other agent, and it figures out what the ideal output should be.
55:39
And then the next agent would do MCP number 2 with another prompt, single task number 2.
55:46
And that means we are basically launching new agents for every single task, just like what we did with n8n, but everything is just shoved over from one to the other.
56:00
And here is the problem with this, right?
56:02
The problem with this is that basically it's like you launch Claude Code, you turn on some connections, you let it run, then you turn off those connections, give another prompt with new connections, and then you run it again and again and again.
56:18
And because it's the same session, it's actually very hard: you either have to manually turn everything on and off all the time, or, if you don't want to do that, you need to figure out how to restrict tools per agent run.
56:35
To solve that, you either have to solve the n8n problem.
56:40
So instead of sending everything from one agent to another, or using the same agent again and again, you need to manually figure out what to send over, or you need a system that manages context between these agents with very effective, lossless compression, while also orchestrating the deterministic part: do this task, this is the tool you use, this is the input I'm expecting, this is the context, this is the output I'm expecting, and that's it.
57:10
And then, as those things run, the agent itself basically does everything, which means the macro bit is completely deterministic and the micro bit is completely agentic.
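The deterministic-macro, agentic-micro split can be sketched as a small loop. All names here are illustrative: the outer loop fixes the task order, the allowed tool, and an output contract, while each step's actor, which could be an LLM agent or a human, works freely inside that contract.

```python
def run_pipeline(steps, actor):
    """Macro layer: deterministic order, fixed tool per step, enforced contract."""
    context = {}
    for step in steps:                      # order never varies
        result = actor(
            task=step["task"],
            tool=step["tool"],              # only this tool is allowed for the step
            context=context,                # compressed hand-off, not a full dump
        )
        if step["expects"] not in result:   # enforce the output contract
            raise RuntimeError(f"step {step['task']} broke its contract")
        context[step["task"]] = result      # micro output feeds the next step
    return context

def toy_agent(task, tool, context):
    # Micro layer: stand-in for an agent (or a human) doing the actual work.
    return {tool: f"did {task}", "summary": f"{task} ok"}

STEPS = [
    {"task": "draft", "tool": "editor", "expects": "summary"},
    {"task": "review", "tool": "linter", "expects": "summary"},
]
```

Because `actor` is just a callable, swapping an agent for a human is a one-line change, which is the intelligence-agnostic idea discussed below.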
57:24
And that hybrid way of managing context was really important for us, because the idea was: what happens if this is a human and not an agent?
57:31
What happens if I want to replace one with another?
57:34
Then how do I feed the right context to the human without overwhelming them?
57:38
It has to be— and this was one of the key ideas of Jig— that it has to be intelligence agnostic.
57:45
It shouldn't matter who the human is or who the actor is.
57:47
Like, is this a human?
57:49
Is this an agent?
57:50
Is this a manager?
57:51
Is this an engineer?
57:52
Like, everybody should be able to do it.
57:55
And the way we did it is we created the Jig DSL. But let me start with another thing.
58:03
So we created a framework, the IKO framework, which really just builds on stuff like Jobs to Be Done and other frameworks.
58:12
The idea here is that in an AI-native world, every task you do inside a business can be described by these 4 things.
58:21
Let me explain.
58:23
First big statement around JIG is that every operation is a CRUD operation.
58:28
Everything you do in your business is a CRUD operation.
58:32
You are creating, reading, updating, or deleting some value in some database, even if that database exists in your head.
58:38
So the question is, how do we get access to all the databases that we need, and how do we actually describe these CRUD operations?
58:46
And that's when we came up with this IKO framework, which means that in a truly AI-native business, every CRUD operation has an intent, some relevant context, a specific action you do, and a specific output, right?
59:02
If you remember, the Jobs to Be Done framework says that you don't buy products, you hire them to get a job done, right?
59:10
Me wanting to get a job done is the intent.
59:13
Why I want to get the job done is also part of the intent.
59:16
And also, once the job is done, how do I know it's done?
59:21
That's my output.
59:21
And getting it done is the action.
59:23
And how to get it done and what you need to get it done is the context.
59:30
So ideally, every action inside the business can be described by this framework, which means that, you know, we have a specific action.
59:39
Let's say you have a task.
59:41
That can be described with this IKO framework.
59:44
What happens if you create a workflow which has 3 tasks in it?
59:49
The workflow itself can also be described with IKO framework.
59:52
And then what happens if you have entire business units that have, I don't know, 5 workflows in them?
59:59
Then again, that can be described.
01:00:00
So it's really like a fractal pattern.
01:00:02
And that's why we called it IKO fractal.
01:00:05
Now, in order for this to be translatable into agentic work, we really needed a very specific, very efficient compression of information that is understandable by humans.
01:00:20
And that is the Jig DSL.
01:00:23
It's a YAML-based language which really just describes intent, context, action, and output in a structured format.
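As a rough illustration, a station in that YAML shape might look like the following. The field names are my guess, not the real DSL; only the intent, context, action, output structure comes from the talk.

```yaml
# Hypothetical sketch of a Jig DSL station; field names are illustrative.
station: notify_stakeholders
intent:
  job: "Keep project stakeholders informed of status changes"
context:
  needs: [project_id, latest_status, stakeholder_list]
action:
  tool: erp.create_status_update      # e.g. one of the ~48 MCP tools
output:
  done_when: "Every stakeholder has received the new status"
```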
01:00:32
And once we created that, right: we had the intelligence-agnostic idea, which led us to the IKO fractal, which led us to the Jig DSL, which led us to the idea that ideally I should be able to create a database, right?
01:00:52
A Jig database, where I can just take a workflow description that's described in this YAML, and every part of this YAML is translated into some database entry.
01:01:05
And once I have that done, I gain a really interesting thing, because now I can orchestrate work through a database instead of through code.
01:01:16
n8n orchestrates through code, Make orchestrates through code, Zapier orchestrates through code.
01:01:21
It's not very durable.
01:01:23
So if I orchestrate everything through a database, meaning: execute the task that has ID ABC in this database, and here is all its configuration inside this database, then it becomes very durable, if we actually log every step of the way.
01:01:41
So we would have a run log in the database that says: okay, step number 3, this was the input, this is what happened, this was the output, then we got an error.
01:01:52
So instead of redoing everything, I immediately have everything saved in the database that says it failed at step number 3.
01:01:57
We just want to figure out how to make it work.
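The run-log idea can be sketched in a few lines. This is a minimal sketch with illustrative names, not Jig's real schema: each step's outcome is recorded, and a re-run skips every step that already succeeded instead of redoing everything.

```python
RUN_LOG = []   # stand-in for the run-log table in the Jig database

def run_workflow(steps):
    """Run steps in order, skipping any step that already succeeded."""
    done = {entry["step"] for entry in RUN_LOG if entry["ok"]}
    for i, step in enumerate(steps):
        if i in done:
            continue                  # already succeeded in a prior run
        try:
            output = step()
            RUN_LOG.append({"step": i, "ok": True, "output": output})
        except Exception as exc:
            RUN_LOG.append({"step": i, "ok": False, "error": str(exc)})
            raise                     # fail fast; progress is already saved

# Two toy steps that record when they actually execute.
calls = []

def step_a():
    calls.append("a")
    return "a-output"

def step_b():
    calls.append("b")
    return "b-output"
```

Running the workflow twice executes each step only once, because the second run reads the log and resumes past the completed work.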
01:02:02
So inside this Jig database, right, I would have tasks somehow explained.
01:02:10
I would have workflows explained.
01:02:11
Oopsie, let me just... okay, so we would have tasks, which we call stations, and workflows.
01:02:18
And let me explain this.
01:02:19
So we call them stations because when Henry Ford created the assembly line, what happened is that before Ford, before the assembly line, craftsmen owned production end to end, right?
01:02:38
And then after the assembly line, factory workers owned a specific task.
01:02:43
Now, why did that happen?
01:02:45
It happened because Henry Ford did research and a study on his own factory and realized that because a craftsman owned a longer process, they had to walk around the factory.
01:02:57
So if there was somebody working 10 hours a day, they actually were pedestrians 40% of the time.
01:03:05
They're not factory workers, they're pedestrians.
01:03:07
Right?
01:03:08
You walk around for 4 hours and do actual work for 6 hours.
01:03:12
So instead of humans going to the work, he asked, how do I eliminate people being pedestrians?
01:03:16
And the answer was, let's go the other way.
01:03:19
Let's not have people go to the work.
01:03:22
Let's have work to come to the people.
01:03:25
And that was what the assembly line became.
01:03:28
Now, how does that translate?
01:03:30
To knowledge work, right?
01:03:32
Inside knowledge work, you have problems, right?
01:03:35
And that means that you own the problem end to end.
01:03:38
So as a knowledge worker, you have a series of problems that you need to keep solving all the time, and you own it.
01:03:44
You have full ownership of the problem.
01:03:45
You need to know when the problem arises.
01:03:48
You need to know when a problem needs solving and when it needs escalating; you need to follow through; you own the problems. And then the big issue is that problems have lots of smaller problems, and those require different context.
01:04:07
So this is the key learning here.
01:04:14
We had pedestrians instead of factory workers, right?
01:04:17
We know that they weren't as productive because they were being pedestrians.
01:04:21
They were walking around different parts of the factory.
01:04:25
What's the knowledge-work equivalent of that?
01:04:28
And the knowledge-work equivalent of that is context, right, because different problems require different context.
01:04:36
So if the solution for pedestrians was to move the work, not the people, then the idea here is that we would move context, not people.
01:04:50
So the cognitive version of an assembly line is that the context always comes to you, while the task itself never changes.
01:04:58
And that's a hard problem, because knowledge work is messy and random.
01:05:05
So we really need to figure out how to manage context, and that's the most interesting part of Jig: how we manage context.
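The "move context, not people" idea can be pictured in code: a station's task never changes, and only the context payload routed to it varies. All names here are my own illustration, not Jig's actual API:

```python
# Sketch: a station is a fixed task; only the context routed to it varies.
# Function and field names are illustrative, not Jig's actual API.
def summarize_station(context: dict) -> str:
    # The task ("summarize what you were handed") never changes.
    return f"summary of {context['topic']} using {len(context['docs'])} docs"

# The "conveyor belt": different problems deliver different context
# payloads to the same fixed station.
belt = [
    {"topic": "billing bug", "docs": ["ticket-12", "payment-log"]},
    {"topic": "onboarding",  "docs": ["handbook-page"]},
]
reports = [summarize_station(ctx) for ctx in belt]
```

The worker (the station) stays put; the conveyor delivers whatever context the next problem requires.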
01:05:12
I'm not going to go into the details.
01:05:15
There's a whole context layer and a context ledger that connect everything, but the point is that this Jig database creates the foundation for all of it.
01:05:26
And if I move over here, this is a specific Jig run.
01:05:31
As you can see, there was an actual execution by an agent.
01:05:35
It cost 2.86 cents.
01:05:36
We had a specific type.
01:05:37
Here's the number of tokens that were generated.
01:05:40
And what you can see here is the full execution log of what the agent did.
01:05:45
And we can see what the intent was.
01:05:47
It was: "search my memories about David and report findings."
01:05:50
Here are the tools that were used and how many times; you can also see what connections were provided and what the actual prompt was that we gave.
01:06:01
And then here's the context block, the context ledger.
01:06:04
Basically, after every execution, Jig analyzes itself and asks: did we actually achieve the intent of this work, and how confident am I that my answer is right? Then it also writes a human-readable report for later.
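The per-run self-evaluation could be a simple structured record like the one below. The field names are my guesses at the on-screen data (intent, confidence, cost), not Jig's actual schema:

```python
from dataclasses import dataclass, asdict

# Sketch of the self-evaluation record Jig writes after each execution.
# Field names are guesses at the on-screen data, not Jig's actual schema.
@dataclass
class RunEvaluation:
    intent: str            # what the run was asked to do
    intent_achieved: bool  # did the output satisfy the intent?
    confidence: float      # 0.0 to 1.0, self-assessed by the agent
    report: str            # human-readable summary for later review
    cost_cents: float      # execution cost shown in the run view

run = RunEvaluation(
    intent="search my memories about David and report findings",
    intent_achieved=True,
    confidence=0.9,
    report="Found several memories mentioning David; summarized key facts.",
    cost_cents=2.86,
)
record = asdict(run)  # ready to store in the Jig database / context ledger
```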
01:06:18
If I say "let's create a file," it saves it as an artifact as well.
01:06:21
And then I can take a look at the whole workflow history as well.
01:06:26
And then there's a bunch of evaluation stuff, because the idea is that if you have this, you can use Jig to run evaluations.
01:06:34
You can use Jig to do a bunch of stuff.
01:06:38
So yeah, the interesting thing is that now we have a Jig MCP server that creates these things, and the execution part follows this, right?
01:06:50
So the Jig logic transports the context, and every agent is a Claude Code agent.
01:06:54
Or rather, a Claude agent built on the Claude Agent SDK.
01:06:58
It spins up a Claude agent, gives it the tools it needs, everything, and starts with the workflow.
01:07:06
Then it generates the output as specified in the Jig, and then calls the next agent.
01:07:12
The next agent forks the previous Claude agent, so it has all its memories, but now it has a different task: a different intent, a different prompt, different tools, different access.
01:07:23
It's a different one.
01:07:24
It's kind of the same, but different.
01:07:26
And it keeps forking Claude agents until it reaches the final output, and then it produces the final output.
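The fork-per-station semantics described here can be mimicked with plain data: each new agent copies the previous agent's memory but gets a fresh task and tool set. This is a simulation of the behavior only, not actual Claude Agent SDK code:

```python
import copy

# Simulation of the fork-per-station semantics described above.
# Plain dicts stand in for agents; this is not the Claude Agent SDK.
def fork_agent(parent: dict, task: str, tools: list) -> dict:
    return {
        "memory": copy.deepcopy(parent["memory"]),  # inherits all memories
        "task": task,    # but gets a new intent/prompt
        "tools": tools,  # and a different tool set / access
    }

root = {"memory": ["user prefers short reports"],
        "task": "extract", "tools": ["read_db"]}
step2 = fork_agent(root, task="summarize", tools=["write_file"])
step2["memory"].append("extraction found 3 rows")
# The fork is independent: new memories in the child don't leak back
# into the parent, so each station sees a consistent snapshot.
```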
01:07:35
Creating Jigs out of SOPs is a really powerful thing right now. I got a 5-minute Loom video from the client's operations manager on how they do reconciliation of transactions at an e-commerce company.
01:07:52
And I fed it into this Jig system.
01:07:55
I have an architect agent that can transform these raw transcripts into Jigs and then implement them.
01:08:05
And with a single one-shot attempt, we got 95%+ accuracy on completion.
01:08:08
And we're not creating new agents, we're just using existing ones.
01:08:11
And if you have a Jig MCP server, then you can actually run everything from Claude or ChatGPT or whatever.
01:08:19
So there's a lot of overlap between the different projects, because they're all working in the same problem space.
01:08:26
The latest interesting thing I built for Jig is that I can create and run a temporary station, which lets me spin up short-lived Claude agents as part of a workflow and save all the context they generate.
01:08:46
It's getting very complicated very quickly.
01:08:48
So that's the other project I'm working on, and I'm very happy about it.
01:08:52
This is a really cool project.
01:08:53
Yeah, so these are most of the projects I'm working on.
01:08:57
And then there's some other stuff: the boot camp is returning next week, and I also have a fun project, the No Code No Clue process, which translates n8n workflow JSON files into tutorials and radio plays.
01:09:16
I started building an actual workflow to generate the content.
01:09:21
And inside n8n, it was very, very complicated.
01:09:24
Ultimately I ended up using my Dovetail-powered Claude Code process and built the whole system in, I don't know, less than an hour.
01:09:33
So now what it does is this.
01:09:36
So basically it runs a simple dashboard, but it also works from the terminal.
01:09:40
I launch it with a simple command.
01:09:43
It's npm run dashboard.
01:09:45
And then there's what I have here. I know I call everything Alfred; I should really stop doing that.
01:09:50
So when I start running it, it goes through a 13-step process.
01:10:00
And if it encounters errors, it can pick up and continue from that point later.
01:10:08
And what happens is that it takes a look at the workflow and it understands the workflow.
01:10:12
It took a JSON file for Jackie, an AI assistant.
01:10:18
I got it from the n8n Template Marketplace.
01:10:20
And then it actually tries to understand what that workflow does.
01:10:23
It creates an analysis, and then another analysis: where does this actually solve a problem?
01:10:30
What's the context?
01:10:30
Where is this overkill?
01:10:33
So it generates some extra context, and then as a next step, it starts creating a script.
01:10:38
So it's like, okay, let's try to generate it. Hold on, yeah, there was an error there.
01:10:45
There you go.
01:10:46
So step number 3 is that it creates the script metadata.
01:10:50
There's a sitcom episode generator that has a guide, and I can walk you through that in a minute.
01:10:56
And then it generates the pack, and from that it generates a script.
01:10:59
And now we have the script here, which is a pretty long script for a sitcom.
01:11:04
And then I'm like, okay, next, turn that into JSON.
01:11:07
And once you have that JSON, go through each and every entry and generate the actual radio play.
01:11:16
So to turn the script into audio, we have voices assigned to persistent, permanent actors, and then random voices for the other actors.
01:11:24
And then it generates something like this, which is pretty cool.
01:11:28
It's a full radio play.
01:11:31
You can also check the one for the Notion Tripling workflow; it's fully automated.
01:11:38
ElevenLabs now has the V3 model available in the API.
01:11:41
So it uses FFmpeg to generate all the dialogue blobs.
01:11:45
And then, it generates one big audio file for the radio play and saves it.
01:11:50
And then it generates some metadata, some SEO stuff.
01:11:54
And once it has everything, it takes the actual workflow and writes an actual tutorial.
01:11:58
And the last step is that it posts it to Ghost as a blog article.
01:12:07
I'm still working on it, but I'll probably turn this into an automated, regular n8n tutorial article series, maybe one a day.
01:12:15
It will help me with SEO, and it will probably also give you guys some ideas on how to do stuff in n8n if you still want to go down that road.
01:12:22
So you can see: here's the source, what tools are used, setup time, difficulty, time saved per month, and then there's the tutorial.
01:12:31
So I'm still playing around with it, we'll see how it goes.
01:12:35
But that's kind of a fun side project that I'm doing.
01:12:37
So, okay, the raw recording is almost 90 minutes now, so I'm going to stop talking.
01:12:41
There's a lot of stuff I'm working on right now, as you can see, and I just wanted to start talking a bit more about it.
01:12:49
So I'm going to share the Linear roadmap links, and I'll start uploading separate YouTube videos per project, plus sending out different emails on a weekly basis.