Yeah, so... I'm just going to hit a stopwatch. I started off learning about AI because I heard about neural networks, and I'm like, oh, crap, they're putting brains in computers. Let me learn how my robot overlords work. Unfortunately, it's all just math. But there are layers on top of layers, and that's where we get into some of the really interesting things. So if you even think about a presentation like this: AI agents are built off of LLMs, like ChatGPT. They have access to back-end tools. And even LLMs are built off of deep neural networks, which are built off of neural networks, which are built off of machine learning that goes back to the 1950s. So as you continue to layer stuff on top of each other, it gets really interesting. So hi. I'm Gavin. I'm a principal security consultant as my normal day-to-day job. On the AI side of things, I have been with the AI Village for about eight or nine years. I also wrote two of the OWASP Top 10 for Large Language Model Applications. And I run a YouTube channel called Netsec Explained, where I take topics like this and explain them in easy-to-understand ways. Also, for this presentation, if you have questions, just say something, or raise your hand and say "question," and I'll get to a stopping point and call on you. Originally, this was supposed to be run as a workshop, but we don't have really good Wi-Fi or a large attendance. But I do want to give you the workflows, because there's a lot of talk about AI agents, but people don't really show you how to build them, how to use them, or how others are using them. So as far as the agenda: I'm going to go over a brief review of large language models, very quick, and then we're going to dive into agents. I'm going to show you a bunch of demos of AI agents that I actually use every day and how you can get the most out of them. And then I'm going to go over some AI agent architecture.
But first, I'm going to show you a demo of what I'm talking about and what this is going to look like. So as far as AI agents, one of my favorite examples is solving what's called a multi-hop problem. With most large language models, you just ask a question and it gives you an answer. But what if I have to ask a question whose answer then feeds into another question? The question I'm going to be showing off with this one is the kind you'd use in a red team exercise: somebody works at a company; who's the CEO, or who's the lead developer, at the company this person works at? So first you've got to figure out: who's person A? What company do they work at? And then who's the CEO of that company? So we have to go through a chain of three. In this case, this is exactly what this workflow is going to do. I like to use n8n. n8n is a no-code solution, very similar to Make.com or Zapier, if you've ever used those before. The reason I like n8n is because it's very visual, and it allows me to do some rapid prototyping. As far as the layout of this, this is a large language model that is going to have access to a couple of tools, and I can go ahead and ask this question. So Anusha is one of our volunteers here. She thankfully said that I'm allowed to use her as an example, as the guinea pig. The question is: who is the CEO of the company that Anusha interned at? So there are a couple of things it's got to figure out, right? I'm going to go ahead and click go, and I will minimize this chat so that we can see what's happening. So this is an AI agent that is going to make its first request out to ChatGPT. I'm using OpenAI. Oh, shoot. I've got to connect to the Wi-Fi. This will fail. So let me get the Wi-Fi back up and running. Should be connected.
And now let's try that again. There we go. So the first thing it's doing is making a search out to the internet. Nope, something went wrong. What is happening? Oh, this is the wrong version; that one needs an API key. Where is my Post-it tracker? There we go. This is the one that works. Paste. Go. So the first thing it's going to do is an internet search to figure out, okay, who's Anusha? It does a search on LinkedIn and grabs Anusha's profile information. The next thing it does is actually scrape Anusha's LinkedIn profile to figure out what company she has worked at. And the third thing, once it's identified the company she worked at, is to figure out who's the CEO of that company. Then it gives me a little report based on how I asked it to format it. So in this case, Anusha was an intern at NVIDIA as a security development intern, and the current CEO is Jensen Huang. So this is how we can use AI agents to solve multi-hop problems, and we can expand this even further. This is kind of the capstone of the presentation itself. So to get into it, let's start with previously in the world of large language models. The purpose here is to bring everybody into the conversation, so if you're not super familiar with large language models or ChatGPT, totally okay. A lot of it comes down to building good prompts, and you need to break things up into individual pieces. You want to figure out: what is your task, what are your constraints, and what does your desired output look like? Once you design essentially your perfect prompt, you can start to expand off of that. Some really good advanced prompting techniques are things like role prompting, few-shot prompting, and chain of thought.
Chain-of-thought reasoning is getting the LLM to think step by step so that it plans out what it's trying to do, and then it gives you a more accurate answer with fewer hallucinations. Few-shot prompting: you give it examples of what you want the output to look like, and then it'll give you the output in the desired format. Role prompting is saying, you are a software developer with ten years of Python experience, or you are a security engineer, or you are a travel agent; then when it responds, it'll respond as though it were that person or that role. For reflection, self-critique is amazing. You can say, hey, I want you to write something for me, and so it writes something. Now I want you to act as your harshest critic, provide feedback to yourself, and then do a review of your own work and give that back to me. This is called a reflection pattern, and it's something that's also very helpful in AI agent design. But this is just a review of LLMs; I just want to bring people in if you've never heard these terms before. I highly recommend learnprompting.org to learn a lot more about prompting techniques. So on to AI agents. AI agents generally have a large language model, right? Right now we're talking about large language model agents. There are other types of AI agents, but this is specific to large language models, or agentic programming. So you have the large language model essentially acting as a reasoning agent, and then you start to give it all sorts of extensible functionality. You can give it memory, long-term or short-term. For example, organizations have been creating HR chatbots. Instead of you having to read your entire employee manual, they put it into a large language model, and then you can ask it questions like, hey, how much vacation time do I have? It'll look at the policy information and return an answer based on that.
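To make those techniques concrete, here's a minimal Python sketch that assembles a prompt using role prompting, few-shot examples, and a chain-of-thought nudge. The function name and the example content are made up for illustration; this is just string assembly, not any particular library.

```python
def build_prompt(role, task, examples, constraints):
    """Assemble a prompt from the techniques above: role prompting,
    constraints, few-shot examples, and a chain-of-thought instruction."""
    parts = [f"You are {role}."]                           # role prompting
    parts.append(f"Task: {task}")
    parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    shots = "\n\n".join(f"Input: {i}\nOutput: {o}" for i, o in examples)
    parts.append("Examples:\n" + shots)                    # few-shot prompting
    parts.append("Think step by step before answering.")   # chain of thought
    return "\n\n".join(parts)

prompt = build_prompt(
    role="a security engineer with ten years of Python experience",
    task="Review this function for injection flaws.",
    examples=[("os.system(user_input)",
               "Command injection: use subprocess with a list argument.")],
    constraints=["Respond in Markdown", "Cite the relevant CWE"],
)
```

The resulting string is what you'd send as the user or system message to whichever model you're using.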
You can give it access to back-end functions, like a back-end calculator. In the example that I showed, I gave it access to a web search and LinkedIn scrapers. You can do a lot more with that, but I just wanted to get people started with the idea of AI agents and some of the stuff you can do there. You can also ask it to reflect: you can have one agent that generates a report and then another agent that acts as an editor and critic of that report. And this gets into a multi-agent pattern. A really good example of that multi-agent pattern is actually this. This is a software design studio. The way it works is they created multiple agents that have individual specialized roles: they have a CEO, a CTO, a project manager, multiple developers, a code tester, a code QA person. What they're able to do is take a complicated task, such as "design a piece of software that does this," and break it up into smaller pieces. So here we see the CEO talks to the CPO and the CTO, and they get a high-level design of what the software should look like, a high-level architecture, and then they work with the programmers to actually write individual pieces of the code. So: here's a high level of what the application is going to look like, here are the functions we're going to need, here's what the format is going to look like, let's make sure it's readable and modifiable. They're going to work with a QA developer, they're going to work with a designer, and so it goes through this entire workflow.
So at the top here you see more of a traditional waterfall model, where you go from planning to design to development and implementation, and at the bottom you see how the multiple agents interact with each other. So this is a workflow for AI agents. The reason you would want to use a workflow is for things like multi-hop questions or multi-hop actions. A really quick example: you can just ask a single question, right, what's the capital of Spain? Or you can say, hey, what is the capital of the hottest country this time of year? Now you've got to figure out what country that is, and then the next question is, what is the capital of that country? You can have a many-to-one, you can have a one-to-many; there are many different ways you can orchestrate these AI agents, but these graphs illustrate why you would use an agent and how you can take a problem and break it down into smaller pieces. Now, when it comes to developing AI agents, we build these things called workflows. Generally you have an input, you perform some sort of processing, and you have an output. There are some known software patterns, like the ETL pipeline, which stands for extract, transform, and load. Generally you'll see this in data management or data science, where you collect information from many different sources, transform it into a normalized piece of information, and then load it into a database. So say you have a bunch of tools or sensors: you collect information from all of them, transform it into a normalized format, and then load it into a database that can then be fed into a dashboard. This is very common in a lot of large-scale software design practices.
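The ETL idea above fits in a few lines of Python. The sensor records and the normalized schema here are invented for illustration; a real pipeline would extract from actual tool APIs and load into a real database.

```python
# Minimal ETL sketch: extract from heterogeneous sources,
# transform into one normalized schema, load into a store.
import sqlite3

def extract():
    # Two "sensors" reporting the same kind of event in different shapes.
    return [
        {"src": "fw",  "data": {"ts": 1700000000, "source_ip": "10.0.0.5"}},
        {"src": "ids", "data": {"time": "1700000060", "ip": "10.0.0.9"}},
    ]

def transform(record):
    d = record["data"]
    return {  # normalized schema every downstream consumer agrees on
        "timestamp": int(d.get("ts") or d.get("time")),
        "ip": d.get("source_ip") or d.get("ip"),
        "sensor": record["src"],
    }

def load(rows, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS events (timestamp INT, ip TEXT, sensor TEXT)")
    conn.executemany("INSERT INTO events VALUES (:timestamp, :ip, :sensor)", rows)

conn = sqlite3.connect(":memory:")
load([transform(r) for r in extract()], conn)
```

A dashboard would then just query the `events` table instead of knowing about each sensor's format.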
I find this type of workflow to actually be one of the more efficient approaches: instead of having an AI agent build AI agents that try to solve a problem, you already have a common workflow that you build out, and the agents just execute within that workflow. I'll show you what that looks like in a moment. So, common workflows. ETL pipelines: extract, transform, and load. Registry patterns: the information generated by one AI agent is stored in a registry, essentially a giant database, and that information can then be pulled or queried by another agent to continue the rest of the workflow. Prompt chaining is very similar. If you've had a conversation with ChatGPT or any other LLM, you ask it a question, it gives you an answer, you ask something based off of that answer; that's prompt chaining, where you're having multiple exchanges back and forth. It's another form of workflow. Intelligent document processing I just put up there, but it's very similar to OCR. Hey, here's a document, what is it actually saying? It'll look at the figures, it'll look at the text, it'll synthesize that information, and then it'll present what you're looking for, what kind of insights. And then, of course, human in the loop. I'm personally a fan of using AI and AI agents to augment your capabilities, not replace your capabilities. I know there are some organizations that have tried to replace developers, or replace writers, replace grammar QA, for example, and they've realized that doesn't actually work very well. What do you call a writer who uses spellcheck? That's a tool; they're still a writer, they're just using it to help them. What do you call a photographer who uses Photoshop? They're still a photographer; it's just a tool that helps them.
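Here's a minimal sketch of that registry pattern, with a plain dict standing in for the shared database and stub functions standing in for the agents. Everything here is illustrative; the point is only that downstream agents read from the registry rather than receiving output directly.

```python
# Registry pattern sketch: each agent writes its output to a shared
# store; downstream agents query the store instead of being handed
# the data directly by the previous agent.
registry = {}

def planner(query):
    registry["user_query"] = query
    registry["plan"] = ["identify the company", "find the CEO"]

def researcher():
    # Pulls the plan from the registry, not from the planner itself.
    registry["research"] = [f"result for: {step}" for step in registry["plan"]]

def reporter():
    # Any agent can recall any earlier artifact by key.
    return {k: registry[k] for k in ("user_query", "plan", "research")}

planner("Who is the CEO of the company Anusha interned at?")
researcher()
report = reporter()
```

In a real system the dict would be a database or vector store, so that agents running in separate processes, or at separate times, can all query it.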
So this is a more generalized tool for sure, but it's still a tool that people with specialized skill sets will get the most value out of. Other types of workflows: back when ChatGPT first came out, people were saying, hey, what if we connect a bunch of these LLMs together and get something interesting? They were just throwing spaghetti at the wall to see what would actually make sense. So they came up with frameworks; two of them would be AutoGPT and CrewAI, where you create these quote-unquote crews, which are collections of agents that are supposed to perform a specialized task. For example, that software development example would be a crew, and then you have the specialized agents in that crew. And then you can build a workflow outside of that as well, so you can scale it up and scale it out. Another example, and this is actually, in my opinion, a more efficient kind of workflow, is where you're creating graphs. You create workflows, and then you have agents decide which workflows you're going to go through. So you can have a workflow like one of my examples, a YouTube extractor, which extracts knowledge from YouTube videos. As soon as, say, DEF CON or even CypherCon videos come out, there are going to be a lot of hour-long presentation videos, and you're not going to have enough time to sit and watch all of them. So you can have a workflow that goes and analyzes these videos to see, hey, what are the key insights from this talk? Is this talk, on a scale of 1 to 10, actually worth my time for my specific interests?
And so it can go through, download the transcripts, parse through the video, extract insights and knowledge from it, and then present you a short paragraph summary: hey, this video is actually really impactful, and here's why; or, this video is kind of high-level and covers a lot of the same basic stuff that many people know. This has actually saved me a lot of time in my research, both watching YouTube videos and reading articles. I'll read a lot of articles off of arXiv, and I'll show you what that looks like as well. Okay, so with that, I'm going to show you some of the patterns and examples that I've actually gotten a lot of value out of. I'm going to switch to this microphone, because that's going to be easier, so I get a hand back. Test, test. Yeah, sorry, I switched microphones midway through. I needed my hand back. All right. So as far as workflows, this was the prime example of an OSINT tracker: it goes out, does a query onto the internet, pulls information down, and then performs some web scraping. As far as the demos, let me start with a basic research intern. This is the one that made AI agents make a lot more sense to me. We start with a planner agent, and if we open this up, we can look at the system prompt. Generally, it says: hey, you're a planner. You're step one in a workflow. You're supposed to develop a plan based on the user's query, and then other agents down the line are going to execute on that plan. The other agents in the line are going to be a researcher, who has access to internet tools, and a report writer, who is going to write the final report for what the user needs. So your plan is going to be passed across to all of these.
And so I need you to develop the plan, and I need you to develop some search queries that the researcher should look into. That information is going to be passed over to the researcher agent. And the researcher agent is told: hey, you're going to get the user query. This is part of that registry design pattern, right? I saved the user query somewhere all the agents can access, and now I can recall that user query from any of the other agents once it's submitted. Same thing with the plan. So: here is the user query, here is the plan, and here are the suggested search terms that you're going to research out on the internet. As far as the actual system prompt, it's a very simple prompt: you are a researcher agent, and you have access to an internet tool to perform research on Google. I like to use the Serper API; I find a lot of value out of it, but honestly, you could use Brave or any other search API to give you that access. That's this Serper query tool right here. Once that's performed, we go into the reporter agent. And the reporter, again, uses that registry design pattern. Pop this out: here's the user query, here's the original plan, here's the research that the researcher did, and here are some key points based on what the researcher identified. So: here's all the information you need; I want you to generate that report. It also has a separate prompt saying, hey, you're the report writer agent, I want you to write this report. And in here you can also specify the format. If you want to format the report in a very specific way, not just Markdown or XML: use active voice instead of passive voice, use first person, respond as though you were me, because I want to post this as a blog post. You can go into a lot of detail, and you can use things like few-shot prompting to give it examples of what that should look like. Later on, I will show you an example of that.
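The planner to researcher to reporter handoff can be sketched as plain prompt chaining. The `llm` and `search` functions below are stubs that echo their inputs; in the real workflow they'd be an OpenAI chat completion and a Serper or Brave search call.

```python
# Planner -> researcher -> reporter, sketched as prompt chaining.
def llm(system, user):
    # Stand-in for a chat-completion call: echoes its role and input.
    return f"[{system}] {user[:60]}"

def search(term):
    # Stand-in for an internet search tool.
    return [f"snippet about {term}"]

def run(user_query):
    plan = llm("You are the planner, step one of a workflow", user_query)
    findings = search(plan)
    research = llm("You are a researcher agent with an internet tool",
                   f"Query: {user_query}\nPlan: {plan}\nResults: {findings}")
    report = llm("You are the report writer agent; use active voice",
                 f"Query: {user_query}\nPlan: {plan}\nResearch: {research}")
    return report

report = run("What are the best prompting techniques?")
```

Each later call gets the user query plus all the earlier outputs in its prompt, which is the same thing the registry pattern accomplishes, just passed by hand instead of through a store.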
I also put together my own little style guide. Let me see if I can show some of that. It's to try and hide some of the GPT-isms, right? You'll see "in a world where blah, blah, blah," or "at the end of the day, blah, blah, blah": very common things you'll see in LLM output. So I'll say, hey, don't say that. Don't start with that. I want you to write like me. And I have a whole bunch; you can see how large this is. I have a whole bunch of things in my style guide, and I have examples of the output I want it to produce. So let's go ahead and do this workflow. What is something we should research and write a small report on for our example? Yeah, so the question is, how well does this adhere to my style guide? Very well, actually. There are times where it'll still give out its little isms, but I'll just go through and do some tweaking, and I also constantly update my style guide. I actually use Anthropic for a lot of my stuff. I have a bunch of projects on there, I have a virtual consultant, and I also have a blog writer that helps me with my writing; I use it to write draft zero of a blog post. It does really well. It's not 100% perfect, but it covers 90% of the use cases and issues you can run into. So what's something we should have it research online to generate a small report? Yeah, what's the task? I do want to stay away from political issues. I'm just going to make it easy on myself. So the prompt we have here is: what are the best prompting techniques, and give me an example of an optimized Claude 3.7 Sonnet prompt; the example topic is methods for creating stronger steel. What this is going to do is build out a plan, and we can look at it as it goes through. In this case, our plan is, number one, define the topic clearly: understand what methods for creating stronger steel entails, including the various processes.
Identify key aspects to cover. So right now, it's not actually focusing on the prompt; it's focusing on building the steel, which is unfortunate. But that happens, and so you just do some tweaking. First, it's going to look for the best methods for creating steel. It's going to identify key aspects to cover: focus on different methods, such as alloying, heat treatment, and forging. Number three, develop specific prompts: create prompts that encourage detailed responses. Okay, so it is probably going to do some prompting. And number four, optimize for Claude 3.7 Sonnet: use clear, concise language and specify the desired depth of information. The search term it's going to look for is "best prompting techniques for AI models." So it passed that over to the researcher agent. The researcher goes and... actually, I'll show you the query. Over here on the query side, we can see the query, "best prompting techniques for AI models." It pulled that from the planner agent. It does a search on Google, and I get titles. In this case: prompting techniques guide, prompt engineering guide, and it goes to promptingguide.ai, with a quick snippet. Another title is zero-shot prompting, also on Prompting Guide. Another title, meta-prompting, also on Prompting Guide. So it's pulling a lot of information from Prompting Guide, it seems. It's going to take that information and pass it over to the researcher agent. If I pop this out, in the middle you can see my template format, and on the right is what the agent sees: it sees the user query, it sees the plan laid out, steps one through five, and it sees suggested search terms. That's how it's going to perform its research. As far as the research output, the quick summary is: the research provides an overview of effective prompting techniques for AI models. It goes into detail of what the research is.
So zero-shot, few-shot prompting, chain-of-thought prompting, things that I covered earlier. And then it has key points highlighted. So that's the researcher agent; it passes over to the reporter agent. And the reporter agent, right, same thing: this is that registry pattern. It pulled the user query, the plan, the research, and the key points. With all of that information, it generated the final report, which is exactly what's in the response right here: to optimize prompting for Claude 3.7 Sonnet on the topic of methods for creating stronger steel, use the following techniques. Zero-shot prompting, where you directly ask the model without examples. Few-shot prompting, where you provide a few examples. And chain-of-thought reasoning, where you encourage step-by-step reasoning. So this can be as much or as little as we want it to be. One of the benefits I get out of this is asking it for ideas on pentest report writing or pentest findings. It'll go through and actually query OWASP, look at the OWASP testing guide, look at common terms. Sometimes I like to map that to CVSS scores or to CWEs. CWEs, there are like 2,000 of them, so instead of me having to figure out which of those 2,000 it is, it'll go through, do its own queries, find the CWEs that make the most sense, and respond with those. So this is great for research; you can quickly do some of this online, especially if you know what you want it to research. Any questions? Okay. This is one of the workflows. Each of these is an individual agent, but you chain the agents together and it builds out a workflow. Now, this workflow works great for something like internet search, but it may not be the most optimal workflow for other types of problems. So if we want to shrink it down a little bit and shift some stuff around, this is another workflow that has access to an API.
So in this case, I use arXiv. arXiv is a preprint server where a lot of the latest and greatest AI research is loaded. I have a typical tool agent, very similar, and then I give it access to this workflow. This is a simple workflow that is part of that tool chain, right? I can change things up and have the agent decide which workflow makes the most sense, and then follow through on it. In this case, I'm just going to do a quick API query. Let me pop this up. This is just a URL query to the arXiv API, and it has two parameters: the search query, the search term we're going to look for, and the maximum number of results I want it to return. So if I wanted to do a quick test, I can say: what's the latest research on coffee? Give me three examples. It's going to go through and do the exact same thing as the last workflow; this one's a little more generalized. It's going to query the arXiv API and grab the most recent papers on coffee. So here we have "inferring habitat quality and habitat selection using static site occupancy models." That's not about coffee; I don't know why it popped that one up. But the second one is "at your service: coffee bean recommendations from a robot assistant." And then "coffee roast intelligence" focuses on the classification of coffee roasting levels using a machine learning application. So if you do a lot of research, or you're trying to dig into different architectures, right? We work in technology. Technology is changing all the time, and it's hard to stay on top of it. We can use something like this that's a little more targeted. So we can either have it go out and do an individual Google search, then grab that information, scrape it, and present what was gleaned from the search and the articles.
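Assuming the public arXiv export API is the endpoint in play, that URL with its two parameters can be built like this. No network call here, just constructing the request URL:

```python
from urllib.parse import urlencode

def arxiv_query_url(term, max_results=3):
    """Build an arXiv API request: a base URL plus two parameters,
    the search term and the maximum number of results."""
    params = {"search_query": f"all:{term}", "max_results": max_results}
    return "http://export.arxiv.org/api/query?" + urlencode(params)

url = arxiv_query_url("coffee", 3)
```

Fetching that URL returns an Atom feed of matching papers, which the agent can then parse for titles and abstracts.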
Or we can have it target individual APIs. It's not too hard to extend this to, say, an Elasticsearch API or maybe a Splunk query API and give it access to that. So, taking this idea of building out a separate workflow and having the agent decide which workflow it should use, let's update that same researcher intern. Now, instead of the three individual agents, I have one agent that's making the decision about which tools to use and what to do for the workflow. I have two tools in this case. Sorry, these got messed up. We have the Serper query; this is the same query that the researcher agent had access to, and it goes out and does a Google search. And then we have an HTTP request tool. When it finds an article that makes the most sense off of that Google search, it'll actually go through and do a web scrape. Now, in this case, I had to build out a bit more of a workflow. The first step does a web page GET request, which returns all of the HTML, the JavaScript, the headers, the CSS, and all the crap we don't care about. So the second step is a code cleaner that uses Beautiful Soup to strip that out and only present the human-readable text, not all of that markup junk. For simplicity, I limited the number of tokens it'll pull, so it's not going to read a 100-page document. That saves us a lot on tokens, and on API cost, but it'll still get the main points. So what should we do for this one? What was that? How to get a job in IT. Let's see. What are the best tricks to getting a job in IT? We'll let this run for a bit. It's going to do that Serper query, pull a lot of information from Google, figure out what on Google aligns the most with what we asked, like a human would, and then pick a URL and scrape that information. Yep.
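The cleaning step can be approximated with just the standard library instead of Beautiful Soup. This is a rough stand-in for that code node, not the exact workflow code: strip tags and scripts, keep the readable text, and cap the word count as a crude token limit.

```python
# Page-cleaning sketch: drop <script>/<style> content and markup,
# keep human-readable text, cap the length to limit API token cost.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip = 0        # depth inside <script>/<style> blocks
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        if not self.skip and data.strip():
            self.chunks.append(data.strip())

def clean_page(html, max_tokens=1000):
    p = TextExtractor()
    p.feed(html)
    words = " ".join(p.chunks).split()
    return " ".join(words[:max_tokens])  # rough word-based token cap

page = ("<html><head><script>var x=1;</script></head>"
        "<body><h1>Jobs in IT</h1><p>Get certified.</p></body></html>")
text = clean_page(page)
```

A real token cap would count model tokens rather than words, but the effect is the same: the LLM only ever sees a bounded amount of cleaned text.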
Yeah, this is all n8n. If I wanted to add another tool, I can just hit this plus sign, and then I have a whole selection of tools I can add. If a tool doesn't exist, I can create my own, like I did here. So there's a lot of extensible functionality. The reason I'm using n8n is because it's visual and really easy to see what's going on. But if you wanted to use a framework, I really recommend LangGraph, where you take this same idea of nodes and connect them together: you'll have a memory node, you'll have a tool node, and then you can build out workflows like this. Yeah, so n8n, I'm actually running it in Docker on my laptop. For the back-end API, I'm using OpenAI's GPT-4o, 4o mini, actually, for this one, because I was going to run it as a workshop. And then for the search API, I'm using Serper because I like it better for Google search. For the LinkedIn APIs, I'm using Apify, and I'll show you where all those are as well. So, yeah, a lot of models are going to work very similarly, but some are better than others at specific tasks. There are some LLMs that are more geared towards using tools; there are others that are more geared towards reasoning, so they'd make better planning agents or better reflection agents. DeepSeek R1 is a good reasoning and reflection model, and OpenAI's o1 and o3 are good reasoning models, so those would be good for your planning agent. And if you wanted a tool agent, you would use a model specialized for that. You can use local models, too: if you wanted to use Llama or something like that, you can absolutely do that; nothing's stopping you. Let me show you this real quick. So if I wanted to add a model... yeah, that's kind of what happens with some of the online models: when they retrain them or do updates, things you really liked go away and new things come in, and you just have to work with that.
It's the downside to having somebody else run your code. No, you're not correct. Well, let's not rule that out; two things can be true at the same time. So, yeah, here's an example, right? OpenAI; Anthropic, which is Claude; Azure; Hugging Face; and then Ollama and LM Studio. Those are local frameworks that run local models. Local models would be Llama 2 or Llama 3, DeepSeek, Dolphin. Actually, Dolphin's really good as a specialized cybersecurity model. There's also WhiteRabbit, another specialized cybersecurity model. So there's all sorts of stuff. Long story short, you don't have to stick to one model. You don't even have to stick to n8n; I just use it because it's very visual and lets me rapidly prototype. You can use other no-code solutions: Zapier, Make.com, Flowise. You can use code solutions like LangGraph, which I highly recommend. So which one? The no-code solution or the code solution? Code solution: LangGraph. That's Lang, like L-A-N-G, plus graph. It's by LangChain, if you've heard of them. Flowise is a no-code version of n8n specifically for large language models, built on LangChain and LangGraph. So that gives you an idea of the flexibility. Any other questions? Comments? Concerns? Yeah. Like OpenRouter or something else? No, I haven't used that. Honestly, in my opinion, that doesn't matter. It's like asking, what's the best car? It's a car. Some have features; if you care, go for those features. If you don't, it doesn't matter if you get a Toyota or a Honda, a Civic or an Accord. It's a car. My personal opinion. I think people focus too much on benchmarks. Yeah, so you can build an agent that'll help you decide which model to use in which case, based on costs and capabilities. You can absolutely do that; there's nothing stopping you. I personally think that that's, you know...
Am I saving 10 cents or am I saving 15 cents? So... overengineering. All right, let's take a look at this. So the results, right? What are the best tricks to getting a job in IT? The first thing it's going to do is send this information. So here's the system prompt; if I clicked in here, I could show you the system prompt. It says, hey, here's all the information. And then here's the user prompt: what are the best tricks to getting a job in IT? It performs this query. So it creates a query and sends it out to the Google API. Here are the results from the Google API. I think this is a formatting thing because it's too large. Yeah, so it's using Google search. Here's a snippet from an answer. It has URLs in here, right? So where are the URLs coming from? What's the title of the document? Things that you'll see on a regular Google search. It'll pick one of those; it decides which ones. Here are the snippets. And then it performs the actual query. We can't see the actual query because it's part of something else, but I can have it reference where it's pulling its information from. This also helps minimize hallucinations. So instead of the GPT going, "I don't know the answer, so I'm just going to make one up," it actually goes out and grabs its information from a source of truth. So here, based on the article that it picked, are the ways to get a job in IT. And now I can ask it follow-up questions. This is the purpose of adding in a buffer memory. A buffer memory is like a short-term memory. You can do that either with local memory, so it'll fill up my own RAM, or with a separate service like Supabase with a vector store; it's actually really good. And one of the examples that I have is actually a homework exercise where you have to configure Supabase, configure Google Drive, and configure some of the other pieces, so that you can upload a document to Google Drive.
It'll automatically parse through that document, load it into a vector store, and then you can ask questions about what's in that document. Yeah, so there's pros and cons. The question is: is it better to throw the whole document in, or is it better to throw it into a vector store? So if you have something like a 500-page document, that's a lot of tokens, a high cost for the model to ingest. It takes a lot of memory. You wouldn't be able to run it locally because of that. But you can do things like, hey, summarize this entire document, or pull out the specific key points from it. That's the benefit of, and the reason you would do, a full document. I think they now call it cache-augmented generation, or CAG, as opposed to RAG, retrieval-augmented generation. And now that context windows are getting so large, you can realistically do that. The other way is RAG, where you basically take the PDF and chunk it up, by paragraphs usually, one or two paragraphs at a time. And when you ask a query, it essentially does a Ctrl+F: it finds chunks that contain parts of that query, and then only references those chunks. Three, four, five, six chunks instead of the full 500-page PDF. With the 500-page PDF, because it's using so much of the context window, the model actually forgets some of the material partway through, and that can be an issue. With RAG, you don't get access to the whole document all at once. So there are pros and cons to both of them. Also, for this, because I had that short-term memory, I said, hey, cite your source, and it can give me the exact link. These are URL links to the resource it was referencing, the website that it scraped. So awesome benefits, great for research. This is very similar to how OpenAI's Deep Research actually works, and other systems like Perplexity.
So instead of having to pay Perplexity API costs, you can just build your own. Super easy. And you know it's useful, because a startup already exists out of it. So make your own agents. It's great. Ben, did you have a question? Sorry. Oh, fantastic. That is number three. I know I'm kind of cooking through some of these, so I want to take a pause. Are there any questions, or specific use cases where you have ideas of how you would want to apply AI agents? Yeah. Yeah, very similar, actually. So the question was: the example about uploading a document to Google Drive and then loading it into an LLM, is that the same as just taking the whole document and uploading it to ChatGPT? Very similar. There's RAG, where you take a document, chunk it up, and store it in what's called a vector database. So instead of a regular SQL database, it uses vectors; fancy name for numbers. And then the other approach is where you take the entire document, just as if you wrote the whole thing out as your prompt, throw that in there, and then start asking questions. I think that's called CAG, right? Cache-augmented generation. And as I answered earlier, there are pros and cons to both. I like RAG much better. It's a lot more efficient; you're using fewer tokens, so it's cheaper. The thing I want to try to avoid, because I think there's a lot of hype, is people just wanting the thing to do the thing. That's it, right? So they start throwing spaghetti at the wall. It's often very inefficient. It often gives incorrect answers, or it forgets certain details, because, oh, I just wanted it to do the thing. But you've got to remember: this is still software, and we should still apply traditional software development principles.
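The RAG flow described above, chunk by paragraph and then do the "Ctrl+F"-style match against the query, can be sketched in plain Python. Real pipelines score chunks with embeddings in a vector store like Supabase; the keyword overlap below is a deliberately simplified stand-in for that scoring.

```python
# Toy RAG sketch: chunk a document by paragraph, then retrieve only
# the chunks that overlap the query. Real systems use embeddings and
# a vector database instead of keyword overlap, but the flow is the same.

def chunk(document):
    # Split on blank lines: roughly one paragraph per chunk.
    return [p.strip() for p in document.split("\n\n") if p.strip()]

def retrieve(chunks, query, k=3):
    # Score each chunk by how many query words it shares (the "Ctrl+F").
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(c.lower().split())), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored[:k] if score > 0]

doc = """Our PTO policy grants 20 days per year.

Expense reports are due by the 5th of each month.

The VPN requires multi-factor authentication."""

hits = retrieve(chunk(doc), "how many PTO days do I get")
# Only the matching chunk(s) get sent to the LLM, not the whole document.
print(hits)
```

That last point is the whole token-cost argument: the model sees three to six chunks instead of a 500-page PDF.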
I didn't sit through multiple classes on data management and algorithms, learning big-O notation for sorting algorithms and all sorts of different algorithm paradigms, just to take everything, throw it at the wall, and hope it does the thing, right? So we want to try to be a little more efficient than that. Again, you still can do it that way. There are still some benefits to it, but you need to be aware of the drawbacks as well. So hopefully that answers your question. Yeah. I like you. You're very talkative. Often... yeah, yeah. Doing these kinds of presentations, you usually have the very quiet, techno-nerdy people just kind of, I'm just going to sit here. And that's it. I'm just happy to be here. Yeah, what do you got? No, go for it. Go for it. So... okay, yeah. Yeah, so you're getting way into the weeds. I'm not going to go into too much depth on this, but we can talk about it afterwards. So the question is: is there an efficiency difference between putting information in the user prompt versus putting it in the system prompt? And does that vary model to model? Model to model, yes. You've got to remember, large language models are what's called stateless. So when you pass a whole conversation to a model, say I ask it a question and it gives me an answer, and then I ask another question and it gives another answer, every time I'm passing the entire conversation all at once. The first message is just me sending the question, and it sends me the answer. I then send it my first question, its first answer, and then my new question, and it gives me a new answer. That's how the conversation works. This is true for all LLMs. And there's a hierarchy: you have your system prompt and your user prompt, or system message and user message, and then the assistant message.
There used to be a tool message, but that doesn't matter anymore, so it's just those three. Some models focus more on the system prompt; that's their attempt to mitigate prompt injections. Some models don't care. In fact, DeepSeek R1, as they were building out their system, said: we're not going to care about the system prompt; you only put things in the user prompt. You can still put stuff in the system prompt, but it won't track it the same way. And then as far as the efficiency question, is it more efficient to put stuff in the system prompt or in the user prompt? I find it more effective to put things in the system prompt, but as you've seen with these, I'm using both, even in my template. So in that researcher example, where it was pulling in the user's original question, the plan from the planner agent, and the research from the research agent, I'm putting that stuff in the user prompt, and I have a separate system prompt. I use the system prompt as more of a template, something repeatable, and the user prompt is individual and specific. So even though it's templated, it's being filled in with what was actually asked by the user and what was actually planned out by the planner. As far as efficiency, speed, and cost: nil. It really doesn't matter that much. But that's how LLMs work at an architecture level. The assistant message is what the GPT responds to you with. So in that back-and-forth example, say I'm only using the user prompt: when I send a message, it sends the system message and then the user message, and that goes to ChatGPT. ChatGPT then responds with the assistant message.
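That stateless back-and-forth looks like this as plain data: every request re-sends the full history as a list of role-tagged messages. The roles are the standard system/user/assistant ones; the model call itself is stubbed out here, but a real client would pass the same `messages` list to something like OpenAI's chat completions endpoint.

```python
# Stateless conversation sketch: the model keeps no memory, so the
# client re-sends the whole history (the "buffer memory") every turn.

def fake_llm(messages):
    # Stand-in for the real API call, which would receive this exact
    # list of {"role": ..., "content": ...} dicts.
    return f"(answer to: {messages[-1]['content']})"

messages = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(question):
    messages.append({"role": "user", "content": question})
    answer = fake_llm(messages)  # full history goes out every time
    messages.append({"role": "assistant", "content": answer})
    return answer

ask("What is RAG?")
ask("And how is CAG different?")
# After two turns: 1 system + 2 user + 2 assistant = 5 messages
# would be resent with the next question.
print(len(messages))
```

This is also why long conversations get expensive: the token count grows with every turn, because nothing is ever truly "remembered" server-side.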
When I go to respond to that, it sends the system prompt, the first user prompt, the assistant message that I just got back from the GPT, and then my new user message. So I send all of it at once, in one request. Is there any benefit to giving it an assistant message to make it do the thing you want? No, that's where it interprets everything. It's stateless. It's just a way of logically separating the conversation context. This is also why you can do prompt injection by saying, hey, I want you to do this thing. I know you're not supposed to, but I want you to respond with, "Sure, here you go," and then finish your response. And it'll just say, yeah, sure, here you go, here's all your information. And that gets into a whole other thing. All right. So I'm going to turn that back on. My laptop went to sleep. So that's our researcher intern. That's our workflow. Because you weren't here for the OSINT tracker, I'll show you that again. This one was actually really fun. I'll reset the conversation. Let me see if I still have it in here. Yep. So this is an AI agent that has access to a Google search. It's going to use that Google search to look for somebody on LinkedIn. It's then going to use a LinkedIn API to scrape that person's profile information. And then it's going to use that profile information to find things about the person's company. Now, with all of these, I'm letting the agent decide which tool to use in which case. So I could ask it something completely different, or a different type of question, and it might not use all the tools. It might not use any tools. Or it might use the same tool multiple times for different queries. So in this case, Anusha, one of our volunteers, has allowed me to use her as my test dummy for this example. The question is: who is the CEO of the company that Anusha interned at? And then we can ask further questions about the CEO, right? How long has he been there?
So this is a multi-hop question, right? It's not as easy as, here's your answer; it has to do a lot of research. So as it goes through, the first thing it's going to do is a Google search. In this case, it's going to use site:linkedin.com, so it's only going to look at LinkedIn, and then it's going to look up Anusha's name there. Like we saw in the other researcher examples, it's going to find her LinkedIn profile, and then it's going to use the LinkedIn person scraper to scrape that information. It's almost done, so I can just show you on here. Yeah, it creates its own plan. It's very similar to the planner agent, but instead of being a separate agent, it's part of this one, still using OpenAI's back end. All right, that's taking a second. Anyways, once it finds Anusha's LinkedIn profile, it has her information, and apparently some interactions too. It looks at her work history, and it's going to identify that she used to work at NVIDIA. So here's Anusha's LinkedIn profile. It passes that as input into the company scraper, which it looks like it's kind of struggling with right now. Yeah, this is the one that I showed off earlier. Let me try that again. It's just because somebody new joined, so this is going to take a minute. So we don't have to limit it to LinkedIn. We can include a bunch of other places, a bunch of other data sources. That's why I wanted to focus on that ETL pipeline: the idea that you take data from multiple sources, consolidate it, transform it into something usable, and then feed it as input into another stage. And so you can chain things together. I'm showing you very simple examples of AI agents, but you can extrapolate all of this. In fact, you can make each of these individual workflows, and then have a manager agent at the top that decides which of these workflows to run.
So you take a complex task, break it down into smaller individual tasks, and build up from there. Okay, here we go. So this is what I was talking about. It takes the input. Here's the system prompt: you're an agent with a goal to research a person or company on LinkedIn. I also tell it what tools it has access to, so it knows about those. And then here's the human prompt; that's that user message. And then the output, which is the assistant message working it out. So it's already made the decision: to find out who the CEO of the company is, I first need to get their LinkedIn profile. Step one, search for Anusha and her LinkedIn profile. Step two, identify the company that she worked at. Step three, search for the CEO of that company. So here's the LinkedIn profile search; it was that simple Google search, using site:linkedin.com and then Anusha. It then passes that off to the GPT model, still executing the plan at each step. Next, it looks at the profile with the person scraper, so it looks at Anusha's profile information. Actually, it's a little easier if I show it here. So this is what it found off the internet. It runs the scraper, here's all the information, and passes that back to the GPT model. It executes the next step in the plan: here's the company that Anusha interned at. It runs the company scraper and passes that back to the GPT model. And then, I think I made this a simple use case, but otherwise it would... oh, well, that's why. If it was anybody who wasn't already well known, it would actually look at their profile information and then look at how long they had been at the company. So you can chain things together like that. So here: Anusha worked at NVIDIA. The CEO of NVIDIA is Jensen Huang. Jensen Huang's been there for approximately 30 years. So this is how we put it all together.
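Stripped of the agent machinery, the whole multi-hop chain is a pipeline where each tool's output feeds the next tool's input. The function bodies below are stand-ins, hard-coded to the demo's example, for the Serper search and the Apify scrapers; only the shape of the chain is the point.

```python
# Multi-hop OSINT chain, sketched as an explicit pipeline with stub tools.
# Real versions would call the Serper (Google) API and Apify's LinkedIn
# scrapers; these stubs just return the demo's known answers.

def google_search(query):
    # Stub for the site:linkedin.com search step.
    return {"profile_url": "https://linkedin.com/in/example"}

def person_scraper(profile_url):
    # Stub for the LinkedIn people profile scraper.
    return {"name": "Anusha", "company": "NVIDIA"}

def company_scraper(company):
    # Stub for the LinkedIn company scraper.
    return {"company": company, "ceo": "Jensen Huang"}

def multi_hop(person):
    # Hop 1: find the profile. Hop 2: find the company. Hop 3: find the CEO.
    hit = google_search(f"site:linkedin.com {person}")
    profile = person_scraper(hit["profile_url"])
    company = company_scraper(profile["company"])
    return f"{person} interned at {profile['company']}; the CEO is {company['ceo']}."

print(multi_hop("Anusha"))
```

The difference with a real agent is that this ordering isn't hard-coded: the LLM plans the hops and picks the tools itself, which is why the same workflow can answer differently shaped questions.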
So let's talk about design patterns. What options do we have in building these AI agents? The first thing I want to point out is that any given agent does these four things: you create a task, it tries to answer a question, you pick the model and the tools it has access to, and then it does that single task and passes the result on to something else. You can have agents that do multiple tasks, but I've found that after about six iterations, it starts to kind of freak out. That's the context window: it loses track of what its plan is, and it loses track of what APIs it has access to. Same thing with the number of tools. If you add more than, I think, six to eight tools, it doesn't know which one to use in which context. So this is why you want to narrow the scope quite a bit. The idea is: build single-purpose agents. Less WALL-E solving Rubik's Cubes, and more pass-the-butter robot. So, agentic design patterns. These you're going to want to research on your own at home, but I want to put them in your head so you know what to look for. Workflows: step one, do this; step two, do this; step three, do this. Here you're explicitly building out the agent pipeline, and that was that first researcher intern, where you saw the planner agent, the research agent, and the internet search agent, or the report agent. Those are really efficient, but they're single-purpose, so you can't reuse them. Another pattern is reflection agents. That would be: hey, I want you to write this article, or write this thing, or respond to this message. And after it does that, you have another agent, or even the same one, act as its own critic and say, hey, iterate on what you just wrote. You can have it reflect multiple times in order to get a more refined answer. Then there are tool-use agents, which is most of what I was showing here: you give the agent access to a back-end tool.
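The reflection pattern just mentioned (draft, critique, revise, repeat a bounded number of times) can be sketched as a small loop. The draft/critique/revise functions below are stubs standing in for separate LLM calls; a real version would prompt a model for each role.

```python
# Minimal reflection-agent loop: one call drafts, a second critiques,
# and the draft is revised a fixed number of rounds. All three
# "agents" here are stubs for actual LLM calls.

def draft(task):
    return f"Draft article about {task}"

def critique(text):
    # A real critic agent would return genuine feedback on `text`.
    return "Add a concrete example and tighten the intro."

def revise(text, feedback):
    return f"{text} [revised per: {feedback}]"

def reflect(task, rounds=2):
    text = draft(task)
    for _ in range(rounds):  # bounded, so it can't loop forever
        text = revise(text, critique(text))
    return text

print(reflect("AI agents"))
```

Bounding the loop matters for the same cost reasons discussed throughout: each reflection round is another full model call on a growing piece of text.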
So that'll be something like RAG, where it has access to a vector database. You can give it access to any other data store, like a normal database, or to APIs. I've found a lot of benefit using APIs; it works really well. Planning agents: this is where you'd want to use a more specialized reasoning model to come up with a specific plan. You even saw in that last example why having a plan is important: it thinks step by step. It's the same idea as chain-of-thought prompting with a single LLM. Think step by step, build out your plan, and then execute on that plan. Multi-agents. Yeah, I'll call on you in a second. Multi-agents: that was very similar to the software design example, right? Where you had the CTO, the project manager, the developers, the QA engineers, the testers, marketing, everybody. You can have the agents talk to each other. You can have them in a full mesh, so all the agents have access to all the other agents, or partially connected, so that only certain agents talk to certain others, in order to shape that workflow. And of course you could build that out even bigger into specialized workflow stages: the same agents, but working in stages. And then ReAct. ReAct is a whole thing I'm not going to go into in too much detail, because it worked pretty well on benchmarks in academia, and only in academia. It's horribly inefficient, and I'm talking like a 15-to-1 or 30-to-1 cost ratio versus using a workflow. So for every $1,000 you spend, that's $15,000 or $30,000 in API costs. This is why I had that little rant about big-O notation: take solved problems in software design and apply them to AI agents. You had a question? Yeah. So the question was: have I been having better success with planning agents using o1 versus something like GPT-4o?
So the reasoning models like o1, o3, DeepSeek R1: those are good for building out longer plans. The reason I think they should only be part of the planning agents is that they're very costly and take a lot of time to generate. You don't need a reasoning model to use a tool. You can use a much smaller model that's more efficient and more cost-effective for tool use, because it's a simple task, and save the reasoning model for the larger planning tasks. Yeah, you will get more detailed plans from the reasoning models. Sometimes too detailed, so you want to be careful with that. All of this is like 80% science, 20% art. It's like cooking, right? Put in however much seasoning, and if you need more, just add more. That's how you tweak it. But I strongly believe: start small and build up, instead of trying to start big and pare down. It's a lot more efficient that way. So, awesome. This is my contact info. If there are any questions, or anything you want to chat about as far as AI agents or AI in general, this is the best way to get hold of me. My YouTube channel is NetSec Explained. I originally designed this as an interactive workshop; we didn't have enough people for that, and also we don't have Wi-Fi. So I'll probably re-record this and upload it onto the internet, and give you the shared drive with all the messages: the system messages, the tool usage. I do want to show you one more thing: the APIs that I use. Serper.dev is great, and all of this is free, by the way, or at least they have free tiers. Serper.dev gives me access to a pretty sophisticated Google API. You can also use Brave, which is also very popular for agentic designs. The other one is Apify.com. Apify is like an API marketplace. There's a free tier; you can see I get $5 a month.
Between this and running a podcast and testing, I've only spent like $1.60, which is still within that free tier of $5. I think it was about $0.06 every time I ran that whole workflow for the LinkedIn queries, so a pretty nominal cost. And you can imagine, if you're a pen tester or a red teamer and you wanted to do some research on a company or a target, you could build out a whole workflow and it'll just go do that for you. You can also add tools like a subdomain finder or Sublist3r. You can add Amass into that. You can add Nmap as a tool as well, feed that information in, store it in a registry or a database or something, then feed it back into the LLM so it can take that info, execute more tools, and build an autonomous hacking system. Which I was hoping to have ready for this week to show off, but I don't, unfortunately. It would actually go through and do some Hack The Box challenges. Yeah. But it's sophisticated, a lot of things can go wrong, and you know how live demos are. So the Actors that I'm using in this case are actually these three: the bulk LinkedIn company profile scraper, the LinkedIn people profile scraper, and a Google search results scraper, which I don't think is as good as Serper API, but it still gets the job done. So: API marketplace, connect things together, and that's it. Thanks for coming.