Claude Mythos | Episode 49
E49

Claude Mythos | Episode 49

Bronwen Aker:

Hello, and welcome to AI Security Ops, the podcast where we cut through the hype and explore the real world intersection of artificial intelligence and cybersecurity. Each week, we examine how AI is reshaping both sides of the security landscape, both in the threats that we face and in the defenses we're building. I'm Bronwen Aker, and I'm joined today by Derek Banks and Brian Fehrman. And in this episode, we're digging into one of the most talked about AI stories well, the stories in AI and security going on right now, namely the story about Anthropic's Claude Mythos preview, a model that is so capable of finding and exploiting vulnerabilities that Anthropic decided not to release it to the public. They decided the public simply can't have it.

Bronwen Aker:

This show is brought to you by Black Hills Information Security and Anti Syphon Training. BHIS helps organizations identify and close real world security gaps through penetration testing, adversary emulation, purple team engagements, and managed detection and response. We do SOC as a service. Yay. AntiSyphon, on the other hand, delivers hands on practitioner led training that is built around real world tools and real world attacks.

Bronwen Aker:

So you can take what you learned in your courses and apply it immediately. Learn more at blackhillsinfosec.com or for trainingantisiphontraining.com.

Derek Banks:

So I guess this week we're gonna talk about Mythos because no one else is talking about Mythos. But we'll we'll give you we'll give you our takes What on is this Mythos thing?

Bronwen Aker:

So It's a mythic mythic mythical AI.

Derek Banks:

Yeah. Which, you know, I guess that's why they named it Mythos. So not publicly released. Existence was originally leaked back, what, two two weeks ago, probably three weeks by the time this episode comes out. Leaked because some some reporter had found, what'd you say, 3,000 articles that were unpublished, I guess, on either a Yeah.

Derek Banks:

Dev Which that seems like an awful as as a company that, you know, at Black Hills, we put out a lot of content. To have like 3,000 unpublished things is like our content and communities like Dream. Right? Like we're always being asked, hey, can y'all write more blog posts? But to have 3,000 unpublished in the hamper would be like, man, they'd probably go for three or 30, right?

Bronwen Aker:

Anthropic is blaming a human error in the CMS configuration for the initial breach. But then they got a second leak that exposed 512,000 lines of Claude code and source code through an MPM package.

Derek Banks:

Yeah. That was what? Yeah. Like, I guess the source maps or something were included in an MPM package bill or something and the source code for Claude code was released or leaked rather. Yeah, I mean I would definitely buy that but there's definitely folks who are very cynical.

Derek Banks:

Seems like Anthropic is one of those companies now that's starting to there's this polarization of people who like Anthropic and people who don't like Anthropic. And I don't know. All through all this time, even like of the rumors of like, you know, Claude not working well like now, which I haven't experienced. I assume I mean, you know, I'm not an LLM engineer, but I might have stayed at a Holiday Inn Express last night. You know, when you're under a lot of load, they might quantize the model they're offering you.

Derek Banks:

And they have a lot of demand and it's not you know, I think people underestimate the engineering feat that OpenAI and and Anthropic and Amazon and these model providers like like they're this isn't insignificant what they're doing. I mean, they're running thousands and thousands of GPUs to host these things for us. And so, you know, did some people catch, you know, Opus when they had to quantize it under load or something? Like, I don't know. But either way, I have not had the same experience.

Derek Banks:

But I definitely think there's at least some fuel to people being cynical about the source code leak, the blog release and now this whole Mythos thing, the statement they came out with. And I'll let you all handle the statement because I I wanna hear you all's takes. Actually, heard Bronwen's take. I was watching the news yesterday. So I did hear Bronwen's take.

Derek Banks:

So let's go with Brian. Brian, what do you think about Mythos?

Brian Fehrman:

What do I think about Mythos? So, we're talking about the statement of I'm I'm trying to find it here in the in the notes. We're talking about the statement of them withholding it because they're concerned about the security implications.

Derek Banks:

Any of it. What What do you think about any of it?

Brian Fehrman:

Any of it. Oh, I mean, like man, I'm gonna be honest. So I personally think that part of this is, is like a publicity stunt, a marketing ploy for them to withhold to say that they are withholding this model because they are concerned about the security implications of this of people getting a hold of and using this model.

Derek Banks:

It's just too good.

Brian Fehrman:

It's just too good for the public to have. Like, we just cannot allow this to be in the hands of the general public. And so we're only giving it to the elite few, who we're going to allow to use it for now. And then they're gonna make all these they made all these statements about all these capabilities, but yet we haven't been able to verify it for ourselves to see see what's going on. And so I'm not saying that it doesn't have those capabilities.

Brian Fehrman:

It's not able to do it. But to go the route of, like, oh, this is just this is too dangerous of a tool to put in the hands of people is, in my opinion, nonsense. I mean, there have we have been the community has been pushing out security testing tools for a long time now, and I know that there has been debate over the practice of that of whether or not that is ethical, whether or not that is for the the better good of everyone. But nonetheless, it it happens. And furthermore, I would say, I would I would be curious of how much better this new model actually is versus the previous ones.

Brian Fehrman:

Like, what difference that is making versus just the general scaffolding and tooling that they are putting around leveraging the model. Right? So, like, with a lot of these models, the power isn't just in the model. It's in how you leverage it and how you use it. And so I I part of me just feels like this is a marketing ploy so that when they do open it up, companies are like, oh, we have to have this now.

Brian Fehrman:

Like, we have to jump on this. We have to pay for this. If we don't, we're gonna fall behind. So

Derek Banks:

And and all the communication that's coming from our customers seems to indicate that is like, we've already been asked by multiple customers, hey. When y'all getting that Mythos thing?

Brian Fehrman:

Yeah. Exactly.

Bronwen Aker:

It's interesting. Apparently, Anthropic is claiming they're only going to allow access to 40 vetted organizations. 40. That's

Derek Banks:

Why why an arbitrary number of 40?

Brian Fehrman:

I I

Bronwen Aker:

don't know. It it seems an insanely small number. And just for the record, am I the only one who sees the irony in an AI company saying, we made a model that's too smart. And this is after all of the push and all of the just headlong rush for AGI. And now they get a model that is super smart, that can do all kinds of things and can even break out of a sandbox, and now they're freaking out.

Bronwen Aker:

It's like, is this what

Derek Banks:

you were OPUS six can break out of a sandbox. Ask me how I know. Or OPUS 4.6 rather can break out of a sandbox. Ask me how I know. Right?

Derek Banks:

Because I turn on the sandbox feature of Claude code and watch the the the coding agent go, I can't execute this because I'm in a sandbox. Let me change that in the JSON. Oh, now I can access it. Like, okay. That's great.

Derek Banks:

Right? So I guess I agree with Brian that, like, I think there's a lot more to scaffolding code around large language models and how you process the input and deal with the output and the whole agentic loop thing. Like I I I think there's more to the scaffolding than people give credit to. And as someone who's been building like AgenTic pen testing solutions, I almost don't care if it's a better model because I think the scaffolding is more important. Right?

Derek Banks:

I really do. Like and and also, I'll say that if it is that much better at cybersecurity tasks, it also means that it's very likely that much better at all the other tasks like two. Right? But also Pixar, it didn't happen. Right?

Derek Banks:

So to claim that it's found all these vulnerabilities, well specifically what kind of vulnerabilities? Because you found a thousand vulnerabilities in Windows, and I'll be arbitrary. And I mean how many of them are serious or significant? Because I've definitely seen AI find a vulnerability in my testing. And then a tester, you know, someone who's a penetration tester's experience come and look and go, yeah, I guess it's technically a vulnerability, but that's a lot of a lot of things would need to fall into place for that to have actual impact.

Derek Banks:

Right? And so I guess I'm with Brian. I'm skeptical and I also I think that they might be tired of releasing like frontier models and having folks distill them and release models that are oh you know, 85% is capable and 90% cheaper. Right? Ask me how I know about that too.

Derek Banks:

Right? And so I have some skepticism.

Bronwen Aker:

Well, okay. Some of the specific things that they claim Mythos discovered. A twenty seven year old bug in OpenBSD's TCP stack or s SAC, s a c k implementation, a sixteen year old FFmpeg codec vulnerability that apparently automated fuzzing tools missed a lot. A seventeen year old FreeBSD NFS remote code execution vulnerability, CVE twenty twenty six forty seven forty seven, Linux kernel vulns, well, browser exploit chains, I really don't see as being that momentous because browsers, of course, are evil, all of them. I don't care.

Bronwen Aker:

So, you know, I'm not surprised that it found these vulnerabilities, but I have a question as to is Mythos the only model that they've attempted to find vulnerabilities like this with?

Brian Fehrman:

Yeah. Well and I'm looking so, I mean, just listening to those and and and thinking about it, I mean, so most of those that they mentioned are are open source tools. Right? And they even mentioned for the, I think, the FFmpeg one, the, you know, they found the the one line of source code that had been overlooked. So it's doing so to me, it sounds like a lot of this is just source code analysis.

Brian Fehrman:

I I I hate to say just source code analysis, but it's, it's doing what AI is is good at. Like, if you have the source code, like, can feed in the source code of a program, that's a whole lot different of a task versus saying, hey, AgenTeq AI. Go pen test this web application where we don't have source code or, kind of what Derek has mentioned earlier, go find vulnerabilities in the Windows kernel where we don't have access to the to the source code. You know, these finding finding issues in more of the the close the closed source, applications, whatever way they might manifest themselves in more of that agentic manner versus just shoving in a bunch of of source code to it, I think those are those are two different things. Right?

Brian Fehrman:

And, yeah, I would be curious to see too, I mean, how well this stacks up against, a lot a lot of the other models. I mean, feeding in the the same amount of source code. I know they have, like, some little metrics, but still, I mean, it's it's easy enough to contrive those numbers depending on different prompting and different ways that you go about it. Right? But

Derek Banks:

Yeah. And this kind of elitism stuff where only some companies are gonna be good enough to, you know, ride this ride, like, really irritates me and makes me wonder if other companies might go in opposite direction. Because sort of along the same time frame, we saw Google release Gemma four as an open weight model that'll run locally on your system. And I I haven't done a lot of heavy lifting with it yet, but so far, what I'm hearing and my limited testing with it, it seems pretty powerful. And the hacker in me wants AI to be available for all.

Derek Banks:

Right? Like, I don't think it should be. And I think this sets the precedent of, well, you know, the good AI is only gonna be used by the elite few and the rest of y'all get this, you know, second tier crappy AI. Yeah. I don't like that.

Derek Banks:

I know the Chinese basically lead in the open weight model stuff, but I really would like to see some other US companies take Google's lead and give us a model file that we can run locally on our hardware that does you know I'm not saying it has to be as good as the frontier models. I know it won't be, right, because you can only run so much, but it sure would be nice to be able to do what I'm currently doing with Claude code locally on my machine, and I do think that's in the realm possibility. And I'm excited about, you know, a US company releasing an open weight model and being at least somewhat more transparent and not as elitist.

Brian Fehrman:

Here's I mean, another thing too besides just like the the I don't know if I wanna say beside, but also kind of going along with the elitism is look at the look at some of the companies that are listed as having access to this and think about the amount of money that they have funneled into the technologies in Anthropic. So then it becomes basically a pay to play, but, pay to play at a much higher level. Right?

Bronwen Aker:

We're not talking like are a lot higher.

Derek Banks:

Yeah. If if I was gonna believe in your altruism, instead of saying 40 companies, be like, if you're a vetted US security company, like you're Huntress or Black Hills or, you know, trusted sec or whoever, yeah, you can apply and get access to it. Just like, you know, getting, like, you know, access to any kind of other service after we get to prove we're, like, a business and we're legit. Like, why wouldn't we be able to get access to it? That seems, like, really arbitrary.

Derek Banks:

And, you know, maybe they have, like, CPU can like, load concerns or something. I don't know. But, like, I I don't know. I just don't really like that that situation.

Bronwen Aker:

No. I mean, who decides who is vetted enough to have access? I mean, it it is elitism. And that that business of the the pay to play, man, that's a slippery slope because we know how none of the AI companies that I'm aware of are actually turning a profit from their own generated income. They have if they have positive cash flow, it's because they've got VC pouring money into them and other people pouring money into them, but they are not self sustaining.

Bronwen Aker:

And I think if I were with any of the frontier organizations, I would be a little concerned about that because the current state is not sustainable. But anyway, that's that's off topic. The the other thing too is that it seems to me like Anthropic is using this hype around Mythos as a way to draw attention to Project Glasswing. Are you are you both familiar with Project Glasswing?

Derek Banks:

Yeah. I read about it. And and so it seems like that in a nutshell, those are the companies that are to get access to the models because they have they have what? Critical infrastructure like AWS or or, you know, what's it? JPMorgan Chase.

Derek Banks:

I guess they're financial critical infrastructure. And I get that. But when we're talking about critical infrastructure, like, okay, AWS and JPMorgan Chase, sure. Also, my power company. Like, I wanna go with that's critical infrastructure.

Derek Banks:

My water company, are they getting access to this? I mean, sorry. I need power and water before I need AWS and money. Right?

Bronwen Aker:

Like Power, water, gas.

Derek Banks:

Maslow's hierarchy of needs here. Right? Yeah.

Bronwen Aker:

I mean Yeah. So for those who aren't familiar with Project Glasswing, it's a defensive cybersecurity initiative that allegedly is being built by Anthropic around Mythos. The launch partners include little companies like AWS, Apple Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, trying to say that five times fast Linux Foundation, Microsoft, NVIDIA, and of course Palo Alto Networks. So allegedly there's something like a $100,000,000 in usage credits and $4,000,000 in donations that are going to be given to open source security organizations, I would love to know who those organizations are.

Derek Banks:

Hopefully OpenSSL, since a lot of those companies that made billions of dollars probably use OpenSSL. Right? Yeah. Yeah. I again, I just I don't know.

Derek Banks:

Like, what makes CrowdStrike any different than what we do here at Black Hills? Like, how come we can't get access? Like, it I'm not saying that they can be sued for it, but I'm sure someone's gonna try. Right? Like, it just seems like very like, you're picking and choosing who you're gonna sell to on what basis.

Derek Banks:

Like, why how are you picking them? Like, I don't know why is CrowdStrike better than huntress or us or trusted sex, you know, binary defense. Size?

Bronwen Aker:

How large is CrowdStrike compared to Black Hills?

Derek Banks:

I mean, they're probably 10 times our size. Like, I think, you know, based on what we have at our SOC and, you know, pen testing, I think looking at you know, I've done some competitive analysis. I think they had somewhere in the neighborhood of a thousand to 1,100, like, SOC customers last I checked a couple years ago. So they are bigger than us. No doubt.

Derek Banks:

Does that make them more important? I I would say no. And so I guess what I would suggest if someone from Anthropic I know they don't listen to our podcast, but if they did

Bronwen Aker:

They might.

Derek Banks:

I would say, I think I think you should open it up to a little bit broader information security audience because, you know, the folks at CrowdStrike, while they do great things, they're just one tiny little sliver of all of us that have been doing this kind of work for a long time. And I find it a little infuriating that we can't be part of that club.

Brian Fehrman:

Mhmm.

Bronwen Aker:

Especially since our testers have been known on occasion to evade CrowdStrike defenses.

Derek Banks:

It used to be a joke. Right? I mean, I'm sure it's better now, but I mean, AI I was on many a test personally where all you had to do was this to get past CrowdStrike. And so yeah. I mean, I'm not saying that they don't have a great product, but they're just one tiny little sliver and I don't think it's fair.

Brian Fehrman:

Yeah. Yep. Yeah. I I completely agree.

Derek Banks:

Alright. So Yeah. I think, that probably beats up, our Mythos take. Right?

Brian Fehrman:

Yeah. I think

Bronwen Aker:

That that too dangerous to release framing is it's kind of sketchy. It will be interesting to watch.

Derek Banks:

One other part we didn't really cover and that is the quote vulnpocalypse concern mostly because I wanted to say vulnpocalypse. But so it was either HackerOne or or BugCrowd. I think HackerOne Corey was saying yesterday on the news, and I'll repeat it, that they've suspended their bug bounty programs.

Brian Fehrman:

Yes. Yeah. That's great. Yep.

Derek Banks:

So okay. Again, it's one thing to find a thousand vulnerabilities. It's another thing for them to be legit true impactful vulnerabilities. And so if we have a whole lot of AI slot vulnerabilities, which just seems to be what's happening, again, is Mythos doing the same thing? And and yeah, I agree.

Derek Banks:

There's probably a gem or two in there, but like really what you know, like what Brian said earlier, is that the model or the harness? I've gotten pretty good results on existing models with decent harnesses. Just saying. So I'm I don't know that the Vompocalypse is nigh. So I just wanted to say that.

Bronwen Aker:

Well, if it if it is, it's not just because we have bigger and better and badder models that are able to do more. But the the key takeaway with HackerOne no longer accepting submissions, they've had the number of submissions go up by orders, plural, of magnitude. A lot of them are slopped. They can't keep up. And the really sucky part is that our bug bounty system has zero rewards to organizations or company that fix vulnerabilities that are found.

Bronwen Aker:

And that's a huge disconnect. But it has nothing to do with Mythos or its supposed capabilities, but it highlights how our entire vulnerability identification and remediation process I'm getting hammered for saying things are broken, but I can't think of a better way to describe it. It's broken. It doesn't work. And so now we're not gonna have a bug bounty program through the biggest organization.

Bronwen Aker:

I know. May you live in interesting times.

Derek Banks:

Yeah. It's the ancient Chinese curse. It's a tale of two cities. Yeah. Best of times and worst times.

Derek Banks:

Right?

Brian Fehrman:

Maybe we need to make a patch bounty program. Yeah.

Bronwen Aker:

You know what? I I think that it's a lot harder to patch than it is to find the vulnerabilities. And sometimes it isn't easy to find the vulnerabilities, so in no way am I dissing the people who are finding the vulnerabilities. No way at all but then figuring out how to close that barn door. And I really would love to see companies like Anthropic and whatnot great.

Bronwen Aker:

You felt you built a model that can go and break all of this stuff. Now can that model turn around and help fix it?

Derek Banks:

Yeah. Actually, yes, it can. I I have definitely had Claude Mythos code help me fix lots of things that have been painful in the past, specifically Linux NVIDIA drivers. But yeah. So I think I think that it can.

Derek Banks:

And I think that unfortunately we live in a time when fixing things and those kinds of AI applications aren't what gets you in the news. It's I hacked something with AI. So I'm sure there are companies out there that are using AI to help out systems administrators. And if not, well, it's probably coming soon.

Brian Fehrman:

Yep. Yeah. People enjoy watching the fire more than they enjoy watching the fire be put out.

Derek Banks:

Alright. Sweet. Well Yeah.

Brian Fehrman:

Cool. Well, if, for any of the viewers, if you guys have, hot takes, cold takes, or, lukewarm takes, whatever, you wanna put them in the comments, we'd love to hear them.

Bronwen Aker:

Yes. Please please do. Your comments help let us know if we're doing a good job and and what else you might wanna hear about.

Derek Banks:

And we'll say, and with that, keep on prompting.

Episode Video

Creators and Guests

Brian Fehrman
Host
Brian Fehrman
Brian Fehrman is a long-time BHIS Security Researcher and Consultant with extensive academic credentials and industry certifications who specializes in AI, hardware hacking, and red teaming, and outside of work is an avid Brazilian Jiu-Jitsu practitioner, big-game hunter, and home-improvement enthusiast.
Bronwen Aker
Host
Bronwen Aker
Bronwen Aker is a BHIS Technical Editor who joined full-time in 2022 after years of contract work, bringing decades of web development and technical training experience to her roles in editing pentest reports, enhancing QA/QC processes, and improving public websites, and who enjoys sci-fi/fantasy, Animal Crossing, and dogs outside of work.
Derek Banks
Host
Derek Banks
Derek is a BHIS Security Consultant, Penetration Tester, and Red Teamer with advanced degrees, industry certifications, and broad experience across forensics, incident response, monitoring, and offensive security, who enjoys learning from colleagues, helping clients improve their security, and spending his free time with family, fitness, and playing bass guitar.