Alright, check this out, we got this game called Gandalf, right? It’s all about teaching people about the risks of prompt injection attacks on big language models. But here’s the thing, it had an unintended expert level, man. They had this analytics dashboard that was publicly accessible, and it showed all the prompts players submitted and a bunch of other metrics.
So this company, Lakera AI from Switzerland, they took the dashboard down when they found out. But get this, they’re saying there’s no reason to worry because the data wasn’t even confidential. They launched Gandalf back in May, it’s like a web form where users try to trick the AI model into revealing passwords by giving it different challenges, you know?
Basically, users give the model some input text to try and bypass its instructions. Then they gotta guess the password using that input they tricked the model with, you feel me?
How prompt injection attacks hijack today’s top-end AI – and it’s tough to fix
But here’s the kicker, they had this dashboard built with a Python framework called Dash, right? And some security dude from Australia, Jamieson O’Reilly, spotted it. He wrote up a report for The Register, saying this server had like 18 million prompts and 4 million password guesses, plus a bunch of other game-related stats. And get this, he could access hundreds of thousands of those prompts through HTTP responses from the server. Damn!
So this dude O’Reilly, he’s like, “Hey man, this whole thing is a simulation to show the risks with these big language models. But, come on, they gotta step up their security game, dude.” He’s saying if this data gets in the wrong hands, it could help bad actors figure out how to beat similar AI security stuff, you know what I mean?
This data could serve as a resource for malicious actors seeking insights into how to defeat similar AI security mechanisms
But get this, the CEO of Lakera AI, David Haber, he’s not worried. He emailed The Register and was like, “Nah man, one of our demo dashboards with some anonymized prompts was just for demo and educational purposes. We’ve been using it for webinars and stuff to show how creative input can hack these language models.”
He’s saying there ain’t no personal info or anything like that in the data. They took the server down just to avoid any confusion, you know? According to Haber, it’s all good because they’ve been sharing the data with people anyway.