Close Menu
AsiaTokenFundAsiaTokenFund
  • Home
  • Crypto News
    • Bitcoin
    • Altcoin
  • Web3
    • Blockchain
  • Trading
  • Regulations
    • Scams
  • Submit Article
  • Contact Us
  • Terms of Use
    • Privacy Policy
    • DMCA
What's Hot

Bitcoin Hitting $76,000—Is This a ‘Dead-Cat-Bounce’ Setup to Drag the BTC Price to $50K?

March 17, 2026

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

March 17, 2026

Bitcoin Just Flashed The Most Powerful Fractal In The Market, Here’s What To Expect

March 17, 2026
Facebook X (Twitter) Instagram
Facebook X (Twitter) YouTube LinkedIn
AsiaTokenFundAsiaTokenFund
ATF Capital
  • Home
  • Crypto News
    • Bitcoin
    • Altcoin
  • Web3
    • Blockchain
  • Trading
  • Regulations
    • Scams
  • Submit Article
  • Contact Us
  • Terms of Use
    • Privacy Policy
    • DMCA
AsiaTokenFundAsiaTokenFund

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

0
By Aggregated - see source on March 17, 2026 Blockchain
Share
Facebook Twitter LinkedIn Pinterest Email


Alvin Lang
Mar 17, 2026 19:21

OpenAI details new ‘Safe Url’ defense system treating AI prompt injection like social engineering, with attacks succeeding 50% of the time before fixes.





OpenAI published technical details on March 16 revealing how ChatGPT defends against prompt injection attacks, acknowledging that sophisticated attempts now succeed roughly 50% of the time before triggering security countermeasures.

The disclosure marks a significant shift in how the AI lab frames these security threats. Rather than treating prompt injection as a simple input-filtering problem, OpenAI now views it through the same lens as social engineering attacks against human employees.

Attacks Have Evolved Beyond Simple Overrides

Early prompt injection was crude—attackers would edit Wikipedia articles with direct instructions hoping AI agents would blindly follow them. Those days are gone.

OpenAI shared a real-world attack example reported by external security researchers at Radware. The malicious email appeared to be routine corporate communication about “restructuring materials” but buried instructions directing ChatGPT to extract employee names and addresses from the user’s inbox and transmit them to an external endpoint.

“Within the wider AI security ecosystem it has become common to recommend techniques such as ‘AI firewalling,'” the company wrote. “But these fully developed attacks are not usually caught by such systems.”

The problem? Detecting a malicious prompt has become equivalent to detecting a lie—context-dependent and fundamentally difficult.

The Customer Service Agent Model

OpenAI’s defensive philosophy treats AI agents like human customer support workers operating in adversarial environments. A support rep can issue refunds, but deterministic systems cap how much they can give out and flag suspicious patterns. The same principle now applies to ChatGPT.

The company’s primary countermeasure is called “Safe Url.” When ChatGPT’s safety training fails to catch a manipulation attempt—and the agent gets convinced to transmit sensitive conversation data to a third party—Safe Url detects the attempted exfiltration. Users then see exactly what information would be transmitted and must explicitly confirm, or the action gets blocked entirely.

This mechanism extends across OpenAI’s product suite: Atlas navigations, Deep Research searches, Canvas applications, and the new ChatGPT Apps all run in sandboxed environments that intercept unexpected communications.

Why This Matters Beyond OpenAI

Prompt injection sits at the top of OWASP’s security vulnerability rankings for LLM applications. The threat isn’t theoretical—in December 2024, The Guardian reported ChatGPT’s search tool was vulnerable to indirect injection. By July 2025, researchers used an elaborate crossword puzzle game to trick ChatGPT into leaking protected Windows product keys.

Even Anthropic hasn’t been immune. In January 2026, three prompt injection vulnerabilities were discovered in the company’s official Git MCP server.

OpenAI’s admission that attacks succeed half the time before countermeasures kick in underscores an uncomfortable reality: prompt injection may be a fundamental property of current LLM architectures rather than a bug to be patched. The company’s shift toward containment strategies—limiting blast radius rather than preventing all breaches—suggests they’ve accepted this.

For enterprises deploying AI agents with access to sensitive data, the takeaway is clear. OpenAI recommends asking what controls a human agent would have in similar situations, then implementing those same guardrails for AI. Don’t assume the model will resist manipulation on its own.

Image source: Shutterstock


Credit: Source link

Share. Facebook Twitter Pinterest LinkedIn Tumblr Email

Related Posts

WisdomTree and Glassnode Push Blockchain Analysis Beyond Price Charts

March 17, 2026

AAVE Price Prediction: Targets $137 by Month-End as Bullish Momentum Builds

March 17, 2026

BNB Price Prediction: Targets $693 Resistance Break by End of March 2026

March 17, 2026
Leave A Reply Cancel Reply

What's New Here!

Bitcoin Hitting $76,000—Is This a ‘Dead-Cat-Bounce’ Setup to Drag the BTC Price to $50K?

March 17, 2026

OpenAI Reveals How ChatGPT Now Fights Prompt Injection Attacks

March 17, 2026

Bitcoin Just Flashed The Most Powerful Fractal In The Market, Here’s What To Expect

March 17, 2026

Feather Exchange Introduces a Structured Price Corridor for Digital Asset Trading

March 17, 2026
AsiaTokenFund
Facebook X (Twitter) LinkedIn YouTube
  • Home
  • Crypto News
    • Bitcoin
    • Altcoin
  • Web3
    • Blockchain
  • Trading
  • Regulations
    • Scams
  • Submit Article
  • Contact Us
  • Terms of Use
    • Privacy Policy
    • DMCA
© 2026 asiatokenfund.com - All Rights Reserved!

Type above and press Enter to search. Press Esc to cancel.

Ad Blocker Enabled!
Ad Blocker Enabled!
Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.