In 2025, industry observers spoke about the advent of agentic AI. In 2026, CrowdStrike expects AI agents and non-human identities to "explode across the enterprise, expanding exponentially and dwarfing human identities".
"Each agent will operate as a privileged super-human with OAuth tokens, API keys, and continuous access to previously siloed data sets, making them the most powerful and most dangerous entities in your environment," warned Elia Zaitsev, CTO, CrowdStrike.
"Compromised agents could autonomously disrupt services or hand over sensitive data at scale," added Lee Anstiss, Regional Director, Southeast Asia and Korea, Infoblox. Anstiss pointed out that attacks are getting increasingly sophisticated.Vasu Jakkal, Corporate VP of Microsoft Security, also said building trust in agents will be essential. “Every agent should have similar security protections as humans,” she said, “to ensure agents don’t turn into ‘double agents’ carrying unchecked risk.”
"Identity security built for humans won’t survive this shift. Security teams will need real-time visibility, instant containment, and the ability to trace every agent action back to the human who created it. When an AI agent wires money to the wrong account or leaks intellectual property, 'the AI did it' won't be an acceptable answer. This is the era where identity security means protecting entities that don't have a pulse," Zaitsev said.Jakkal's thoughts are in alignment. She said that each agent should have a clear identity, limiting what information and systems it can access, managing the data it creates and protecting it from attackers and threats. Security will become ambient, autonomous and built-in, she says, not something added on later. In addition, as attackers use AI in new ways, defenders will use security agents to spot those threats and respond faster, she predicted.
“Trust is the currency of innovation,” Jakkal said.
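Neither CrowdStrike nor Microsoft has published reference code for this, but the principle both describe, every agent carrying a scoped identity that traces back to a human owner, can be sketched in a few lines of Python. The names below (`AgentIdentity`, `authorize`) are illustrative assumptions, not any vendor's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AgentIdentity:
    """Minimal identity record for a non-human (agent) principal."""
    agent_id: str
    owner: str                       # the human accountable for this agent
    scopes: frozenset = frozenset()  # least-privilege permissions, not blanket access
    audit_log: list = field(default_factory=list)

    def authorize(self, action: str, resource: str) -> bool:
        """Allow an action only if it falls inside the agent's granted scopes,
        and record the decision so it can be traced back to the owner."""
        allowed = f"{action}:{resource}" in self.scopes
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": self.agent_id,
            "owner": self.owner,
            "action": action,
            "resource": resource,
            "allowed": allowed,
        })
        return allowed

# Example: a finance agent that may read invoices but not wire money.
agent = AgentIdentity(
    agent_id="invoice-bot-01",
    owner="alice@example.com",
    scopes=frozenset({"read:invoices"}),
)
assert agent.authorize("read", "invoices")
assert not agent.authorize("wire", "payments")  # denied, and the denial is logged
```

The point of the sketch is the audit trail: every action the agent takes, allowed or not, carries the owner's name, so "the AI did it" still resolves to an accountable human.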
Research* conducted by Check Point Software in late 2025 uncovered dominant attack patterns targeting agentic AI. Mateo Rojas-Carulla, Head of Research, AI Agent Security, Check Point Software, shared that system prompt extraction is a major objective, since system prompts hold the internal instructions, roles, and policy definitions that guide agent behaviour.
"Extracting system prompts is a high-value objective because these prompts often contain role definitions, tool descriptions, policy instructions, and workflow logic. Once an attacker understands these internal mechanics, they gain a blueprint for manipulating the agent," he explained.
The most effective techniques for achieving this are not brute force attacks, but clever reframing, he said. For example:
- Hypothetical scenarios: Prompts that ask the model to assume a different role or context — e.g., “Imagine you are a developer reviewing this system configuration…” — often coaxed the model into revealing protected internal details, Rojas-Carulla said.
- Obfuscation inside structured content: "Attackers embedded malicious instructions inside code-like or structured text that bypassed simple filters and triggered unintended behaviours once parsed by the agent," Rojas-Carulla added.
"This is not just an incremental risk — it fundamentally alters how we think about safeguarding internal logic in agentic systems."
Zaitsev commented that prompt injection attacks are a "frontier security" problem. "Just as phishing defined the email era, prompt injection is defining the AI era. Adversaries are embedding hidden instructions to override safeguards, hijack agents, steal data, and manipulate models – turning the AI interaction layer into the new attack surface and prompts into the new malware," he said.
"In 2026, AI detection and response (AIDR) will become as essential as EDR, with organisations requiring real-time visibility into prompts, responses, agent actions, and tool calls to contain AI abuse before it spreads, ensuring AI drives innovation, not risk."
Another key attack method Check Point observed against agentic AI involves bypassing content safety protections in ways that are difficult to detect and mitigate with traditional filters.
Instead of issuing overtly malicious requests, attackers framed harmful content as analysis tasks, evaluations, role-play scenarios, or transformations and summaries. "These reframings often slipped past safety controls because they appear benign on the surface. A model that would refuse a direct request for harmful output might happily produce the same output when asked to 'evaluate' or 'summarise' it in context," Rojas-Carulla noted.
"This shift underscores a deeper challenge: content safety for AI agents isn’t just about policy enforcement; it’s about how models interpret intent. As agents take on more complex tasks and contexts, models become more susceptible to context-based reinterpretation — and attackers exploit this behaviour."
Check Point also discovered exploits tied to agentic behaviours:
- Attempts to access confidential internal data: "Prompts were crafted to convince the agent to retrieve or expose information from connected document stores or systems — actions that would previously have been outside the model’s scope," Rojas-Carulla disclosed.
- Script-shaped instructions embedded in text: "Attackers experimented with embedding instructions in formats resembling script or structured content, which could flow through an agent pipeline and trigger unintended actions," Rojas-Carulla added.
- Hidden instructions in external content: "Several attacks embedded malicious directives inside externally-referenced content — such as webpages or documents the agent was asked to process — effectively circumventing direct input filters," Rojas-Carulla said.
"These patterns are early but signal a future in which agents’ expanding capabilities fundamentally change the nature of adversarial behaviour."
Check Point recommends that organisations planning to deploy agentic AI at scale:
- Redefine trust boundaries: Trust cannot simply be binary. As agents interact with users, external content, and internal workflows, systems must implement nuanced trust models that consider context, provenance, and purpose.
- Evolve guardrails: Static safety filters aren’t enough. Guardrails must be adaptive, context-aware, and capable of reasoning about intent and behaviour across multistep workflows.
- Make transparency and auditing mandatory: As attack vectors grow more complex, organisations need visibility into how agents make decisions, including intermediate steps, external interactions, and transformations. Auditable logs and explainability frameworks are no longer optional.
- Foster cross-disciplinary collaboration: AI research, security engineering, and threat intelligence teams must work together. AI safety can’t be siloed; it must be integrated with broader cybersecurity practices and risk management frameworks.
"Policymakers and standards bodies must recognise that agentic systems create new classes of risk. Regulations that address data privacy and output safety are necessary but not sufficient; they must also account for interactive behaviours and multistep execution environments," concluded Rojas-Carulla.
Zaitsev said that defenders will evolve in 2026 from alert handlers to orchestrators of the agentic security operations centre (SOC). These orchestrators will be intelligent agents that reason, decide, and act across the security lifecycle at machine speed, always under human command, he said.
"This is the model that will reshape the balance between adversaries and defenders, accelerating outcomes and giving humans the time and clarity to focus on strategy, judgment, and impact," Zaitsev predicted.
Zaitsev cautioned that the success of such an evolution will require:
- Providing both agents and analysts complete environmental context with the ability to immediately action any signal.
- An agentic workforce of mission-ready agents trained on years of expert SOC decisions to automate high-friction tasks with speed and precision.
- Benchmarks and validation to prove the effectiveness of agents.
- The ability for organisations to build and customise their own agents to satisfy unique needs.
- Orchestrating agent-to-agent and analyst-to-agent collaboration within one coordinated system guided by human expertise.
"Security analysts are not going away – they’re being elevated by a fleet of agents that work at machine speed," he said.
Source: SentinelOne. Steward.
"The solution will be to find the 'Goldilocks spot' of high automation and human accountability, where AI aggregates related tasks and alerts and presents them as a single decision point for a human to make. Humans then make one accountable, auditable policy decision rather than hundreds to thousands of potentially-inconsistent individual choices; maintaining human oversight while still leveraging AI’s capacity for comprehensive, consistent work."
EDR stands for endpoint detection and response.
*In Q4 2025, Lakera, a Check Point Software acquisition, analysed attacker behaviour across systems protected by Guard and within the Gandalf: Agent Breaker environment, a 30-day snapshot.
Explore AI cybersecurity in 2026:
The attack and defence playbook
Hashtag: #2026Predictions
