This blog is jointly written by Arjun Sambamoorthy, Amy Chang, and Nicholas Conley.
Today, Cisco launched the LLM Security Leaderboard, a comprehensive resource for evaluating model security risk and susceptibility to adversarial attacks. By providing transparent, adversarial evaluation signals, this leaderboard contextualizes model performance metrics against evaluations of how models handle malicious prompts, jailbreak attempts, and other manipulation strategies. The tool empowers organizations with a clear, objective understanding of model security risk by mapping threats to our AI Security and Safety Framework taxonomy, and informs defense-in-depth approaches to AI deployments. As new models emerge and attack strategies evolve, we will continue expanding our evaluation coverage, refining our methodology and adding models as they are released. Your feedback and engagement to improve this tool are welcome and encouraged.
The Cisco LLM Security Leaderboard provides:
- Objective security rankings based on rigorous testing across single-turn and multi-turn attack scenarios
- Detailed threat mappings aligned to the Cisco AI Security Framework
- Transparent methodology so organizations can understand exactly what is being measured
Why Security Performance Matters
The rapid adoption of large language models (LLMs) has created an urgent need for standardized security evaluation against real-world attacks, a lagging consideration compared to benchmarking capabilities in engineering, math, and science. Organizations that have deployed or are considering deployment of AI assistants, chatbots, and other AI-powered applications need clear, actionable data about how these models handle adversarial manipulation strategies to understand how to harden their assets.
Not all LLMs are created equal when it comes to security. The consequences of deploying a model ill-suited to your use case can range from harmful content generation to data leakage and brand damage. If these models are connected to agentic systems, the potential for damage compounds, and negative outcomes become harder to reverse.
What Makes Our Approach Different
Comprehensive Attack Coverage
Our evaluation goes beyond simple prompt injection tests. We assess models against both single- and multi-turn attacks that attempt to elicit harmful or malicious responses. Each model receives a combined security score weighted equally between single-turn resistance (50%) and multi-turn defense capabilities (50%), providing a holistic view of security posture.
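The equal weighting described above can be sketched as a simple calculation (the function and parameter names here are illustrative, not Cisco's actual implementation):

```python
def combined_security_score(single_turn_resistance: float,
                            multi_turn_resistance: float) -> float:
    """Equal-weight combined score: 50% single-turn resistance,
    50% multi-turn defense. Both inputs are resistance rates in [0, 100]."""
    return 0.5 * single_turn_resistance + 0.5 * multi_turn_resistance

# Example: a model that resists 90% of single-turn attacks but only
# 70% of multi-turn attacks earns a combined score of 80.0.
score = combined_security_score(90.0, 70.0)
```

Because each component carries half the weight, a model cannot mask weak multi-turn defenses behind strong single-turn resistance, or vice versa.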
Fair, Unbiased Testing
All testing has been conducted on base models without any additional guardrails or safety layers. While production deployments often include guardrails, content filters, and additional safety mechanisms, our evaluation focuses on the inherent security capabilities built into the models themselves. This approach provides a fair baseline assessment across diverse model providers and versions, and helps organizations understand a model's foundational security posture before layering on additional protections.
The Cisco AI Security Framework
We have mapped all attack data to our AI Security Framework taxonomy, which facilitates identification of model susceptibility to a specific type of attack, and how and where those weaknesses exist. We break this down hierarchically along three dimensions:
- Objectives - High-level security goals and attack categories
- Techniques - Specific methods attackers use to compromise models
- Subtechniques - Granular attack variations and implementation details
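The three-level hierarchy above lends itself to a tree structure in which resistance rates roll up from subtechniques to techniques to objectives. A minimal sketch of that idea (the class and field names are hypothetical, chosen to mirror the metrics shown on the leaderboard):

```python
from dataclasses import dataclass, field

@dataclass
class TaxonomyNode:
    """One node in an Objectives -> Techniques -> Subtechniques hierarchy."""
    name: str
    prompts_tested: int = 0   # attack prompts run directly against this node
    prompts_refused: int = 0  # prompts the model successfully resisted
    children: list = field(default_factory=list)

    def totals(self):
        """Recursively sum tested/refused counts over this subtree."""
        tested, refused = self.prompts_tested, self.prompts_refused
        for child in self.children:
            t, r = child.totals()
            tested += t
            refused += r
        return tested, refused

    def resistance_rate(self) -> float:
        """Fraction of all prompts in this subtree that the model refused."""
        tested, refused = self.totals()
        return refused / tested if tested else 0.0

# Example: an objective whose rate aggregates two subtechniques.
objective = TaxonomyNode("Harmful Content", children=[
    TaxonomyNode("Role-play framing", prompts_tested=100, prompts_refused=90),
    TaxonomyNode("Obfuscated payloads", prompts_tested=100, prompts_refused=70),
])
```

Rolling the counts up this way means a weakness at the subtechnique level remains visible in the parent technique's and objective's aggregate rates.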
Transparency
Unlike proprietary evaluations, the Cisco LLM Security Leaderboard is publicly accessible. Visitors can compare models side by side before making deployment decisions; filter and search for specific models of interest; drill down into performance across procedures, content types, and attack strategies; and understand resistance rates at every level of our taxonomy.
Navigating the Leaderboard
The platform consists of three main components: LLM Security Rankings, Cisco AI Security and Safety Framework, and Methodology.
Rankings Page
On this page, visitors can view comprehensive model security rankings with quick access to the top and bottom performers against our attack dataset. Each model entry expands to reveal granular performance metrics across multiple attack dimensions.
Figure 1. The main rankings view shows combined security scores, with quick filters for top performers, bottom performers, and all models. Search functionality allows rapid model lookup.
Detailed Model Metrics
This detailed view enables security teams to identify specific threat patterns and make informed risk assessments for their particular use cases. Click on any model to expand comprehensive performance data and investigate:
- Overall resistance and success rates for both single-turn and multi-turn attacks
- Best and worst performing procedures
- Strongest and weakest content type defenses
- Subtechnique threat patterns
- Multi-turn strategy effectiveness
Figure 2. Expanded model view reveals granular breakdowns of performance across attack procedures, content types, sub-techniques, and multi-turn strategies. Each metric shows both resistance rate and attack success rate for complete transparency.
Cisco AI Security & Safety Framework Page
Explore an interactive hierarchy that maps model performance against our security framework, revealing both attack techniques that pose challenges across nearly all models and weaknesses specific to individual models. Visitors can also filter by model to view that model's performance across the framework and understand its average resistance rates and overall attack coverage. This granular insight enables targeted risk mitigation strategies.
Figure 3. The interactive taxonomy tree maps all attack data to the Cisco AI Security Framework. Each node shows resistance rates, total prompts tested, and refused/successful counts. Filter by model to see security performance across the hierarchy.
Methodology Page
Transparency is critical to trust. Our methodology page details:
- How combined scores are calculated
- Data sources and evaluation criteria
- Score interpretation ranges (Excellent: 85-100%, Good: 70-84%, Fair: 50-69%, Poor: 0-49%)
- A glossary of terms
- Quality assurance procedures
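The interpretation ranges listed above map directly to a simple banding rule. As a sketch (the function name is illustrative; the thresholds come from the ranges published on the methodology page):

```python
def interpret_score(score: float) -> str:
    """Map a combined security score (0-100) to the leaderboard's
    interpretation bands: Excellent 85-100, Good 70-84, Fair 50-69, Poor 0-49."""
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Poor"
```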
All models evaluated in this leaderboard were tested in their base configurations with no additional guardrails applied. However, certain cloud service providers may enforce built-in content filtering or safety layers that cannot be disabled. As a result, observed model behavior may reflect a combination of the model's inherent resilience and any provider-level protections in place at the time of testing.
What the Data Reveals
Initial rankings reveal significant variance in LLM security capabilities. Some models demonstrate excellent resistance rates above 85%, effectively defending against both direct and conversational attacks. Others show notable weaknesses, particularly against multi-turn manipulation strategies that build rapport before introducing malicious requests.
Because testing occurs on base models without guardrails, organizations can assess security capabilities across a consistent baseline. Production deployments should layer additional protections based on these insights and specific use case requirements.
To see our approach in action, visit the Cisco LLM Security Leaderboard today.
Disclaimer: The scores and rankings presented are intended solely to reflect how models performed against the described benchmark methodology and do not constitute an endorsement or guarantee of performance. Users are solely responsible for conducting their own independent assessment to determine the adequacy of any model for their specific AI governance and security requirements. The Cisco LLM Security Leaderboard is provided "as-is" without warranties of any kind. Cisco does not guarantee that any evaluated model is safe, secure, or fit for your specific use case.
