The evaluation, performed by LatticeFlow AI, reveals DeepSeek distilled models lag behind proprietary models in cybersecurity and bias, while excelling in toxicity prevention
COMPL-AI, the first evaluation framework for Generative AI models under the EU AI Act, has flagged critical compliance gaps in DeepSeek’s distilled models. While these models excel in toxicity prevention, they fall short in key regulatory areas, including cybersecurity vulnerabilities and bias mitigation challenges, raising concerns about their readiness for production use by enterprises.
“As corporate AI governance requirements tighten, enterprises need to bridge internal AI governance and external compliance with technical evaluations to assess risks and ensure their AI systems can be safely deployed for commercial use”
Developed by ETH Zurich, INSAIT, and LatticeFlow AI, COMPL-AI is the first compliance-centered framework that translates regulatory requirements into actionable technical checks. It provides independent, systematic evaluations of public foundation models from leading AI organizations, including OpenAI, Meta, Google, Anthropic, Mistral AI, and Alibaba, helping companies assess their compliance readiness under the EU AI Act.
AI Authority Trend: New Relic Announces Observability Integration with DeepSeek to Boost AI Adoption and ROI
Key Insights from DeepSeek´s Compliance Evaluation
Leveraging COMPL-AI, LatticeFlow AI assessed the EU AI Act compliance readiness of two DeepSeek distilled models:
– DeepSeek R1 8B (based on Meta’s Llama 3.1 8B)
– DeepSeek R1 14B (built on Alibaba’s Qwen 2.5 14B)
The evaluation benchmarked those DeepSeek models against the EU AI Act’s regulatory principles, comparing their performance not only to their base models but also to models from OpenAI, Google, Anthropic, and Mistral AI, all featured on the COMPL-AI leaderboard.
Key findings are:
- Cybersecurity Gaps: The evaluated DeepSeek models rank lowest in the leaderboard for cybersecurity and show increased risks in goal hijacking and prompt leakage protection in comparison to their base models.
- Increased Bias: DeepSeek models rank below average in the leaderboard for bias and show significantly higher bias than their base models.
- Good Toxicity Control: The evaluated DeepSeek models perform well in toxicity mitigation, outperforming their base models.
AI Authority Trend: kluster.ai First Developer Platform to Host DeepSeek-R1
“As corporate AI governance requirements tighten, enterprises need to bridge internal AI governance and external compliance with technical evaluations to assess risks and ensure their AI systems can be safely deployed for commercial use,” said Dr. Petar Tsankov, CEO and Co-founder of LatticeFlow AI. “Our evaluation of DeepSeek models underscores a growing challenge: while progress has been made in improving capabilities and reducing inference costs, one cannot ignore critical gaps in key areas that directly impact business risks – cybersecurity, bias, and censorship. With COMPL-AI, we commit to serving society and businesses with a comprehensive, technical, transparent approach to assessing and mitigating AI risks.”
AI Authority Trend: Cerebras Launches Fastest DeepSeek R1 Distill Llama 70B Inference
Source – businesswire
To share your insights, please write to us at news@intentamplify.com