Categories
DTQ Events

Report: "From Accuracy to Accountability: What Should We Really Measure in AI Systems?"

The rapid acceleration of artificial intelligence adoption has brought with it a fundamental shift in how we evaluate technological success. Traditionally, AI systems have been judged primarily on performance metrics such as accuracy, precision, and speed. However, as these systems move from controlled environments into real-world applications—impacting healthcare, governance, finance, and everyday decision-making—the limitations of these metrics have become increasingly evident.

The Data Trust Quotients (DTQ) recently convened a thought‑provoking discussion titled “From Accuracy to Accountability: What Should We Really Measure in AI Systems?” The dialogue tackled a critical shift in how we evaluate AI: is accuracy alone sufficient, or should accountability, trust, and human impact take precedence? The virtual session explored the growing realization that high-performing models can still fail in practice if they lack proper governance, transparency, and ethical grounding. As organizations race toward rapid deployment, the need to redefine evaluation frameworks for AI systems has never been more urgent.

Speakers

  • Naman Kothari – NASSCOM COE (Moderator)
  • Anniliza Crasta – Director, Information Security, Juniper Networks
  • Sneha Pillai – Data Protection Lawyer, Bosch Middle East
  • Abhishek Tripathi – Head of Cybersecurity & IT Operations
  • Himanshu Parmar – Senior Manager, AI, NASSCOM COE

Key Insights from the Discussion

1. The AI Adoption Paradox

The session opened by highlighting a striking imbalance in the current AI ecosystem. On one hand, there is unprecedented enthusiasm and investment, with billions of dollars flowing into AI development and a majority of enterprises actively integrating generative AI into their operations. On the other hand, there is a significant lack of preparedness when it comes to managing the risks associated with these systems. Organizations are under immense pressure to deploy AI quickly in order to remain competitive, yet only a small fraction feel confident in their ability to implement proper safeguards. This creates a paradox where speed is prioritized over safety, leading to fragile systems that may not withstand real-world complexities.

2. Accuracy as a Misleading Benchmark

A key theme throughout the discussion was the idea that accuracy, while important, can often be a misleading indicator of success. Examples were shared where models achieved near-perfect accuracy in testing environments but failed dramatically once deployed. These failures were not due to flaws in the mathematical models themselves but rather due to unaddressed external factors such as biased data, changing environments, and lack of human oversight. This highlights a critical gap between theoretical performance and practical reliability. In real-world scenarios, systems must operate under uncertainty, adapt to new conditions, and interact with human users—factors that accuracy metrics alone cannot capture.

3. The Shift from Accuracy to Trust

As AI systems take on more complex and sensitive roles, there is a growing recognition that trust is becoming the ultimate measure of success. Trust encompasses multiple dimensions, including fairness, transparency, reliability, and security. Organizations are beginning to move away from purely technical metrics toward a more holistic evaluation framework that considers how systems behave over time and how they are perceived by users. This shift reflects a broader understanding that AI systems must not only perform well but also inspire confidence among stakeholders.

4. Hidden Risks Across the AI Lifecycle

One of the most significant insights from the discussion was the identification of risks that are often overlooked during the development and deployment of AI systems. These risks are not confined to a single stage but span the entire lifecycle:

  • Data-related risks: Biases embedded in datasets, errors in labeling, and poor data quality can significantly impact outcomes.
  • Design assumptions: Many systems are built on implicit assumptions that are neither documented nor tested, leading to unexpected behavior.
  • Context drift: The environment in which a model operates can change over time, reducing its effectiveness.
  • Post-deployment gaps: Once a system is deployed, accountability often becomes unclear, and continuous monitoring is neglected.

These blind spots can lead to failures even when initial performance metrics appear satisfactory.
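Context drift in particular lends itself to lightweight statistical monitoring. As a minimal sketch (the bin labels and the 0.2 threshold are common conventions, not details from the session), the Population Stability Index compares a feature's distribution at training time against live traffic:

```python
import math
from collections import Counter

def psi(expected, actual, bins):
    """Population Stability Index between two samples of a categorical feature.
    A score above 0.2 is a common rule of thumb for significant drift."""
    e_counts, a_counts = Counter(expected), Counter(actual)
    total_e, total_a = len(expected), len(actual)
    score = 0.0
    for b in bins:
        # A small floor avoids log(0) and division by zero for empty bins.
        p_e = max(e_counts[b] / total_e, 1e-6)
        p_a = max(a_counts[b] / total_a, 1e-6)
        score += (p_a - p_e) * math.log(p_a / p_e)
    return score

# Identical distributions score near 0; a large shift scores well above 0.2.
training = ["low"] * 70 + ["high"] * 30
live = ["low"] * 20 + ["high"] * 80
drift = psi(training, live, bins=["low", "high"])
```

Run periodically against production inputs, a check like this turns "the environment changed" from a post-mortem finding into an alert.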

5. The Complexity of Global Regulations

The discussion also highlighted the challenges posed by the lack of a unified global standard for AI governance and data privacy. Different regions have developed their own regulatory frameworks, each with unique requirements and expectations. This creates a complex landscape for organizations operating across multiple jurisdictions. Systems that are compliant in one region may not meet the standards of another, requiring constant adaptation. The evolving nature of these regulations further complicates the situation, making compliance an ongoing process rather than a one-time achievement.

6. Security as an Integral Design Element

Another important takeaway was the need to rethink how security is approached in AI systems. Instead of treating security as a final checkpoint before deployment, it must be integrated into every stage of development. This involves designing systems with security considerations from the outset, ensuring that vulnerabilities are addressed early rather than patched later. Such an approach not only reduces risks but also aligns with the fast-paced nature of AI development, where late-stage changes can be costly and disruptive.

7. Real-World Deployment Challenges

When AI systems are deployed in real-world environments, a range of operational challenges emerges. These include over-permissioned systems that have access to more data than necessary, lack of domain-specific constraints, and insufficient control mechanisms. In some cases, AI agents may perform tasks beyond their intended scope, leading to unintended consequences. These issues underscore the importance of clearly defining the boundaries within which AI systems operate and ensuring that they are aligned with their intended purpose.
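One way to make such boundaries explicit is a per-agent action allowlist enforced before anything executes. A minimal sketch, with hypothetical agent and tool names:

```python
class ScopeError(Exception):
    """Raised when an agent attempts an action outside its declared scope."""

# Hypothetical policy: every action each agent is permitted to take.
AGENT_POLICY = {
    "support-bot": {"read_ticket", "draft_reply"},
    "billing-bot": {"read_invoice", "issue_refund"},
}

def execute(agent, action, handler, *args):
    # Enforce the allowlist before any handler runs.
    if action not in AGENT_POLICY.get(agent, set()):
        raise ScopeError(f"{agent} is not permitted to {action}")
    return handler(*args)
```

The point of the design is that an agent's scope is declared data, not implicit behavior, so over-permissioning becomes visible and auditable.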

8. The Emergence of Shadow AI

The increasing accessibility of AI tools has led to the rise of “shadow AI,” where individuals within organizations use AI systems independently without proper oversight. While often driven by a desire to innovate or improve efficiency, this practice introduces significant risks. Sensitive data may be exposed, and untested systems may be integrated into workflows without adequate safeguards. Addressing this challenge requires both technical solutions and a cultural shift toward responsible AI usage.
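On the technical side, one common detection pattern is to compare network egress against an allowlist of sanctioned AI services. A minimal sketch, with hypothetical hostnames standing in for a real, maintained list:

```python
# Hypothetical allowlist: the only AI endpoint this organization has approved.
APPROVED_AI_ENDPOINTS = {"api.internal-llm.example.com"}

# Hostnames known to belong to AI services (illustrative, not exhaustive).
KNOWN_AI_ENDPOINTS = {
    "api.openai.com",
    "api.anthropic.com",
    "api.internal-llm.example.com",
}

def flag_shadow_ai(egress_log):
    """Return AI-service hosts seen in egress traffic that are not approved."""
    seen = {entry["host"] for entry in egress_log}
    return sorted((seen & KNOWN_AI_ENDPOINTS) - APPROVED_AI_ENDPOINTS)
```

Detection alone is not a fix, but it gives the cultural conversation a factual starting point: which teams are using which tools, and with what data.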

9. The Challenge of AI Hallucinations

AI hallucinations—instances where systems generate incorrect or fabricated information—remain a persistent issue. Despite advancements in model design, these errors cannot be entirely eliminated. Instead, organizations must focus on mitigating their impact through validation mechanisms and oversight processes. This reinforces the need for layered accountability, where multiple checks are in place to ensure reliability.
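One such validation layer, sketched below under the assumption of a retrieval-augmented setup (function and field names are illustrative), rejects answers whose citations do not correspond to documents actually retrieved:

```python
def validate_citations(answer_citations, retrieved_ids):
    """One validation layer for a retrieval-augmented pipeline: accept an
    answer only if every citation maps to a document actually retrieved."""
    unknown = [c for c in answer_citations if c not in retrieved_ids]
    if unknown:
        return False, f"unsupported citations: {unknown}"
    if not answer_citations:
        return False, "no citations supplied; route to human review"
    return True, "ok"
```

A check like this cannot prove an answer correct, but it cheaply catches a whole class of fabrication and routes doubtful cases to a human, which is exactly the layered-accountability posture the discussion called for.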

10. Data as Both an Asset and a Challenge

While data is often described as the fuel of AI, the discussion revealed that managing data effectively is one of the most challenging aspects of AI development. Collecting high-quality data requires significant effort and resources, and legal restrictions can complicate cross-border data transfers. Even after data is collected and processed, it may not always meet the requirements for training effective models. This highlights the need for careful planning and validation at every stage of the data lifecycle.

11. The Importance of a Structured Data Strategy

A recurring theme was the lack of a comprehensive data strategy in many organizations. Without a clear framework for managing data, organizations risk inefficiencies and vulnerabilities. A robust data strategy should include classification, access control, and lifecycle management, ensuring that data is treated as a critical asset. Such an approach not only enhances security but also supports the development of more reliable AI systems.
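A classification-plus-clearance scheme of the kind described can be sketched in a few lines (the tier names and role mappings are illustrative, not prescriptive):

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

# Illustrative mappings: every dataset carries a classification,
# every role a clearance level.
DATASETS = {"marketing_copy": Sensitivity.PUBLIC,
            "customer_pii": Sensitivity.RESTRICTED}
CLEARANCE = {"analyst": Sensitivity.INTERNAL,
             "dpo": Sensitivity.RESTRICTED}

def can_access(role, dataset):
    """Grant access only when clearance meets or exceeds the classification."""
    return CLEARANCE.get(role, Sensitivity.PUBLIC) >= DATASETS[dataset]
```

Once classification is machine-readable, access control and lifecycle rules (retention, deletion, training eligibility) can all key off the same labels.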

12. Governance as the Backbone of AI Systems

Governance plays a crucial role in ensuring that AI systems operate within defined boundaries. It involves establishing policies, setting standards, and monitoring compliance throughout the lifecycle. Unlike operational management, governance focuses on creating the structures that guide decision-making. Effective governance ensures consistency, reduces risks, and supports the responsible use of AI.

13. Measuring Human Impact

One of the most important yet often overlooked aspects of AI evaluation is its impact on users. AI systems can influence behavior, decision-making, and societal outcomes in ways that are not immediately apparent. Evaluating these effects requires a long-term perspective and continuous monitoring. By considering human impact, organizations can ensure that their systems contribute positively to society.

14. Building Trust Through Design

Moving from compliance to trust requires a proactive approach to system design. Features such as transparency, user control, and data minimization can enhance trust and improve user experience. Trust is not built through policies alone but through consistent and predictable system behavior. By prioritizing user-centric design, organizations can create systems that are both effective and trustworthy.

15. The Need for Interdisciplinary Collaboration

The discussion emphasized the importance of collaboration between technical, legal, and business teams. As AI systems become more complex, no single discipline can address all the challenges involved. Bridging the gap between these domains is essential for developing systems that are both innovative and responsible.

Conclusion

The session underscored a critical shift in how AI systems should be evaluated. While accuracy remains an important metric, it is no longer sufficient on its own. The future of AI lies in building systems that are accountable, transparent, and aligned with human values. This requires a comprehensive approach that considers the entire lifecycle of AI systems, from data collection and model design to deployment and long-term impact. By expanding the scope of measurement to include trust, governance, and human impact, organizations can move toward a more responsible and sustainable AI ecosystem.

Categories
Events

Ethics by Design: Global Leaders Convene to Address AI’s Moral Imperative

In a world where ChatGPT gained 100 million users in two months, an accomplishment that took the telephone 75 years, the importance of ethical technology has never been more pressing. On November 14th, Open Innovator hosted a global panel on “Ethical AI: Ethics by Design,” bringing together experts from four continents for a 60-minute virtual conversation moderated by Naman Kothari of NASSCOM. The panelists were Ahmed Al Tuqair from Riyadh, Mehdi Khammassi from Doha, Bilal Riyad from Qatar, Jakob Bares from WHO in Prague, and Apurv from the Bay Area. They discussed how ethics must evolve alongside rapidly advancing AI systems and why shared accountability is now required for meaningful, safe technological progress.

Ethics: Collective Responsibility in the AI Ecosystem

The discussion quickly established that ethics cannot be attributed to a single group; instead, founders, investors, designers, and policymakers together form a collective accountability architecture. Ahmed stressed that ethics by design must start at ideation, not as a late-stage audit. Raya Innovations evaluates early-stage ventures on both market fit and social impact, asking direct questions about bias, harm, and unintended consequences before any code is written. Mehdi expanded this into three pillars: human-centricity, openness, and responsibility, stating that technology should remain a benefit to humans rather than a danger. Jakob added the algorithmic layer, arguing that values must be expressed as testable requirements and architectural patterns. With the WHO deploying multiple AI technologies, defining the human role in increasingly automated operations has become critical.

Structured Speed: Innovating Responsibly While Maintaining Momentum

Maintaining both speed and responsibility became a common topic. Ahmed proposed “structured speed,” in which quick, repeatable ethical assessments are integrated directly into agile development. These are not bureaucratic restrictions, but rather concise, practical prompts: what is the worst-case situation for misuse? Who might be excluded by the default options? Do partners adhere to key principles? The goal is to incorporate clear, non-negotiable principles into daily workflows rather than forming large committees. As a result, Ahmed claimed, ethics becomes a competitive advantage, allowing businesses to move rapidly and with purpose. Without such guidance, rapid innovation risks becoming disruptive noise. This narrative resonated with the panelists, emphasizing that prudent development can accelerate, rather than delay, long-term growth.

Cultural Contexts and Divergent Ethical Priorities

Mehdi demonstrated how ethics differs across cultural and economic environments. Individual privacy is a priority in Western Europe and North America, as evidenced by comprehensive consent procedures and rigorous regulatory frameworks. In contrast, many African and Asian regions prioritize collective stability and accessibility while operating under less stringent regulatory control. Emerging markets frequently center ethical discussions on inclusion and opportunity, whereas industrialized economies prioritize risk minimization. Despite these differences, Mehdi pushed for universal ethical principles, arguing that all people, regardless of location, deserve equal protection. He admitted, however, that inconsistent regulations produce dramatically different realities. This cultural lens highlighted that while ethics is universally relevant, its local expression, and the issues connected with it, remain intensely context-dependent.

Enterprise Lessons: The High Costs of Ethical Oversights

Bilal highlighted stark lessons from enterprise organizations, where ethical failings have multimillion-dollar consequences. At Microsoft, retrofitting ethics into existing products resulted in enormous disruptions that could have been prevented with early design assessments. He outlined enterprise “tenant frameworks,” in which each feature is subject to sign-offs across privacy, security, accessibility, localization, and geopolitical domains—often with 12 or more reviews. When crises arise, these systems maintain customer trust while also providing legal defenses. Bilal used Google Glass as a cautionary tale: billions were lost because privacy and consent concerns were disregarded. He also mentioned Workday’s legal challenges over alleged employment bias. While established organizations can weather such storms, startups rarely can, making early ethical guardrails a requirement of survival rather than preference.

Public Health AI: Designing for Integrity and Human Autonomy

Jakob provided a public-health viewpoint, highlighting how AI design decisions can affect millions. Following significant budget constraints, WHO’s most recent AI systems are aimed at enhancing internal procedures such as reporting and finance. In one donor-reporting tool, the team focused on “epistemic integrity,” ensuring outputs are factual while protecting employee autonomy. Jakob warned against Goodhart’s Law, where overoptimizing a particular metric comes at the expense of overall value. The team put in place protections against surveillance overreach, automation bias, power imbalances, and data exploitation. Maintaining checks and balances across measures guarantees that efficiency gains do not compromise quality or hurt employees. His remarks made clear that ethical deployment necessitates continual monitoring rather than one-time judgments, especially when AI takes over duties previously performed by specialists.
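The checks-and-balances idea maps naturally to paired metrics: an optimization is accepted only if the target improves and no counter-metric regresses beyond a tolerance. A minimal sketch, with hypothetical metric names:

```python
def metric_guard(before, after, target, counter_metrics, max_regression=0.05):
    """Accept a change only if the target metric improved AND no paired
    counter-metric regressed by more than `max_regression` (a fraction)."""
    if after[target] <= before[target]:
        return False
    return all(after[m] >= before[m] * (1 - max_regression)
               for m in counter_metrics)
```

Guarding every optimization behind a counter-metric is a direct, mechanical hedge against Goodhart's Law: the target can no longer be gamed at the expense of the quality it was meant to proxy.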

Aurva’s Approach: Security and Observability in the Agentic AI Era

The panel then moved on to practical solutions, with Apurv introducing Aurva, an AI-powered data security copilot inspired by Meta’s post-Cambridge Analytica revisions. Aurva enables enterprises to identify where data is stored, who has access to it, and how it is used—which is crucial in contexts where information is scattered across multiple systems and providers. Its technologies detect misuse, restrict privilege creep, and give users visibility into AI agents, models, and permissions. Apurv contrasted generative AI, which behaves like a maturing junior engineer, with agentic AI, which operates independently like a senior engineer making multi-step judgments. This autonomy necessitates supervision. Aurva serves 25 customers across different continents, with a strong focus on banking and healthcare, where AI-driven risks and regulatory needs are highest.
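Privilege-creep detection of the kind this product category addresses can be approximated by auditing granted permissions against their last recorded use. A minimal, illustrative sketch (not Aurva's actual implementation):

```python
from datetime import date

def stale_grants(grants, last_used, today, max_idle_days=90):
    """Flag (principal, permission) pairs that were granted but never used,
    or not used recently: the raw material of privilege creep."""
    flagged = []
    for principal, perm in grants:
        used = last_used.get((principal, perm))
        if used is None or (today - used).days > max_idle_days:
            flagged.append((principal, perm))
    return flagged
```

Revoking what this flags shrinks the blast radius of any single compromised agent or credential, which is why observability and least privilege are usually discussed together.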

Actionable Next Steps and the Imperative for Ethical Mindsets

In conclusion, panelists provided concrete advice: begin with human-impact visibility, undertake early bias and harm evaluations, construct feedback loops, teach teams to acquire a shared ethical understanding, and implement observability tools for AI. Jakob underlined the importance of monitoring, while others stressed that ethics must be integrated into everyday decisions rather than marketing clichés. The virtual event ended with a unifying message: ethical AI is no longer optional. As agentic AI becomes more independent, early, preemptive frameworks protect both consumers and companies’ long-term viability.

Reach out to us at open-innovator@quotients.com or drop us a line to delve into the transformative potential of groundbreaking technologies and participate in our events. We’d love to explore the possibilities with you.