Governance perspective: Managing an AI-driven organization

Managing, optimizing, and scaling the organizational AI initiative is at the core of the governance perspective. Incorporating AI governance into an organization’s AI strategy is instrumental in building trust, enabling the deployment of AI technologies at scale, and overcoming challenges to drive business transformation and growth. By driving consistency, AI governance enables alignment with organizational goals and ensures that AI technologies are used ethically and managed effectively. To that end, AI governance frameworks establish consistent practices across the organization to address organizational risks, ethical deployment, data quality and usage, and regulatory compliance, as well as to manage the different cost patterns of AI workloads. Creating scalable processes and standards for AI deployments allows organizations to expand initiatives across business units and create long-term business value.

Building an AI governance practice requires close alignment with the organization's AI strategy. The first step is to identify all the key stakeholders and bring together a team with representation from multiple business units. This team will be responsible for:

  • Defining governance goals, including compliance and ethical goals, and identifying areas of potential risk.

  • Developing policies and guidelines covering data, transparency, responsible AI, and compliance.

  • Defining mechanisms to monitor AI system performance, compliance, and bias, and to determine actions based on predefined thresholds (see the sketch after this list).

  • Continuously revising results and existing policies to ensure alignment with business goals and AI safety.
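
As an illustration of such monitoring mechanisms, the following is a minimal sketch of threshold-based checks. The metric names, thresholds, and alerts are hypothetical placeholders; in practice, metrics would come from a monitoring service such as Amazon CloudWatch and alerts would feed your governance workflow.

```python
# Minimal sketch of threshold-based governance monitoring.
# All metric names and limits below are illustrative assumptions.

THRESHOLDS = {
    "accuracy": {"min": 0.90},                  # model quality floor
    "bias_disparity": {"max": 0.10},            # max tolerated subgroup gap
    "cost_per_1k_requests_usd": {"max": 0.50},  # budget guardrail
}

def evaluate(metrics: dict) -> list:
    """Return governance alerts triggered by the current metric values."""
    alerts = []
    for name, limits in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported in this cycle
        if "min" in limits and value < limits["min"]:
            alerts.append(f"ALERT: {name}={value} below {limits['min']}")
        if "max" in limits and value > limits["max"]:
            alerts.append(f"ALERT: {name}={value} above {limits['max']}")
    return alerts

print(evaluate({"accuracy": 0.87, "bias_disparity": 0.04}))
```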

In this perspective, we describe solutions to common governance challenges and introduce a new capability, Responsible use of AI: a decisive element for future competitive advantage in the AI space.

Foundational Capability | Explanation
Cloud Financial Management (CFM) | Plan, measure, and optimize the cost of AI in the cloud.
Data Curation | Create value from data catalogs and products.
Risk Management | Leverage the cloud to mitigate and manage the risks inherent to AI.
Responsible use of AI | Foster continual AI innovation through responsible use.
Program and Project Management | This capability is not enriched for AI; refer to the AWS CAF.
Data Governance | This capability is not enriched for AI; refer to the AWS CAF.
Benefits Management | This capability is not enriched for AI; refer to the AWS CAF.
Application Portfolio Management | This capability is not enriched for AI; refer to the AWS CAF.

Cloud Financial Management (CFM)

Plan, measure, and optimize the cost of AI in the cloud.

Managing AI projects in the cloud involves planning for the cost structure of training and inference. This is important to consider in advance when budgeting for individual projects as well as for the overall funding of AI initiatives. One example of such a cost structure over the AI lifecycle is a zig-zag pattern of alternating low and high costs:

  • You might start with a high initial cost to establish or improve the quality of the data needed to build your solution; if the data is already in good shape, this initial cost may be very low. This is followed by a potentially volatile proof-of-concept phase.

  • While most AI proof-of-concept (POC) initiatives may be relatively low-cost compute-wise, a few technical approaches can quickly become costly, such as training larger models (in the context of generative AI) or constantly retraining domain-specific ML models. In such cases, you can leverage purpose-built AI hardware like Amazon Elastic Compute Cloud (Amazon EC2) Trn1 instances powered by AWS Trainium or Amazon EC2 Inf2 instances powered by AWS Inferentia2 to help keep costs low. If you have access to the right talent, AI services, and AWS Partners, leverage their expertise to estimate the resources needed for different phases of your use cases and overall AI strategy. If feasible, calculate what an incremental improvement of an ML metric is worth to decide how to optimize your investment.

  • After the first iteration of the system is built, the next phase of building a minimum viable product (MVP) may have a relatively high cost; for example, to generalize the system’s capability or acquire edge-case and long-tail data that is crucial for user adoption. If you are working on a use case that requires generative AI capabilities, evaluate using or fine-tuning foundation models, since that can have a significant positive cost impact: the initial training costs have already been absorbed by your supplier or vendor (for example, Amazon Titan foundation models available through Amazon Bedrock).

  • After AI models are deployed, inference cost is largely dependent on the volume of requests, and in many cases it is relatively low. If not, you can leverage the purpose-built AWS Inferentia architecture. At this stage, monitoring model metrics and flagging drift alerts you to changes and the potential need to retrain your algorithms. You can leverage the low costs of scaling in the cloud. Throughout the AI lifecycle, it is important to track costs and tag all resources and ML workloads, as sketched below.
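
A minimal sketch of such tagging, assuming boto3 is installed and AWS credentials are configured. The resource ARN and tag values are placeholders; to make the tags visible in Cost Explorer, activate them as cost allocation tags in the AWS Billing console.

```python
# Minimal sketch: tag an ML resource so its cost can be allocated
# to a project and lifecycle phase. ARN and values are placeholders.
import boto3

sagemaker = boto3.client("sagemaker")
sagemaker.add_tags(
    ResourceArn="arn:aws:sagemaker:us-east-1:123456789012:training-job/demo-job",
    Tags=[
        {"Key": "project", "Value": "churn-prediction"},
        {"Key": "lifecycle-phase", "Value": "poc"},   # poc | mvp | production
        {"Key": "cost-center", "Value": "ml-platform"},
    ],
)
```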

Once you have cost-visibility measures in place, it is critical to analyze the data, training, and inference costs over time. Many problem types (text, forecasting, document processing) cost little in their infancy, but their costs grow linearly with data size. Other AI problems, such as those that rely on audio and voice data, have a much higher start-up cost and need well-defined goals even in the POC phase to avoid unexpected charges. Aligning your AI vision with the business goals should inform how you scope the work, and establishing mechanisms to calculate the tradeoffs between model costs and model performance is critical for maintaining positive ROI (see the sketch below). Additionally, the cost of data acquisition is strongly influenced by the mechanisms that organizations establish around their data process. A standard process for acquiring new data and master data is key to keeping costs down, as is keeping data in formats where it can be used for AI (with reduced copy/read/copy or ETL needs). The cloud helps with all of these challenges through governed data services and zero-ETL patterns.
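
One way to make that tradeoff concrete is a back-of-the-envelope calculation like the sketch below. All figures are illustrative assumptions, not benchmarks; substitute your own value per correct decision and your measured serving costs.

```python
# Minimal sketch: net monthly value of an accuracy improvement
# minus the extra cost it incurs. All inputs are assumptions.

def incremental_roi(value_per_correct, monthly_requests,
                    accuracy_gain, extra_monthly_cost):
    """Extra value created by the accuracy gain, net of added cost."""
    extra_value = value_per_correct * monthly_requests * accuracy_gain
    return extra_value - extra_monthly_cost

# Example: +1% accuracy on 500k requests/month, each correct
# decision worth $0.20, at $600/month extra training/serving cost.
print(incremental_roi(0.20, 500_000, 0.01, 600.0))  # 400.0 -> worthwhile
```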

Beyond this, always connect your AI initiative to an underlying business goal. If it relates to a new revenue stream, estimate how much revenue is likely to be associated with each success criterion, and translate business value into your AI metrics. Factor in the often-underestimated cost of not recognizing the need for the responsible use of AI. Due to its importance, we have added Responsible use of AI as a new capability later in this perspective.

Data curation

Create value from data catalogs and products.

Your ability to acquire, label, clean, process, and interact with data will increase your speed, decrease time-to-value, and boost your model’s performance (such as accuracy). When a model’s accuracy stalls, consider going back and enriching, growing, or improving the data you feed the algorithm. Doing so is often much easier than rearchitecting or squeezing out the next percent of performance through modeling alone.

Collecting data with ML in mind is crucial to achieving your AI roadmap and you should ask yourself and other leaders: “Are we enabling AI innovation through democratizing data?”, “Does my organization think of my data as a product?” and, “Is my data discoverable across my organization?” While answers to these questions often sit on a spectrum between yes and no, the key thing to remember is that it’s all about reinforcing a culture where data is recognized as the genesis of modern invention. Treating data as code and making it a first-class citizen in your business should not be an afterthought.

Data quality assessments and governance rules can either accelerate the use of your data or stop all progress. Balance these two and use proper tooling to allow your whole organization to innovate; a minimal sketch of automated quality checks follows. Have direct owners of datasets or data stewards, which in turn helps you build a robust data ecosystem. Start small and then continually add to your data mesh, as this keeps the data flywheel spinning. Make your data accessible and discoverable by different means for different user types. This approach gives you greater visibility into work happening in your environment and avoids shadow DataOps.
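
The following is a minimal sketch of such automated quality checks using pandas; the column names and rules are hypothetical. Managed options such as AWS Glue Data Quality provide similar rule-based checks at scale.

```python
# Minimal sketch: completeness and uniqueness checks that gate
# publication of a dataset. Columns and rules are placeholders.
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    return {
        "row_count_ok": len(df) > 0,
        "customer_id_complete": df["customer_id"].notna().all(),
        "customer_id_unique": df["customer_id"].is_unique,
        "signup_date_parseable": pd.to_datetime(
            df["signup_date"], errors="coerce").notna().all(),
    }

df = pd.DataFrame({"customer_id": [1, 2, 3],
                   "signup_date": ["2024-01-05", "2024-02-11", "2024-03-02"]})
print(quality_report(df))  # publish only if all checks pass
```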

Easy-to-use, human-readable data repositories, catalogs, and dictionaries can provide a centralized and organized repository of data and metadata about the organization’s data assets, which empowers teams of all skill levels to discover, understand, and collaborate on data, and to start using it to create business value (see the catalog sketch below). This considerably increases the speed of deciding on the additional investment needed for other use cases. There are many ways to increase your data’s potential, such as buying external data sources, augmenting or creating synthetic data through ML algorithms, crowdsourcing a team to label your internal data, or even changing your business practices to automate data generation and capture. It is strategic to develop practices for deciding when to use each of these resources.
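
As one possible pattern, the sketch below registers a dataset in the AWS Glue Data Catalog so it becomes discoverable. It assumes an existing Glue database named analytics and configured credentials; the table name, S3 path, columns, and owner metadata are placeholders.

```python
# Minimal sketch: register a curated dataset in a central catalog,
# including ownership metadata for discoverability. Values are placeholders.
import boto3

glue = boto3.client("glue")
glue.create_table(
    DatabaseName="analytics",
    TableInput={
        "Name": "customer_events",
        "Description": "Curated clickstream events; owner: data-platform team",
        "Parameters": {"classification": "parquet", "data_owner": "data-platform"},
        "StorageDescriptor": {
            "Location": "s3://example-bucket/curated/customer_events/",
            "Columns": [
                {"Name": "customer_id", "Type": "bigint"},
                {"Name": "event_type", "Type": "string"},
                {"Name": "event_time", "Type": "timestamp"},
            ],
        },
    },
)
```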

Risk management

Use the cloud to mitigate and manage the risks inherent to AI.

While every new technology comes with a new set of risks, managing the risks involved in the design and development of AI systems, as well as in their deployment and long-term operation, is challenging due to the non-deterministic nature of AI models. Some risks are financial. Start by factoring the risk of sunk cost into the development process, as the outcome of an AI development initiative is hard to guarantee upfront (you are optimizing a system toward an outcome rather than deterministically building it to produce one). Establish solid practices, such as model cards and adversarial input testing, and mechanisms such as POCs, minimum lovable products (MLPs), and MVPs, to mitigate and control risks; a minimal model card sketch follows.
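
A minimal sketch of a model card kept alongside the model artifact; the fields are a common subset and all values are placeholders. Amazon SageMaker also offers managed model cards.

```python
# Minimal sketch: a lightweight model card that records intended use,
# evaluation results, and known limitations. All values are illustrative.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelCard:
    name: str
    intended_use: str
    training_data: str
    evaluation: Dict[str, float]          # metric name -> held-out value
    known_limitations: List[str] = field(default_factory=list)
    risk_rating: str = "unrated"          # e.g., low | medium | high

card = ModelCard(
    name="churn-classifier-v3",
    intended_use="Rank retention offers; not for credit or hiring decisions.",
    training_data="2023 CRM extract; see data card for provenance.",
    evaluation={"auc": 0.91, "recall_at_top_decile": 0.64},
    known_limitations=["Untested on accounts younger than 30 days"],
    risk_rating="medium",
)
print(card)
```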

Other risks are of a legal and ethical nature. These include risks as classified by your local legislature (for example, in the European Union) and risks inherent to AI, such as hidden feedback loops, misinterpretation of uncalibrated outputs, and unexpected outcomes that may negatively impact different groups of people. Also consider the professional, organizational, and even societal use and impact of your system (such as echo chambers or long-term impact on customer behavior). For more information, see Responsible use of AI.

Developing and adopting safeguards and architectures that constrain the system when necessary, not just in safety-critical environments, is a priority. Make sure that subsystem failures don’t propagate to and compound in downstream AI systems. Consider which themes are relevant, such as explainability, transparency, and interpretability. Manage these risks not just for a single AI-influenced decision or action, but across the process or larger system you act in. Capture the long-term challenges that drift of data and concepts in the world can pose to your system (a minimal drift-detection sketch follows) and invest in hardening it against bad actors (see Security perspective: Compliance and assurance of AI/ML systems). Lastly, don’t minimize the complexity of reaching human-level parity in certain domains.
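
As one common technique, the sketch below detects distribution drift with the population stability index (PSI). The bin count and the 0.2 alert threshold are conventional rules of thumb, not AWS guidance, and the data is synthetic.

```python
# Minimal sketch: population stability index (PSI) to compare a feature's
# live distribution against its training-time baseline. Synthetic data.
import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    a_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    e_pct = np.clip(e_pct, 1e-6, None)    # avoid log(0) on empty bins
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)   # training-time distribution
live = rng.normal(0.5, 1.0, 10_000)       # shifted production distribution
score = psi(baseline, live)
print(score, "-> investigate" if score > 0.2 else "-> stable")
```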

Responsible use of AI

Foster continuous AI innovation through responsible AI practices.

Until recently, the responsible use of this powerful new technology was often an afterthought, as organizations focused exclusively on the technical aspects of developing AI solutions and the specific business goal desired. However, the recognition that AI systems learn from vast amounts of data, and that what a system learns is not always what you intended, has made it critical to focus on responsible AI practices. These practices are key to fostering continuous AI innovation and ensuring that AI solutions are developed, deployed, and used ethically, transparently, and without bias. The broader the use and impact of your application, the more important this becomes. Therefore, consider and address the responsible use of AI (RAI) early in your AI journey and throughout its lifecycle.

Establish an AI governance board with representation from multiple business units (like research, human resources, diversity and inclusion, legal, government and regulatory affairs, procurement, and communications) to work closely with, or as part of, AI leadership teams to ensure AI solutions are safe and cause no harm to employees, customers, and society at large. This board should be responsible for overseeing and guiding the ethical and responsible development, deployment, and use of AI technologies, and for driving alignment with industry regulations and compliance with AI-focused legislation. Scale how responsible AI impacts your design, development, and operations over time. Consider how your system affects individuals, subgroups of the population, your users, and customers, as well as society. Given the speed at which AI can be scaled in the cloud, consider how key responsible AI dimensions like explainability, fairness, governance, privacy, security, robustness, and transparency are being included, as well as how different cultures and demographics are impacted by the technology. Make it a key part of your AI vision, including well-thought-out principles and tenets around the responsible use of AI and how it affects your initiatives. In particular, include algorithmic fairness, diverse and inclusive representation, and bias detection (a minimal bias-check sketch follows).
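
The following is a minimal sketch of one such bias check, the demographic parity difference: the gap in positive-prediction rates across groups. The group labels, data, and 0.1 review threshold are illustrative assumptions; dedicated tooling such as Amazon SageMaker Clarify covers many more fairness metrics.

```python
# Minimal sketch: demographic parity difference across two groups.
# Data and the review threshold are illustrative assumptions.
import numpy as np

def demographic_parity_difference(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Gap in P(prediction = 1) between the groups present in `group`."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return float(max(rates) - min(rates))

y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
group = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
gap = demographic_parity_difference(y_pred, group)
print(gap, "-> review model" if gap > 0.1 else "-> within tolerance")
```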

Embed explainability by design into your AI lifecycle where possible, and establish practices to recognize and discover both intended and unintended biases. Consider using the right tools to help you monitor the status quo and inform risk. Use best practices that enable a culture of responsible use of AI, and build or use systems that enable your teams to inspect these factors; a minimal explainability sketch follows. While this cost accumulates before the algorithms reach production, it pays off in the mid-term by mitigating damage. Especially when you are planning to build, tune, or use a foundation model, inform yourself about emerging concerns like hallucinations, copyright infringement, model data leakage, and model jailbreaks. Ask if, and how, the original vendor or supplier has taken an RAI approach to development, as this trickles down directly into your business case.
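
A minimal sketch of one model-agnostic explainability technique, permutation importance, using scikit-learn on synthetic data; the model and data are placeholders. Attributions like these can be logged per release to support audits and bias reviews.

```python
# Minimal sketch: permutation importance as a model-agnostic
# explainability check. Synthetic data and model are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)

for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: {score:.3f}")  # accuracy drop when feature is shuffled
```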

Note

The AWS Responsible Use of AI team has written a whitepaper on this subject.