Model evaluation notebook tutorials
This section provides the following notebook tutorials, which include example code and explanations:
- How to evaluate a JumpStart model for prompt stereotyping.
- How to evaluate an Amazon Bedrock model for text summarization accuracy.
Additional notebooks
The fmeval GitHub repository contains the following additional notebooks:
- bedrock-claude-factual-knowledge.ipynb – Evaluates an Anthropic Claude 2 model hosted on Amazon Bedrock for factual knowledge.
- byo-model-outputs.ipynb – Evaluates a Falcon 7b model hosted on JumpStart for factual knowledge, using model outputs that you supply instead of sending inference requests to your model.
- custom_model_runner_chat_gpt.ipynb – Evaluates a custom ChatGPT 3.5 model hosted on Hugging Face for factual knowledge.
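The idea behind bringing your own model outputs can be sketched in plain Python, without calling fmeval or any hosted model. The example below is a rough illustration only, not the fmeval implementation: the `Record` fields, the `<OR>` separator between acceptable answers, and the case-insensitive substring check are all assumptions chosen for this sketch.

```python
from dataclasses import dataclass
from typing import List

# Assumed separator between acceptable answers (hypothetical for this sketch).
TARGET_DELIMITER = "<OR>"


@dataclass
class Record:
    """One evaluation row with a precomputed model response (hypothetical schema)."""
    prompt: str
    target_output: str  # one or more acceptable answers
    model_output: str   # supplied up front, so no inference request is sent


def factual_knowledge_score(record: Record, delimiter: str = TARGET_DELIMITER) -> int:
    """Return 1 if any acceptable answer appears in the model output, else 0."""
    output = record.model_output.lower()
    targets = [t.strip().lower() for t in record.target_output.split(delimiter)]
    # Skip empty targets so a stray delimiter cannot match everything.
    return int(any(t and t in output for t in targets))


def mean_score(records: List[Record]) -> float:
    """Average the per-record scores; an empty dataset scores 0.0."""
    scores = [factual_knowledge_score(r) for r in records]
    return sum(scores) / len(scores) if scores else 0.0


records = [
    Record("The capital of France is", "Paris", "The capital of France is Paris."),
    Record("The largest planet is", "Jupiter", "Saturn is the largest planet."),
    Record("The UK's capital is", "London<OR>City of London", "london, of course"),
]

if __name__ == "__main__":
    for r in records:
        print(r.prompt, "->", factual_knowledge_score(r))
    print("mean:", mean_score(records))
```

Because the responses are stored with the dataset, the same records can be rescored with a different matching rule without querying the model again.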