AlekSystem Workflow Detail

Evaluate AI Agent Response Correctness with OpenAI and RAGAS Methodology Workflow Solution

Rank 51 Verified workflow

Workflow overview

Why this workflow matters

Useful for software delivery and engineering operations. Supports knowledge capture and document intelligence use cases.

This AlekSystem template demonstrates how to calculate the evaluation metric "Correctness", which in this scenario compares and classifies the agent's response against a set of ground truths. The scoring approach is adapted from the open-source evaluations project RAGAS, and you can see the source here: https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_correctness.py

How it works

This evaluation works best where the agent's response is allowed to be more verbose and conversational.

For our scoring, we classify the agent's response into 3 buckets: True Positive (in answer and ground truth), False Positive (in answer but not ground truth) and False Negative (not in answer but in ground truth).

We also calculate an average similarity score for the agent's response against all ground truths.

The classification and the similarity score are then averaged to give the final score.

A high score indicates the agent is accurate, whereas a low score could indicate the agent has incorrect training data or is not providing a comprehensive enough answer.

Requirements

AlekSystem version 1.94+

Check out this Google Sheet for sample data: https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing
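The scoring described under "How it works" can be sketched in a few lines of Python. This is a minimal illustration, not the template's actual node logic. It assumes the three buckets are reduced to an F1-style ratio, TP / (TP + 0.5 × (FP + FN)), and that "averaged" means an equal-weight mean of that ratio and the mean similarity. The function names, the `weights` parameter and the sample numbers are hypothetical.

```python
from statistics import mean


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1-style ratio from bucket counts (assumed reduction of the 3 buckets)."""
    denom = tp + 0.5 * (fp + fn)
    # Guard against an empty classification (no statements in any bucket).
    return tp / denom if denom > 0 else 0.0


def correctness_score(tp, fp, fn, similarities, weights=(0.5, 0.5)):
    """Combine the classification ratio with the mean similarity.

    `similarities` holds one similarity value per ground truth.
    `weights` is a hypothetical knob; equal weights give a plain average.
    """
    factuality = f1_from_counts(tp, fp, fn)
    similarity = mean(similarities) if similarities else 0.0
    w_fact, w_sim = weights
    return (w_fact * factuality + w_sim * similarity) / (w_fact + w_sim)


# Made-up example: 3 true positives, 1 false positive, 1 false negative,
# and similarity values against two ground truths.
score = correctness_score(tp=3, fp=1, fn=1, similarities=[0.9, 0.7])
print(round(score, 3))
```

In this example the classification ratio is 3 / (3 + 0.5 × 2) = 0.75 and the mean similarity is 0.8, so the equal-weight average is 0.775. If the two components should count differently, the `weights` argument can be adjusted.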

Best fit

Categories

AI/ML, Communication, DevOps, Document Ops

Services

AI Agent, Basic LLM Chain, OpenAI Chat Model, Structured Output Parser, Evaluation

Use cases

engineering workflow automation