Evaluation metric example: Correctness (judged by AI) Workflow Solution

Workflow overview

Why this workflow matters

A reusable building block for checking whether an AI workflow's answers match reference answers.

AI evaluation in AlekSystem

This is a template for AlekSystem's evaluation feature. Evaluation is a technique for gaining confidence that your AI workflow performs reliably: you run a test dataset containing different inputs through the workflow and calculate a metric (score) for each input, which shows where the workflow performs well and where it doesn't.

How it works

This template shows how to calculate one workflow evaluation metric, correctness: whether an output matches an expected output (i.e. has the same meaning). The workflow answers questions about the causes of historical events, and its answers are compared with the reference answers in the dataset.

An evaluation trigger reads in the dataset. It is wired up in parallel with the regular chat trigger so that the workflow can be started from either one.

If we're evaluating (i.e. the execution started from the evaluation trigger), we calculate the correctness metric using AI and pass it back to AlekSystem as a metric.

If we're not evaluating, we skip the metric calculation to reduce cost.
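The branching described above can be sketched in plain Python. This is only an illustration of the idea, not AlekSystem's implementation: the names run_once, evaluate, stub_answer and stub_judge are hypothetical, the 0-to-1 score scale is an assumption, and the judge shown here is a trivial stand-in for the AI call that would decide whether two answers have the same meaning.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signatures; neither comes from AlekSystem.
AnswerFn = Callable[[str], str]              # question -> workflow answer
JudgeFn = Callable[[str, str, str], float]   # (question, answer, reference) -> score


@dataclass
class RunResult:
    answer: str
    correctness: Optional[float] = None  # set only during evaluation runs


def run_once(question: str, answer_fn: AnswerFn, judge_fn: JudgeFn,
             reference: Optional[str] = None) -> RunResult:
    """Answer a question; score it only when a reference answer is supplied."""
    answer = answer_fn(question)
    if reference is None:
        # Chat-trigger path: skip the judge call to avoid its cost.
        return RunResult(answer)
    # Evaluation-trigger path: ask the judge whether the meanings match.
    return RunResult(answer, judge_fn(question, answer, reference))


def evaluate(dataset, answer_fn: AnswerFn, judge_fn: JudgeFn):
    """Run every dataset row through the workflow and collect per-row scores."""
    return [
        run_once(row["question"], answer_fn, judge_fn, row["reference"]).correctness
        for row in dataset
    ]


def stub_answer(question: str) -> str:
    # Placeholder for the AI agent that would answer the question.
    return "placeholder answer"


def stub_judge(question: str, answer: str, reference: str) -> float:
    # Placeholder for an AI judge: a case-insensitive exact match,
    # which is far cruder than a real "same meaning" comparison.
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0
```

Calling run_once without a reference corresponds to a run started from the chat trigger, and evaluate corresponds to a run started from the evaluation trigger over the whole dataset. Passing the judge in as a function keeps the costly call in one place, so it is easy to confirm it is skipped outside evaluation.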

Best fit

Services

AI Agent, OpenAI Chat Model, OpenAI, Evaluation

Use cases

business process automation

Related AlekSystem workflow ideas

Automated AI Timesheets for Consulting Teams

Executive AI Briefing and Follow-Up Assistant

Automated Multi-Channel Customer Support with Gmail, Telegram, and GPT AI Solution