Workflow overview
Why this workflow matters
This workflow is useful for software delivery and engineering operations, and it can improve the productivity of internal consulting operations.
The original LLM Council concept was introduced by Andrej Karpathy and published as an open-source repository demonstrating multi-model consensus and ranking. This workflow is my adaptation of that original idea, reimplemented and structured as a production-ready AlekSystem template.

Original repository - https://github.com/karpathy/llm-council

This AlekSystem template implements the LLM Council pattern: a single user question is processed in parallel by multiple large language models, independently evaluated by peer models, and then synthesized into one high-quality, consensus-driven final answer. It is designed for use cases where answer quality, balance, and reduced single-model bias are critical.

Section 1: Trigger & Input

When Chat Message Received (Chat Trigger)

Purpose: Receives a user's message and initiates the entire workflow.

How it works:
- A user sends a chat message
- The message is stored as the Original Question
- The same input is forwarded simultaneously to multiple LLM pipelines

Why it matters: Provides a clean, unified entry point for all downstream multi-model logic.

Section 2: Stage 1 - Parallel LLM Responses

Basic LLM Chains (x4)

Models used:
- Anthropic Claude
- OpenAI GPT
- xAI Grok
- Google Gemini

Purpose: Each model independently generates its own response to the same question.

Key characteristics:
- Identical prompt structure for all models
- Independent reasoning paths
- No shared context between models

Why it matters: Produces diverse perspectives, reasoning styles, and solution approaches.

Section 3: Stage 2 - Response Anonymization

Set Nodes (Response A / B / C / D)

Purpose: Stores model outputs in an anonymized format:
- Response A
- Response B
- Response C
- Response D

Why it matters: Prevents evaluator models from knowing which LLM authored which response, reducing bias during evaluation.
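Stages 1 and 2 above (parallel responses, then anonymization) can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the template itself uses Basic LLM Chain and Set nodes, and the names `askAllModels`, `anonymize`, `buildEvaluationInput`, and the stub callers are hypothetical. Labels are assigned in a fixed order here, in the same way the workflow uses fixed Set nodes.

```javascript
// Hypothetical helpers for illustration only; not the template's actual nodes.

// Send the same question to every model caller concurrently.
// `callers` maps a model name to an async function (question) => answer text.
async function askAllModels(question, callers) {
  const names = Object.keys(callers);
  const answers = await Promise.all(names.map((name) => callers[name](question)));
  return names.map((name, i) => ({ model: name, text: answers[i] }));
}

// Replace model names with neutral labels: Response A, Response B, ...
function anonymize(modelAnswers) {
  const labelKey = {}; // label -> model name, never shown to evaluators
  const responses = modelAnswers.map(({ model, text }, i) => {
    const label = `Response ${String.fromCharCode(65 + i)}`;
    labelKey[label] = model;
    return { label, text };
  });
  return { responses, labelKey };
}

// Build the text an evaluator would see; it contains labels, not model names.
function buildEvaluationInput(question, responses) {
  const body = responses.map((r) => `${r.label}:\n${r.text}`).join("\n\n");
  return `Original Question: ${question}\n\n${body}`;
}

// Usage with stub callers standing in for real model calls (assumed shape).
const stubCallers = {
  claude: async (q) => `First stub answer to: ${q}`,
  gpt: async (q) => `Second stub answer to: ${q}`,
  grok: async (q) => `Third stub answer to: ${q}`,
  gemini: async (q) => `Fourth stub answer to: ${q}`,
};

askAllModels("What is a mutex?", stubCallers).then((answers) => {
  const { responses } = anonymize(answers);
  console.log(buildEvaluationInput("What is a mutex?", responses));
});
```

Keeping `labelKey` separate from `responses` means the mapping back to model names stays available for reporting without ever entering an evaluator prompt.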
Section 4: Stage 3 - Peer Evaluation & Ranking

Evaluation Chains (Claude / GPT / Grok / Gemini)

Purpose: Each model acts as a reviewer and:
- Analyzes all four anonymized responses
- Describes strengths and weaknesses of each
- Produces a strict FINAL RANKING from best to worst

Ranking format (strict):

FINAL RANKING:
Response B
Response A
Response D
Response C

Why it matters: Creates multiple independent quality assessments from different model perspectives.

Section 5: Stage 4 - Ranking Aggregation

Code Node (JavaScript)

Purpose: Aggregates all peer rankings by:
- Parsing ranking positions
- Calculating average position per response
- Counting evaluation occurrences
- Sorting responses by best average score

Output includes:
- Aggregated rankings
- Best response label
- Best average score

Why it matters: Transforms subjective rankings into a structured, quantitative consensus.

Section 6: Stage 5 - Final Consensus Answer

Chairman LLM Chain

Purpose: One model acts as the Council Chairman and:
- Reviews all original responses
- Considers peer rankings and aggregated scores
- Identifies consensus patterns and disagreements
- Produces a single, clear, high-quality final answer

Why it matters: Delivers a refined response that reflects collective model intelligence rather than a simple average.
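The ranking aggregation step in Section 5 can be sketched as follows. This is a minimal sketch, not the template's actual Code node: the function names `parseRanking` and `aggregateRankings`, the output field names, and the sample evaluation texts are all assumptions. It assumes each evaluation ends with a `FINAL RANKING:` marker followed by the labels in best-to-worst order, and that a lower average position is better.

```javascript
// Hypothetical helpers; the real Code node in the template may differ.

// Extract the ordered labels that follow the last "FINAL RANKING:" marker.
function parseRanking(evaluationText) {
  const marker = "FINAL RANKING:";
  const start = evaluationText.lastIndexOf(marker);
  if (start === -1) return []; // evaluator did not follow the strict format
  const tail = evaluationText.slice(start + marker.length);
  const labels = tail.match(/Response [A-D]/g) || [];
  // Keep only the first mention of each label, preserving order.
  return [...new Set(labels)];
}

// Average the 1-based position of each response across all evaluations.
function aggregateRankings(evaluationTexts) {
  const totals = new Map(); // label -> { sum, count }
  for (const text of evaluationTexts) {
    parseRanking(text).forEach((label, index) => {
      const entry = totals.get(label) || { sum: 0, count: 0 };
      entry.sum += index + 1;
      entry.count += 1;
      totals.set(label, entry);
    });
  }
  const rankings = [...totals.entries()]
    .map(([label, { sum, count }]) => ({
      label,
      averagePosition: sum / count,
      evaluations: count,
    }))
    .sort((a, b) => a.averagePosition - b.averagePosition);
  return {
    rankings,
    bestResponse: rankings.length ? rankings[0].label : null,
    bestAverage: rankings.length ? rankings[0].averagePosition : null,
  };
}

// Made-up evaluator outputs, one per reviewing model.
const result = aggregateRankings([
  "Response A is solid...\nFINAL RANKING:\nResponse B\nResponse A\nResponse D\nResponse C",
  "FINAL RANKING:\nResponse A\nResponse B\nResponse C\nResponse D",
  "FINAL RANKING:\nResponse B\nResponse D\nResponse A\nResponse C",
  "FINAL RANKING:\nResponse B\nResponse A\nResponse C\nResponse D",
]);

console.log(result.bestResponse, result.bestAverage); // Response B 1.25
```

Averaging positions rather than counting first-place votes lets a response that is consistently second outrank one that is first once and last otherwise, which fits the consensus goal described above. An evaluation that omits the marker contributes nothing, so the `evaluations` count shows how many reviewers actually ranked each response.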
Workflow Overview

| Stage | Node / Logic | Purpose |
| --- | --- | --- |
| 1 | Chat Trigger | Receive user question |
| 2 | LLM Chains | Generate independent responses |
| 3 | Set Nodes | Anonymize outputs |
| 4 | Evaluation Chains | Peer review & ranking |
| 5 | Code Node | Aggregate rankings |
| 6 | Chairman LLM | Final synthesized answer |

Key Benefits

- Multi-model intelligence: avoids reliance on a single LLM
- Reduced bias: anonymized peer evaluation
- Quality-driven selection: ranking-based consensus
- Modular architecture: easy to add or replace models
- Language-flexible: input and output languages configurable
- Production-ready logic: clear stages, deterministic ranking

Ideal Use Cases

- High-stakes decision support
- Complex technical or architectural questions
- Strategy and research synthesis
- AI assistants requiring higher trust and reliability
- Comparing and selecting the best LLM-generated answers