This short article summarizes the R&D efforts that led to the development of the Frugal-IT tool.
Data Engineering Efforts
Our first achievement is the creation of a data model that efficiently represents a Kubernetes cluster.
Using minimal tokens, we provide sufficient knowledge to the LLM, enabling it to understand the nature and role of workloads. We leverage the model’s training base, which encompasses all aspects of understanding Kubernetes elements and COTS (open source and thus widely known).
Our LangChain pipelines dynamically generate the model:
- Extracting essential and advanced data from all workloads.
- Summarizing and critically analyzing workloads, taking into account their configurations, ingress, and services.
- Scoring workloads based on a predefined set of criteria.
- Analyzing potential relationships between workloads at the namespace level.
- Generating an article that provides an overall view of the cluster.
We have chosen to enrich this model with business insights at all levels of the cluster. These insights are often crucial for understanding the system and decision-making. For example, in the context of consumption optimization, user consumption periodicity is essential.
Designing a Multi-Agent Chatbot
Through in-context learning, we embed a lightweight version of the data engineering model into the system prompt, enabling our chatbot to grasp the general topology.
We implemented a multi-agent chatbot that first routes to a specific “use case” based on the user’s query theme.
For instance, if the user seeks information about a specific system component or its consumption, the chatbot appropriately directs the request.
Next, specialized agents handle each use case, equipped with tailored tools (ReAct model). These tools allow the agents to access additional use case-related information or perform actions with the user’s permission (human in the loop).
Technical Architecture of the Chatbot
The chatbot is built on a modular architecture that integrates several key components:
- Agent Orchestrator: Using the LangGraph framework, it manages communication between different agents and the user, ensuring proper routing to the appropriate use case.
- Specialized Agents: Each agent follows the ReAct (Reasoning and Action) model, allowing it to reason based on provided information and interact with domain-specific tools.
- Tool Integration: Agents access a variety of tools we’ve developed, enabling them to retrieve raw YAMLs of specific resources, resource consumption data, energy mix, and even perform internet searches.
- Context Management: Through in-context learning, the chatbot maintains a conversational state that accounts for interaction history and the lightweight data model inserted into the system prompt.
Maturity of LangChain and LangGraph
While LangChain and LangGraph provide a strong foundation for building modular, intelligent systems, they are still relatively immature frameworks. Implementing advanced strategies such as human in the loop requires thoughtful design and potentially extending these frameworks to meet specific needs.
Security and Governance
The human in the loop dimension is crucial for actions with significant system impact. Agents can propose changes or actions, but they require explicit user validation before execution. This is particularly important for operations such as pod scaling, configuration updates, or restarting critical services.