Streamlining LLM Prototyping with YAML: Introducing yamllm-core

A Python package designed to make quick prototyping of LLMs easier and more efficient.

What is yamllm-core?

Yamllm-core is a Python library designed to simplify and accelerate the process of prototyping and experimenting with Large Language Models (LLMs). It enables users to configure and execute LLM interactions using human-readable YAML files. This approach provides a clean and intuitive method for defining various aspects of LLM prototypes, including model selection, parameters, memory management, and output formatting. By abstracting away the complexity of different APIs and settings, yamllm-core allows developers and researchers to focus more on innovation rather than the intricacies of configuration, making it easier to read, save, and load previous configurations.
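
To make the idea concrete, here is a minimal sketch of that workflow; the configuration file format and setup steps are covered in full in the Getting Started section below.

# Minimal sketch of the yamllm-core workflow (class name and query() method
# are the ones shown in the usage examples later in this post; the config
# path and API key handling are placeholders).
import os
from yamllm import OpenAIGPT

llm = OpenAIGPT(config_path=".config.yaml", api_key=os.environ.get("OPENAI_API_KEY"))
print(llm.query("Summarise the benefits of YAML-based configuration."))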

Why use yamllm-core?

Using yamllm-core offers several benefits for those working with LLMs, particularly those new to Python or API interactions. It significantly accelerates prototyping by reducing the time spent on setting up configurations, allowing for rapid iteration and experimentation with different LLM setups. The YAML-based configuration simplifies management by keeping all settings in organized, human-readable files, separate from the code, which improves readability and maintainability compared to complex code-based configurations. This facilitates faster experimentation with model parameters, prompts, and memory settings by simply editing the YAML file. Furthermore, yamllm-core enhances collaboration as YAML configurations are easily shareable among team members. It also provides straightforward, out-of-the-box memory management for storing conversation history, which is beneficial for iterating on responses with a model.

What are the key features?

Yamllm-core boasts several key features centered around its YAML-based configuration system:

YAML-Based Configuration:

Users can define LLM configurations in YAML, specifying:

  • Model details from providers like OpenAI, Google Gemini, DeepSeek, and MistralAI
  • Parameters such as temperature, maximum tokens, and stop sequences

Simple Initialization:

  • Allows for the initialization of LLMs with minimal Python code

Unified API Interaction:

  • Provides a unified query() method for both simple queries and ongoing conversations

Built-in Memory Management:

  • Short-Term Memory: Utilizes SQLite for conversational history
  • Long-Term Memory: Employs FAISS vector stores for semantic memory

Additional Features:

  • Customizable System Prompts: Supports the definition of system-level instructions via YAML
  • Error Handling and Robustness: Incorporates error handling and retry logic for API requests (a generic retry sketch follows this list)
  • Configurable Output Formatting: Currently supports text output with planned support for JSON and markdown formats
  • Streaming Responses: Allows enabling or disabling of streaming for LLM responses
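
The retry behaviour mentioned above corresponds to the max_attempts, initial_delay, and backoff_factor settings in the configuration example below. The following is a generic exponential-backoff sketch of that pattern, not yamllm-core's internal implementation.

import time

# Generic exponential-backoff sketch mirroring the retry settings in the YAML
# config (max_attempts, initial_delay, backoff_factor). Illustrative only;
# not yamllm-core's actual retry code.
def call_with_retry(make_request, max_attempts=3, initial_delay=1, backoff_factor=2):
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return make_request()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error
            time.sleep(delay)
            delay *= backoff_factor  # wait longer before each retry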

How to Get Started

Install the Package:

Using pip:

pip install yamllm-core

Or using uv:

uv add yamllm-core

Configure Your Settings:

Create a .config.yaml file to define your LLM settings. Here's an example configuration:

# .config.yaml example
provider:
  name: "openai"  # supported: openai, google, deepseek, mistral
  model: "gpt-4o-mini"  # model identifier
  api_key: # api key goes here, best practice to put into dotenv
  base_url: # optional: for custom endpoints

# Model Configuration
model_settings:
  temperature: 0.7
  max_tokens: 1000
  top_p: 1.0
  frequency_penalty: 0.0
  presence_penalty: 0.0
  stop_sequences: []

# Request Settings
request:
  timeout: 30  # seconds
  retry:
    max_attempts: 3
    initial_delay: 1
    backoff_factor: 2

# Context Management
context:
  system_prompt: "You are a helpful assistant, helping me achieve my goals"
  max_context_length: 16000
  memory:
    enabled: false  # set to true to enable memory
    max_messages: 10  # number of messages to keep in conversation history
    conversation_db: "yamllm/memory/conversation_history.db"
    vector_store:
      index_path: "yamllm/memory/vector_store/faiss_index.idx"
      metadata_path: "yamllm/memory/vector_store/metadata.pkl"

# Output Formatting
output:
  format: "text"  # supported: text, json, markdown
  stream: true

# Logging
logging:
  level: "INFO"
  file: "yamllm.log"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Tool Management - In Development
tools:
  enabled: false
  tool_timeout: 10  # seconds
  tool_list: []

# Safety Settings
safety:
  content_filtering: true
  max_requests_per_minute: 60
  sensitive_keywords: []

Example configurations can also be found in the project’s GitHub repository.
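
Because the file is plain YAML, you can also sanity-check it with the standard pyyaml library before handing it to yamllm-core. This is generic Python, independent of yamllm-core's own API.

import yaml  # pip install pyyaml

# Load and inspect the configuration file; purely a sanity check,
# not something yamllm-core requires you to do.
with open(".config.yaml") as f:
    config = yaml.safe_load(f)

print(config["provider"]["model"])               # e.g. "gpt-4o-mini"
print(config["model_settings"]["temperature"])   # e.g. 0.7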

Initialize in Your Python Script:

Import the appropriate class from yamllm (e.g., OpenAIGPT or GoogleGemini), load your API key (preferably from environment variables or a .env file), and instantiate the LLM object by providing the path to your configuration file and your API key.

# Getting Started example with GoogleGemini
from yamllm import GoogleGemini
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

config_path = ".config.yaml" # Path to your configuration file

# Ensure your GOOGLE_API_KEY is set in your .env file or environment variables
llm = GoogleGemini(config_path=config_path, api_key=os.environ.get("GOOGLE_API_KEY"))

response = llm.query("Hello, yamllm!")
print(response) # Response will be printed to the terminal

Additional Usage Examples

Python Code to Use a General Configuration (e.g., OpenAI)

This example shows how to load an OpenAI configuration and make a query.

from yamllm import OpenAIGPT
import os
from dotenv import load_dotenv

# API key is required, I've chosen to use dotenv as a quick way to store API keys
load_dotenv()

config_path = ".config.yaml" # Path to your configuration file

llm = OpenAIGPT(config_path=config_path, api_key=os.environ.get("OPENAI_API_KEY"))
response = llm.query("Explain quantum physics in simple terms.")

Conversational Mode

For ongoing conversations, you can use a loop to continuously prompt the LLM.

from yamllm import OpenAIGPT # Or GoogleGemini, etc.
import os
from dotenv import load_dotenv
from rich.console import Console

load_dotenv()

config_path = ".config.yaml"
# Ensure your API key (e.g., OPENAI_API_KEY) is set and .config.yaml is configured
llm = OpenAIGPT(config_path=config_path, api_key=os.environ.get("OPENAI_API_KEY"))

console = Console()

while True:
    try:
        prompt = input("\nHuman: ")
        if prompt.lower() == "exit":
            break
        response = llm.query(prompt) # query method handles printing
    except FileNotFoundError as e:
        console.print(f"[red]Configuration file not found:[/red] {e}")
    except ValueError as e:
        console.print(f"[red]Configuration error:[/red] {e}")
    except Exception as e:
        console.print(f"[red]An error occurred:[/red] {str(e)}")

Reviewing Short-Term Memory (Conversation History)

If memory is enabled in your .config.yaml, you can review past conversation history.

from yamllm import ConversationStore
import pandas as pd
from tabulate import tabulate

# Ensure the path matches your .config.yaml memory.conversation_db setting
history = ConversationStore("yamllm/memory/conversation_history.db")

messages = history.get_messages()

df = pd.DataFrame(messages)
print(tabulate(df, headers='keys', tablefmt='psql'))
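
Since get_messages() returns a list of records, ordinary pandas operations apply to the resulting DataFrame. For example, a quick way to look at only the most recent exchanges (column names depend on yamllm-core's storage schema):

from yamllm import ConversationStore
import pandas as pd
from tabulate import tabulate

# Build the same DataFrame as above and show only the five most recent rows.
history = ConversationStore("yamllm/memory/conversation_history.db")
df = pd.DataFrame(history.get_messages())
print(tabulate(df.tail(5), headers='keys', tablefmt='psql'))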

Using Long-Term Memory (Vector Store)

yamllm-core integrates a FAISS vector store for semantic memory.

from yamllm.memory import VectorStore

vector_store = VectorStore(
    index_path="yamllm/memory/vector_store/faiss_index.idx",   # ensure paths match your config
    metadata_path="yamllm/memory/vector_store/metadata.pkl"    # ensure paths match your config
)

vectors, metadata = vector_store.get_vec_and_text()

print(f"Number of vectors: {len(vectors)}")
if len(vectors) > 0 and hasattr(vectors, 'shape'):
    print(f"Vector dimension: {vectors.shape[1]}")
else:
    print("Vector dimension: 0 or N/A")

print(f"Number of metadata entries: {len(metadata)}")
print("Sample metadata (first few entries):")
for item in metadata[:3]:  # print the first three metadata entries as a sample
    print(item)
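
If you want to inspect the index directly, the underlying FAISS library can be used as-is. The sketch below is not part of yamllm-core's documented API: it loads the index and metadata files written above and uses a random vector in place of a real query embedding.

import pickle

import faiss
import numpy as np

# Load the raw FAISS index and pickled metadata produced by yamllm-core
# (paths must match your .config.yaml). This bypasses the VectorStore wrapper.
index = faiss.read_index("yamllm/memory/vector_store/faiss_index.idx")
with open("yamllm/memory/vector_store/metadata.pkl", "rb") as f:
    metadata = pickle.load(f)

# A random query vector stands in for a real embedding here; in practice you
# would embed your query text with the same model used to build the index.
query = np.random.rand(1, index.d).astype("float32")
distances, ids = index.search(query, 3)  # three nearest stored vectors

for dist, idx in zip(distances[0], ids[0]):
    print(idx, dist, metadata[idx] if 0 <= idx < len(metadata) else None)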

Key Challenges and Next Steps

Several features are currently under development and represent the next steps for yamllm-core. These improvements will further enhance the library's capabilities:

Expanded Output Formatting

  • Allow users to specify output in multiple formats:
      • Plain text for simple responses
      • Structured JSON for data processing (see the sketch after this list)
      • Markdown for rich text formatting
  • Support for custom export formats and templates
  • Ability to convert between different output formats
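
JSON output is still on the roadmap, so the snippet below is purely illustrative: it sketches how a structured response might be consumed once format: "json" is supported. The field names are invented for the example.

import json

# Hypothetical: assumes a future yamllm-core release returns a JSON string
# when output.format is set to "json". The raw string here stands in for
# what llm.query(...) might return.
raw = '{"answer": "42", "confidence": 0.9}'
data = json.loads(raw)
print(data["answer"], data["confidence"])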

Enhanced Logging Capabilities

  • Comprehensive logging system with:
      • Configurable logging levels (DEBUG, INFO, WARNING, ERROR)
      • File output support with rotation (see the sketch after this list)
      • Customizable log formatting templates
      • Integration with popular logging frameworks
      • Real-time logging visualization
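
The logging block already present in the configuration (level, file, format) maps directly onto Python's standard logging module. Below is a generic standard-library sketch of that mapping, with rotation added via RotatingFileHandler to illustrate the planned "file output support with rotation"; it is not yamllm-core's actual logging setup.

import logging
from logging.handlers import RotatingFileHandler

# Generic sketch of the YAML logging settings, with rotation added as an
# illustration of the planned feature. Not yamllm-core's implementation.
handler = RotatingFileHandler("yamllm.log", maxBytes=1_000_000, backupCount=3)
handler.setFormatter(logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s"))

logger = logging.getLogger("yamllm")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.info("Logging configured")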

Tool Management System

  • Robust integration with external tools and services:
      • API connectors for popular services
      • Custom tool definition framework
      • Flexible configuration options for tool parameters
      • Plugin system for community-contributed tools
      • Tool execution monitoring and logging

Implementation of Safety Settings

  • Advanced safety and security features:
      • Content filtering mechanisms for inappropriate content
      • Rate limiting controls to prevent API abuse (a generic sketch follows this list)
      • Security best practices implementation:
          • Input validation and sanitization
          • API key management
          • Request/response encryption
      • Usage monitoring and alerting system
      • Compliance with AI safety guidelines
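
The max_requests_per_minute setting in the configuration points to client-side rate limiting. Here is a minimal sliding-window sketch of how such a cap could be enforced; it is a generic pattern, not yamllm-core's actual safety implementation.

import time
from collections import deque

# Generic sliding-window rate limiter illustrating a max_requests_per_minute
# cap. Not yamllm-core's actual safety code.
class RateLimiter:
    def __init__(self, max_requests_per_minute=60):
        self.max_requests = max_requests_per_minute
        self.timestamps = deque()

    def wait(self):
        now = time.monotonic()
        # Drop request timestamps that have left the 60-second window
        while self.timestamps and now - self.timestamps[0] > 60:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_requests:
            time.sleep(60 - (now - self.timestamps[0]))
        self.timestamps.append(time.monotonic())

limiter = RateLimiter(max_requests_per_minute=60)
limiter.wait()  # call before each API request to stay under the cap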

Results

yamllm-core is aimed at streamlining LLM prototyping rather than producing experimental results or benchmarks, so the outcomes here are workflow improvements rather than metrics.

Key Benefits:

  • Accelerated prototyping

  • Simplified configuration management

  • Improved readability and maintainability of LLM projects

  • Faster experimentation cycles

  • Enhanced collaboration through shareable configurations

  • Easy-to-use memory management

Example code snippets demonstrate how easily users can configure and interact with LLMs—including reviewing conversation history. Overall, the result is a more efficient and intuitive workflow for developing LLM-powered applications.

Technologies Used

Generative AI · OpenAI · Python · LLM · Prototyping
