Domain-Specific Languages, Tools, and Agents

Chapter Overview

This concluding chapter synthesizes the domain-specific applications explored throughout this book into a unified framework for building production AI systems. Across healthcare, finance, legal, recommendations, visual content, and observability, we observe recurring patterns: domains are formalized into structured representations, models learn to operate within these formalizations, and systems integrate models with tools to achieve business objectives. Understanding these patterns enables practitioners to systematically approach new domains rather than reinventing solutions.

The business imperative is clear. Organizations that successfully deploy domain-specific AI systems achieve measurable competitive advantages: 50-70\% cost reductions (legal contract review, healthcare documentation), 10-30\% revenue increases (recommendations, fraud detection), and 40-60\% efficiency gains (observability, visual content creation). However, success requires more than technical capability—it demands understanding domain constraints, managing model drift, balancing accuracy with explainability, and navigating regulatory requirements. The patterns synthesized in this chapter provide a reusable playbook for these challenges.

This chapter examines the world-to-language-to-tool pattern that underlies successful AI deployments, explores how to design domain-specific languages that enable reliable model-system integration, and investigates tool-augmented agents that orchestrate complex workflows. We synthesize drift management patterns across domains, compare accuracy-cost-latency trade-offs, and provide a practical framework for building domain-specific systems from requirements through deployment.

The stakes extend beyond individual applications to the future of AI deployment. As AI systems become more capable, they will increasingly operate as autonomous agents—perceiving environments, making decisions, and taking actions to achieve goals. These agents will need robust domain formalizations, reliable tool integration, and continuous adaptation to changing conditions. The patterns established in this chapter provide the foundation for this agent-driven future while remaining grounded in today's practical deployment realities.

Learning Objectives

  1. Understand the general pattern of DSLs in deep learning applications
  2. Design and formalize domain-specific languages for your application
  3. Build tool-augmented language models that call APIs, databases, and calculators
  4. Implement agents that plan and execute multi-step workflows
  5. Design structured outputs (JSON, XML) for reliable model-to-system integration
  6. Evaluate tool use, agent plans, and error recovery
  7. Understand trade-offs between model capability and system reliability

The World-to-Language-to-Tool Pattern

Across domains, a consistent pattern emerges:

  1. World: Messy, unstructured reality. Customer support tickets with varied formats. Code repositories with inconsistent styles. Video files with varying codecs and metadata.
  2. Formalization: Transform the world into a DSL. Ticket schemas define fields (customer ID, issue type, priority, description). Event schemas standardize user interactions. Log formats structure machine events.
  3. Models learn DSLs: Deep learning models trained on domain data learn to understand and generate within the formalized language.
  4. Tools operate on DSL: Systems downstream of the model (databases, APIs, business logic) operate on the formalized language. Because the model outputs adhere to the DSL, tools can process outputs reliably.
Definition: The most successful applications of deep learning to real domains follow this pattern:
  1. Identify the core data representation in your domain
  2. Formalize it into an explicit DSL (schema, grammar, format)
  3. Train models on domain data to master the DSL
  4. Build tools that operate on the DSL, providing model feedback and enabling automation
  5. Iterate: improve DSL clarity based on model mistakes; improve models based on tool feedback

Designing Domain-Specific Languages

A well-designed DSL makes models easier to train and systems easier to build. Poor DSL design leads to model confusion and system brittleness.

Case Example: Support Ticket DSL

Poor DSL (unstructured):

{
  "text": "I can't login to my account. Tried resetting password 
           but didn't receive the email. My email is johndoe@example.com. 
           Account created 6 months ago. Very frustrated!"
}

A model must extract key information from unstructured text, error-prone.

Better DSL (structured):

{
  "customer_id": "123456",
  "issue_type": "authentication",
  "severity": "high",
  "description": "Cannot login; password reset email not received",
  "email": "johndoe@example.com",
  "account_age_days": 180,
  "previous_interactions": 2,
  "sentiment": "negative"
}

Structured DSL reduces model ambiguity. Models learn to extract and classify information reliably. Downstream tools (routing, priority assignment) consume structured data.

DSL Design Principles

DSL design follows several key principles. Clarity requires that every field be unambiguous, avoiding free-text fields where discrete categories exist. Completeness demands including all information relevant to the task, as missing fields create ambiguity that degrades model performance. Consistency enforces uniform types and units across examples—all dates in ISO 8601 format, all sizes in bytes, all currencies explicitly specified. Expandability designs for future extensions through versioning or optional fields, preventing breaking changes as requirements evolve. Human readability ensures that humans can understand the DSL format (JSON, YAML, structured text) for debugging, annotation, and quality assurance.

Formal DSL Specification

For complex domains, define the DSL formally using schemas:

JSON Schema example:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "issue_type": {
      "enum": ["billing", "technical", "account", "other"]
    },
    "severity": {
      "enum": ["low", "medium", "high", "critical"],
      "description": "Impact on customer operations"
    },
    "description": {
      "type": "string",
      "maxLength": 500,
      "description": "Concise description of the issue"
    }
  },
  "required": ["issue_type", "description"]
}

Formal specification enables several critical capabilities. Validation checks that model outputs conform to schema before use, preventing malformed data from reaching downstream systems. Code generation automatically produces parsing and serialization code from schema definitions, ensuring consistency between specification and implementation. Documentation uses the schema as the authoritative specification for data handling, reducing ambiguity and miscommunication. Testing generates comprehensive test cases covering all schema types and edge cases, improving system robustness.

Tool-Augmented Models

Large language models are powerful but limited. They hallucinate facts, struggle with math, and cannot access real-time information. Tool augmentation addresses these limitations by enabling models to call external systems.

Tool Calling Architecture

A tool-augmented model has two components:

  1. Model (decision-maker): A language model decides when and how to call tools
  2. Tools (executors): External systems that perform actions (database lookups, API calls, computations)

Workflow:

  1. User query: ``What is the refund status of order 12345?''
  2. Model generates: Tool call: lookup\_order(order\_id=12345)
  3. System executes tool: Returns \{status: refunded, amount: \$50, timestamp: 2024-01-15\}
  4. Model generates response: ``Your refund of \$50 was processed on January 15, 2024.''

Function Calling in Modern LLMs

Modern APIs (OpenAI's function calling, Anthropic's tool use) formalize this. Models are provided with a tool schema:


{
  "name": "lookup_order",
  "description": "Retrieve order details by ID",
  "parameters": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "Order identifier"
      }
    },
    "required": ["order_id"]
  }
}

The model learns to produce outputs like:


Tool call: lookup_order(order_id="12345")

The system parses this, executes the tool, and returns results to the model for the next step.

Tool Selection and Chaining

With multiple tools available, the model must select appropriate tools and chain them:

  1. ``What is the weather in Berlin tomorrow?''
  2. Model calls: get\_weather(location=Berlin, days\_ahead=1)
  3. System returns: \{temperature: 5C, condition: rainy\}
  4. Model generates: ``It will be rainy and 5 degrees Celsius tomorrow in Berlin.''

More complex example:

  1. ``Show me orders from customers in California last month.''
  2. Model calls: search\_customers(state=California) → [customer\_id1, customer\_id2, ...]
  3. For each customer, calls: get\_orders(customer\_id=..., month=last\_month) → [order1, order2, ...]
  4. Aggregates results and generates summary

The model learns to decompose queries into tool calls and orchestrate them.

Reliability and Error Handling

Tool-augmented systems must handle errors gracefully through several mechanisms. Tool failures occur when APIs return errors such as customer not found or timeout—the model should acknowledge the failure and offer alternatives. Invalid parameters happen when the model generates tool calls with missing or incorrect parameters—validation catches these errors and prompts the model to retry with corrections. Hallucinated tools arise when the model calls non-existent tools—the system should list available tools and allow the model to try again. Infinite loops occur when the model repeatedly calls the same tool without making progress—implementing call limits and loop detection prevents this failure mode.

Agents and Workflow Orchestration

An agent is an autonomous system that perceives its environment, makes decisions, and takes actions to achieve goals. In the context of deep learning, an agent uses a language model to decide actions, tools to execute, and a planning loop to manage multi-step workflows.

Agent Loop

Algorithm: Agent Decision Loop
  1. Initialize: Given user goal and available tools
  2. Loop:
    1. Model reads current state (user goal, previous actions, results)
    2. Model thinks: ``What is the next action I should take?''
    3. Model decides: Calls a tool or generates response to user
    4. If tool call:
      1. Execute tool, get result
      2. Append result to state
      3. Continue loop
      4. If response: Return to user, exit
      5. Termination: User goal achieved or max iterations exceeded
      6. Planning and Reasoning

        Advanced agents plan before executing. Chain-of-thought prompting helps:

        
        Goal: Find the best laptop for a developer under $2000
        
        Thinking: 
        1. I need to understand developer needs: CPU, RAM, battery, build quality
        2. I should search for laptops matching these criteria
        3. I need to compare options and recommend the best
        
        Actions:
        - Tool: get_laptop_specs(type="developer", max_price=2000)
          Result: [Laptop A, Laptop B, Laptop C]
        - Tool: compare_laptops(laptop_ids=[A, B, C])
          Result: Detailed comparison
        - Response: Based on comparison, Laptop A is best because...
        

        Planning increases accuracy and transparency. Users understand the agent's reasoning, improving trust.

        Memory and State Management

        Agents require memory across interactions to maintain context and continuity. State management includes several components. Interaction history tracks previous queries, actions, and results, enabling the agent to reference past conversations and avoid repeating work. User preferences learned from past interactions allow personalization and improved recommendations. Task progress tracking is essential for multi-step workflows that may span hours or days, requiring checkpoints to resume interrupted work.

        Long-term memory requires careful management to remain effective. Summarizing old history prevents token explosion as conversations grow lengthy. Retrieving relevant past interactions through semantic search ensures important context is available when needed. Using structured state storage in databases rather than relying solely on the context window enables scalable memory management for production systems.

        Structured Output and Validation

        Models can generate free-form text, but for integration with systems, structured outputs are essential. Modern approaches:

        JSON Output Mode

        Some models support JSON output mode: model generates only valid JSON:

        
        System prompt: You must output valid JSON matching this schema: {...}
        
        User: Extract person and age from "My name is Alice and I'm 30"
        
        Model output:
        {
          "person": "Alice",
          "age": 30
        }
        

        JSON mode ensures outputs are syntactically valid, but not semantically correct. Validation still checks correctness.

        Semantic Validation

        Beyond syntax, validate semantic correctness:

        When validation fails, prompt the model to retry with explanation of the error.

        Practical Design Framework

        Here is a step-by-step framework for building domain-specific systems:

        Step 1: Analyze the Domain

        Step 2: Design the DSL

        Step 3: Choose Model and Training Approach

        Step 4: Integrate Tools and APIs

        Step 5: Implement Validation and Feedback

        Step 6: Evaluate and Deploy

        Exercises

        Exercise 1: Design a DSL for a restaurant reservation system. What entities, relationships, and operations are critical? Write a JSON schema for the core data types.
        Exercise 2: Build a tool-augmented chatbot for a weather service. Tools: get\_weather(location, days\_ahead), get\_hourly\_forecast(location, date). Design the tool schemas. Implement the chatbot with proper error handling.
        Exercise 3: Implement an agent loop for a personal expense tracker. The agent can: ask clarifying questions, retrieve past expenses, categorize new expenses, and summarize spending. What tools would the agent need?

        Solutions

        Full solutions for all exercises are available at \url{https://deeplearning.hofkensvermeulen.be}.

        Solution: Exercise 1: Restaurant Reservation DSL

        \itshape Core Entities:

        \itshape JSON Schema (partial):
        
        {
          "reservation": {
            "type": "object",
            "properties": {
              "id": {"type": "string", "pattern": "^RES-[0-9]{6}$"},
              "restaurant_id": {"type": "string"},
              "customer_id": {"type": "string"},
              "datetime": {"type": "string", "format": "date-time"},
              "party_size": {"type": "integer", "minimum": 1, "maximum": 20},
              "status": {
                "enum": ["pending", "confirmed", "checked-in", "cancelled", "no-show"]
              },
              "special_requests": {"type": "string", "maxLength": 200}
            },
            "required": ["restaurant_id", "customer_id", "datetime", "party_size"]
          }
        }
        

        \itshape Critical Operations:

        \itshape Design notes: Status field captures reservation lifecycle. Special requests allow customization without schema explosion. All timestamps in ISO 8601 for consistency.

        Solution: Exercise 2: Tool-Augmented Weather Chatbot \itshape Tool schemas:
        
        {
          "name": "get_weather",
          "description": "Get current weather and forecast",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "City name or coordinates"
              },
              "days_ahead": {
                "type": "integer",
                "description": "Days to forecast (0-14)"
              }
            },
            "required": ["location"]
          }
        }
        
        {
          "name": "get_hourly_forecast",
          "description": "Detailed hourly forecast",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"},
              "date": {"type": "string", "format": "date"}
            },
            "required": ["location", "date"]
          }
        }
        
        \itshape Chatbot interaction:
        
        User: "What's the weather in Berlin?"
        Model: Tool call: get_weather(location="Berlin", days_ahead=1)
        System: Returns current weather + 7-day forecast
        Model response: "In Berlin, it's currently 5°C and rainy. 
                      Tomorrow will be cloudy with a high of 8°C."
        
        User: "Hour by hour forecast for tomorrow?"
        Model: Tool call: get_hourly_forecast(location="Berlin", date="2024-02-01")
        System: Returns hourly data
        Model response: "Tomorrow hourly: 6am 4°C, 9am 6°C, 12pm 8°C, ..."
        

        \itshape Error handling: - Invalid location: ``I couldn't find that location. Did you mean Berlin, Germany?'' - API timeout: ``Weather service is slow. Showing cached forecast...'' - Out of range date: ``I can forecast up to 14 days ahead. Showing 14-day forecast.''

        Solution: Exercise 3: Personal Expense Tracker Agent

        \itshape Tools:

        \itshape Agent workflow:

        1. User: ``I spent \$25 on lunch today''
        2. Agent: Tool call: categorize\_expense(``lunch'') → ``Food \& Dining''
        3. Agent: Tool call: add\_expense(amount=25, category=``Food \& Dining'', date=today, description=``Lunch'')
        4. Agent response: ``Logged \$25 spending in Food \& Dining category for today.''

        \itshape Clarification questions:

        1. User: ``I spent \$100 today but forgot what on''
        2. Agent: ``I can help categorize it. Was it for food, transport, entertainment, or something else?''
        3. User: ``Entertainment''
        4. Agent: Tool call: add\_expense(...category=``Entertainment'')
        5. Agent: ``Got it. Added \$100 to Entertainment for today.''

        \itshape Summarization:

        1. User: ``How much have I spent on food this month?''
        2. Agent: Tool call: get\_expenses(category=``Food \& Dining'', date\_range=``current month'')
        3. Agent: Tool call: summarize\_spending(period=``current month'')
        4. Agent response: ``You've spent \$320 on Food \& Dining this month (15\% of your monthly budget of \$2000).''

        Key agent features: explicit categorization, budget awareness, historical tracking, proactive questions for clarity.

        Conclusion and Future Directions

        This chapter presented a general design pattern for applying deep learning to domain-specific problems. The pattern---world-formalization-language-tools---is not new to AI; it mirrors how humans solve problems by creating abstractions and tools. What is new is that deep learning models can now learn to operate effectively within these formal systems, bridging the gap between unstructured human communication and structured computational systems.

        The landscape of deep learning applications will continue to expand as models grow more capable and tools become more integrated. Future directions include:

        We hope this book has provided both the theoretical foundations and practical insights needed to build the next generation of deep learning systems. The principles and techniques covered---transformers, attention, scaling, training, and deployment---are tools. The true skill lies in recognizing your domain, formalizing it into a language, and building systems that leverage models and tools to solve real problems.

        Synthesis: Patterns Across Domains

        Having explored domain-specific AI systems across healthcare, finance, legal, recommendations, visual content, and observability, we can now synthesize the key patterns. The universal themes---drift inevitability, the accuracy-cost-latency trade-off, human-in-the-loop necessity, and explainability requirements---manifest differently in each domain. Table~[ref] summarizes these variations.

        DomainDrift PaceRetrain CadenceValidation RigorKey Constraint
        HealthcareQuarterlyQuarterly--annualExtreme (FDA)Patient safety
        FinanceDailyDaily--weeklyHigh (regulatory)Latency + adversarial
        LegalEpisodicQuarterly--semi-annualVery high (liability)Professional responsibility
        RecommendationsWeeklyDaily--weeklyModerate (A/B tests)Scale + freshness
        Visual ContentMonthlyMonthly--weeklyModerateTrend velocity
        ObservabilityContinuousOnline + monthlyHigh (reliability)24/7 uptime

        Three universal principles emerge across all domains:

        1. Drift is inevitable, not exceptional. Every production AI system degrades over time. Successful deployments plan for drift from deployment day, budgeting for detection, retraining pipelines, and continuous maintenance. The retraining frequency must match the domain's drift pace while respecting its validation requirements.

        2. Human oversight remains essential. The form varies---physician review of diagnoses, lawyer review of contract analysis, trader oversight of algorithmic decisions, product manager oversight of recommendation changes---but no high-stakes domain deploys AI without human judgment in the loop.

        3. Explainability is a business requirement, not a technical luxury. Stakeholders across all domains demand explanations for AI decisions. Attention mechanisms, retrieval-augmented generation, ensemble confidence estimates, and rule-based components all serve this need. Black-box models fail to achieve adoption regardless of accuracy.

        Future Directions

        Looking forward, four trends will shape domain-specific AI: (1)~multi-domain agents that operate across healthcare, finance, and legal simultaneously, requiring cross-domain drift management; (2)~federated learning enabling cross-organizational training while maintaining privacy; (3)~automated governance that monitors performance, detects drift, and maintains compliance at scale; and (4)~energy-efficient architectures as sustainability concerns grow alongside model scale.

        ← Chapter 33: Observability and Monitoring 📚 Table of Contents