Domain-Specific Languages, Tools, and Agents
Chapter Overview
This concluding chapter synthesizes the domain-specific applications explored throughout this book into a unified framework for building production AI systems. Across healthcare, finance, legal, recommendations, visual content, and observability, we observe recurring patterns: domains are formalized into structured representations, models learn to operate within these formalizations, and systems integrate models with tools to achieve business objectives. Understanding these patterns enables practitioners to systematically approach new domains rather than reinventing solutions.
The business imperative is clear. Organizations that successfully deploy domain-specific AI systems achieve measurable competitive advantages: 50-70\% cost reductions (legal contract review, healthcare documentation), 10-30\% revenue increases (recommendations, fraud detection), and 40-60\% efficiency gains (observability, visual content creation). However, success requires more than technical capability; it demands understanding domain constraints, managing model drift, balancing accuracy with explainability, and navigating regulatory requirements. The patterns synthesized in this chapter provide a reusable playbook for these challenges.
This chapter examines the world-to-language-to-tool pattern that underlies successful AI deployments, explores how to design domain-specific languages that enable reliable model-system integration, and investigates tool-augmented agents that orchestrate complex workflows. We synthesize drift management patterns across domains, compare accuracy-cost-latency trade-offs, and provide a practical framework for building domain-specific systems from requirements through deployment.
The stakes extend beyond individual applications to the future of AI deployment. As AI systems become more capable, they will increasingly operate as autonomous agents: perceiving environments, making decisions, and taking actions to achieve goals. These agents will need robust domain formalizations, reliable tool integration, and continuous adaptation to changing conditions. The patterns established in this chapter provide the foundation for this agent-driven future while remaining grounded in today's practical deployment realities.
Learning Objectives
- Understand the general pattern of DSLs in deep learning applications
- Design and formalize domain-specific languages for your application
- Build tool-augmented language models that call APIs, databases, and calculators
- Implement agents that plan and execute multi-step workflows
- Design structured outputs (JSON, XML) for reliable model-to-system integration
- Evaluate tool use, agent plans, and error recovery
- Understand trade-offs between model capability and system reliability
The World-to-Language-to-Tool Pattern
Across domains, a consistent pattern emerges:
- World: Messy, unstructured reality. Customer support tickets with varied formats. Code repositories with inconsistent styles. Video files with varying codecs and metadata.
- Formalization: Transform the world into a DSL. Ticket schemas define fields (customer ID, issue type, priority, description). Event schemas standardize user interactions. Log formats structure machine events.
- Models learn DSLs: Deep learning models trained on domain data learn to understand and generate within the formalized language.
- Tools operate on DSL: Systems downstream of the model (databases, APIs, business logic) operate on the formalized language. Because the model outputs adhere to the DSL, tools can process outputs reliably.
- Identify the core data representation in your domain
- Formalize it into an explicit DSL (schema, grammar, format)
- Train models on domain data to master the DSL
- Build tools that operate on the DSL, providing model feedback and enabling automation
- Iterate: improve DSL clarity based on model mistakes; improve models based on tool feedback
Designing Domain-Specific Languages
A well-designed DSL makes models easier to train and systems easier to build. Poor DSL design leads to model confusion and system brittleness.
Case Example: Support Ticket DSL
Poor DSL (unstructured):
{
"text": "I can't login to my account. Tried resetting password
but didn't receive the email. My email is johndoe@example.com.
Account created 6 months ago. Very frustrated!"
}
A model must extract key information from unstructured text, which is error-prone.
Better DSL (structured):
{
"customer_id": "123456",
"issue_type": "authentication",
"severity": "high",
"description": "Cannot login; password reset email not received",
"email": "johndoe@example.com",
"account_age_days": 180,
"previous_interactions": 2,
"sentiment": "negative"
}
Structured DSL reduces model ambiguity. Models learn to extract and classify information reliably. Downstream tools (routing, priority assignment) consume structured data.
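To make the hand-off concrete, the following minimal Python sketch shows a downstream routing tool consuming the structured ticket. The field names follow the example above; the routing rules themselves are illustrative assumptions.
# Sketch: a downstream tool that consumes the structured ticket DSL.
# Routing rules are illustrative, not a reference implementation.
def route_ticket(ticket: dict) -> str:
    """Assign a support queue based on structured ticket fields."""
    if ticket["severity"] in ("high", "critical"):
        return "priority-queue"
    if ticket["issue_type"] == "authentication":
        return "identity-team"
    return "general-queue"

ticket = {
    "customer_id": "123456",
    "issue_type": "authentication",
    "severity": "high",
    "description": "Cannot login; password reset email not received",
}
print(route_ticket(ticket))  # -> priority-queue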
DSL Design Principles
DSL design follows several key principles. Clarity requires that every field be unambiguous, avoiding free-text fields where discrete categories exist. Completeness demands including all information relevant to the task, as missing fields create ambiguity that degrades model performance. Consistency enforces uniform types and units across examples: all dates in ISO 8601 format, all sizes in bytes, all currencies explicitly specified. Expandability designs for future extensions through versioning or optional fields, preventing breaking changes as requirements evolve. Human readability ensures that humans can understand the DSL format (JSON, YAML, structured text) for debugging, annotation, and quality assurance.
Formal DSL Specification
For complex domains, define the DSL formally using schemas:
JSON Schema example:
{
"$schema": "http://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"issue_type": {
"enum": ["billing", "technical", "account", "other"]
},
"severity": {
"enum": ["low", "medium", "high", "critical"],
"description": "Impact on customer operations"
},
"description": {
"type": "string",
"maxLength": 500,
"description": "Concise description of the issue"
}
},
"required": ["issue_type", "description"]
}
Formal specification enables several critical capabilities. Validation checks that model outputs conform to schema before use, preventing malformed data from reaching downstream systems. Code generation automatically produces parsing and serialization code from schema definitions, ensuring consistency between specification and implementation. Documentation uses the schema as the authoritative specification for data handling, reducing ambiguity and miscommunication. Testing generates comprehensive test cases covering all schema types and edge cases, improving system robustness.
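As a concrete sketch of the validation step, the snippet below checks a model output against the schema above using the open-source jsonschema package. The retry-on-failure behavior noted in the comment is an assumption about how a surrounding system might react, not part of the library.
# Sketch: validate a model output against the JSON Schema before it reaches downstream systems.
# Requires the `jsonschema` package (pip install jsonschema).
from jsonschema import validate, ValidationError

TICKET_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "issue_type": {"enum": ["billing", "technical", "account", "other"]},
        "severity": {"enum": ["low", "medium", "high", "critical"]},
        "description": {"type": "string", "maxLength": 500},
    },
    "required": ["issue_type", "description"],
}

model_output = {"issue_type": "technical", "description": "App crashes on login"}

try:
    validate(instance=model_output, schema=TICKET_SCHEMA)
except ValidationError as err:
    # In production, this message would be fed back to the model to prompt a corrected retry.
    print(f"Schema violation: {err.message}")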
Tool-Augmented Models
Large language models are powerful but limited. They hallucinate facts, struggle with math, and cannot access real-time information. Tool augmentation addresses these limitations by enabling models to call external systems.
Tool Calling Architecture
A tool-augmented model has two components:
- Model (decision-maker): A language model decides when and how to call tools
- Tools (executors): External systems that perform actions (database lookups, API calls, computations)
Workflow:
- User query: ``What is the refund status of order 12345?''
- Model generates: Tool call: lookup\_order(order\_id=12345)
- System executes tool: Returns \{status: refunded, amount: \$50, timestamp: 2024-01-15\}
- Model generates response: ``Your refund of \$50 was processed on January 15, 2024.''
Function Calling in Modern LLMs
Modern APIs (OpenAI's function calling, Anthropic's tool use) formalize this. Models are provided with a tool schema:
{
"name": "lookup_order",
"description": "Retrieve order details by ID",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "Order identifier"
}
},
"required": ["order_id"]
}
}
The model learns to produce outputs like:
Tool call: lookup_order(order_id="12345")
The system parses this, executes the tool, and returns results to the model for the next step.
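A minimal dispatch layer might look like the following sketch. The tool registry, the parsed call format, and the lookup\_order stub are illustrative assumptions rather than any particular vendor's API.
# Sketch: parse a model-generated tool call and dispatch it to a registered function.
import json

def lookup_order(order_id: str) -> dict:
    # Stub standing in for a real database or API lookup.
    return {"status": "refunded", "amount": 50, "timestamp": "2024-01-15"}

TOOLS = {"lookup_order": lookup_order}

def execute_tool_call(call: dict) -> str:
    """Expects {"name": ..., "arguments": {...}}, as emitted by a function-calling API."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return json.dumps({"error": f"Unknown tool: {call['name']}"})
    result = fn(**call["arguments"])
    return json.dumps(result)  # the serialized result is appended to the model's context

model_call = {"name": "lookup_order", "arguments": {"order_id": "12345"}}
print(execute_tool_call(model_call))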
Tool Selection and Chaining
With multiple tools available, the model must select appropriate tools and chain them:
- ``What is the weather in Berlin tomorrow?''
- Model calls: get\_weather(location=Berlin, days\_ahead=1)
- System returns: \{temperature: 5C, condition: rainy\}
- Model generates: ``It will be rainy and 5 degrees Celsius tomorrow in Berlin.''
More complex example:
- ``Show me orders from customers in California last month.''
- Model calls: search\_customers(state=California) $\rightarrow$ [customer\_id1, customer\_id2, ...]
- For each customer, calls: get\_orders(customer\_id=..., month=last\_month) $\rightarrow$ [order1, order2, ...]
- Aggregates results and generates summary
The model learns to decompose queries into tool calls and orchestrate them.
Reliability and Error Handling
Tool-augmented systems must handle errors gracefully through several mechanisms. Tool failures occur when APIs return errors such as customer not found or timeout; the model should acknowledge the failure and offer alternatives. Invalid parameters happen when the model generates tool calls with missing or incorrect parameters; validation catches these errors and prompts the model to retry with corrections. Hallucinated tools arise when the model calls non-existent tools; the system should list available tools and allow the model to try again. Infinite loops occur when the model repeatedly calls the same tool without making progress; implementing call limits and loop detection prevents this failure mode.
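The sketch below wires these safeguards together: a call budget, detection of repeated identical calls, unknown-tool handling, and parameter checks. The limits and helper names are illustrative assumptions.
# Sketch: defensive wrapper around tool execution with a call budget and loop detection.
import inspect

MAX_CALLS = 10  # illustrative budget; prevents runaway loops

def safe_execute(call: dict, tools: dict, history: list) -> dict:
    if len(history) >= MAX_CALLS:
        return {"error": "Call limit reached; ask the user how to proceed."}
    if call in history[-3:]:  # identical call repeated recently: likely a loop
        return {"error": "Repeated identical tool call detected; try a different action."}
    fn = tools.get(call["name"])
    if fn is None:
        return {"error": f"Unknown tool {call['name']}. Available tools: {sorted(tools)}"}
    sig = inspect.signature(fn)
    required = {n for n, p in sig.parameters.items() if p.default is inspect.Parameter.empty}
    missing = required - set(call["arguments"])
    if missing:
        return {"error": f"Missing required parameters: {sorted(missing)}"}
    try:
        result = fn(**call["arguments"])
    except Exception as exc:  # tool failure: timeout, record not found, ...
        return {"error": f"Tool failed: {exc}"}
    history.append(call)
    return {"result": result}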
Agents and Workflow Orchestration
An agent is an autonomous system that perceives its environment, makes decisions, and takes actions to achieve goals. In the context of deep learning, an agent uses a language model to decide actions, tools to execute, and a planning loop to manage multi-step workflows.
Agent Loop
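The core loop is simple: the model observes the current state, decides on an action (a tool call or a final answer), the system executes the action, and the observation is appended to the model's context for the next step. A bare-bones sketch follows, assuming a hypothetical chat function that returns either a tool call or a final answer; the message format is likewise an assumption.
# Sketch of the agent loop: observe -> decide (model) -> act (tool) -> repeat.
# `chat` and the message format are stand-ins for whatever LLM client is used.
def agent_loop(user_goal: str, chat, tools: dict, max_steps: int = 8) -> str:
    messages = [{"role": "user", "content": user_goal}]
    for _ in range(max_steps):
        decision = chat(messages)            # model decides the next action
        if decision["type"] == "answer":     # goal reached: return the final response
            return decision["content"]
        tool = tools[decision["name"]]       # otherwise execute the requested tool
        observation = tool(**decision["arguments"])
        messages.append({"role": "tool", "content": str(observation)})
    return "Step limit reached before the goal was completed."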
Planning and Reasoning
Advanced agents plan before executing. Chain-of-thought prompting helps:
Goal: Find the best laptop for a developer under $2000
Thinking:
1. I need to understand developer needs: CPU, RAM, battery, build quality
2. I should search for laptops matching these criteria
3. I need to compare options and recommend the best
Actions:
- Tool: get_laptop_specs(type="developer", max_price=2000)
Result: [Laptop A, Laptop B, Laptop C]
- Tool: compare_laptops(laptop_ids=[A, B, C])
Result: Detailed comparison
- Response: Based on comparison, Laptop A is best because...
Planning increases accuracy and transparency. Users understand the agent's reasoning, improving trust.
Memory and State Management
Agents require memory across interactions to maintain context and continuity. State management includes several components. Interaction history tracks previous queries, actions, and results, enabling the agent to reference past conversations and avoid repeating work. User preferences learned from past interactions allow personalization and improved recommendations. Task progress tracking is essential for multi-step workflows that may span hours or days, requiring checkpoints to resume interrupted work.
Long-term memory requires careful management to remain effective. Summarizing old history prevents token explosion as conversations grow lengthy. Retrieving relevant past interactions through semantic search ensures important context is available when needed. Using structured state storage in databases rather than relying solely on the context window enables scalable memory management for production systems.
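One simple realization of these ideas keeps the most recent turns verbatim, rolls older turns into a summary, and holds structured task state outside the context window. The sketch below assumes a summarize helper (for example, a model call that compresses old turns); it is illustrative rather than a reference implementation.
# Sketch: bounded conversation memory with summarization and external structured state.
class AgentMemory:
    def __init__(self, summarize, max_recent: int = 20):
        self.summarize = summarize   # e.g. a model call that folds old turns into the summary
        self.max_recent = max_recent
        self.summary = ""            # rolling summary of older history
        self.recent = []             # verbatim recent turns
        self.state = {}              # structured task state; in production this would live in a database

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)
        if len(self.recent) > self.max_recent:
            overflow = self.recent[: -self.max_recent]
            self.recent = self.recent[-self.max_recent :]
            self.summary = self.summarize(self.summary, overflow)

    def context(self) -> str:
        # Prompt context: compact summary of old turns followed by recent turns in full.
        return f"Summary of earlier conversation:\n{self.summary}\n\n" + "\n".join(self.recent)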
Structured Output and Validation
Models can generate free-form text, but for integration with systems, structured outputs are essential. Modern approaches:
JSON Output Mode
Some models support a JSON output mode in which the model generates only valid JSON:
System prompt: You must output valid JSON matching this schema: {...}
User: Extract person and age from "My name is Alice and I'm 30"
Model output:
{
"person": "Alice",
"age": 30
}
JSON mode ensures outputs are syntactically valid but not necessarily semantically correct, so validation of content is still required.
Semantic Validation
Beyond syntax, validate semantic correctness:
- Type validation: Age is an integer in range [0, 150]
- Consistency: If order status is ``cancelled,'' refund amount should be nonzero
- Logic validation: If customer is VIP, discount should be $\geq$ 10\%
When validation fails, prompt the model to retry with an explanation of the error, as sketched below.
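The sketch below implements the three example checks above as plain Python predicates and shows how failures could be turned into a retry prompt. The field names and thresholds follow the examples and are otherwise assumptions.
# Sketch: semantic validation beyond schema conformance, with a retry prompt on failure.
def semantic_errors(record: dict) -> list:
    """Return a list of semantic problems; an empty list means the record passes."""
    errors = []
    age = record.get("age")
    if age is not None and not (0 <= age <= 150):
        errors.append("age must be an integer in [0, 150]")
    if record.get("status") == "cancelled" and record.get("refund_amount", 0) <= 0:
        errors.append("cancelled orders should carry a nonzero refund amount")
    if record.get("is_vip") and record.get("discount", 0.0) < 0.10:
        errors.append("VIP customers should receive a discount of at least 10%")
    return errors

errors = semantic_errors({"age": 30, "status": "cancelled", "refund_amount": 0})
if errors:
    retry_prompt = "Your previous output failed validation: " + "; ".join(errors)
    # Send retry_prompt back to the model and request a corrected output.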
Practical Design Framework
Here is a step-by-step framework for building domain-specific systems:
Step 1: Analyze the Domain
- What are the key entities? (orders, customers, products)
- What are the key relationships? (customer has orders, orders contain items)
- What are the key operations? (search, aggregate, transform)
- What are the typical workflows? (customer inquiries $\rightarrow$ lookup $\rightarrow$ respond)
Step 2: Design the DSL
- Identify core data representations
- Formalize as schema (JSON, Protobuf, custom grammar)
- Ensure clarity, consistency, and completeness
- Version the DSL for evolution
Step 3: Choose Model and Training Approach
- Fine-tune a pretrained foundation model vs. few-shot prompting vs. from-scratch training
- Collect and annotate domain training data
- Evaluate on domain-specific metrics, not generic benchmarks
Step 4: Integrate Tools and APIs
- Identify external systems (databases, APIs, services)
- Wrap tools with clear interfaces (name, description, parameters); see the sketch after this list
- Test tool invocation and error handling
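One way to keep tool interfaces uniform is a small registration helper that records name, description, and parameters in one place. The decorator below is a hedged sketch, not a feature of any specific framework.
# Sketch: register tools with name, description, and parameter schema in one place.
TOOL_REGISTRY = {}

def tool(description: str, parameters: dict):
    """Decorator that records a tool's metadata and adds it to the registry."""
    def wrap(fn):
        TOOL_REGISTRY[fn.__name__] = {
            "fn": fn,
            "description": description,
            "parameters": parameters,
        }
        return fn
    return wrap

@tool(
    description="Retrieve order details by ID",
    parameters={"order_id": {"type": "string", "description": "Order identifier"}},
)
def lookup_order(order_id: str) -> dict:
    ...  # call the real order service here

# TOOL_REGISTRY now doubles as the tool schema handed to the model.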
Step 5: Implement Validation and Feedback
- Validate model outputs against schema
- Log failures for analysis and retraining
- Collect user feedback on system responses
- Retrain models on failures and feedback
Step 6: Evaluate and Deploy
- Offline evaluation on test set
- Online A/B testing with real users
- Monitor performance metrics in production
- Plan rollback if metrics degrade
Exercises
Solutions
Full solutions for all exercises are available at \url{https://deeplearning.hofkensvermeulen.be}.
\itshape Core Entities:
- Restaurant: ID, name, cuisine, location, hours, capacity
- Customer: ID, name, email, phone, preferences
- Reservation: ID, customer\_id, restaurant\_id, datetime, party\_size, status, notes
{
"reservation": {
"type": "object",
"properties": {
"id": {"type": "string", "pattern": "^RES-[0-9]{6}$"},
"restaurant_id": {"type": "string"},
"customer_id": {"type": "string"},
"datetime": {"type": "string", "format": "date-time"},
"party_size": {"type": "integer", "minimum": 1, "maximum": 20},
"status": {
"enum": ["pending", "confirmed", "checked-in", "cancelled", "no-show"]
},
"special_requests": {"type": "string", "maxLength": 200}
},
"required": ["restaurant_id", "customer_id", "datetime", "party_size"]
}
}
\itshape Critical Operations:
- Search available tables: search\_availability(restaurant, datetime, party\_size)
- Make reservation: create\_reservation(customer, restaurant, datetime, party\_size)
- Modify reservation: update\_reservation(reservation\_id, new\_datetime/party\_size)
- Cancel: cancel\_reservation(reservation\_id)
\itshape Design notes: Status field captures reservation lifecycle. Special requests allow customization without schema explosion. All timestamps in ISO 8601 for consistency.
{
"name": "get_weather",
"description": "Get current weather and forecast",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name or coordinates"
},
"days_ahead": {
"type": "integer",
"description": "Days to forecast (0-14)"
}
},
"required": ["location"]
}
}
{
"name": "get_hourly_forecast",
"description": "Detailed hourly forecast",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"},
"date": {"type": "string", "format": "date"}
},
"required": ["location", "date"]
}
}
\itshape Chatbot interaction:
User: "What's the weather in Berlin?"
Model: Tool call: get_weather(location="Berlin", days_ahead=1)
System: Returns current weather + 7-day forecast
Model response: "In Berlin, it's currently 5°C and rainy.
Tomorrow will be cloudy with a high of 8°C."
User: "Hour by hour forecast for tomorrow?"
Model: Tool call: get_hourly_forecast(location="Berlin", date="2024-02-01")
System: Returns hourly data
Model response: "Tomorrow hourly: 6am 4°C, 9am 6°C, 12pm 8°C, ..."
\itshape Error handling:
- Invalid location: ``I couldn't find that location. Did you mean Berlin, Germany?''
- API timeout: ``Weather service is slow. Showing cached forecast...''
- Out-of-range date: ``I can forecast up to 14 days ahead. Showing 14-day forecast.''
\itshape Tools:
- add\_expense(amount, category, date, description)
- get\_expenses(category=None, date\_range=None)
- categorize\_expense(description) $\rightarrow$ category
- summarize\_spending(period)
- set\_budget(category, amount, period)
- get\_budget\_status()
\itshape Agent workflow:
- User: ``I spent \$25 on lunch today''
- Agent: Tool call: categorize\_expense(``lunch'') $\rightarrow$ ``Food \& Dining''
- Agent: Tool call: add\_expense(amount=25, category=``Food \& Dining'', date=today, description=``Lunch'')
- Agent response: ``Logged \$25 spending in Food \& Dining category for today.''
\itshape Clarification questions:
- User: ``I spent \$100 today but forgot what on''
- Agent: ``I can help categorize it. Was it for food, transport, entertainment, or something else?''
- User: ``Entertainment''
- Agent: Tool call: add\_expense(...category=``Entertainment'')
- Agent: ``Got it. Added \$100 to Entertainment for today.''
\itshape Summarization:
- User: ``How much have I spent on food this month?''
- Agent: Tool call: get\_expenses(category=``Food \& Dining'', date\_range=``current month'')
- Agent: Tool call: summarize\_spending(period=``current month'')
- Agent response: ``You've spent \$320 on Food \& Dining this month (15\% of your monthly budget of \$2000).''
Key agent features: explicit categorization, budget awareness, historical tracking, proactive questions for clarity.
Conclusion and Future Directions
This chapter presented a general design pattern for applying deep learning to domain-specific problems. The pattern---world-formalization-language-tools---is not new to AI; it mirrors how humans solve problems by creating abstractions and tools. What is new is that deep learning models can now learn to operate effectively within these formal systems, bridging the gap between unstructured human communication and structured computational systems.
The landscape of deep learning applications will continue to expand as models grow more capable and tools become more integrated. Future directions include:
- Multimodal agents: Agents reasoning over text, images, and code simultaneously
- Self-improving systems: Agents that learn from interactions and improve autonomously
- Federated DSL standards: Industry standards for common domains (finance, healthcare, e-commerce)
- Trustworthy agents: Formal verification and safety guarantees for high-stakes domains
- Energy efficiency: Reducing computational requirements for model training and inference
We hope this book has provided both the theoretical foundations and practical insights needed to build the next generation of deep learning systems. The principles and techniques covered---transformers, attention, scaling, training, and deployment---are tools. The true skill lies in recognizing your domain, formalizing it into a language, and building systems that leverage models and tools to solve real problems.
Synthesis: Patterns Across Domains
Having explored domain-specific AI systems across healthcare, finance, legal, recommendations, visual content, and observability, we can now synthesize the key patterns. The universal themes---drift inevitability, the accuracy-cost-latency trade-off, human-in-the-loop necessity, and explainability requirements---manifest differently in each domain. The table below summarizes these variations.
| Domain | Drift Pace | Retrain Cadence | Validation Rigor | Key Constraint |
|---|---|---|---|---|
| Healthcare | Quarterly | Quarterly--annual | Extreme (FDA) | Patient safety |
| Finance | Daily | Daily--weekly | High (regulatory) | Latency + adversarial |
| Legal | Episodic | Quarterly--semi-annual | Very high (liability) | Professional responsibility |
| Recommendations | Weekly | Daily--weekly | Moderate (A/B tests) | Scale + freshness |
| Visual Content | Monthly | Weekly--monthly | Moderate | Trend velocity |
| Observability | Continuous | Online + monthly | High (reliability) | 24/7 uptime |
Three universal principles emerge across all domains:
- Drift is inevitable, not exceptional. Every production AI system degrades over time. Successful deployments plan for drift from deployment day, budgeting for detection, retraining pipelines, and continuous maintenance. The retraining frequency must match the domain's drift pace while respecting its validation requirements.
- Human oversight remains essential. The form varies---physician review of diagnoses, lawyer review of contract analysis, trader oversight of algorithmic decisions, product manager oversight of recommendation changes---but no high-stakes domain deploys AI without human judgment in the loop.
- Explainability is a business requirement, not a technical luxury. Stakeholders across all domains demand explanations for AI decisions. Attention mechanisms, retrieval-augmented generation, ensemble confidence estimates, and rule-based components all serve this need. Black-box models fail to achieve adoption regardless of accuracy.
Future Directions
Looking forward, four trends will shape domain-specific AI: (1)~multi-domain agents that operate across healthcare, finance, and legal simultaneously, requiring cross-domain drift management; (2)~federated learning enabling cross-organizational training while maintaining privacy; (3)~automated governance that monitors performance, detects drift, and maintains compliance at scale; and (4)~energy-efficient architectures as sustainability concerns grow alongside model scale.