Training a specialized hotel booking model
How general-purpose LLMs drop constraints across multi-turn booking conversations, and how a small model specialized for hotel workflows compares when benchmarked against GPT-4o.
The Problem
Hotel booking conversations are deceptively complex. A guest might start with a simple request for a room, then add constraints over multiple turns: specific dates, room preferences, accessibility requirements, loyalty program benefits, and budget limits.
General-purpose LLMs struggle to maintain these constraints across long conversations. Our analysis of 10,000 simulated booking sessions found that GPT-4o dropped at least one critical constraint in 23% of conversations exceeding 8 turns.
Our Approach
We trained a specialized 7B parameter model on a curated dataset of 500,000 hotel booking conversations, annotated with constraint tracking and resolution outcomes. The model was fine-tuned to explicitly maintain a constraint state across conversation turns.
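The constraint state the model maintains can be pictured as a simple key-value store that persists across turns, with later turns able to add or revise entries. This is a minimal illustrative sketch, not the model's actual internal representation; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ConstraintState:
    """Tracks active booking constraints across conversation turns.

    Hypothetical sketch of the explicit state a constraint-tracking
    model maintains; field names are illustrative only.
    """
    constraints: dict = field(default_factory=dict)

    def update(self, turn_constraints: dict) -> None:
        # New or revised constraints overwrite earlier values;
        # constraints not mentioned this turn persist unchanged.
        self.constraints.update(turn_constraints)

    def active(self) -> dict:
        # Snapshot of everything currently in force.
        return dict(self.constraints)


# Turn 1: dates and budget; turn 2: accessibility added, budget revised.
state = ConstraintState()
state.update({"check_in": "2024-06-01", "budget_max_usd": 180})
state.update({"accessibility": "wheelchair", "budget_max_usd": 150})
print(state.active())
```

The failure mode described above corresponds to a model implicitly "forgetting" an entry from this store; making the state explicit turns that into a checkable output rather than a silent omission.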
Key innovations include a structured output format that surfaces active constraints at each turn, and a training objective that penalizes constraint violations more heavily than general response quality degradation.
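An objective that penalizes constraint violations more heavily than general quality degradation can be sketched as a token-weighted negative log-likelihood, where tokens inside constraint spans (dates, budgets, accessibility terms) receive a larger weight. This is a hedged illustration only: the function name, the masking scheme, and the 5x weight ratio are assumptions, not the values used in training.

```python
import math


def weighted_nll(token_logprobs, constraint_mask, constraint_weight=5.0):
    """Negative log-likelihood with constraint-span tokens upweighted.

    token_logprobs: per-token log-probabilities under the model.
    constraint_mask: True where a token belongs to a constraint span.
    constraint_weight: illustrative multiplier for constraint tokens.
    """
    total, norm = 0.0, 0.0
    for logprob, is_constraint in zip(token_logprobs, constraint_mask):
        weight = constraint_weight if is_constraint else 1.0
        total += -weight * logprob   # penalize low probability more on constraints
        norm += weight
    return total / norm              # weight-normalized average loss


# A poorly predicted constraint token (p=0.1) costs more than the same
# miss on an ordinary token.
bad_constraint = weighted_nll([math.log(0.9), math.log(0.1)], [False, True])
bad_ordinary = weighted_nll([math.log(0.9), math.log(0.1)], [True, False])
print(bad_constraint > bad_ordinary)
```

The design choice is that the gradient signal concentrates on exactly the tokens whose corruption causes a dropped constraint, rather than spreading uniformly across the response.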
Results
On our held-out evaluation set, the specialized model reduced constraint dropping to 3.2% across conversations of any length, compared to 23% for GPT-4o and 31% for Claude 3.5 Sonnet.
Importantly, the model maintained this reliability while running at 1/10th the inference cost, making it economically viable for high-volume booking applications where margin per transaction is measured in single-digit dollars.
Interested in similar work for your domain? We partner with enterprises to build specialized AI systems. Request access to our whitelist to discuss your use case.
Request Whitelist Access →