The assessment world has a consistency problem. Not with our values, but with how we arrive at them.
For decades, we've operated under a simple assumption: pick the best model, tune it carefully, and trust the output. It's worked reasonably well. But as Luc Hermans points out in his recent research, we may be leaving accuracy on the table by forcing ourselves to choose just one approach when we could be leveraging many.
Think about your current valuation process. Whether you're using multiple regression, comparable sales, or one of the newer AI-powered approaches, you're likely committed to a single methodology for each property class. When vendors pitch their latest "AI-powered" AVM, promising revolutionary accuracy, what's your first instinct? Probably to compare it against your existing model and pick a winner.
But here's the thing: both models might be right. Or more accurately, both might contain pieces of the truth.
As Hermans notes, "It's like getting 10 assessors in a room and then saying, well, we like you best. So we're just going to go with your estimates now." When you put it that way, the limitation becomes obvious. We'd never rely on just one assessor's opinion for critical valuations, so why do we do it with models?
The concept isn't new. Weather forecasters have been combining models for decades. They don't pick the "best" hurricane prediction model; they blend multiple forecasts to get a more accurate path. The principle, rooted in early 20th-century "wisdom of crowds" theory, suggests that aggregating multiple independent estimates often yields better results than any single estimate.
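To see why, here's a toy simulation in Python (all figures invented): ten hypothetical estimators, each unbiased but noisy, valuing the same property. Averaging their estimates consistently lands closer to the true value than trusting any single one of them.

```python
# Toy "wisdom of crowds" simulation: several independent, noisy estimates
# of the same true value, compared against their average.
# All numbers are made up for illustration.
import random
import statistics

random.seed(42)
true_value = 300_000        # hypothetical true market value
n_estimators = 10           # ten independent "assessors" or models
n_trials = 1_000

single_errors, ensemble_errors = [], []
for _ in range(n_trials):
    # Each estimate is unbiased but noisy (10% standard deviation).
    estimates = [random.gauss(true_value, 0.10 * true_value)
                 for _ in range(n_estimators)]
    single_errors.append(abs(estimates[0] - true_value))
    ensemble_errors.append(abs(statistics.mean(estimates) - true_value))

print(f"average error, one estimate:       {statistics.mean(single_errors):,.0f}")
print(f"average error, averaged estimates: {statistics.mean(ensemble_errors):,.0f}")
```

For independent, unbiased estimators, the error of the average shrinks roughly with the square root of their number, which is exactly why the forecasters blend.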
For assessors, this means running multiple valuation approaches on the same data: the multiple regression model you already rely on, a comparable sales analysis, and one or more of the newer AI-powered AVMs.
The key insight from Hermans' research is that these don't have to compete. They can collaborate.
The paper outlines three practical approaches to combining models:
Central Tendency: Simply take the mean or median of multiple model outputs. It's straightforward and surprisingly effective at smoothing out individual model quirks.
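As a rough sketch (in Python, with invented model names and values), central tendency is about as simple as it sounds:

```python
# Central tendency: combine several model outputs for the same parcel
# by taking their mean or median. Model names and values are hypothetical.
import statistics

model_values = {
    "regression": 312_000,
    "comparable_sales": 298_500,
    "ml_avm": 335_000,
}

print(f"mean:   {statistics.mean(model_values.values()):,.0f}")    # 315,167
print(f"median: {statistics.median(model_values.values()):,.0f}")  # 312,000
```

The median is often the safer default: one model going badly wrong drags the mean but barely moves the median.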
Weighted Composition: Score each model based on accuracy metrics (MAPE, RMSE) or practical considerations like explainability. That black-box AI model might get 25% weight while your tried-and-true regression gets 50%.
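A hedged sketch of what that could look like, with invented values, weights, and error figures; the inverse-MAPE step at the end is just one reasonable way to turn back-testing results into weights, not a prescription from the paper:

```python
# Weighted composition: each model's value contributes in proportion to a
# weight reflecting its accuracy and/or explainability. Figures are invented.
model_values = {"regression": 312_000, "comparable_sales": 298_500, "ml_avm": 335_000}
weights      = {"regression": 0.50,    "comparable_sales": 0.25,    "ml_avm": 0.25}

composite = sum(model_values[m] * w for m, w in weights.items()) / sum(weights.values())
print(f"weighted composite: {composite:,.0f}")  # 314,375

# One way to derive weights instead of hand-assigning them:
# inverse MAPE from back-testing, normalized to sum to 1.
mape = {"regression": 0.08, "comparable_sales": 0.12, "ml_avm": 0.10}
inverse = {m: 1 / e for m, e in mape.items()}
weights_from_mape = {m: v / sum(inverse.values()) for m, v in inverse.items()}
print({m: round(w, 2) for m, w in weights_from_mape.items()})
```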
Bandwidth Selection: Set acceptable ranges based on your most trusted model. If other models fall within, say, 0.9 to 1.1 times that value, include them. If not, exclude them. It's like having senior assessors review junior work: wildly off estimates don't make the cut.
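And a minimal sketch of the bandwidth filter, again with invented numbers and an assumed 0.9 to 1.1 band around the trusted model's value:

```python
# Bandwidth selection: anchor on the most trusted model, keep only the other
# models whose values fall inside a chosen ratio band, then combine the
# survivors. All figures are hypothetical.
import statistics

trusted_value = 312_000                                        # e.g. the regression model
other_values = {"comparable_sales": 298_500, "ml_avm": 355_000}

low, high = 0.9 * trusted_value, 1.1 * trusted_value
kept = [v for v in other_values.values() if low <= v <= high]  # ml_avm falls outside the band

composite = statistics.mean([trusted_value, *kept])
print(f"kept {len(kept)} of {len(other_values)} other models -> {composite:,.0f}")
```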
Here's where it gets tricky for us. As Hermans candidly observes about Dutch courts: "The Dutch court system is a bit like, show us the numbers. Not really that advanced in the AI."
Sound familiar? Most assessment appeals boards want to see the math. If you're combining three models, you'd better be able to explain all three. If you're combining 75 models (Hermans' theoretical ideal), good luck.
This is why the weighted approach might be most practical. You can explain that you're using multiple professional opinions, just like consulting multiple appraisers, but weighting them based on transparency and historical accuracy.
Before anyone panics about being replaced by an ensemble of algorithms, remember this: even the best models need local expertise. Hermans shares a perfect example from Moldova, where models suggested that higher floors meant higher values (standard penthouse logic). Local assessors corrected this: "You don't want to live on the top floor... energy loss."
Or consider the Alaska example, where the difference between a pond and a lake isn't size; it's whether you can land a float plane. No model picks that up without human input.
"Data is not always telling us everything," Hermans emphasizes. "There's a lot of local knowledge that we need for those models to work."
The beauty of composite modeling is its flexibility. You don't need to revolutionize your entire operation. Start small: run a second model alongside your current one for a single property class, compare the outputs, and flag the properties where the models disagree for closer review.
This isn't about replacing assessor judgment; it's about focusing it where it matters most.
For Assessors: Composite modeling isn't about picking winners and losers among AVMs. It's about leveraging multiple perspectives to identify properties that need closer review and building confidence in values where models agree.
For Leadership: Before investing in the next "revolutionary" AVM, consider how it might complement rather than replace existing tools. The infrastructure investment for running multiple models is minimal compared to the potential accuracy gains.
For the Profession: As courts and taxpayers demand both accuracy and explainability, composite approaches offer a path forward, maintaining the transparency of traditional methods while capturing the power of newer techniques.
The future isn't about finding the perfect model. It's about orchestrating multiple good models to play to their strengths while minimizing their weaknesses. In assessment, as in that room full of assessors, sometimes the best answer comes from thoughtful collaboration rather than individual brilliance.