EPISODE 76

AI Tools for Assessors: LLMs, Prompt Best Practices, and Free Resources with NYSAA

NYSAA / May 14

About this Episode

This episode captures a presentation Will Jarvis, CEO of ValueBase, delivered to the New York State Assessors Association. Rather than a typical podcast interview, it's a live session where Will walks a room full of assessors through how large language models actually work, where they fall short, and how assessment professionals can start using them today. The central argument is straightforward: these tools are powerful but wildly underutilized in the assessment world, and the gap isn't intelligence — it's context.

What makes the session valuable isn't the technology pitch. It's the practical framing. Will treats LLMs not as magic but as very smart interns who know nothing about your office. That mental model — and the specific techniques he shares for bridging that knowledge gap — is where the real insight lives.

It's Statistics, Not Thinking

Will grounds the conversation in something most assessors will appreciate: the math underneath. Large language models aren't reasoning engines. They're statistical pattern matchers trained on enormous volumes of human text. The foundational insight, drawn from a 2017 Google paper called "Attention Is All You Need," is that human language is predictable enough to model statistically. If someone says "Hey, Mike," there's an 80% chance the response is "Hey, Will."
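To make the "Hey, Mike" example concrete, here is a toy sketch of predicting a reply purely from counts. The five greeting and reply pairs are made-up illustration data, and the function name is hypothetical. Real models learn from vastly more text with neural networks rather than a lookup table, so treat this as an intuition aid only.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training data": (greeting, reply) pairs.
pairs = [
    ("Hey, Mike", "Hey, Will"),
    ("Hey, Mike", "Hey, Will"),
    ("Hey, Mike", "Hey, Will"),
    ("Hey, Mike", "Hey, Will"),
    ("Hey, Mike", "What's up?"),
]


def reply_probabilities(pairs):
    """Estimate how likely each reply is, given the greeting, by counting."""
    counts = defaultdict(Counter)
    for greeting, reply in pairs:
        counts[greeting][reply] += 1
    return {
        greeting: {reply: n / sum(c.values()) for reply, n in c.items()}
        for greeting, c in counts.items()
    }


probs = reply_probabilities(pairs)
# The most frequent reply in the data becomes the most probable prediction.
print(probs["Hey, Mike"])
```

Nothing in this sketch understands greetings. It only reports which reply followed which greeting most often, which is the sense in which the output is statistics rather than thinking.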

Scale that up across all of humanity's written output, and you get something that mimics reasoning convincingly — but isn't actually reasoning. Will calls it a "simulacrum," and the distinction matters. It explains why these models can draft a polished taxpayer letter in seconds and also fabricate case law that doesn't exist. For assessors, who operate in a world where precision and defensibility are everything, understanding this distinction isn't academic. It's operational.

The Intern Model: Context Is Everything

The most useful framing Will offers is this: using an LLM is like managing the smartest intern you've ever met on their first day. They're brilliant, eager, and completely ignorant of your jurisdiction, your processes, and your constraints.

This means the difference between a useless output and a genuinely helpful one almost always comes down to the prompt. "Write a letter to a taxpayer about their assessment" will get you something generic and probably wrong in tone. But if you specify the jurisdiction, the situation, the audience, the constraints, and paste in an example of a good letter you've already sent — now you're working with something.
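One way to picture the difference is a small helper that refuses to build a prompt without the context an intern would need. This is a minimal sketch, not something shown in the session: the function name, the field layout, and every sample value (the town, the constraints, the letter excerpt) are hypothetical placeholders.

```python
def build_prompt(task, jurisdiction, situation, audience, constraints, example_letter):
    """Assemble a context-rich prompt from the pieces listed above."""
    constraint_lines = "\n".join(f"- {item}" for item in constraints)
    return (
        f"Task: {task}\n"
        f"Jurisdiction: {jurisdiction}\n"
        f"Situation: {situation}\n"
        f"Audience: {audience}\n"
        f"Constraints:\n{constraint_lines}\n\n"
        "Example of a good letter we have already sent:\n"
        f"{example_letter}\n"
    )


# All values below are hypothetical placeholders.
prompt = build_prompt(
    task="Write a letter to a taxpayer about their assessment",
    jurisdiction="Town of Exampleville, New York",
    situation="The owner's assessment rose after a town-wide revaluation",
    audience="A homeowner with no assessment background",
    constraints=[
        "Plain language, one page",
        "Do not cite statutes or sales that are not provided",
    ],
    example_letter="Dear Ms. Doe, thank you for contacting our office...",
)
print(prompt)
```

Because every argument is required, a call that skips the jurisdiction or the example letter fails immediately instead of quietly producing a generic prompt.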

Will recommends a dictation tool called Wispr that lets you hold Shift and just talk, dumping as much context as possible into the prompt. The logic is sound: these models can handle enormous inputs, and every edge case or detail you include makes the output sharper. Write your best prompts once, save them, and reuse them. Build a library. This is the kind of boring, practical advice that actually moves the needle.
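The "write once, save, reuse" habit can be as simple as a JSON file of named templates. The sketch below is one possible arrangement using only Python's standard library. The file name, the template text, and the function names are assumptions made for illustration, not a tool mentioned in the session.

```python
import json
import tempfile
from pathlib import Path
from string import Template


def save_prompt(library_path, name, template_text):
    """Add or update a named prompt template in a JSON library file."""
    path = Path(library_path)
    library = json.loads(path.read_text()) if path.exists() else {}
    library[name] = template_text
    path.write_text(json.dumps(library, indent=2))


def load_prompt(library_path, name, **fields):
    """Fetch a saved template and fill in its $placeholders."""
    library = json.loads(Path(library_path).read_text())
    return Template(library[name]).substitute(**fields)


# Hypothetical library location; a shared office folder would work the same way.
library_file = Path(tempfile.mkdtemp()) / "prompt_library.json"

save_prompt(
    library_file,
    "exemption_letter",
    "You are drafting a letter for the $town assessor's office. "
    "Explain the $exemption exemption in plain language.",
)

prompt = load_prompt(
    library_file, "exemption_letter", town="Exampleville", exemption="senior"
)
print(prompt)
```

`Template.substitute` raises a `KeyError` when a placeholder is left unfilled, so a reused prompt cannot silently go out with a missing town or exemption name.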

Hallucinations and the Human in the Loop

Will doesn't shy away from the weaknesses. Hallucinations — where the model confidently returns information that simply isn't true — remain a real risk. The famous example of the lawyer whose ChatGPT brief cited nonexistent case law gets a mention, and it should. In assessment work, a hallucinated statutory reference or fabricated comparable sale isn't just embarrassing. It's a liability.

The takeaway is blunt: always have a human reviewing model outputs. Will himself reviews roughly 150 email drafts per day that his AI agent generates. He confirms, confirms, confirms, tweaks, confirms again. When he sees a recurring error, he feeds the correction back into the agent's training file so it doesn't happen again. This isn't automation replacing human judgment. It's automation extending human capacity while demanding human oversight.

Privacy gets a mention too. When dealing with personally identifiable information, Will runs models on local machines with no internet access. When you send a prompt to ChatGPT, that data travels to a server somewhere. Paid tiers claim they don't store it, but Will still describes the data as "released into the ether." For offices handling taxpayer data, that should give pause.

Real Use Cases From the Field

The audience questions reveal where the rubber meets the road. One attendee asks about using AI to translate new legislation into plain-language guidance for taxpayers and attorneys. Another assessor in the room — who had already been using ValPal — confirms he'd done exactly that with recent exemption law changes in New York. He had the model summarize new legislation, then draft an email to the town attorney flagging key issues for discussion. He still read everything himself. But the AI did the heavy lifting of initial synthesis, and when official summaries came out later, they matched what the model had told him.

Another thread worth noting: the growing importance of ratio studies. Will mentions RatioPal, a free tool that generates a ratio study report in about three minutes from uploaded sales and values data, including the new IAAO vertical equity indicator. For offices where defending the assessment ratio is becoming as critical as defending individual values, that's a significant time savings on a technically demanding task.
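For readers who want to see what sits inside a ratio study, here is a minimal sketch of three standard statistics: the median ratio, the coefficient of dispersion (COD), and the price-related differential (PRD). It is not RatioPal's implementation, it leaves out the IAAO vertical equity indicator, and the sample values are invented for illustration.

```python
from statistics import mean, median


def ratio_study(assessed_values, sale_prices):
    """Compute median ratio, COD, and PRD from paired assessed values and sale prices."""
    if len(assessed_values) != len(sale_prices) or not sale_prices:
        raise ValueError("Need equal-length, non-empty lists of values and prices")

    ratios = [a / s for a, s in zip(assessed_values, sale_prices)]
    median_ratio = median(ratios)

    # COD: average absolute deviation from the median ratio, as a percent of the median.
    cod = 100 * mean([abs(r - median_ratio) for r in ratios]) / median_ratio

    # PRD: mean ratio divided by the sales-weighted mean ratio.
    weighted_mean_ratio = sum(assessed_values) / sum(sale_prices)
    prd = mean(ratios) / weighted_mean_ratio

    return {"median_ratio": median_ratio, "cod": cod, "prd": prd}


# Hypothetical sample: three sales at 100,000 with differing assessed values.
stats = ratio_study(
    assessed_values=[90_000, 100_000, 110_000],
    sale_prices=[100_000, 100_000, 100_000],
)
print(stats)
```

A real study would also trim outliers, screen sales for validity, and report confidence intervals, which is part of why a tool that produces the full report quickly saves so much time.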

Flattery Works, Threats Don't

One of the stranger revelations: telling the model it's smart and that you appreciate its help actually improves output quality. Will also passes along the claim that models perform slightly worse in winter, supposedly because the training data includes text written during periods of seasonal affective disorder. He delivers this with appropriate bemusement. You can also threaten the models ("My boss will fire me if this isn't right"), but Will, self-deprecatingly chalking it up to Southern superstition, advises against it. The practical point stands: tone and framing in your prompts aren't just cosmetic. They materially affect results.

Key Takeaway

The models are already smart enough to be useful in assessment offices today. The bottleneck isn't the technology — it's assessors learning to give these tools enough context to work with. Start by treating every prompt like onboarding instructions for a brilliant new hire who knows nothing about your jurisdiction, and always verify what comes back.
