Building a Better Home Value Mousetrap

This article, written by Clifford A. Lipscomb of Greenfield Advisors, originally appeared in the December 2016 Housing News Report newsletter published by ATTOM Data Solutions.

Introduction to Automated Valuation Models

Property valuation is an integral part of the housing industry that is long overdue for disruption. Although much could be said on the topic of automated valuation models (AVMs), in this piece I explore the current state of the commercially available AVM market, discuss how current AVMs fall short of meeting customers’ expectations, and lastly propose where the AVM industry might go next to meet future customer demands.

What is the Current State of the AVM Market?

The current state of the AVM market is quite competitive. In the lending world, automated valuation model estimates obtained via one of the approximately 20 commercially available AVMs range from $1.50 per property (for a high volume of properties) to more than $12 per property (for one-at-a-time valuations). In the lead generation world, AVM estimates are run for literally pennies, depending on the client and the intended use. With such a wide range of per-property pricing strategies, the opportunity exists for new competitors to enter the market in a disruptive way by differentiating themselves not only on pricing, but also on the data returned to customers for each property valued using an AVM. Below I discuss ways in which this disruption of the industry can be achieved.


Why do Current AVMs Fall Short?

Current automated valuation models fall short in multiple ways. First, the outputs have barely changed. Some customers request AVM estimates only, whereas others request AVM reports. An estimate-only return often includes just the AVM value for the subject property, potentially an error statistic (the most common being the Forecast Standard Deviation, or FSD), and sometimes a range of possible values for the subject. Reports contain much more detail on the subject property, as well as data on the “best” comparable properties, the neighborhood where the subject is located, and (sometimes) the region where the subject is located. These outputs have been fairly standard for the last 20 years; as far as I can tell, no real innovation in the outputs delivered to customers has occurred.
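To make the "estimate plus FSD plus range" output concrete, here is a minimal Python sketch. It assumes the FSD is expressed as a fraction of the estimate and that the range is the estimate plus or minus a chosen number of FSDs. The function name and the sample figures are hypothetical, and individual vendors may define their ranges differently.

```python
from typing import Tuple


def fsd_value_range(estimate: float, fsd: float, n_sd: float = 1.0) -> Tuple[float, float]:
    """Return (low, high) around an AVM point estimate.

    Assumption: `fsd` is a fraction of the estimate (0.10 means 10%),
    and the range is estimate +/- n_sd * fsd * estimate.
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    if fsd < 0:
        raise ValueError("fsd must be non-negative")
    spread = estimate * fsd * n_sd
    return estimate - spread, estimate + spread


# Hypothetical subject property: $250,000 estimate with a 10% FSD.
low, high = fsd_value_range(250_000, 0.10)
print(f"Estimate range: ${low:,.0f} to ${high:,.0f}")
```

A lower FSD produces a tighter range, which is why the statistic is often read as a rough measure of confidence in the point estimate.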

Second, current AVMs fall short in the data used to generate the AVM estimates. From others in the industry, I have learned that some attempts at incorporating different data sources have occurred. To mirror contemporary sales price trends, some AVMs use listing data from Multiple Listing Services (MLSs) in generating their estimates, while others continue to use only historical comparable sales transactions. Some AVMs use tax assessed values (TAVs), which are often updated yearly, in their algorithms. Either way, it seems that the time is right for other “big data” and crowdsourced data to be used in AVMs. In the academic literature, it is becoming more common to see Twitter data being used to predict stock prices (e.g., see Bollen, Mao, and Zeng, 2010, “Twitter mood predicts the stock market”, Journal of Computational Science) and Google data to predict movements in house price indices (e.g., see Kulkarni et al. 2009, “Forecasting Housing Prices with Google Econometrics”, George Mason University School of Public Policy Research Paper No. 2009-10).

What Else Could AVMs Deliver to Customers?

Reason codes or “variances” are common in some industries. For example, one might find on her credit report a reason for a lower credit score (e.g., an account opened too recently). In AVMs, reason codes can explain a particular determination or flag situations where some variable is “out of tolerance,” that is, outside a predetermined range of acceptable answers. This would give customers additional insight into the confidence that the AVM provider has in its estimates. One easily computable reason code that would provide additional insight is the number of comparable sales transactions used to produce the valuation estimate for a given subject property. Another is a statistically derived (bootstrapped) confidence interval around the valuation estimate for each subject property.
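The two reason codes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any vendor's method: the "valuation" is simply the median of the comparable sale prices, the interval is a percentile bootstrap of that median, and the code names and tolerance thresholds (`FEW_COMPS`, `WIDE_INTERVAL`, `min_comps`, `max_rel_width`) are hypothetical.

```python
import random
import statistics


def bootstrap_interval(comp_prices, n_resamples=2000, level=0.90, seed=0):
    """Percentile-bootstrap interval for the median comparable sale price."""
    if len(comp_prices) < 2:
        raise ValueError("need at least two comparable sales")
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(comp_prices)
    # Resample the comps with replacement and record each resample's median.
    medians = sorted(
        statistics.median(rng.choices(comp_prices, k=n))
        for _ in range(n_resamples)
    )
    tail = (1.0 - level) / 2.0
    lo_idx = int(tail * (n_resamples - 1))
    hi_idx = int((1.0 - tail) * (n_resamples - 1))
    return medians[lo_idx], medians[hi_idx]


def reason_codes(comp_prices, estimate, interval, min_comps=5, max_rel_width=0.20):
    """Flag 'out of tolerance' conditions (hypothetical codes and thresholds)."""
    codes = []
    if len(comp_prices) < min_comps:
        codes.append("FEW_COMPS")
    low, high = interval
    if (high - low) / estimate > max_rel_width:
        codes.append("WIDE_INTERVAL")
    return codes


# Hypothetical comparable sales for one subject property.
comps = [212_000, 225_000, 231_500, 240_000, 248_000, 259_000, 305_000]
estimate = statistics.median(comps)
interval = bootstrap_interval(comps)
print("Estimate:", estimate)
print("Comparable sales used:", len(comps))
print("90% bootstrap interval:", interval)
print("Reason codes:", reason_codes(comps, estimate, interval))
```

Returning the comp count and interval alongside the point estimate lets a customer judge the estimate without knowing anything else about the model.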

AVMs could also deliver estimates that are based on a reconciliation of multiple data sources. An example of this is the ATTOM Data Solutions “attomized” data, a data warehouse that stores reconciled property-level data from several sources. This is important for several reasons. First, when using a single data source, there may be inherent biases in the raw data and how those data are collected. A reconciled database makes it possible to resolve differences at the property level. Say that 123 Main Street in Cartersville, Georgia is listed in one data source (e.g., tax assessor data) as having three bedrooms and two bathrooms. Further assume that a second data source (e.g., a current MLS listing) reports that the same property has four bedrooms and two bathrooms. An important difference here is the contributory value of the fourth bedroom if the MLS listing data is used instead of the tax assessor data; that contributory value could mean a difference of $15,000 to $20,000 in the AVM estimate for this property, all else held constant. Recent research by Andy Krause and me (2016) in the Journal of Real Estate Practice and Education describes the importance of documenting the reconciliation of between-source data variation to ensure the “best” valuation possible and replicability by other professionals in our industry.
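The 123 Main Street example can be expressed as a small discrepancy check. The sketch below is illustrative only: the source names, field names, and the `field_discrepancies` helper are hypothetical, and it simply reports every field on which the sources disagree so that the disagreement can be documented before a value is chosen.

```python
def field_discrepancies(records_by_source):
    """Map each field to its per-source values when the sources disagree."""
    fields = set()
    for record in records_by_source.values():
        fields.update(record)
    diffs = {}
    for field in sorted(fields):
        values = {src: rec.get(field) for src, rec in records_by_source.items()}
        if len(set(values.values())) > 1:  # more than one distinct value
            diffs[field] = values
    return diffs


# Two hypothetical records for the same property.
sources = {
    "tax_assessor": {"bedrooms": 3, "bathrooms": 2},
    "mls_listing": {"bedrooms": 4, "bathrooms": 2},
}
print(field_discrepancies(sources))
# {'bedrooms': {'tax_assessor': 3, 'mls_listing': 4}}
```

Keeping the per-source values, rather than only the winner, is what makes the reconciliation replicable by someone else later.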



So the question that confronts us is: which data source should you use?

In the “ATTOMized” data, the reconciled data for 123 Main Street used to determine the AVM estimate is from the source that has been deemed most up-to-date, accurate, and reliable for that jurisdiction and that property. That determination rests on myriad factors, including timeliness of delivery from the source, percent of fields consistently populated, and previous performance in producing accurate AVM values. The best source will often differ from one jurisdiction to another, even within the same state, county, city, or ZIP code. The best source may even differ for different fields on the same property (e.g., the best source for number of bedrooms may differ from the best source for number of bathrooms). This is important because this process finally fulfills the true promise of multi-sourcing property data to estimate AVM values: not just creating redundancies (which does have some value), but also creating a new “super set” of synthesized data that is (1) not available from any one source on its own and (2) not available from multiple sources utilized in a binary fashion (i.e., either one source or the other for all properties in a state or county).
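One way to picture field-level source selection is a simple weighted score. The sketch below is an assumption-laden illustration, not a description of how the ATTOM data warehouse actually works: the three quality factors mirror the ones named above, but the 0-to-1 scales, the weights, the `SourceQuality` class, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceQuality:
    """Hypothetical 0-to-1 quality scores for one source and one field."""
    timeliness: float      # how promptly the source delivers updates
    fill_rate: float       # share of records with this field populated
    past_accuracy: float   # prior performance in producing accurate values


# Hypothetical weights; in practice they could vary by jurisdiction.
WEIGHTS = {"timeliness": 0.3, "fill_rate": 0.3, "past_accuracy": 0.4}


def score(quality):
    return sum(w * getattr(quality, name) for name, w in WEIGHTS.items())


def reconcile(records, quality):
    """Pick, field by field, the value from the best-scoring source.

    records: source -> {field: value}
    quality: (source, field) -> SourceQuality
    Returns field -> (value, chosen_source).
    """
    result = {}
    fields = sorted({f for rec in records.values() for f in rec})
    for field in fields:
        candidates = [
            src for src, rec in records.items()
            if rec.get(field) is not None and (src, field) in quality
        ]
        if not candidates:
            continue
        best = max(candidates, key=lambda src: score(quality[(src, field)]))
        result[field] = (records[best][field], best)
    return result


records = {
    "tax_assessor": {"bedrooms": 3, "bathrooms": 2},
    "mls_listing": {"bedrooms": 4, "bathrooms": 2},
}
quality = {
    ("tax_assessor", "bedrooms"): SourceQuality(0.5, 0.90, 0.6),
    ("mls_listing", "bedrooms"): SourceQuality(0.9, 0.80, 0.8),
    ("tax_assessor", "bathrooms"): SourceQuality(0.5, 0.95, 0.9),
    ("mls_listing", "bathrooms"): SourceQuality(0.9, 0.60, 0.7),
}
print(reconcile(records, quality))
```

With these made-up scores, the bedroom count comes from the MLS listing and the bathroom count from the tax assessor, so the reconciled record is one that neither source supplies on its own.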


In my opinion, the time has come for AVM vendors to start adding more value to the outputs that they provide customers. Standard outputs, such as the AVM point estimate and a measure of confidence in the estimate (often conveyed using the FSD), are just that: standard. Value-added outputs of interest to customers may include reason codes, statistically derived confidence intervals around the AVM point estimate, the number of comparable sales transactions used to value a given subject property, and explanations of the underlying data source used to generate the AVM estimates.


Please contact us if you have questions about the underlying data referenced in this article, or would like to have access to that data in the form of custom reports, API or bulk files.
