Blog

PropMix announces nationwide coverage of Public Records data – residential and commercial

PropMix has completed the acquisition and integration of nationwide public record data on residential and commercial properties. The data covers additional property attributes such as owner occupancy, last sale information, and more detailed tax assessment information along with full property details. It also provides property identification, seller/buyer information, tax exemption details, building information, and legal description of the property.

This increased coverage enables PropMix to provide Public Record data alongside listing data, thereby creating a comprehensive information source for appraisers and valuation experts. The enhanced dataset also supports a larger coverage of the Uniform Appraisal Dataset (UAD) specs used in the mortgage underwriting process. Valuation models and comparable similarity scoring are now based on authoritative property details and current market conditions from listing data.


Another highlight of this release is the set of new features in two PropMix products: Market Conditions Advisor and iCMALive, our CMA platform. Both products now offer optional features enhanced by public record data, such as comparable scoring using public record sales and listings. Data population tasks are further automated, and data accuracy can be validated within the application, giving the user more flexibility and speed.

Stay tuned as we continue to innovate and deliver more features to the market in the coming weeks.

PropMix Launches Version 2 of REMarketLite Data

PropMix recently rolled out Version 2 of REMarketLite. With the new and improved REMarketLite V2, you have access to better, more complete, and more up-to-date Real Estate market data. We have improved the Public Record data, field population, quality, and fill rate, and added new content to the API responses.

REMarketLite V2 is all you need for your research, review and valuation workflow. Use your local board membership to access extended listing details and images. REMarketLite V2 offers the most robust data available in the industry today and we are making improvements daily.


Effective Lead Generation for Agents and Brokers

As a Real Estate agent or broker, generating sufficient leads is critical to sustain and grow your business. It is imperative that agents and brokers identify innovative ways to attract and engage their customers. Providing a home’s value is a good way to raise the curiosity of any visitor to a real estate website, and keep them engaged with what you have to offer.

PropMix can provide a white-labelled, cost-effective home value solution for your website. Each listing has a button next to it for the fair market value of the property. On clicking the button, the user is prompted for the email address where they wish to receive the home value report. The address of the property and other details are pre-filled, which makes requesting the report very convenient. The report is generated instantly and an email is sent to the customer.


Agents will be able to log in to iCMALive to view the details of the property and the customer, as well as the home value report that was generated. They also have the option to edit the report to fine-tune it, and make the updated report available for the customer to view. The dashboard on iCMALive provides a quick view of all the leads generated, along with their current status. A quick follow-up with the customer helps ensure a better closing rate for the leads generated.

As a Broker, you have several options to route the leads generated. You could assign a specific geographic area, such as a postal code, to an agent. Another option is to assign the leads in a round-robin fashion among the agents available. You can also re-assign leads based on availability of agents, or if the leads have passed a certain duration without follow up.
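
As a rough illustration of the routing options above, here is a minimal Python sketch that gives a postal-code assignment priority and otherwise hands leads out round-robin. The function name, the lead dictionary shape, and the agent names are assumptions made for this example; they do not describe how iCMALive implements routing.

```python
from itertools import cycle


def make_router(agents, postal_code_owners=None):
    """Return a function that picks an agent for each new lead."""
    owners = dict(postal_code_owners or {})
    rotation = cycle(agents)  # endless round-robin over the available agents

    def route(lead):
        # A postal-code assignment, when one exists, wins over the rotation.
        owner = owners.get(lead.get("postal_code"))
        return owner if owner is not None else next(rotation)

    return route


# Hypothetical agents; "ben" owns postal code 33156.
route = make_router(["ana", "ben", "cy"], {"33156": "ben"})
codes = ["33156", "33143", "33133", "33156", "33146"]
assignments = [route({"postal_code": code}) for code in codes]
print(assignments)
```

Re-assigning stale leads would be a separate pass over leads whose last follow-up is older than a chosen duration, feeding them back through the same router.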

 

How do you leverage local market conditions to gain competitive advantage?

Every Agent in your neighbourhood would have anecdotal evidence about recent listings and sales. However, it is hard to accurately gauge the trends without actual data. There is an overload of such data these days, but how do you gain insights into the market trends from this data set to focus on the next growth driver for your business?

Customers, both sellers and buyers, have started demanding more data-based analyses of the neighbourhood. Your ability to provide data points is diminishing by the day as they get more and more of that information from online sources. As a Real Estate broker, have you attempted to leverage local market conditions to gain a competitive advantage?


With PropMix, you have complete visibility into the changes affecting your market and how they impact you as you move through the real estate cycles. Our leading nationwide real estate data, and the analytical solutions built on top of it, give you the tools you need.

Our in-depth analytics provided through Market Conditions Insights (MCI) can help you not only provide valuable insights on the price-trends in your local market, but also predict the trend for the next two quarters based on lead indicators.

We will help you define the parameters that you have identified as critical for your market. These parameters are then processed for the last 60 months, and the trends are made available in a format that is easy to consume internally and to share with your customers or in your reports and website. The customizable widgets can be fine-tuned to provide the insights that are most relevant for your customers.

Predictive Analytics to Grow Your Business

As a Broker, you are always on the lookout for better strategies to grow your business. Advancements in recent years have provided a wide variety of data on listings, agents, geographies, market trends etc. But does your current analytics platform automatically provide you actionable recommendations on how to grow your business? Can it alert you to a new business tactic or strategy?


PropMix’s BrokerView uses big data and deep learning models to analyze your historical and current transactions and marry that with market and competitor predictions in your area. We can discover your business practices and generate highly targeted recommendations on neighborhoods, markets, market segments, and agents.

Our analytics-based agent and office performance management solution helps you identify the best agents, which geographical areas to target, and where competition is gaining market share.

The Office level dashboard helps to monitor total leads captured and tracked, while the Agent level dashboard measures each agent’s lead activity. The analytics section provides the following insights through intuitive graphs:

– Year-on-Year trends on # of Listings & Sales

– Year-on-Year Commissions, Sale Price, Sold Price per Sq. Ft.

– Trend of homes sold compared to Inventory

– Trend of Sold Price to Median Price in the market

–  Average Time for Contract to Close

–  List to Sold Price Ratios

–  Buy-Side Price Performance

–  Median Sold Price

 

How do I use Sally to Label an Image?

How do I use Sally, the cognitive image advisor, to label an image in my application?

Using Sally to label an image is quite simple. All you need is to make a POST request to the Sally API as given in the example below. Sally has been trained on Real Estate specific images and continues to improve its accuracy as more and more images are used for training the application. Read more about Sally here.

API Request GetLabelsForImage

POST Method:

URL: api.propmix.io/mlslite/v1/GetLabelsForImage

Body:
{
   "imageurl": "http://blog.propmix.io/wp-content/uploads/2018/01/kitchen_001.jpg",
   "accesstoken": << your access token goes here >>
}

Header: Content-Type: application/json

A sample image is given below.

The API responds with the image label. In this case, the API identifies the room as a Kitchen.

API response

{
   "status": {
       "code": 200,
       "type": "OK",
       "message": "Success"
   },
   "imageAnalysis": {
       "imageName": "kitchen_001.jpg",
       "imageInfo": [
           {
               "labelName": "Kitchen",
               "confidence": 0.922417
           }
       ]
   }
}
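
For readers calling the API from Python, the request and response above could be handled roughly as follows. This is a sketch using only the standard library. The https scheme is an assumption (the URL above is given without one), the helper names are ours, and the access token is assumed to be sent as a string. The sample response from this post is parsed in place of a live call.

```python
import json
import urllib.request

# Assumption: the post lists the URL without a scheme; https is assumed here.
LABEL_URL = "https://api.propmix.io/mlslite/v1/GetLabelsForImage"


def build_label_request(image_url, access_token):
    """Build (but do not send) the POST request described above."""
    body = json.dumps({"imageurl": image_url, "accesstoken": access_token})
    return urllib.request.Request(
        LABEL_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_label(response_json):
    """Return (labelName, confidence) for the most confident label, or None."""
    payload = json.loads(response_json)
    labels = payload.get("imageAnalysis", {}).get("imageInfo", [])
    if not labels:
        return None
    best = max(labels, key=lambda item: item["confidence"])
    return best["labelName"], best["confidence"]


# The sample response from this post, used instead of a live call.
SAMPLE_RESPONSE = """
{"status": {"code": 200, "type": "OK", "message": "Success"},
 "imageAnalysis": {"imageName": "kitchen_001.jpg",
                   "imageInfo": [{"labelName": "Kitchen", "confidence": 0.922417}]}}
"""

request = build_label_request(
    "http://blog.propmix.io/wp-content/uploads/2018/01/kitchen_001.jpg",
    "YOUR_ACCESS_TOKEN",  # placeholder, not a real token
)
print(request.get_method(), request.full_url)
print(top_label(SAMPLE_RESPONSE))
# To send the request for real:
#   raw = urllib.request.urlopen(request, timeout=10).read().decode("utf-8")
```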

Interested in trying out Sally in your application? Click here to get a trial access token.

How do I use Sally for room-by-room comparison?

A user guide for testing Sally at sally.propmix.io

Sally is a cognitive image advisory service that identifies property information from listing images. Sally automates the analysis of listing photos to:

  • Identify types of rooms – kitchen, family room, dining, game room, basement, etc
  • Identify objects in a room – brick fireplace, type of refrigerator, type of counter tops, number of sinks, etc.
  • Extract detailed property information from the photos

Sally is a neural network engine trained on terabytes of real estate imagery, and it continues to learn how to perform deeper analysis. After an initial 3-month training period, Sally started classifying images into different room types. After another 6 months of rigorous research, redesign, and retraining, she can now pick up objects in a room to help improve your listing data.

How do I access Sally?

You can access Sally using the URL sally.propmix.io. No login credentials are required to access the trial site.

Are there any restrictions during the trial period?

During the trial period, Sally will allow you to search only for properties in the zip code 33156 in Miami, FL.

Home page

The home page of Sally is shown in the screenshot below

Enter an address to search

The Address field in the top left corner can be used to enter an address. Possible US addresses are suggested as the user types. For the trial period, addresses within zip code 33156 in Miami, FL can be used. You can also start with the address that is already pre-filled in the Address field.

Once the address is entered, the app requires the user to verify the address components of the Subject Property as given in the screenshot below. The filter criteria for the search can also be specified in the filter area in the top row.

Search results

The search results are shown in map view and table view as given in the screenshot below

Grid View of Search Results

The user has the option to select the Grid view of the property details section on the right side. The view is as shown in the screenshot below.

Selecting a Property for further analysis

Clicking on a pin or the property card in the right side will highlight the property on the map view along with a summary of the property details.

Select properties to compare photos

You can add up to 5 properties to compare photos with the Subject Property by clicking on the Add to Compare button next to the Address on the table view or on the thumbnail on the Grid view.

Comparing photos of selected properties

The photos of selected properties can be compared by clicking on the Compare Photos button on the top right corner of the page. The images of the properties, categorized by type such as Exterior, Bedroom, Bathroom, Kitchen, etc., are displayed as in the screenshot below.

Removing a property from the compare photos view

A property can be removed by clicking on the Delete icon next to the Address of the property.

 

Sally API Technical Details

The Sally app uses two cognitive APIs – GetListingsByRadiusWithThumbnail and GetLabeledImagesByListingId.

  1. GetListingsByRadiusWithThumbnail

The GetListingsByRadiusWithThumbnail API returns the comparable listings within the given radius along with thumbnail images. ImageType can be passed as an input parameter, for example &ImageType=Frontage. If an image of the requested ImageType is available in the data, the API responds with the corresponding thumbnail image of that property; otherwise the thumbnail is chosen on the basis of a predefined priority.

API EndPoint: https://api.propmix.io/mlslite/val/v1/GetListingsByRadiusWithThumbnail

 

  2. GetLabeledImagesByListingId

GetLabeledImagesByListingId API returns the ‘ImageUrls’ along with the ListingId, categorized on the basis of image type. For each image, corresponding image labels are also identified and displayed in the response. Up to ten ListingIDs can be provided in the same API call, separated by comma.

API EndPoint: https://api.propmix.io/mlslite/v1/GetLabeledImagesByListingId
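
Since a single call accepts at most ten comma-separated ListingIDs, a longer list has to be split into batches first. The sketch below only prepares those comma-separated values; the helper name is ours, and the query parameter that carries the value is not named in this post, so please check the API documentation before wiring it into a request.

```python
def batch_listing_ids(listing_ids, batch_size=10):
    """Split IDs into comma-separated strings of at most batch_size IDs each."""
    ids = [str(listing_id) for listing_id in listing_ids]
    return [
        ",".join(ids[start:start + batch_size])
        for start in range(0, len(ids), batch_size)
    ]


# 23 made-up listing IDs become three batches: 10, 10, and 3 IDs.
batches = batch_listing_ids(range(1, 24))
print(len(batches))
print(batches[-1])
```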

 

Sally API Documentation

Sally API documentation is available in the PropMix API documentation page under Insights APIs for GetListingsByRadiusWithThumbnail and GetLabeledImagesByListingId.

Improve the Quality of Your Real Estate Data

Part 2 – How to improve Real Estate Data Quality?

 

In Part 1 of this series we broadly covered why data quality is important in real estate, why real estate data quality has become a hard problem to solve, and presented a few examples of how to measure the quality of your real estate data. In this second and final part we will present a few ideas on how you could begin the practice of improving real estate data quality.

Data Quality Best Practices

As you would expect data quality is a common problem in many other industries irrespective of how old or new the industry is. As a result many best practices already exist for managing and improving data quality that can be easily adopted within real estate. Here are a few important areas to focus on.

Data Quality Assessment

Before we can start improving quality we need a solid understanding of the current state of the data. As we presented in the last section of Part 1, knowing how to measure for the quality of your data is a first step. These data quality metrics are very specific to the industry we are in and we have provided a few good starting points.

 

In addition to knowing your current state, a good data quality assessment practice requires you to assess yourself periodically, both to measure improvements and to measure any data quality leaks caused by new data trickling into your platform. It is also a great way to show senior management the strides you are making in your organization.

 

The design of the quality metrics needs to be traceable directly to your company’s business objectives, which would be different depending on where in the real estate market you play – lead generation, mortgage origination, appraisals, brokerage, etc. Such traceability is important to get buy-in from the management to invest in data quality.

Data Governance
To secure a strong organizational commitment to data quality, and to continuously support the people, processes, and technologies that maintain it, a data governance board must be established with participants from the business and IT. Business participants would be those who are close to the consumption and production of data, and the IT participants would be the data architects and modelers. The objectives of the governance board would be to:

 

  • Establish data policies and standards
  • Define and measure data quality metrics
  • Discover data related issues and provide resolution paths
  • Establish proactive measures to reduce data quality leakage

Data Stewards

One of the most important roles within a data governance board and the overall data management practice is the Data Steward. Data stewards are the ultimate owners of specific sections of the data – usually called subject areas, and they would represent business users and producers of data. The buck stops with the data steward for all data quality issues and the steward takes a leadership role to resolve data accuracy, consistency, or integrity issues.

 

Data stewards are often the liaisons between the business and the IT department that manages the data for the business. In this role, they are required to work with the business and IT to define relevant quality metrics, have them interpreted and implemented appropriately by the IT department, and ultimately showcase quality improvements that improve business outcomes.

Create a Data Quality “Firewall”

Most data within an organization is traceable broadly to 2 types of sources – applications where users enter data, or data feeds that are processed to load data into data stores. The idea of a data quality firewall is to catch and reject any data that violates data quality rules at the time of its entry into a data store. All data ingestion points will have to hit this one virtual firewall to be validated before being processed and stored.

 

The keyword above is “virtual” – because it is impractical to create a single system to act as a data quality firewall given the various subject areas of data and the departmental data ingestion points across the organization. The idea is not to create a choke point but a proactive mechanism to catch data quality issues for follow-up and resolution before they flow downstream into transactional or analytical systems.
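
To make the firewall idea concrete, here is a minimal Python sketch: a shared set of rules that any ingestion point can call before storing a record, so the firewall stays "virtual" rather than being a single system. The rule helpers, the field names used in the example, and the sample records are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field


@dataclass
class FirewallResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reasons) pairs


def require(field_name):
    """Rule: the field must be present and non-blank."""
    def rule(record):
        value = record.get(field_name)
        if value is None or str(value).strip() == "":
            return f"{field_name} is missing"
        return None
    return rule


def positive(field_name):
    """Rule: if present, the field must be a number greater than zero."""
    def rule(record):
        value = record.get(field_name)
        if value is not None and not (isinstance(value, (int, float)) and value > 0):
            return f"{field_name} must be a positive number"
        return None
    return rule


def firewall(records, rules):
    """Accept records that pass every rule; keep the rest with their reasons."""
    result = FirewallResult()
    for record in records:
        reasons = [msg for msg in (rule(record) for rule in rules) if msg]
        if reasons:
            result.rejected.append((record, reasons))  # held for follow-up
        else:
            result.accepted.append(record)
    return result


# Every ingestion point (UI form, nightly feed) would call the same rules.
RULES = [require("ParcelNumber"), require("ListPrice"), positive("ListPrice")]
incoming = [
    {"ParcelNumber": "01-234", "ListPrice": 450000},
    {"ParcelNumber": "", "ListPrice": -1},
]
outcome = firewall(incoming, RULES)
print(len(outcome.accepted), "accepted;", len(outcome.rejected), "rejected")
```

Keeping the rejected records together with their reasons is what turns the firewall into a mechanism for follow-up and resolution rather than a silent filter.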


Data Standardization vs. Data Quality – What’s the difference

Does compliance with a data standard mean high data quality? In other words, if your data is Platinum level certified against the RESO 1.5 data dictionary, would you also consider it to be of high quality? It turns out the answer is not that straightforward.

 

There are typically 2 different views on data quality – conformance to a standard specification or usability of data for a specific purpose. If we take the first definition the data quality would be very high if a data set is certified by RESO. On the other hand as we discussed in Part 1, an agent could inadvertently enter erroneous listing data or purposefully tweak the listing for improved marketability. This can result in data inconsistency between a public record and a listing record for the same property leaving the user of the data to assign trustworthiness to the data sources before consumption. Since business objectives are driven by data use as opposed to conformance to a standard we prefer the second definition of data quality which is measured by its usability.

 

Consider another example of standard vs. quality: Assignment of a PropertySubType value of Condominium or Townhouse or Single Family Residence is standards compliant but an erroneous assignment of this field can cause the property to be missed from appearing in IDX searches. In addition, it can also cause valuation issues if not combined and cleansed against other data sources.

 

Having said that, certain standards specifications include elements of data use as well, in which case conformance to standards and usability begin to mean the same thing. But given the various uses of a particular data set, it is unfair to expect a standards organization to completely define every usability requirement for the data; doing so would result in an unwieldy standard, which may reduce its adoption.

 

Here are some typical data quality concerns to consider:

  • Completeness: Are we missing any values of critical fields?
  • Validity: Is the data in a field valid? Does the whole record match my rules?
  • Uniqueness: How much of our data is duplicated?
  • Consistency: Is information consistent within a single record, across multiple records, and across multiple data sets?
  • Accuracy: Does the data represent reality?
  • Temporal Consistency & Accuracy: Does a snapshot in time represent reality at that time and are all data sets consistent with that snapshot?

 

As you can see, a data standard such as RESO would not be able to answer the above for all the real estate ecosystem players. We could define detailed rules for each of the concerns above and such rules will look different in a mortgage company and a sales lead generation company.

Practical data quality for real estate

Now let us bring all this down to a few specific takeaways to improve the quality of data in your company. We will define these in a few steps to begin with. But certainly stay tuned to our blog for future posts on this topic, where we will continue to provide specific rules and heuristics you could implement.

 

Many of the activities below must be driven by an appointed data steward for each major data set you are dealing with – assessment, listings, deeds, mortgages, permits, etc.

Identify critical fields

The first step in your data quality journey is to identify the most critical fields for your particular application. Out of the 639 fields contained in the RESO 1.6 data dictionary, you would want to identify the fields that are required for your computations. Some fields are commonly required for any application; they were listed in Part 1 of the article and are repeated here for quick reference:

 

Parcel Number       ListingContractDate         AssociationName
Address             StandardStatus              AssociationFee
PropertyType        OriginalListPrice           Subdivision
PropertySubType     ListPrice                   School Districts
Lot Size            CloseDate
Zoning              ClosePrice                  TotalActualRent
NumberOfBuildings   DaysOnMarket
BedroomsTotal       ListAgent Information
BathroomsTotal      ListBroker Information
LivingArea          SellingAgent Information
Tax Year            SellingBroker Information
Tax Value           Public Remarks
Tax Amount
Land Value
Improvement Value
StoriesTotal
ArchitectureStyle

Define Data Quality Rules

The next step is to define a set of rules that will consider 2 dimensions to begin with:

 

Data Quality Concerns: Completeness, Validity, Uniqueness, Consistency, Accuracy, and Temporal Consistency & Accuracy.

Extent of measurement: Single record, multiple history records of the same property, multiple history records of the same listing, multiple data sets (public records and listings)

 

You would end up with rules for each field, for each type of record, for a data set, and rules that cut across multiple data sets. These rules would validate the field, a record, a set of records, or the whole data set. Execution of these rules would result in either errors or warnings about the quality of your data.
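
As one possible shape for such rules, the Python sketch below shows a field-level rule, a record-level rule, and a data-set-level rule, each reporting errors or warnings. The specific checks, the severity assigned to each, and the sample records are assumptions chosen for illustration; your own rules would follow from your business objectives.

```python
from collections import Counter

VALID_SUBTYPES = {"Condominium", "Townhouse", "Single Family Residence"}


def check_field_validity(record):
    # Validity, single field: PropertySubType must be a known value.
    if record.get("PropertySubType") not in VALID_SUBTYPES:
        yield ("error", "PropertySubType is not a known value")


def check_record_consistency(record):
    # Consistency, single record: a close price needs a close date.
    if record.get("ClosePrice") is not None and not record.get("CloseDate"):
        yield ("error", "ClosePrice present without CloseDate")


def check_dataset_uniqueness(records):
    # Uniqueness, whole data set: same parcel listed twice on the same date.
    keys = Counter((r.get("ParcelNumber"), r.get("ListingContractDate")) for r in records)
    for key, count in keys.items():
        if count > 1:
            yield ("warning", f"{count} records share parcel/date {key}")


def run_rules(records):
    """Return (severity, record_index_or_None, message) for every finding."""
    findings = []
    for index, record in enumerate(records):
        for check in (check_field_validity, check_record_consistency):
            findings.extend((sev, index, msg) for sev, msg in check(record))
    findings.extend((sev, None, msg) for sev, msg in check_dataset_uniqueness(records))
    return findings


sample = [
    {"ParcelNumber": "A1", "ListingContractDate": "2018-01-05",
     "PropertySubType": "Condominium", "ClosePrice": None},
    {"ParcelNumber": "A1", "ListingContractDate": "2018-01-05",
     "PropertySubType": "Condo", "ClosePrice": 300000, "CloseDate": ""},
]
findings = run_rules(sample)
for finding in findings:
    print(finding)
```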

Discovery with Data Profiling

Data Profiling helps you run a statistical analysis on the data to discover hitherto unknown problems.

For example, we usually expect PropertySubType values to be always one of the known ones. But as new data gets processed, we might discover that certain PropertySubType mappings are absent in our standardization routines and as a result non-standard PropertySubTypes may be getting added to our DB.

 

To catch such issues, a data profiling capability will provide detailed stats on field populations, null counts, blank counts, and also field value distributions. For the PropertySubType values, the field value distribution will reveal to us that there is a new PropertySubType value with over 100,000 entries. This will mean that we should remap these values as required.

 

Running a data profiler periodically will help identify issues that creep into the data. Note that a data quality firewall would only prevent “unclean” data when we have modeled such cleansing rules or quality rules within that firewall. But for previously unknown issues that get loaded via daily incremental data ingestions, we need to discover the issues and model prevention rules into the firewalls.
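
A very small profiler along these lines can be sketched in Python as follows. It reports population, null counts, blank counts, and the value distribution for one field, and then lists values that the standardization routine does not know. The function names and the sample records (including the made-up value "Twnhs") are assumptions for illustration only.

```python
from collections import Counter


def profile_field(records, field_name):
    """Population, null count, blank count, and value distribution for one field."""
    nulls = blanks = 0
    distribution = Counter()
    for record in records:
        value = record.get(field_name)
        if value is None:
            nulls += 1
        elif isinstance(value, str) and value.strip() == "":
            blanks += 1
        else:
            distribution[value] += 1
    total = len(records)
    populated = total - nulls - blanks
    return {
        "total": total,
        "populated": populated,
        "fill_rate": populated / total if total else 0.0,
        "nulls": nulls,
        "blanks": blanks,
        "distribution": distribution,
    }


def unexpected_values(profile, known_values):
    """Values seen in the data that are not in the known set, with their counts."""
    return {v: n for v, n in profile["distribution"].items() if v not in known_values}


KNOWN_SUBTYPES = {"Condominium", "Townhouse", "Single Family Residence"}
listings = [
    {"PropertySubType": "Condominium"},
    {"PropertySubType": "Townhouse"},
    {"PropertySubType": "Twnhs"},   # unmapped value that slipped through
    {"PropertySubType": "Twnhs"},
    {"PropertySubType": ""},        # blank
    {},                             # null
]
profile = profile_field(listings, "PropertySubType")
print(profile["populated"], profile["nulls"], profile["blanks"])
print(unexpected_values(profile, KNOWN_SUBTYPES))
```

Any value returned by the last call is a candidate for a new mapping in the standardization routine and a new rule in the firewall.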

 

Establish Data Quality Metrics

Having defined the rules it is time to measure your quality against the rules you have established. Common quality metrics are:

  • Number of records that failed a particular quality rule
  • Field population thresholds and where we fall short
  • Field value distributions
  • Number of records with invalid data for each field
  • Number of records that failed a record level quality rule
  • Number of multi-record quality rule failures
  • Number of data-set level quality rule failures

 

For each of the above it is important to understand the trends, so you need to run the Data Profiler at regular intervals, weekly or monthly, to know whether your data quality is improving or getting worse, and to discover issues that did not exist before.
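
Tracking the trend can be as simple as comparing rule-failure counts between two profiler runs. The Python sketch below labels each rule as improving, worsening, unchanged, or new. The rule names and counts are invented for the example.

```python
def compare_runs(previous, current):
    """Label each rule's failure count as improving, worsening, unchanged, or new."""
    trend = {}
    for rule_name in sorted(set(previous) | set(current)):
        before = previous.get(rule_name)
        after = current.get(rule_name, 0)
        if before is None:
            trend[rule_name] = "new"          # an issue that did not exist before
        elif after < before:
            trend[rule_name] = "improving"
        elif after > before:
            trend[rule_name] = "worsening"
        else:
            trend[rule_name] = "unchanged"
    return trend


# Hypothetical failure counts per rule from two weekly runs.
last_week = {"missing_list_price": 120, "unknown_subtype": 40}
this_week = {"missing_list_price": 95, "unknown_subtype": 40, "close_without_date": 12}
trend = compare_runs(last_week, this_week)
print(trend)
```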

Enforce the rules at the data ingestion points

This is the first and proactive step in improving and maintaining high quality of data.

 

Having defined the rules for measuring data quality, it is now important to maintain higher-quality data by enforcing these rules at the time data is created in the organization. Get the data steward to become the evangelist for the rules he/she has defined and to work with each data origination point to implement the validation rules.

Define Heuristics for Quality Improvement

The reactive posture to data quality improvement is more of a data cleansing process, and it is a required element of a data quality practice. Most of the time you are not in control of the data origination points, and if rule enforcement at the origination point is too restrictive you might not have enough data for your applications. Hence the need for a reactive measure to clean up the data you have received.

 

There are broadly 2 alternatives – either perform the cleanup and then put it through a highly restrictive data quality firewall or have a lenient firewall with a downstream cleaning process. The choice depends very much on your application and its ability to deal with imperfect data.

 

Any data quality improvement mechanism is dependent on a set of heuristics that the data steward and the data architects work together to define. For example, you could reclassify a rental listing correctly by looking at the listing price and comparing it to local median sale price and to the median rental price. A strong partnership between a data steward and the data architect is necessary to define and develop these cleansing heuristics.

 

It is also recommended that you maintain a list of all active and retired heuristics used for cleansing. Another need alongside data cleansing is the ability to track the data lineage where you would keep track of the source of the cleansed data and the heuristics that caused the data to be modified.
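
The rental example above, together with lineage tracking, could be sketched in Python like this. The log-scale comparison, the PropertyType values used, the heuristic identifier, and the sample prices are all assumptions made for illustration; a real heuristic would be agreed between the data steward and the data architect.

```python
import math
from datetime import datetime, timezone

HEURISTIC_ID = "rental-by-price-v1"  # hypothetical name kept in the heuristics list


def looks_like_rental(list_price, median_sale_price, median_rent):
    """True when the price is closer, on a log scale, to the median rent."""
    if list_price <= 0:
        return False
    to_rent = abs(math.log(list_price / median_rent))
    to_sale = abs(math.log(list_price / median_sale_price))
    return to_rent < to_sale


def reclassify(listing, median_sale_price, median_rent, lineage):
    """Return a corrected copy of a for-sale listing priced like a rental.

    Every change is appended to `lineage` with its source and heuristic.
    """
    if listing["PropertyType"] != "Residential":
        return listing
    if not looks_like_rental(listing["ListPrice"], median_sale_price, median_rent):
        return listing
    lineage.append({
        "source": listing.get("Source"),
        "field": "PropertyType",
        "old": listing["PropertyType"],
        "new": "Residential Lease",
        "heuristic": HEURISTIC_ID,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return {**listing, "PropertyType": "Residential Lease"}


lineage = []
suspect = {"Source": "mls_feed", "PropertyType": "Residential", "ListPrice": 2400}
normal = {"Source": "mls_feed", "PropertyType": "Residential", "ListPrice": 430000}
fixed = reclassify(suspect, median_sale_price=450000, median_rent=2600, lineage=lineage)
same = reclassify(normal, median_sale_price=450000, median_rent=2600, lineage=lineage)
print(fixed["PropertyType"], same["PropertyType"], len(lineage))
```

The original record is left untouched and the lineage entry names the heuristic, so a retired or faulty heuristic can later be traced and its changes reviewed.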

Conclusion

Data quality is a cyclical process that begins with establishing rules, continues with implementing them to measure quality, profiling the data, and cleaning up the data as required, and finally goes back to tweaking the rules to execute the cycle once more. The target metrics would start small and continue to tighten with time.


We hope this article provided an overview and some key takeaways to implement a good data quality practice within your real estate technology platform. We will continue this conversation with more blog posts to provide you:

  • Practical data quality rules and metrics
  • Data cleansing heuristics to implement
  • Machine learning techniques in real estate data cleansing

 

We are planning to release our Data QA Tool specialized for real estate data free to the community. Please sign up here to be notified when the tool is released.


Improve the Quality of Your Real Estate Data

Part 1 – The Real Estate Data Quality Problem

Introduction

Real Estate data comprises many categories – characteristics of a property, the history of the property and how it changed during its lifetime (renovations, add-ons, permits, etc.), current for-sale properties, history of sale records, history of tax assessments, current mortgage information, any outstanding liens, utility consumption, neighborhoods, schools, and the list goes on. You can see that there is data about a real property and a lot of additional data about how the property is influenced. As you read through that partial list of data categories you would have also observed that each of those categories is created and maintained by a different company or government agency. Given these disparate sources of data and how the real estate industry has evolved, assembling all this in one place to know everything about a single property has become a challenge. Before we explain why this is a challenge, let us briefly explain who uses this data and why it is so important.

Relevance of Data Quality in Real Estate

Housing alone contributes about 15-18% to the GDP of the US economy [1]. If you consider commercial real estate the numbers climb to well over 20% [2]. The real estate ecosystem comprises numerous industries, and each of them is dependent on data. Here are a few of them in the table below.

 

Producers & Consumers of Real Estate Data

Local Municipalities
County Governments
Federal Agencies
Mortgage Lenders (Banks, Credit Unions)
Mortgage Brokers
Mortgage Servicers
Investment Banks
Appraisers
Home Inspectors
Title Companies
Real Estate Brokers and Agents
Home Buyers and Sellers
Home Improvement Companies
Home Improvement/Repair Contractors
Builders and Developers
Architects
Civil Engineers
ETF and Fund Managers
Retirement and/or Sovereign Funds
GSEs – Freddie Mac and Fannie Mae

Here is one reason why data quality matters across these players: Consider the loan processing steps in home buying. The homebuyer applies for a mortgage at a lender and the lender’s underwriter hires an appraiser to determine the actual value of the property before lending a percentage of that value (a maximum of 80% in most cases) to the buyer. Once the mortgage is issued it is often transferred to a mortgage servicer and the mortgage itself is sold to another financial institution to enable securitization of the loan. Securitization enables other investors across the world to participate in the US mortgage market and in turn in the US real estate market. Each party in this chain of activities and especially the investor in the security needs to understand the security’s Value at Risk (VAR) which is directly dependent on the value of the home among many other categories of risk such as borrower risk, market risk, and so on.


Home valuations are dependent on the property’s characteristics, recent sales in the market, current inventory of homes, neighborhood information, recent development and employment activity in the area, and many more such factors. As you can see, accurate and consistent real estate data is highly important to arrive at home valuations with a high degree of confidence for every player in the ecosystem.

For instance, consider a property with 4 Bedrooms, 3 Baths, and 2,500 sq. ft. of living area on a one-acre lot that is listed in the MLS as a 5 Bedroom, 3 Bath property because the agent counted an additional room in the basement as a bedroom. When it is compared to other 5 Bedroom, 3 Bath properties, the subject property could get overvalued, or other properties could get undervalued if the list price of such a property is used as a comparable. Similarly, the subject property being compared to another one in better condition, or missing out on improvements made to the kitchen or the basement, will reflect an inaccurate value in an appraisal. As a result an appraiser tries not to depend solely on the MLS listing data for their work; she supplements it with onsite inspections to collect detailed information. Appraisals are thereby delayed, which further cuts into the profit margins in the appraisal business. Much worse, this has a direct bearing on the ability of the homeowner or the buyer to close the transaction. So, unreliable data sources inadvertently exert strong influence on the whole process.

Why is it difficult to maintain data quality in Real Estate?

Of all the sources of information about a particular property, the most dependable data is that made freely available in most counties in the US via the public records act in each state. That covers tax assessment, deeds, mortgages, liens, etc. These data sets are again completely independent, typically tied together only by an APN (Assessor’s Parcel Number). But each county or municipality creates and maintains this data in its preferred model, even though conceptually they all cover the same types of information. Integrating data from over 3000 counties across the country and unifying them into a single data model is one necessary step to ensure data consistency can be maintained across all properties.

 

Real estate listing data gathering, on the other hand, has been a wild west, even with the Real Estate Transaction Standard (RETS) maintained by the National Association of Realtors (NAR), which provides only a protocol standard for data exchange and not a payload standard for the data actually exchanged. Enter the Real Estate Standards Organization (RESO): its standard data dictionary has immensely improved consistency in data representation across the various players. But RESO does not address the types of home valuation related data issues discussed earlier (we will present why RESO is justified in that position in Part 2 of this article). MLS data capture platforms most often do not enforce any data consistency rules, either within the system or against the local county/municipality data. Even though a Board of Realtors or MLS may have a recommended format, hundreds, if not thousands, of agents, brokers, and their assistants could submit a listing. Much as no two people are alike, their choice of words and descriptions of key features will vary, and the description of features is a common area where subjectivity is prevalent. For every person who calls a home a “fixer upper”, another will say it is “an incredible value, with lots of potential”.

 


Inaccuracies in the data can be introduced through other means as well, and property characteristics are the most affected. Real estate appraisers require the Gross Living Area (GLA) of a home to be the “Above Grade” square footage, which is how the assessor would report it, but when the property is listed, the Living Area is often inclusive of finished basements, which can be misleading. Even though the intent is not to create a wrong listing, misinterpretation of the data creates tricky situations during the appraisal process. Data entry errors can create a listing with the wrong number of bedrooms or bathrooms, living area, or lot size. When there are several hundred fields to update for a listing and time is limited, these errors tend to multiply.
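A simple cross-check between a listing and the assessor record can surface both kinds of problem described above. The sketch below is only an illustration: the assessor field names, the 5% tolerance, and the sample figures (a 2,500 sq. ft. home listed with a hypothetical 800 sq. ft. finished basement included) are assumptions, not values from any real feed.

```python
def compare_listing_to_assessor(listing, assessor, area_tolerance=0.05):
    """Return human-readable flags where the listing disagrees with the assessor."""
    flags = []
    if listing["BedroomsTotal"] != assessor["bedrooms"]:
        flags.append("bedroom count differs from assessor record")
    # Listing area well above the above-grade area may include a finished basement.
    excess = listing["LivingArea"] - assessor["above_grade_area"]
    if excess > area_tolerance * assessor["above_grade_area"]:
        flags.append("living area exceeds above-grade area; may include basement")
    return flags


listing = {"BedroomsTotal": 5, "LivingArea": 3300}
assessor = {"bedrooms": 4, "above_grade_area": 2500}
flags = compare_listing_to_assessor(listing, assessor)
print(flags)
```

A flag here does not say which source is wrong; it only marks the record for review before it is used as a comparable.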

 

Know Your Data – Measure its Quality

As explained in the previous sections, data quality in real estate is much needed but hard to achieve, given the integration complexities across the various players. Identifying and fixing the individual root causes can take a long time, but in the meantime we can improve the quality of current data to achieve immediate business objectives.

 

Before we can “cleanse” the data to improve its quality, we need to determine how bad the data at hand is, using a few applicable metrics. It is important to understand that the target quality and the metrics used to measure it depend a lot on the intended use of the data. For example, a selling agent is most interested in data related to property characteristics, financing terms, showing instructions, etc., whereas a home improvement company would be interested in property features, property improvements, etc. Here are a few suggested common metrics for measuring the quality of real estate data.

 

Field Population Statistics, with specific focus on the following fields from the RESO standard data dictionary:

Parcel Number         ListingContractDate           AssociationName
Address               StandardStatus                AssociationFee
PropertyType          OriginalListPrice             Subdivision
PropertySubType       ListPrice                     School Districts
Lot Size              CloseDate
Zoning                ClosePrice                    TotalActualRent
NumberOfBuildings     DaysOnMarket
BedroomsTotal         ListAgent Information
BathroomsTotal        ListBroker Information
LivingArea            SellingAgent Information
Tax Year              SellingBroker Information
Tax Value             Public Remarks
Tax Amount
Land Value
Improvement Value
StoriesTotal
ArchitectureStyle
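Field population can be measured as the share of listings in which a field carries a usable value. The following Python sketch shows one way to compute it; the sample listings are made up, and treating `None`, empty strings, and empty lists as "not populated" is an assumption that you may want to adjust for your own data.

```python
def field_fill_rates(listings, fields):
    """Return, per field, the fraction of listings in which it is populated."""
    total = len(listings)
    rates = {}
    for field in fields:
        # A value of 0 counts as populated; None, "" and [] do not.
        populated = sum(1 for row in listings if row.get(field) not in (None, "", []))
        rates[field] = populated / total if total else 0.0
    return rates


listings = [
    {"ListPrice": 450000, "BedroomsTotal": 4, "ParcelNumber": "12-345"},
    {"ListPrice": 300000, "BedroomsTotal": None, "ParcelNumber": ""},
    {"ListPrice": 615000, "BedroomsTotal": 3},
    {"ListPrice": 289900, "BedroomsTotal": 0, "ParcelNumber": "98-765"},
]
rates = field_fill_rates(listings, ["ListPrice", "BedroomsTotal", "ParcelNumber"])
print(rates)
```

Tracking these rates per MLS and over time makes it easy to see which sources need attention for a given use of the data.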

 

Address Standardization measures the extent to which the address components of a property can be used to uniquely locate it or to derive a high-accuracy geocode.
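To give a feel for what standardization involves, here is a deliberately naive Python sketch that uppercases an address, strips punctuation, and abbreviates a few common words. The abbreviation table is a small illustrative subset chosen for this example; a production system would rely on a proper address-validation service, and this simple token replacement would mishandle a street actually named "North Street".

```python
import re

# Small illustrative subset of street-suffix and directional abbreviations.
ABBREVIATIONS = {
    "STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR",
    "BOULEVARD": "BLVD", "LANE": "LN", "NORTH": "N", "SOUTH": "S",
    "EAST": "E", "WEST": "W", "APARTMENT": "APT",
}


def standardize_address(raw):
    """Uppercase, drop punctuation (except '#'), and abbreviate known words."""
    cleaned = re.sub(r"[^\w\s#]", " ", raw.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in cleaned.split()]
    return " ".join(tokens)


first = standardize_address("123 North Main Street, Apt. 4")
second = standardize_address("123 n. main st apt 4")
print(first)
print(first == second)
```

Two differently typed versions of the same address reduce to the same string, which is what makes them usable for matching and geocoding.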

Geocode Accuracy is sometimes required to support accurate radius or polygon property searches. Rooftop accuracy may be required for certain applications, but a street-side geocode might suffice for many.
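The effect of geocode accuracy on a radius search is easy to see in code. The sketch below uses the haversine great-circle formula to select candidate properties within a given distance of a subject property; the coordinates and property IDs are invented for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(subject, candidates, radius_miles):
    """Return the IDs of candidates no farther than radius_miles from the subject."""
    return [
        c["id"] for c in candidates
        if haversine_miles(subject["lat"], subject["lon"], c["lat"], c["lon"]) <= radius_miles
    ]


subject = {"lat": 40.0, "lon": -75.0}
candidates = [
    {"id": "A", "lat": 40.005, "lon": -75.0},
    {"id": "B", "lat": 40.01, "lon": -75.0},
    {"id": "C", "lat": 40.0, "lon": -75.02},
]
half_mile = within_radius(subject, candidates, 0.5)
one_mile = within_radius(subject, candidates, 1.0)
print(half_mile, one_mile)
```

A property whose geocode is off by even a few hundred feet can fall on the wrong side of a tight radius, which is why the required accuracy depends on the application.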

Listing Duplication must be reduced as much as possible, again depending on the application. At the least, listings from different MLSs will need to be linked by a common unique property ID.
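One possible way to link duplicate listings is to derive a property key from fields that should be stable across MLSs. The sketch below assumes each listing carries a county FIPS code and a parcel number; the field names `county_fips`, `parcel_number`, and `listing_id` are hypothetical, and when parcel numbers are missing a standardized address would be needed instead.

```python
from collections import defaultdict


def property_key(listing):
    """Build a linking key from the county FIPS code and a punctuation-free parcel number."""
    apn = "".join(ch for ch in listing["parcel_number"] if ch.isalnum()).upper()
    return f"{listing['county_fips']}-{apn}"


def link_listings(listings):
    """Group listing IDs from any number of MLSs under a shared property key."""
    groups = defaultdict(list)
    for listing in listings:
        groups[property_key(listing)].append(listing["listing_id"])
    return dict(groups)


listings = [
    {"listing_id": "MLS1-1001", "county_fips": "36059", "parcel_number": "12-345-67"},
    {"listing_id": "MLS2-77", "county_fips": "36059", "parcel_number": "1234567"},
    {"listing_id": "MLS1-1002", "county_fips": "36059", "parcel_number": "76-543-21"},
]
groups = link_listings(listings)
print(groups)
```

Any key with more than one listing ID is a candidate duplicate to be merged or cross-referenced.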

Raw listing data from an MLS will trickle in through multiple updates and improve in quality over the first few days or weeks of a property being listed. Listing history records may have to be merged to improve data consistency.
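A simple merge strategy is to apply updates in time order and let later non-empty values overwrite earlier ones, so that a blank in a later update never erases good data. The sketch below assumes every update carries a sortable `ModificationTimestamp`; the sample updates are invented.

```python
def merge_updates(updates):
    """Fold updates, oldest first, into one record; later non-empty values win."""
    merged = {}
    for update in sorted(updates, key=lambda u: u["ModificationTimestamp"]):
        for field, value in update.items():
            if value not in (None, ""):  # skip blanks so they do not erase earlier data
                merged[field] = value
    return merged


updates = [
    {"ModificationTimestamp": "2017-05-03T10:00:00", "ListPrice": 445000, "LivingArea": 2500},
    {"ModificationTimestamp": "2017-05-01T09:00:00", "ListPrice": 450000, "LivingArea": None,
     "BedroomsTotal": 4},
    {"ModificationTimestamp": "2017-05-05T08:30:00", "LivingArea": "",
     "PublicRemarks": "Updated kitchen"},
]
merged = merge_updates(updates)
print(merged)
```

Whether a blank should ever override an earlier value is a policy decision; this sketch assumes it should not.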

Often a listing will move into a Cancelled/Withdrawn status before it is recorded as Sold. In such cases, the listing history data may require consolidation to drop the superfluous status transitions.
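The consolidation described above can be sketched as follows. The status labels mirror the ones used in this article rather than any particular MLS's values, and the rule itself (drop Cancelled/Withdrawn entries that precede a Sold entry in the same listing history) is one simple interpretation, not a definitive policy.

```python
SUPERFLUOUS_BEFORE_SOLD = {"Cancelled", "Withdrawn"}


def consolidate_history(events):
    """Clean a list of (iso_date, status) tuples for a single listing."""
    ordered = sorted(events)  # ISO dates sort chronologically as strings
    # Index of the last Sold event, if any.
    sold_index = next(
        (i for i in range(len(ordered) - 1, -1, -1) if ordered[i][1] == "Sold"), None
    )
    cleaned = []
    for i, (date, status) in enumerate(ordered):
        if sold_index is not None and i < sold_index and status in SUPERFLUOUS_BEFORE_SOLD:
            continue  # drop the transition that the sale supersedes
        if cleaned and cleaned[-1][1] == status:
            continue  # collapse consecutive repeats of the same status
        cleaned.append((date, status))
    return cleaned


history = [
    ("2017-03-01", "Active"),
    ("2017-03-20", "Pending"),
    ("2017-04-10", "Withdrawn"),
    ("2017-04-12", "Sold"),
]
cleaned = consolidate_history(history)
print(cleaned)
```

A listing that was withdrawn and never sold keeps its Withdrawn status, since there is no sale to supersede it.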

Very often, sale and rental listings get mixed up across different RETS resources/classes. Such listings may need to be reclassified appropriately.
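A crude first pass at spotting misclassified listings is to compare the declared class with what the price suggests. The sketch below is only a heuristic under stated assumptions: the `declared_class` field and the 20,000 price ceiling for rentals are hypothetical, and the threshold would need tuning per market.

```python
def likely_class(listing, rent_ceiling=20000):
    """Guess 'Rental' or 'Sale' from the price; fall back to the declared class."""
    price = listing.get("ListPrice")
    if price is None:
        return listing["declared_class"]
    return "Rental" if price < rent_ceiling else "Sale"


def find_misclassified(listings):
    """Return (ListingId, declared class, suggested class) where the two disagree."""
    return [
        (l["ListingId"], l["declared_class"], likely_class(l))
        for l in listings
        if likely_class(l) != l["declared_class"]
    ]


listings = [
    {"ListingId": "A1", "declared_class": "Sale", "ListPrice": 2400},
    {"ListingId": "A2", "declared_class": "Sale", "ListPrice": 450000},
    {"ListingId": "A3", "declared_class": "Rental", "ListPrice": 1800},
    {"ListingId": "A4", "declared_class": "Rental", "ListPrice": 325000},
]
suspects = find_misclassified(listings)
print(suspects)
```

The output is a review list rather than an automatic correction, since a very low price can also be a data entry error on a genuine sale listing.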

Click here to continue to Part 2 of this article which explores the following ideas.

  • Data Quality Best Practices
  • Data Standardization vs. Data Quality – What’s the difference
  • Practical data quality for real estate


Please click here to provide your contact information to be alerted when Data QA Tool is published.


PropMix.io, in collaboration with RESO, is offering a 50% discount to RESO members

MANHASSET HILLS, N.Y., Oct. 17, 2017 /PRNewswire-iReach/ -- PropMix.io, an Innovation Incubator portfolio company, announced two new capabilities to empower real estate brokers and software providers: the latest release of Sally, its cognitive image advisor, and an AI-driven broker insights platform called BrokerView. Sally IDX APIs will enable brokers and agents to enhance the seller/buyer experience on their IDX websites and to automate and improve the quality of the content. The updated version of BrokerView, the analytics platform for brokers from PropMix.io, provides a comprehensive view of the broker’s business and market trends along with actionable recommendations. These services are being offered in collaboration with RESO at a 50% discount for RESO members.

PropMix has solved the image recognition problem in real estate using our decades-long real estate experience and deep AI knowledge. Sally allows IDX portal visitors to find and compare homes using the photos posted by agents. Sally has been trained on a million images to help automate the labeling of all home photos, organize them, and choose the best thumbnail image. In addition, Sally infers property details from the images and standardizes them to the Real Estate Standards Organization (RESO) data dictionary. As a result, images are searchable with standard tools. This will improve SEO hits on IDX websites, and best of all, customers can compare rooms side-by-side using photos.

PropMix has also released an insights and decision making platform for brokers called BrokerView. It uses big data and deep learning models to analyze real estate transactions and combines them with competitor and local and adjacent market predictions. In addition to delivering a customizable dashboard of the common metrics, BrokerView discovers the broker’s business model and generates highly targeted recommendations on neighborhoods, markets, and agents.

PropMix is collaborating with RESO to make Sally and BrokerView available to its members at a discounted price. “Image recognition enables a new breed of real estate applications with lasting impact in consumer experience, valuation, and risk management and PropMix’s Sally platform is at the forefront of that disruption,” said Umesh Harigopal, CEO of PropMix. “We are looking forward to RESO members actively utilizing the platform and incorporate image recognition APIs into the RESO standards in the near future.”

The combination of image recognition, natural language processing, and big data analytics within PropMix’s Real Estate cognitive fabric is creating a comprehensive platform for AI in real estate and enabling a new generation of apps that learn and automate decision making for all participants in the real estate market.

About PropMix.io LLC USA

PropMix.io LLC, an Innovation Incubator Inc portfolio company, along with its subsidiary PropMix.io India Private Limited, offers a ground-breaking Real Estate Smart App Development Platform that enables the Real Estate ecosystem to easily consume and monetize data and insights and build Smart Apps. Strategic partners include Software Incubator and Cognub Decision Solutions, the first decision science company in Real Estate. Built on an industry open standards based Platform as a Service (PaaS) for global scale and bulk sync, PropMix.io empowers users to engage with data, make decisions using insights, and build the real estate future. Headquartered in New York, we also have a presence in Boston MA, Leesburg VA, and Freehold NJ in the USA and in Trivandrum, Kerala in India.

Media Contact: Sakeer Hassan, PropMix.io LLC, 7329799507, sakeer@blog.propmix.io

News distributed by PR Newswire iReach: https://ireach.prnewswire.com