QualityStage provides a standardization process for addresses, which can be very useful when creating a highly cleansed data mart. Out of the box, the standardization process is fairly accurate; however, some data will need additional help. Here is the process for testing your addresses through the standard QualityStage standardization process.
Assumption: the standard process for standardizing addresses is as follows:
- Pass the address, city, state, and postal code through the standardization process COUNTRY.SET to determine the country of the address.
- Separate the data set into multiple outputs, one for each country.
- Pass each country dataset through the …PREP standardization process (USPREP).
- Pass the country PREP datasets through the …ADDR and …AREA standardization processes (USADDR and USAREA).
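The flow above can be sketched in Python to make the order of operations concrete. This is only an illustration of the routing logic, not QualityStage code: `route_by_country`, `standardize`, and the `rule_sets` mapping are hypothetical names, and the callables passed in are stand-ins for whatever COUNTRY, …PREP, …ADDR, and …AREA actually do.

```python
from collections import defaultdict


def route_by_country(records, identify_country):
    """Group records by the country code that identify_country returns."""
    by_country = defaultdict(list)
    for record in records:
        by_country[identify_country(record)].append(record)
    return dict(by_country)


def standardize(records, identify_country, rule_sets):
    """Run each country's records through its PREP, ADDR and AREA steps.

    rule_sets (hypothetical) maps a country code to three callables:
    "prep" returns (address_domain, area_domain); "addr" and "area"
    each take one of those strings and return a dict of attributes.
    """
    results = []
    for country, group in route_by_country(records, identify_country).items():
        steps = rule_sets.get(country)
        if steps is None:
            continue  # no rule sets configured for this country
        for record in group:
            # PREP splits the input into an address part and an area part.
            address_domain, area_domain = steps["prep"](record)
            results.append({
                "country": country,
                "address": steps["addr"](address_domain),
                "area": steps["area"](area_domain),
            })
    return results
```

The point of the sketch is the dependency order: the country decides which rule sets apply, and the PREP step must run before ADDR and AREA because it produces their inputs.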
The data we will be analyzing is as follows:
| … | AddressLine1 | AddressLine2 | City | StateAbbr | PostalCode | … |
| … | 12345 Main St | 789 | Minneapolis | MN | 55431 | … |
In the Designer Client, expand the Standardization Rules folder in the Repository.
Expand the USA folder
To start with, expand the USPREP folder
Double-click the SET
Once the Rules Management opens, select Test on the right side
Populate the input strings with the address lines from the data
Note the AddressDomain and the AreaDomain
Close the rule set tester
Close the rules management window
When asked whether to exit without saving, click Yes
Collapse the USPREP folder
Expand the USADDR folder
Double-click the SET
Once the Rules Management opens, select Test on the right side
Enter the AddressDomain value provided by the USPREP test and click Test This String
Note that columns are now populated with the appropriate address attributes.
Close the rule set tester
Close the rules management window
When asked whether to exit without saving, click Yes
Collapse the USADDR folder
Expand the USAREA folder
Double-click the SET
Once the Rules Management opens, select Test on the right side
Enter the AreaDomain value provided by the USPREP test and click Test This String
Note that columns are now populated with the appropriate area attributes.
Sometimes, address or area strings have an unfamiliar format. These show up as unhandled input values. They can be handled in one of two ways:
- Correct the data on the source so that the address and/or area can be standardized.
- Add the unhandled pattern to the override section in Rules Management.
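
To decide which of the two options is worth the effort, it can help to see which unhandled patterns occur most often. The following is a minimal sketch, assuming the standardized output has been exported as rows (dictionaries) and that unhandled patterns land in a column named `UnhandledPattern_USADDR`. That column name and the `unhandled_pattern_counts` helper are assumptions for illustration; check the actual output columns of your job.

```python
from collections import Counter


def unhandled_pattern_counts(rows, pattern_column="UnhandledPattern_USADDR"):
    """Count how often each non-empty unhandled pattern appears.

    pattern_column is an assumed column name; adjust it to match
    the output of your own standardization job.
    """
    counts = Counter()
    for row in rows:
        # Treat a missing or blank value as "handled".
        pattern = (row.get(pattern_column) or "").strip()
        if pattern:
            counts[pattern] += 1
    # Most frequent patterns first.
    return counts.most_common()
```

A pattern that appears many times is usually a better candidate for an override, while a one-off is often easier to correct on the source.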