EzDeDupe for Microsoft Windows - Product Details
03.20.2008
EzDeDupe is designed as an easy-to-use, affordable and effective database/spreadsheet deduplication tool. It lets you load multiple files, use advanced deduplication algorithms to match mismatched data, and then export the cleaned data (and more) to a variety of database formats.
EzDeDupe acts as a central repository for your data while you standardize, clean and, of course, deduplicate it. You fill the repository by connecting to and importing from a variety of data sources, including XLS, MDB, XML, CSV, TXT and DBF files as well as any ODBC- or UDL-compliant database. Once all the data sources are loaded into the EzDeDupe central repository, they are ready for duplicate finding and, potentially, merging.
Examples of Mapping types in EzDeDupe.
Cleaned Account Name:
Uses the built-in Account Name Cleaning List. The cleaning list
standardizes punctuation, spaces and word synonyms, and removes
common business prefixes and suffixes. These lists are customizable to
your language(s) and/or line of business.
The Jenson and Form Co = Jenson Form Ltd. = The Jenson Form Corporation
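The cleaning step can be pictured with a minimal Python sketch. The `NOISE_WORDS` set and the `clean_account_name` helper are hypothetical stand-ins for illustration only; the actual Account Name Cleaning List is larger and user-customizable.

```python
import re

# Hypothetical sample of words a cleaning list might drop (assumption,
# not the product's actual list).
NOISE_WORDS = {"the", "and", "co", "company", "corp", "corporation",
               "inc", "ltd", "limited"}

def clean_account_name(name: str) -> str:
    """Lower-case, strip punctuation, and drop common business words."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in NOISE_WORDS)

names = ["The Jenson and Form Co", "Jenson Form Ltd.", "The Jenson Form Corporation"]
print({clean_account_name(n) for n in names})  # a single cleaned value
```

All three example names reduce to the same cleaned value, which is why they appear as potential duplicates.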
Country Match: The country mapping
type is used to standardize field values for the recognized countries
of the world. It makes the long name, 2-letter ISO short form, 3-letter
ISO short form and the numeric ISO country value all appear to be
matches of each other.
Canada = CA = CAN = 124 (ISO numeric)
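A rough Python sketch of this kind of lookup follows. The `COUNTRIES` table holds only three sample rows and `country_key` is a hypothetical helper; the codes shown are the ISO 3166-1 values (Canada's numeric code is 124).

```python
# Sample rows only: (long name, alpha-2, alpha-3, numeric) per ISO 3166-1.
COUNTRIES = [
    ("Canada", "CA", "CAN", "124"),
    ("United States of America", "US", "USA", "840"),
    ("Mexico", "MX", "MEX", "484"),
]

# Map every known form of a country to its alpha-2 code.
_LOOKUP = {
    form.lower(): alpha2
    for name, alpha2, alpha3, numeric in COUNTRIES
    for form in (name, alpha2, alpha3, numeric)
}

def country_key(value: str):
    """Return the alpha-2 code for any recognized form, else None."""
    return _LOOKUP.get(value.strip().lower())

print(country_key("Canada"), country_key("can"), country_key("124"))
```

Two field values are treated as a country match when `country_key` returns the same non-empty code for both.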
Domain: The domain mapping type is used when mapping web pages and/or email addresses. It allows for the independent analysis of the domain information contained within the URL or the email address. For email addresses it uses any information to the right of the @ sign. For web pages it parses the XXXXX.com portion. This makes it easy to compare a web page field with another web page field, or an email field with another email field. By its nature it also allows email addresses to be compared with web pages and vice versa.
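One way to picture the domain extraction is the Python sketch below. `domain_key` is a hypothetical helper, and stripping a leading "www." and lower-casing are assumptions about what "the XXXXX.com portion" means, not documented product behavior.

```python
from urllib.parse import urlsplit

def domain_key(value: str) -> str:
    """Return the bare domain of an email address or web page address."""
    value = value.strip().lower()
    if "@" in value and "://" not in value:
        # Email address: keep everything to the right of the @ sign.
        host = value.rsplit("@", 1)[1]
    else:
        # Web page: add "//" so urlsplit treats a bare host as a host.
        if "://" not in value:
            value = "//" + value
        host = urlsplit(value).hostname or ""
    # Assumption: "www." is not part of the comparable domain.
    return host[4:] if host.startswith("www.") else host

print(domain_key("jane@example.com"))              # example.com
print(domain_key("http://www.Example.com/about"))  # example.com
```

Because both kinds of field reduce to the same key, an email column can be compared directly against a web page column.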
Exact: The exact mapping type in the Single Table Deduplication tool is exactly that, a 100% match of every character (assuming no options apply).
FirstName: Uses the built-in Nickname List. To see the Nickname tool, select the "Edit Nickname List" button at the top of the interface.
The Nickname list allows the deduplication tool to see Bill, William, Billy, etc. as potential duplicates of each other. This list is also customizable by the end user for localization or, in theory, for non-contact substitution on any field by replacing the nickname list with synonyms.
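As an illustration, a nickname list can be modeled as groups of interchangeable names. The groups below and the `first_name_key` helper are hypothetical examples in Python, not the product's built-in list.

```python
# Hypothetical nickname groups; each set lists names treated as equivalent.
NICKNAME_GROUPS = [
    {"william", "bill", "billy", "will"},
    {"robert", "bob", "bobby", "rob"},
]

# Map every name to one representative of its group (alphabetically first).
_CANONICAL = {name: min(group) for group in NICKNAME_GROUPS for name in group}

def first_name_key(name: str) -> str:
    """Return the group representative, or the name itself if unlisted."""
    cleaned = name.strip().lower()
    return _CANONICAL.get(cleaned, cleaned)

print(first_name_key("Bill") == first_name_key("William"))  # True
```

Replacing the groups with synonyms for another kind of field gives the non-contact substitution mentioned above.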
First XX Letters: Compares only the first XX letters in a field. Text fields are the only applicable field type. The user can select as many letters as they would like to compare.
Numeric: Compares only the numeric values in a field. Other characters that the field contains, such as spaces or punctuation, are ignored and not seen by the deduper. A field with a value of " Apt # 31" is seen by the deduper as only the numeric characters "31". This is often used with phone number fields, so that (999) 555-1212 will match 999-555-1212. In this case the deduper sees both values as 9995551212.
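The numeric reduction is simple enough to sketch in a few lines of Python; `numeric_key` is a hypothetical name used only for this illustration.

```python
import re

def numeric_key(value: str) -> str:
    """Keep only the digits 0-9; drop every other character."""
    return re.sub(r"[^0-9]", "", value)

print(numeric_key(" Apt # 31"))       # 31
print(numeric_key("(999) 555-1212"))  # 9995551212
print(numeric_key("999-555-1212"))    # 9995551212
```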
Relaxed Address Match: Parses the street address to the lowest common denominator. Based on North American standards, it has also proved effective with most country address formats.
With relaxed address match, the following addresses are all reduced to the lowest common denominator, 123 Pavillion:
- Apt #4, 123 Pavillion Street
- 123 Pavillion, Apt 4
- 4-123 Pavillion Ave NW
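A rough Python sketch of how such a reduction could work on the three examples above. The word lists, the regular expressions and the `relaxed_address` helper are all assumptions made for illustration; the product's actual parsing rules are not documented here.

```python
import re

# Hypothetical sample lists (assumptions, not the product's rules).
STREET_TYPES = {"street", "st", "avenue", "ave", "road", "rd", "crescent", "cres"}
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw"}
UNIT = re.compile(r"\b(?:apt|apartment|suite|ste|unit)\b\.?\s*#?\s*\w+")

def relaxed_address(value: str) -> str:
    """Reduce a street address to its civic number and street name."""
    s = value.lower()
    s = UNIT.sub(" ", s)                       # drop "apt #4", "unit 12", ...
    s = re.sub(r"\b\d+\s*-\s*(?=\d)", "", s)   # drop a unit prefix like "4-"
    tokens = re.findall(r"[a-z0-9]+", s)
    kept = [t for t in tokens if t not in STREET_TYPES and t not in DIRECTIONS]
    return " ".join(kept)

for address in ("Apt #4, 123 Pavillion Street",
                "123 Pavillion, Apt 4",
                "4-123 Pavillion Ave NW"):
    print(relaxed_address(address))  # 123 pavillion
```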
Relaxed NA Phone Match: Removes all non-numeric
characters and spaces. If the first digit is a 1 or 0, it is removed. If
exactly 7 digits are left, those seven digits are used; otherwise digits
4 - 10 are returned. It will not match "phone-word" values: letters such
as the "SPOT" in a phone number are trimmed off and only the numeric
portion is compared.
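The steps above can be sketched in Python as follows. `relaxed_na_phone` is a hypothetical name, and reading "digits 4 - 10" as 1-based positions (that is, skipping a 3-digit area code) is an assumption.

```python
import re

def relaxed_na_phone(value: str) -> str:
    """Reduce a North American phone number to its 7-digit local part."""
    digits = re.sub(r"[^0-9]", "", value)  # letters, spaces, punctuation go
    if digits[:1] in ("1", "0"):           # strip a leading 1 or 0
        digits = digits[1:]
    if len(digits) == 7:                   # already a local number
        return digits
    return digits[3:10]                    # positions 4 to 10, 1-based

print(relaxed_na_phone("1 (999) 555-1212"))  # 5551212
print(relaxed_na_phone("555-1212"))          # 5551212
```

Under this reading, a number stored with its area code and one stored without it reduce to the same seven digits.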
Street Address Match: The street address match is a slightly more rigid criterion than the relaxed address match tool. It will ignore the differences in street type short forms such as crescent - cres, road - rd, street - st.
Zip 5 and 9 Match: This mapping type will automatically match USPS 5 and 9 digit zip codes together without the need to standardize them first to a common number of digits.
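A minimal Python sketch of one way to get this behavior, assuming the comparison is made on the first five digits. That assumption and the `zip_key` helper are illustrative only; the source does not say how the match is computed.

```python
import re

def zip_key(value: str):
    """Return the 5-digit part of a 5- or 9-digit ZIP code, else None."""
    digits = re.sub(r"[^0-9]", "", value)
    if len(digits) in (5, 9):
        return digits[:5]
    return None  # not a recognizable 5- or 9-digit ZIP code

print(zip_key("12345") == zip_key("12345-6789"))  # True
```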
Examples of Mapping Options
| Type | Description | Mapping Types |
|---|---|---|
| Fuzzy | Phonetics engine capable of analyzing words for how they sound when pronounced. Through a technique of removing vowels and analyzing the remaining consonants, the fuzzy engine works very well for matching fields with spelling mistakes. | Cleaned Account Name, Exact, FirstName |
| Transpose | The transpositional engine allows fields to appear to be duplicates even if they have differences in their word order. For example, Jones, Smith and Jackson will appear to be a duplicate of Jackson, Smith and Jones. | Cleaned Account Name, Exact, FirstName, Street |
| Alpha Clean | The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for matching. The alpha cleaner is used when you know you only have ASCII (North American) data and you would like to ensure that the only characters that are analyzed are the 26 characters of the English alphabet and the numbers 0-9. Any other character that the field may contain will be ignored and not seen by the deduplication matching algorithms. | Cleaned Account Name, Exact, FirstName, Numeric, Street, Zip 5 and 9 |
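The three options can be approximated with short Python helpers. All function names are hypothetical, and `consonant_key` is only a crude stand-in for the vowel-removal idea; the product's phonetics engine is certainly more involved. Lower-casing before comparison is also an assumption.

```python
import re

def consonant_key(value: str) -> str:
    """Fuzzy stand-in: keep letters only, then drop the vowels."""
    letters = re.sub(r"[^a-z]", "", value.lower())
    return re.sub(r"[aeiou]", "", letters)

def transpose_key(value: str) -> tuple:
    """Transpose: compare the words of a field regardless of their order."""
    return tuple(sorted(re.findall(r"[a-z0-9]+", value.lower())))

def alpha_clean(value: str) -> str:
    """Alpha Clean: keep only the 26 English letters and the digits 0-9."""
    return re.sub(r"[^a-z0-9]", "", value.lower())

print(consonant_key("Jenson") == consonant_key("Jensen"))  # True
print(transpose_key("Jones, Smith and Jackson")
      == transpose_key("Jackson, Smith and Jones"))        # True
print(alpha_clean("O'Brien & Sons #2"))                    # obriensons2
```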