Home > DemandTools for AppExchange Modules > DemandTools Cleansing Modules > Single Table Dedupe Wizard Details

Single Table Dedupe Wizard Details

 Capabilities & Features:

DeDuplication Best Practices & Guidelines:

Please test your deduplication scenarios thoroughly before deduping a large number of records, as there is no automatic "undo"/"rollback" option.  If you have a Salesforce sandbox, we recommended that you test in this environment first before deduping your production data.

More information on pointing DemandTools to the Salesforce Sandbox can be found here.

1. Dedupe in the following order: 

2. Use at least a 3 pass strategy for identifying duplicates

3.  It is recommended that name, address, phone fields are standardized prior to searching for duplicates.

This is not a requirement as DemandTools has lots of advanced mapping techniques to find matches when these fields are not standardized.  However, the cleaner and more complete the data, the easier it will be to find matches.

MassImpact can be used to standardize most of these fields.  Please refer to the MassImpact documents for details on how to make basic changes (i.e. find records where state - Texas and change to TX), and how to use prebuilt formulas for more complex standardizations (i.e. "Normalized_US_Address", "NaPhoneFix" etc.).

The Address Verification module is also extremely helpful in standardizing as well as completing address data.  If Address Verification will be used, there is no need to use MassImpact to standardize address fields first, as this will be done as part of the verification process.


Using Scenarios

The scenarios are listed in the left hand column, descriptions of the scenarios are listed in the middle column and the fields that will be viewable are listed in the right hand column.

 

Load Scenario
Loads the scenario and all settings. User must manually select the next screen for each step.
Run Scenario
Selection of Run Scenario takes the user to the final Single Table Deduplication screen. Duplicates are displayed, master records are selected. User must execute the merge.
Delete Scenario
Deletes the source xml file and removes the file from the interface listing
Save Scenario
Used for newly created or modified scenarios. Scenarios are stored in the Demand Tools directory.
Clear Scenario
Lets you start over with either selecting another scenario or selecting an object to deduplicate

Developing and Executing Dedupe Scenarios

Through the development of a deduplication pass (all the way from the selection of the object, the sub set of objects, match criteria and options, master rule and score delta through the merge options) can now be saved in a new DemandTools object called the deduplication scenario. The dedupe scenario saves all setting except for specific record custom actions such as field edits, manual master vs. slave selections etc.

The scenario is given a name and a self generated description of what the scenario settings are. After the system self creates this file it is end user modifiable to allow for custom notations etc. All fields and value settings are automatically recorded in the description file to save time.

NOTE: If you modify a scenario, to update the text description, you must select REBUILD SCENARIO. This will update all of the DemandTools modifications, you will need to re-enter any notes you had on the old scenario.


Save scenario dialogue box


Scenarios pre built or built by user being displayed in Screen 1

Scenarios are saved on screen 1 with the scenario name sorted under the object it belongs too. Scenarios can also be found in the the DemandTools directory in a folder called scenario and under the appropriate object type sub-folder.

Scenarios are xml files that can be sent from one DemandTools user to another as long as the selections of objects and fields are logical for both computers (either the same Salesforce database or very similar ones).

Please visit our NO DUPES SERVICE that will show you how DemandTools can offer their own custom built scenarios with your company.

Scenarios are controlled by the scenario control buttons on page 1. The scenario load button (or double clicking the scenario name) will load the scenario into the deduplicator.   You can also select “Save Scenario” on all of the 3 dupe screens, so the scenario can be saved or updated at any time in the process.

The delete scenario button will delete the source xml file and remove the file from the interface listing.

Step by Step Instructions 

Screen 1 of Single Table Deduper - Steps 1 - 4

 

Step 1.  Select the Object to Deduplicate

Select the proper Salesforce database to deduplicate. All tables are available beneath the pull down list and are sorted in alphabetical order. After selecting the object in the pull down menu, press the "Use Object" button to begin.

Step 2.  Select the fields to show on the Potential Duplicates List

The primary goal of step two is to select the fields to display in the found duplicates grid (Screen Three of the STD). Most users will want to display a combination of fields that would help verify that the records are truly duplicates and fields that display the most important data values. DemandTools shows both the table field name as well as the Field Label from the interface. Select the fields in the order you would like to see them when the potential duplicates list is shown. The Order column will be populated numerically to indicate you selections as you make them.

NOTE: DemandTools gives you the ability to show additional fields in the display screen (Screen 3) even after duplicates are located. So, if important selections are forgotten here, you will have another opportunity later to do so as well.  

Step 3.  Develop Criteria to Determine Which Objects in the Table are Searched for Potential Duplicates

Select either the default radio button "Use All" (to select all objects to search for potential duplicates) or the radio button "Use Conditions" (develop conditions to narrow the search range).

After each selection of field and value to narrow by the user must press "Add Condition" so that the expression builder will place the expression in the large expression display box.

 

Step 4.  Proceed to Mapping Screen

After selecting the table (object) to deduplicate, the fields to show on the potential duplicates list, and the subset of object to search, select " Screen 2 - Mapping" to proceed.

 

Screen 2 of Single Table Deduper - Steps 5 - 10

Step 5.  Select "Add Mapping Condition"

Mapping conditions consist of Fields to Match, Match Category and Mapping Options.

TIP: To remove a mapping condition, highlight the row by clicking on the box to the left of the Match Blank Values Column and hit Delete on your keyboard.  

Step 6.  Selecting "Fields to Match"

If you select the MATCH BLANK VALUES checkbox, DemandTools will return records where the values for that particular field are blank in all records that match on the other fields you select for matching.

Select the Salesforce field that will be used to identify duplicates. Multiple selections of course may be necessary in order to properly determine duplicates with confidence.

Step 7.  Choose a Mapping Technique

Select the 'Match Category" to look through the techniques by category or go directly to the "Match Type" (leaving "Match Category" as All) to see a list of all available match types. 


Mapping Types Explained in Detail

Match Categories

Mapping Types

Type Description:
Found in Category:
 

The Cleaned Account Name mapping type uses the built in Account Name Cleaning List.  To see the Cleaning list tool select the "Edit Account Cleaning List" button.

The cleaning list standardizes punctuation, spaces and removes common business prefixes and suffixes.  These lists are customizable for language(s) and/or line of business.

All, CRM

The Country Match mapping type is used to standardize field values for the recognized countries of the world.  It will recognize the long name of a country, the 2 digit ISO short form, the 3 digit ISO short form and the numeric ISO country value as possible matches of each other.

All, GEO

The Domain mapping type is used when mapping web pages and/or email addresses. It allows for the independent analysis of the domain information contained within the URL or the email address. For email addresses it uses any information to the right of the @ sign.  For web pages it parses the XXXXX.com portion.  This mapping types allows for easy comparison of web page field vs. web page field or email field vs. email field. It also allows for the comparison of email addresses to web pages and vice versa.

All, Internet
The Exact mapping type is a 100% match of every character (assuming no options apply (see Fuzzy, Transpose & Alpha Clean))

All, Alpha, Numeric

The First Name mapping type uses the built in Nickname List.  To see the Nickname list select the "Edit Nickname List" button at the top of the interface.

The Nickname list will see Bill, William and Billy as potential duplicates/matches of each other.  This list is also customizable by the end user for localization or for non contact substitution on any field by replacing the nickname list with synonyms.

All, Contact
The First XX Letters mapping technique is used to compare the first XX letters in a field.  When selected a pop up box will appear in order to specify the number of letters on which to match. Applicable to text fields only.

All, Alpha
The Numeric mapping type is used to compare only the numeric values in a field. All other characters, such as spaces or punctuation, will be ignored by the deduper. A field with a value of "Apt # 31" is seen by the deduper as the numeric characters "31", ignoring "Apt #".

All, Numeric

The Relaxed Address Match mapping type parses the street address to the lowest common denominator.  Based on North American standards it has also proved effective with most country address formats.

Example:

Apt#4, 123 Pavillion Street
123 Pavillion, Apt 4
4-123 Pavillion Ave NW

Are all seen as "123 Pavillion"

All, CRM
The Relaxed NA Phone Match mapping type removes all non-numeric characters and spaces, leading 0's and 1's, area codes and extensions on the back end, returning the 7 primary digits of the phone number.  If just 7 digits are left use those 7 digits, else just return digits 4 - 10.  It will not match the "Phone-word" values and will trim off the "SPOT" in the phone number and only look at the numeric portion.

Example:

+1 879 555 1212 ext 500
(879) 555 1212
1.879.555.1212 ext 408

Are all seen as "5551212"

Note: This is based on a 10-digit North American phone number.  Although it can be used with international phone numbers (will just return 7 digits in the middle), the Numeric mapping type is the recommeded mapping type to be used with international phone numbers.

All, CRM

The Salesforce.com ID Match  mapping type is used to match any Salesforce.com objects 15 digit ID to its 18 digit equivalent ID and vice versa.

All, CRM
The Street Address Match mapping type is a slightly more rigid criteria than the relaxed address match mapping type. The deduper will ignore the differences in street type short forms such as crescent - cres, road - rd, street - st, etc.

All, CRM
The Zip 5 and 9 mapping type will automatically match USPS 5 and 9 digit zip codes without the need to standardize them to a common number of digits.
All, CRM

Additional Mapping Options


Option: Description: Applicable Mapping Types:
Fuzzy  Phonetics engine capable of analyzing words for how they sound when pronounced.  Through a technique of removing vowels and analyzing the remaining consonants the fuzzy engine works well for matching fields with spelling mistakes. Cleaned Account Name
Exact
FirstName

Transpose

 The transpositional engine allows for fields to appear as duplicates even when there are differences in their word order.  For example,  "Jones, Smith and Jackson" will appear to be a duplicate of Jackson, Smith and Jones.

Cleaned Account Name
Exact
FirstName
Street

Alpha-Clean

 The alpha cleaner extends some of the capabilities of the account name cleaner to other fields for easier matching.  The alpha cleaner is used for ascii (north american) data to ensure that the only characters that are analyzed are the 26 characters of the english alphabet and the numbers 0-9.  Any other character that the field may contain will be ignored by the deduplication matching algorithms.

Cleaned Account Name
Exact
FirstName
Numeric
Street
Zip 5 and 9


Step 8.  Select Mapping Options

In addition to the already covered mapping types there are also options that can be applied to many of them. The STD provides for three different options to be augmented to the appropriate types of mappings. The available options are:

Step 9.  Go back to Step 5 and define more Mapping Options until your criteria is complete!

Step 10.  Search for Potential Duplicates

Select the "Screen 3 - Merge Control" Button to search for potential duplicates and move to screen 3 to analyze and eventually process.



Evaluating Found Duplicates

Screen 3 - The Merge Grid

 

Working in the Grid

Viewing Found Duplicate Groups

Analyze Grid Data

 

Edit:  Ability to update any field real time in Salesforce by right clicking on the field in the grid

Methods of Merging

1.  Custom Merging

By right clicking on the duplicate set and selecting the "Setup Custom Merge”, you can “paint” what you would like the Master record to look like. Anywhere from 2 - 100 records may be placed in the custom merge window. Select the Master Record and double click the field values to retain from the displayed record sets.

After the user presses "Accept" at the bottom of the Custom Merge dialogue box, the pins in the duplicates found view will change to blue to show that that record is involved in a custom merge

2.  Manually Select Master vs. Servant Records

In order to manually select master vs. servant (non-master) records in the grid view of the STD, the user double clicks the pin beside the duplicate record they wish to activate as the master record, turning it green.

You can merge duplicates one set at a time by manually right clicking and selecting "Merge Group".   The deduplicator will then merge that single duplicate set into a single record.

Or, you can identify all the records to be processed by placing check marks in boxes. Manually pre select all master records and then process all the sets in batch mode. Select the “Merge Checked Objects” button to process the entire loaded set of duplicates, find all checked ones, then determine which have been selected as masters and which as servants.

NOTE:   All merges, whether manual or automatic will be processed based upon the defined Merge Options listed along the right hand side!!

3.  Choose or Design a Master Rule

The Single Table Deduplicator is packaged at initial installation to have default master rules ready for use for most of the standard Salesforce objects that are typically chosen for deduplication.   These may include rules like oldest or newest record, most recently modified record, owned by, etc.   Custom objects and less frequently de-duped objects may require that even the simplest rules such as oldest record and newest record will have to be created by the end user.

Select a Master Rule

When using strict deduplication criteria (looking for records that are nearly for sure duplicates) you may decide that (after a quick scan maybe) that you want to process all the records in the grid view.  

In order to process duplicates in this mode you do not need to apply master rules to each duplicate set. Simply select the button "Check All" at the bottom of the grid and all the duplicate sets will be checked.     All records are now checked but no master servant selection has been pre-made.   Masters  will automatically be selected based on the currently selected master rule in the upper right hand of the interface when the end user selects "Merge Checked Objects" and the merge process will be completed.


Currently selected master rule

Master Rules Creation and Editing of Master Rules

Any object can have a master rule written for it. When building a master rule you have the option to add unlimited conditions for analyzing either the record (as a whole) or the field values that the record contains.

At the record level, options such as oldest, newest, most recently modified and many other advanced constraints are available. For field values, the STD displays constraints appropriate for the data type of the field being analyzed.

Sample constraints on different field types
Rule Dialog Box

After the setup of the initial field and constraint you have the ability to score the positive incident of the constraint with an arbitrary point value. The default rules that ship with the software are based upon a base 100 scoring system but any system can be used. When adding multiple constraints the record is scored in cumulative fashion with each additional positive expression adding to the total score of the records.   Additionally negative scores can also be applied to penalize a record.


A complex rule with multiple expressions and a score delta set


The final component to building rules after determining the desired expressions is to decide to assign a score delta. The score delta will be used by the deduper to automatically determine master vs. servant records. The value of the score delta determines how many points away a master record has to score from the next nearest servant record in order to automatically be selected as the master.

If the score delta threshold is set and NOT met:

If no score delta is set:

Master Rule Options

Field

Constraint

Value

Record

  • Owned by
  • Not Owned by
  • Oldest
  • Newest
  • Most Recently Modified
  • Most Opportunities
  • Most Contacts
  • Active Owner
  • Equals
  • Not Equal to
  • Prepopulates with Salesforce values

Field Name

  • Equals
  • Not Equal to
  • Starts With
  • Does Not Start With
  • Ends With
  • Does Not End With
  • Is Populated By
  • Not Populated By
  • Contains
  • Does Not Contain
  • Largest
  • Smallest

This will vary based on the field selected.   If there is a pick list associated with the field the possible values will be shown in a pop-up menu.

Methods for Applying Master Rules Against Duplicates

In order to apply rules on a record by record basis the end user can right click in the duplicate key set box. This will display the popup menu with the manual merge options on the left and the available master rules in the right hand column. Simply select the master rule to apply to the duplicate set and the records will be scored according to the rules scoring system.

If a score delta has been set, the master record will be automatically selected (pin turns green) for all the records with the largest cumulative point score that surpass the next nearest servant by at least the delta point value.   If no records are determined to be master worthy, then the pins will remain gray (no master selected) but the score of the records will be shown in the interface.

As opposed to right clicking in each duplicate set the interface provides for two more automated processes for apply master rules to larger duplicate sets.

The button "Apply Rule to Checked" will find all records/record sets that have a check mark in the square left box but do not currently have a master vs . servant selection made to them. The records will be scored according to the selected Master Rule and assigned a master vs. servant if the score delta is set and surpassed by any one record.   As above, if no records are determined to be master worthy, then the pins will remain gray (no master selected).

The "Apply Rules to All " button which will process all the records in the duplicate grid (regardless of check status) and score and select all records.

Merge Control Options

1.  Update Fields where Master is Empty

If a field is empty on the Master but populated on the Slave, update the Master with the Slave value.  By default, all fields are checked.  Options include:

2.  Combine Field Values

Used to select the fields to concatenate across all records in the duplicate group.  By default NO fields are selected for concatenation.

3.  Salesforce Merge

If the Salesforce merge is used, all objects will be merged to the master and the slave records will be moved to the recycle bin.

4.  Merge Objects (Only visible if "Use Salesforce Merge" unchecked)

Selection of which objects get transferred from the slave(s) to the Master Object.  Using the Salesforce Merge option automatically merges all sub-objects.

5.  Reassign to Master Owner

When sub-objects are reparented from the Slave(s) to the Master they can optionally have their ownership changed to match the Master Owner.
Check the specific sub-objects which should be reassigned (ie. Open Activities, Closed Activities, Open Cases, Closed Cases, Open Opportunities, Closed Opportunities, etc.).

Options With "Use Salesforce Merge"

1.  Additional Field Logging

One additional text field can be added to the DemandTools log file.  Standard logging includes From and To Salesforce ID's for the objects merged.

Options When "Use Salesforce Merge" Not Available (when merging objects other than Leads/Contacts/Accounts)

1. 
Non-Master Objects

Object
Default Prefix Field
Account
Name
Contact
Last Name
Lead
Last Name

 2.  Duplicate Campaign Exception

3.  Inactive Ownership Exceptions

Used to override some limitations of the API in the areas of reassignment of objects owned by inactive users. In some case the API will not allow these types of objects to be reparented. Choose from the following 3 available options:

4.  Merging of custom object sub-objects

Not just referenced ones, ones that truly are beneath the custom object in database hierarchy (we expect very, very rare).

Executing the Merge

Once the records you wish to merge are checked, and the master records determined (either by score delta or manual selection) select "Merge Checked Objects" to begin the merge process.


Executing the duplicates batch

Individual groups can also be merged by right clicking on the duplicate key and selecting "Merge Group"

Schedule Automatic Non-Attended Dedupe Scenarios

Single Table Deduplication is fully capable of working within the Windows scheduler interface and will allow for times execution of scenarios as long as the host cpu has an active internet connection.
 
More information on scheduling Single Table Deduplication can be found here.

Known Limitations

Object Area Description
Contacts SELF SERVICE USERS When a servant contact is merged into a master contact and the servant contains a self service user the SSU will be inactivated on the servant contact and will be recreated on the master contact. You will lose all login information for the SSU and will have to send the user a new password as this is not moved in the creation of the SSU.

Additionally, if the servant was also a "Super User" you will need to manually re-activate this status on the master contact.
Contracts ACTIVE CONTRACTS The Salesforce API does not allow the re-parenting of "Active" contracts. Therefore when duping account records DemandTools CANNOT merge "active" contracts from slave accounts to the selected master account.

As a workaround we have added a master rule for "Most Active Contracts", so you can set the master based on active contracts. When you run your Single Table De-dupe on your accounts you will receive errors if there are still contracts that could not be re-parented. In the log file, it will indicate which slave accounts had active contracts and that it was unable to delete these slave records. You can then select these using Salesforce's merge feature and merge them one at a time.
Activities CREATED DATE In order for tasks and activities older than one year to be moved/merged, the customer must contact Salesforce.com to 'open the window' for the API to functionally be able to do this. A 'work' around may be that the customer selects the oldest record as the master.   This is only a limitation if you do NOT use the Salesforce merge.
Contacts CAMPAIGN MEMBER STATUS DATES

When Deduping contacts and opting to use the "Duplicate Campaign Exception - Use Most Recent, if the most recent record is on the slave, when it is cloned to the master, the "Last Modified Date" (which is also the status date) will become the date of the merge. If the most recent record is on the master, we do not touch the record, so the original date stays intact.

This will happen EVEN If you have "Modify System Fields" turned on by salesforce.com. The only workaround is to create a custom object to track campaign history. There are some 3rd party vendors that have developed software that does this for you. Search the Salesforce AppExchange for a current list.

Use Conditions OWNER ROLE

Using the "Owner Role" in your "Use Conditions" for selecting which objects to check for duplicates will physically convert the Owner Role to the individual user id's.

If you update the members of a Role, you should re-do the "use Conditions" (clear and re-add) so the correct user will be inserted.

     
     
     

See also
Salesforce Merge - Unchecked
Mapping Types
SingleTable Dedupe - Using a Pre-Built Scenario
Lead2Contact DeDuplication
Lead2Account Deduplication
MassLead Convert