Daisy Analysis Applications - De-Duplicating Databases


Many databases contain duplicate records or rows.

Daisy 4D provides a simple way of identifying records that may be duplicates.

It uses a technique in which a working field is set to a value formed by concatenating some of the other fields, or columns, in the database. These fields can be processed or filtered before they are concatenated.

If two records have the same value in the working field, they may be duplicates.
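The working-field idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Daisy 4D's own implementation: the field names (`first_name`, `last_name`, `postcode`), the filtering step (removing spaces and upper-casing each value) and the `|` separator are all hypothetical choices.

```python
from collections import defaultdict


def working_field(record, fields):
    """Build a hypothetical 'working field' by concatenating selected fields.

    Each value is filtered first: all whitespace is removed and letters are
    upper-cased, so 'ip1 1aa' and 'IP1 1AA' give the same part of the key.
    Missing fields contribute an empty string.
    """
    parts = []
    for name in fields:
        value = str(record.get(name, ""))
        parts.append("".join(value.split()).upper())
    # A separator stops ('AB', 'C') and ('A', 'BC') producing the same key.
    return "|".join(parts)


def possible_duplicates(records, fields):
    """Group records by working field; keep only groups of two or more."""
    groups = defaultdict(list)
    for record in records:
        groups[working_field(record, fields)].append(record)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    # Made-up example records for illustration only.
    holders = [
        {"id": 1, "first_name": "James", "last_name": "Miller", "postcode": "IP1 1AA"},
        {"id": 2, "first_name": "JAMES", "last_name": "miller", "postcode": "ip11aa"},
        {"id": 3, "first_name": "Anne", "last_name": "Smith", "postcode": "IP4 9ZZ"},
    ]
    keys = ["first_name", "last_name", "postcode"]
    for key, recs in possible_duplicates(holders, keys).items():
        print(key, [r["id"] for r in recs])  # JAMES|MILLER|IP11AA [1, 2]
```

Changing the list of fields changes what counts as a duplicate: dropping the first name, for instance, groups records more coarsely, which is closer to one letter per household than one per person.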


Looking for duplicates may seem to be a simple process, but it is not!

Consider the database of season-ticket holders at a football, rugby or cricket club.

Are any of these records duplicates? Some would say not, as each corresponds to a seat in the stadium or has a unique number.

But the answer also depends on what you are doing!


For example, suppose you are sending out season-ticket renewals.

That seems easy, as one letter goes out to every season-ticket holder. If your system has accounts, which handle the case where one holder has more than one ticket, you simply send one letter to each account holder.

But that isn't always the case. James Miller has two season tickets at Ipswich Town FC and used to get two letters. As a result, the club probably sent out about 3,000 extra letters, and that was not an exercise without cost!


On the other hand, you might be sending out a newsletter or a catalogue for merchandise.

Do you really want to send out one per season-ticket, per account holder or just one per household?

Whatever you want to do, Daisy 4D can reduce your database to exactly the records you need for the most efficient distribution of the newsletter.

You can filter the data by name, last name, postcode and many other factors to make certain you identify all the duplicates you want to eliminate. You can also use Soundex, which reduces names that sound alike to the same simple code.
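Soundex is a well-known published algorithm. The sketch below implements the standard American Soundex rules in Python (keep the first letter, code the remaining consonants as digits, ignore vowels, H and W, collapse adjacent equal codes, pad to four characters). The page does not say which Soundex variant Daisy 4D uses, so treat this as an illustration only.

```python
# Standard American Soundex digit for each consonant group.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]  # drop spaces, apostrophes
    if not letters:
        return ""
    first = letters[0]
    result = first
    prev = CODES.get(first, "")  # the first letter's code still suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels and Y give "", so a repeat after a vowel is coded again
        if len(result) == 4:
            break
    return result.ljust(4, "0")


if __name__ == "__main__":
    print(soundex("Smith"), soundex("Smyth"))    # S530 S530
    print(soundex("Miller"), soundex("Millar"))  # M460 M460
```

The Soundex code of a surname could then be used as one of the concatenated parts of a working field, so that records for "Miller" and "Millar" at the same address fall into the same group.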


Add in another database, such as everyone who has bought goods in the club shop or purchased tickets at the box office, and you add another dimension and countless more duplicates to be eliminated.


Consider this postscript!

Have you ever received several identical catalogues in the post from a mail-order company, each perhaps addressed to a slightly different version of your name?

Did it give you a good impression of the company?