Targeting Cross-Article Duplicate Content
Using the Duplicate Content Matching Function:
The controls for the duplicate content matching function can be found at the bottom of the DupeFree Pro user interface and look as shown below:

A very powerful side of the duplicate content function of DupeFree Pro is that the matching settings can be completely tailored to your own needs. Essentially, you have total control over what is considered duplicate content.
I have chosen to go this route rather than a set method because, as well as giving more flexibility, I thought it was important to let you make up your own mind as to the best threshold for measuring duplicate content. After all, the Search Engines are certainly not going to tell us what they do, so we have to make our own informed decisions. The default settings of DupeFree Pro are the settings I use. I have no particular reason for using them other than that I feel they give a fair approximation.
Each of the controls shown in the panel above is detailed below...
Large 'Compare' Button - This is the button you push to make DupeFree Pro compare the texts in the two window panes at the top of the user interface. Make sure you have an article present in each of the window panes before pressing the 'Compare' button.
Check Box: 'Allow Word Spanning Over Multiple Sentences' - This check box is situated just above the large 'Compare' button and when checked tells DupeFree Pro to ignore ends of sentences when calculating if a group of words could be considered as a match. This box is checked by default because I imagine the Search Engines look past full-stops and other end-of-sentence characters when searching for duplicate content.
The two examples below show what would be highlighted as duplicate content with and without the 'Allow Word Spanning Over Multiple Sentences' check box checked for the same pair of texts:
(minimum matching length settings of 4 words were used)
| | Check Box Checked | Check Box Unchecked |
| --- | --- | --- |
| Original sentence: | This is a test sentence. This is another one but not a test sentence. | This is a test sentence. This is another one but not a test sentence. |
| Rewritten sentence: | Not another test sentence. This is seriously not a test sentence. | Not another test sentence. This is seriously not a test sentence. |
Notice how, when the check box was checked (the left column), the text section "test sentence. This is" has been picked up as duplicate content, whereas in the right column it has not. This is due to the full-stop between the words.
Having this option checked can increase the duplicate content percentage, but I feel that ignoring sentence endings provides a more accurate reflection of what Search Engines could consider as duplicate content.
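The difference between the two columns can be reproduced with a small Python sketch. This is only an illustration under stated assumptions, not DupeFree Pro's actual matching code: the helper names (`sentences`, `ngrams`, `shared_runs`) are hypothetical, sentences are assumed to end at `.`, `!` or `?`, and character case is ignored.

```python
import re

def sentences(text):
    """Split text into sentences; each sentence becomes a list of lowercase words."""
    parts = re.split(r"[.!?]+", text)
    return [re.findall(r"[\w']+", p.lower()) for p in parts if p.strip()]

def ngrams(words, n):
    """Return every run of n consecutive words as a set of tuples."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(a, b, n=4, span_sentences=True):
    """Return the n-word runs that appear in both texts."""
    def runs(text):
        sents = sentences(text)
        if span_sentences:
            # Ignore sentence endings: treat the whole text as one word list.
            flat = [w for s in sents for w in s]
            return ngrams(flat, n)
        # Respect sentence endings: a run may not cross from one sentence to the next.
        found = set()
        for s in sents:
            found |= ngrams(s, n)
        return found
    return runs(a) & runs(b)

original = "This is a test sentence. This is another one but not a test sentence."
rewritten = "Not another test sentence. This is seriously not a test sentence."

print(shared_runs(original, rewritten, span_sentences=True))
print(shared_runs(original, rewritten, span_sentences=False))
```

With spanning allowed, the run "test sentence this is" is reported in addition to "not a test sentence". With spanning disallowed, only "not a test sentence" remains, because the other run crosses a full-stop.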
['Minimum Length Of The Matching Parts=' Number Entry Box] and [Word, Sentence & Paragraph Drop Down Box] - These two settings are very important and influence the amount of duplicate content that is detected.
One of the major unknowns when calculating duplicate content is the minimum size a group of words must reach before it is recognized as duplicate content. Does it only take two adjacent words matching an identical pair in another text to become duplicate content? Or maybe it's three words running together, or perhaps it's ten?! We really have no way of telling at what point the Search Engines may consider a duplicate content match to be an issue, so we have to try to deduce a measurement without going overboard or selling ourselves short. Everyone will probably have their own opinion as to the right size for a duplicate match across two pieces of text, so DupeFree Pro has been made flexible enough for you to set your own minimum matching length threshold.
You can set any number of consecutive words, sentences or paragraphs as the minimum matching size for duplicate content to be flagged. Simply enter a number in the 'Minimum Length Of The Matching Parts=' entry box and then select from the drop down box next to it whether that number counts words, sentences or paragraphs. Now when you click the 'Compare' button, those minimum matching length settings will be used when DupeFree Pro calculates and highlights duplicate content.
The default settings in DupeFree Pro for these two controls are '4' and 'Words'. These are the settings I tend to use most often because I believe they provide the right balance: they catch most of the larger chunks of duplicate content but are not so low that the software picks up everything. You will find that it is usually very hard, if not impossible, to remove all duplicate content if you go for a setting of '2 - Words' or below. This minimum length is just too small and causes far too many text sections to be flagged as duplicate. Using a setting of '3 - Words' is bearable but still poses similar issues, whilst going above '7 - Words' or '8 - Words' can start to open up the possibility of large sections of duplicate content being missed. Find a setting that provides the balance you are happy with, or simply use the current DupeFree Pro default settings.
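To get a feel for how the minimum matching length changes what gets flagged, here is a rough Python sketch that counts the distinct word runs shared by two texts at several lengths. The function names are hypothetical, sentence endings and character case are ignored, and the counting method is an assumption made for illustration rather than the way DupeFree Pro works internally.

```python
import re

def words(text):
    """Lowercase word list with punctuation removed."""
    return re.findall(r"[\w']+", text.lower())

def shared_run_count(a, b, min_words):
    """Count the distinct runs of min_words consecutive words found in both texts."""
    def runs(text):
        w = words(text)
        return {tuple(w[i:i + min_words]) for i in range(len(w) - min_words + 1)}
    return len(runs(a) & runs(b))

original = "This is a test sentence. This is another one but not a test sentence."
rewritten = "Not another test sentence. This is seriously not a test sentence."

# Try a range of minimum lengths to see how quickly the matches thin out.
counts = {n: shared_run_count(original, rewritten, n) for n in range(2, 6)}
for n, count in counts.items():
    print(f"{n} - Words: {count} shared run(s)")
```

Even on this tiny pair of texts, the number of shared runs falls steadily as the minimum length rises, which is the trade-off described above: low settings flag almost everything, high settings start to miss matches.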
One thing to note when comparing two articles that have been written for the same keyword phrase(s) is that if your minimum matching length is the same as or shorter than the keyword phrase(s) used in the articles, those phrases will be picked up as duplicate content as well. When this happens I try increasing the minimum matching length to one above the longest keyword phrase length and see how this affects the 'duplicate content found' percentage.
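The "one above the longest keyword phrase" rule is simple arithmetic, sketched below in Python. The function name and the example keyword phrases are hypothetical and only serve to show the calculation.

```python
def suggested_min_length(keyword_phrases):
    """Return one more than the word count of the longest keyword phrase."""
    longest = max((len(phrase.split()) for phrase in keyword_phrases), default=0)
    return longest + 1

# Hypothetical keyword phrases that both articles were written for.
phrases = ["duplicate content checker", "article rewriting"]
print(suggested_min_length(phrases))
```

Here the longest phrase has three words, so the suggested minimum matching length is four words.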
There are also a couple of extra controls next to the 'Minimum Length Of The Matching Parts=' entry box. These extra controls are a set of small up and down arrows which, when pressed, increase or decrease the number in the entry box by one. It is just a nice quick way to adjust the matching number rather than having to type it in every time.
Check Box: 'Case Sensitive' - This check box toggles on or off whether DupeFree Pro considers character case (capital letters and lower case letters) when matching content.
In other words when the 'Case Sensitive' box is checked the phrases:
"simple test Phrase" and "Simple test phrase" (Capitals emphasized with an underline)
would not be treated as a match because their character case does not match.
With the 'Case Sensitive' box unchecked these two phrases would be treated as the same because character case would now be ignored when comparing texts.
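The effect of the check box can be illustrated with a short Python sketch. The function `phrases_match` is a hypothetical helper, not part of DupeFree Pro, and it assumes that ignoring case simply means folding both phrases to the same case before comparing them.

```python
def phrases_match(a, b, case_sensitive=False):
    """Compare two phrases, optionally ignoring character case."""
    if case_sensitive:
        return a == b
    # casefold() is a more thorough lowercase intended for caseless comparison.
    return a.casefold() == b.casefold()

print(phrases_match("simple test Phrase", "Simple test phrase", case_sensitive=True))
print(phrases_match("simple test Phrase", "Simple test phrase", case_sensitive=False))
```

The first call reports no match and the second reports a match, mirroring the checked and unchecked behavior described above.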
'Duplicate Content Found' Percentage - After you click the 'Compare' button, DupeFree Pro locates all the duplicate content between the texts in the two window panes based on your matching settings. As well as highlighting the duplicate content in the window panes, DupeFree Pro displays the exact duplicate content percentage in this output box.
This percentage could be seen as a 'similarity percentage' between the two documents. Always remember that this percentage value will vary depending on your matching settings.
When the default DupeFree Pro matching settings are used ('Allow sentence spanning' checked, '4 - words' min matching length, and 'case sensitive' unchecked) I normally consider 10% or less to be very good. Below 5% is extremely good but often hard to achieve, because you are usually comparing two articles optimized for the same keyword phrase(s), so some matching sections of text are likely to contain the keyword phrase(s) or topic-specific jargon that cannot be changed.
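This guide does not spell out the formula DupeFree Pro uses for the percentage, so the following Python sketch is only one plausible way to compute such a figure. It assumes the percentage is the share of words in one text that fall inside a run of at least the minimum length that also appears in the other text, with sentence endings and case ignored. All names are hypothetical.

```python
import re

def words(text):
    """Lowercase word list with punctuation removed."""
    return re.findall(r"[\w']+", text.lower())

def duplicate_percentage(text, other, min_words=4):
    """Percentage of words in `text` covered by runs of min_words words also found in `other`."""
    w = words(text)
    o = words(other)
    other_runs = {tuple(o[i:i + min_words]) for i in range(len(o) - min_words + 1)}
    covered = [False] * len(w)
    for i in range(len(w) - min_words + 1):
        if tuple(w[i:i + min_words]) in other_runs:
            # Mark every word of the matching run as duplicate.
            for j in range(i, i + min_words):
                covered[j] = True
    return 100.0 * sum(covered) / len(w) if w else 0.0

original = "This is a test sentence. This is another one but not a test sentence."
rewritten = "Not another test sentence. This is seriously not a test sentence."

print(f"{duplicate_percentage(rewritten, original):.1f}%")
```

Under these assumptions, eight of the eleven words in the rewritten text sit inside a matching four-word run. The figure changes as soon as the minimum length changes, which is why the percentage should always be read alongside the matching settings that produced it.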
This 'duplicate content percentage' will be new to most users and so gauging it may be a little difficult at first. To help with this DupeFree Pro displays a small emotion icon next to the output percentage value. This smiley will change from happy to sad depending on the percentage. The following are the set parameters for the smiley emotions:
| Duplicate Content Percentage | Smiley Emotion |
| --- | --- |
| Less than 15% | |
| Between 15% and 35% | |
| More than 35% | |
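The thresholds in the table can be written as a small Python function. The band boundaries come from the table above. The labels "happy", "neutral" and "sad" are assumptions made for this sketch, since the guide only says the smiley changes from happy to sad as the percentage rises.

```python
def smiley_for(percentage):
    """Map a duplicate content percentage to one of three smiley bands."""
    if percentage < 15:
        return "happy"      # less than 15%
    if percentage <= 35:
        return "neutral"    # between 15% and 35% (label assumed)
    return "sad"            # more than 35%

for value in (4.0, 15.0, 35.0, 60.0):
    print(value, smiley_for(value))
```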
How Duplicate Content is Displayed:
When DupeFree Pro highlights duplicate content in the two window panes, it makes it easy to see which sections of text are matched across both panes by highlighting them with the same color. This is demonstrated for you in the screen shot below:
As you can see from the screen shot above, spotting the duplicate content is very quick and easy to do. This is a very powerful side of DupeFree Pro as it enables you to instantly find and eradicate duplicate content.
As well as the matching pairs of text being highlighted the same colors, DupeFree Pro makes it even easier to find the matching pairs just in case you are comparing very long texts or articles. The last thing you want to have to do is scroll endlessly looking for the other matching text section, so in DupeFree Pro simply clicking on a highlighted section will cause the opposite window pane to automatically scroll, centering the corresponding highlighted text within its window. Both highlighted text sections will also become underlined to further help you match up the duplicate text pair.
To add to the ease of navigating the highlighted duplicate text sections, you can use the arrows at the very bottom right of DupeFree Pro to move through the duplicate text sections in order. With each click on the up/down arrows, DupeFree Pro will center the previous/next highlighted duplicate text section in the window panes so you can see the matching pairs instantly.
This concludes the duplicate content section of DupeFree Pro. I hope you can see the immense power behind this function of the software. It can save you a lot of time.
Copyright © 2006 DupeFreePro.com | All Rights Reserved