Internet Marketing Blog
How to Detect (and Fix) Duplicate Content on Your Site
I estimate that about 80% of the websites I audit for SEO have some sort of duplicate content issue, and few things hurt a website’s rankings more than duplicate content. In fact, if you never want to rank well on Google, all you have to do is add duplicate content to your site.
The good news is that detecting duplicate content is very easy, and fixing it is sometimes even easier. In this post I’ll show you how to do it.
Step 1: Check for Duplicate Meta Tags
Every URL of your website should have a unique title and a unique meta description. Your page titles shouldn’t be over 60 characters and your meta descriptions shouldn’t be over 140 characters.
There’s a really cool tool called Screaming Frog SEO Spider that will help you detect duplicate page titles and duplicate meta descriptions. It’s free for websites of up to 500 URLs, and if your website has more than that you can get the paid version. Run the tool on your site, then go to Page Titles and select Duplicate from the drop-down menu.
Once you know which pages have duplicate titles, rewrite them so that each page has its own. When you’re done with the page titles, do the same for the meta descriptions.
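If you’d rather check an export than click through the tool, here is a minimal Python sketch of the same idea. It assumes you already have your titles (or meta descriptions) in a dictionary keyed by URL; the function names and the choice to ignore case and extra whitespace are my own assumptions, not something Screaming Frog does for you.

```python
from collections import defaultdict

# Length limits recommended earlier in this post.
MAX_TITLE_LENGTH = 60
MAX_DESCRIPTION_LENGTH = 140


def find_duplicates(values_by_url):
    """Group URLs that share the same title or meta description.

    Values are compared after collapsing whitespace and lowercasing
    (an assumption made for this sketch).
    """
    urls_by_value = defaultdict(list)
    for url, value in values_by_url.items():
        key = " ".join(value.split()).lower()
        urls_by_value[key].append(url)
    # Keep only the values that appear on more than one URL.
    return {
        value: sorted(urls)
        for value, urls in urls_by_value.items()
        if len(urls) > 1
    }


def find_too_long(values_by_url, max_length):
    """Return the URLs whose value is longer than max_length characters."""
    return sorted(
        url for url, value in values_by_url.items() if len(value) > max_length
    )
```

Call `find_duplicates` once with your page titles and once with your meta descriptions, then use `find_too_long` with the matching limit to catch anything that runs over.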
Step 2: Check for URL Canonicalization Issues
Google sees YourSite.com and www.YourSite.com as two different URLs. If you can access your website through both instead of having one redirect to the other, Google will see two URLs with the exact same content. The best practice is to pick one version and redirect the other to it; I recommend having YourSite.com (without the “www”) redirect to www.YourSite.com (with the “www”). If you want to see an example, type TheOutsourcingCompany.com into your browser and you’ll see how you get redirected to www.TheOutsourcingCompany.com.
You can set up this redirect in your .htaccess file (this assumes your site runs on an Apache server). Warning: if you don’t know what you’re doing and mess up your .htaccess, your whole website could go down. Have an expert take care of this for you if you don’t feel confident you can do it yourself. If you want to give it a shot, there’s a great tutorial here.
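To make the rule concrete, here is a small Python sketch of what the redirect should do to any URL on your site. It is not the .htaccess rule itself, only a way to work out where each non-www URL should end up; the function name `to_www` is hypothetical, and the sketch assumes full URLs that include http:// or https://.

```python
from urllib.parse import urlsplit, urlunsplit


def to_www(url):
    """Return the www version of a URL; www URLs come back unchanged.

    Assumes a full URL with a scheme. Anything without a recognizable
    host is returned as-is.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lowercased by urlsplit
    if not host or host.startswith("www."):
        return url
    netloc = "www." + host
    if parts.port:
        netloc += f":{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

If `to_www(url)` returns something different from `url`, that URL is one your server should be redirecting.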
Step 3: Check for Internal Duplicate Content
There are various reasons websites have internal duplicate content. Some websites have print-friendly versions of their pages, or the same type of content in both HTML and PDF formats. A lot of ecommerce sites have really bad architectures and you can find the exact same content through searches, category pages, tags and the product pages themselves. Most ecommerce platforms are very bad for SEO off-the-shelf, but can be made SEO-friendly with some minor customization.
To detect internal duplicate content on your site, go to one of your pages and copy a sentence or two of its content. Here’s an example:
Now you’ll need to do a Google search using advanced search operators. It’ll look like this: site:yoursite.com “text you copied”. Make sure to put the text you took from your website between quotes. I did this for the example above and look at what I found:
There are several pages on this website that have the exact same content. There are three ways to fix this issue:
- If you have two versions of the same piece of content (such as web and PDF versions), just tell Google to ignore the PDF version. Put all your PDFs in one folder and use the robots.txt file to tell Google not to crawl that folder. Keep in mind that robots.txt controls crawling, so a blocked PDF that other sites link to can still show up in the index. This is a good tutorial on how to create a robots.txt file.
- If there’s absolutely no reason to have two or more URLs with the same content, redirect all the duplicate URLs to the main one using 301 redirects. Here’s a tutorial on 301 redirects.
- If you have a legitimate reason to have multiple URLs with the same (or very similar) content, define a canonical URL. This is a way of telling Google “I know I have multiple URLs with the same content. I have a reason for it. This is the URL I want you to count.” This is perfect for when pages A, B and C are similar but you want to keep all three. Just tell Google that page A is your preferred choice and, even though B and C won’t redirect to A, Google will generally treat A as the version to index instead of counting B and C as duplicates. Here’s a tutorial from Google on setting up rel canonicals.
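Once you’ve applied the first or third fix, it’s worth double-checking that it does what you expect. Below is a rough Python sketch, using only the standard library, that tests whether a robots.txt file blocks a given URL and pulls the canonical URL out of a page’s HTML. The function names, the /pdfs/ folder and the example URLs are assumptions for illustration, and Python’s robots.txt parser may not match Google’s behavior in every edge case.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the robots.txt text disallows this URL for the crawler."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class _CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr_map.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = _CanonicalFinder()
    finder.feed(html)
    return finder.canonical
```

For example, with a robots.txt containing `User-agent: *` and `Disallow: /pdfs/`, `is_crawl_blocked` should report anything under /pdfs/ as blocked. For pages B and C, `find_canonical` should return the URL of page A.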
Step 4: Check for External Duplicate Content
Take the same search query from step 3 and add a minus sign (a dash) in front of it. It’ll look like this: -site:yoursite.com “text you copied”. This excludes your own site from the results, so you’ll see the pages elsewhere on the web that contain the same text. This is what I found:
We can see that the wines this company is selling are also being sold by other companies using the exact same description. How do you fix that? Ideally, you want every page of your site to have unique content, but if you have an ecommerce site with tens of thousands of products, this might not be possible. In cases like this you need to have one goal: to make your website the best place to buy the products you sell. There are a lot of ways to do this, but think about it: if 10 companies sell the exact same product, what reason can you give people to buy it from you? Here are some ideas:
- You can be the cheapest option
- You can have a lot of great content, such as videos and infographics
- You can have a better-looking website that’s easy to use and provides a better overall shopping experience
- You can have more product reviews than your competitors, increasing the value of your site
- You can use social media to position your company as the go-to resource in your space. Establishing thought leadership and a strong brand goes a really long way.
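One more practical note on the search queries from steps 3 and 4: if you plan to check a lot of pages, you can build the queries with a few lines of Python instead of typing them by hand. This is only a sketch. The function names are hypothetical, it uses straight quotes around the phrase, and the search URL format is an assumption about how Google’s results page is addressed.

```python
from urllib.parse import quote_plus


def build_query(domain, snippet, external=False):
    """Build the step 3 query, or the step 4 query when external is True."""
    # Drop quotes inside the snippet so they don't break the exact-match phrase.
    cleaned = " ".join(snippet.replace('"', " ").split())
    prefix = "-site:" if external else "site:"
    return f'{prefix}{domain} "{cleaned}"'


def search_url(query):
    """Turn a query into a Google search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)
```

`build_query("yoursite.com", "text you copied")` gives you the internal check, and passing `external=True` gives you the external one.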
OK, that’s it. Not too hard, right? Now it’s time to get to work and fix that duplicate content on your site. If you have questions, post them in the comments section below and I’ll answer them for you.