Duplicate content is content that appears in more than one place on the web, or even within the same website. Duplication arises in many ways and is not limited to the ‘copied and pasted’ work a teacher would catch. Because much of it is a necessary part of your web presence, you cannot avoid duplication altogether, so you must take proper measures to protect yourself from being ranked down by search engines. The measures that mitigate these cases are broadly known as Duplicate Content SEO or International SEO. Before going into the details of International SEO, let us first look at how content gets duplicated on the web.
Suppose there are two URLs for an article on your website: ‘www.mysite.com/article1’ and ‘www.mysite.com/archives/article1’. Both lead to the same content and are merely different paths for reaching it. To the search engine, however, each URL is a separate page, so the two are treated as duplicates of each other.
Content copied from elsewhere on the web is duplicate content, and a search engine will only rank the version with the highest authority. The other copies will be treated as duplicates and will either rank poorly or be deindexed completely.
If you have a site that starts with www and also own the address without it, the search engine can be confused and flag the two as duplicates of each other. The same applies to ‘http’ and ‘https’ versions of a site.
Content that is republished on another site, even with permission, will still pass as duplicate unless you tell the search engines otherwise.
Print-friendly versions of pages will also pass as duplicates. Most CMSs generate a print-friendly version, and it can be linked inadvertently. Google's crawlers will treat these versions as duplicates unless they are specifically blocked, for example with a Disallow rule in robots.txt.
Session IDs of users are normally stored in cookies. When a session ID is appended to the URL instead, every such URL will be treated as a duplicate.
Unless you add noindex tags or canonical tags to your staging site, there is a fair chance that it will be indexed as a copy of your main site. Search engines will then treat the two sites as duplicates, and your main site will lose ranking and authority.
Paginating comments adds an identifier to the end of the URL, such as /comments/page-1 and /comments/page-2. Although this is the same page, search engines will treat each URL separately, and the pages will pass as duplicate content.
When you prepare sites for different geographic locations, each has its own URL even though most of the content is the same. These versions will be treated as duplicates unless you specify otherwise to the search engines. Likewise, multilingual sites whose content differs very little, such as an American English version and a British English version of the same site, will pass as duplicates.
Large companies have a huge number of products, many of which differ from one another only in a few parameters such as price or texture. Product description pages are therefore a rich source of duplication. In addition, resellers who list the same product on their websites will have content that duplicates one another's.
If URL parameters such as author and category appear in a different order, the two URLs are treated as duplicates by the search engines even though they return the same page. This is a rich source of content duplication because website owners seldom notice such subtle variations.
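As a rough illustration of how session IDs and reordered parameters produce several URLs for one page, the Python sketch below normalizes URLs and groups those that collapse to the same address. The parameter names in SESSION_PARAMS, the example URLs, and both function names are assumptions made for this example, not something a search engine or CMS prescribes.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical session parameter names; replace with whatever your platform appends.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def normalize_url(url: str) -> str:
    """Drop session parameters and sort the rest so equivalent URLs compare equal."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(sorted(pairs))))


def find_duplicate_groups(urls):
    """Map each normalized URL to the raw URLs that collapse onto it (groups of 2+ only)."""
    groups = {}
    for url in urls:
        groups.setdefault(normalize_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Running find_duplicate_groups over a crawl export or a list of URLs from your server logs gives a quick view of which addresses a search engine is likely to see as copies of one another.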
These are a few sources of duplicate content, and most of them arise without any wrong intention on your part. Maintaining sites for various geo-locations, in different languages and on different Country Code Top Level Domains (ccTLDs) is among the most widely used practices for going global on the web. However, unless you tell the search engines that the duplication exists because the versions are prepared for different target audiences, you will lose the ranking potential of your pages.
International SEO practices help in these situations. The methods are provided by the search engines themselves so that website owners can distinguish necessary duplication, like the cases mentioned above, from malpractice.
What do you do if you have to deliberately keep the same content in several places on your site? To keep this from counting as duplication, use the ‘rel canonical’ tag to tell the search engines which version is the primary one, the one you want indexed and ranked. It also helps your link authority, because the search engines consolidate most of the link signals pointing at the non-canonical pages onto the page of your choice. Use canonical tags when you syndicate content from other sites to point at the source, and urge others who syndicate your content to do the same.
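A canonical tag is a link element with rel="canonical" placed in the head of each duplicate page. The short Python sketch below builds that element for a chosen primary URL. The function name and the example URL are assumptions for illustration; in practice your CMS or template layer would emit the tag.

```python
from html import escape


def canonical_link_tag(primary_url: str) -> str:
    """Return the <link> element that marks primary_url as the canonical version."""
    # escape() guards against quotes or ampersands in the URL breaking the attribute.
    return f'<link rel="canonical" href="{escape(primary_url, quote=True)}">'


# Both /article1 and /archives/article1 would carry the same tag in their <head>.
tag = canonical_link_tag("https://www.mysite.com/article1")
```

Every duplicate path should point at one and the same primary URL; pages that point at each other in a loop give the search engine nothing to consolidate.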
You will have different versions of your site for different languages. A search engine will not know which one to show for a search from a specific location unless you tell it, and since it will not rank multiple pages with the same content, the rank of your pages will be lowered. To make things easy for the search engines and improve your ranking, use the hreflang tag for Google and the meta language tags for Bing and Baidu. These identify the language of each version explicitly, so that the matching version is served to users searching in that language.
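The hreflang annotations can be sketched as follows: each language or regional version lists every version, itself included, as a link element with rel="alternate". The Python function below generates those elements. The language codes, URLs and function name are assumptions chosen to match the American and British English case described earlier.

```python
from html import escape
from typing import Dict, List, Optional


def hreflang_link_tags(versions: Dict[str, str],
                       default_url: Optional[str] = None) -> List[str]:
    """Build one <link rel="alternate"> element per language/region version."""
    tags = [
        f'<link rel="alternate" hreflang="{escape(code)}" href="{escape(url)}">'
        for code, url in sorted(versions.items())
    ]
    if default_url is not None:
        # x-default names the fallback page for users matching none of the versions.
        tags.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(default_url)}">'
        )
    return tags


# Hypothetical site layout: the same set of tags goes into the <head> of every version.
tags = hreflang_link_tags(
    {"en-us": "https://www.mysite.com/us/", "en-gb": "https://www.mysite.com/uk/"},
    default_url="https://www.mysite.com/",
)
```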
If you have separate versions for http and https, www and non-www, and so on, redirect the secondary address to the one you wish to build up as the primary. Redirect any dynamic URLs to the main URL for that page as well. When you move to a new site, redirect from the old one to preserve traffic and link juice.
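The redirect rule above can be sketched as a small Python function that, given a requested URL, returns the primary address to redirect to (typically with a 301 permanent redirect) or None when the request is already on the primary. The preferred scheme, the host names and the function name are assumptions for this example; real sites usually configure this in the web server or CDN.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Hypothetical choices: the primary is the https + www version of the site.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.mysite.com"
OWN_HOSTS = {"mysite.com", "www.mysite.com"}


def redirect_target(url: str) -> Optional[str]:
    """Return the primary URL to redirect to, or None if no redirect is needed."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # note: any port is ignored in this sketch
    if host not in OWN_HOSTS:
        return None  # not one of our hosts, nothing to do
    if parts.scheme == PREFERRED_SCHEME and host == PREFERRED_HOST:
        return None  # already the primary version
    return urlunsplit(
        (PREFERRED_SCHEME, PREFERRED_HOST, parts.path, parts.query, parts.fragment)
    )
```

Keeping the path and query string intact matters: redirecting every secondary URL to the home page would throw away the page-level link authority the redirect is meant to preserve.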
For print versions of a page, which will not win you any traffic but will dilute your ranking by passing as duplicates, use a Disallow rule in robots.txt to keep the search engines from crawling them. However, do not use robots.txt to hide all the duplicate content on your site. Let the rest be crawled, and instead tell the search engine the reason for the duplication with proper tags such as rel-canonical.
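To check that a robots.txt rule blocks only the print versions and nothing else, the rule can be tested with Python's standard urllib.robotparser module, as in the sketch below. The /print/ path, the example URLs and the helper name are assumptions; your CMS may publish print pages under a different pattern.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks only the print-friendly pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /print/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print_page_allowed = is_crawlable(ROBOTS_TXT, "https://www.mysite.com/print/article1")
article_allowed = is_crawlable(ROBOTS_TXT, "https://www.mysite.com/article1")
```

With the rules above, the print page should come back as blocked while the regular article stays crawlable, which is the split the paragraph recommends.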
The easiest and most important measure of all is to avoid duplicating content in the first place. It is estimated that nearly one third (around 30%) of the content on the web is duplicated. Do not copy content from other sources or aggregate it. Doing so will also diminish the authority of the genuine pages that sit alongside it.
To monitor the implementation of hreflang tags and other features, log into Google Webmaster Tools (now Google Search Console) for insight into the performance and effect of the tags you have deployed. It also provides recommendations for fixing errors.
There are many benefits to international SEO if you do it right. Given that the web is rife with duplicate content and with people who get the basics of International SEO wrong, proper implementation will move you swiftly up the Google rankings. It will also save you a lot of hosting money, because you will not have to rely on costly measures such as a separate ccTLD for each nation; a single TLD will work just fine. Moreover, consider the advantage of managing a single site through a single CMS rather than making changes across multiple sites, possibly in different CMSs.
AndMine specializes in cutting-edge International SEO practices, backed by our zero-day alignment with almost all search engine modifications and changes. With our team you will get the best value for every dollar you spend.