Principle: Canonicalization

Canonicalization is the process of communicating to search engines the original source of content found on a given web page.

The term actually stems from the Ancient Greek word “Kanon.” A Kanon was a reed that was cut to a specific length and used as a standard of measurement, much like rulers and yardsticks of today.

[Image: measuring rod. Image source: Canon]

Fun Fact: Did you know the word “canon” is also used to describe the Bible, because it is treated as the standard against which we should measure how we live?

The idea is that by showing search engines the original source of a piece of content, you are telling them what the standard is. That guides them to attribute authority to that one, single source rather than spreading it across multiple web pages.

[Figure: Canonicalization diagram]

This is now more important than ever because of the series of Google updates called Panda. Panda targeted low-quality websites, and duplicated content was one of its main targets. If you do need to duplicate content, canonicalization methods let you tell search engines which copy is the original source.
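As a rough illustration of what "telling search engines" can look like, here is a minimal Python sketch that builds a canonical link tag for the head of a duplicate page. The function name and the example URL are hypothetical, and this assumes the rel="canonical" link element as the method. Treat it as a sketch, not a complete setup.

```python
from html import escape


def canonical_link_tag(original_url: str) -> str:
    """Return a <link rel="canonical"> tag pointing at the original page.

    `original_url` is whatever URL you consider the original source.
    It is escaped so characters like & and quotes are safe in an attribute.
    """
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


# Hypothetical example: a PPC landing page that duplicates an offer page.
print(canonical_link_tag("https://www.example.com/offer?a=1&b=2"))
```

The tag would go in the head of each duplicate page, so every copy points back at the same original.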

Some legitimate reasons to duplicate content include:

  • PPC landing pages
  • Blog reels
  • Staging sites

Google created Panda, however, because people were duplicating content for malicious reasons: using other people’s content to boost their own rankings without putting in the work to deserve them.

Those people got what was coming to them.

So, in short, never duplicate someone else’s content and claim it as your own.

You may be thinking, “Well, duh! Not only is that unethical, that’s illegal!”

Right and right.

Beyond intentional duplication of content, though, you can still get hit by Panda and other canonicalization triggers in the search engines.

After all, sometimes bad things happen to good people, right?

Here are some commonly made mistakes that I’ve seen time and time again from well-meaning webmasters who simply get canonicalization wrong:

  • Multiple TLDs with the same content
  • Multiple subdomains with the same content
  • Unchecked 404s
  • Inconsistent file extensions such as .aspx, .html, .asp, .php, etc.
  • Slashes or no slashes on the end of the URL

The key with these is being consistent. There’s no right or wrong choice; just pick one and stick with it.
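To make "be consistent" concrete, here is a minimal Python sketch that maps the URL variants listed above onto one form. Everything in the policy block is an assumption for illustration: the hostnames are placeholders, and choosing to drop file extensions and trailing slashes is just one option. The opposite choices are equally valid as long as you apply them everywhere.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical site policy (assumptions, not recommendations).
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com", "example.net", "www.example.net"}
STRIP_EXTENSIONS = (".aspx", ".html", ".asp", ".php")


def canonical_url(url: str) -> str:
    """Map TLD, subdomain, extension, and trailing-slash variants to one URL."""
    parts = urlsplit(url)

    # Multiple TLDs or subdomains: fold known aliases into one host.
    # Using .hostname lowercases the host and drops any port.
    host = (parts.hostname or "").lower()
    if host in HOST_ALIASES:
        host = CANONICAL_HOST

    # Trailing slash: this sketch removes it everywhere except the root.
    path = parts.path or "/"
    if len(path) > 1:
        path = path.rstrip("/") or "/"

    # File extensions: this sketch drops them, so /about.php becomes /about.
    lower_path = path.lower()
    for ext in STRIP_EXTENSIONS:
        if lower_path.endswith(ext):
            path = path[: -len(ext)]
            break

    # Keep the query string, discard the fragment.
    return urlunsplit((CANONICAL_SCHEME, host, path, parts.query, ""))


variants = [
    "http://example.com/about.html",
    "https://WWW.Example.net/about/",
    "https://www.example.com/about.php",
]
# All three variants collapse to a single URL.
print({canonical_url(u) for u in variants})
```

A function like this could be used to audit a list of crawled URLs: any two URLs that normalize to the same value are candidates for a redirect or canonical tag.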

Read the next couple of lessons (Canonical Tags & 301 (Permanent) Redirects) to learn more about putting canonicalization into practice… the right way.