http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html. Though the difference between 301 and 302 redirects appears to be merely semantics, the actual results are dramatic. Google decided a long time ago to not pass link juice (ranking power) equally between normal links and server redirects. At SEOmoz, I did a considerable amount of testing around this subject and have concluded that 301 redirects pass between 90 percent and 99 percent of their value, whereas 302 redirects pass almost no value at all. Because of this, my co-workers and I always look to see how non-canonicalized pages are being redirected.
It's not just semantics. How a page is redirected (whether by a 301 or a 302 redirect) matters.
WARNING Older versions of IIS use 302 redirects by default. D'oh! Be sure to look out for this. You can see worthless redirects all around popular IIS-powered websites like microsoft.com and myspace.com. The value of these redirects is being completely negated by a single value difference!
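As a rough illustration of how to look out for this, the sketch below sends a single request without following redirects and reports whether the server answered with a 301 or a 302. It assumes Python's standard library, and the function names (`classify_redirect`, `fetch_status`) are hypothetical helpers, not part of any tool mentioned in this book.

```python
import http.client
from urllib.parse import urlsplit


def classify_redirect(status):
    """Label a status code using the two redirect types discussed in the text."""
    if status == 301:
        return "permanent"       # 301 Moved Permanently
    if status == 302:
        return "temporary"       # 302 Found
    if 300 <= status < 400:
        return "other redirect"  # any other 3xx code, not covered here
    return "not a redirect"


def fetch_status(url, timeout=10):
    """Send one HEAD request and return (status, Location header).

    http.client never follows redirects on its own, so the status
    returned is the one the server sent for this exact URL.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Passing the status returned by `fetch_status` for a non-canonical URL to `classify_redirect` gives a quick read on whether the redirect is the permanent or the temporary kind. Some servers answer HEAD requests differently from GET requests, so treat the result as a first check rather than a verdict.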
Canonicalization is not limited to the inclusion of letters. It also dictates forward slashes in URLs. Try going to http://www.google.com and notice that you will automatically get redirected to http://www.google.com/ (notice the trailing forward slash). This is happening because technically this is the correct format for the URL. Although this is a problem that is largely solved by the search engines already (they know that www.google.com is intended to mean the same as www.google.com/), it is still worth noting because many servers will automatically 301 redirect from the version without the trailing slash to the correct version. By doing this, a link pointing to the wrong version of the URL loses between 1 percent and 10 percent of its worth due to the 301 redirect. The takeaway here is that whenever possible, it is better to link to the version with the forward slash. There is no reason to lose sleep over this (because the engines have mostly solved the problem), but it is still a point to consider.
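If you generate links programmatically, one way to apply this takeaway is to add the root slash before the link is published. The following is a minimal sketch using Python's `urllib.parse`; the helper name `with_root_slash` is hypothetical, and it only handles the bare-domain case described above, leaving every other path untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def with_root_slash(url):
    """Return url with "/" as its path when the path is empty.

    URLs that already have a path are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc and parts.path == "":
        parts = parts._replace(path="/")
    return urlunsplit(parts)


print(with_root_slash("http://www.google.com"))
print(with_root_slash("http://www.google.com/"))
```

Both calls should print the version with the trailing slash, which is the form worth linking to.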
CROSSREF The right and wrong usage of 301 and 302 redirects is discussed in Chapter 3. The correct syntax and usage of the canonical link element is discussed in Chapter 5.
Robots.txt and Sitemap.xml
After analyzing the domain name, general design, and URL format, my colleagues and I look at a potential client's robots.txt and sitemap files. This is helpful because it starts to give you an idea of how much (or little) the developers of the site cared about SEO. A robots.txt file is a very basic step webmasters can take to work with search engines. The text file, which should be located in the root directory of the website (http://www.example.com/robots.txt), is based on an informal protocol that is used for telling search engines which directories and files they are allowed and disallowed from accessing. The inclusion of this file gives you a rough hint of whether or not the developers of the given site made SEO a priority. Because this is a book for advanced SEOs, I will not go into this protocol in detail. (If you want more information, check out http://www.robotstxt.org or http://googlewebmastercentral.blogspot.com/2008/06/improving-on-robots-exclusion-protocol.html.) Instead, I will tell you a cautionary tale.
Bit.ly is a very popular URL shortening service. Due to its connections with Twitter.com, it is quickly becoming one of the most linked websites on the Web. One reason for this is its flexibility. It has a feature where users can pick their own URL. For example, when linking to my website I might choose http://bit.ly/SexyMustache. Unfortunately, Bit.ly forgot to block certain URLs, and someone was able to create a shortened URL for http://bit.ly/robots.txt. This opened up the possibility for that person to control how robots were allowed to crawl Bit.ly. Oops! This is a great example of why knowing even the basics of SEO is essential for web-based business owners.
After taking a quick glance at the robots.txt file, SEO professionals tend to look at the default location for a sitemap (http://www.example.com/sitemap.xml). When I do this, I don't spend a lot of time analyzing it (that comes later, if the owners of that website become clients); instead, I skim through it to see if I can glean any information about the setup of the site. A lot of times, it will quickly show me if the website has information hierarchy issues. Specifically, I am looking for how the URLs relate to each other. A good example of information hierarchy would be www.example.com/mammal/dogs/english-springer-spaniel.html, whereas a bad example would be www.example.com/node?type=6&kind=7. Notice in the bad example that the search engines can't extract any semantic value from the URL. The sitemap can give you a quick idea of the URL formation of the website.
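For readers who want to see the robots.txt protocol in action without reading the full specification, here is a small sketch using Python's built-in `urllib.robotparser`. The rules in `SAMPLE_ROBOTS_TXT`, the directory names, and the `ExampleBot` user agent are all made up for illustration; a real audit would load the client's actual file instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a crawler calling itself "ExampleBot" may fetch each URL.
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
print("private page crawlable:", blocked)
print("home page crawlable:", allowed)
```

With these sample rules, the page under /private/ should come back as not crawlable and the home page as crawlable. The Bit.ly story shows why it matters who controls the text that gets parsed here.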
URLs like this one are a sign that a website has information hierarchy issues because search engines can't extract any semantic value from the URL.
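The kind of skim described above can be roughly approximated in code. The sketch below reads the `<loc>` entries from a sitemap and flags URLs whose meaning sits in the query string rather than in the path. The sample sitemap, the helper names, and especially the `looks_opaque` rule are assumptions made for illustration; the rule is a crude heuristic, not a measure any search engine is known to use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical sitemap built from the two example URLs in the text.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/mammal/dogs/english-springer-spaniel.html</loc></url>
  <url><loc>http://www.example.com/node?type=6&amp;kind=7</loc></url>
</urlset>
"""


def sitemap_urls(xml_text):
    """Return every <loc> value found in the sitemap XML."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def looks_opaque(url):
    """Heuristic: True when the URL has a query string and at most one path segment."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    return bool(parts.query) and len(segments) <= 1


for url in sitemap_urls(SAMPLE_SITEMAP):
    label = "opaque" if looks_opaque(url) else "descriptive"
    print(label, url)
```

On a real site, a sitemap where most entries are flagged as opaque would be a hint to look more closely at the information hierarchy later in the audit.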
