Sunday 8 January 2012

Understanding Search Engine Optimization


At Google, search engineers talk about “80-20” problems. They are describing situations where the last 20 percent of the problem is 80 percent of the work. Learning SEO is one of these problems. Eighty percent of the knowledge SEOs need is available online for free.
Unfortunately, the remaining 20 percent takes the majority of the time and energy to find and understand. My goal with this blog is to solve that problem by making the last 20 percent as easy to get as the first 80 percent. Though I don't think I will be able to cover the entire 20 percent (some of it comes from years of practice), I am going to write as much actionable advanced material as humanly possible.


This blog is for those who already know the basics of SEO and are looking to take their skills to the next level. Before diving in, make sure you are comfortable with the following concepts (quick examples follow the list):

robots.txt
Sitemap
nofollow
301 redirect
Canonicalization


The Secrets of Popularity

Once upon a time there were two nerds at Stanford working on their PhDs.
(Now that I think about it, there were probably a lot more than two nerds at Stanford.) These two were not satisfied with the current options for searching online, so they attempted to develop a better way.
Being long-time academics, they eventually decided to take the way academic papers were organized and apply that to webpages.
A quick and fairly objective way to judge the quality of an academic paper is to see how many times other academic papers have cited it.
This concept was easy to replicate online because the original purpose of the Internet was to share academic resources between universities. The citations manifested themselves as hyperlinks once they went online.
One of the nerds came up with an algorithm for calculating these values on a global scale, and they both lived happily ever after.
Of course, these two nerds were Larry Page and Sergey Brin, the founders of Google, and the algorithm that Larry invented that day was what eventually became PageRank. Long story short, Google ended up becoming a big deal and now the two founders rent an airstrip from NASA so they have somewhere to land their private jets.
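For the curious, the core idea is public knowledge: the original PageRank paper describes a recursive score in which a page's importance is the sum of small shares of the importance of the pages linking to it. Below is a minimal power-iteration sketch of that published, simplified formula run on a made-up three-page link graph. It illustrates the concept only; it is not Google's production algorithm.

    # Minimal PageRank sketch based on the simplified published formula.
    # The three-page link graph below is made up purely for illustration.
    def pagerank(links, damping=0.85, iterations=50):
        pages = list(links)
        n = len(pages)
        ranks = {page: 1.0 / n for page in pages}
        for _ in range(iterations):
            new_ranks = {}
            for page in pages:
                # Each page passes its rank, split evenly, to the pages it links to.
                inbound = sum(
                    ranks[other] / len(links[other])
                    for other in pages
                    if page in links[other]
                )
                new_ranks[page] = (1 - damping) / n + damping * inbound
            ranks = new_ranks
        return ranks

    graph = {
        "home": ["about", "blog"],
        "about": ["home"],
        "blog": ["home", "about"],
    }
    print(pagerank(graph))  # "home" collects the most links, so it scores highest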

That fateful day, the Google Guys capitalized on the mysterious power of links. Although a webmaster can easily manipulate everything (word choice, keyword placement, internal links, and so on) on his or her own website, it is much more difficult to influence inbound links.
This natural link profile acts as an extremely good metric for identifying legitimately popular pages.

Now wait a second: isn't this supposed to be a blog for advanced SEOs? Then why am I explaining the value of links? Relax, there is a method to my madness. Before I am able to explain the more advanced secrets, I need to make sure we are on the same page.
As modern search engines evolved, they started to take into account the link profile of both a given page and its domain. They found out that the relationship between these two indicators was itself a very useful metric for ranking webpages.


Domains and Page Popularity

There are hundreds of factors that help engines decide how to rank a page. And in general, those hundreds of factors can be broken into two categories—relevance and popularity (or “authority”).
For the purposes of this demonstration you will need to completely ignore relevancy for a second. (Kind of like the search engine Ask.com.)
Further, within the category of popularity, there are two primary types: domain popularity and page popularity. Modern search engines rank pages by a combination of these two kinds of popularity metrics, both of which are measurements of link profiles. Roughly speaking, to rank number one for a given query, your page needs a stronger combination of these popularity metrics than any other page competing for that query.
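Nobody outside the search engines knows the real weighting, but a toy model makes the point. The sketch below invents a blended score from two made-up numbers, one for the domain's link profile and one for the page's own link profile. The weights and scores are placeholders for illustration, not anything the engines have published.

    # Purely illustrative: a made-up blend of domain and page popularity.
    def combined_popularity(domain_score, page_score, domain_weight=0.6):
        return domain_weight * domain_score + (1 - domain_weight) * page_score

    # A huge domain with a so-so page versus an unknown domain with a strong page.
    wikipedia_article = combined_popularity(domain_score=95, page_score=40)
    niche_blog_post = combined_popularity(domain_score=20, page_score=90)
    print(wikipedia_article, niche_blog_post)  # 73.0 versus 48.0

Shift the weights or the scores enough and the unknown domain wins, which is exactly the situation I describe next.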

This is very clear if you start looking for patterns in search result pages.
Have you ever noticed that popular domains like Wikipedia.org tend to rank for everything? This is because they have an enormous amount of domain popularity.
But what about those competitors who outrank me for a specific term with a practically unknown domain? This happens when they have an excess of page popularity.



Before I summarize I would like to nip the PageRank discussion in the bud. Google releases its PageRank metric through a browser toolbar. This is not the droid you are looking for. That green bar represents only a very small part of the overall search algorithm.

Not only that, but at any given time the TbPR (Toolbar PageRank) value you see may be 60 to 90 days old, or even older, and it is a single-digit representation of what is probably a very long decimal value.

Just because a page has a PageRank of 5 does not mean it will outrank all pages with a PageRank of 4. Keep in mind that major search engines do not want you to reverse engineer their algorithms.
As such, publicly releasing a definitive metric for ranking would be idiotic from a business perspective. If there is one thing that Google is not, it’s idiotic.

Google makes scraping (automatically requesting and distributing) its PageRank metric difficult. To get around the limitations, you need to write a program that requests the metric from Google and identifies itself as the Google Toolbar.
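I am not going to print the exact request format here (the endpoint and its checksum parameter are undocumented and change over time), but the basic technique is nothing more exotic than an HTTP request with a spoofed User-Agent header. Here is a rough Python sketch; the endpoint URL and user-agent string are placeholders I invented, so treat it as the shape of the approach rather than working toolbar code.

    import urllib.request

    # Placeholder values: the real toolbar endpoint, its query parameters, and
    # its user-agent string are undocumented and would have to be filled in.
    TOOLBAR_ENDPOINT = "http://example.com/toolbar-query?url=http://example.org/"
    TOOLBAR_USER_AGENT = "Mozilla/4.0 (compatible; GoogleToolbar)"

    request = urllib.request.Request(
        TOOLBAR_ENDPOINT,
        headers={"User-Agent": TOOLBAR_USER_AGENT},  # identify as the toolbar
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))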

In my opinion, hyperlinks are the most important factor when it comes to ranking web pages. This is because they are so difficult for a site owner to manipulate.
Modern search engines look at link profiles from many different perspectives and use those relationships to determine rank. The takeaway for you is that time spent earning links is time well spent. In the same way that a rising tide raises all ships, popular domains raise all pages.
Likewise, popular pages lift the metrics of the domain they live on.
In the next section I want you to take a look at the pesky missing puzzle piece of this discussion: relevancy. I am going to discuss how it interacts with popularity, and I may or may not tell you another fairy tale.


The Secrets of Relevancy

In the previous section, I discussed how popular pages (as judged by links) rank higher. By this logic, you might expect that the Internet’s most popular pages would rank for everything.
To a certain extent they do (think Wikipedia!), but the reason they don’t dominate the rankings for every search result page is that search engines put a lot of emphasis on determining relevancy.

Text Is the Currency of the Internet

Relevancy is a measurement of how closely two items are related, in effect the theoretical distance between them. Luckily for Google and Microsoft, modern-day computers are quite good at calculating this measurement for text.

Google owns and operates well over a million servers. The electricity to power these servers is likely one of Google’s larger operating expenses. This energy limitation has helped shape modern search engines by putting text analysis at the forefront of search.
Quite simply, it takes less computing power and is much simpler programmatically to determine relevancy between a text query and a text document than it is between a text query and an image or video file.
This is the reason why text results are so much more prominent in search results than videos and images.
As of this writing, the most recent time that Google publicly released the size of its indices was in 2006.
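To see why text-to-text matching is so cheap, consider the kind of bag-of-words comparison every information-retrieval student learns first. The snippet below scores two short documents against a query using simple term-frequency cosine similarity. It is a classroom illustration of the general idea, not a claim about how Google actually computes relevancy.

    import math
    from collections import Counter

    def cosine_similarity(text_a, text_b):
        # Bag-of-words term counts; real engines do far more (stemming, link
        # text, synonyms, and so on), but the core comparison stays this cheap.
        a = Counter(text_a.lower().split())
        b = Counter(text_b.lower().split())
        dot = sum(a[word] * b[word] for word in set(a) & set(b))
        norm_a = math.sqrt(sum(count * count for count in a.values()))
        norm_b = math.sqrt(sum(count * count for count in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query = "jessica simpson"
    doc_one = "jessica simpson released a new album this week"
    doc_two = "review of a new camera released this week"
    print(cosine_similarity(query, doc_one))  # shares terms with the query
    print(cosine_similarity(query, doc_two))  # no overlap, so the score is 0.0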

So what does this emphasis on textual content mean for SEOs?
To me, it indicates that my time is better spent optimizing text than images or videos. This strategy will likely have to change in the future as computers get more powerful and energy efficient, but for right now text should be every SEO’s primary focus.

But Why Content?

The most basic structure a functional website could take would be a blank page with a URL. For example purposes, pretend your blank page is on the fake domain www.WhatIsJessicaSimpsonThinking.com. (Get it? It is a blank page.) Unfortunately for the search engines, clues like top-level domains (.com, .org, and so on), domain owners (WHOIS records), code validation, and copyright dates are poor signals for determining relevancy.
This means your page with the dumb domain name needs some content before it is able to rank in search engines.

The search engines must use their analysis of content as their primary indication of relevancy for determining rankings for a given search query.
For SEOs, this means the content on a given page is essential for manipulating (that is, earning) rankings. In the old days of AltaVista and other early search engines, SEOs would just need to write "Jessica Simpson" hundreds of times on a page to make it rank #1 for that query. What could be more relevant for the query "Jessica Simpson" than a page that says Jessica Simpson 100 times? (Clever SEOs will realize the answer is a page that says "Jessica Simpson" 101 times.)
This metric, called keyword density, was quickly manipulated, and the search engines of the time diluted its influence on rankings until it became almost useless. Similar dilution has happened to the keywords meta tag, some kinds of internal links, and H1 tags.
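For historical interest, keyword density is trivial to compute, which is exactly why it was so easy to game and so quickly devalued. A quick sketch, run on a made-up snippet of page text:

    def keyword_density(page_text, keyword):
        # Share of the words on the page that belong to the keyword phrase.
        words = page_text.lower().split()
        phrase = keyword.lower().split()
        hits = sum(
            words[i:i + len(phrase)] == phrase
            for i in range(len(words) - len(phrase) + 1)
        )
        return hits * len(phrase) / len(words) if words else 0.0

    text = "jessica simpson news and more jessica simpson gossip"
    print(keyword_density(text, "jessica simpson"))  # 0.5, half the words on the page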
