A Universe of Graphs
Posted by gfiorelli1
Sometimes, we have to look at things from far away to really understand them.
Ever since I can remember, I have loved astronomy and science fiction. When I was a kid in the 70s (yes, I’m older than most of you), the space adventure was still something mythical; men were driving the Moon Buggy, Viking 1 and 2 sent the first pictures from Mars, Pioneer 10 had just been sent to the boundaries of the Solar System with its famous golden plaque (our message in a bottle for those who might receive it), and Carl Sagan was presenting Cosmos: A Personal Voyage on PBS.
Perhaps it is for that passion that when I saw the graphic representation of the Internet for the first time, a question came to mind: what if the visualization of the universe and of the web are more similar than we think?
(Visualization of Internet – Credits: Opte.org)
I know it’s a metaphor – one of many – but perhaps it is the most effective among those we have available.
One thing is clear: we should be aware that the universe is not only what we see with our eyes, and there is no one law that commands it; and so it is with the web, and especially with Google.
Gravity, relativity, and quantum theory are some of the laws that govern the universe we know: physical and mathematical laws that translate into formulas. Similarly, algorithms and graphs are what govern the Google universe.
In this post, I invite you to put on your astronaut suit and travel with me through the universe of Google on a mission to understand how, perhaps, it works.
Before we go, remember that a space voyage always needs a soundtrack.
Our site, our Earth. A blue dot in the darkness. Isn’t it beautiful seen from space?
Yet Earth is the only planet in our solar system where, as far as we know, life has evolved to a sentient state.
But if Earth were outside the so-called “habitable zone,” life as we know it would not have been possible.
The same is true with our websites; our online homes.
With an incorrect navigation architecture, crawlers won’t be able to index the whole site. With a bad use of robots.txt, of rel=”canonical”, and of the meta robots tag (or simply with an incorrect design), important parts of a website will be as invisible as the dark side of the moon.
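To make the robots.txt point concrete, here is a hypothetical example of the kind of misconfiguration described above (the paths are invented for illustration):

```
# A single misplaced rule can hide an entire site from crawlers:
User-agent: *
Disallow: /          # blocks crawling of every page on the site

# What was probably intended: block only a private section.
# User-agent: *
# Disallow: /private/
```

A one-character difference in a Disallow rule is enough to make the whole site “uninhabitable” for a search engine, which is why these files deserve careful review.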
These mistakes, and countless others related to user experience (speed, heavy ad presence, duplicated and thin content, etc.) or to what explains to crawlers what a page is about (for instance, structured data), are the things that make our website “uninhabitable” in the eyes of the search engines.
These algorithms – we’ll refer to them as the “technical graph” for convenience – are what Google uses to determine if a site is habitable, just as scientists measure the presence of oxygen, water, seasons, and many other things when considering whether a planet has the ability to host life.
A habitable zone is a prerequisite without which we cannot even think of starting our journey. This alone should help us understand how technical SEO is, and will remain, an essential element in our plans as Internet marketers.
The beginning of the journey: the Link Graph
Did you know that life on Earth might not have been possible without the Moon? At least, this is what many scientists affirm. But even without going to that extreme, it seems certain that our giant satellite has played a role in the creation of life.
Similarly, in the universe of Google, a site cannot be considered habitable without the influence of a key external factor: the Link Graph.
The Link Graph is the representation of the relation between web pages (Nodes) through links (Edges).
There is an “internal” Link Graph, which exists in the connections between pages of the same Pay-Level Domain (domain.com) and its Fully Qualified Domain Names (commonly called subdomains, such as www.domain.com), and an external one, which is based on the connections via backlinks between pages of different domains and which we normally call the “Link Profile.”
One of the laws ruling the Link Graph of the Google universe is the PageRank algorithm.
In the PageRank algorithm (explained in the least complicated way possible), a link to a page counts as a vote of support.
The PageRank of a page, then, is defined recursively and depends on the number and PageRank of all the pages that link to it (also known as “backlinks”).
Therefore, a page that is linked to by many pages with high PageRank will, in turn, have a high PageRank. If a page has low-value backlinks or no backlinks at all, then it will have low PageRank or PageRank zero.
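That recursive definition can be sketched with a few lines of code. This is only a toy illustration of the idea, not Google's implementation: the page names are invented, and real PageRank operates at web scale with many refinements. It uses the classic power-iteration method with the 0.85 damping factor from the original paper:

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with rank spread evenly
    for _ in range(iterations):
        # Every page gets a small base share, plus votes from its backlinks.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page passes its rank, split evenly, to the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page with no outlinks: spread its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical three-page site: "home" receives the most internal links.
graph = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
}
ranks = pagerank(graph)
```

Running this shows exactly the behavior described above: the page with the most (and strongest) backlinks ends up with the highest score, and the scores always sum to 1.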
Apart from page-level PageRank, there is also a Domain PageRank, the aggregate value of the individual PageRanks of all the pages of a site. This explains why a link from a page with low PageRank but a great Domain PageRank can be better than a link from a page with a relatively higher PageRank but a worse Domain PageRank.
PageRank (which, technically, is a query-independent ranking model) isn’t the only factor that plays a role in the link graph. A second mode of connectivity-based ranking, this time query-dependent, also has a major role. It is based on the HITS algorithm, which holds that a document which points to many others might be a good hub, and a document that many documents point to might be a good authority. Recursively, a document that points to many good authorities might be an even better hub, and similarly a document pointed to by many good hubs might be an even better authority, as Monika Henzinger of Google explained (quote from Search Quality: The Link Graph Theory by Dan Petrovic).
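The hub/authority recursion just described can also be sketched in a few lines. Again, this is a simplified illustration with invented page names, not a production implementation: each round, authority scores are recomputed from the hub scores of in-linking pages, hub scores from the authority scores of out-linked pages, and both are normalized so they stay bounded:

```python
def hits(links, iterations=50):
    """links: dict mapping each page to the list of pages it links to."""
    pages = set(links) | {t for outs in links.values() for t in outs}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: sum(hub[src] for src, outs in links.items() if p in outs)
                for p in pages}
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both score vectors (guard against all-zero vectors).
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical graph: "portal" links out to two pages it "recommends."
hub, auth = hits({"portal": ["pageA", "pageB"]})
```

In this toy graph, “portal” ends up with a high hub score (it points to the authorities) while the pages it links to get the authority scores, which mirrors the hub/authority distinction in the quote above.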
SEOmoz (now Moz) is a good example of a site which responds positively to both algorithms: to PageRank, because it has more than 6 million backlinks from 40,843 unique domains (OSE data from before the migration to Moz.com), and to HITS, because it is both a hub and an authoritative site according to that algorithm.
The problem with PageRank and the HITS algorithm is that they can be altered artificially using manipulative techniques.
For that reason, a great part of the history of Google’s updates is the tale of Google fighting the effects of those techniques in order to keep to its mission: presenting only the best results in the SERPs.
One update, for example, assigned a different value to a link depending on the section of the page (header, sidebar, footer, body) where that link is published.
This is also why the so-called ranking factors related to the link profile have changed so many times.
What are the factors related to the Link Graph that are taken into consideration by Google now? No one knows for sure, and SEOs can only reasonably guess through correlation studies.
The latest one was presented by Searchmetrics during the last edition of SMX London in May (and Rand announced Moz is cooking a new edition of their Ranking Factors in time for MozCon). Very few factors presented were directly related to backlinks:
| Factor | Spearman correlation |
| --- | --- |
| Number of backlinks | 0.34 |
| % backlinks with keywords | 0.08 (diminishing value) |
| % backlinks rel=”nofollow” | 0.23 (increasing value) |
| % backlinks with stopword | 0.17 (increasing value) |
While we wait for the publication of both the complete report by Searchmetrics and the new Ranking Factors study by Moz, here you can find:
Why is it important to know how the Link Graph works? Because even though the relative weight of backlinks in the Link Graph is diminishing, they are still the most important factor in the Google algorithm. Knowing how the Link Graph works and what makes a site’s link graph good or bad is essential for the success of a site, and Penguin is here to prove it.
It also tells us very clearly how being a hub and/or an authoritative site pays off in the Link Graph, hence we should …
You can read the full article at the Moz Blog.