Australian Automotive Landscape Visualised In Gephi Network Graph

Understanding Industry Verticals Through Backlink Visualisation and Analysis

Innovative ways to analyse a search category for content and relationship opportunities

An Introduction and Background

In a few days it will be my third anniversary of arriving in Australia, and it has been so very different from how I imagined it when I stepped off the plane in Sydney for the first time. Having arrived in Sydney I set about exploring and loved every minute of it – so much so that after about 3 weeks, I thought it would be interesting to stay a bit longer than the couple of months that was originally planned. I looked for some contract work and ended up being offered the opportunity to start the Organic Search team within Mediabrands (MB) / Reprise Media (RPM) – something that I considered a huge opportunity. We have since built the team from 4 people to just under 60 in 3 years, and I anticipate the growth will continue into 2013.

I have recently transitioned out of my role as SEO Director to an Innovation and Technology role – which means that I finally have time to document and share some of the conceptual, current and now historical work that the RPM Sydney team have been doing in a pretty progressive environment. This isn’t intended to be a blow-our-own-trumpet post, but I certainly want to set the scene before I explain a bit more about some of the fabulous visualisations that we have created. A quick example is included below:

Image #1 – Australian Automotive Vertical Visualised In Domain Backlinks

I believe (obviously biased) that RPM have a very different and future-facing view of search, social and digital (certainly in Australia and maybe further afield) and possibly a unique perspective on offsite optimisation and how to approach digital media, PR, social engagement, content creation and relationship building. This is, I think, down to the positioning of RPM as a specialist agency servicing a large media group (Mediabrands). However, there are many examples where this setup doesn’t work in large media agencies or media groups, due to lack of understanding, egos, and possibly greed within the holding group. That said, it is working here for both clients and the agencies within Mediabrands – and that’s what I want to talk about. I think we are very fortunate to be in the position that we are in, having been given the encouragement and opportunities by the senior team within MB, Henry Tajer and Reg Davidson, to get on and seize opportunities rather than watch them go by over the last few years, and I think that this is a great example of a larger media group’s teams collaborating.

The diverse skills and resource mix that MB has within the teams here in Australia is unique to IPG/MB globally and is a mix that has been tremendously successful for agency clients, both from a media perspective and specifically from a digital perspective. Many other agencies are trying this specialist approach, but I am not aware of any who are doing it as successfully as we are doing it here.

In relation to search and social marketing, the RPM product is based on a solid understanding of business requirements, comprehensive market/landscape analysis (the opportunity), identifying and interrogating the key websites (the influencers) that control and drive traffic around a landscape, and how the relationship between online and offline spend (investment weighting) affects both consumer interest and visibility in search and social environments. As we look at it, a competitor is anyone who is limiting a brand’s exposure of any of its assets within a particular platform – search/social/social networks/video platforms/other. It is the wide mix of skills that both RPM and MB have access to that made the initial analysis, and consequently this post, possible.

Whilst the RPM team have the capability to build solutions that simply give us this data as an actionable or insightful output, I feel that there is tremendous value in understanding the data and the information, and in using this to re-interrogate the data to highlight insights that were never expected to be identified through this analysis. I would much rather use our brains and resource to get to where we need to get to, through pushing the limits of off-the-shelf tools, before investing in bespoke solutions that might not provide the answers, or raise the questions, that needed to be asked.

There has been a lot of search industry shift towards data analysis in recent years, to the point where I believe that statisticians, mathematicians and analysts will make up significantly larger numbers of the wider media teams in digital agencies than they currently do. It is the move towards these skill sets that is introducing new tools like Tableau and Gephi to search specialists – as has been seen with the likes of Distilled, Seer and others starting to showcase what are run-of-the-mill tools for data analysts.

My intimate knowledge of the auto category and its competitors, and the ease of collecting and accessing data, were the driving factors in selecting the Australian Auto Manufacturers Vertical for this analysis and demonstration of some of the visualisations that we have utilised.

For this example, I chose to focus attention on KIA and Hyundai Australia. When conducting an analysis like this, one could prioritise a single website, a key competitor, or a defined competitor set. The intention in this case was to demonstrate what could be achieved if one were to invest time in not only understanding a category, but also interrogating competitors and their mutually exclusive competitors (i.e. a car brand like Audi would be interested in BMW or Mercedes but not Suzuki).

Disclosure: Mediabrands have both KIA and Hyundai as clients through Initiative Media

The Australian Automotive and Cars Category

Historically the Australian Car and Auto Market has been dominated by Toyota, Holden and Ford, with the other major Japanese brands making up the majority of sales. The last 3 years have seen some significant changes: former #1 Holden is losing significant share, Hyundai is leading the Korean charge, outselling Ford and sitting at #4 in year-to-date sales for 2012 (see Jan-Oct sales below), and Australia is an outstanding success for Mazda, which performs better here than in most other markets globally. Toyota absolutely dominates in the #1 sales position.

Chart #1 – Australian VFACTS New Car Sales 2012 YTD October (VFACTS October 2012 Data)

The premium end of the market is dominated by the 3 big German brands: Mercedes, BMW and Audi. VW is the only European brand to impact the mainstream, having aggressively brought down its pricing to make its cars much more attainable to a broad consumer base.

It is expected that the Chinese brands will generate significant sales moving forward, but not for a number of years until their products are considered more competitive and the quality of the product increases considerably.

The Australian Digital Media Landscape – A Very Brief Overview

In Australia (a land of few publishers) the large publishers or media owners (News, Fairfax) and networks (Yahoo7 and nineMSN) own many of the prominent auto news, car classifieds and car review websites. Naturally these few properties would link to almost all of the auto manufacturers’ websites. However, car communities and fan sites far outnumber, and pre-date, the more modern phenomenon of social media pages. What impact does the size of these communities have on a manufacturer’s backlink profile? What opportunities exist outside of these big publishers, and how can a brand challenge these powerful sites in the search and social space?

The Idea – Visualising the backlink profiles of the leading Australian Car Manufacturers

The kind of questions that this visualisation could assist in answering and communicating:

  • Which brands are dominant from a link perspective
  • What social assets are brands investing in
  • What influence and link equity do these social assets have
  • Which “publisher” websites are most influential in the category
  • What content makes these brands and their social assets popular with publishers and other websites

With the progress that the Korean car brands (KIA & Hyundai) have made in the Australian market the analysis was focused around them for this example, however it would be easy enough to take the same data and skew it towards any particular asset (an asset is: any digital property). The intention was to identify the differences in backlink profile between brands and their assets, and compare between more established manufacturers and individual manufacturers.

The beauty of all of this analysis, and its flexibility, is that it is then possible to cross-reference unique backlink profile data with visibility in results (data that most search agencies have, or have access to), and potentially even to match this with paid search visibility reports through the share of voice data from AdWords & Kenshoo, to identify which prominent or influential websites could be deemed priority targets not only for relationships, content partnerships etc. but also for reviews, display or affiliate activity that drives true business value to assets in this category.

Over the years, I have seen numerous visualisations of backlink profiles, but very few allow one to gain actionable insight – the currency of kings in our ever-evolving digital landscape. My opinion is that a lot of the current visualisation of backlink data is focused on volume of data, to give indications of overall trends, rather than on using the data firstly to strategise a campaign and secondly to validate strategy and positioning when planning activity or having a discussion about the authority of competitive websites.

With the last 12 months of heavyweight spam fighting from Google, many of the websites that are prominent in the category have been penalised for infractions by various updates, and looking for relationship and content opportunities has become more important. These updates have also resulted in a shift towards a view I have held over the last three years: that bigger brands can normally only be persuaded to action something when they can visualise it. Visualising may be conceptual or data-led, but as long as one can show the opportunity simply, it goes a long way towards getting clients to action campaign or content recommendations.

Collecting and Analysing Backlink Data for This Project

I built out a simple table (see appendix) of the 25 relevant manufacturers in the AU market and collected all of the digital assets that I considered of value:

  • Website
  • YouTube Channel(s)
  • Blog
  • Facebook
  • Twitter
  • G+
  • Others of Interest (not used)

NB. Some luxury competitors were ignored, e.g. Ferrari and Lamborghini.

Sources of backlink information are limited:

  • Google Analytics – Referrals (Owned Properties Only)
  • Google Webmaster Tools – Linking Sites (Owned Properties Only)
  • Ahrefs – Backlink Data
  • Majestic SEO – Backlink Data
  • SEOMoz – Open Site Explorer (OSE) – Backlink Data (Owned and Competitive Properties)

For this example, I chose to only utilise the OSE data from SEOMoz, however if de-duplicated correctly one could extend this to any source of backlink data or other metrics such as social mentions, shares etc.
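
As a rough illustration of what "de-duplicated correctly" might involve (this is not the Europa tool or the exact process we used), the sketch below merges link rows from several sources into unique host-to-host pairs. The function names, the row layout of (linking URL, target URL) and the choice to strip a leading "www." are all assumptions made for the example.

```python
from urllib.parse import urlsplit


def normalise_host(url):
    """Return the lower-cased host of a URL with any leading 'www.' removed."""
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def dedupe_links(*sources):
    """Merge (linking_url, target_url) rows from several tools into unique host pairs."""
    unique = set()
    for rows in sources:
        for linking_url, target_url in rows:
            unique.add((normalise_host(linking_url), normalise_host(target_url)))
    return sorted(unique)


# Hypothetical rows: the same relationship reported by two different tools.
ose_rows = [("http://www.carblog.example/review", "http://www.brand-a.example/")]
majestic_rows = [("https://carblog.example/other-page", "brand-a.example")]
merged = dedupe_links(ose_rows, majestic_rows)
```

Here both rows collapse into a single pair, so a link reported by two tools is only counted once. Note that stripping "www." is a simplification; an analysis keyed on subdomains may prefer to keep hosts exactly as reported.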

Each asset had its backlink profile downloaded from the SEOMoz OSE through one of our RPM proprietary tools called Europa (you could pull it manually if so desired). Although we pulled page-level data, it was decided that, due to the volume of data, we would start with a domain-level analysis rather than a page-level one. I felt that page-level analysis would be a next step in creating a granular action plan, which would involve only a limited number of brands and/or assets, reviewing content on sites to identify what content types were resonating with referring assets – e.g. what are people linking to, and why?

When originally conceptualising this project, through an off-the-cuff conversation with the MB Marketing Sciences Team, I became aware of an open source tool for visualising data and building out network graphs called Gephi. The basic principle that we worked with was that assets were nodes in the network graph and links were edges, which we would then visualise between the individual nodes.

From everything that I was told, Gephi’s off-the-shelf ability to manipulate and present data was spectacular, and we were curious to see what it would/could do with a large dataset such as the backlink profiles of a large category such as Automotive. The early visualisations that I created reminded me very much of the output of Google Fusion Tables, which Eva Jio (again within the RPM Search Team) had experimented with after seeing the Seer Interactive post. I have included an example of these early Fusion Table outputs below:

Image #2 – Google Fusion Tables OSE Backlink Visualisation

Mediabrands Marketing Sciences’ Gephi wizard, Habib Adam, created an initial Gephi file, importing the data from CSV. Much of the early time invested was spent explaining the SEOMoz metrics and relationships, understanding what the SEOMoz OSE provides, understanding domain and sub-domain relationships, and then cleaning the data up to ensure that there were no errors occurring with the import into Gephi. A basic methodology to create the network graphing is outlined in the next section.

Creating An Industry Vertical Network Graph Of OSE Link Data in Gephi

To set up this network map, the network data outlining nodes and links was set up for use in Gephi by aggregating the SEOMoz OSE Backlink data for all Automotive Manufacturers in the Australian Market.

Gephi and Data Setup Notes:

  • The dataset lists the unique nodes based on their subdomain, and all links between those nodes and other asset nodes within the network.
  • Links between subdomains of the same site were ignored. We focused exclusively on links to other websites
  • This dataset also includes domain authority information
  • A field was created to control the colour-coding and sizing of special nodes of interest within the network
  • The key to this is the algorithm that is used to map out the nodes and links, and knowing how the algorithm works helps identify the insights from the network constructed.
  • Force Atlas II was the Gephi algorithm used to create this map, resulting in a map whereby nodes are attracted to the other nodes with which they share a link.
  • As a result the map has the following spatial properties:
    • Nodes on the outer edges of the network are linked to many nodes within the network
    • Nodes which cluster around an asset node represent unique links to that asset
    • Nodes with links to similar asset combinations are located close to one another
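
For anyone wanting to reproduce the data setup, the notes above could be sketched roughly as follows. This is an illustration rather than the file Habib built: the row keys (source_host, source_root, target_host, target_root, domain_authority) are a hypothetical layout, and the column names follow what Gephi's spreadsheet import generally expects, namely Id and Label for the nodes table and Source and Target for the edges table.

```python
import csv
import io


def build_gephi_tables(rows, asset_hosts):
    """Turn backlink rows into a node table and an edge table for Gephi."""
    authority = {}
    edges = set()
    for row in rows:
        # Links between subdomains of the same site are ignored.
        if row["source_root"] == row["target_root"]:
            continue
        src, dst = row["source_host"], row["target_host"]
        authority[src] = row["domain_authority"]  # authority of the linking site
        authority.setdefault(dst, "")             # unknown unless it also links out
        edges.add((src, dst))
    # Each node row: Id, Label, domain authority, and a flag used for colour and size.
    nodes = [(host, host, authority[host], host in asset_hosts)
             for host in sorted(authority)]
    return nodes, sorted(edges)


def to_csv(header, records):
    """Serialise a header and rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample: one external link and one internal (ignored) link.
sample_rows = [
    {"source_host": "news.publisher.example", "source_root": "publisher.example",
     "target_host": "www.brand-a.example", "target_root": "brand-a.example",
     "domain_authority": 80},
    {"source_host": "blog.brand-a.example", "source_root": "brand-a.example",
     "target_host": "www.brand-a.example", "target_root": "brand-a.example",
     "domain_authority": 50},
]
nodes, edges = build_gephi_tables(sample_rows, {"www.brand-a.example"})
edges_csv = to_csv(["Source", "Target"], edges)
nodes_csv = to_csv(["Id", "Label", "domain_authority", "is_asset"], nodes)
```

The two CSV strings can be written to files and imported as the edges and nodes tables, after which the layout algorithm is run inside Gephi itself.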

Obviously, there are a number of insights that appear when the data is arranged as a simple network graph:

Image #3 – Australian Automotive Landscape Visualised In Gephi Network Graph

Network Graph Key and Notes

  • The spatial element only represents some of the information contained within this map
  • Additional information is contained within the node sizes and colour
  • NB. The largest nodes are the asset nodes – these have been coloured green with the exception of Kia and Hyundai’s websites, which have been coloured red to distinguish them as they were the two websites we had a special interest in for the purpose of this investigation
  • The remaining nodes are sized based on domain authority, the largest being those with the highest authority
  • Pink nodes indicate links that are unique to a particular asset node
  • Blue nodes represent domains that link to Kia but not to Hyundai
  • Vice versa, yellow nodes represent domains that link to Hyundai but not to Kia
  • Green nodes represent domains that link to both Kia and Hyundai
  • Nodes that link to neither Hyundai nor Kia are coloured on a scale from white to red, depending on how many other competitors they link to – white nodes link to the fewest competitors and red nodes to the most (ranging from 2 to 16)
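
The colour key above amounts to a small set of rules, which could be sketched as below. This is my reading of the key rather than the exact field used in the Gephi file: the function names, the lower-case brand labels and the order in which the rules are applied are assumptions.

```python
def classify_node(linked_brands, focus_a="kia", focus_b="hyundai"):
    """Return a colour label for a linking domain, given the set of brands it links to."""
    has_a = focus_a in linked_brands
    has_b = focus_b in linked_brands
    if has_a and has_b:
        return "green"         # links to both focus brands
    if has_a:
        return "blue"          # links to Kia but not Hyundai
    if has_b:
        return "yellow"        # links to Hyundai but not Kia
    if len(linked_brands) <= 1:
        return "pink"          # unique to a single other brand
    return "white-to-red"      # shaded by the number of competitors linked to


def scale_position(competitor_count, lowest=2, highest=16):
    """Map a competitor count onto 0.0 (white) to 1.0 (red)."""
    clamped = max(lowest, min(highest, competitor_count))
    return (clamped - lowest) / (highest - lowest)


colour = classify_node({"toyota", "ford", "mazda"})
shade = scale_position(3)
```

A value like this can be stored as a node attribute and then mapped to colour within Gephi, which keeps the colouring rules outside the tool and easy to re-point at a different pair of brands.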

Some Of The Early Top Level Insights Gained

  • Understanding of the size and diversity of the competitors’ backlink profiles
  • Similarities and relationships between different manufacturers and their digital footprints
  • Identification of segments within the automotive category that could be leveraged with other digital media – Performance, Display, Social
  • An understanding of the influence of social communities (forums, fan sites) on the more established brands
  • Identification of influential websites that link to many of the websites in the category but exclude one or both of KIA and Hyundai

Examples Of Granular Insights

Insight #1

In this example (where the focus is on KIA and Hyundai), one need not concern oneself with the green nodes, particularly those on the outer edges of the network, as these indicate common domains for links to automotive websites that both Hyundai and Kia are already linked from:

Image #4 – Commonly Linked Domains in The AU Auto Landscape

Insight #2

The specific example below shows how YouTube already links to Hyundai and Kia, as well as some of their competitors.

Image #5 – YouTube linking To Many Car Brands

Insight #3

Green nodes located between Hyundai and Kia indicate sites that link to both, but to few other competitors. This could indicate that the suitability of these links needs further consideration – why are other brands’ assets not being linked to by these sites?

The following example highlights a few such nodes. This is a very effective way to identify spammy or low quality sites linking to you, or instances where negative SEO may be occurring, which I have noticed a number of times in very competitive categories where even brand new sites have been blitzed with poor quality links.

Image #6 – Domains Almost Exclusive to KIA and Hyundai – A

Image #6 – Domains Almost Exclusive to KIA and Hyundai – Zoomed

Insight #4

The obvious domains requiring the attention of KIA and Hyundai are the red nodes on the outside of the network, particularly those with higher domain authority, which is signified by a larger node. These indicate domains that link to many competitors but not to either Kia or Hyundai, representing a clear opportunity for them to capitalise on gaining either visibility or links via a site that others seem to have identified as key. The example highlighted in the network map below looks like a website with which some kind of relationship could be created – from a content, partnership, PR or outreach perspective.
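
The same shortlist can be pulled out of the underlying data rather than read off the map. The sketch below is one hedged way of doing that: the function name, the input shapes (a mapping of domain to the set of brands it links to, and a mapping of domain to domain authority) and the ordering by competitor count and then authority are assumptions for illustration.

```python
def rank_opportunities(links, authority, focus=("kia", "hyundai"), min_competitors=2):
    """List domains that link to several competitors but to none of the focus brands."""
    candidates = []
    for domain, brands in links.items():
        if brands.isdisjoint(focus) and len(brands) >= min_competitors:
            candidates.append((domain, len(brands), authority.get(domain, 0)))
    # Most competitors first, then highest domain authority, then name for stable output.
    return sorted(candidates, key=lambda item: (-item[1], -item[2], item[0]))


# Hypothetical sample data.
sample_links = {
    "reviews.example": {"toyota", "ford", "mazda"},
    "forum.example": {"toyota", "ford", "mazda"},
    "fans.example": {"kia", "toyota"},
    "small.example": {"ford"},
}
sample_authority = {"reviews.example": 70, "forum.example": 85}
shortlist = rank_opportunities(sample_links, sample_authority)
```

In this sample, fans.example is excluded because it already links to Kia, and small.example because it links to only one competitor.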

Image #7 – Identifying Opportunities Through Backlink Visualisation

Insight #5

Blue nodes located on the outskirts of the map identify opportunities for Hyundai that Kia and many other competitors are already taking advantage of, while the yellow nodes identify such opportunities for Kia. The charts below show an example of an opportunity of this sort for Hyundai and one identified for Kia.

Image #8 – Hyundai Identified Opportunities

Image #9 – KIA Identified Opportunities

Vertical Graph of the Same Data

This chart (image) is pushing the limits of Gephi with the volume of data that we are analysing, and it is an enormous image, so I have chosen to provide a link rather than include it in the post itself.

Image #10 – Node Range Colour Spectrum

I have the image available in Dropbox should you want to download the original – Note it is 20MB

Vertical Chart Overview Key and Notes

  • It is very simple to grasp the sheer volume of Pink "unique links" – these are unique to each individual brand (and exclude KIA or Hyundai)
  • With the spectrum of White to Red Links it is simple to see the priority domains weighted by domain authority
  • Green domains are linking to at least KIA and Hyundai

Future Opportunities With This Kind Of Visualisation

I would hope that this post stimulates and enthuses members of the SEO and analytics community to go and explore what Gephi is capable of, and how else this tool can be used to identify opportunities within our industry.

  • For me, looking at a de-duplicated SEOMoz OSE, Majestic SEO and Ahrefs backlink profile for the industry would be a curious point to explore – obviously domain authority would have to be dropped as a metric within the charting, but this isn’t a huge concern.
  • Mashing up any backlink data with search visibility (or rankings), from both a paid search and an organic search perspective, could give the websites a visibility score that could further add value to understanding the potential partnership that a website could bring to a brand.
  • Essentially, if you were to have a list of all assets (even down to individual YouTube videos – available through the API) you could check multiple signals – likes, shares, links – it just depends what value this could generate for you.

Thanks, and I hope that this has proven interesting reading for you and that you're going to go and play with OSE and Gephi going forward.


Andrew Hughes

Director: Innovation and Technology Reprise Media Australia

Disclosure, Credits, Contributions & Acknowledgements

  • Habib Adam – Gephi build and visualisation.
  • Eva Jio – Initial investigative work using Seer Interactive concept with Google Fusion Tables.
  • Chris Carr – Head of Marketing Sciences Australia for his thoughts, enthusiasm and acceptance of venturing into these new areas with his team.
  • Dan Tighe – Commercial Director at Initiative Media – Additional insight into the AU Automotive market and clarification on a couple of the changes in the market in recent years.
  • Robert Lloyd – Senior SEO Analyst at RPM, for his thinking whilst we were conducting this project, and for allowing me to bounce ideas off him all the time. Robert Lloyd’s Blog.

Data Utilised & Sources

  • Data Source was SEOMoz’s OSE
  • Data was collected and analysed over recent months – there may be updates since the data was downloaded

The brands and owned assets that were included in this analysis are listed in the table below:

Image #8 – Australian Car Brands and Their Digital Assets

Other and Further Reading

Some previous visualisations of similar data that I have seen, and would recommend that you peruse, are below:

  • Neil Walker

    Great Article Yozza and thanks for the ping back, some of the images from my blog seemed to have gone awol :) anyway I have now added them again, but again nice article.

    • y0z2a

      No problem.

      Sorry been on holiday so only just seen your reply.