11 January 2017

What are robots.txt files? Featuring 15 of our favourites

By James Phillips

At MCM Net we like nerdy website stuff. Here is a prime example.

Robots.txt files are an important (if not very sexy) part of SEO, but that doesn’t mean that you can’t have fun with them.

Robots.txt files tell web crawlers and spiders (internet bots that systematically browse websites, typically for indexing purposes) from search engines such as Google and Yahoo which areas of your website they should not crawl.

By telling the internet bots where not to go, you can usually keep certain pages out of search engine results (although a blocked page can still be listed if other sites link to it). You also save bandwidth and server resources, which helps keep your website fast. This is especially useful for larger sites and those with a lot of traffic, or ones on slow servers.

Just as the robots from famed author Isaac Asimov’s universe are governed by the Three Laws of Robotics, internet robots have their own rules to abide by, known as the Robots Exclusion Protocol. Before a well-behaved bot visits your website, it must first check your robots.txt file.
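To make that check concrete, here is a minimal sketch of what a polite bot might do, using Python's standard urllib.robotparser module. The rules, the bot name "ExampleBot" and the example.com URLs are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not copied from a real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# "ExampleBot" is an invented bot name.
print(allowed(parser, "ExampleBot", "https://example.com/admin/login"))  # False
print(allowed(parser, "ExampleBot", "https://example.com/blog/"))        # True
```

A real crawler would download the file from the site first (RobotFileParser also offers set_url() and read() for that) and would only request pages for which the check comes back True.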

For a run-of-the-mill example of a robots.txt file, you can view our very own http://www.mcmnet.co.uk/robots.txt.

MCM Net's robots.txt file

Our website’s functionality is straightforward, without many of the types of pages that Google recommends you block from being crawled (such as login areas and search results). In our case, we just ask bots not to crawl our admin pages.

User-agent: * means that the instruction applies to all robots. Some websites have defined rules for different robots (for example see http://www.bbc.co.uk/robots.txt).
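As a rough illustration of per-robot rules, the sketch below parses an invented robots.txt with one group for all robots and a stricter group for a single bot, again using Python's urllib.robotparser. The bot names "ExampleImageBot" and "ExampleSearchBot" and the paths are assumptions, not rules from the BBC or any other real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: a general rule for everyone, plus a stricter
# group for one invented bot.
RULES = """\
User-agent: *
Disallow: /admin/

User-agent: ExampleImageBot
Disallow: /
"""

rules = RobotFileParser()
rules.parse(RULES.splitlines())

CHECKS = [
    ("ExampleSearchBot", "https://example.com/blog/"),
    ("ExampleSearchBot", "https://example.com/admin/"),
    ("ExampleImageBot", "https://example.com/blog/"),
]

for agent, url in CHECKS:
    verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
    print(f"{agent:17} {url:28} {verdict}")
```

Here ExampleSearchBot has no group of its own, so it falls back to the User-agent: * rules and is only kept out of /admin/, while ExampleImageBot matches its own group and is blocked everywhere.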

Websites built on a plugin architecture, such as WordPress, have the option of simply installing an appropriate plugin that will generate a file for you. Otherwise, ask your developer if you need to create or change a robots.txt file. You can find lots more information by visiting http://www.robotstxt.org/robotstxt.html.

Now for the fun part. Some websites like to get creative with their robots.txt files and we’ve shared our favourites below. Anything after a hash symbol (#) on a line in a robots.txt file is treated as a comment and ignored by search engines, which is why websites can get away with Easter eggs such as these.
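If you want to see for yourself that comments make no difference to the rules, here is a small sketch using Python's urllib.robotparser. It compares an invented robots.txt containing a hidden message with the same file stripped of comments. The message, bot name and URLs are all made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Two hypothetical files: identical rules, one with comments added.
WITH_JOKE = """\
# Hello, human. Fancy a job in SEO?
User-agent: *
Disallow: /admin/  # bots keep out
"""

WITHOUT_JOKE = """\
User-agent: *
Disallow: /admin/
"""

URLS = ["https://example.com/admin/", "https://example.com/blog/"]


def verdicts(robots_txt: str, urls: list[str]) -> list[bool]:
    """Return the allow/deny decision for each URL under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # "ExampleBot" is an invented bot name.
    return [parser.can_fetch("ExampleBot", url) for url in urls]


print(verdicts(WITH_JOKE, URLS))     # [False, True]
print(verdicts(WITHOUT_JOKE, URLS))  # [False, True]
```

Both files give the same answers, so the hidden message is only ever seen by people who open the file.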

Just crawl it - Nike

Not content with including a pun on their own slogan at the top of the file, Nike reward you for scrolling all the way to the bottom with a not-so-subtle piece of ASCII art of their logo.

nike.com robots.txt -- just crawl it

nike tick within robots.txt

Source: http://www.nike.com/robots.txt

Job posting for SEOs by TripAdvisor, White & Seer Interactive

Here are three examples of companies using their robots.txt to hide job adverts. The idea is that, if a human is reading your file, they might be the right fit for an SEO role. Or they just read a blog post on the subject.

Tripadvisor robots.txt file

Source: https://www.tripadvisor.com/robots.txt

White robots.txt file

Source: http://white.net/robots.txt

Seer Interactive robots.txt file

Source: http://www.seerinteractive.com/robots.txt

I, Robots.txt featuring Yelp & last.fm

We mentioned Asimov earlier and here is Yelp’s web team using their robots.txt to pay tribute to the sci-fi author. Last.fm, meanwhile, use a more understated version of the same Asimovian concept.

Yelp.com robots.txt file

Source: https://yelp.com/robots.txt

last.fm robots.txt file

Source: http://www.last.fm/robots.txt

Friendly robot messages from Wikipedia and Etsy

Wikipedia’s file is full of helpful, thoughtful comments such as this, while Etsy hide a cute little character at the bottom of their very long robots.txt.

Wikipedia robots.txt file

Source: https://en.wikipedia.org/robots.txt

Etsy robots.txt file

Source: https://www.etsy.com/robots.txt

Less-friendly robots

There is a chilling, dystopian future (past?) relayed through YouTube’s robots.txt file.

YouTube robots.txt file

Source: https://www.youtube.com/robots.txt

Well-known for their hidden messages and inside jokes, Google do justice to the idea of robots.txt files being for actual (well, fictional) robots with this nod to the Terminator films.

Google robots.txt file

Source: https://www.google.com/killer-robots.txt

Reddit have left instructions for the lead robots from science fiction masterpiece The Day the Earth Stood Still and legendary TV show Futurama, whereas Tindeck’s homage to Bender from Futurama is slightly more over-the-top.

reddit robots.txt file

Source: https://www.reddit.com/robots.txt

Tindeck robots.txt file

Source: http://tindeck.com/robots.txt

Flogging Fuchsias with Arena Flowers

Who can resist a DeLorean DMC-12? Arena Flowers know the target market for their Back to the Fuchsia bouquet and make the most of the valuable advertising platform a robots.txt file can provide. And maybe internet bots need to buy flowers sometimes too.

Arena Flowers robots.txt file

Source: https://www.arenaflowers.com/robots.txt

A Website Inside My Robots.txt

The heading says it all. This really is a step up from the previous examples. Vinna used this concept to put a game inside theirs.

a website inside my robots.txt file

Source: https://awebsiteinsidemy.com/robots.txt

Vinna robots.txt file

Source: https://vinna.cc/robots.txt

Any we’ve missed? Get in touch @mcmnet.

