November 29, 2021

MSN Search and referer logs

If anybody from Microsoft or MSN Search reads my blog, give me a shout (ryan at no slang dot com). MSN really needs a URL removal tool that goes beyond robots.txt and the like. Let me explain why:

To any developers out there, here’s some important advice too:
Always password-protect your site and create a robots.txt file FIRST, even if the site is still just a prototype. I learned that lesson the hard way.
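
For anyone who wants to sanity-check the robots.txt half of that advice, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt contents, the "msnbot" user agent string, and the dev.example.com URL are all assumptions made up for illustration, not details of the site described in this post.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site still in development: block every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt tells user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The user agent and URL below are placeholders, not real values.
    print(is_blocked(ROBOTS_TXT, "msnbot", "http://dev.example.com/app?id=1"))
```

Keep in mind that robots.txt is only a request that well-behaved spiders honor. The password protection is what actually keeps people (and badly behaved bots) out.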

It seems MSN has a perfect cached copy of some internal web applications of mine that aren’t meant for external eyes. The page in question (while still in development) had links to external websites on it, and somebody working on the site clicked them. The problem is that one of those external websites has a publicly accessible referer log that MSN Search has indexed (FYI, it’s not a good idea to share your server logs with the world). Thus, the URL of the internal application (complete with parameters) shows up in their stats page.
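
To make the leak concrete, here is a rough sketch of what a browser puts in the Referer header when someone clicks an outbound link. It is a simplified model, not a full implementation of browser behavior: the function name referer_sent and the intranet.example.com URL are hypothetical, and only three Referrer-Policy values are handled. With no restriction in place, the whole internal URL, parameters included, lands in the other site's logs.

```python
from urllib.parse import urlsplit, urlunsplit


def referer_sent(page_url: str, policy: str = "unsafe-url") -> str:
    """Approximate the Referer value sent when a link on page_url is clicked.

    Simplified model covering three Referrer-Policy values only.
    """
    parts = urlsplit(page_url)
    if policy == "no-referrer":
        # Nothing is sent at all.
        return ""
    if policy == "origin":
        # Only scheme and host are sent; path and parameters are dropped.
        return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))
    # "unsafe-url": the full URL is sent, minus the fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))


if __name__ == "__main__":
    # Hypothetical internal URL, used only for illustration.
    internal = "http://intranet.example.com/app/report.php?client=42&view=all#top"
    print(referer_sent(internal))            # what the external site's log records
    print(referer_sent(internal, "origin"))  # what it would record with a stricter policy
```

The first call shows the kind of value that ended up in the external site's stats page. The second shows how much less is exposed if the internal pages ask browsers to send only the origin.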

For some reason, it only took a day for MSN to index this URL, and now information that should have been password-protected and excluded via robots.txt is showing up for some obscure searches.

The problem is that this site isn’t linked anywhere anymore, so spiders aren’t likely to return anytime soon. There’s also no way to remove a site from MSN other than by using .htaccess and robots.txt (I’ve done both). A user can’t see the current site, but that cached version looks like it’s going to be there for a long time.

Lesson learned, I guess: always prevent spiders from accessing stuff, even if it’s not linked anywhere and especially if it’s still in development.

About Ryan Jones

Ryan Jones is an SEO from Detroit. By day he works as a manager of SEO & Analytics at SapientNitro where his team performs SEO for Fortune500 clients. By night he's either playing hockey or attempting to take over the world with his own websites - which he would have already succeeded in doing had it not been for those meddling kids and their dog. The views expressed here have not been paid for and belong only to Ryan, not any of his employers or clients. Follow Ryan on Twitter at: @RyanJones, add him on Google+ or visit his personal website: www.RyanMJones.com