Understanding A Robots.txt File And Why Every Domain Needs One


We’ve spoken quite a bit about the different hosting scenarios you might encounter, explaining the pros and cons of each system to help you determine the best hosting option for your website and business.

Needless to say, everyone is at a different stage in their learning curve and has different goals, so one method might suit only a handful of individuals while another is the optimum solution for the masses. One thing we do think is important for everyone, however, is a robots.txt file, and that can only be uploaded to a paid hosting account or server.

Free hosting sites and web builders are not going to give you access to their servers. You may be paying a monthly subscription to use the service and keep your website live, but if it is not a real hosting account, you cannot benefit from the power of a robots.txt file.

Therefore, this article will address the ins and outs of a robots.txt file and the reasons why every domain name displaying a website needs this file.

What Are Robots And What Is Their Purpose?

Everyone knows what a robot is in terms of a mechanical being. Movies have depicted them for a long time, and they are now common in manufacturing, taking over very dangerous tasks. But you might wonder what they have to do with the internet and how a website operates. Briefly, they visit a website and attempt to learn as much as possible about how the site is coded and what it is trying to tell users. This is how domain names and website pages get indexed in search engines.

Also known as crawlers and spiders, robots are more commonly called simply bots. They come in both good and bad versions, which we will discuss in more detail further down the article. You have probably heard of Googlebot, Yahoo! Slurp, or MSNBot. Each company has its own proprietary robot that crawls websites, looking for new items, such as undiscovered websites and pages, to add to its database. The idea is that the more current your content and the more often you post, the more frequently they come back to retrieve new information.

What Is A Robots.txt File And Why Is It Needed?

This file is a simple text file, created in a plain-text editor such as Notepad, that contains instructions on what a bot should do when it comes calling. It must be named robots.txt, in lowercase, and uploaded to the root of the domain along with all the other website files. This is why you need access to your hosting account’s server.

Its purpose is to allow or deny bots access to the site. In other words, you whitelist the good bots and blacklist the bad guys.
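To make this concrete, here is a minimal sketch of what such a file might look like. The name BadBot is a hypothetical stand-in; you would substitute the actual user-agent string of the bot you want to turn away.

    # Allow every bot not named below to crawl the whole site
    # (an empty Disallow means nothing is off limits)
    User-agent: *
    Disallow:

    # Turn away a hypothetical bad bot entirely
    User-agent: BadBot
    Disallow: /

A bot follows the group that matches its user-agent most specifically, so the asterisk group applies only to bots that are not named elsewhere in the file.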

To help you determine which ones you want to encourage and which you need to block, there are massive lists published regularly that have already identified and named the bots out there.

What Are The Advantages Of Allowing Bots?

#1 Indexation

The good bots are beneficial to you and your website because they help you get indexed in search engines. Obviously, you want to allow the main search engines to crawl your site on a regular basis. They look for links, which enables them to find both new sites and new pages or content. This is one reason why both internal and external links are important on a website.
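One simple way to help the good bots find everything is to point them at your sitemap from within robots.txt. The Sitemap directive is recognized by the major search engines and can appear anywhere in the file; the URL below is a hypothetical example.

    # Tell crawlers where the sitemap lives so new pages are found quickly
    Sitemap: https://www.example.com/sitemap.xml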

If you think about how WordPress blogs work, you usually find the most recent posts in the sidebar or at the bottom of the page. This tells the bot there is something new to find. The theory is if the bot finds new material on a regular basis, it will come back often.

Sometimes, you hear web owners saying that their pages are indexed within a few hours. Others wait a long time: if the bot realizes a site is not being worked on and updated regularly, it won’t come back as often, since it never finds anything new to report.

#2 Diagnostics

You may want to monitor your site’s performance, and bots do the checking. If you want to use any of the online diagnostic tools, you will need to let their bots access your site files.
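For example, if a monitoring or diagnostic service crawls your site with its own bot, make sure your robots.txt does not shut it out. A short sketch, using a made-up bot name:

    # Give a (hypothetical) monitoring tool's bot full access,
    # even if other bots are restricted elsewhere in the file
    User-agent: ExampleMonitorBot
    Disallow: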

#3 Good SEO

Because the good bots also identify and penalize blackhat methods, it is in our best interest to allow certain bots in and show them that our site and our processes are legitimate.

Why Should We Disallow Some Bots?

#1 Scrapers

The sole purpose of this type of bot is to steal content. They might take full articles and post them on their own websites, or they might mash up a variety of pages to create a supposedly new, unique post. Obviously, this “new content” is not content at all, since it rarely makes any sense. But the biggest issue here is the theft of intellectual property and copyrighted material.

#2 Resource Hogs

Regardless of a bot’s intentions, both good and bad bots use up valuable bandwidth and distort your traffic numbers. If you are metered on how much bandwidth you use, bots can be a costly proposition. Likewise, if you don’t read your traffic reports carefully, you may think you are getting lots of visitors when, in fact, you have nothing but spiders checking you out.

AWStats reports the name of each bot, the date it visited, and the bandwidth it used to crawl the site.
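One lever robots.txt offers here is the Crawl-delay directive, which asks a bot to wait a number of seconds between requests. It is non-standard: Bing and some other crawlers honor it, while Google ignores it, so treat it as a polite request rather than a guarantee.

    # Ask compliant bots to wait ten seconds between requests
    User-agent: *
    Crawl-delay: 10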

#3 Attack Of The Bots!

Individuals who own these bots will attack sites for many reasons. Three of the main ones are taking out the competition, doing it just for fun, and retaliating for a perceived slight from a company.

#4 Bad User Experience

When bots are eating up bandwidth, they are also putting pressure on the site’s performance. Real users may be unable to access certain functions, such as the shopping cart or payment gateway, or the site may load very slowly. Whatever the outcome, customers will not be happy.

#5 Injection

Bad bots look for potential back doors through which to inject malicious code into a site, code that may or may not do something immediately harmful.

#6 Security Breaches

Bots look for known holes in systems that have not been patched or kept up to date, and then exploit these loopholes for their own purposes.

Conclusion

Studies seem to suggest that around sixty percent of all traffic to a website comes from bots, and that more than half of that is from bad bots that can cause you serious harm. One point we should stress is that even if you take care to allow the right kinds of bots and deny the bad guys, it doesn’t mean they will listen. Individuals and companies are continually improving their technology to bypass your efforts.

And if a bot is being used for nefarious purposes, it stands to reason that it doesn’t care one iota about your instructions. The comparison I use here is a lock on the front door: it deters certain bad elements, but a burglar who really wants in will find a way. Bad bots are the same. You have an obligation to protect your website, but you must also realize that no system is foolproof.
