Welcome to the Real Estate Forum

  1. #1
    agentBOOST is offline Renter
    Join Date
    May 2007
    Posts
    5

    A Small File That Can Make A Big Difference

    If you have a website, you know how important search engine placement is to driving new clients to your site. What you may not realize is that you can control which pages the search engines see by uploading a simple file to your site.

    Before I get into the details, I think it's important to talk a little bit about how search engines work. Each of the major search engines (Google, Yahoo, MSN, and Ask) uses what are called "spiders" or "robots" to try to visit every web page on the internet and add each one to its index. Once the spider adds your site to the index, the search engine then decides where each page will rank for certain terms.

    The first thing a spider does when it visits a website is to look for a "robots.txt" file. This file tells the spider the areas of your site where it's not allowed to go. If you don't have one (or if yours is blank), you are telling the spider, "Please index my entire site."

    Believe it or not, this is a problem!

    It may seem counterintuitive to want to block the search engines from accessing certain areas of your site, but otherwise the spiders are going to spend a considerable amount of time indexing pages that will never rank, never bring you traffic, and never bring a client through your door. Blocking these pages from the spiders also funnels your site's PageRank to your optimized pages, which means they'll rank higher in the SERPs.

    What type of pages should we block from the spiders? Anything that isn't optimized for search engine placement. A typical list would include contact pages, image galleries, policy pages, etc.

    It may help to look at an example, so let's take a look at a robots.txt file I am most familiar with (I wrote it): agentBOOST.com/robots.txt.

    The first line of the file looks like this:

    User-Agent: *

    This line is telling the spiders that the rules that follow apply to all spiders (the * means every spider). For an example where individual spiders are addressed, take a look at activerain's robots.txt file (activerain.com/robots.txt), which gives specific instructions to Googlebot (Google's spider) and ShopWiki.
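
    To make the per-spider idea concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the paths (/search/, /members/), and the bot name SomeOtherBot are made up for illustration and are not taken from any real site's file. It suggests that a spider with its own User-agent group follows that group rather than the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not any real site's robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search/

User-agent: *
Disallow: /search/
Disallow: /members/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Return True if the named spider may fetch the given path."""
    return parser.can_fetch(agent, path)


# Googlebot matches its own group, which only blocks /search/.
print("Googlebot    /members/:", allowed("Googlebot", "/members/"))
# Any other spider falls back to the * group, which blocks /members/ too.
print("SomeOtherBot /members/:", allowed("SomeOtherBot", "/members/"))
```

    The design point is that groups do not stack: once a spider finds a group addressed to it by name, the * rules no longer apply to it, so any shared restrictions need to be repeated in both groups.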

    Going back to agentBOOST.com's robots.txt file, the ten lines after the User-Agent line all begin with "Disallow:" which is then followed by a directory on our site. It should come as no surprise that each of these lines is telling the spiders that they are not allowed access to a certain directory.

    The first two lines (/terms/ and /privacy/) disallow the search engine spiders from indexing agentBOOST's Terms of Service and Privacy Policy. While both of these pages are important to our users, I don't see the benefit of having search engines wasting their time or our bandwidth/PageRank on these pages; we don't aspire to rank for the term "Privacy Policy"!

    The next five lines (/user/, /agent/, /bid/, /property/, and /logout/) block the spiders from trying to index areas of the site that were built for our registered members to navigate the site, but not for search engines. This seems like a good time to point out a very important, powerful, and dangerous aspect of the robots.txt file:

    When you disallow a directory in your robots.txt file it also blocks all the subdirectories under that directory!

    We don't have to add lines for /user/register/ or /user/password/, for instance, because these are subdirectories of /user/. Just make sure you don't abuse this power by adding "Disallow: /", which will block your entire site!
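
    The subdirectory behaviour can be checked with a short Python sketch, again using urllib.robotparser from the standard library. The /listings/ path and the bot name AnyBot are hypothetical; /user/register/ and /user/password/ come from the example above. Treat it as a rough check under those assumptions, not a statement of how every spider behaves.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# One directory blocked: everything underneath it is blocked as well.
scoped = build_parser("User-agent: *\nDisallow: /user/\n")
# The dangerous case: a bare slash matches every path on the site.
blanket = build_parser("User-agent: *\nDisallow: /\n")

for path in ["/user/register/", "/user/password/", "/listings/"]:
    print(
        path,
        "| Disallow: /user/ ->", scoped.can_fetch("AnyBot", path),
        "| Disallow: / ->", blanket.can_fetch("AnyBot", path),
    )
```

    Rules are matched as path prefixes, which is why a single "Disallow: /user/" covers the registration and password pages, and why "Disallow: /" covers everything.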

    The next two lines (/blog/category/ and /blog/feed/) block the spiders from indexing areas of our blog that may be considered duplicate content. The last line (/blog/subscribe) disallows our blog's subscribe page, which isn't optimized for anything in particular.
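
    Putting the walkthrough together, the sketch below rebuilds a robots.txt from the ten directories named in this post and audits a few URLs against it. It is a reconstruction from the description only, so the live file may differ; the audited paths "/" and "/blog/" and the bot name AnyBot are assumptions added for the example.

```python
from urllib.robotparser import RobotFileParser

# Directories named in the post, in the order they are described.
DISALLOWED = [
    "/terms/", "/privacy/",
    "/user/", "/agent/", "/bid/", "/property/", "/logout/",
    "/blog/category/", "/blog/feed/",
    "/blog/subscribe",
]


def render_robots_txt(paths):
    """Render one User-Agent group that disallows each given path."""
    lines = ["User-Agent: *"] + [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"


robots_txt = render_robots_txt(DISALLOWED)
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Audit a handful of paths (the first two are hypothetical additions).
for path in ["/", "/blog/", "/blog/feed/", "/user/password/"]:
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "blocked"
    print(f"{path:<18} {verdict}")
```

    Keeping the list in one place and auditing it before uploading is a cheap way to catch a rule that blocks more than intended.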

    Remember, search engines have finite resources and billions (trillions?) of pages to index. When the spider comes to visit your site don't let it waste time on pages that aren't going to do you any good! Utilizing a robots.txt file is a great way to hold the spider's hand and bring them to the content you worked so hard to optimize.

    I hope you found this quick tutorial on robots.txt helpful and informative.

    If you'd like us to show you how to get the most cost-effective real estate leads, with no monthly fee and no percentage of your commission, please visit us at agentBOOST.com.

    Chris
    agentBOOST.com

  2. #2
    justicewhite is offline Condominium
    Join Date
    Jan 2005
    Location
    England
    Posts
    123

    Originally posted by agentBOOST:
    ...

    Remember, search engines have finite resources and billions (trillions?) of pages to index. When the spider comes to visit your site don't let it waste time on pages that aren't going to do you any good! Utilizing a robots.txt file is a great way to hold the spider's hand and bring them to the content you worked so hard to optimize.

    ...
    Are you implying that letting the spiders index as many pages from your site as possible is a bad thing for search engine optimisation? If so, what kind of pages do you advise people to allow for indexing, and what type of pages should they disallow?

  3. #3
    agentBOOST is offline Renter
    Join Date
    May 2007
    Posts
    5

    Hi justicewhite.

    Are you implying that letting the spiders index as many pages from your site as possible is a bad thing for search engine optimisation?
    Yes, that's exactly what I am saying.

    If so, what kind of pages do you advise people to allow for indexing and what type of pages to disallow?
    As I stated in the article, any pages that aren't optimized for search engines should be blocked from the index. Your Contact, Privacy Policy, and Terms and Conditions pages are all important pages for your users, but do no good in the search engines.

    A good test for determining whether or not you should block a page is to ask yourself, "if this page ranked for the keywords on it, would it bring me business?" For instance, ranking for the terms "Privacy Policy" won't yield too many new clients!

    I should also mention that Google recently released a robots.txt tool in their webmaster console, which further substantiates how important this is.

  4. #4
    spanishproperty is offline Condominium
    Join Date
    Dec 2006
    Location
    Torrevieja, Spain
    Posts
    243

    Dynamic pages are also good candidates for your robots.txt file, as spiders can find them hard to exit.

    Robots.txt files are not the be-all and end-all of a site; I don't think they are important at all. Some of my sites have them and some don't, and I haven't seen it make any difference.

    I have one site ranked number one for its search term in all three search engines, and that site doesn't have a robots.txt file and never has.

  5. #5
    Monkeyleg is offline Fixer Upper
    Join Date
    May 2007
    Posts
    48

    spanishproperty, I look at the robots.txt file as just another tool in the toolbox.

    From looking at my stats programs for my sites, the bots are reading the robots.txt files.

    Does it make a difference? I honestly don't know. But I'd rather spend three minutes writing a robots.txt file and have it, rather than not have it and find out later that I should.

    And I also have pages on some sites that I absolutely do not want indexed or followed. So I look at the robots.txt files as backup for the <meta name="robots" content="noindex, nofollow"> tags.
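
    For anyone who wants to confirm the tag is really on a page, here is a small Python sketch that reads the robots meta tag out of a page's HTML using the standard library's html.parser. The sample page string and the names RobotsMetaParser and robots_directives are invented for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only for this demonstration.
page = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Members only</body></html>"
)
print(sorted(robots_directives(page)))
```

    One thing to keep in mind when combining the two: a spider can only read this tag on a page it is allowed to fetch, so a page that is disallowed in robots.txt may never have its meta tag seen.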

  6. #6
    orlandorealestate is offline Fixer Upper
    Join Date
    Jun 2007
    Location
    Orlando, FL
    Posts
    42

    I think the robots.txt file is good to have on your site, but I agree it is just a drop in the bucket. If you really do not want a page indexed, I would place a noindex tag on the page to complement the robots.txt file.

  7. #7
    spanishproperty is offline Condominium
    Join Date
    Dec 2006
    Location
    Torrevieja, Spain
    Posts
    243

    Or you can use a nofollow link as well if you don't want a certain page to be spidered.

    Monkeyleg, I am not saying it is bad or not needed; I am just saying there are a hell of a lot more things that contribute to a good site and good SEO than a robots.txt file.

  8. #8
    ventasman is offline Fixer Upper
    Join Date
    Aug 2007
    Location
    Panama City, Panama
    Posts
    18

    Any advice on Joomla sites?

    Robots.txt is a really good resource, but what happens when you have some 800 pages and all of them are useful?

  9. #9
    Monkeyleg is offline Fixer Upper
    Join Date
    May 2007
    Posts
    48

    ventasman, you just allow all of your 800 pages in the robots.txt file.
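
    As a rough illustration, an allow-everything robots.txt is just a User-agent line with an empty Disallow. The Python sketch below checks that with urllib.robotparser; the /listing/N paths and the bot name AnyBot are placeholders, not real URLs from anyone's site.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is off limits".
ALLOW_ALL = "User-agent: *\nDisallow:\n"

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# 800 placeholder page paths standing in for the real pages.
pages = [f"/listing/{number}" for number in range(1, 801)]
every_page_allowed = all(parser.can_fetch("AnyBot", page) for page in pages)
print(len(pages), "pages checked, all allowed:", every_page_allowed)
```

    There is no need to list the 800 pages one by one; leaving them out of the Disallow rules is what allows them.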

  10. #10
    twalters84 is offline Fixer Upper
    Join Date
    Sep 2007
    Posts
    21

    Robots File can be Dangerous

    Hey there,

    I have a few comments about your post:

    It may seem counterintuitive to want to block the search engines from accessing certain areas of your site, but otherwise the spiders are going to spend a considerable amount of time indexing pages that will never rank, never bring you traffic, and never bring a client through your door. Blocking these pages from the spiders also funnels your site's PageRank to your optimized pages, which means they'll rank higher in the SERPs.
    You have to be very careful when you are blocking content from spiders. Let me tell you why.

    Let's say you have four pages on your website that are all linked from your home page, so five pages total. Your home page has PR8. That would be nice, right? According to the Google PageRank documentation, each of your subpages would be getting a PR2 vote from your home page.

    PR8 (home page) / 4 outgoing links = PR2 vote per linked page

    Let's say one of these pages is your privacy statement, which your home page links to. In addition, let's put that page in the robots.txt file so we can see what happens.

    You are losing a PR2 vote from your home page, so you are wasting your PageRank potential by blocking the page in your robots.txt file. Your privacy statement could instead be linking back to your home page so that it distributes PageRank throughout your website.
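
    The arithmetic in this example can be written out as a few lines of Python. This only mirrors the simplified "PR divided by outgoing links" model used in this post; it is not Google's actual ranking computation, and the function name vote_per_link is made up for the sketch.

```python
def vote_per_link(page_rank: float, outgoing_links: int) -> float:
    """Split a page's PR evenly across its outgoing links (simplified model)."""
    if outgoing_links <= 0:
        raise ValueError("a page needs at least one outgoing link to pass a vote")
    return page_rank / outgoing_links


home_page_rank = 8   # PR8 home page from the example
linked_pages = 4     # four subpages linked from the home page
blocked_pages = 1    # the privacy statement, blocked in robots.txt

vote = vote_per_link(home_page_rank, linked_pages)
wasted = vote * blocked_pages

print("vote passed to each linked page:", vote)
print("vote sent to blocked pages:", wasted)
```

    Under this model the home page still spends a quarter of its votes on the blocked page, and nothing flows back from it.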

    I do agree that this page isn't nice to have in the search results. That is why there is a noindex meta tag that you can include on your privacy statement. If the page does get indexed, you can remove it using Google Webmaster Tools. Yahoo also has a utility to delete indexed pages.

    The next two lines (/blog/category/ and /blog/feed/) block the spiders from indexing areas of our blog that may be considered duplicate content. The last line (/blog/subscribe) disallows our blog's subscribe page, which isn't optimized for anything in particular.
    You do not have to worry about this for your blog - if your content type for the feed is text/xml. Crawlers will see this differently than ordinary webpages. If you block your RSS feeds, then it is not worth having them at all because you are blocking crawlers that look for frequently updated content in them.

    Bottom line, be careful what you block in your robots.txt file.

    Sincerely,
    Travis Walters
