Hello all,
Let's say you have a web site that uses some resources in special folders.
PHP code accesses those resources, but they are not linked directly from any of the pages...
Of course, if I use the robots.txt file to tell search engines not to crawl those folders, I'm making the private folder names/paths public.
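To illustrate what I mean (the folder names here are just made-up examples), a robots.txt like this hides nothing, because anyone can open /robots.txt in a browser and read exactly which paths I wanted to keep private:

```text
# robots.txt is publicly readable at https://example.com/robots.txt
User-agent: *
Disallow: /private-data/      # <- now everyone knows this folder exists
Disallow: /internal-scripts/  # <- same problem
```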
Should I simply leave those folders out of the robots file?
Now that I'm writing this, I'm starting to think that listing them there makes it much easier for everyone to find the weak points of the site...
but... how do you ensure those pages won't be crawled?
The underlying question is:
- How are sites crawled? Do crawlers only follow the links that appear on the pages themselves, without reading the real folder structure or the PHP (or any other) source code, seeing only the final rendered page?
- And if a page is not linked from anywhere, can it still be crawled even when it is not listed in robots.txt?
Thank you very much!
What I have tried:
Just reading the help on the Google webmaster pages...