Author Topic: Nestor10 webspider FAQ  (Read 1419 times)
rork
Guest
« on: October 22, 2011, 10:12:26 am »

'Development on my own webspider has reached a stage where the basics seem to be working well and it is ready for a larger deployment, so it is time to put up some basic information about it. I will write about the technology later; this article is intended for administrators who find the user agent in their logs and are wondering about it.

What's Nestor10's full name?
The naming scheme is: Nestor10/<bot version>/libwww-perl/<backend version>

The current user agent is: Nestor10/alpha/libwww-perl/6.02
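For administrators who want to pick the bot out of their logs, the naming scheme above can be matched with a short script. This is only a minimal sketch in Python: the function name `parse_nestor10_user_agent` and the returned keys are made up for illustration, and it assumes the user agent always follows the four-part scheme exactly.

```python
import re
from typing import Optional

# Pattern for the scheme: Nestor10/<bot version>/libwww-perl/<backend version>
# (assumes neither version contains a slash or whitespace)
_NESTOR10_RE = re.compile(r"^Nestor10/([^/\s]+)/libwww-perl/([^/\s]+)$")


def parse_nestor10_user_agent(user_agent: str) -> Optional[dict]:
    """Return the two version fields, or None if this is not a Nestor10 user agent."""
    match = _NESTOR10_RE.match(user_agent.strip())
    if match is None:
        return None
    return {"bot_version": match.group(1), "backend_version": match.group(2)}


if __name__ == "__main__":
    print(parse_nestor10_user_agent("Nestor10/alpha/libwww-perl/6.02"))
```

Any other user agent, including plain libwww-perl, returns `None`.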

What is Nestor10's purpose?
I currently use the webspider to collect news and information I want for both professional and personal use. This means that it will visit targeted sites for veterinarians and poultry health. My first goal for the bot was to provide personal RSS feeds for sites that don't supply these themselves. I'm currently working on a front-end to share this information with a small group of colleagues.

Is Nestor10 malicious, e.g. posting spam or collecting email addresses?
No, Nestor10 only collects links to news items.

How often will Nestor10 visit my site?
That depends on how often I execute the script; currently this is about daily. For most pages there is a minimum of 6 hours between visits, and non-news sites will be visited about weekly. If more than one page on a domain is indexed, the time between requests to that domain is at least 5 seconds.
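To make the timing rules concrete, here is a rough sketch of how such a check could look. It is written in Python for illustration and is not the spider's actual code; the names `may_fetch`, `MIN_PAGE_INTERVAL` and `MIN_DOMAIN_DELAY` are hypothetical, and the weekly schedule for non-news sites is left out.

```python
from datetime import datetime, timedelta
from typing import Optional

# Values taken from the answer above; the names are illustrative assumptions.
MIN_PAGE_INTERVAL = timedelta(hours=6)    # minimum time between visits to the same page
MIN_DOMAIN_DELAY = timedelta(seconds=5)   # minimum time between requests to the same domain


def may_fetch(now: datetime,
              last_page_visit: Optional[datetime],
              last_domain_request: Optional[datetime]) -> bool:
    """Return True if both the per-page and the per-domain waiting times have passed."""
    if last_page_visit is not None and now - last_page_visit < MIN_PAGE_INTERVAL:
        return False
    if last_domain_request is not None and now - last_domain_request < MIN_DOMAIN_DELAY:
        return False
    return True
```

A page that has never been visited (both timestamps `None`) is always allowed.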

Does Nestor10 support robot exclusion protocols?
No, in its alpha stage it doesn't, because the information it generates is currently for private use only. It will, however, scan for robots.txt and the robots meta tag. Once the collected data is made available to a wider public, the robot exclusion protocols will be supported.

What can I do to block Nestor10?
Please send me an email specifying the URL or website you want removed; my email address is sent in the webspider's headers. You can also add the robots meta tag or a robots.txt file. Nestor10 will look for the following user-agent strings: nestor10 and *.
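If you go the robots.txt route, you can check what a rule would mean for the Nestor10 user agent before deploying it. The sketch below uses Python's standard `urllib.robotparser`; the example rules, the helper name `is_blocked` and the example.com URL are assumptions for illustration. Note that this shows how Python's parser reads the rules, which may differ in detail from how Nestor10 matches them, and that the alpha version does not act on these rules yet.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt that asks only Nestor10 to stay away from the whole site.
ROBOTS_TXT = """\
User-agent: nestor10
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text disallows this user agent for this URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_blocked(ROBOTS_TXT, "Nestor10/alpha/libwww-perl/6.02",
                     "http://example.com/news.html"))
```

Using `User-agent: *` instead of `nestor10` would address every crawler that honors robots.txt, not just this one.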

What are your plans for Nestor10?
In time, my plan is primarily to expand the number of sites Nestor10 indexes and to provide this information to others as an RSS-like service: only news headlines will be shown, linked to the original articles.'