How to detect a search engine spider/crawler with PHP
Posted on January 27, 2009 in Development with 12 comments.
I was tasked today with writing a small PHP script that detects whether a search engine spider is crawling a page of a site. There are a few ways to go about it. The challenge is that there are so many spiders on the web. The script I am currently using only checks for the main ones, but it lets you add as many crawlers as you want. Here is the code, with explanatory comments:
if ( ! function_exists('check_if_spider'))
{
    function check_if_spider()
    {
        // Add as many spiders as you want to this array
        $spiders = array('Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'Slurp', 'msnbot', 'ia_archiver', 'Lycos', 'Scooter', 'AltaVista', 'Teoma', 'Gigabot', 'Googlebot-Mobile');

        // Loop through each spider and check if it appears in
        // the User Agent
        foreach ($spiders as $spider)
        {
            // Note: eregi() was deprecated in PHP 5.3 and removed in PHP 7.0
            if (eregi($spider, $_SERVER['HTTP_USER_AGENT']))
            {
                return TRUE;
            }
        }

        return FALSE;
    }
}
And there we have it. You can find a list of search engine crawlers here.
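On PHP 7 and later, where eregi() is no longer available, the same loop can be sketched with stripos(). This is a minimal sketch, not a definitive implementation: the function name is_spider_user_agent() and its parameters are hypothetical, and passing the User-Agent in as an argument (rather than reading $_SERVER inside the function) is an assumption made so the check can be tried against sample strings.

```php
<?php
// Hypothetical helper: returns true when any of the given spider names
// appears (case-insensitively) in the supplied User-Agent string.
function is_spider_user_agent(string $userAgent, array $spiders = array(
    'Googlebot', 'Yammybot', 'Openbot', 'Yahoo', 'Slurp', 'msnbot',
    'ia_archiver', 'Lycos', 'Scooter', 'AltaVista', 'Teoma', 'Gigabot',
)): bool
{
    foreach ($spiders as $spider) {
        // stripos() returns the match position (possibly 0) or false,
        // so the result has to be compared with !== false.
        if (stripos($userAgent, $spider) !== false) {
            return true;
        }
    }

    return false;
}

// The header can be missing (for example on the command line), so fall back to ''.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$isSpider = is_spider_user_agent($agent);
```

Because the User-Agent is a parameter, the function can be exercised with a sample string such as 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)' without waiting for a real crawler to visit.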
The other way to do this check, without listing the spiders in an array, is the get_browser() PHP function. It returns an object (or an array, if you pass TRUE as its second argument) with very useful data. One of the things it returns is [crawler] TRUE/FALSE. Note that it only works when the browscap setting in php.ini points to a browscap.ini file. I didn’t use it because I have not done enough testing to see how efficient it is. You can read more about this function on the PHP.net website.
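A rough sketch of the get_browser() approach might look like the following. The wrapper name is_crawler_via_browscap() and the choice to return null when no answer is available are assumptions for illustration. The code only produces a real answer on a server where browscap is configured.

```php
<?php
// Hypothetical wrapper around get_browser(). Returns true/false when the
// browscap data classifies the agent, or null when no answer is available.
function is_crawler_via_browscap(string $userAgent): ?bool
{
    // get_browser() depends on the "browscap" php.ini directive being set.
    if (!ini_get('browscap')) {
        return null;
    }

    // Passing true as the second argument requests an array instead of an object.
    $info = @get_browser($userAgent, true);
    if (!is_array($info) || !array_key_exists('crawler', $info)) {
        return null;
    }

    // Normalise the stored value to a real boolean.
    return filter_var($info['crawler'], FILTER_VALIDATE_BOOLEAN);
}
```

Returning null rather than FALSE when browscap is unavailable keeps "not a crawler" distinct from "could not tell", so a caller can fall back to the array-based check.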
Tagged with PHP

Neat, just what I needed for my site. Google and others are crawling all over it, messing up my visitor stats.
I am glad you found it useful.
What if the bot name changes? And how do we test it?
Just a note:
eregi() is deprecated and will not be supported by PHP in the future. Use preg_match() going forward. It requires delimiters, such as forward slashes, sandwiching your pattern like so: '/pattern/'
That is correct, Gatorpower. I have replaced eregi() in all my own scripts, as it is deprecated as of PHP 5.3 and has since been removed entirely (in PHP 7.0).
Instead of the eregi line, I used strpos(' '.$agent, $spider), where $agent is the variable holding the user agent string to check.
Because strpos() returns 0 when the searched string begins at the first character, I added ' ' at the beginning of the string being searched.
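The pitfall described above can be reproduced in a few lines. This is only an illustration with a sample User-Agent string and hypothetical variable names. It suggests that a strict comparison with false is an alternative to padding the string with a leading space.

```php
<?php
// Sample string in which the spider name starts at the very first character.
$agent = 'Googlebot/2.1 (+http://www.google.com/bot.html)';

// The match starts at offset 0, so strpos() returns int(0), not false.
$position = strpos($agent, 'Googlebot');

// Loose check: 0 compares equal to false, so the match is missed.
$looseMatch = ($position != false);

// Strict check: only a genuine "not found" result (false) fails.
$strictMatch = ($position !== false);
```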
Thanks for this piece of code. Could you please post the code after replacing eregi() with preg_match()?
Is this right?
if (preg_match($spider, $_SERVER['HTTP_USER_AGENT']))
Thanks
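The line in the question would not work as written, because preg_match() expects the pattern to be wrapped in delimiters. A possible sketch, assuming a hypothetical helper named matches_spider_pattern(), combines all spider names into a single case-insensitive pattern:

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern from all spider names.
function matches_spider_pattern(string $userAgent, array $spiders): bool
{
    // An empty list would produce the pattern '//i', which matches everything.
    if (count($spiders) === 0) {
        return false;
    }

    // preg_quote() escapes regex metacharacters; '/' is passed as the
    // delimiter so that it is escaped as well.
    $quoted = array_map(function ($spider) {
        return preg_quote($spider, '/');
    }, $spiders);

    // The result looks like /Googlebot|Slurp|msnbot/i (the i flag ignores case).
    $pattern = '/' . implode('|', $quoted) . '/i';

    return preg_match($pattern, $userAgent) === 1;
}
```

One alternation pattern means a single preg_match() call per request instead of one call per spider name. Whether that is measurably faster for a short list has not been tested here.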
Matt,
Could you update the script you provided above, or post the corrected code in another thread, with eregi replaced?
@Kelly
Just change this line:
if (eregi($spider, $_SERVER['HTTP_USER_AGENT']))
to this:
if (strpos($_SERVER['HTTP_USER_AGENT'], $spider) !== false)
Guys,
You can use the array_search function. If I were the one writing the code to eliminate the deprecated eregi, my updated code would look something like this:
foreach ($spiders as $spider)
{
    // Caution: array_search() expects an array as its second argument and
    // compares whole elements, so it does not do a substring match here.
    if (array_search($_SERVER['HTTP_USER_AGENT'], $spider) !== FALSE) {
        return TRUE;
        $_SESSION['verify'] = "true";
    }
}
Just remove this part from my code:
$_SESSION['verify'] = "true";