Blocking Automated Scanners

Automated scanners make up 99.99% of all website attacks. Blocking these scanners from your site will do more than almost anything else to prevent security breaches.

Web browsers and bots send a special HTTP header, the User-Agent, to identify who or what is requesting a webpage. It is a block of text intended to uniquely identify the requesting software.

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

User agent for the Google web crawler

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0

User Agent for Mozilla Firefox on Windows 10

These text blocks (referred to as “strings” by software developers) are not authenticated, meaning that anyone, or anything, can set them to whatever they like. Most automated web hacking tools and web scanners contain large lists of the most common user agents and randomly use one for each request.
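To see just how trivial spoofing is, here is a minimal sketch using Python's standard library: any script can claim to be Googlebot simply by setting the header (the URL is a placeholder; no request is actually sent here).

```python
import urllib.request

# Build a request that claims to be Googlebot, even though it
# originates from an ordinary script, not Google's crawler.
req = urllib.request.Request(
    "https://example.com/",
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    },
)

# The server receiving this request sees only the claimed identity.
print(req.get_header("User-agent"))
```

Nothing in the HTTP protocol stops this; the header is just text under the client's control.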

This is why relying on user-agent strings to detect malicious web requests is highly unreliable. Many WordPress security plugins with “bot blocking” features use this approach anyway because it is simple to implement.

BitFire uses an alternative approach. Rather than trusting the user agent headers, BitFire uses automated server-side challenges requiring browsers and bots to prove they are who they say they are.

Web crawlers like Googlebot are first compared against a deny list of user agents. The deny list contains many malicious hacking tools, default user agents, and noisy, unhelpful web scanners. If configured for the most restrictive bot blocking (recommended), bots are then compared against the allow list.
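A deny-list check of this kind is typically a simple substring match against known tool signatures. The sketch below is illustrative, not BitFire's actual list or code:

```python
# Illustrative deny list: fragments that appear in the default
# user agents of common hacking tools and scripted HTTP clients.
DENY_SUBSTRINGS = ("sqlmap", "nikto", "masscan", "python-requests", "libwww")

def on_deny_list(user_agent: str) -> bool:
    """Return True if the user agent matches a known-bad signature."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in DENY_SUBSTRINGS)
```

Note that this only catches tools that do not bother to disguise themselves, which is exactly why the allow-list verification described next is needed.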

If a user agent claims to be on the allow list, the source IP is then looked up with a reverse DNS query and a WHOIS lookup. The reverse DNS query finds the domain name of the IP address (for instance, a reverse lookup of a Googlebot IP returns “host-x.google.com”). The domain name is compared against the allow list, and if it matches, a second forward DNS query verifies that “host-x.google.com” maps back to the original IP address. The result of this lookup is stored in a server-side cache and in an encrypted browser cookie.
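This technique is known as forward-confirmed reverse DNS. A minimal sketch of the idea, using Python's `socket` module (the allowed domain suffixes here are illustrative, not BitFire's configuration):

```python
import socket

# Illustrative allow list: domain suffixes a verified crawler may use.
ALLOWED_SUFFIXES = (".google.com", ".googlebot.com")

def verify_crawler_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)       # reverse lookup: IP -> name
    except OSError:
        return False
    if not host.endswith(ALLOWED_SUFFIXES):         # name on the allow list?
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward lookup: name -> IPs
    except OSError:
        return False
    return ip in forward_ips                        # name must map back to the IP
```

The forward confirmation step matters: an attacker can control the reverse DNS of their own IP and make it return “host-x.google.com”, but they cannot make Google's forward DNS map that name back to their address.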

If the user agent is a human-operated web browser (Chrome, Edge, Safari, etc.), a randomly generated, obfuscated JavaScript challenge is sent to the browser in place of the normal web page. The browser quickly evaluates this challenge and sends back the result. Once the firewall verifies the answer, an encrypted cookie identifying the browser as “real” is stored in the browser for one hour. After a correct answer, no further checks are made until the cookie expires. The full process takes 2ms + 2 RTT (where RTT is the round-trip time from browser to server, usually 20-70ms).
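The round trip can be sketched server-side as: generate a challenge, verify the browser's answer, and mint a signed cookie with an expiry. This is a simplified, hypothetical illustration (BitFire's real challenges are obfuscated and its cookie format differs); the trivial arithmetic and the signing scheme here are assumptions for clarity.

```python
import hashlib
import hmac
import json
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-site secret

def make_challenge():
    """Create a trivial arithmetic challenge and the expected answer.
    The JavaScript snippet would be sent in place of the normal page."""
    a, b = secrets.randbelow(1000), secrets.randbelow(1000)
    js = f"document.cookie = '_challenge=' + ({a} + {b});"
    return js, a + b

def issue_cookie(answer_given, expected, ttl=3600):
    """Mint an HMAC-signed cookie valid for one hour if the browser
    answered correctly; return None otherwise."""
    if answer_given != expected:
        return None
    payload = json.dumps({"exp": int(time.time()) + ttl})
    sig = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"
```

Because the cookie is signed with a server-side secret, the firewall can validate returning visitors with a single cheap HMAC check instead of re-running the challenge, which is what keeps the steady-state overhead to a couple of milliseconds.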

