Blocking Automated Scanners

Automated scanners make up 99.99% of all website attacks. Blocking these scanners from your site will do more than almost anything else to prevent security breaches.

Web browsers and bots send a special HTTP header, the User-Agent, to identify who or what is requesting a webpage. It is a block of text intended to uniquely identify the requesting software.

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

User agent for the Google web crawler

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:107.0) Gecko/20100101 Firefox/107.0

User Agent for Mozilla Firefox on Windows 10

These text blocks (referred to as “strings” by software developers) are not authenticated, meaning that anyone, or anything, can set them to whatever they like. Most automated web hacking tools and web scanners contain large lists of the most common user agents and randomly use one for each request.
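To see just how trivial spoofing is, here is a minimal sketch using Python's standard library: any script can claim to be Googlebot simply by setting the header (the URL is a placeholder; no request is actually sent here).

```python
import urllib.request

# Build a request that claims to be Googlebot, even though it
# originates from an ordinary script, not Google's crawler.
req = urllib.request.Request(
    "https://example.com/",
    headers={
        "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                      "+http://www.google.com/bot.html)"
    },
)

# The server receiving this request sees only the claimed identity.
print(req.get_header("User-agent"))
```

Nothing in the HTTP protocol stops this; the header is just text under the client's control.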

This is why relying on user-agent strings to detect malicious web requests is highly unreliable. Many WordPress security plugins with “bot blocking” features use this approach anyway because it is simple to implement.

BitFire uses an alternative approach. Rather than trusting the user agent headers, BitFire uses automated server-side challenges requiring browsers and bots to prove they are who they say they are.

Web crawlers like Googlebot are first compared against a deny list of user agents. The deny list contains many malicious hacking tools, default user agents, and noisy, unhelpful web scanners. If configured for the most restrictive bot blocking (recommended), bots are then compared against the allow list.
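A deny-list check of this kind is typically a simple substring match against known tool signatures. The sketch below is illustrative, not BitFire's actual list or code:

```python
# Illustrative deny list: fragments that appear in the default
# user agents of common hacking tools and scripted HTTP clients.
DENY_SUBSTRINGS = ("sqlmap", "nikto", "masscan", "python-requests", "libwww")

def on_deny_list(user_agent: str) -> bool:
    """Return True if the user agent matches a known-bad signature."""
    ua = user_agent.lower()
    return any(fragment in ua for fragment in DENY_SUBSTRINGS)
```

Note that this only catches tools that do not bother to disguise themselves, which is exactly why the allow-list verification described next is needed.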

If a user agent claims to be on the allow list, the source IP is then looked up with a reverse DNS query and a WHOIS lookup. The reverse DNS query finds the domain name of the IP address (for instance, a reverse lookup of a Googlebot IP returns “host-x.google.com”). The domain name is compared against the allow list, and if it matches, a second forward DNS query verifies that “host-x.google.com” maps back to the original IP address. The result of this lookup is stored in a server-side cache and in an encrypted browser cookie.
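This technique is known as forward-confirmed reverse DNS. A minimal sketch of the idea, using Python's `socket` module (the allowed domain suffixes here are illustrative, not BitFire's configuration):

```python
import socket

# Illustrative allow list: domain suffixes a verified crawler may use.
ALLOWED_SUFFIXES = (".google.com", ".googlebot.com")

def verify_crawler_ip(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)       # reverse lookup: IP -> name
    except OSError:
        return False
    if not host.endswith(ALLOWED_SUFFIXES):         # name on the allow list?
        return False
    try:
        forward_ips = socket.gethostbyname_ex(host)[2]  # forward lookup: name -> IPs
    except OSError:
        return False
    return ip in forward_ips                        # name must map back to the IP
```

The forward confirmation step matters: an attacker can control the reverse DNS of their own IP and make it return “host-x.google.com”, but they cannot make Google's forward DNS map that name back to their address.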

If the user agent is a human-operated web browser (Chrome, Edge, Safari, etc.), a randomly generated, obfuscated JavaScript challenge is sent to the browser in place of the normal web page. The browser quickly evaluates this challenge and sends back the result. Once the firewall verifies the answer, an encrypted cookie identifying the browser as “real” is stored in the browser for one hour. After a correct answer, no further checks are made until the cookie expires. The full process takes 2ms + 2 RTT (where RTT is the round-trip time from browser to server, usually 20-70ms).
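The round trip can be sketched server-side as: generate a challenge, verify the browser's answer, and mint a signed cookie with an expiry. This is a simplified, hypothetical illustration (BitFire's real challenges are obfuscated and its cookie format differs); the trivial arithmetic and the signing scheme here are assumptions for clarity.

```python
import hashlib
import hmac
import json
import secrets
import time

SERVER_KEY = secrets.token_bytes(32)  # hypothetical per-site secret

def make_challenge():
    """Create a trivial arithmetic challenge and the expected answer.
    The JavaScript snippet would be sent in place of the normal page."""
    a, b = secrets.randbelow(1000), secrets.randbelow(1000)
    js = f"document.cookie = '_challenge=' + ({a} + {b});"
    return js, a + b

def issue_cookie(answer_given, expected, ttl=3600):
    """Mint an HMAC-signed cookie valid for one hour if the browser
    answered correctly; return None otherwise."""
    if answer_given != expected:
        return None
    payload = json.dumps({"exp": int(time.time()) + ttl})
    sig = hmac.new(SERVER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}|{sig}"
```

Because the cookie is signed with a server-side secret, the firewall can validate returning visitors with a single cheap HMAC check instead of re-running the challenge, which is what keeps the steady-state overhead to a couple of milliseconds.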

