What pages do bad bots look for?
I’ve been wondering for some time now about what pages and paths are visited the most by “bad” bots – scrapers, data harvesters and other automated scanners which disregards the exclusions set in robots.txt[1]. To determine this, I’ve set up a little experiment – I placed robots.txt on one of my domains, which disallowed access to commonly used paths and PHP pages which might of interest to bots (login.php, /wp-admin/, etc.), configured the server to provide HTTP 200 response for these paths and pages and started logging details about requests sent to them.
To avoid as much legitimate or manually generated traffic as possible, I’ve done this on a domain which pointed to a server on which none of the common content management systems was used.
The captured requests were a mixed bag, as one might expect. Some of them were simple one-shot HTTP GET requests while others were part of multi-request scans, some had no parameters set, while others carried generic SQL injection or XSS payloads or tried to “blindly” exploit vulnerabilities specific to common content management systems.
For our purposes, however, this is beside the point as we’re more interested in finding out which pages were looked for the most. I went over the logs and put the “top 10” most commonly requested pages for the past 12 months in the following table, along with the number of times each path or page was hit.
Path | Count |
---|---|
/wp-login.php | 1140 |
/admin/ | 189 |
/administrator/ | 104 |
/wp-admin/install.php | 82 |
/login.php | 48 |
/administrator/index.php | 26 |
/admin.php | 24 |
/wp-admin/setup-config.php | 24 |
/admin/index.php | 23 |
/wp-links-opml.php | 20 |
Although finding wp-login.php in the first place is hardly surprising, the results are interesting. Given the fairly large early drop in a number of requests it seems that one might be able to catch a significant portion of interesting “bad” bot behavior with just a single-page (or four or five-page) honeypot... In other words, if you’ve ever wondered where to place a “honeypage” on your server in order for it to be effective, the top paths mentioned in the table above might probably be a good start.
Comments
Anonymous
Dec 3rd 2022
9 months ago
Anonymous
Dec 3rd 2022
9 months ago
<a hreaf="https://technolytical.com/">the social network</a> is described as follows because they respect your privacy and keep your data secure. The social networks are not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go.
<a hreaf="https://technolytical.com/">the social network</a> is not interested in collecting data about you. They don't care about what you're doing, or what you like. They don't want to know who you talk to, or where you go. The social networks only collect the minimum amount of information required for the service that they provide. Your personal information is kept private, and is never shared with other companies without your permission
Anonymous
Dec 26th 2022
9 months ago
Anonymous
Dec 26th 2022
9 months ago
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> nearest public toilet to me</a>
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> public bathroom near me</a>
Anonymous
Dec 26th 2022
9 months ago
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> nearest public toilet to me</a>
<a hreaf="https://defineprogramming.com/the-public-bathroom-near-me-find-nearest-public-toilet/"> public bathroom near me</a>
Anonymous
Dec 26th 2022
9 months ago
Anonymous
Dec 26th 2022
9 months ago
https://defineprogramming.com/
Dec 26th 2022
9 months ago
distribute malware. Even if the URL listed on the ad shows a legitimate website, subsequent ad traffic can easily lead to a fake page. Different types of malware are distributed in this manner. I've seen IcedID (Bokbot), Gozi/ISFB, and various information stealers distributed through fake software websites that were provided through Google ad traffic. I submitted malicious files from this example to VirusTotal and found a low rate of detection, with some files not showing as malware at all. Additionally, domains associated with this infection frequently change. That might make it hard to detect.
https://clickercounter.org/
https://defineprogramming.com/
Dec 26th 2022
9 months ago
rthrth
Jan 2nd 2023
8 months ago