What pages do bad bots look for?

Published: 2020-08-01
Last Updated: 2020-08-01 14:28:20 UTC
by Jan Kopriva (Version: 1)

I’ve been wondering for some time now which pages and paths are visited the most by “bad” bots (scrapers, data harvesters and other automated scanners that disregard the exclusions set in robots.txt[1]). To find out, I set up a little experiment: I placed a robots.txt on one of my domains which disallowed access to commonly used paths and PHP pages that might be of interest to bots (login.php, /wp-admin/, etc.), configured the server to return an HTTP 200 response for these paths and pages, and started logging details about the requests sent to them.

To keep legitimate or manually generated traffic to a minimum, I did this on a domain pointing to a server that ran none of the common content management systems.
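A setup along these lines can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the configuration used for the experiment: the article does not say which server software was used, so the sketch relies on Python's standard `http.server`, and the `DECOY_PATHS` list, port 8080 and the `honeypot.log` file name are all hypothetical.

```python
import logging
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit

# Hypothetical decoy list; the article only names login.php and /wp-admin/ as examples.
DECOY_PATHS = ["/login.php", "/wp-login.php", "/wp-admin/"]


def build_robots_txt(paths):
    """Return a robots.txt body that disallows every decoy path for all crawlers."""
    lines = ["User-agent: *"] + [f"Disallow: {p}" for p in paths]
    return "\n".join(lines) + "\n"


def is_decoy(request_target, paths=DECOY_PATHS):
    """True if the request path (query string ignored) matches a decoy path.

    Paths ending in "/" are treated as prefixes, mirroring robots.txt semantics.
    """
    path = urlsplit(request_target).path
    return any(path == p or (p.endswith("/") and path.startswith(p)) for p in paths)


class HoneypotHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if urlsplit(self.path).path == "/robots.txt":
            body = build_robots_txt(DECOY_PATHS).encode()
        elif is_decoy(self.path):
            # Well-behaved crawlers never get here, so every hit is worth logging.
            logging.info("decoy hit: %s %s UA=%r", self.client_address[0],
                         self.path, self.headers.get("User-Agent"))
            body = b"OK\n"
        else:
            self.send_error(404)
            return
        # Answer decoy paths with HTTP 200, as described above.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    logging.basicConfig(filename="honeypot.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")
    HTTPServer(("0.0.0.0", 8080), HoneypotHandler).serve_forever()
```

Only GET is handled here for brevity; a real deployment would probably also want to record POST requests and their bodies, since login pages tend to attract them.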

The captured requests were a mixed bag, as one might expect. Some were simple one-shot HTTP GET requests, while others were part of multi-request scans. Some had no parameters set, while others carried generic SQL injection or XSS payloads or tried to “blindly” exploit vulnerabilities specific to common content management systems.

For our purposes, however, this is beside the point as we’re more interested in finding out which pages were looked for the most. I went over the logs and put the “top 10” most commonly requested pages for the past 12 months in the following table, along with the number of times each path or page was hit.

Path                          Count
/wp-login.php                  1140
/admin/                         189
/administrator/                 104
/wp-admin/install.php            82
/login.php                       48
/administrator/index.php         26
/admin.php                       24
/wp-admin/setup-config.php       24
/admin/index.php                 23
/wp-links-opml.php               20
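
A tally like the one above could be produced from a web server access log roughly as follows. This is a sketch under assumptions: the article does not describe the log format, so the code assumes Common/Combined Log Format request lines, and the `top_paths` helper is a hypothetical name. Query strings are dropped so that requests carrying injection payloads are counted under the page they targeted.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the quoted request line of a Common/Combined Log Format entry,
# e.g. "GET /wp-login.php HTTP/1.1" (assumed format, not taken from the article).
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')


def top_paths(log_lines, n=10):
    """Return the n most requested paths as (path, count) pairs, most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            # Count by path only; the query string is ignored.
            counts[urlsplit(match.group(1)).path] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # "access.log" is a placeholder file name.
    with open("access.log", encoding="utf-8", errors="replace") as handle:
        for path, count in top_paths(handle):
            print(f"{path} {count}")
```

Lines that do not contain a recognizable request are skipped silently, which keeps malformed requests from aborting the count.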

Although finding wp-login.php in first place is hardly surprising, the results are interesting. Given the fairly steep early drop in the number of requests, it seems that one might be able to catch a significant portion of interesting “bad” bot behavior with just a single-page (or four- or five-page) honeypot. In other words, if you’ve ever wondered where to place a “honeypage” on your server for it to be effective, the top paths in the table above are probably a good start.
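
To put a rough number on that drop, the counts from the table can be turned into cumulative shares. Note the caveat: the total request count across all paths is not given here, so the shares below are relative to the hits on the ten listed paths only. The `cumulative_share` helper is a hypothetical name used for illustration.

```python
from itertools import accumulate

# Counts copied from the table above (top 10 paths only).
COUNTS = [
    ("/wp-login.php", 1140),
    ("/admin/", 189),
    ("/administrator/", 104),
    ("/wp-admin/install.php", 82),
    ("/login.php", 48),
    ("/administrator/index.php", 26),
    ("/admin.php", 24),
    ("/wp-admin/setup-config.php", 24),
    ("/admin/index.php", 23),
    ("/wp-links-opml.php", 20),
]


def cumulative_share(counts):
    """Return (k, share) pairs: the fraction of all listed hits covered by the top k paths."""
    total = sum(count for _, count in counts)
    running = accumulate(count for _, count in counts)
    return [(k, hits / total) for k, hits in enumerate(running, start=1)]


if __name__ == "__main__":
    for k, share in cumulative_share(COUNTS):
        print(f"top {k:2d}: {share:6.1%}")
```

With these figures, wp-login.php alone accounts for about 68% of the hits on the listed paths, and the first five paths together for about 93%.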

-----------
Jan Kopriva
@jk0pr
Alef Nula

Keywords: bot HTTP statistics
