Uninvited Guests: Analyzing the Identity and Behavior of Certificate Transparency Bots

Abstract

Since its creation, Certificate Transparency (CT) has served as a vital component of the secure web. However, with the increase in TLS adoption, CT has essentially become a de facto log for all newly created websites, announcing to the public the existence of web endpoints, including those that could have otherwise remained hidden. As a result, web bots can use CT to probe websites in real time, as they are created. Little is known about these bots, their behaviors, and their intentions.

In this paper, we present CTPOT, a distributed honeypot system that creates new TLS certificates for the purpose of advertising previously non-existent domains, and records the activity generated towards them from a number of network vantage points. Using CTPOT, we create 4,657 TLS certificates over a period of ten weeks, attracting 1.5 million web requests from 31,898 unique IP addresses. We find that CT bots occupy a distinct subset of the overall web bot population, with less than 2% overlap between IP addresses of CT bots and traditional host-scanning web bots. By creating certificates with varying content types, we are able to further subdivide the CT bot population into subsets of varying intentions, revealing a stark contrast in malicious behavior among these groups. Finally, we correlate observed bot IP addresses into campaigns using the file paths requested by each bot, and find 105 malicious campaigns targeting the domains we advertise. Our findings shed light on the CT bot ecosystem, revealing that it is not only distinct from that of traditional IP-based bots, but is composed of numerous entities with varying targets and behaviors.
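
To make the campaign-correlation idea concrete, the sketch below groups IP addresses that requested exactly the same set of file paths. This is only an illustration under stated assumptions: the abstract does not describe the paper's actual correlation method, so exact path-set equality is a simplified stand-in, and the function name `group_by_paths`, the `(ip, path)` input format, and the sample addresses are all hypothetical.

```python
from collections import defaultdict


def group_by_paths(requests):
    """Group IPs that requested an identical set of file paths.

    `requests` is an iterable of (ip, path) pairs (an assumed log format).
    Returns a list of IP sets, one per path set shared by two or more IPs.
    Exact path-set equality is a simplification, not the paper's method.
    """
    # Collect the set of distinct paths each IP asked for.
    paths_by_ip = defaultdict(set)
    for ip, path in requests:
        paths_by_ip[ip].add(path)

    # Invert the mapping: identical path sets become the grouping key.
    ips_by_paths = defaultdict(set)
    for ip, paths in paths_by_ip.items():
        ips_by_paths[frozenset(paths)].add(ip)

    # Keep only groups with more than one IP; a lone IP is not a "campaign".
    return [ips for ips in ips_by_paths.values() if len(ips) > 1]


if __name__ == "__main__":
    # Hypothetical log entries using documentation-reserved IP ranges.
    sample = [
        ("192.0.2.1", "/wp-login.php"),
        ("192.0.2.1", "/.env"),
        ("192.0.2.2", "/.env"),
        ("192.0.2.2", "/wp-login.php"),
        ("198.51.100.7", "/robots.txt"),
    ]
    for campaign in group_by_paths(sample):
        print(sorted(campaign))
```

A real analysis would likely need a fuzzier notion of similarity than exact equality, since bots in the same campaign may request overlapping but not identical path sets.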

Publication
In Proceedings of the USENIX Security Symposium (USENIX Security), 2022
Johnny So
PhD Candidate

I am currently a fourth-year Ph.D. candidate advised by Professor Nick Nikiforakis at the PragSec Lab at Stony Brook University. I investigate (the lack of) web integrity in various contexts (e.g., domain names and JavaScript) through large-scale experiments, and subsequently design and evaluate defenses that improve the integrity of the web.