These days, it seems like every month we hear of at least one major online service provider admitting to losing their users’ data through a hack, leak, breach or what have you. Many people collect this data for various kinds of analysis because it is free, detailed and accurate, or at least it was at the moment it was released. But there are some drawbacks.
Disclaimers
This post does not advocate the collection or distribution of breach data. Before collecting, it is highly recommended that you check your state or governing laws on the topic and consult an attorney for legal guidance on whether you may risk criminal charges if this data is found in your possession.
Unfortunately, I was unable to come up with a way to share specific examples, given the depth of information discoverable. That said, I will write future posts on how to modify your data to expedite search queries and link them here, and I’ll even share some scripts that I made for running multiple searches back to back.
Utility for OSINT
Depending on the scope, I have often utilized breach data in the early phases of an analysis, though I do know of a few fellow OSINT analysts who run queries right away. I haven’t had the opportunity to compare notes on how many data points they start their investigations with, but if I’m starting off with a low-resolution cell phone photo or a doorbell cam still, it may take a few minutes of other techniques before I have enough data points to make use of my collection.
Additionally, since a query can take a lot of time, I usually load up multiple queries (email addresses, phone numbers and other data) through a homebrew script. This way, the script can run searches continuously in the background without me having to pay constant attention to it. It’s pretty basic programming, but if you’re interested, this script will be available on my GitHub soon.
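To give a rough idea of what queuing up searches can look like, here is a minimal sketch in Python. It is not my actual script. It assumes the data is stored as plain .txt files under a single directory, and the function name run_batch and its parameters are made up for illustration. It reads each file once and checks every line against all of the queued terms, so the run time depends on the size of the collection rather than on how many terms are in the queue.

```python
from pathlib import Path


def run_batch(terms, data_dir):
    """Scan every .txt file under data_dir once and collect matching lines.

    Returns a dict mapping each search term to a list of
    (file name, line number, line text) tuples.
    """
    # Lowercase each term once so the matching is case-insensitive.
    lowered = {term: term.lower() for term in terms}
    hits = {term: [] for term in terms}

    # sorted() keeps the output order stable between runs.
    for path in sorted(Path(data_dir).rglob("*.txt")):
        # errors="ignore" skips undecodable bytes instead of aborting the batch.
        with path.open("r", encoding="utf-8", errors="ignore") as handle:
            for line_number, line in enumerate(handle, start=1):
                haystack = line.lower()
                for term, needle in lowered.items():
                    if needle in haystack:
                        hits[term].append((path.name, line_number, line.strip()))
    return hits
```

Calling run_batch(["alice@example.com", "555-0100"], "/path/to/data") returns one list of matches per term, and a term with no matches comes back as an empty list. A plain substring match like this is slow on a large collection, which is part of why restructuring the data ahead of time matters.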
Regardless, the value of breach data to an OSINT investigation or analysis lies in its ability to reveal a potentially large volume of pivot points from just a few searches. For example, if I search a subject’s email address through my data sets, I may get back many of the sites, mailing lists and mobile applications where that email was used to register. Think of each result as a contact card that you could send in a text: it lists the site breached, company, username, phone number and email, and sometimes a license plate number, emergency contact details and street address. Why does this matter during an investigation? Because from a single email address, we’ve uncovered potentially many additional pivot points. One I look for specifically is a strong (or weak) password that the target uses across multiple accounts, including alias accounts, because you can now search for a user by their password.
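The contact card idea can be sketched as a small data structure. This is only an illustration: the BreachRecord fields are a subset I picked for the example, real data sets all use different layouts, and new_pivot_points is a hypothetical helper rather than part of any tool. It takes the records returned by a search and lists every value you did not already know, grouped by field.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class BreachRecord:
    """One 'contact card' returned by a search (illustrative fields only)."""
    source: str                      # the site or service that was breached
    email: Optional[str] = None
    username: Optional[str] = None
    phone: Optional[str] = None
    street_address: Optional[str] = None


def new_pivot_points(records, known):
    """Return {field name: set of values} for values not already in `known`."""
    pivots = {}
    for record in records:
        for field in fields(record):
            if field.name == "source":
                continue  # the breach name is context, not a pivot point
            value = getattr(record, field.name)
            if value and value not in known:
                pivots.setdefault(field.name, set()).add(value)
    return pivots
```

Starting from one known email address, two records from different sources might hand back a username and a phone number, each of which becomes the next search term.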
Safeguards Against Hackers
If you’re thinking that having this type of information might have some implication for hacking the target, beyond the reconnaissance phases that OSINT already provides, hit me up on Twitter and let’s talk about it. I’ll concede that spoofing their cell phone number may be an option for a password reset, but for your higher-value accounts (such as Venmo, Google, Apple, crypto, banking, etc.), many of the companies managing these services have API access to sites like HaveIBeenPwned, which alert them to compromised accounts. If they can’t buy the data first, their team is likely on the same forums that we’re all on, wading chest-deep through anime to get to the same data sets. I intentionally left Facebook out of the above list (see https://www.npr.org/2021/04/09/986005820/after-data-breach-exposes-530-million-facebook-says-it-will-not-notify-users), which may present another attack method: if you maintain accounts on the same platforms and with the same service providers as your target, then with some patience it’s possible that you will know of a breach and the company’s response well in advance of them.
Another safeguard that people could use is a password manager, which makes it practical to keep every password unique. I like to use and support open-source projects, so I personally use KeePassXC for my daily driver and research accounts. In short, it lets the user choose how many characters they would like, which types of special characters to include and whether to mix upper and lower case, then hit generate for a new random password. When used in conjunction with hardware-based 2FA (rather than SMS), this could make things more challenging for a hacker.
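For anyone curious what a generator like that does under the hood, here is a minimal sketch using Python’s standard secrets module. This is not KeePassXC’s implementation. The function name, the default length and the default symbol set are my own assumptions for the example.

```python
import secrets
import string


def generate_password(length=24, use_upper=True, use_digits=True,
                      symbols="!@#$%^&*-_"):
    """Build a random password from the selected character pools."""
    if length < 8:
        raise ValueError("length should be at least 8 characters")

    pools = [string.ascii_lowercase]
    if use_upper:
        pools.append(string.ascii_uppercase)
    if use_digits:
        pools.append(string.digits)
    if symbols:
        pools.append(symbols)

    # Take one character from every selected pool so each type is present,
    # then fill the remaining length from the combined alphabet.
    chars = [secrets.choice(pool) for pool in pools]
    alphabet = "".join(pools)
    chars += [secrets.choice(alphabet) for _ in range(length - len(chars))]

    # Shuffle with the OS-backed generator so the guaranteed characters
    # do not always sit at the front.
    secrets.SystemRandom().shuffle(chars)
    return "".join(chars)
```

The secrets module draws from the operating system’s cryptographic random source, which is why it is used here instead of the general-purpose random module. In practice the password manager handles all of this for you, along with storing the result.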
Resources Required
Aside from the obvious resources, like qBittorrent and a source to find this stuff, you’ll need a place to store it all. For a stand-alone breach data machine, I’ve heard of a lot of collectors using Network Attached Storage (NAS) computers that they build from the ground up. This could be one of the better options, depending on your budget: running queries on a stand-alone machine does not suck up the resources of your investigation machine, and internal hard drives mean fast read and write speeds. There’s also the matter of expandability, though a lot of desktops (like those from System76) could have you pretty well covered there.
If you are traveling, or if you would like a temporary setup for a proof of concept, I’ve used the SanDisk 1TB external SSD with some success, though the speed of queries gets to me sometimes, and running a query can hog a lot of resources and battery power.
Conclusion
Though it can benefit an investigation, ask yourself whether you really need it and whether the benefits are worth the potential legal risks. For most OSINT cases outside of vulnerability or threat assessment, the utility has been pretty low, so it may not be worth the time or effort required to build an effective collection, not to mention the investment in hardware or an independent machine to crunch the search queries.
Instead, I would almost recommend that you put your time and resources into training, especially if you might have difficulty applying the process of intelligence to an investigation (which is where this would come in handy).
As always, if you have any comments or questions, be sure to hit me up via email or on Twitter.