Even if you are not a developer, you should be familiar with GitHub. If you are not, then consider this blog post your introduction. GitHub is a large cloud-based host for software repositories, built on the Git version control system. Creating a GitHub account is painless and free for anyone who is interested. You don’t even need to supply a valid email address to get an account.
Once you sign up for a GitHub account, you are free to publish any code you want to it, and anyone in the world can view your code, download it, or fork it and build their own version of it. This model is incredibly powerful and makes collaborative software development easy, but it also has some security issues. GitHub performs no filtering on any of the content that users upload.
Because there is no limit on the content that is uploaded, people use it as a personal storage and backup solution. As a result, you can find a treasure trove of sensitive data if you know how to look for it. Luckily for us, GitHub provides search capabilities. Much like Google Dorks, GitHub has a number of keywords that can be used to refine its search results.
Some of these keywords are “filename,” “extension,” and “path.” As the names suggest, the “filename” and “extension” keywords search for a specific filename or file extension. I find the “path” keyword very interesting, as it searches for a specific directory in the file path.
For example, you can use the search term “path:etc” to find files that reside in the “etc” folder. Now, if you like to dabble in the evil side like I do, you can chain several of these keywords together to find some really fun results. For instance, try using the search term “filename:shadow path:etc”.
At the time of this writing, that search returns 736 results for Unix shadow files. For those less Unix inclined, the /etc/shadow file contains the hashed passwords for every user on that system.
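Chaining keywords just means joining “keyword:value” pairs with spaces. Here is a minimal Python sketch of that, which also turns the result into a link you can paste into a browser. The function name is made up for illustration, and the URL shape (a “q” parameter plus “type=code”) is my assumption about GitHub’s web search page, not something guaranteed to stay the same.

```python
from urllib.parse import urlencode


def build_code_search_url(**qualifiers: str) -> str:
    """Build a GitHub code-search URL from keyword=value pairs.

    Hypothetical helper: the q/type URL parameters are an assumption
    about the web search page, not a documented contract.
    """
    # "filename:shadow path:etc" is simply the pairs joined by spaces.
    query = " ".join(f"{name}:{value}" for name, value in qualifiers.items())
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


if __name__ == "__main__":
    print(build_code_search_url(filename="shadow", path="etc"))
```

Keeping the keywords as named arguments makes it easy to swap in other combinations, such as an “extension” search, without touching the string handling.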
Fig 1. /etc/shadow files exposed on GitHub
Fig 2. Unix password hashes available on GitHub
With a hash in hand, you can feed it to popular password crackers such as John the Ripper and Hashcat. Depending on the password’s complexity, cracking the hash can take anywhere from seconds to years.
In just three days of cracking, I was able to crack roughly 60 percent of all the /etc/shadow hashes I found in mid-October of last year. Unix shadow files are just the beginning. On GitHub, you can also find WordPress configuration files, SFTP server configuration files, RSA private keys, SQL dumps, and much more.
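Before handing anything to a cracker, it helps to know what you are looking at. Each line of a shadow file is a set of colon-separated fields: the username comes first and the password field second, and a real hash usually looks like “$id$salt$hash”, where the id names the hashing scheme. The sketch below pulls those pieces apart. The helper names and sample lines are invented for illustration, the hash is not real, and the scheme table is a partial list of common identifiers.

```python
from typing import NamedTuple, Optional

# Common crypt(3) scheme identifiers. This table is not exhaustive.
SCHEMES = {
    "1": "MD5",
    "2a": "bcrypt",
    "2b": "bcrypt",
    "2y": "bcrypt",
    "5": "SHA-256",
    "6": "SHA-512",
    "y": "yescrypt",
}


class ShadowEntry(NamedTuple):
    user: str
    scheme: str
    hash_field: str


def parse_shadow_line(line: str) -> Optional[ShadowEntry]:
    """Return the user and hash scheme for one shadow line, or None."""
    fields = line.strip().split(":")
    if len(fields) < 2:
        return None
    user, hash_field = fields[0], fields[1]
    # Placeholders such as "*" or "!" mean there is no usable hash here.
    if not hash_field.startswith("$"):
        return None
    parts = hash_field.split("$")  # "$6$salt$hash" -> ["", "6", "salt", "hash"]
    scheme_id = parts[1] if len(parts) > 2 else ""
    return ShadowEntry(user, SCHEMES.get(scheme_id, "unknown"), hash_field)


if __name__ == "__main__":
    # Made-up sample lines; the hash value is a placeholder.
    print(parse_shadow_line("root:$6$examplesalt$notarealhash:16000:0:99999:7:::"))
    print(parse_shadow_line("daemon:*:16000:0:99999:7:::"))
```

Knowing the scheme up front matters because crackers generally need to be told, or need to detect, which hash format they are attacking.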
So, what do you do with all of this data available to you? Well, you harvest it! To assist with that, I have written a tool called GitHarvester. GitHarvester can take GitHub search strings, pull the results, and then run regular expressions against those results to find not just sensitive files but also specific data within those files.
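The filtering half of that idea can be sketched in a few lines of Python. To be clear, this is not GitHarvester’s actual code. It is a rough illustration that assumes the search results have already been downloaded into a dictionary of file name to file text; the function name, the sample data, and the pattern (which picks out root entries that carry a hash) are all mine.

```python
import re
from typing import Dict, List

# Illustrative pattern: a line starting with "root:" whose password field
# looks like "$id$...", i.e. an actual hash rather than "*" or "!".
ROOT_HASH = re.compile(r"^root:\$[0-9a-z]+\$[^:]+:", re.MULTILINE)


def filter_results(results: Dict[str, str], pattern: "re.Pattern[str]") -> Dict[str, List[str]]:
    """Keep only the files whose text matches, along with the matched text."""
    hits: Dict[str, List[str]] = {}
    for name, text in results.items():
        matched = [m.group(0) for m in pattern.finditer(text)]
        if matched:
            hits[name] = matched
    return hits


if __name__ == "__main__":
    # Made-up stand-ins for downloaded search results.
    sample = {
        "repoA/etc/shadow": "root:$6$salt$abc:16000:0:99999:7:::\ndaemon:*:16000::::::\n",
        "repoB/etc/shadow": "root:!:16000:0:99999:7:::\n",
    }
    print(filter_results(sample, ROOT_HASH))
```

Separating the search from the pattern means the same loop works for other targets: swap in a different regular expression and a different search string.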
For instance, if you wanted to look for shadow files that specifically have password hashes for the root user, you can do that with GitHarvester. Why would I write a tool that makes it easier for bad guys to compromise other people’s or organizations’ systems? That question has many answers, ranging from “because it was an excuse to start a new coding project” to “just to see how much data is really out there on GitHub,” but the main reason is to help expose this insecurity.
The more people who know that this data is out there, and that they or their organizations may be inadvertently putting it there, the more people there are to search for it and help remove it. I will be demonstrating GitHarvester and my findings with it at BSides Salt Lake City, THOTCON and NolaCon.
BSides SLC will take place on March 10th and 11th at the Salt Palace in downtown Salt Lake City; THOTCON will take place on May 5th and 6th at an as-yet-undisclosed location in Chicago; and NolaCon will take place on May 20th through the 22nd at the Crowne Plaza Hotel on Bourbon Street in New Orleans.
About the Author:
The end of the world is probably not right around the corner, but Danny will tell you it is just to see if you will freak out, purchase all the water possible from Walmart, hoard cheap Bic lighters, and hide in the shipping container buried in your back yard. Danny is currently a Threat Analyst at Proofpoint as well as the COO and Director of Security Services at Mark V Security. He is also active in the Salt Lake City hacker scene as a member of DC801 and a founder of the Salt Lake City-based hackerspace 801 Labs. You can contact him on Twitter at @metacortex.
Editor’s Note: The opinions expressed in this guest author article are solely those of the contributor and do not necessarily reflect those of Tripwire.