Google Hack
Outline
Google Bombing
Schneier in Secrets and Lies
Attack at a distance
Emergent behavior
Automation
CGI Scanning
Vulnerable software
Simply Put
"Google allows for a great deal of target reconnaissance that results in little or no exposure for the attacker." (Johnny Long)
Using Google as a mirror, searches find:
Credit card and Social Security numbers
Passwords
CGI (active content) scanning targets
Anatomy of a Search
Server Side Client Side
http://computer.howstuffworks.com/search-engine1.htm
Johnny.ihackstuff.com
Johnny Long
Wrote Google Hacking for Penetration Testers; ISBN 1931836361
Many free online articles.
Two PDFs cached at MattPayne.org/talks/gh
See the references slide
Or just use Google
Local Example
Monday 14 February, 2005 @10:11am Update: Now it sounds like everyone was hit with an exploit on awstats which took out quite a few bloggers and other sites. ==> Actually, phorum got hit with it too! After running my server something.net for quite awhile on 'borrowed time', it eventually got hacked into - just this weekend. The "Simiens Crew" took credit to a webpage defacement, and by doing some googling... they've hit quite a few websites even just this last weekend! My best guess so far was an attack on one of my many 3rd-party PHP-run services that I have not taken the time to watch and patch for security announcements. Could have been gallery, phorum, webcalendar, icalendar, etc... I'll do some investigating and hopefully find out. I may have been lucky though, it sounds like these were just defacements and not all-out attacks, other victims have not reported any data loss at least. I can respect that. What I can't respect though is the many defacements they've put up with "FrontPage" as the HTML generator!
a, about, an, and, are, as, at, be, by, from, how, i, in, is, it, of, on, or, that, the, this, to, we, what, when, where, which, with
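The word list above reads as the common words Google drops from a query; that reading is an assumption, since the slide carries no heading. Under that assumption, a minimal sketch of which terms in a query would actually count might look like this (the function name `significant_terms` is hypothetical, and quoted phrases or a leading `+` are not modeled):

```python
# Stop words copied from the list above.
STOP_WORDS = frozenset(
    "a about an and are as at be by from how i in is it of on or that the "
    "this to we what when where which with".split()
)


def significant_terms(query):
    """Return the words of `query` that are not in the stop-word list."""
    return [word for word in query.split() if word.lower() not in STOP_WORDS]


print(significant_terms("how to protect the admin page of a site"))
```

Only the words outside the list survive, which is a quick way to see how little of a conversational query is really searched.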
Wildcards
Google supports whole-word wildcards but NOT stemming (partial-word wildcards).
"It's the end of the * as we know it" works, but "American Psycho*" won't get you decent results on American Psychology or American Psychophysics.
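The whole-word wildcard rule can be illustrated with a small local matcher. This is only a sketch of the idea, not how Google works internally: it assumes a lone `*` stands for exactly one word and treats a `*` glued to a word as a literal character, so no stemming happens. The helper name `phrase_to_regex` is hypothetical.

```python
import re


def phrase_to_regex(phrase):
    """Translate a phrase with whole-word '*' wildcards into a regex."""
    parts = []
    for token in phrase.split():
        if token == "*":
            parts.append(r"\w+")            # a lone '*' matches one whole word
        else:
            parts.append(re.escape(token))  # 'Psycho*' stays literal: no stemming
    return re.compile(r"\s+".join(parts), re.IGNORECASE)


lyric = phrase_to_regex("the end of the * as we know it")
print(bool(lyric.search("It's the end of the world as we know it")))

stem = phrase_to_regex("American Psycho*")
print(bool(stem.search("American Psychology")))
```

The first pattern matches the lyric; the second does not match "American Psychology", mirroring the behavior described on the Wildcards slide.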
Advanced Searching
Advanced Search Page: http://www.google.com/advanced_search
Advanced Operators
cache: define: info: intext: intitle: inurl: link: related: stocks: filetype: numrange 1973..2005 source: phonebook:
DEMO:
on-2-13-1973..2004
visa 4356000000000000..4356999999999999
Advanced Operators
Google advanced operators help refine searches. Advanced operators use a syntax such as the following: operator:search_term
Notice that there's no space between the operator, the colon, and the search term.
The site: operator instructs Google to restrict a search to a specific web site or domain. The web site to search must be supplied after the colon.
The link: operator instructs Google to find pages that link to a given URL. The URL must be supplied after the colon.
The cache: operator displays the version of a web page as it appeared when Google crawled the site. The URL of the site must be supplied after the colon.
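The operator:search_term syntax is easy to get wrong by hand, so here is a minimal sketch of a helper that assembles a query string for checking a site you administer. The function names and the `example.com` domain are hypothetical; the only rules it encodes are the ones stated above (no space around the colon) plus quoting of multi-word terms.

```python
def operator_term(operator, term):
    """Format one advanced operator with no space around the colon."""
    if " " in term:
        term = f'"{term}"'  # keep a multi-word term together as a phrase
    return f"{operator}:{term}"


def build_query(site, **operators):
    """Restrict a search to `site`, then append any further operators."""
    parts = [operator_term("site", site)]
    parts += [operator_term(op, term) for op, term in operators.items()]
    return " ".join(parts)


print(build_query("example.com", intitle="index of", inurl="admin"))
# site:example.com intitle:"index of" inurl:admin
```

Restricting every query with site: keeps the check scoped to your own domain.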
Turn off images and you can look at the cached copy of a page without your visit being logged on the target server! Google acts as a mirror.
Other parts
Google searches not only the content of a page, but the title and URL as well.
The intitle: operator instructs Google to search for a term within the title of a document.
The inurl: operator instructs Google to search only within the URL (web address) of a document. The search term must follow the colon.
To find every web page Google has crawled for a specific site, use the site: operator.
Source: http://tinyurl.com/dnhc3
Adobe Portable Document Format (pdf)
Adobe PostScript (ps)
Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
MacWrite (mw)
Microsoft Excel (xls)
Microsoft PowerPoint (ppt)
Microsoft Word (doc)
Microsoft Works (wks, wps, wdb)
Microsoft Write (wri)
Rich Text Format (rtf)
Shockwave Flash (swf)
Text (ans, txt)
And many more.
Directory Listings
Show server version information
Useful for an attacker
Displaying variables
Standard demo and debugging program
HTTP_USER_AGENT=Googlebot
Frequently an avenue for remote code execution
http://somebox.someU.edu/~user/demo.cgi?cmd=`cat /etc/passwd`
Default Pages
Default pages are another way to find specific versions of server software.

Server version               Query
Apache 1.3.0-1.3.9           Intitle:Test.Page.for.Apache It.worked! this.web.site!
Apache 1.3.11-1.3.26         Intitle:Test.Page.for.Apache seeing.this.instead
Apache 2.0                   Intitle:Simple.page.for.Apache Apache.Hook.Functions
Apache SSL/TLS               Intitle:test.page "Hey, it worked !" "SSL/TLS-aware"
Many IIS servers             intitle:welcome.to intitle:internet IIS
Unknown IIS server           intitle:"Under construction" "does not currently have"
IIS 4.0                      intitle:welcome.to.IIS.4.0
IIS 4.0                      allintitle:Welcome to Windows NT 4.0 Option Pack
IIS 4.0                      allintitle:Welcome to Internet Information Server
IIS 5.0                      allintitle:Welcome to Windows 2000 Internet Services
IIS 6.0                      allintitle:Welcome to Windows XP Server Internet Services
Netscape Enterprise Server   allintitle:Netscape Enterprise Server Home Page
Netscape FastTrack Server    allintitle:Netscape FastTrack Server Home Page
CGI Scanner
Google can be used as a CGI scanner. The index.of or inurl searches are good tools to find vulnerable targets. For example, a Google search for this: allinurl:/random_banner/index.cgi
Hurray! There are only three
An attacker could manipulate the broken random_banner program to cough up any file on that web server, including the password file.
Johnny's Disclaimer
Note that actual exploitation of a found vulnerability crosses the ethical line, and is not considered mere web searching.
Automation!
CGIs and other active content can be located in several places on a server. Many queries need to be used to find a vulnerability. There are two ways to automate Google searches:
Plain old web robots
The Google API: http://www.google.com/apis/
Terms of Service
http://www.google.com/terms_of_service.html
"You may not send automated queries of any sort to Google's system without express permission in advance from Google. Note that 'sending automated queries' includes, among other things: using any software which sends queries to Google to determine how a web site or web page 'ranks' on Google for various queries; 'meta-searching' Google; and performing 'offline' searches on Google."
Google API
The Google API is the blessed way of automating Google interaction. When you use the Google API, you include your license string.
Gooscan
The gooscan tool, written by j0hnny, automates CGI scanning with Google, and many other functions. Gooscan is a UNIX (Linux/BSD/Mac OS X) tool that automates queries against Google search appliances (which are not governed by the same automation restrictions as their web-based brethren). For the security professional, gooscan serves as a front end for an external server assessment and aids in the information-gathering phase of a vulnerability assessment. For the web server administrator, gooscan helps discover what the web community may already know about a site thanks to Google's search appliance. For more information about this tool, including the ethical implications of its use, see http://johnny.ihackstuff.com.
Googledorks?
http://johnny.ihackstuff.com/googledorks The term "googledork" was coined by the author [Johnny Long] and originally meant "An inept or foolish person as revealed by Google." After a great deal of media attention, the term came to describe those who "troll the Internet for confidential goods." Either description is fine, really. What matters is that the term googledork conveys the concept that sensitive stuff is on the web, and Google can help you find it. The official googledorks page lists many different examples of unbelievable things that have been dug up through Google by the maintainer of the page, Johnny Long.
http://tinyurl.com/2ywye
Each listing shows the Google search required to find the information, along with a description of why the data found on each page is so interesting.
GooPot
According to http://www.techtarget.com, "A honey pot is a computer system on the Internet that is expressly set up to attract and 'trap' people who attempt to penetrate other people's computer systems." For example, build a page that matches the query:
inurl:admin inurl:userlist
Then examine the referrer variable to figure out how the person found the page. This information can help protect normal sites.
http://ghh.sourceforge.net/
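Examining the referrer can be sketched as follows. This assumes the visitor's browser sends a Referer header that still carries the search in a `q` parameter, which is an assumption about how the search engine builds its result URLs; the function name is hypothetical and the sample URL is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def search_query_from_referrer(referrer):
    """Return the Google search that led to a hit, or None if not recoverable."""
    parsed = urlparse(referrer)
    host = parsed.hostname or ""
    if host != "google.com" and not host.endswith(".google.com"):
        return None  # the visitor did not arrive from a Google results page
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


referrer = "http://www.google.com/search?q=inurl%3Aadmin+inurl%3Auserlist"
print(search_query_from_referrer(referrer))
# inurl:admin inurl:userlist
```

Logging the recovered query next to each hit on the honeypot page shows which searches are being used to find pages like it.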
Protecting Yourself
Googledork! Use the techniques outlined in this article (and the full Google Hacker's Guide) to check your site for sensitive information or vulnerable files.
SiteDigger from Foundstone automates this.
It uses the Google API, so it is limited to 1,000 Google searches per day.
It is free (as in beer)!
SiteDigger 2.0
http://tinyurl.com/28aeh
The tool requires a Google Web Services API license key.
Your license key provides you access to the Google Web APIs service and entitles you to 1,000 queries per day.
System requirements: Windows and the .NET Framework (can be installed using Windows Update)
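Any script that uses a license key has to live within the 1,000 queries per day mentioned above. A minimal sketch of a client-side budget counter follows; the class name `DailyQuota` is hypothetical, and it only tracks calls locally rather than asking Google how much quota remains.

```python
from datetime import date


class DailyQuota:
    """Count queries per calendar day and refuse any beyond the limit."""

    def __init__(self, limit=1000):
        self.limit = limit
        self.day = None
        self.used = 0

    def try_consume(self, today):
        """Record one query for `today`; return False if the budget is spent."""
        if today != self.day:  # a new day resets the counter
            self.day = today
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


quota = DailyQuota()
if quota.try_consume(date.today()):
    print("OK to send one query")
```

Calling `try_consume` before every request keeps a long scan from burning through the key's allowance early in the day.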
Protecting yourself
Consider removing your site from Google's index. http://www.google.com/remove.html
Robots.txt
Use a robots.txt file. Web crawlers are supposed to follow the robots exclusion standard. This standard outlines the procedure for "politely requesting" that web crawlers ignore all or part of your web site. This file is only a suggestion. The major search engines' crawlers honor this file and its contents. For examples and suggestions for using a robots.txt file, see http://www.robotstxt.org.
Example Robots.txt
User-agent: *
Disallow: /images/
Disallow: /stats/
Disallow: /logs/
Disallow: /admin/
Disallow: /comment/

User-agent: Googlebot
Allow:

User-agent: BecomeBot
Disallow:
Disallow: /
Disallow: *

User-agent: MSNBot
Disallow:
Disallow: /
Disallow: *

By default, tells others not to scan specific paths.
Allows Google to scan.
Tells BecomeBot and MSNBot to go away entirely.
Place the robots.txt in the root of your HTML documents directory.

See also:
Removing Your Materials from Google (how to remove your content from Google's various web properties): http://hacks.oreilly.com/pub/h/220
Robots.txt generator: http://tinyurl.com/7pc4k
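Before publishing a robots.txt, it can help to check how a standards-following crawler would read it. The sketch below uses Python's standard `urllib.robotparser` on a simplified rule set (not the exact file above); the `www.example.com` host and the `allowed` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Simplified rules for illustration: block two paths for everyone,
# and block one named crawler entirely.
RULES = """\
User-agent: *
Disallow: /admin/
Disallow: /logs/

User-agent: BecomeBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Would a polite crawler named `agent` fetch `path`?"""
    return parser.can_fetch(agent, "http://www.example.com" + path)


for agent, path in [("Googlebot", "/index.html"),
                    ("Googlebot", "/admin/users.html"),
                    ("BecomeBot", "/index.html")]:
    print(agent, path, allowed(agent, path))
```

Remember that this only predicts the behavior of crawlers that honor the file; it does nothing to stop one that ignores it.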
CAPTCHA
Completely Automated Public Turing Test to Tell Computers and Humans Apart
http://www.captcha.net/ http://en.wikipedia.org/wiki/Captcha
Google Extras...
Translation and Language options - over 100 to choose from: http://www.google.com/language_tools
Stock Quotes - enter stocks:, example: stocks:GOOG
Newsgroups - http://groups.google.com
Calculator - "1024 minus 768" or "12 to the 10 power"
Froogle - http://froogle.google.com
Images - http://images.google.com
Spell Checking - just type it in: "convienence"
Blogger - http://www.blogger.com/start
References
http://www.informit.com/articles/article.asp?p=170880
Google Hacks: 100 Industrial-Strength Tips & Tools, by Tara Calishain and Rael Dornfest
Protect yourself from Google hacking: http://tinyurl.com/8q3fg
Johnny I Hack Stuff: http://johnny.ihackstuff.com
Google: http://www.google.com
http://www.i-hacked.com/content/view/23/42/
HowStuffWorks: http://computer.howstuffworks.com/search-engine1.htm
Interesting Searches
Source http://www.i-hacked.com/content/view/23/42/
intitle:"Index of" passwords modified
allinurl:auth_user_file.txt
"access denied for user" "using password"
"A syntax error has occurred" filetype:ihtml
allinurl: admin mdb
"ORA-00921: unexpected end of SQL command"
inurl:passlist.txt
"Index of /backup"
"Chatologica MetaSearch" "stack tracking:"
Credit Cards
Number Ranges to find Credit Card Numbers
Amex Numbers: 300000000000000..399999999999999 MC Numbers: 5178000000000000..5178999999999999 visa 4356000000000000..4356999999999999
Music
You only need add the name of the song/artist/singer. Example: intitle:index.of mp3 jackson
CD Images
inurl:microsoft filetype:iso You can change the string to whatever you want, ex. Microsoft to Adobe, .iso to .zip etc
Passwords
"# -FrontPage-" inurl:service.pwd FrontPage passwords.. very nice clean search results listing !! "AutoCreate=TRUE password=*" This searches the password for "Website Access Analyzer", a Japanese software that creates webstatistics. For those who can read Japanese, check out the author's site at: http://www.coara.or.jp/~passy/
IRC Passwords
"sets mode: +k" This search reveals channel keys (passwords) on IRC as revealed from IRC chat logs. eggdrop filetype:user user These are eggdrop config files. Avoiding a fullblown discussion about eggdrops and IRC bots, suffice it to say that this file contains usernames and passwords for IRC users.
DCForum Passwords
allinurl:auth_user_file.txt
DCForum's password file. This file gives a list of (crackable) passwords, usernames and email addresses for DCForum and for DCShop (a shopping cart program!). Some lists are bigger than others, all are fun, and all belong to googledorks. =)
MySQL Passwords
intitle:"Index of" config.php This search brings up sites with "config.php" files. To skip the technical discussion, this configuration file contains both a username and a password for an SQL database. Most sites with forums run a PHP message base. This file gives you the keys to that forum, including FULL ADMIN access to the database.
Serial Numbers
Let's pretend you need a serial number for Windows XP Pro. In the Google search bar type in just like this "Windows XP Professional" 94FBR the key is the 94FBR code.. it was included with many MS Office registration codes so this will help you dramatically reduce the amount of 'fake' sites (usually pornography) that trick you. or if you want to find the serial for WinZip 8.1 "WinZip 8.1" 94FBR