• Scraping and Parsing Google Search Results with the PHP Simple HTML DOM Library

    Data Extraction, Screen Scraping, Web Scraping January 9th, 2013 21 Comments

    The title says a lot, but here is a quick overview of this post. For many months we have been planning to share our scraping knowledge with people interested in this field, because we feel it offers far more opportunities than there are real experts available. We have already trained quite a few scraping professionals in our region, but hands-on teaching can only reach so many people, and we want to help many more. Today’s topic is one of the most requested crawling tasks: scraping Google Search. This post is intended for readers who already have some knowledge of PHP.

    To follow this post, make sure you already have the PHP Simple HTML DOM library and have looked at its first page. Read the “Quick Start” block, which shows how to grab image links from a given page.

    In our example, we searched Google for the keyword “Beautiful Bangladesh” and parsed the names & associated links from the results. Here is the link we used to search for the keyword in Google:

    http://www.google.com/search?hl=en&safe=active&tbo=d&site=&source=hp&q=Beautiful+Bangladesh&oq=Beautiful+Bangladesh
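
    As a rough sketch, a link like the one above could be assembled in PHP from its query parameters. The parameter values are copied from that link; the function name build_search_url() is hypothetical and only used for illustration.

    ```php
    <?php
    /**
     * Build a Google search URL for a keyword.
     * Hypothetical helper; the parameter set mirrors the link shown above.
     */
    function build_search_url($keyword)
    {
        $params = array(
            'hl'     => 'en',
            'safe'   => 'active',
            'tbo'    => 'd',
            'site'   => '',
            'source' => 'hp',
            'q'      => $keyword,
            'oq'     => $keyword,
        );

        // http_build_query() URL-encodes each value; spaces become "+".
        // The separator is passed explicitly so the result does not depend
        // on the arg_separator.output ini setting.
        return 'http://www.google.com/search?' . http_build_query($params, '', '&');
    }

    echo build_search_url('Beautiful Bangladesh'), PHP_EOL;
    ```

    Keeping the keyword as a function argument makes it easy to reuse the same parameter set for other searches.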

    Then we located the first result using the Firebug inspector, which gives us the path to the DOM element for that result.

    Scraping Google Search Results

    You can see that each result item is wrapped in an “h3” element of class “r”, under which the link is found. This can be easily captured by the HTML DOM library as below:

    Code Screenshot
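
    A minimal sketch of what such a capture might look like with the PHP Simple HTML DOM library is given here; it is not the original script from the screenshot. It assumes simple_html_dom.php sits next to the script, and it parses a small hypothetical sample of markup (modelled on the “h3” / class “r” structure described above) instead of a live Google page.

    ```php
    <?php
    // Assumption: simple_html_dom.php from the PHP Simple HTML DOM library
    // is in the same directory as this script.
    require_once 'simple_html_dom.php';

    // Hypothetical sample markup: each result link sits inside <h3 class="r">.
    $sample = '<ol>'
        . '<li class="g"><h3 class="r"><a href="http://example.com/one">First result</a></h3></li>'
        . '<li class="g"><h3 class="r"><a href="/url?q=http://example.com/two&amp;sa=U">Second result</a></h3></li>'
        . '</ol>';

    // For a live page, file_get_html($url) would be used instead of str_get_html().
    $html = str_get_html($sample);

    $results = array();
    // "h3.r a" selects every link nested under an h3 element of class "r".
    foreach ($html->find('h3.r a') as $link) {
        $results[] = array(
            'title' => trim($link->plaintext),
            'href'  => html_entity_decode($link->href),
        );
    }

    // Release the parser's memory once the data has been copied out.
    $html->clear();

    foreach ($results as $result) {
        echo $result['title'], ' => ', $result['href'], PHP_EOL;
    }
    ```

    Collecting the titles and links into a plain array first keeps the parsing step separate from whatever is done with the results afterwards.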

    Here, we loop through all the result links and extract each title and URL. Usually the direct link is in the “href” attribute. The links are not always direct, though: Google sometimes wraps the target in its own redirect URL and passes it as a parameter. In that case we use a regular expression to match and extract the URL. If you are new to PHP, please learn the basics of Perl Compatible Regular Expressions (PCRE).
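
    The regular-expression step could be sketched as follows. The exact shape of the wrapped links is an assumption here: the sketch supposes they look like “/url?q=<target>&sa=...”, and the function name extract_target_url() is hypothetical.

    ```php
    <?php
    /**
     * Return the destination URL for a result link.
     * Assumption: wrapped links have the form "/url?q=<target>&...",
     * optionally prefixed with a Google host name.
     */
    function extract_target_url($href)
    {
        // Attribute values may still contain HTML entities such as "&amp;".
        $href = html_entity_decode($href);

        // Match only Google's own "/url?" wrapper and capture the "q" value
        // up to the next "&".
        $pattern = '#^(?:https?://www\.google\.[a-z.]+)?/url\?(?:.*?&)?q=([^&]+)#i';
        if (preg_match($pattern, $href, $matches)) {
            return urldecode($matches[1]);
        }

        // Anything else is treated as an already direct link.
        return $href;
    }

    echo extract_target_url('/url?q=http://example.com/two&amp;sa=U'), PHP_EOL;
    echo extract_target_url('http://example.com/one'), PHP_EOL;
    ```

    Anchoring the pattern to the “/url?” path keeps direct links that happen to carry their own “q” parameter from being rewritten by mistake.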

    Finally, our result looked like this (as of Jan 08, 2013):

    Google Search Results

    If you want the script as a file, please follow this link:
    http://blog.proscraper.com/wp-content/uploads/2013/01/google_search.zip

    For developers – please feel free to ask questions, we are eager to help you learn technologies related to web scraping.

    For webmasters – please contact us to develop your complex web scraping solution.

    Thanks for reading.

21 Responses to Scraping and Parsing Google Search Results with the PHP Simple HTML DOM Library

  1. Keliweb says:

    Thanks very much for releasing the code, it would be useful for our purposes :)

  2. kris says:

    Hi

    Great script.

    How would you modify it to scrape AdWords section?

  3. Giannis says:

    Nice job,

    Can you please share any easy way to get the number of indexed pages?

  4. debhu says:

    Hi, thanks for sharing this code; I am already using it very happily. I tried the same approach to collect the “Searches related to beautiful bangladesh” keywords from your Google search results, but I have no idea how to do this. I can’t find the element and the class using the Firebug inspector. Please advise.

  5. scrameUP says:

    You extracted titles & links, but
    can you also extract the description/snippet part of the result?

    (In your 1st result(googlesearch.jpg) snippet part says:
    “17 Feb 2011 – GREY Dhaka did it again! Creative idea:…” )

    Cheers.

  6. Matt says:

    Works great – thanks! Is there any danger of Google blocking requests through this method? I presume sensible rate limiting is needed to avoid blocked requests?

  7. Praveen says:

    Hey,

    Thanks for the code.

    By the way, how can I display the description of each result along with the title and link? Code for that would be helpful.

    Also, how can I display more than 10 results for the search term?

    Thanks in advance!

  8. Totgia says:

    Maybe Google has changed the way it displays results; I can no longer see any relevant text in the page source.

  9. Website Scraper says:

    Nice tutorial on scraping Google search results with a simple use of PHP. Thanks for giving the code; it will be really useful.

  10. htk says:

    I’ve got a problem with this script – I get a blank page. Can anybody help me?

  11. Fedot says:

    How do I set it up to get more than 10 results?
    Thank you!

  12. chat says:

    not sure if this comment will get a reply or even be approved, but this PHP script is exactly what i need. but the last activity this post has seen was almost a year ago. :(

    so i downloaded and ran this script, but when i compare search results with the output of this PHP script and results on Google (no altered results for location or previous browsing history), the results are different.

    they’re close to the same, but different result items are in different places, position wise.

    any input on this issue? please? :)

  13. Hotel.Rome.it says:

    Really useful, thank you so much for sharing this!


 
