• Scraping and Parsing Google Search Results with the PHP Simple HTML DOM Library

    The title says most of it, but here is a short overview of this post. For many months we have been planning to share our scraping knowledge with people interested in this field, because we feel it offers far more opportunities than there are real experts available. We have already trained quite a few scraping professionals in our region, but hands-on teaching reaches only a few people at a time, and we want to help many more. Today’s topic is one of the most requested crawling tasks, i.e. scraping Google Search. This post is intended for people who already have some knowledge of PHP.

    To follow this post, please make sure you already have the PHP Simple HTML DOM library and have taken a look at its first page. Read the “Quick Start” block there, which shows how to grab image links from a given page.

    In our example, we searched Google for the keyword “Beautiful Bangladesh” and parsed the titles and associated links from the results. Here is the link we used to search for the keyword on Google:

    http://www.google.com/search?hl=en&safe=active&tbo=d&site=&source=hp&q=Beautiful+Bangladesh&oq=Beautiful+Bangladesh
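
    The search link above can also be assembled in code instead of being typed by hand. Here is a minimal sketch, assuming the same query parameters as that link; the function name `build_search_url` is our own illustration and not part of any library:

    ```php
    <?php
    // Build the Google search URL used in this post for a given keyword.
    // build_search_url() is a hypothetical helper name, not a library function.
    function build_search_url(string $keyword): string
    {
        // Same parameters, in the same order, as the link shown above.
        $params = [
            'hl'     => 'en',
            'safe'   => 'active',
            'tbo'    => 'd',
            'site'   => '',
            'source' => 'hp',
            'q'      => $keyword,
            'oq'     => $keyword,
        ];

        // http_build_query() URL-encodes the values; spaces become "+".
        // The separator is passed explicitly so php.ini settings cannot change it.
        return 'http://www.google.com/search?' . http_build_query($params, '', '&');
    }

    echo build_search_url('Beautiful Bangladesh'), PHP_EOL;
    ```

    For the keyword “Beautiful Bangladesh” this should print the same link as the one above.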

    Then we located the first result with the Firebug inspector, which gives us the path to the DOM element for that result.

    Scraping Google Search Results

    You can see that each result item is wrapped in an “h3” element of class “r”, under which the link is found. This can be easily captured with the HTML DOM library as below:

    Code Screenshot
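
    In text form, the traversal can be sketched roughly as follows. This is not necessarily the exact code from the screenshot: the function name `collect_results` is ours, and the `h3.r` selector simply reflects the markup described above, which Google may change at any time. From the Simple HTML DOM library it uses only `find()`, `plaintext` and `href`.

    ```php
    <?php
    // Collect title/link pairs from an already parsed results page.
    // $dom is expected to be the object returned by file_get_html() or
    // str_get_html() from the PHP Simple HTML DOM library.
    // collect_results() is a hypothetical helper name used for illustration.
    function collect_results($dom): array
    {
        $results = [];

        // Each result heading is assumed to be an <h3 class="r"> element.
        foreach ($dom->find('h3.r') as $heading) {
            // The first <a> inside the heading carries the title and the link.
            $anchor = $heading->find('a', 0);
            if ($anchor === null) {
                continue; // heading without a link, skip it
            }
            $results[] = [
                'title' => trim($anchor->plaintext),
                'link'  => $anchor->href,
            ];
        }

        return $results;
    }

    // Usage (needs simple_html_dom.php and network access):
    //   require_once 'simple_html_dom.php';
    //   $dom = file_get_html($searchUrl); // $searchUrl: the Google link above
    //   if ($dom !== false) {
    //       print_r(collect_results($dom));
    //   }
    ```

    Keeping the traversal in a function that receives the parsed page makes it easy to try the same code on a saved copy of a results page before pointing it at the live search.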

    Here, we looped through all the results and extracted the titles and links. Usually the direct link is found in “href”. But the links are not always direct: Google sometimes wraps them in a different structure, with its own redirect reference and the target URL in a parameter. So we used a regular expression to match and extract the URL in that case. Please learn the basics of Perl Compatible Regular Expressions (PCRE) if you’re new to PHP.
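
    The regular-expression step for wrapped links might look like the sketch below. We assume here that the wrapped form is `/url?q=<target>&...`, with the real address in the `q` parameter; the exact pattern depends on what Google returns at the time, and `resolve_result_link` is a hypothetical name.

    ```php
    <?php
    // Return the real target of a result link.
    // Assumption: wrapped links look like /url?q=<encoded target>&sa=...
    // resolve_result_link() is a hypothetical helper name.
    function resolve_result_link(string $href): string
    {
        // Attribute values may still contain HTML entities such as &amp;.
        $href = html_entity_decode($href);

        // Capture the value of the "q" parameter of a /url? link,
        // wherever it appears in the query string.
        if (preg_match('~/url\?(?:.*?&)?q=([^&]+)~', $href, $match)) {
            return urldecode($match[1]);
        }

        // Not a wrapped link: the href is already the direct address.
        return $href;
    }

    echo resolve_result_link('/url?q=http%3A%2F%2Fexample.com%2Fpage&sa=U'), PHP_EOL;
    echo resolve_result_link('http://example.com/'), PHP_EOL;
    ```

    Direct links pass through unchanged, so the same function can be applied to every “href” in the loop.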

    Finally, our result looked like this (as of Jan 08, 2013):

    Google Search Results

    If you want the script as a file, please follow this link:
    http://blog.proscraper.com/wp-content/uploads/2013/01/google_search.zip

    For developers – please feel free to ask questions; we are eager to help you learn the technologies related to web scraping.

    For webmasters – please contact us to develop your complex web scraping solution.

    Thanks for reading.

  • Our Horse Racing Scraper is Ready for your Subscription now

    We have been scraping horse racing results for quite a few years, but the scraper was not ready for subscription until now. It currently covers racing in the following countries:
    - USA
    - UK
    - Singapore
    - Hong Kong
    - South Africa
    - Malaysia
    - Zimbabwe
    - Mauritius

    Upon subscription, you get access to the following:
    - All past results (to date) from your subscribed countries
    - An “On Demand Scraper” to help you instantly scrape the results of any date from your subscribed countries
    - The option to receive your results as CSV or by email
    - The option to host and analyze your results in the provided panel

    These are the current features, but we will be adding more over time based on your feedback and expectations.

    Take a look at our On Demand Scraping:

    Horse Racing Results Screenshot

    You can download & check a sample CSV result as well: Sample Racing Results

    Our subscription rates are very reasonable:
    Package # 1: For two or fewer countries -> $30/month
    Package # 2: For four countries -> $40/month
    Package # 3: For all countries -> $55/month

    If you report any downtime of our service to us, we will deduct a fair amount from your invoice.

    Please contact us for more details.