-
Scraping and Parsing Google Search Results with the PHP Simple HTML DOM Library
January 9th, 2013 | Data Extraction, Screen Scraping, Web Scraping | Tags: Google Crawler . Google Scraping . Learn Screen Scraping . Learn Web Scraping . PHP Simple HTML DOM . Simple HTML DOM
The title says a lot, but let us give you an overview of this post. For many months now we have been planning to share our scraping knowledge with people interested in this field, because we feel it offers far more opportunities than there are real experts available. We have already trained quite a few scraping professionals in our region, but teaching in person could reach only so many, and we want to help many more. Today's topic is one of the most frequently requested crawling tasks: scraping Google Search. This post is intended for people who already have some knowledge of PHP.
To follow this post, please make sure you already have the PHP Simple HTML DOM library and have looked at its front page. Read the “Quick Start” block, which shows how to grab image links from a given page.
In our example, we searched Google for the keyword “Beautiful Bangladesh” and parsed the titles and associated links from the results. Here is the link we used to search for the keyword on Google:
http://www.google.com/search?hl=en&safe=active&tbo=d&site=&source=hp&q=Beautiful+Bangladesh&oq=Beautiful+Bangladesh
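If you prefer to build such a query URL in code rather than by hand, here is a small sketch. It is written in Python for illustration (not the PHP we used) and relies only on the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` is our own, and keeping only the `hl` and `q` parameters from the URL above is an assumption that the others are optional.

```python
from urllib.parse import urlencode


def build_search_url(keyword, language="en"):
    """Build a Google search URL with only the hl and q parameters (hypothetical helper)."""
    # urlencode() quotes values with quote_plus by default, so spaces become '+'
    params = {"hl": language, "q": keyword}
    return "http://www.google.com/search?" + urlencode(params)


print(build_search_url("Beautiful Bangladesh"))
# http://www.google.com/search?hl=en&q=Beautiful+Bangladesh
```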
Then we located the first result with the Firebug inspector, which gives us the path to the DOM element for that result.
You can see that each result item is contained in an “h3” element of class “r”, inside which the link is found. This structure is easy to capture with the Simple HTML DOM library.
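Simple HTML DOM is a PHP library, so as a language-neutral illustration of the same idea, here is a minimal sketch in Python using the standard `html.parser` module instead. It collects the text and `href` of every link nested inside an `h3` element whose class list contains `r`, matching the structure described above. The class name `ResultLinkParser` is hypothetical, and the sample markup is invented and far simpler than a real results page.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect (title, href) pairs for <a> tags nested inside <h3 class="r">."""

    def __init__(self):
        super().__init__()
        self.results = []          # list of (title, href) tuples
        self._in_result_h3 = False
        self._href = None          # href of the <a> we are currently inside
        self._text = []            # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "r" in (attrs.get("class") or "").split():
            self._in_result_h3 = True
        elif tag == "a" and self._in_result_h3:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text that belongs to a result link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag == "h3":
            self._in_result_h3 = False


sample = '<div><h3 class="r"><a href="http://example.com/">Example <b>site</b></a></h3></div>'
parser = ResultLinkParser()
parser.feed(sample)
print(parser.results)  # [('Example site', 'http://example.com/')]
```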
Here, we traversed all the links and extracted their titles and URLs. We usually found the direct link in the “href” attribute, but the links are not always direct: Google sometimes wraps them in its own redirect URL, passing the target URL as a parameter. In that case we used a regular expression to match and extract the URL. If you're new to PHP, please learn the basics of Perl Compatible Regular Expressions (PCRE).
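The regular-expression step can be sketched as follows, again in Python for illustration rather than the PHP/PCRE code from our script. We assume the redirect links are relative URLs of the form `/url?q=<target>&...`, with the target percent-encoded in the `q` parameter; that format is our assumption and Google may change it at any time. The function name `resolve_result_link` is hypothetical, and direct links are returned unchanged.

```python
import re
from urllib.parse import unquote

# Assumed redirect form: /url?...q=<percent-encoded target>&...
# q= must follow '?' or '&', so parameters such as "aq=" are not mistaken for it.
REDIRECT_RE = re.compile(r"^/url\?(?:.*&)?q=([^&]+)")


def resolve_result_link(href):
    """Return the real target URL of a result link (hypothetical helper)."""
    match = REDIRECT_RE.search(href)
    if match:
        # The target is percent-encoded inside the parameter, so decode it
        return unquote(match.group(1))
    return href  # already a direct link


print(resolve_result_link("/url?q=http://example.com/page%3Fid%3D1&sa=U"))
# http://example.com/page?id=1
print(resolve_result_link("http://example.com/"))
# http://example.com/
```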
Finally, my result looked like this (as of Jan 08, 2013):
If you want the script as a file, please follow this link:
http://blog.proscraper.com/wp-content/uploads/2013/01/google_search.zip
For developers – please feel free to ask questions; we are eager to help you learn technologies related to web scraping.
For webmasters – please contact us to have your complex web scraping solution developed.
Thanks for reading.
-
Our Horse Racing Scraper is Ready for your Subscription now
January 4th, 2013 | Data Extraction, Showcase, Web Crawling, Web Scraping | Tags: Horse Racing . Horse Racing Results . Horse Racing Scraper . Racing Results
We have been scraping horse racing results for quite a few years now, but the service was not available for subscription until now. We currently cover racing in the following countries:
- USA
- UK
- Singapore
- Hong Kong
- South Africa
- Malaysia
- Zimbabwe
- Mauritius
Upon subscription, you get access to the following:
- All past results (to date) from your subscribed countries
- An “On Demand Scraper” to help you instantly scrape results of any date from your subscribed countries
- Facility to get your results as CSV or by email
- Facility to host and analyze your results in the provided panel
These are the current features, but we will add more over time based on your feedback and expectations.
Take a look at our On Demand Scraping:
You can download & check a sample CSV result as well: Sample Racing Results
Our subscription rate is very reasonable:
Package # 1: For up to two countries -> $30/month
Package # 2: For four countries -> $40/month
Package # 3: For all countries -> $55/month
If you report any downtime of our service to us, we will deduct a fair amount from your invoice.
Please contact us for more details.
-
RSS Parser & Aggregator for Drupal & WordPress
March 16th, 2012 | Data Aggregation, Data Extraction, Data Parsing, Web Scraping | Tags: Drupal . RSS Aggregator . RSS Parser . WordPress
We developed a complex RSS parser and aggregator module for Drupal that can scrape given feeds and create nodes with proper versioning. It not only merges RSS feeds but can also handle duplicate items according to the setup in the backend, and it is fully manageable from the backend. Later we developed a similar plugin for WordPress, where each item is added as a post. In both cases, you can add and manage custom feeds and their data. We have also prepared and deployed a cron-based version of both.
Besides the above, we worked for a UK-based property (MLS) website whose real estate agents fed data into the site in various formats, including RMv3, BLM files and XML feeds. The website is built with Drupal. Our role was to build a module that processes the agent feeds (and some websites) and parses them for import into the Drupal system. We also managed their frontend website so that the processed data was deployed and displayed properly.
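The modules themselves were built for Drupal and WordPress. As a rough, language-neutral illustration of merging several feeds while skipping duplicates, here is a Python sketch using the standard `xml.etree.ElementTree` module. It treats two items as duplicates when they share a `guid`, falling back to `link` when there is no `guid`; that rule and the helper name `merge_feeds` are our assumptions for the sketch, whereas the real modules decide duplicates from the backend setup.

```python
import xml.etree.ElementTree as ET


def merge_feeds(feed_xml_strings):
    """Merge RSS 2.0 documents, keeping only the first copy of each item."""
    seen = set()
    merged = []
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        for item in root.iter("item"):
            # Assumed duplicate key: <guid>, or <link> if the item has no guid
            key = (item.findtext("guid") or item.findtext("link") or "").strip()
            if key and key in seen:
                continue  # duplicate of an item merged earlier
            if key:
                seen.add(key)
            merged.append({
                "title": (item.findtext("title") or "").strip(),
                "link": (item.findtext("link") or "").strip(),
            })
    return merged
```

Items without any key are kept rather than dropped, on the assumption that losing content is worse than an occasional duplicate.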
-
What is Web Scraping?
July 25th, 2010 | Data Extraction, Web Scraping | Tags: Data Mining . Website Scraping
The term “Web Scraping” means pulling information from specific web sources in an automated way, which is much faster than any manual method.
Web scraping is necessary in many cases where you need a large amount of accurate data within a short time. For example, suppose you need one million contact addresses (emails and phone numbers) in your region for business purposes by tomorrow; a team of 10 to 15 people could not collect them in time. Web scraping can give you a solution: an automated program (which we call a “scraper”) can achieve the goal in a few hours, which would be practically impossible for 15 people.
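To make the contact-address example a little more concrete, here is a toy Python sketch of one step a scraper performs after downloading a page: pulling email addresses out of the page text with a regular expression. The pattern is a deliberately simplified one chosen for illustration and does not cover every address the email standards allow; the helper name `extract_emails` is hypothetical, and fetching pages, phone numbers and large-scale de-duplication are left out.

```python
import re

# Simplified pattern for illustration only: local part, '@', dotted domain
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(page_text):
    """Return unique email addresses in order of first appearance."""
    # dict.fromkeys drops repeats while preserving insertion order
    return list(dict.fromkeys(EMAIL_RE.findall(page_text)))


print(extract_emails("Contact sales@example.com or info@example.org. Again: sales@example.com"))
# ['sales@example.com', 'info@example.org']
```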
We develop tools that can scrape and extract data from your target websites. You define the requirements, and our scraper extracts the data and parses it to produce the best possible output. We handle all the underlying complexities, so you never have to deal with them.
-
Web Scraping Blog to Share Our Scraping Experiences
July 20th, 2010 | Data Extraction, Web Crawling, Web Scraping | Tags: Data Scraping
Finally, we are starting our blog on web scraping experiences. Here we will write about our scraping experiences, case studies and current possibilities in this field. We will also let you know about the various services we provide. It will be a place exclusively for website scraping updates.
Thanks for being with us.








