We will post a number of web scraping case studies on this page.
Case Study # 1:
One of our clients needed Amazon reviews, reviewers, and information about the reviewed products on his Drupal site. He wanted us to customize Drupal’s existing Feeds module for this purpose so that he could still use the module’s features (such as import scheduling and custom duplicate-handling keys). We developed a custom module for him, compatible with the Feeds module, with two fetchers, parsers, and processors for Amazon’s HTML and XML review pages. We then developed a custom scheduler script, run as a cron job, to automate the feed imports. The module now continuously scrapes Amazon data for his website.
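The fetcher, parser, and processor split described above, together with a duplicate-handling key, can be sketched in a few lines. This is only an illustration in Python, not the Drupal module itself: the XML layout, the field names, and the `fetch`, `parse`, and `process` functions are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical review feed; the real Amazon markup is not shown in this case study.
SAMPLE_XML = """<reviews>
  <review id="R1"><reviewer>alice</reviewer><product>B001</product><rating>5</rating></review>
  <review id="R2"><reviewer>bob</reviewer><product>B002</product><rating>3</rating></review>
  <review id="R1"><reviewer>alice</reviewer><product>B001</product><rating>4</rating></review>
</reviews>"""


def fetch():
    """Fetcher: returns the raw document (a fixed sample here instead of an HTTP call)."""
    return SAMPLE_XML


def parse(raw):
    """Parser: turns the raw XML into a list of plain dictionaries."""
    root = ET.fromstring(raw)
    return [
        {
            "id": review.get("id"),
            "reviewer": review.findtext("reviewer"),
            "product": review.findtext("product"),
            "rating": int(review.findtext("rating")),
        }
        for review in root.iter("review")
    ]


def process(items, store, key="id"):
    """Processor: saves items, updating in place when the duplicate key already exists."""
    created = updated = 0
    for item in items:
        if item[key] in store:
            updated += 1
        else:
            created += 1
        store[item[key]] = item
    return created, updated


store = {}
result = process(parse(fetch()), store)
print(result, len(store))
```

A scheduler (a cron job in the project above) would simply call the same three steps at a fixed interval; the duplicate key is what keeps repeated imports from creating repeated records.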
Case Study # 2:
One of our clients needed fashion products from various sources all over the world. He already had a frontend website designed, but no products to display on it. He sent us a list of 100 websites he wanted to scrape products from. We discussed his requirements and the fields he wanted extracted, then developed a database and a set of libraries to extract product information and fill the database. We created scripts to automate the process, and today they grab about 3 million products for him every day, fully automatically.
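Pulling products from many differently structured sites into one database usually comes down to mapping each source's fields onto a common schema. The sketch below shows that idea with SQLite from the Python standard library; the source names, field maps, table layout, and the `normalize` and `save` helpers are hypothetical and stand in for the project's actual libraries and database.

```python
import sqlite3

# Hypothetical per-source field maps: common field name -> field name used by that site.
FIELD_MAPS = {
    "shop-a.example": {"title": "name", "price": "price_usd", "url": "link"},
    "shop-b.example": {"title": "product_title", "price": "amount", "url": "href"},
}


def normalize(source, raw):
    """Map one scraped record onto the common product schema."""
    fields = FIELD_MAPS[source]
    return {
        "source": source,
        "url": raw[fields["url"]],
        "title": raw[fields["title"]].strip(),
        "price": float(raw[fields["price"]]),
    }


def save(conn, product):
    """Insert the product, or overwrite it if the same (source, url) was seen before."""
    conn.execute(
        "INSERT OR REPLACE INTO products (source, url, title, price) "
        "VALUES (:source, :url, :title, :price)",
        product,
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "source TEXT, url TEXT, title TEXT, price REAL, "
    "PRIMARY KEY (source, url))"
)

# Stand-in for scraped data; the second record is a re-crawl of the first product.
scraped = [
    ("shop-a.example", {"name": " Linen Shirt ", "price_usd": "39.90", "link": "https://shop-a.example/p/1"}),
    ("shop-a.example", {"name": "Linen Shirt", "price_usd": "34.90", "link": "https://shop-a.example/p/1"}),
    ("shop-b.example", {"product_title": "Wool Scarf", "amount": "19.50", "href": "https://shop-b.example/item/7"}),
]

with conn:
    for source, raw in scraped:
        save(conn, normalize(source, raw))

row_count = conn.execute("SELECT COUNT(*) FROM products").fetchone()[0]
shirt_price = conn.execute(
    "SELECT price FROM products WHERE source = ?", ("shop-a.example",)
).fetchone()[0]
print(row_count, shirt_price)
```

Keying on the source and URL means a daily re-crawl refreshes existing rows instead of duplicating them, which matters at a volume of millions of products per day.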
Case Study # 3:
Another of our clients needed to gather millions of videos every day to build his own video search engine. We went through his requirements and then asked for the video sources. He provided 20 video sources, including YouTube, Vimeo, and Metacafe. We developed two parsers for each source: one that runs every day and crawls the latest videos from the front pages, and another that we ran only once to crawl the older videos. We grabbed all details and images from the sources using our natural crawling techniques. Our scripts now help keep his video search engine running, and we maintain the project regularly to keep the crawlers working.
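The two crawl modes described above (a daily pass over the front page and a one-time pass over the archive) can be sketched as follows. The in-memory `PAGES` list stands in for a real site, and `fetch_page`, `crawl_latest`, and `crawl_backfill` are hypothetical names used only for this illustration.

```python
# Stand-in for a video site: page 0 is the front page, later pages hold older videos.
PAGES = [
    ["v9", "v8", "v7"],
    ["v6", "v5", "v4"],
    ["v3", "v2", "v1"],
]


def fetch_page(number):
    """Return the video IDs listed on one page, or an empty list past the last page."""
    return PAGES[number] if number < len(PAGES) else []


def crawl_latest(seen):
    """Daily crawler: reads only the front page and keeps IDs not seen before."""
    new_ids = [video_id for video_id in fetch_page(0) if video_id not in seen]
    seen.update(new_ids)
    return new_ids


def crawl_backfill(seen):
    """One-time crawler: walks every page until the listing runs out."""
    new_ids = []
    page = 0
    while True:
        ids = fetch_page(page)
        if not ids:
            break
        for video_id in ids:
            if video_id not in seen:
                seen.add(video_id)
                new_ids.append(video_id)
        page += 1
    return new_ids


seen = set()
latest = crawl_latest(seen)
backfill = crawl_backfill(seen)
print(latest, backfill)
```

Sharing the `seen` set between both crawlers keeps the backfill from re-importing videos the daily run has already collected.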
Case Study # 4:
Scraping & parsing from SWF files: One of our recent (2014) clients needed to collect several million contacts from a few SWF files. We analyzed the SWF files and their metadata, converted the raw SWF data to readable text, and parsed that text into clean records in a MySQL database. We then supplied the final result in CSV and XML formats.
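The final delivery step, exporting cleaned records as CSV and XML, might look like the sketch below. It covers only the export, not the SWF conversion or the MySQL storage; the sample rows, the `name` and `email` fields, and the `to_csv` and `to_xml` helpers are assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical cleaned records, as they might come out of a database query.
ROWS = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "John Roe", "email": "john@example.com"},
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_xml(rows):
    """Serialize the rows as one <contact> element per record."""
    root = ET.Element("contacts")
    for row in rows:
        contact = ET.SubElement(root, "contact")
        for field, value in row.items():
            ET.SubElement(contact, field).text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(ROWS)
xml_text = to_xml(ROWS)
print(csv_text)
print(xml_text)
```

For millions of records, the same writers can stream to files row by row instead of building the whole output in memory.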