• Joomla to WordPress Data Transferring Experience

    One of our recent projects was a Joomla to WordPress data transfer. The requirement was to extract items from one website (Joomla) and transfer them into another (WordPress). The target site was a clothing store built on WooCommerce, selling shirts, pants, and jackets. Scraping all pictures, colors, and sizes was also part of the job.

    Joomla to WordPress data transfer

    We completed the whole job and received 5/5 feedback on Upwork. We also kept all URLs as SEO-friendly as they were on the original site.

    We wanted to share this ‘simple but important’ project to let potential customers know about the range of services we can handle.

  • Our Recent Experiences with Price Comparison Tools

    We have been developing price comparison tools for many years, but have never shared any of those experiences here or on any of our other blogs. This year we developed two different kinds of price comparison tools for two different customers, each with very specific requirements.

    I will share both experiences here today. The first customer wanted a comparison between different versions of the same products from the same source. They needed to track changes in the price and some other attributes (such as colors and sizes) of each product on their competitors’ websites, so that they could set their own prices for the corresponding items in their store. We also created an email template that reports the changed products to them every morning. We developed a very simple web panel for this customer, as they said the data mattered more than the view.
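    The change tracking described above can be sketched as a comparison of two scraped snapshots. This is a minimal Python illustration, not the tool we delivered: the product IDs, the field names (`price`, `colors`, `sizes`), and the function name `diff_snapshots` are all hypothetical.

```python
from typing import Dict, List, Tuple

# A snapshot maps a product ID to the attributes scraped for it in one run.
Snapshot = Dict[str, Dict[str, object]]
Change = Tuple[str, str, object, object]  # (product_id, field, old, new)


def diff_snapshots(old: Snapshot, new: Snapshot,
                   fields=("price", "colors", "sizes")) -> List[Change]:
    """Return one entry per tracked field whose value changed between runs."""
    changes: List[Change] = []
    for product_id, new_attrs in new.items():
        old_attrs = old.get(product_id)
        if old_attrs is None:
            continue  # product seen for the first time; nothing to compare
        for field in fields:
            before, after = old_attrs.get(field), new_attrs.get(field)
            if before != after:
                changes.append((product_id, field, before, after))
    return changes


# Hypothetical sample data standing in for two consecutive scraping runs.
yesterday = {
    "shirt-01": {"price": 19.99, "colors": ["red", "blue"], "sizes": ["M", "L"]},
    "pants-07": {"price": 34.50, "colors": ["black"], "sizes": ["32", "34"]},
}
today = {
    "shirt-01": {"price": 17.99, "colors": ["red", "blue"], "sizes": ["M", "L"]},
    "pants-07": {"price": 34.50, "colors": ["black", "navy"], "sizes": ["32", "34"]},
}

for product_id, field, before, after in diff_snapshots(yesterday, today):
    print(f"{product_id}: {field} changed from {before!r} to {after!r}")
```

    The list of changes is the kind of data a morning email report could be built from; products that appear for the first time are skipped here and would need their own handling.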

    Price Comparison Simple Panel View

    The other project was a golf pricing comparison. The client wanted to launch an application that shows golf club owners how the pricing of their own course compares with that of the nearest courses, which they choose or configure themselves. For active subscribers, the system prepares reports based on their choices and emails them regularly in the morning. This lets them see their competitors’ prices at a glance in a single window.

    Price Comparison Smart View for Golf Clubs

    In both cases, our job was to gently pull complex information from the target sources and parse it as required before saving it to the database. The scraping frequency was set by each customer’s requirements.

    If you need a similar solution or any sort of related help, please don’t hesitate to contact us.

    Best regards.

  • Alexa Top One Million Websites’ Details

    We have had a new scraper running for the last 6 months for a few of our clients (sorry, we often forget to share things here). It collects details of Alexa’s top one million websites and is designed for people who need to analyze websites from time to time. The scraper collects fields such as categories, demographics, keywords, and owner contact information (name, email, and phone), as well as general details like global rank and country rank. Currently we scrape the data once a month and deliver an updated feed to our clients.

    Please take a look at the screenshots:

    Alexa Top 1 Million Websites

    Here is another one showing the other fields:

    Alexa Top Websites

    Please feel free to contact us for this service or any similar solution.

  • Scraping and Parsing Google Search Results with the PHP Simple HTML DOM Library

    39 Comments

    The title says a lot, but let us give you an overview of this post. For many months we have been planning to share our scraping knowledge with people interested in this field, because we feel it offers far more opportunities than there are real experts available. We have already trained quite a few scraping professionals in our region, but in-person teaching can only reach so many, and we wanted to help many more. Today’s topic is one of the most requested crawling tasks: scraping Google Search. This post is intended for people who already have some knowledge of PHP.

    To follow this post, please make sure you already have the PHP Simple HTML DOM library and have looked at its first page. Read the “Quick Start” block, which shows how to grab image links from a given page.

    In our example, we searched Google for the keyword “Beautiful Bangladesh” and parsed the titles and associated links from the results. Here is the link we used to search for the keyword in Google:

    http://www.google.com/search?hl=en&safe=active&tbo=d&site=&source=hp&q=Beautiful+Bangladesh&oq=Beautiful+Bangladesh
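    If you want to search for other keywords, the same URL can be built programmatically. Our script is in PHP, so the following is only a small Python sketch of the idea. The parameter names and values are copied from the link above, and the function name `build_search_url` is our own invention for this example.

```python
from urllib.parse import urlencode


def build_search_url(keyword: str) -> str:
    """Build the search URL used in this post for an arbitrary keyword."""
    # Parameters copied from the example link; only "q" and "oq" vary.
    params = {
        "hl": "en",
        "safe": "active",
        "tbo": "d",
        "site": "",
        "source": "hp",
        "q": keyword,
        "oq": keyword,
    }
    # urlencode() escapes each value and encodes spaces as "+".
    return "http://www.google.com/search?" + urlencode(params)


print(build_search_url("Beautiful Bangladesh"))
```

    Letting the library do the escaping avoids broken URLs when a keyword contains characters such as “&” or “#”.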

    Then we located the first result using the Firebug inspector, which gives us the path to the DOM element for that result.

    Scraping Google Search Results

    You can see that each result item is wrapped in an “h3” element of class “r”, under which the link is found. This can easily be captured with the HTML DOM library, as below:

    Code Screenshot

    Here, we traversed all the links and extracted the titles and URLs. Usually the direct link is in “href”. But the links are not always direct: Google sometimes wraps them in a different structure, with its own redirect reference and the target URL in a parameter. In that case we used a regular expression to match and extract the URL. Please learn the basics of Perl Compatible Regular Expressions (PCRE) if you’re new to PHP.

    Finally, our result looked like this (as of Jan 08, 2013):
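    For readers who cannot make out the screenshot, here is the same idea as a rough Python sketch using only the standard library. It is not our PHP Simple HTML DOM script. The sample HTML is made up, and the wrapped-link shape `/url?q=<target>&...` is an assumption about how such links look; adjust the pattern to whatever you actually see in the page source.

```python
import re
from html.parser import HTMLParser
from urllib.parse import unquote

# Assumed shape of a wrapped link: /url?q=<target>&sa=...
WRAPPED_LINK = re.compile(r"^/url\?q=([^&]+)")


def unwrap(href: str) -> str:
    """Return the target URL of a wrapped link, or href itself if direct."""
    match = WRAPPED_LINK.match(href)
    return unquote(match.group(1)) if match else href


class ResultParser(HTMLParser):
    """Collect (title, url) for the first link inside each <h3 class="r">."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h3 = False
        self._href = None   # href of the link currently being read
        self._text = []     # text fragments of that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h3" and "r" in (attrs.get("class") or "").split():
            self._in_h3 = True
        elif tag == "a" and self._in_h3 and self._href is None:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            self.results.append((title, unwrap(self._href)))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


# Made-up markup imitating one direct and one wrapped result.
SAMPLE_HTML = """
<h3 class="r"><a href="http://example.com/direct">Direct result</a></h3>
<h3 class="r"><a href="/url?q=http://example.org/page&amp;sa=U&amp;ei=abc">Wrapped <b>result</b></a></h3>
"""

parser = ResultParser()
parser.feed(SAMPLE_HTML)
for title, url in parser.results:
    print(title, "->", url)
```

    The regular expression only captures everything between `q=` and the next `&`, and `unquote()` then decodes any percent-escaped characters in the target URL.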

    Google Search Results

    If you want the script as a file, please follow this link:
    http://blog.proscraper.com/wp-content/uploads/2013/01/google_search.zip

    For developers – please feel free to ask questions; we are eager to help you learn the technologies related to web scraping.

    For webmasters – please contact us to develop your complex web scraping solution.

    Thanks for reading.

  • Our Horse Racing Scraper is Ready for your Subscription now

    We have been scraping horse racing results for quite a few years, but the service wasn’t ready for subscription until now. We currently cover racing in the following countries:
    - USA
    - UK
    - Singapore
    - Hong Kong
    - South Africa
    - Malaysia
    - Zimbabwe
    - Mauritius

    Upon subscription, you get access to the following:
    - All past results (to date) from your subscribed countries
    - An “On Demand Scraper” that lets you instantly scrape the results for any date from your subscribed countries
    - The option to receive your results as CSV or by email
    - The option to host and analyze your results in the provided panel

    These are the current features, but we will be adding more over time based on your feedback and expectations.

    Take a look at our On Demand Scraping:

    Horse Racing Results Screenshot

    Horse Racing Results

    You can download & check a sample CSV result as well: Sample Racing Results

    Our subscription rate is very reasonable:
    Package # 1: For two or fewer countries -> $30/month
    Package # 2: For four countries -> $40/month
    Package # 3: For all countries -> $55/month

    If you report any downtime of our service to us, we will deduct a fair amount from your invoice.

    Please contact us for more details.

  • RSS Parser & Aggregator for Drupal & WordPress

    We developed a complex RSS parser and aggregator module for Drupal that can scrape given feeds and create nodes with proper versioning. It not only merges RSS feeds but can also handle duplicate items according to the setup in the backend. The module is fully manageable from the backend. Later we developed a similar plugin for WordPress, where each item is added as a post. In both cases, you can add and manage custom feeds and their data. We have also prepared and deployed a CRON version for both.
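    To give an idea of what merging feeds with duplicate handling involves, here is a minimal Python sketch. It is not the code of our Drupal module or WordPress plugin, which are written in PHP and configurable from the backend. The duplicate rule used here (same `<guid>`, falling back to `<link>`) is just one assumed policy, and the sample feeds and the name `merge_feeds` are made up.

```python
import xml.etree.ElementTree as ET


def merge_feeds(feed_documents):
    """Merge items from several RSS 2.0 documents, skipping duplicates."""
    seen = set()
    merged = []
    for document in feed_documents:
        root = ET.fromstring(document)
        for item in root.iter("item"):
            # Assumed rule: <guid> identifies an item, <link> is the fallback.
            key = (item.findtext("guid") or item.findtext("link") or "").strip()
            if key:
                if key in seen:
                    continue  # already taken from an earlier feed
                seen.add(key)
            merged.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
            })
    return merged


# Made-up feeds: the second one repeats an item from the first.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>First post</title><link>http://example.com/1</link><guid>http://example.com/1</guid></item>
<item><title>Second post</title><link>http://example.com/2</link><guid>http://example.com/2</guid></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Second post (syndicated)</title><link>http://example.com/2</link><guid>http://example.com/2</guid></item>
<item><title>Third post</title><link>http://example.com/3</link></item>
</channel></rss>"""

for entry in merge_feeds([FEED_A, FEED_B]):
    print(entry["title"], "->", entry["link"])
```

    In a CMS, each merged entry would then become a node or a post; items with neither `<guid>` nor `<link>` are kept as they are in this sketch because there is nothing to compare them by.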

    Besides the above, we worked for a UK-based property (MLS) website with a number of real estate agents feeding data into the site in various formats, including RMv3, BLM files, and XML feeds. The website is built on Drupal. Our role was to build a module that processes the agent feeds and some websites and parses them to feed into the Drupal system. We also managed their frontend website to deploy and display the processed data properly.

  • What is Web Scraping?

    The term “Web Scraping” means pulling information from specific web sources in an automated way, which is much faster than doing it manually.

    Web Scraping is necessary in many cases where you need a huge amount of data within a short time and with accuracy. For example, suppose you need one million contact addresses (emails and phones) in your region for your business by tomorrow, but there is not enough time to collect them even with 10 to 15 people. Web Scraping can give you a solution. An automated program (we call it a “Scraper”) can achieve the goal in a few hours, which would be practically impossible for 15 people working by hand.

    We develop tools that can scrape and extract data from your targeted websites. You define the requirements; our scraper extracts the data and parses it to yield the best possible output. We handle all the underlying complexities, so you never have to deal with them.