July 19th, 2016 |
Data Marketplace, Professional Data Services, Web Crawling, Web Scraping
For the last few months, we have been preparing a marketplace of our own to sell our existing datasets and databases. It is finally live: the platform is ready for you, with about 45 datasets to begin with. You can visit the marketplace here: http://data.proscraper.com
While we develop more databases for this platform, drawing on our decade of experience, we currently have data items in several categories:
- Business Listing
- Clothing & Accessories
- Real Estate
- Yellow Pages
- Food & Dining
- Health & Medicine
- Service Providers
However, we are not limited to these categories, or even to the listed data items. You can customize your requirements with us.
On the detail page of each data item, we show the data count, data fields, etc., along with an option to download a sample CSV file.
The screenshot above is from our ‘Alexa Top Sites’ data item’s page. The number in ‘Data Count’ is customizable (an extra charge may or may not apply); we listed a reasonable size that is often asked for.
We will post more updates from our ‘Data Marketplace’ here as they happen. Please share any suggestions that might help us.
July 14th, 2016 |
Data Extraction, Scraping for CMS, Web Scraping
Joomla Scraping . Joomla to WordPress
One of our recent projects was a Joomla-to-WordPress data transfer. The requirement was to extract items from one website (Joomla) and transfer them to another website (WordPress). The target site was a clothing store built on WooCommerce with shirts, pants, and jackets. Scraping all pictures, colors and sizes was also part of the job.
We completed the whole job and received 5/5 feedback on Upwork. Additionally, we kept all URLs as SEO-friendly as they were on the original site.
We just wanted to share this ‘simple but important’ information to let prospective customers know about the range of services we can handle.
April 23rd, 2016 |
Data Extraction, Data Parsing, Web Crawling, Web Scraping
Golf Prices Comparison . Price Comparison . Price Comparison Tools . Shopping Comparison
We have been developing price comparison tools for many years now, but never shared any of our experiences here or on any of our other blogs. This year, however, we developed two different kinds of price comparison tools for two different customers, each with highly custom requirements.
I will share both experiences here today. The first customer wanted a comparison between different versions of the same products from the same source. They basically needed to track changes in the price and some other attributes (like colors and sizes) of each product on their competitors’ websites. This enabled them to set their own prices for the corresponding items in their store. We even created an email template that notifies them of the changed products every morning. We developed a very simple web panel for this customer, as they said the view mattered much less than the data.
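To make the change-tracking idea concrete, here is a minimal sketch in Python of comparing two scrapes of the same products. It is an illustration only, not the code delivered to the customer: the snapshot layout, the product IDs, and the names diff_snapshots and format_report are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot layout (an assumption for this sketch):
# {product_id: {"price": float, "colors": [...], "sizes": [...]}}
Snapshot = Dict[str, Dict[str, object]]
Change = Tuple[str, str, object, object]

TRACKED_FIELDS = ("price", "colors", "sizes")


def diff_snapshots(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Return (product_id, field, old_value, new_value) for each tracked change."""
    changes: List[Change] = []
    for product_id, new_item in current.items():
        old_item = previous.get(product_id)
        if old_item is None:
            continue  # product not seen before: nothing to compare against
        for field in TRACKED_FIELDS:
            old_value = old_item.get(field)
            new_value = new_item.get(field)
            if old_value != new_value:
                changes.append((product_id, field, old_value, new_value))
    return changes


def format_report(changes: List[Change]) -> str:
    """Build the plain-text body of a morning change report."""
    if not changes:
        return "No changes since the last scrape."
    return "\n".join(
        f"{pid}: {field} changed from {old!r} to {new!r}"
        for pid, field, old, new in changes
    )


# Made-up sample data for two consecutive scrapes of one competitor.
yesterday: Snapshot = {
    "shirt-01": {"price": 19.99, "colors": ["red", "blue"], "sizes": ["M", "L"]},
    "pants-07": {"price": 34.50, "colors": ["black"], "sizes": ["32"]},
}
today: Snapshot = {
    "shirt-01": {"price": 17.99, "colors": ["red", "blue"], "sizes": ["M", "L"]},
    "pants-07": {"price": 34.50, "colors": ["black", "grey"], "sizes": ["32"]},
    "jacket-02": {"price": 89.00, "colors": ["navy"], "sizes": ["L"]},
}

if __name__ == "__main__":
    print(format_report(diff_snapshots(yesterday, today)))
```

In this sketch a product that appears for the first time is skipped rather than reported; a real tool might list new and removed products in separate sections of the email.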
The other experience was related to golf price comparison. The client wanted to launch an application that shows golf club owners a price comparison between their own club’s course and a few nearby courses, chosen or configured by them. For active subscribers, the system prepares reports based on their choices and emails them regularly in the morning. This lets them see their competitors’ prices in a single window, at a glance, each morning.
In both cases, our job was to pull complex information gently from the target sources and then parse it per the requirements before saving it to the database. The scraping frequency was set according to each customer’s requirements.
If you need a similar solution or any sort of related help, please don’t hesitate to contact us.
March 4th, 2016 |
Screen Scraping, Web Crawling, Web Scraping, Web Scraping Technology
PhantomJS . PhantomJS for Screenshot . PhantomJS for Web Scraping . PhantomJS with PHP
We always experiment with new technologies, and we start using some of them regularly once we consistently find them efficient for our work. PhantomJS (along with a related module named “scraperjs”) had been one such experiment for a long time; we then decided to use it for projects that could be improved by including it. In this post, I will explain how it has made our web scraping much better than before. Not in all cases, but in many.
What is PhantomJS?
PhantomJS is a headless WebKit browser that can be scripted through a JavaScript API. Among its various features, we mainly used “Screen Capture” and “Page Automation” in our projects. The former helped us capture screenshots (thumbnails, etc.) as well as fully rendered HTML content. The latter helped us manipulate page content with the DOM API and jQuery. This is really great because your script navigates much like a real browser, and you can make it wait only until the content you need has loaded.
PhantomJS has many settings to help you customize the crawling process. You can set the user agent, cookies, screen size for screenshots, and timeout duration, inject your own JS script, and much more.
To take a quick look, check this page – http://phantomjs.org/page-automation.html
PHP Integration with PhantomJS:
We use a lot of PHP in our work, so we needed a way to use PhantomJS’s capabilities from our PHP scripts. We struggled initially, but then we succeeded, and now it’s producing amazing results. PHP integration is useful in many ways, especially when we need to track progress in a database, save results to the database, or combine results with others produced by more complex PHP methods.
That’s it for PhantomJS. We will share more in the coming days. Please feel free to contact us for any of your web scraping needs.
December 6th, 2015 |
Data Extraction, Web Crawling, Web Scraping
Alexa Top Websites
We have had a new scraper running for the last 6 months for a few of our clients (sorry, we often forget to share things here). It collects details of Alexa’s top one million websites and is designed for people who need to analyze websites from time to time. Our scraper collects fields such as categories, demographics, keywords, and owner contact information (name, email and phone), as well as general details like global rank, country rank, etc. Currently we scrape the data once a month to deliver an updated feed to our clients.
Please take a look at the screenshots:
See another for the other fields:
Please feel free to contact us for this service or any similar solution.
December 24th, 2014 |
Data Aggregation, Web Crawling, Web Scraping
Free Web Data Platform . Free Web Scraping Tool
A lot is happening all the time, and many new platforms have appeared on the horizon. Some say they can do everything for you with a simple setup; others say they do the best work ever in the web scraping / extraction field and minimize your time and expense. Still, developer-built web scraping remains far more in demand than those automated tools.
One of our new clients recently sent me a message saying “I’ve attempted to download and use at least 12 different programs and non of them do what is expected. I appreciate your help.” We discussed his requirements and delivered the solutions his business urgently needed, of course at a reasonable cost and within a reasonable time. We hear from such unhappy webmasters almost every week, and bring smiles to their faces after some time.
Before writing this post, our team members created accounts on some of those platforms and tested them in various ways. The experience was very time-consuming: we got wrong results and had no way to check how many of the results were wrong. So these tools are simply not for serious business and professional people. If you depend on your data, you need it to be accurate, and if you have somebody responsible for that, you can stay relaxed.
Please feel free to communicate with us to discuss any web scraping project.
January 9th, 2013 |
Data Extraction, Screen Scraping, Web Scraping
Google Crawler . Google Scraping . Learn Screen Scraping . Learn Web Scraping . PHP Simple HTML DOM . Simple HTML DOM
The title says a lot, but let us give you an overview of this post. For many months we have been planning to share our scraping knowledge with people interested in this field, because we feel it has far more opportunities than there are real experts available. We have already trained quite a few scraping professionals in our region, but hands-on teaching could only reach so many; we wanted to help many more. Today’s topic is one of the most requested crawling tasks, i.e. scraping Google Search. This post is intended for people who already have some knowledge of PHP.
To follow this post, please be sure you already have the PHP Simple HTML DOM library and have taken a look at its first page. Read the “Quick Start” block, which shows how to grab image links from a given page.
In our example, we searched Google for the keyword “Beautiful Bangladesh” and parsed the names and associated links from the results. Here is the link we used to search for the keyword in Google:
Then, we located the first result using Firebug inspector. It gives us the path to the DOM element for the result.
You can see that the result item is contained in an “h3” element of class “r”, under which the link is found. This can be easily captured by the HTML DOM library as below:
Here, we traversed all the links and extracted their titles and URLs. We found the direct link in “href”. The links are not always direct, though: Google sometimes wraps them in a different structure, with its own redirect reference and the target URL in a parameter. In that case, we used a regular expression to match and extract the URL. Please learn the basics of Perl Compatible Regular Expressions (PCRE) if you have not used them in PHP before.
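Our script for this post is written in PHP. As a rough, self-contained illustration of the same two steps (find the links inside “h3” elements of class “r”, then unwrap redirect-style links with a regular expression), here is a sketch in Python using only the standard library. The /url?q=... redirect shape, the sample HTML, and every name in the code are assumptions made for this example, and Google’s markup may well have changed since this was written.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import unquote

# Assumed redirect shape: /url?q=<percent-encoded target>&sa=...
REDIRECT_RE = re.compile(r"^/url\?(?:.*&)?q=([^&]+)")


def clean_link(href: str) -> str:
    """Return the target URL, unwrapping a redirect-style link if needed."""
    match = REDIRECT_RE.match(href)
    return unquote(match.group(1)) if match else href


class ResultParser(HTMLParser):
    """Collect (title, url) pairs from <a> tags inside <h3 class="r">."""

    def __init__(self) -> None:
        super().__init__()
        self.results: List[Tuple[str, str]] = []
        self._in_result_heading = False
        self._href: Optional[str] = None
        self._title_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "h3" and "r" in (attr_map.get("class") or "").split():
            self._in_result_heading = True
        elif tag == "a" and self._in_result_heading and self._href is None:
            self._href = attr_map.get("href") or ""
            self._title_parts = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text inside the link
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._title_parts).strip()
            self.results.append((title, clean_link(self._href)))
            self._href = None
        elif tag == "h3":
            self._in_result_heading = False


def extract_results(html: str) -> List[Tuple[str, str]]:
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up markup that imitates the structure described above.
SAMPLE_HTML = """
<h3 class="r"><a href="http://example.com/direct">Direct <em>result</em></a></h3>
<h3 class="r"><a href="/url?q=http%3A%2F%2Fexample.org%2Fpage%3Fid%3D1&amp;sa=U">Redirected result</a></h3>
<h3 class="other"><a href="http://example.com/ignored">Not a result</a></h3>
"""

if __name__ == "__main__":
    for title, url in extract_results(SAMPLE_HTML):
        print(title, "->", url)
```

The regular expression only handles the one redirect shape assumed here; any other wrapper format would need its own pattern.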
Finally, my result looked like this (as of Jan 08, 2013):
If you want the script as a file, please follow this link:
For developers – please feel free to ask questions, we are eager to help you learn technologies related to web scraping.
For webmasters – please contact us to develop your complex web scraping solution.
Thanks for reading.
January 4th, 2013 |
Data Extraction, Showcase, Web Crawling, Web Scraping
Horse Racing . Horse Racing Results . Horse Racing Scraper . Racing Results
We have been scraping horse racing results for quite a few years now, but the service wasn’t ready for subscription. We have now prepared it for subscription. Racing results are currently set up for the following countries:
- Hong Kong
- South Africa
Upon subscription, we give you access to the following:
- All past results (to date) from your subscribed countries
- An “On Demand Scraper” to help you instantly scrape results of any date from your subscribed countries
- Facility to get your results as CSV or in your email
- Facility to host and analyze your results in the given panel
These are the current features, but we will be adding more over time based on your feedback and expectations.
Take a look at our On Demand Scraping:
You can download & check a sample CSV result as well: Sample Racing Results
Our subscription rate is very reasonable:
Package # 1: For two or fewer countries -> $30/month
Package # 2: For four countries -> $40/month
Package # 3: For all countries -> $55/month
If you report any downtime of our service to us, we will deduct a fair amount from your invoice.
Please contact us for more details.
November 26th, 2012 |
Property Scraping, Scraping in Drupal, Web Crawling, Web Scraping
Crawling Property Contents . Property Scraping . Real Estate Scraping
As you may know from our site, we have a real estate property scraping framework, and we wanted to keep you updated on our latest activity with it. We recently crawled quite a few property sites using the framework. Our crawling was quick, and we were able to deliver results within a very short time and, of course, at a very reasonable cost.
We cannot share all the sites, only two (with the clients’ consent):
If you have a similar need and want to see sample results before trying our real estate scraping service, please feel free to contact us. We will happily welcome you and provide sample data in your preferred format.
November 26th, 2012 |
Scraping for CMS, Scraping in Drupal, Web Crawling, Web Scraping
We have customized Drupal’s Feeds module to aggregate custom data from multiple sources, including websites, feeds, local or remote databases, local files, media, etc. We applied extensive customization to add many capabilities. For example, our customization of the “Fetcher” adds the ability to download from complex sites that require a login to access data. We also added the ability to parse many difficult data types in the “Parser”. And our customization of the “Processor” can insert or update multiple nodes/destinations and serve many purposes, such as updating ratings, referencing other nodes, posting to Twitter/Facebook, and more.
Please feel free to contact us for any custom aggregation services for your Drupal site.