Many online retailers with an affiliate program make their product data available as a 'data feed'. Typically these can be downloaded as either a CSV or an XML file.
Manually downloading feeds, cleaning them, extracting the relevant products, and then uploading them to your website is a time-consuming and often tedious task. It makes sense to automate this process. After all, one thing that computers are very good at is processing large amounts of data very quickly!
It is a standard requirement to be able to download data feeds automatically on a daily basis and update your website as appropriate. This is done by an automated script that runs at a preset time every day (or night). Automated scripts that run on a schedule are often referred to as 'cron jobs'.
So, what happens?
The script connects to the affiliate network's server and downloads the required data feeds to your own server (shared hosting is fine). Another process is then run to perform any necessary cleaning, modification and categorisation of the data. The clean data is then inserted into a database table that is accessed by the customer-facing website.
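The three steps above (download, clean, insert) can be sketched in a few lines of Python. This is only a minimal illustration, not a description of any particular network's feed: the column names sku, name and price, the products table and the update_site function are all assumptions made for the example, and a real feed will need its own cleaning rules.

```python
import csv
import io
import sqlite3
import urllib.request


def download_feed(url):
    """Fetch the raw feed from the network's server and return it as text."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8")


def clean_rows(feed_text):
    """Yield (sku, name, price) tuples, skipping rows that cannot be used.

    The column names are assumed for this sketch; real feeds differ.
    """
    for row in csv.DictReader(io.StringIO(feed_text)):
        sku = (row.get("sku") or "").strip()
        name = (row.get("name") or "").strip()
        try:
            price = float((row.get("price") or "").strip())
        except ValueError:
            continue  # missing or non-numeric price
        if sku and name:
            yield sku, name, price


def load_products(conn, rows):
    """Insert cleaned rows, replacing any product already stored under the same SKU."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS products ("
        "sku TEXT PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR REPLACE INTO products (sku, name, price) VALUES (?, ?, ?)",
            rows,
        )


def update_site(feed_url, db_path):
    """Run the whole download, clean and insert sequence once."""
    conn = sqlite3.connect(db_path)
    try:
        load_products(conn, clean_rows(download_feed(feed_url)))
    finally:
        conn.close()
```

A scheduled job would then only need to call update_site with the feed's address and the path to the site's database. SQLite is used here purely because it ships with Python; the same idea applies to whichever database your website already uses.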
In short, your website is kept up to date with current products listed at the correct price, without you having to actually do anything. Well, you could use the time to write some more content, or do some link building :-)
One common problem with data feeds is that the categories often don't match between one merchant and another. For example, Merchant A might put an iPod 16GB in the iPod category, while Merchant B puts it in the MP3 player category. Affiliate networks have attempted to solve this problem by allowing merchants to 'map' their categories against the network's categories. Whilst this does help to a small extent, there are still several problems. Merchants often don't map their categories very well, frequently opting to dump everything into very general categories. As the networks are providing a categorisation system for just about any product available on the net, it is not surprising that their categories are often not defined precisely enough. A further problem is that the networks don't share a standardised categorisation system between themselves.
While on the surface this sounds like a hopeless situation, the networks' categories can still come in very handy when writing automated categorisation rules, as you can choose on a per-merchant basis whether to use any of this data.
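As a rough illustration of what per-merchant categorisation rules might look like, here is a small Python sketch. The merchant names, keyword patterns, category names and the use_network_category flag are all hypothetical, chosen only to mirror the iPod example above; real rules would be built up from the feeds you actually handle.

```python
import re

# Hypothetical keyword rules: the first pattern that matches the product name wins.
KEYWORD_RULES = [
    (re.compile(r"\bipod\b", re.IGNORECASE), "MP3 Players"),
    (re.compile(r"\bmp3\b", re.IGNORECASE), "MP3 Players"),
]

# Hypothetical per-merchant switch: trust the network's category only where
# the merchant is known to map its products sensibly.
MERCHANT_CONFIG = {
    "merchant_a": {"use_network_category": True},
    "merchant_b": {"use_network_category": False},
}

# Hypothetical mapping from a network's categories to the site's own categories.
NETWORK_CATEGORY_MAP = {
    "Portable Audio": "MP3 Players",
}


def categorise(merchant, product_name, network_category):
    """Return the site category for a product.

    Keyword rules are tried first. The network's category is used as a
    fallback only for merchants that have it switched on.
    """
    for pattern, category in KEYWORD_RULES:
        if pattern.search(product_name):
            return category
    config = MERCHANT_CONFIG.get(merchant, {})
    if config.get("use_network_category"):
        return NETWORK_CATEGORY_MAP.get(network_category, "Uncategorised")
    return "Uncategorised"
```

With these example settings, an iPod lands in the same category whichever merchant lists it, while a product that matches no keyword is categorised from the network's data only for merchants whose mapping is trusted.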
For one reason or another, data feeds are sometimes published with duplicate data in them, i.e. the same product is listed more than once.
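Duplicates of this kind are usually straightforward to strip out during the cleaning step. The Python sketch below keeps the first occurrence of each product and drops the rest. It assumes each product is a dictionary and that the merchant name plus a sku field identifies a product uniquely; both the field names and the dedupe function are assumptions for illustration, and some feeds will need a different key.

```python
def dedupe(products, key_fields=("merchant", "sku")):
    """Return the products with repeats removed, keeping the first of each.

    Two products count as duplicates when all key fields match after
    trimming whitespace and ignoring case.
    """
    seen = set()
    unique = []
    for product in products:
        key = tuple(str(product.get(field, "")).strip().lower() for field in key_fields)
        if key in seen:
            continue  # already kept an earlier copy of this product
        seen.add(key)
        unique.append(product)
    return unique
```

The same SKU from two different merchants is deliberately kept, since those are separate listings rather than duplicates.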
Matching more than a few products across retailers for price comparison purposes can be very time-consuming. We've developed a system that combines automated matching with the sophistication of a human actually
