Creating a Price Crawler to Stay Competitive – A Creative, White Hat Hack

14 Sep 2011 | License: CPOL | 4 min read
I’m sharing this story as an example of what I consider thinking outside of the box.  I was a bit younger when I did this, so at the time it was a fairly big deal for me.

Several years ago, the CEO/owner of the company I was working for at the time presented me with a challenge.  Bottom line: he wanted his web store prices to compete with those of one of his major competitors.  It was one of many “Is there a way…” requests our small IT shop got from time to time.

He wanted a [digital] price sheet from our competitor in order to create a new pricing strategy (at the time we were basically adding a flat markup to our cost).  At first, I doubted there was a way to do this without nefarious activity, since the competing company wasn’t going to publish a public spreadsheet of their prices.  So the question then became “How can I get their price for each product that we also sell?”

First things first: their website.  It seemed basic enough, with a categorized product structure and a nice search box.  But there was also an “Advanced Search” option.  I clicked the link and saw that a user could choose to search ONLY by manufacturer part number (MPN).  So I went ahead and searched for a product we also sold (I think you may know where I’m going here…).

Their developers were very nice: because there was only one result for that product, the search page redirected straight to the product page.  Even better, all the fields from the form were sent via HTTP GET, so they were visible in the URL.  Once I saw that, I realized I was already halfway done.

So, next I parsed out the URL/query strings and figured out which variable to change to the desired MPN.

Example:

http://www.domain.com/search.aspx?foo=bar&moe=search&ad=y&mpn=xyz

I tested it directly with another MPN, and bam – it sent me straight to THAT product’s page.
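For readers who want to see the query string substitution spelled out, here is a minimal Python sketch (the original work was done in ColdFusion).  It reuses the placeholder URL from the example above; the function name `url_for_mpn` and the sample part number are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Placeholder URL from the article; the parameter names are its placeholders too.
TEMPLATE_URL = "http://www.domain.com/search.aspx?foo=bar&moe=search&ad=y&mpn=xyz"


def url_for_mpn(template_url: str, mpn: str, param: str = "mpn") -> str:
    """Return template_url with the value of `param` replaced by `mpn`."""
    parts = urlsplit(template_url)
    # Keep every other form field exactly as the search form sent it.
    query = [
        (key, mpn if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes the MPN, so spaces or slashes stay valid.
    return urlunsplit(parts._replace(query=urlencode(query)))


print(url_for_mpn(TEMPLATE_URL, "ABC-123"))
# http://www.domain.com/search.aspx?foo=bar&moe=search&ad=y&mpn=ABC-123
```

Rebuilding the query with `urlencode` rather than string concatenation means part numbers containing spaces or slashes are escaped correctly.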

Our group used ColdFusion (CF) at the time, so the next step was to create a dummy page to see if I could program a simple script to grab these result pages for a list of given MPNs.  I requested a result page using CFHTTP (of course, you can do this with all kinds of different languages) and dumped out the result.  As expected, it was a whole bunch of HTML code.
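As a rough equivalent of that CFHTTP call in another language, the following Python sketch fetches a page and returns its HTML as text.  The function name `fetch_html`, the 10 second timeout, and the `opener` parameter (there only so the function can be exercised without touching the network) are assumptions for illustration, not part of the original script.

```python
import urllib.request


def fetch_html(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> str:
    """Fetch `url` and return the response body decoded as text."""
    # urlopen follows HTTP redirects, so a search URL that redirects
    # to a product page yields the product page's HTML.
    with opener(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Usage (not executed here, since it needs network access):
#   html = fetch_html("http://www.domain.com/search.aspx?foo=bar&moe=search&ad=y&mpn=xyz")
#   print(html[:500])
```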

Then, I had to Google some basic regular expressions, because I knew I would need one or two (and to refresh my memory on how to use them).  Then, I searched through that mess of HTML and found some particular text patterns that repeated right before the price and right after.  Next, using some basic ColdFusion functions, I created the search routines to find those patterns, stripped out all the stuff before and after my string patterns, and there it was: just the price.  I almost couldn’t believe how little time it took to get this done.
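The same idea can be sketched in Python with a single regular expression.  The article does not say which text surrounded the price, so the `BEFORE` and `AFTER` markers below are purely hypothetical, as are the function name and the sample HTML; on a real page you would substitute whatever text reliably brackets the price.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical markers: stand-ins for the text found right before and after the price.
BEFORE = '<span class="price">'
AFTER = "</span>"

# Capture digits with optional thousands separators and optional cents.
PRICE_PATTERN = re.compile(
    re.escape(BEFORE) + r"\s*\$?\s*([\d,]+(?:\.\d{1,2})?)\s*" + re.escape(AFTER)
)


def extract_price(html: str) -> Optional[Decimal]:
    """Return the first price found between the markers, or None if absent."""
    match = PRICE_PATTERN.search(html)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return Decimal(match.group(1).replace(",", ""))


sample = '<div><span class="price"> $1,249.99 </span></div>'
print(extract_price(sample))  # 1249.99
```

Returning `None` when the markers are missing makes it easy to skip MPNs the competitor does not carry, or pages whose layout has changed.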

So the final task was to figure out how to automate this and store my results in a database.  I could grab a bunch of MPNs from our database and loop through them, but it was a ColdFusion script, so a single long-running request would probably time out.  Then, I realized we had just purchased a Google Search Appliance that crawled our webservers.  I created a single CF script that could take an MPN as a query string, parse our competitor’s site’s response, and stick the price in a database.  Then, I created one more script that generated a paginated HTML page of links to my first CF script.

http://intranetlocation/crawl/page.cfm?MPN=xyz

…..

…..

….

NEXT (and so on)
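A page of links like the one outlined above could be generated along these lines.  This is a Python sketch rather than the original ColdFusion; `page.cfm` comes from the example URL, while `list.cfm`, the `page` parameter, the page size, and the function name are assumptions made for illustration.

```python
from html import escape
from urllib.parse import urlencode

BASE_URL = "http://intranetlocation/crawl/page.cfm"  # per-MPN script from the article
LIST_URL = "http://intranetlocation/crawl/list.cfm"  # hypothetical name for the link page


def render_link_page(mpns, page, per_page=100):
    """Return the HTML for one page of crawl links (pages are 1-based)."""
    start = (page - 1) * per_page
    chunk = mpns[start:start + per_page]
    lines = ["<html><body>"]
    for mpn in chunk:
        # One link per MPN; the crawler follows each and triggers a price lookup.
        href = BASE_URL + "?" + urlencode({"MPN": mpn})
        lines.append(f'<a href="{escape(href)}">{escape(mpn)}</a><br>')
    if start + per_page < len(mpns):
        # A NEXT link lets the crawler discover the following page of links.
        next_href = LIST_URL + "?" + urlencode({"page": page + 1})
        lines.append(f'<a href="{escape(next_href)}">NEXT</a>')
    lines.append("</body></html>")
    return "\n".join(lines)


print(render_link_page(["A1", "B2", "C3"], page=1, per_page=2))
```

Because each link triggers only one lookup, no single request runs long enough to time out; the crawler supplies the loop.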

Almost done!  Then, working with my boss, I configured the Google Search Appliance to crawl my script(s), added some time checking (so we weren’t hitting the competitor’s website too hard or too often), made a few other tweaks, and started the crawler.  That was pretty much it.
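The article does not describe how the time checking worked.  One plausible reading, sketched here in Python with SQLite, is to record when each MPN was last looked up and skip any MPN checked too recently.  The table layout, the 24 hour interval, and all function names are assumptions for illustration only.

```python
import sqlite3
import time

MIN_RECHECK_SECONDS = 24 * 60 * 60  # hypothetical: at most one lookup per MPN per day


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) price table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS competitor_price ("
        " mpn TEXT PRIMARY KEY, price TEXT NOT NULL, checked_at REAL NOT NULL)"
    )
    return conn


def is_due(conn, mpn, now=None):
    """Return True if the MPN was never checked or its last check is old enough."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT checked_at FROM competitor_price WHERE mpn = ?", (mpn,)
    ).fetchone()
    return row is None or now - row[0] >= MIN_RECHECK_SECONDS


def save_price(conn, mpn, price, now=None):
    """Insert or update the stored price and record when it was checked."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO competitor_price (mpn, price, checked_at)"
        " VALUES (?, ?, ?)",
        (mpn, str(price), now),
    )
    conn.commit()


conn = open_db()
if is_due(conn, "xyz"):
    # In a real script the fetch and parse steps would run here.
    save_price(conn, "xyz", "19.99")
print(is_due(conn, "xyz"))  # False: it was checked just now
```

With a check like this in the per-MPN script, repeated crawler visits to the same link do not turn into repeated requests against the competitor's site.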

After a couple of weeks of crawling, I presented our data to the CEO, and he then created the new pricing strategy used for web pricing! Mission accomplished!

That’s it in a nutshell.  I left a lot of the detailed technical stuff out because I just wanted to relay the sentiment of the task I was given.  It was one of those experiences where I almost instinctively wanted to say “no, this can’t be done,” but after some creative thinking, the job was completed in a legal and ethical manner and was a huge money saver.  It sounds like a hack job, I know, and maybe in some ways it is, but it was not an illegal hack.  It just automated a process that somebody could easily do manually.

Am I right or wrong? – I would genuinely enjoy arguments for both sides.

Thanks for reading!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
United States