Building a Steam Recommendation Engine Part 1 – Steam Game List

Part 0 – Intro

On a semi-related note, I registered the domain “steamredux.com” for this project. Apparently, redux is Latin for “revived from the dead”. That means SteamRedux makes all of zero sense. Aaaaaaand there goes ten bucks that could have bought me a six pack of something 6.7%.

Googling for a list of Steam games got me nowhere. I landed on the Steam search page which contained a promising little footer showing 1 - 25 of 6691, indicating that I could just walk through the pages to find every Steam game available. This revelation was “the hard part” of creating a recommendation engine, probably.

My day job is writing javascript with jQuery for DOM manipulation, and my Github account has a couple ignored nodejs projects. This technology stack is perfect for screen scraping. Instead, the code here uses Ruby and the Nokogiri gem. I don’t know Ruby at all (or Latin it seems). Ruby has a bunch of pimp features that I’m not using. Please gloss over the completely synchronous network calls and unclear code, it’s not a reflection of the language, just my recovery Sunday laziness. Even my dog is exhausted from a big night chewing bones. (Something something your mom)

Pup Taking a Nap

Okay, brass tacs. The Steam search page http://store.steampowered.com/search/ ajaxes in full blocks of HTML from http://store.steampowered.com/search/results with a “page” GET parameter. In that page is an empty div full of blocks that look like this:

The highly sophisticated code below is what fetches all the games, and parses out the content from the above block of HTML a few thousand times. Thank you computer. Comments say what each part does because making 8 Gists sounds worse than a hangover.

If you are only here to rip off a list of games, open this up in excel. Otherwise clone it like you know what you’re doing.

git://github.com/johnnyfuchs/steamredux.git

Branch “blog_part_1” will have the code for this post. Next week I might go down the rabbit whole of putting these in a Neo4j database. I also might be setting up the script to scrape user data. Whichever I’m least likely to not do.

Advertisements
Standard