Building a Steam Recommendation Engine Part 1.5 – Users

Part 0 – Intro
Part 1 – Game List

The gentle tingle of a wine-washed Tropicana bottle has encouraged me to finish the script that grabs Steam user data. That, and some unheeded Netflix telling my subconscious that I’m not drinking and coding alone on a Friday night. My regular-conscious, on the other hand, is fucking jazzed to be drinking and coding.

“To build a recommendation engine, one must be handed recommendations on a platter.” – Albert Ghandi Jesus

If you have a public profile, anyone can hit up http://steamcommunity.com/id/{{your steam name}}/games/?tab=all. Behold! All your games in a full, unpaginated response. Too good to be true? No. It’s better. One of the script tags on the page defines a global JS variable called rgGames, a full JSON array of games that you really don’t need to do shit to.
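A minimal sketch of that extraction, using only Ruby's standard library. It assumes the page contains a statement shaped like `var rgGames = [...];`. The helper name `extract_rg_games`, the sample HTML, and the `hours_forever` field are illustrative assumptions, not a guaranteed description of Steam's markup.

```ruby
require 'json'

# Pull the rgGames array out of a profile page's HTML.
# Assumption: the page has a statement like `var rgGames = [...];`.
# The non-greedy match stops at the first "];", so a game name that
# happens to contain "];" would trip this up.
def extract_rg_games(html)
  match = html.match(/var\s+rgGames\s*=\s*(\[.*?\]);/m)
  return [] unless match
  JSON.parse(match[1])
end

# Stand-in for a fetched page (made-up data, not a real profile).
sample_html = <<~HTML
  <script language="javascript">
    var rgGames = [{"appid":220,"name":"Half-Life 2","hours_forever":"12.5"},{"appid":400,"name":"Portal"}];
  </script>
HTML

games = extract_rg_games(sample_html)
games.each do |game|
  # Games with no recorded playtime may omit the hours field entirely.
  puts "#{game['name']}: #{game.fetch('hours_forever', '0')} hours"
end
```

To run it against a real profile, swap `sample_html` for the body of an HTTP GET (for example via `Net::HTTP.get(URI(url))`).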

More beholding! A script that just pulls this JSON rgGames object and then wastes some cycles for text output formatting because I don’t know what I’m doing with it yet. (Edit: Well, shit. It looks like Steam has a public API for profiles. It requires some extra setup, but I want to be a good steward of their data.)

Dump that output to a text file and out you get:

There you have it, the sad, skewed report of my Steam game playtime. This minuscule and distorted data set isn’t going to do a very good job suggesting games. I might as well replace all this code with:

10 PRINT "Assassins Creed III"
20 GOTO 10

A good recommendation uses your previous preferences to infer what you will probably like, and you’ll probably like what other people who share your likes like. So we need to find people like you. Fortunately, I found some “volunteers” to start seeding a database with:

Steam Player Community

Thank you vibrant Steam community portal! The following script walks through discussion forum comment threads and pulls out usernames or profile ids.

There is a delay of a few seconds between network calls so I’m not a complete bitch to Steam’s servers, or flagged as an obvious bot, whatever. After a few minutes we are up to a few thousand profiles. The best part is seeing the same names appear dozens of times in a row. This isn’t YouTube guys, save the comment spam for your diary. So now we have a fairly lengthy list of usernames. (Edit: I just did a little Linux sort | uniq | wc -l magic and got a whopping 252 unique users out of 30K scraped names… which barely rounds up to a 1% unique rate… REALLY?)
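For the curious, here is a rough stdlib-only sketch of that loop: grab profile links out of a page, pause between requests, and count uniques the way sort | uniq | wc -l would. The helper names, the `fetch_page` callable, and the fake thread HTML are all made up for illustration; the only thing taken as given is that profile links look like steamcommunity.com/id/{name} or steamcommunity.com/profiles/{number}.

```ruby
require 'set'

# Find profile links in a chunk of HTML. Covers both vanity URLs
# (/id/name) and numeric ones (/profiles/7656...).
def extract_profiles(html)
  pattern = %r{steamcommunity\.com/(id|profiles)/([^/"'\s?#]+)}
  html.scan(pattern).map { |kind, value| "#{kind}/#{value}" }
end

# Walk a list of thread URLs, sleeping between requests.
# fetch_page is any callable that takes a URL and returns HTML
# (hypothetical; plug in a real HTTP client here).
def crawl(urls, fetch_page, delay_seconds: 3)
  seen  = Set.new
  total = 0
  urls.each_with_index do |url, index|
    sleep(delay_seconds) if index > 0 # be polite between network calls
    found = extract_profiles(fetch_page.call(url))
    total += found.length
    seen.merge(found)
  end
  [seen, total]
end

# Fake threads standing in for real forum pages (made-up users).
fake_pages = {
  'thread-1' => '<a href="http://steamcommunity.com/id/example_user">a</a> ' \
                '<a href="http://steamcommunity.com/id/example_user">a</a>',
  'thread-2' => '<a href="http://steamcommunity.com/profiles/76561190000000001">b</a> ' \
                '<a href="http://steamcommunity.com/id/example_user">a</a>'
}

unique, total = crawl(fake_pages.keys, ->(url) { fake_pages.fetch(url) }, delay_seconds: 0)
puts "#{unique.size} unique out of #{total} scraped"
```

Keeping a Set while crawling means the duplicate rate is visible as you go, instead of after 30K names are already on disk.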

blog_part_1_5 of git://github.com/johnnyfuchs/steamredux.git has the project code in the state of this post. Next post will be starting what the title promises, super fucking cool database stuff.


Building a Steam Recommendation Engine Part 1 – Steam Game List

Part 0 – Intro

On a semi-related note, I registered the domain “steamredux.com” for this project. Apparently, redux is Latin for “revived from the dead”. That means SteamRedux makes all of zero sense. Aaaaaaand there goes ten bucks that could have bought me a six pack of something 6.7%.

Googling for a list of Steam games got me nowhere. I landed on the Steam search page which contained a promising little footer showing 1 - 25 of 6691, indicating that I could just walk through the pages to find every Steam game available. This revelation was “the hard part” of creating a recommendation engine, probably.
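The arithmetic behind "just walk through the pages", as a quick Ruby sketch. The footer text is the one quoted above; the `parse_footer` helper is a hypothetical name, and it assumes the footer always reads "first - last of total".

```ruby
# Turn a footer like "1 - 25 of 6691" into numbers.
# Assumption: the footer always has the shape "first - last of total".
def parse_footer(text)
  match = text.match(/(\d+)\s*-\s*(\d+)\s+of\s+(\d+)/)
  return nil unless match
  first, last, total = match.captures.map(&:to_i)
  { per_page: last - first + 1, total: total }
end

footer = parse_footer('1 - 25 of 6691')

# Round up: a partially filled last page still needs a request.
page_count     = (footer[:total].to_f / footer[:per_page]).ceil
last_page_size = footer[:total] - (page_count - 1) * footer[:per_page]

puts "#{page_count} pages, last one holds #{last_page_size} games"
```

With those footer numbers that works out to 268 requests, the last of which comes back with 16 games.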

My day job is writing JavaScript with jQuery for DOM manipulation, and my GitHub account has a couple of ignored Node.js projects. This technology stack is perfect for screen scraping. Instead, the code here uses Ruby and the Nokogiri gem. I don’t know Ruby at all (or Latin it seems). Ruby has a bunch of pimp features that I’m not using. Please gloss over the completely synchronous network calls and unclear code; it’s not a reflection of the language, just my recovery Sunday laziness. Even my dog is exhausted from a big night chewing bones. (Something something your mom)

Pup Taking a Nap

Okay, brass tacks. The Steam search page http://store.steampowered.com/search/ ajaxes in full blocks of HTML from http://store.steampowered.com/search/results with a “page” GET parameter. In that page is an empty div full of blocks that look like this:

The highly sophisticated code below is what fetches all the games, and parses out the content from the above block of HTML a few thousand times. Thank you computer. Comments say what each part does because making 8 Gists sounds worse than a hangover.
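Roughly, the idea looks like the following stdlib-only sketch. This is not the Nokogiri version from the repo: it uses regexes so it runs without any gems, and the sample markup is a made-up stand-in for a result block. The only things it leans on are the “page” GET parameter and the fact that store links carry the game id as /app/{id}/. The helper names are hypothetical.

```ruby
require 'uri'

# Build the results URL for a given page number.
def results_url(page)
  uri = URI('http://store.steampowered.com/search/results')
  uri.query = URI.encode_www_form(page: page)
  uri.to_s
end

# Pull app ids and titles out of a chunk of result HTML.
# Assumption: each result is an <a> whose href contains /app/<id>/
# and whose inner text is the title. Real markup will be messier.
def extract_games(html)
  link = %r{<a[^>]+href="[^"]*/app/(\d+)/[^"]*"[^>]*>(.*?)</a>}m
  html.scan(link).map do |app_id, inner|
    # Strip nested tags and squeeze whitespace to get a clean title.
    { app_id: app_id.to_i, title: inner.gsub(/<[^>]+>/, ' ').split.join(' ') }
  end
end

# Hypothetical markup standing in for one page of results.
sample_results = <<~HTML
  <a href="http://store.steampowered.com/app/220/"><h4>Half-Life 2</h4></a>
  <a class="row" href="http://store.steampowered.com/app/400/?snr=1"><h4>Portal</h4></a>
HTML

puts results_url(2)
extract_games(sample_results).each do |game|
  puts "#{game[:app_id]}\t#{game[:title]}"
end
```

A proper HTML parser like Nokogiri is the saner choice once the markup gets nested; the regex here only survives because the sample is tiny.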

If you are only here to rip off a list of games, open this up in Excel. Otherwise clone it like you know what you’re doing.

git://github.com/johnnyfuchs/steamredux.git

Branch “blog_part_1” will have the code for this post. Next week I might go down the rabbit hole of putting these in a Neo4j database. I also might be setting up the script to scrape user data. Whichever I’m least likely to not do.


Building a Steam Recommendation Engine Part 0

I thought it’d be fun to make a Steam game rating site. But apparently (and expectedly) that exists. Instead, I plan to figure out how graph databases work. This was not inspired by Facebook’s graph search, because I can’t possibly see how my “friends’” restaurant recommendations could be more valuable than, you know, an aggregate of everyone’s or experts’. Nor was it inspired by the amazing ability to see which of my friends of friends like dogs or cheese or both. Spoiler: all of them.

A little more inspiring is the use of graph databases in studying gene interaction, illness diagnosis, path finding, and to the point, a video game recommendation engine.

Mostly to keep me focused, I’ll be chronicling the progress of putting this loosely planned project together. We (royal ‘we’, not author and readers ‘we’) are going to need a few things to make this happen.

  1. A list of Steam games. Just scraping these.
  2. Some Steam users and the lists of the games that they play or like. I dunno yet. Maybe I can scrape these too.
  3. A graph database! I’m going to use Neo4j because I heard a talk on it and Heroku has a free add-on. Plus, I trust companies of Swedish origin. Right? IKEA?
  4. A little web site to, you know, recommend stuff to people. I’ll do that last, or part way.

Four things? Nice project management, we. Next post I’ll cover part one, because I already wrote that.
