Building a Steam Recommendation Engine Part 1.5 – Users

Part 0 – Intro
Part 1 – Game List

The gentle tingle of a wine-washed Tropicana bottle has encouraged me to finish the script that grabs Steam user data. That, and some unheeded Netflix telling my subconscious that I’m not drinking and coding alone on a Friday night. My regular-conscious, on the other hand, is fucking jazzed to be drinking and coding.

“To build a recommendation engine, one must be handed recommendations on a platter.” – Albert Ghandi Jesus

If you have a public profile, anyone can hit up at{{your steam name}}/games/?tab=all. Behold! All your games in a full, unpaginated request. Too good to be true? No. It’s better. One of the script tags on the page defines a global js variable called rgGames, a full JSON array of games that you really don’t need to do shit to.

More beholding! A script that just pulls this JSON rgGames object and then wastes some cycles for text output formatting because I don’t know what I’m doing with it yet. (Edit: Well, shit. It looks like steam has a public API for profiles. It requires some extra setup, but I want to be a good steward of their data.)

Dump that output to a text file and out get:

There you have it, the sad, skewed report of my Steam game playtime. This miniscule and distorted data set isn’t going to do a very good job suggesting games. I might as well replace all this code with:

10 PRINT "Assassins Creed III"
20 GOTO 10

A good recommendation uses your previous preferences to infer what you will probably like, and you’ll probably like what other people who share your likes like. So we need to find people like you. Fortunately, I found some “volunteers” to start seeding a database with:

Steam Player Community

Thank you vibrant Steam community portal! The following script walks through discussion forum comment threads and pulls out usernames or profile ids.

There is a few second delay between network calls to not be a complete bitch to Steam’s servers, or flagged as an obvious bot, whatever. After a few minutes we are up to a few thousand profiles. The best part is seeing the same names appear dozens of times in a row. This isn’t Youtube guys, save the comment spam for your diary. So now we have a fairly lengthy list of usernames. ( Edit: I just did a little linux sort | uniq | wc -l magic and got a whopping 252 unique users out of 30K scrapped names… which barely rounds up to 1% unique rate… REALLY? )

blog_part_1_5 of git:// has the project code in the state of this post. Next post will be starting what the title promises, super fucking cool database stuff.


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s