Secret societies of reddit

Out there on the wild internet there are many dark corridors and places we’ll never be able to visit. Understandably. But on reddit?! I think the people deserve to know, at least vaguely, the inner workings of their contentocracy. Here’s a list of a few whose closed doors are all most of us will ever see:

What are we voting for? What’s running this voting machine?

</tinfoil_hat>

redditp – a fullscreen presentation with reddit

tl;dr – add a “p” before the “.com” to any subreddit you visit and voila, you have a fullscreen presentation of all the images.

I like to show my friends cool stuff on the internet, but browsing is a real conversation killer. You can’t really lean back, talk and have fun with friends while operating a website, certainly not one as clunky as reddit, even though RES does help.

So I just had to make this “hands-free” reddit mode. Where I can see:

Easy!

Welp, not that easy. There was a lot of CSS to handle, and the design right now is dead ugly but functional. Also, many stories on reddit aren’t images, and I skip the ones that aren’t using a quirky test: if the URL’s 4th character from the right is a dot, I display it. That’s a hack that works for imgur (which hosts most of reddit’s images), so I’m using it for now until I have more time to fix it. Any suggestions are more than welcome - help improve redditp on github! Also, comics are a pain to read right now. I might implement some sort of scroll wheel zooming in the future, though that really is a different use case that might deserve a different site.
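To make the dot hack concrete, here is a minimal sketch of the check in Python. It is not redditp's actual code, just the same rule restated, and the function name and example URLs are made up for illustration.

```python
def looks_like_image(url: str) -> bool:
    """Return True when the 4th character from the right is a dot.

    This mirrors the hack described above: it accepts anything that ends in
    a three-letter extension, whether or not it is really an image.
    """
    return len(url) >= 4 and url[-4] == "."


# Hypothetical example URLs, for illustration only.
candidates = [
    "http://i.imgur.com/abc123.jpg",    # ".jpg": dot is 4th from the right
    "http://imgur.com/gallery/abc123",  # no extension, skipped
    "http://example.com/photo.jpeg",    # four-letter extension, skipped
]
images = [url for url in candidates if looks_like_image(url)]
print(images)
```

The weak spots show up quickly: ".jpeg" links are missed, while anything ending in ".htm" or even a bare ".com" passes the test. That is why it only counts as a hack.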

I guess not too surprisingly, the first 200 visits were mostly to gonewild. You internet you….

edit - here are some stats from the launch night

redditp launch night stats


Ah the old Reddit switch-a-roo analyzed

So after clicking through what seemed like an infinite number of tabs from one of these switcheroo comments, I finally wrote a script to analyze the graph. I’d suggest you ignore the following png and take a gander at the network pdf of the switcharoo graph, because there you can click through to the links.

The old reddit switch-a-roo analyzed image
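For the curious, the link-following part can be sketched roughly like this. It is not the script I actually ran: `crawl_links`, `get_links` and the tiny `fake_web` dictionary are all hypothetical stand-ins, and the real thing would have to download each comment page and pull the links out of it.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def crawl_links(
    start: str,
    get_links: Callable[[str], Iterable[str]],
    max_pages: int = 1000,
) -> List[Tuple[str, str]]:
    """Breadth-first walk from `start`, returning (source, target) edges."""
    edges: List[Tuple[str, str]] = []
    seen = {start}
    queue = deque([start])
    while queue and len(seen) < max_pages:
        page = queue.popleft()
        for target in get_links(page):
            edges.append((page, target))
            if target not in seen:  # visit each page only once
                seen.add(target)
                queue.append(target)
    return edges


# Stand-in for real page fetching: a made-up four-page "web".
fake_web = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
edges = crawl_links("a", lambda page: fake_web.get(page, []))
nodes = {name for edge in edges for name in edge}
print(len(nodes), "nodes,", len(edges), "edges")
```

Keeping a `seen` set matters here because switcheroo chains can loop back on themselves, and without it the crawl would never finish.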

To recap – 50 nodes, 52 edges, though there are probably more out there that link into some point of that chain. And here are the awards:

There. I hope that didn’t take away from the magic.

Appendix – The hardships

This was harder than it should have been. First of all, NSFW links gave me the “are you over 18?” prompt, which for some reason I wasn’t able to get past with cookies. I eventually turned to the mobile version of the site (append “.compact” to the URL) to avoid the prompts completely. Also, matplotlib and networkx aren’t that fun for drawing graphs, it seems. To visualize and output the graph I eventually used gephi, which was fairly easy although it has its own clunky baggage.
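Two of those workarounds are easy to sketch. The snippet below is an illustration under assumptions, not the code I used: `compact_url` simply appends ".compact" to the path of a URL (keeping any query string), and `edges_to_csv` writes an edge list with "Source" and "Target" columns, which is a layout Gephi's spreadsheet import should accept. Both function names and the example URL are hypothetical.

```python
import csv
import io
from urllib.parse import urlsplit, urlunsplit


def compact_url(url: str) -> str:
    """Append ".compact" to the URL path, leaving the query string alone."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=parts.path + ".compact"))


def edges_to_csv(edges) -> str:
    """Serialize (source, target) pairs as a CSV edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])  # assumed Gephi column names
    writer.writerows(edges)
    return buffer.getvalue()


# Hypothetical comment URL, for illustration only.
print(compact_url("http://www.reddit.com/r/pics/comments/abc/?context=3"))
print(edges_to_csv([("a", "b"), ("b", "c")]))
```

Handing the graph to a dedicated tool as a plain edge list keeps the crawling script free of any drawing code.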

Statistics on reddit’s top 10,000 titles with NLTK

Drawing inspiration from this blog post on title virality I wanted to investigate what makes these top 10,000 titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:

  • On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).
  • 5.2 years between the oldest and newest submission
  • 8,331,382 comments. That’s about 833 comments per submission.
  • The #1 post has 26,758 – 4,882 = 21,876 points
  • The #10,000 post has 15,166 - 13,679 = 1,487 points
  • And now some graphs….
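The arithmetic in the list above is easy to double-check. This little sketch only re-derives the percentages and point totals from the numbers already quoted; the variable names are mine.

```python
upvotes = 82_751_429
downvotes = 62_655_532
comments = 8_331_382
submissions = 10_000

# Share of all votes that were upvotes.
liked_percent = 100 * upvotes / (upvotes + downvotes)
# Average number of comments per submission.
comments_per_submission = comments / submissions
# Points are upvotes minus downvotes.
top_points = 26_758 - 4_882
last_points = 15_166 - 13_679

print(f"{liked_percent:.1f}% liked it")
print(f"{comments_per_submission:.0f} comments per submission")
print(top_points, last_points)
```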

Adjectives – reddit loves “new”, “old”, “good” and “right”

Adjectives

Top Adjective, Superlative – “Best” is the best

Questions – reddit loves “how?”

Questions

What’s reddit talking about? People.

Or news, the president, man…

Reddit appreciates personal content about you, this, it and I.

Even NLTK doesn’t understand these…

I’m pretty sure you don’t need example links for these…
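The part-of-speech breakdowns above all boil down to the same counting step, which can be sketched like this. It is a simplified stand-in for what I did, assuming titles have already been tagged with Penn Treebank tags (the kind `nltk.pos_tag` produces). The sample titles are hand-tagged and made up, and `top_words` is a hypothetical helper.

```python
from collections import Counter


def top_words(tagged_titles, tags, n=3):
    """Count lowercased words whose tag is in `tags`; return the n most common."""
    counts = Counter(
        word.lower()
        for title in tagged_titles
        for word, tag in title
        if tag in tags
    )
    return counts.most_common(n)


# Hand-tagged, made-up titles. With NLTK installed, pairs like these could
# come from nltk.pos_tag(nltk.word_tokenize(title)) instead.
tagged_titles = [
    [("The", "DT"), ("best", "JJS"), ("new", "JJ"), ("idea", "NN")],
    [("How", "WRB"), ("is", "VBZ"), ("this", "DT"), ("new", "JJ"), ("?", ".")],
    [("An", "DT"), ("old", "JJ"), ("photo", "NN")],
]

adjectives = top_words(tagged_titles, {"JJ"})
superlatives = top_words(tagged_titles, {"JJS"})
question_words = top_words(tagged_titles, {"WDT", "WP", "WP$", "WRB"})
print(adjectives, superlatives, question_words)
```

Swapping the tag set is all it takes to move from adjectives to superlatives, question words, nouns or pronouns.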

The top 10,000 seem to come mostly from 17:00 UTC and rarely from around 12:00 UTC

This isn’t exactly the probability of hitting the front page, as it’s not clear at what time of day the overall submission count is highest. But it’s something.
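A rough sketch of how such an hour-of-day count could be built, assuming each submission comes with a creation time as a Unix timestamp in seconds. The function name and the sample timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def submissions_per_utc_hour(timestamps):
    """Count submissions by the UTC hour (0-23) in which they were created."""
    return Counter(
        datetime.fromtimestamp(ts, tz=timezone.utc).hour for ts in timestamps
    )


# Made-up Unix timestamps: three in the 17:00 hour, one in the 12:00 hour.
sample = [17 * 3600, 17 * 3600 + 1800, 12 * 3600, 86400 + 17 * 3600]
hours = submissions_per_utc_hour(sample)
print(hours.most_common())
```

To turn this into an actual success rate, each hour's count would have to be divided by the number of all submissions made in that hour, which is exactly the figure I don't have.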

An apology

This is my first time using NLTK, and though I’m ok at coding, I most certainly have no idea how to parse natural language. Here’s hoping this was somewhat insightful.

I have no idea what I'm doing

Appendix