Statistics on reddit’s top 10,000 titles with NLTK

Drawing inspiration from this blog post on title virality, I wanted to investigate what makes these top 10,000 titles the best of their breed. Which are the best superlatives? Who/what’s the most popular subject? Let’s start with some statistics:

  • On Feb. 03, 14:10:45 (UTC) the all-time top 10,000 submissions on reddit (/r/all) had a total of 82,751,429 upvotes and 62,655,532 downvotes (56.9% liked it).
  • 5.2 years between the oldest and newest submission
  • 8,331,382 comments. That’s about 833 comments per submission.
  • The #1 post has 26,758 - 4,882 = 21,876 points
  • The #10,000 post has 15,166 - 13,679 = 1,487 points
  • And now some graphs…
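As a quick sanity check, the arithmetic behind the figures above can be reproduced in a few lines of Python. This is only a sketch: the numbers are copied from the list, and the names (`liked`, `points` and so on) are made up for illustration.

```python
# Figures copied from the statistics listed above.
upvotes = 82_751_429
downvotes = 62_655_532
comments = 8_331_382
submissions = 10_000

# Share of all votes that were upvotes ("liked it").
liked = upvotes / (upvotes + downvotes)

# Average number of comments per submission.
comments_per_submission = comments / submissions


def points(ups, downs):
    """Net score of a submission: upvotes minus downvotes."""
    return ups - downs


print(f"{liked:.1%} liked it")
print(f"about {comments_per_submission:.0f} comments per submission")
print(points(26_758, 4_882), points(15_166, 13_679))
```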

Adjectives – reddit loves “new”, “old”, “good” and “right”

Adjectives

Top Adjective, Superlative – “Best” is the best
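Counts like these can be produced by part-of-speech tagging every title and tallying the words that carry a given tag. The sketch below is not necessarily the code behind the graphs. It assumes NLTK's default tagger, which emits Penn Treebank tags (`JJ` for adjectives, `JJS` for superlatives), and the helper names and sample data are invented for illustration.

```python
from collections import Counter


def tag_titles(titles):
    """Tokenize and POS-tag each title with NLTK's default tagger.

    Requires NLTK and its tokenizer/tagger data (fetched via nltk.download).
    """
    import nltk  # imported lazily so the counting code below runs without it
    return [nltk.pos_tag(nltk.word_tokenize(title)) for title in titles]


def count_words_with_tag(tagged_titles, tag):
    """Count lowercased words whose tag equals `tag` across all titles."""
    counts = Counter()
    for tagged in tagged_titles:
        for word, word_tag in tagged:
            if word_tag == tag:
                counts[word.lower()] += 1
    return counts


# Hand-tagged stand-ins for real titles, in the (word, tag) shape
# that nltk.pos_tag returns.
sample = [
    [("The", "DT"), ("best", "JJS"), ("new", "JJ"), ("idea", "NN")],
    [("A", "DT"), ("new", "JJ"), ("old", "JJ"), ("photo", "NN")],
]

adjectives = count_words_with_tag(sample, "JJ")
superlatives = count_words_with_tag(sample, "JJS")
print(adjectives.most_common(2))
print(superlatives.most_common(1))
```

With real data, `tag_titles(list_of_title_strings)` would replace the hand-tagged `sample`.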

Questions – reddit loves “how?”

Questions

What’s reddit talking about? People.

Or news, the president, man…

Reddit appreciates personal content about you, this, it and I.

Even NLTK doesn’t understand these…

I’m pretty sure you don’t need example links for these…

The top 10,000 seem to be submitted mostly around 17:00 UTC and rarely around 12:00 UTC

This isn’t exactly the probability of hitting the front page, since it’s not clear at what time of day the overall submission count is highest. But it’s something.
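To make that caveat concrete, here is a rough sketch of how the hourly histogram could be built, and how it would have to be normalised to approximate a success rate. It assumes each submission's creation time is available as a Unix timestamp. The per-hour totals for all submissions (`all_counts`) are exactly the data that is missing here, so the numbers below are invented placeholders.

```python
from collections import Counter
from datetime import datetime, timezone


def submissions_per_hour(timestamps):
    """Count submissions by UTC hour of day, given Unix timestamps."""
    hours = Counter()
    for ts in timestamps:
        hours[datetime.fromtimestamp(ts, tz=timezone.utc).hour] += 1
    return hours


def share_reaching_top(top_counts, all_counts):
    """Per-hour fraction of all submissions that ended up in the top list."""
    return {hour: top_counts[hour] / total
            for hour, total in all_counts.items() if total}


# Made-up timestamps: three submissions around 17:00 UTC, one at 12:00 UTC.
sample = [17 * 3600 + 5, 17 * 3600 + 1800, 86400 + 17 * 3600, 12 * 3600]
top_counts = submissions_per_hour(sample)

# Hypothetical totals of *all* submissions per hour (not available here).
all_counts = {17: 300, 12: 50}
rates = share_reaching_top(top_counts, all_counts)
print(top_counts[17], top_counts[12])
print(rates)
```

In this invented example 17:00 has more top submissions, yet 12:00 has the higher success rate, which is why the raw histogram alone can mislead.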

An apology

This is my first time using NLTK, and though I’m OK at coding, I most certainly have no idea how to parse natural language. Here’s hoping this was somewhat insightful.

I have no idea what I'm doing

Appendix
