BuzzFeed Content Analysis Part 2

Nicolas Dirand, October 7, 2013

It’s time to extract all the voodoo from BuzzFeed headlines! 🙂

Here we continue and finish what we started in part 1.

Remember this network, built by measuring node modularity and degree?

Degree and modularity

At first glance, the very high-degree nodes give an idea of what makes a BuzzFeed article so interesting to people.
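As a reminder of what ‘degree’ means here, below is a minimal sketch. It assumes the network links words that appear within a short sliding window of each other in a headline; the headlines, the `window` size, and the function names are all made up for illustration and are not the actual data or code from part 1.

```python
from collections import defaultdict

# Hypothetical headlines, invented for illustration only.
HEADLINES = [
    "new photos of cute cats",
    "best new movie of the year",
    "cute cats love new toys",
]

def cooccurrence_graph(headlines, window=2):
    """Link each word to the words that follow it within `window` positions."""
    neighbors = defaultdict(set)
    for headline in headlines:
        words = headline.lower().split()
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    # Undirected edge: record it in both directions.
                    neighbors[word].add(other)
                    neighbors[other].add(word)
    return neighbors

def degrees(neighbors):
    """Degree of a node = number of distinct words it is linked to."""
    return {word: len(linked) for word, linked in neighbors.items()}

graph = cooccurrence_graph(HEADLINES)
ranking = sorted(degrees(graph).items(), key=lambda item: (-item[1], item[0]))
for word, degree in ranking[:5]:
    print(word, degree)
```

Words that show up next to many different words end up with a high degree, which is what the big nodes in the picture represent.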

We can see adjectives like ‘new’ and ‘best’, but not that many.
This may suggest that, contrary to expectation, article titles like ‘top 10 super mega hyper secrets for being smart’ don’t get many hits after all!

We can classify the nodes we see into three major categories: ‘politics and current news’, ‘entertainment’, and ‘gossip’. Once again, this is not much of a surprise, apart maybe from politics.

What is interesting to note is that politics is very event-dependent. What matters are the current issues.

About gossip: celebrities make up the majority of it, with words like ‘awkward’, ‘moment’, ‘celebrities’, ‘dancing’, and ‘hilarious’. Notice that celebrities are highly connected to sport.
I will point out that photography is connected to lots of words like ‘cute’, ‘perfect’, ‘tweet’, ‘world’, and ‘news’. It seems that the pictures we are talking about are mainly sourced from Twitter or other social networks, and there are the usual ‘funny pictures’ such as baby pictures, Lohan, food, and ‘versus’.

context of 'love'

If we take a direct look at ‘love’, it seems connected to various classes of context. We can imagine that ‘love’ is mostly used to describe our taste for something.


Loving a haircut, loving a movie, loving a celebrity: essentially this is about giving an opinion, saying how much you love something. It is interesting to see that ‘negative’ terms are not popular.

Let’s resize the nodes by centrality.

measuring centrality

As we will see later with PageRank, centrality seems to even out the size of the nodes. The results are still not surprising, but they may make it easier to categorize the nodes.

‘New’, ‘best’, and ‘love’, which are almost all of the same class (color), make me believe, as I said earlier, that opinions and reviews that show products or events in a positive light are highly popular.

‘Get’, ‘make’, and ‘thing’ are also very central within their own domain. Here we are talking about how-tos, recipes, and other more ‘user-centric’ activities (things we could interact with every day, as opposed to politics or celebrities).

PageRank

PageRank seems to even out the nodes a bit more! The high-degree nodes stay big, but there are a lot of changes among the small nodes.


measuring by PageRank

Look at ‘twitter’, ‘star’, ‘game’, ‘photo’, ‘vs.’, and ‘movie’: they benefited from PageRank. What can we read from this?

PageRank makes it easier to distinguish the categories and gives a cleaner hierarchy of subjects. I believe this is because PageRank is less sensitive to nodes of smaller degree, even when there are many of them.
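To make the comparison with degree more concrete, here is a minimal sketch of PageRank by power iteration on a small undirected graph. The toy graph, the `damping` value of 0.85, and the number of iterations are assumptions chosen for illustration; this is not the tool or the settings used to produce the figures above.

```python
def pagerank(neighbors, damping=0.85, iterations=50):
    """Power-iteration PageRank on an undirected graph.

    `neighbors` maps each node to the set of nodes it is linked to.
    Every node is assumed to have at least one neighbor.
    """
    nodes = list(neighbors)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbor shares its own rank equally among its links.
            incoming = sum(rank[other] / len(neighbors[other])
                           for other in neighbors[node])
            new_rank[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank

# Toy graph, invented for illustration: a hub 'new' with four one-link
# leaves, bridged through 'star' to a small triangle of related words.
TOY = {
    "new": {"a", "b", "c", "d", "star"},
    "a": {"new"}, "b": {"new"}, "c": {"new"}, "d": {"new"},
    "star": {"new", "movie", "photo"},
    "movie": {"star", "photo"},
    "photo": {"star", "movie"},
}

scores = pagerank(TOY)
for node in sorted(scores, key=scores.get, reverse=True):
    print(f"{node:6s} degree={len(TOY[node])} pagerank={scores[node]:.3f}")
```

A node's score depends on how much rank its neighbors have to share, not only on how many neighbors it has, which is one possible reason the small nodes move around while the hubs stay on top.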



From what we have seen, there is no ‘sensational’ keyword spamming in the titles of popular articles.

Political figures and current US affairs seem to be highly popular, especially societal issues like gay marriage.

Serious issues, or those that can be perceived as negative, do not seem to be popular at all.

We can see there is a bias against ‘negative’ titles.

People are more attracted to positive reviews or opinions.

Celebrity gossip, and especially material coming from Twitter (stolen pictures or pictures celebrities take of themselves) and videos, is highly connected.

Celebrities and sport are heavily influenced by novelty. People love new games, new movies, new fashion styles.

Amazingly enough, Kim Kardashian has her very own category, totally disconnected from the other celebrity classes.

Photography of the usual subjects like cats or babies is still highly popular; there are a lot of adjectives and nouns connected to this class.

This makes me think of the popular posts we see everywhere on various ‘funny’ websites, like ‘adorable cat pictures’: the kind of material we use to make memes or demotivational posters.

What would be my recommendations to a publisher who is trying to make very popular content?

  1. Fresh content. Follow the news! Whatever the subject, but especially for entertainment, keep it as fresh as possible.
  2. About politics, stick to societal matters. Politics is viewed today as entertainment, and people want it as entertainment.
  3. Stay positive! If you write reviews or give your opinions, stay highly positive! What ‘we love’ wins them all!
  4. Videos about sport and celebrities.
  5. Photos are about memes, demotivational posters, and funny pictures. The old recipes still work!
  6. Do not use ‘SEO optimized’ titles. The good old ‘top 10 hyper mega super uber secrets to stay young forever that my mom told me’ does not seem to work anymore.

I suspect there is fatigue with that kind of ‘trick’, especially after Google’s update to detect spammy content.

If I had to produce content from those recommendations, I would start by monitoring the Twitter accounts of celebrities and very well-known political figures, and construct articles on the fly with my own ‘positive review’ of the material they produce (self-portraits, new movies, new songs, etc.; Rihanna, for example, has the habit of taking pictures of herself and posting them on Twitter).

I should point out that there is sadly a lot more to do!

My knowledge is not yet ready to handle dynamic networks (which I’m planning to learn), but seeing those networks grow over time could maybe have predictive value about what the ‘next big things’ would be, especially from the content producer’s point of view.

There is also the possibility of generating ‘ideas’ from those networks, using a simple random walk algorithm to generate a vocabulary of possible article titles.
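A rough sketch of what such a random walk could look like is below. It assumes a directed graph where an edge goes from a word to a word that followed it in a headline, weighted by how often that happened; the graph, its counts, and the `random_title` helper are all hypothetical.

```python
import random

# Hypothetical weighted word graph: word -> {next word: co-occurrence count}.
GRAPH = {
    "new": {"photos": 3, "movie": 2},
    "photos": {"of": 4},
    "of": {"cute": 2, "celebrities": 1},
    "cute": {"cats": 5},
    "movie": {"stars": 2},
    "cats": {}, "celebrities": {}, "stars": {},
}

def random_title(graph, start, max_words=6, rng=random):
    """Walk the graph from `start`, choosing each next word by edge weight."""
    words = [start]
    current = start
    while len(words) < max_words:
        options = graph.get(current, {})
        if not options:
            break  # dead end: this word has no outgoing edges
        current = rng.choices(list(options), weights=list(options.values()))[0]
        words.append(current)
    return " ".join(words)

rng = random.Random(0)  # fixed seed so the walk is repeatable
for _ in range(3):
    print(random_title(GRAPH, "new", rng=rng))
```

Starting the walk from a high-degree word and following the heavier edges more often should produce word sequences that resemble the popular titles, which a human could then turn into real headlines.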

I also didn’t make much use of the weights between the nodes, because they gave almost the same results as the simple unweighted degree. But I feel there is something we could use here. Maybe ‘encode’ the position of the window into the weights and give a higher weight to a window near the beginning of a phrase.
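One way this idea could be sketched: let each co-occurring pair contribute a weight that shrinks the further the window sits from the start of the headline. The exponential `decay` factor, the `window` size, and the sample headlines are assumptions picked for illustration, not something tested on the real data.

```python
from collections import defaultdict

def position_weighted_edges(headlines, window=2, decay=0.5):
    """Sum edge weights where pairs near the start of a headline count more.

    A pair whose first word sits at index i contributes decay ** i,
    so index 0 adds 1.0, index 1 adds 0.5, and so on.
    """
    weights = defaultdict(float)
    for headline in headlines:
        words = headline.lower().split()
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + window]:
                if other != word:
                    # Sort the pair so (a, b) and (b, a) share one edge.
                    edge = tuple(sorted((word, other)))
                    weights[edge] += decay ** i
    return dict(weights)

# Hypothetical headlines, invented for illustration only.
HEADLINES = ["cute cats love new toys", "new toys for cute cats"]
edges = position_weighted_edges(HEADLINES)
for edge, weight in sorted(edges.items(), key=lambda item: -item[1])[:5]:
    print(edge, round(weight, 3))
```

With plain counts, a pair that opens one headline and closes another would simply score 2; here the opening occurrence dominates, so the weights start to carry information that the unweighted degree does not.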

Thank you

-Nicolas Dirand
