Freemindscafe

Trending problem in twitter

#1

Problem Statement

We need to display the accounts that are trending the most in last hour. If a tweet is getting re-tweeted or it is getting liked more; then we can say that account is trending.

Solution #1

Solution

  • We can have one account table and the other tweet table. This tweet table can have columns like account_id, tweet_text, like_count, retweet_count, insert_time and last_upd_time.
  • Whenever a tweet gets liked or retweeted then we can increment the corresponding counts in _tweet_table.
  • Then we can do the query against the _tweet_table and can fetch the accounts which have most likes and retweets combined in last hour so.
  • We can use the last_upd_time column to select only those tweets that saw some activity in last hour.

Issue

  • But there is an issue with this approach as like and retweet count is an aggregated number, then even the previous count can be counted in last hour

Solution #2

Solution

  • Instead of storing the like and retweet count in the same _tweet_table, we can move this information to tweet_activity table. This table acts as a log table and there will only be inserts in this table.
  • We can than query against this table and can do aggregates against the accounts and can show the accounts that are trending the most
  • We can partition this tweet_activity table by day so that the queries are done against the limited volume of data.
  • This strategy might work if we have 100s of users and say 2-3 tweets per day with 2-3 activities per tweet, because then the data volumn per day will be like in thousands.

Issue