Problem Statement
We need to display the accounts that are trending the most in last hour. If a tweet is getting re-tweeted or it is getting liked more; then we can say that account is trending.
Solution #1
Solution
- We can have one account table and the other tweet table. This tweet table can have columns like account_id, tweet_text, like_count, retweet_count, insert_time and last_upd_time.
- Whenever a tweet gets liked or retweeted then we can increment the corresponding counts in _tweet_table.
- Then we can do the query against the _tweet_table and can fetch the accounts which have most likes and retweets combined in last hour so.
- We can use the last_upd_time column to select only those tweets that saw some activity in last hour.
Issue
- But there is an issue with this approach as like and retweet count is an aggregated number, then even the previous count can be counted in last hour
Solution #2
Solution
- Instead of storing the like and retweet count in the same _tweet_table, we can move this information to tweet_activity table. This table acts as a log table and there will only be inserts in this table.
- We can than query against this table and can do aggregates against the accounts and can show the accounts that are trending the most
- We can partition this tweet_activity table by day so that the queries are done against the limited volume of data.
- This strategy might work if we have 100s of users and say 2-3 tweets per day with 2-3 activities per tweet, because then the data volumn per day will be like in thousands.
Issue