For today’s Leetcode July Challenge, we’re going to write an algorithm to find the $k$ most frequent elements from an array of integers. There are a few important special considerations given: we can assume $k$ is always valid i.e. there will always be at least $k$ most frequent elements and that the correct answer for a given input will be unique. Given those, we are asked to write an algorithm with an asympotitic complexity better than O(n log n). Let’s see a sample case first to make sure we understand the problem.

Input: nums = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5], k = 3
Output: [3,4,5]

To write this algorithm, the general approach will be to iterate over the array to accumulate the frequency for each unique integer, then sort that list of unique integers by their frequencies, and just take the top k values. The difficulty here is that most of our sorting algorithms have an asymptotic complexity O(n log n), and we’re trying to get a time complexity better than that.

I have two approaches in mind. First, we can avoid a O(n log n) case by the fact that the problem description explicitly tells us that $k$ will always have a valid answer set. This means that when $k$ is equal to the input size, the $k$ most frequent elements will just be the entire input set. With this, we know that any sorting we do on $k$ frequencies will be less than n, resulting in a complexity O(n log k) for any O(log n) sorting algorithm. You can see how this works with my first algorithm where I essentially perform a partial heapsort of the frequencies, which is the one I prefer due to its simplicity and efficient use of the standard data structures.

vector<int> topKFrequent(vector<int>& nums, int k) {
     if(nums.size() == k) return nums;

	unordered_map<int,int> freqs;
	for(const auto n : nums) {
	  freqs[n]++;
	}
	priority_queue<pair<int,int>> heap;
	// priority queue of a pair will prioritize on pair.first per STL.
	for(const auto [n, count] : freqs) {
	  heap.push({count, n});
	}
     vector<int> topK;
	for(int i=0;i<k;i++) {
	  topK.push_back(heap.top().second);
	  heap.pop();
	}
     return topK;
}

I really like that solution and am satisfied with it, but there’s another solution worth looking despite its drawbacks. This algorithm uses the same approach but uses a bucket sort of the frequencies. One downside is that we have to initialize an empty array as large as the largest frequency, but is mostly empty/0-values, that could be a big memory hit for an input that contains high frequency of a few elements. Another more general consideration, though not a problem here, is that extending the solution to more complex elements would require some additional pointer handling in order to avoid the expensive default initializations.

vector<int> topKFrequent(vector<int>& nums, int k) {
   if(nums.size() == k) return nums;
   
   unordered_map<int,int> freqs;
   int maxFreq = 0;
   for(auto n : nums) {
	  freqs[n]++;
	  maxFreq = max(maxFreq, freqs[n]);
   }
   vector<vector<int>> buckets(maxFreq+1);
   for(auto [num, freq] : freqs) {
	  buckets[freq].push_back(num);
   }
   vector<int> topK;
   for(int i=buckets.size()-1;i>=0 && k>0;i--) {
	  for(auto num : buckets[i]) {
		 topK.push_back(num);
		 k--;
	  }
   }
   return topK;
}

Good times! I always like having a few approaches in my arsenal to approach this sort of probem so this was great practice.