Leetcode: LRU Cache
I’ve been slacking off on posts lately, but a friend recently recommended this Leetcode problem, LRU Cache, and I thought it was interesting enough to warrant a discussion. The problem asks us to implement a fixed-capacity key-value storage data structure that, when full, evicts the least recently used key-value pair before inserting a new one. The data structure needs to support two operations, put(key, value) and get(key), with the additional challenge that both operations must run in constant time. So, we should be able to do something like:
auto cache = LRUCache(2); // initialized with a fixed capacity of two key-value pairs
cache.put(1,1);
cache.put(2,2);
cache.get(1); // returns 1
cache.put(3,3);
cache.get(2); // returns -1, key not present
In that example, even though the key-value pair {2,2} is added after the pair {1,1}, accessing the value at key = 1 makes it the most recently used, and thus the pair {2,2} is removed from the cache when another element is added over capacity. If key = 1 had not been accessed, that pair would have been removed from the cache instead.
Now, thinking about the implementation, constant-time insertion, access, and removal are the primary requirements. A hashmap (assuming a decent hash function) provides constant-time lookup, so the question is how to keep track of how recently each element was used. We could use a heap, but finding an arbitrary element in a heap is an O(n) operation, and removing a non-root node then requires sifting to restore the heap property, another O(log n) on top of the search. The better option here is a doubly-linked list, which provides constant-time insertion and removal of elements. The shortcoming of the list is its linear search time, but with a little creativity we can link the list to our hashmap to get constant-time element access.
For the doubly-linked list, we can use the C++ standard library list to store key-value pairs. Where it gets interesting is how we connect each pair in the list to a key in the hash map. To effect constant-time removal, it’s not sufficient to just store a pointer to the element; instead, we’ll store an iterator, which references the element’s position within the list and is exactly what list::erase expects. This seems like a good approach, and it gives us the private data members:
typedef int cacheKey;
typedef int cacheVal;
typedef list<pair<cacheKey,cacheVal>>::iterator cacheNodeItr;
const int capacity;
list<pair<cacheKey,cacheVal>> lruList;
unordered_map<cacheKey,cacheNodeItr> lruMap;
Looking at the operations, we can start with the cache initialization, which is trivially:
LRUCache(int capacity) : capacity(capacity) {}
The two operations are more interesting. Thinking through the get(key) operation, we start at the map. If the map doesn’t contain an element with the matching key, we simply return -1. If the key exists, we use that key’s associated iterator to access the list element. We should copy the referenced key-value pair into local variables, because we then erase that element from the list (invalidating the iterator) and push the pair as a new element onto the front of the list. Finally, we assign the pair’s new iterator back to its key in the map, and return the stored value.
int get(int key) {
if(!lruMap.count(key)) return -1;
auto itr = lruMap[key];
// copy the value first: erase() below invalidates itr
int val = itr->second;
lruList.erase(itr);
lruList.push_front({key,val});
lruMap[key] = lruList.begin();
return val;
}
The put() operation uses the data members very similarly. To begin, if the map already contains the key, we go ahead and erase the linked element from the list. If the element isn’t already in the list, we check whether we have room to add it; if not, removing the least recently used element is easy: pop the last element off the list and erase its key from the map, making room for the new element within capacity. With that done, we push the new pair onto the front of the list and map its iterator to its key.
void put(int key, int value) {
if(lruMap.count(key)) {
auto itr = lruMap[key];
lruList.erase(itr);
}
else if(lruList.size() == capacity) {
auto lru = lruList.back();
lruList.pop_back();
lruMap.erase(lru.first);
}
lruList.push_front({key,value});
lruMap[key] = lruList.begin();
}
The completed data structure:
#include <list>
#include <unordered_map>
#include <utility>
using namespace std;

class LRUCache {
private:
typedef int cacheKey;
typedef int cacheVal;
typedef list<pair<cacheKey,cacheVal>>::iterator cacheNodeItr;
const int capacity;
list<pair<cacheKey,cacheVal>> lruList;
unordered_map<cacheKey,cacheNodeItr> lruMap;
public:
LRUCache(int capacity) : capacity(capacity) {}
int get(int key) {
if(!lruMap.count(key)) return -1;
auto itr = lruMap[key];
// copy the value first: erase() below invalidates itr
int val = itr->second;
lruList.erase(itr);
lruList.push_front({key,val});
lruMap[key] = lruList.begin();
return val;
}
void put(int key, int value) {
if(lruMap.count(key)) {
auto itr = lruMap[key];
lruList.erase(itr);
}
else if(lruList.size() == capacity) {
auto lru = lruList.back();
lruList.pop_back();
lruMap.erase(lru.first);
}
lruList.push_front({key,value});
lruMap[key] = lruList.begin();
}
};