295. Find Median from Data Stream
Description of the Problem
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
For example, for arr = [2,3,4], the median is 3.
For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.
Implement the MedianFinder class:
MedianFinder() initializes the MedianFinder object.
void addNum(int num) adds the integer num from the data stream to the data structure.
double findMedian() returns the median of all elements so far. Answers within \(10^{-5}\) of the actual answer will be accepted.
Example 1:
Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]
Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.addNum(2); // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3); // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
-10^5 <= num <= 10^5- There will be at least one element in the data structure before calling findMedian.
- At most
5 * 10^4calls will be made toaddNumandfindMedian.
Follow up:
- If all integer numbers from the stream are in the range
[0, 100], how would you optimize your solution? - If
99%of all integer numbers from the stream are in the range[0, 100], how would you optimize your solution?
Solution
Tags: Heap
Explanation
Construct a self-balancing max-min heap such that all numbers are evenly distributed to left-max-heap and right-min-heap.
[_,_,_,...,max] [min,...,_,_,_]
The left-max-heap contains elements less than or equals to max, the right-min-heap contains elements greater than or equals to min.
If the difference of the size of two heaps is greater than one, then extract the element and put it into another heap.
Thus, the median can be found by extracting element(s) from heap(s).
Answer to follow-up question
First follow-up question:
For the first question, we use an array to count the number of occurrence of each number. Find the median by cumulative frequency of numbers.
For instance, Frequency of [0,1,2,3] is [12,4,5,10]. The total frequency is 31. The cumulative frequency is [12,16,21,31]. Hence, 2 is the median.
Second follow-up question:
For the second question, we use an array to count the number of occurrence of each number in [0, 100] and variables to count the number of occurrence of number that less than 0 and greater than 100. Again, use the cumulative frequency to find the median.
Code (Java)
class MedianFinder {
private final PriorityQueue<Integer> leftMaxHeap;
private final PriorityQueue<Integer> rightMinHeap;
public MedianFinder() {
leftMaxHeap = new PriorityQueue<Integer>(Collections.reverseOrder());
rightMinHeap = new PriorityQueue<Integer>();
}
public void addNum(int num) {
Integer leftMax = leftMaxHeap.peek();
if (leftMax != null && num <= leftMax){
leftMaxHeap.add(num);
if (leftMaxHeap.size() - rightMinHeap.size() > 1 ){
rightMinHeap.add(leftMaxHeap.poll());
}
}else {
rightMinHeap.add(num);
if (rightMinHeap.size() - leftMaxHeap.size() > 1 ){
leftMaxHeap.add(rightMinHeap.poll());
}
}
}
public double findMedian() {
if ( (leftMaxHeap.size() + rightMinHeap.size()) % 2 == 0){
return leftMaxHeap.peek() / 2.0 + rightMinHeap.peek() / 2.0;
}else if (leftMaxHeap.size() > rightMinHeap.size()){
return leftMaxHeap.peek() * 1.0;
}else{
return rightMinHeap.peek() * 1.0;
}
}
}
/**
* Your MedianFinder object will be instantiated and called as such:
* MedianFinder obj = new MedianFinder();
* obj.addNum(num);
* double param_2 = obj.findMedian();
*/
Complexity
nis the total number of elements input into self-balancing heap.
Time Complexity
- \( T(n) = O(n\lg n + 1) \)
- The time complexity of
findMedianis constant. - The time complexity of
addNumis bounded by \( \lg(k) \) where \(k\) is the current size of the heap plus 1.- Insertion costs \( \lg(k) \)
- Self-balancing costs \( \lg(k) \) (Popping element and push the element)
- Total: \( c+c \lg 2 + c \lg 3 + ... + c \lg n \lt O(n \lg n + 1) \)
- The time complexity of
Auxiliary Space
- \( S(n) = O(n) \)
- Store n elements