Given an array of integers citations where citations[i] is the number of citations a researcher received for their ith paper, return the researcher's h-index.
According to the definition of h-index on Wikipedia: The h-index is defined as the maximum value of h such that the given researcher has published at least h papers that have each been cited at least h times.
Example 1:
Input: citations = [3,0,6,1,5]
Output: 3
Explanation: [3,0,6,1,5] means the researcher has 5 papers in total and each of them had received 3, 0, 6, 1, 5 citations respectively.
Since the researcher has 3 papers with at least 3 citations each and the remaining two with no more than 3 citations each, their h-index is 3.
Example 2:
Input: citations = [1,3,1]
Output: 1
Constraints:
n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000
Solutions
Solution 1: Sorting
We can sort the array citations in descending order. Then we enumerate the value \(h\) from large to small, if there is an \(h\) value satisfying \(citations[h-1] \geq h\), it means that there are at least \(h\) papers that have been cited at least \(h\) times, just return \(h\) directly. If we cannot find such an \(h\) value, it means that all the papers have not been cited, return \(0\).
Time complexity \(O(n \times \log n)\), space complexity \(O(\log n)\). Here \(n\) is the length of the array citations.
We can use an array \(cnt\) of length \(n+1\), where \(cnt[i]\) represents the number of papers with the reference count of \(i\). We traverse the array citations and treat the papers with the reference count greater than \(n\) as papers with a reference count of \(n\). Then we use the reference count as the index and add \(1\) to the corresponding element of \(cnt\) for each paper. In this way, we have counted the number of papers for each reference count.
Then we enumerate the value \(h\) from large to small, and add the element value of \(cnt\) with the index of \(h\) to the variable \(s\), where \(s\) represents the number of papers with a reference count greater than or equal to \(h\). If \(s \geq h\), it means that at least \(h\) papers have been cited at least \(h\) times, just return \(h\) directly.
Time complexity \(O(n)\), space complexity \(O(n)\). Here \(n\) is the length of the array citations.
We notice that if there is a \(h\) value that satisfies at least \(h\) papers are cited at least \(h\) times, then for any \(h'<h\), at least \(h'\) papers are cited at least \(h'\) times. Therefore, we can use the binary search method to find the largest \(h\) such that at least \(h\) papers are cited at least \(h\) times.
We define the left boundary of binary search \(l=0\) and the right boundary \(r=n\). Each time we take \(mid = \lfloor \frac{l + r + 1}{2} \rfloor\), where \(\lfloor x \rfloor\) represents floor \(x\). Then we count the number of elements in array citations that are greater than or equal to \(mid\), and denote it as \(s\). If \(s \geq mid\), it means that at least \(mid\) papers are cited at least \(mid\) times. In this case, we change the left boundary \(l\) to \(mid\). Otherwise, we change the right boundary \(r\) to \(mid-1\). When the left boundary \(l\) is equal to the right boundary \(r\), we find the largest \(h\) value, which is \(l\) or \(r\).
Time complexity \(O(n \times \log n)\), where \(n\) is the length of array citations. Space complexity \(O(1)\).