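274. H-Index

Problem Description

Given an integer array citations, where citations[i] is the number of citations a researcher received for their i-th paper, compute and return the researcher's h-index.

According to the definition of the h-index on Wikipedia: h stands for "high citations". A researcher has an h-index of h if they have published at least h papers and at least h of those papers have each been cited at least h times. If several values of h qualify, the h-index is the largest of them.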
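
To make the definition concrete, here is a minimal brute-force sketch that checks every candidate h directly against it. The function name h_index_by_definition is an illustrative assumption and not part of the original problem; it is meant as a quadratic-time reference, not as an efficient solution.

```python
from typing import List


def h_index_by_definition(citations: List[int]) -> int:
    """Return the largest h such that at least h papers have >= h citations.

    Hypothetical helper name; a direct O(n^2) reading of the definition.
    """
    best = 0
    for h in range(len(citations) + 1):
        # Count the papers cited at least h times.
        papers = sum(1 for c in citations if c >= h)
        if papers >= h:
            best = h
    return best


print(h_index_by_definition([3, 0, 6, 1, 5]))  # 3
print(h_index_by_definition([1, 3, 1]))        # 1
```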

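Example 1:

Input: citations = [3,0,6,1,5]
Output: 3
Explanation: The researcher has 5 papers in total, cited 3, 0, 6, 1, and 5 times respectively.
     Since 3 of the papers have at least 3 citations each and the remaining two have no more than 3 citations each, the h-index is 3.

Example 2:

Input: citations = [1,3,1]
Output: 1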

Constraints:

n == citations.length
1 <= n <= 5000
0 <= citations[i] <= 1000

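Solutions

Solution 1: Sorting

We first sort the array citations in descending order. Then we enumerate \(h\) from largest to smallest, that is, from \(n\) down to \(1\). If some \(h\) satisfies \(citations[h-1] \geq h\), then the \(h\) most-cited papers each have at least \(h\) citations, so we return \(h\) immediately. If no such \(h\) exists, then even \(h = 1\) fails, which means every paper has zero citations, and we return \(0\).

The time complexity is \(O(n \times \log n)\) and the space complexity is \(O(\log n)\), where \(n\) is the length of the array citations.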

Python3, Java, C++, Go, TypeScript, Rust (in that order):

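from typing import List


class Solution:
    def hIndex(self, citations: List[int]) -> int:
        # Sort in descending order so citations[h - 1] is the h-th largest value.
        citations.sort(reverse=True)
        # Try the largest candidate first; the first h that fits is the answer.
        for h in range(len(citations), 0, -1):
            if citations[h - 1] >= h:
                return h
        return 0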

class Solution {
    public int hIndex(int[] citations) {
        // Sorted ascending here, so citations[n - h] is the h-th largest value.
        Arrays.sort(citations);
        int n = citations.length;
        for (int h = n; h > 0; --h) {
            if (citations[n - h] >= h) {
                return h;
            }
        }
        return 0;
    }
}

class Solution {
public:
    int hIndex(vector<int>& citations) {
        sort(citations.rbegin(), citations.rend());
        for (int h = citations.size(); h; --h) {
            if (citations[h - 1] >= h) {
                return h;
            }
        }
        return 0;
    }
};

func hIndex(citations []int) int {
    // Sorted ascending here, so citations[n-h] is the h-th largest value.
    sort.Ints(citations)
    n := len(citations)
    for h := n; h > 0; h-- {
        if citations[n-h] >= h {
            return h
        }
    }
    return 0
}

function hIndex(citations: number[]): number {
    citations.sort((a, b) => b - a);
    for (let h = citations.length; h; --h) {
        if (citations[h - 1] >= h) {
            return h;
        }
    }
    return 0;
}

impl Solution {
    #[allow(dead_code)]
    pub fn h_index(citations: Vec<i32>) -> i32 {
        let mut citations = citations;
        citations.sort_by(|&lhs, &rhs| rhs.cmp(&lhs));

        let n = citations.len();

        for i in (1..=n).rev() {
            if citations[i - 1] >= (i as i32) {
                return i as i32;
            }
        }

        0
    }
}

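Solution 2: Counting + Summation

We can use an array \(cnt\) of length \(n+1\), where \(cnt[i]\) is the number of papers cited \(i\) times. We traverse the array citations, treating every paper with more than \(n\) citations as if it had exactly \(n\) citations (the h-index can never exceed \(n\), so this does not change the answer), and increment the element of \(cnt\) whose index equals the paper's citation count. This gives us the number of papers for each citation count.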
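
As a small illustration of the counting step, the sketch below builds \(cnt\) for Example 1. The helper name build_counts is an assumption made for this example only.

```python
from typing import List


def build_counts(citations: List[int]) -> List[int]:
    """Bucket papers by citation count, clamping counts above n to n.

    Hypothetical helper for illustration only.
    """
    n = len(citations)
    cnt = [0] * (n + 1)
    for x in citations:
        cnt[min(x, n)] += 1
    return cnt


# n = 5, so the paper with 6 citations lands in bucket 5.
print(build_counts([3, 0, 6, 1, 5]))  # [1, 1, 0, 1, 0, 2]
```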

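Next, we enumerate \(h\) from largest to smallest, adding \(cnt[h]\) to a running sum \(s\), so that \(s\) is the number of papers cited at least \(h\) times. If \(s \geq h\), then at least \(h\) papers have at least \(h\) citations each, and we return \(h\) immediately. The loop always returns by \(h = 0\) at the latest, because \(s \geq 0\) holds trivially.

The time complexity is \(O(n)\) and the space complexity is \(O(n)\), where \(n\) is the length of the array citations.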

Python3, Java, C++, Go, TypeScript (in that order):

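from typing import List


class Solution:
    def hIndex(self, citations: List[int]) -> int:
        n = len(citations)
        # cnt[i] = papers cited i times; counts above n are clamped to n.
        cnt = [0] * (n + 1)
        for x in citations:
            cnt[min(x, n)] += 1
        # s accumulates the number of papers cited at least h times.
        s = 0
        for h in range(n, -1, -1):
            s += cnt[h]
            if s >= h:
                return h
        return 0  # Unreachable: h = 0 always satisfies s >= h.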

class Solution {
    public int hIndex(int[] citations) {
        int n = citations.length;
        int[] cnt = new int[n + 1];
        for (int x : citations) {
            ++cnt[Math.min(x, n)];
        }
        for (int h = n, s = 0;; --h) {
            s += cnt[h];
            if (s >= h) {
                return h;
            }
        }
    }
}

class Solution {
public:
    int hIndex(vector<int>& citations) {
        int n = citations.size();
        // Use a vector: variable-length arrays are not standard C++.
        vector<int> cnt(n + 1);
        for (int x : citations) {
            ++cnt[min(x, n)];
        }
        for (int h = n, s = 0;; --h) {
            s += cnt[h];
            if (s >= h) {
                return h;
            }
        }
    }
};

func hIndex(citations []int) int {
    n := len(citations)
    cnt := make([]int, n+1)
    for _, x := range citations {
        // min is a built-in function since Go 1.21.
        cnt[min(x, n)]++
    }
    for h, s := n, 0; ; h-- {
        s += cnt[h]
        if s >= h {
            return h
        }
    }
}

function hIndex(citations: number[]): number {
    const n: number = citations.length;
    const cnt: number[] = new Array(n + 1).fill(0);
    for (const x of citations) {
        ++cnt[Math.min(x, n)];
    }
    for (let h = n, s = 0; ; --h) {
        s += cnt[h];
        if (s >= h) {
            return h;
        }
    }
}

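Solution 3: Binary Search

We observe that if some \(h\) satisfies the condition that at least \(h\) papers are cited at least \(h\) times, then for every \(h' \lt h\) at least \(h'\) papers are cited at least \(h'\) times as well. The condition is therefore monotonic in \(h\), and we can use binary search to find the largest \(h\) for which it holds.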
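
The monotonicity can be checked on Example 1 with a short sketch. The helper name feasible is an assumption introduced here for illustration.

```python
from typing import List


def feasible(citations: List[int], h: int) -> bool:
    """True if at least h papers have at least h citations (hypothetical helper)."""
    return sum(1 for c in citations if c >= h) >= h


citations = [3, 0, 6, 1, 5]
flags = [feasible(citations, h) for h in range(len(citations) + 1)]
print(flags)  # [True, True, True, True, False, False]
```

The flags form a run of True values followed by a run of False values, and the last True sits at h = 3, which is the answer for Example 1.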

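We set the left boundary of the binary search to \(l=0\) and the right boundary to \(r=n\). In each step we take \(mid = \lfloor \frac{l + r + 1}{2} \rfloor\), where \(\lfloor x \rfloor\) denotes rounding \(x\) down; adding \(1\) before halving rounds the midpoint up, so that the update \(l = mid\) always shrinks the interval. We then count the elements of citations that are greater than or equal to \(mid\) and call this count \(s\). If \(s \geq mid\), at least \(mid\) papers are cited at least \(mid\) times, so we set \(l\) to \(mid\); otherwise we set \(r\) to \(mid-1\). When \(l\) equals \(r\), we have found the largest \(h\), which is \(l\) (equivalently \(r\)).

The time complexity is \(O(n \times \log n)\), where \(n\) is the length of the array citations, and the space complexity is \(O(1)\).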
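
The binary search procedure above can be sketched in Python as follows. This is a minimal sketch rather than a reference implementation; the function name h_index_binary_search is an assumption chosen for this example.

```python
from typing import List


def h_index_binary_search(citations: List[int]) -> int:
    """Binary search on the answer h in [0, n] (hypothetical function name)."""
    l, r = 0, len(citations)
    while l < r:
        # Round up so that the update l = mid always makes progress.
        mid = (l + r + 1) >> 1
        # s = number of papers cited at least mid times.
        s = sum(1 for x in citations if x >= mid)
        if s >= mid:
            l = mid
        else:
            r = mid - 1
    return l


print(h_index_binary_search([3, 0, 6, 1, 5]))  # 3
print(h_index_binary_search([1, 3, 1]))        # 1
```

Each iteration scans the whole array once, which is where the \(O(n \times \log n)\) running time comes from; no auxiliary array is needed.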