3278. Find Candidates for Data Scientist Position II π
Description
Table: Candidates
+--------------+---------+ | Column Name | Type | +--------------+---------+ | candidate_id | int | | skill | varchar | | proficiency | int | +--------------+---------+ (candidate_id, skill) is the unique key for this table. Each row includes candidate_id, skill, and proficiency level (1-5).
Table: Projects
+--------------+---------+ | Column Name | Type | +--------------+---------+ | project_id | int | | skill | varchar | | importance | int | +--------------+---------+ (project_id, skill) is the primary key for this table. Each row includes project_id, required skill, and its importance (1-5) for the project.
Leetcode is staffing for multiple data science projects. Write a solution to find the best candidate for each project based on the following criteria:
- Candidates must have all the skills required for a project.
- Calculate a score for each candidate-project pair as follows:
- Start with
100
points - Add
10
points for each skill where proficiency > importance - Subtract
5
points for each skill where proficiency < importance
- Start with
Include only the top candidate (highest score) for each project. If there’s a tie, choose the candidate with the lower candidate_id
. If there is no suitable candidate for a project, do not return that project.
Return a result table ordered by project_id
in ascending order.
The result format is in the following example.
Example:
Input:
Candidates
table:
+--------------+-----------+-------------+ | candidate_id | skill | proficiency | +--------------+-----------+-------------+ | 101 | Python | 5 | | 101 | Tableau | 3 | | 101 | PostgreSQL| 4 | | 101 | TensorFlow| 2 | | 102 | Python | 4 | | 102 | Tableau | 5 | | 102 | PostgreSQL| 4 | | 102 | R | 4 | | 103 | Python | 3 | | 103 | Tableau | 5 | | 103 | PostgreSQL| 5 | | 103 | Spark | 4 | +--------------+-----------+-------------+
Projects
table:
+-------------+-----------+------------+ | project_id | skill | importance | +-------------+-----------+------------+ | 501 | Python | 4 | | 501 | Tableau | 3 | | 501 | PostgreSQL| 5 | | 502 | Python | 3 | | 502 | Tableau | 4 | | 502 | R | 2 | +-------------+-----------+------------+
Output:
+-------------+--------------+-------+ | project_id | candidate_id | score | +-------------+--------------+-------+ | 501 | 101 | 105 | | 502 | 102 | 130 | +-------------+--------------+-------+
Explanation:
- For Project 501, Candidate 101 has the highest score of 105. All other candidates have the same score but Candidate 101 has the lowest candidate_id among them.
- For Project 502, Candidate 102 has the highest score of 130.
The output table is ordered by project_id in ascending order.
Solutions
Solution 1: Equi-Join + Group Statistics + Window Function
We can perform an equi-join of the Candidates
table and the Projects
table on the skill
column, counting the number of matched skills and calculating the total score for each candidate in each project, which is recorded in table S
.
Next, we count the required number of skills for each project, recording the results in table T
.
Then, we perform an equi-join of tables S
and T
on the project_id
column, filtering out candidates whose number of matched skills equals the required number of skills, and recording them in table P
. We calculate the rank (rk
) for each candidate within each project.
Finally, we filter out the candidates with rank $rk = 1$ for each project, identifying them as the best candidates.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
|