Feature Engineering Utilities
- utils.feature_engineering.calculate_distance(coord1: Tuple[float, float], coord2: Tuple[float, float]) → float | None
Compute geodesic distance in kilometers between two coordinate pairs.
- Parameters:
coord1 (tuple of float) – First coordinate as (latitude, longitude).
coord2 (tuple of float) – Second coordinate as (latitude, longitude).
- Returns:
Distance in kilometers, or None if the calculation fails.
- Return type:
float or None
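To make the "distance or None" contract concrete, here is a minimal sketch. It is not the module's implementation: it swaps in the haversine (great-circle) formula on a spherical Earth, which can differ from a true ellipsoidal geodesic by a fraction of a percent. The name `haversine_km` and the radius constant are assumptions made for illustration.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(
    coord1: Tuple[float, float], coord2: Tuple[float, float]
) -> Optional[float]:
    """Great-circle distance in km, or None if the inputs are unusable."""
    try:
        # Each coordinate is (latitude, longitude) in decimal degrees.
        lat1, lon1 = (math.radians(float(v)) for v in coord1)
        lat2, lon2 = (math.radians(float(v)) for v in coord2)
    except (TypeError, ValueError):
        return None  # missing, malformed, or non-numeric coordinates
    if not all(math.isfinite(v) for v in (lat1, lon1, lat2, lon2)):
        return None  # NaN or infinite values cannot be measured
    a = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    # min() guards against rounding pushing `a` slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Returning None rather than raising lets a caller apply the function row by row and treat failed rows as missing values.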
- utils.feature_engineering.calculate_experience_match_score(df: DataFrame) → Series
Calculate the normalized difference between candidate and required experience.
The function compares years of experience and returns a normalized score based on the range of values found in the dataset.
- Parameters:
df (pandas.DataFrame) – DataFrame containing ‘Years Experience_int’ and ‘Years Experience.1_int’.
- Returns:
A Series containing normalized experience difference scores.
- Return type:
pandas.Series
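One way to read "normalized by the range of values found in the dataset" is min-max scaling of the year difference. The sketch below illustrates that reading only. The function name is hypothetical, and it assumes that 'Years Experience_int' is the candidate's value and 'Years Experience.1_int' is the job's requirement, which the description above does not state.

```python
import pandas as pd


def experience_match_sketch(df: pd.DataFrame) -> pd.Series:
    """Min-max normalize (candidate years - required years) over the dataset."""
    # Assumption: the first column is the candidate's experience and the
    # '.1' column is the job's requirement.
    diff = df["Years Experience_int"] - df["Years Experience.1_int"]
    span = diff.max() - diff.min()
    if pd.isna(span) or span == 0:
        # All differences are equal (or missing): no range to normalize by.
        return pd.Series(0.0, index=df.index)
    return (diff - diff.min()) / span


df = pd.DataFrame(
    {
        "Years Experience_int": [2, 5, 10],
        "Years Experience.1_int": [5, 5, 5],
    }
)
scores = experience_match_sketch(df)
```

Because the scaling depends on the minimum and maximum in the data, the same candidate-job pair can receive a different score when the dataset changes.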
- utils.feature_engineering.calculate_professional_similarity_score(df: DataFrame) → Series
Calculate the semantic similarity between a candidate’s background and the job description.
Compares sector and last role against job family and job title using sentence embeddings and cosine similarity.
- Parameters:
df (pandas.DataFrame) – DataFrame with ‘Sector’, ‘Last Role’, ‘Job Family Hiring’, and ‘Job Title Hiring’.
- Returns:
A Series of professional similarity scores.
- Return type:
pandas.Series
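The cosine-similarity step can be sketched without any model. In the example below, `toy_embed` is a hypothetical stand-in that counts words; it is not a sentence-embedding model and captures no semantics. The sketch only shows how two text vectors are reduced to a single score. How the real function combines the sector/job-family and role/title pairs is not documented here; joining them into one string is an assumption.

```python
import math
from collections import Counter
from typing import Dict


def toy_embed(text: str) -> Dict[str, float]:
    """Hypothetical stand-in for a sentence encoder: lowercase word counts."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Candidate side: 'Sector' + 'Last Role'; job side: 'Job Family Hiring' + 'Job Title Hiring'.
candidate = toy_embed("Finance Financial Analyst")
job = toy_embed("Finance Senior Financial Analyst")
score = cosine_similarity(candidate, job)
```

With a real sentence encoder the vectors would be dense arrays, but the score is computed the same way: the dot product divided by the product of the two norms.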
- utils.feature_engineering.calculate_salary_fit_score(df: DataFrame, is_expected: bool = True) → Series
Calculate the salary fit score between a candidate’s salary and the job’s salary range.
Returns 1.0 if candidate’s salary is within range; otherwise, returns a normalized score based on how far it is from the closest bound.
- Parameters:
df (pandas.DataFrame) – DataFrame with salary information, including candidate and job salary columns.
is_expected (bool, optional) – If True, uses ‘Expected Ral’; if False, uses ‘Current Ral’. Default is True.
- Returns:
A Series of salary fit scores.
- Return type:
pandas.Series
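The rule "1.0 inside the range, otherwise a score based on the distance to the closest bound" can be sketched for a single candidate-job pair. The exact normalization is not specified above, so this example assumes the gap is divided by the closest bound and the result is floored at 0. The function and parameter names are hypothetical.

```python
def salary_fit(salary: float, range_min: float, range_max: float) -> float:
    """Return 1.0 inside [range_min, range_max], decaying linearly outside it."""
    if range_min <= salary <= range_max:
        return 1.0
    # Pick the bound nearest to the candidate's salary.
    closest = range_min if salary < range_min else range_max
    if closest <= 0:
        return 0.0  # cannot scale the gap by a non-positive bound
    # Assumption: the gap is expressed as a fraction of the closest bound
    # and the score is floored at 0.
    return max(0.0, 1.0 - abs(salary - closest) / closest)
```

Under these assumptions, a salary of 44,000 against a 30,000 to 40,000 range scores 0.9, because the gap of 4,000 is 10% of the upper bound.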
- utils.feature_engineering.calculate_study_area_score(df: DataFrame) → Series
Calculate semantic similarity between candidate and required study areas.
Uses sentence embeddings and cosine similarity to quantify alignment between study fields.
- Parameters:
df (pandas.DataFrame) – DataFrame with ‘Study area’ and ‘Study Area.1’ columns.
- Returns:
A Series of cosine similarity scores.
- Return type:
pandas.Series
- utils.feature_engineering.calculate_study_title_score(df: DataFrame) → Series
Calculate the normalized difference between candidate and required study levels.
This function maps education levels to a numerical ranking and computes the normalized difference between a candidate’s level and the job’s requirement.
- Parameters:
df (pandas.DataFrame) – DataFrame containing ‘Study Title’ and ‘Study Level’ columns.
- Returns:
A Series of normalized differences between the candidate’s study level and the job’s required study level.
- Return type:
pandas.Series
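As an illustration of mapping education levels to ranks and normalizing the difference, consider the sketch below. The level names, their order, and the choice to divide by the largest possible rank gap are all assumptions; the actual mapping used by the module is not listed here.

```python
from typing import Optional

# Hypothetical ranking; the real level names and order are not given here.
LEVEL_RANK = {
    "High School": 1,
    "Bachelor": 2,
    "Master": 3,
    "PhD": 4,
}


def study_level_gap(candidate_level: str, required_level: str) -> Optional[float]:
    """Signed rank difference scaled to [-1, 1]; None for unknown levels."""
    cand = LEVEL_RANK.get(candidate_level)
    req = LEVEL_RANK.get(required_level)
    if cand is None or req is None:
        return None
    # Largest possible difference between any two ranks in the mapping.
    max_gap = max(LEVEL_RANK.values()) - min(LEVEL_RANK.values())
    return (cand - req) / max_gap
```

A positive value means the candidate exceeds the requirement, a negative value means the candidate falls short, and 0.0 is an exact match.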
- utils.feature_engineering.create_candidate_text(row: Series) → str
Create a text description summarizing a candidate’s profile.
Combines fields such as education, sector, last role, experience, and skills into a single formatted string.
- Parameters:
row (pandas.Series) – A row from the candidate DataFrame.
- Returns:
A text summary of the candidate.
- Return type:
str
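A minimal sketch of how such a summary string might be assembled follows. The labels, separator, and field order are assumptions, and 'Skills' is a hypothetical column name; the other column names are borrowed from the functions documented above. The sketch relies only on `.get`, so it accepts either a plain dict or a pandas Series.

```python
from typing import Any, Mapping


def candidate_text_sketch(row: Mapping[str, Any]) -> str:
    """Join the non-empty profile fields into one sentence-like string."""
    # (label, column) pairs; 'Skills' is a hypothetical column name.
    fields = [
        ("Education", "Study Title"),
        ("Study area", "Study area"),
        ("Sector", "Sector"),
        ("Last role", "Last Role"),
        ("Years of experience", "Years Experience_int"),
        ("Skills", "Skills"),
    ]
    parts = []
    for label, column in fields:
        value = row.get(column)
        # Skip missing values: None, NaN (which is not equal to itself), or blanks.
        if value is None or value != value or str(value).strip() == "":
            continue
        parts.append(f"{label}: {value}")
    return ". ".join(parts)


row = {
    "Study Title": "Master",
    "Sector": "Finance",
    "Last Role": "Analyst",
    "Years Experience_int": 5,
    "Skills": float("nan"),
}
text = candidate_text_sketch(row)
```

Dropping missing fields keeps placeholder strings such as "nan" out of the text that is later passed to an embedding model.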
- utils.feature_engineering.create_job_text(row: Series) → str
Create a text description summarizing a job posting.
Combines job title, department, job description, and requirements into a single formatted string for use in NLP models.
- Parameters:
row (pandas.Series) – A row from the job DataFrame.
- Returns:
A text summary of the job posting.
- Return type:
str
- utils.feature_engineering.prepare_nlp_text_columns(df: DataFrame) → DataFrame
Create candidate_text and job_text columns for NLP similarity calculations.
This function adds text summaries for both candidate and job profiles to the DataFrame.
- Parameters:
df (pandas.DataFrame) – The input DataFrame with candidate and job information.
- Returns:
DataFrame with added ‘candidate_text’ and ‘job_text’ columns.
- Return type:
pandas.DataFrame
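The row-wise pattern behind this kind of function can be sketched with `DataFrame.apply(..., axis=1)`. The helper names below are hypothetical, and the two text builders are reduced to a simple join over a few columns named elsewhere in this reference; they are not the module's `create_candidate_text` and `create_job_text`. Returning a copy rather than modifying the input is also an assumption.

```python
import pandas as pd


def _join_fields(row: pd.Series, columns: list) -> str:
    """Join the non-missing values of `columns` with ' | '."""
    return " | ".join(str(row[c]) for c in columns if pd.notna(row[c]))


def add_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'candidate_text' and 'job_text' added."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out["candidate_text"] = out.apply(
        lambda r: _join_fields(r, ["Sector", "Last Role"]), axis=1
    )
    out["job_text"] = out.apply(
        lambda r: _join_fields(r, ["Job Family Hiring", "Job Title Hiring"]), axis=1
    )
    return out


df = pd.DataFrame(
    {
        "Sector": ["Finance", None],
        "Last Role": ["Analyst", "Developer"],
        "Job Family Hiring": ["Finance", "IT"],
        "Job Title Hiring": ["Senior Analyst", "Backend Engineer"],
    }
)
out = add_text_columns(df)
```

Building both text columns once, up front, lets later similarity steps embed each string a single time instead of reassembling it per comparison.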