Feature Engineering Utilities

utils.feature_engineering.calculate_distance(coord1: Tuple[float, float], coord2: Tuple[float, float]) → float | None

Compute geodesic distance in kilometers between two coordinate pairs.

Parameters:

coord1 (tuple of float) – First coordinate as (latitude, longitude).
coord2 (tuple of float) – Second coordinate as (latitude, longitude).

Returns:

Distance in kilometers, or None if the calculation fails.

Return type:

float or None
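
As an illustration of the contract above (two (latitude, longitude) pairs in, kilometers or None out), here is a minimal sketch. It does not reproduce the function's actual geodesic computation, which the source does not show; it substitutes the simpler haversine great-circle formula on a spherical Earth. The name haversine_km and the radius constant are assumptions of this sketch.

```python
import math
from typing import Optional, Tuple

# Mean Earth radius in km; an assumption of this sketch (spherical model).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(coord1: Tuple[float, float],
                 coord2: Tuple[float, float]) -> Optional[float]:
    """Great-circle distance in km, or None if the inputs are unusable."""
    try:
        lat1, lon1 = (math.radians(float(v)) for v in coord1)
        lat2, lon2 = (math.radians(float(v)) for v in coord2)
    except (TypeError, ValueError):
        # Missing coordinates, wrong tuple length, or non-numeric values.
        return None
    if any(math.isnan(v) for v in (lat1, lon1, lat2, lon2)):
        return None
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

A true geodesic on an ellipsoid can differ from this spherical approximation by a fraction of a percent, which may or may not matter for a commute-distance feature.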

utils.feature_engineering.calculate_experience_match_score(df: DataFrame) → Series

Calculate the normalized difference between candidate and required experience.

The function compares years of experience and returns a normalized score based on the range of values found in the dataset.

Parameters:

df (pandas.DataFrame) – DataFrame containing ‘Years Experience_int’ and ‘Years Experience.1_int’.

Returns:

A Series containing normalized experience difference scores.

Return type:

pandas.Series
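
One plausible reading of "normalized by the range of values found in the dataset" is min-max scaling of the per-row difference. The sketch below follows that reading. It assumes that ‘Years Experience_int’ holds the candidate's years and ‘Years Experience.1_int’ the required years, which the source does not state, and the function name is hypothetical.

```python
import pandas as pd


def experience_match_sketch(df: pd.DataFrame) -> pd.Series:
    """Min-max scale (candidate years - required years) into [0, 1]."""
    # Assumption: the first column is the candidate, the ".1" column the job.
    diff = df["Years Experience_int"] - df["Years Experience.1_int"]
    span = diff.max() - diff.min()
    if pd.isna(span) or span == 0:
        # No data, or every row has the same difference: nothing to scale by.
        return pd.Series(0.0, index=df.index)
    return (diff - diff.min()) / span


example = pd.DataFrame({
    "Years Experience_int": [2, 5, 10],
    "Years Experience.1_int": [5, 5, 5],
})
scores = experience_match_sketch(example)
```

Because the scaling depends on the minimum and maximum in the data, the same candidate-job pair can score differently in two datasets.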

utils.feature_engineering.calculate_professional_similarity_score(df: DataFrame) → Series

Calculate semantic similarity between a candidate’s background and the job description.

Compares the candidate’s sector and last role against the job family and job title using sentence embeddings and cosine similarity.

Parameters:

df (pandas.DataFrame) – DataFrame with ‘Sector’, ‘Last Role’, ‘Job Family Hiring’, and ‘Job Title Hiring’.

Returns:

A Series of professional similarity scores.

Return type:

pandas.Series
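
To make the cosine-similarity step concrete, the sketch below scores two short texts. The embedding is deliberately a toy: token counts stand in for a sentence-embedding model, whose name and configuration are not given in the source. Both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def toy_embed(text: str) -> Dict[str, float]:
    """Stand-in for a sentence embedding: lowercase token counts."""
    return dict(Counter(text.lower().split()))


def cosine_similarity(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(token, 0.0) for token, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Candidate side (sector + last role) versus job side (family + title).
candidate = toy_embed("Finance Senior Analyst")
job = toy_embed("Finance Analyst")
score = cosine_similarity(candidate, job)  # 2 / sqrt(6), about 0.82
```

With real sentence embeddings the vectors are dense and capture meaning rather than shared tokens, but the similarity formula is the same.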

utils.feature_engineering.calculate_salary_fit_score(df: DataFrame, is_expected: bool = True) → Series

Calculate the salary fit score between a candidate’s salary and the job’s salary range.

Returns 1.0 if the candidate’s salary is within the job’s range; otherwise, returns a normalized score based on how far the salary is from the closest bound.

Parameters:

df (pandas.DataFrame) – DataFrame with salary information, including candidate and job salary columns.
is_expected (bool, optional) – If True, uses ‘Expected Ral’; if False, uses ‘Current Ral’. Default is True.

Returns:

A Series of salary fit scores.

Return type:

pandas.Series
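
The scoring rule can be sketched for a single candidate-job pair as follows. The source does not say how the out-of-range distance is normalized; this sketch assumes the gap is expressed relative to the nearest bound and clipped at 0. The function name and that normalization are assumptions.

```python
def salary_fit_sketch(salary: float, low: float, high: float) -> float:
    """1.0 inside [low, high]; otherwise decays linearly with the relative gap."""
    if low <= salary <= high:
        return 1.0
    nearest = low if salary < low else high
    if nearest <= 0:
        # The gap cannot be expressed relative to a non-positive bound.
        return 0.0
    gap = abs(salary - nearest) / nearest
    return max(0.0, 1.0 - gap)


inside = salary_fit_sketch(35_000, 30_000, 40_000)   # within range
above = salary_fit_sketch(50_000, 30_000, 40_000)    # 25% above the upper bound
below = salary_fit_sketch(15_000, 30_000, 40_000)    # 50% below the lower bound
```

Under these assumptions the three calls give 1.0, 0.75, and 0.5 respectively; a different normalization (for example, by the width of the range) would change the out-of-range values but not the in-range score.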

utils.feature_engineering.calculate_study_area_score(df: DataFrame) → Series

Calculate semantic similarity between candidate and required study areas.

Uses sentence embeddings and cosine similarity to quantify alignment between study fields.

Parameters:

df (pandas.DataFrame) – DataFrame with ‘Study area’ and ‘Study Area.1’ columns.

Returns:

A Series of cosine similarity scores.

Return type:

pandas.Series

utils.feature_engineering.calculate_study_title_score(df: DataFrame) → Series

Calculate the normalized difference between candidate and required study levels.

This function maps education levels to a numerical ranking and computes the normalized difference between a candidate’s level and the job’s requirement.

Parameters:

df (pandas.DataFrame) – DataFrame containing ‘Study Title’ and ‘Study Level’ columns.

Returns:

A Series of normalized score differences between candidate and job study levels.

Return type:

pandas.Series
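
The rank-then-normalize idea can be sketched for one pair of levels. The level names, their order, and the choice to divide by the largest possible rank gap are all hypothetical here; the source does not list the actual mapping or normalization.

```python
from typing import Dict, Optional

# Hypothetical ranking; the real level names and order are not given in the source.
LEVEL_RANK: Dict[str, int] = {
    "High school": 1,
    "Bachelor": 2,
    "Master": 3,
    "PhD": 4,
}


def study_level_gap(candidate_level: str, required_level: str) -> Optional[float]:
    """(candidate rank - required rank) scaled to [-1, 1]; None for unknown levels."""
    cand = LEVEL_RANK.get(candidate_level)
    req = LEVEL_RANK.get(required_level)
    if cand is None or req is None:
        return None
    max_gap = max(LEVEL_RANK.values()) - min(LEVEL_RANK.values())
    return (cand - req) / max_gap
```

In this sketch a positive value means the candidate exceeds the requirement, a negative value means the candidate falls short, and 0.0 is an exact match.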

utils.feature_engineering.create_candidate_text(row: Series) → str

Create a text description summarizing a candidate’s profile.

Combines fields such as education, sector, last role, experience, and skills into a single formatted string.

Parameters:

row (pandas.Series) – A row from the candidate DataFrame.

Returns:

A text summary of the candidate.

Return type:

str
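
A minimal sketch of how such a summary might be assembled is shown below. The output format, the labels, and the ‘Skills’ column name are assumptions; the other column names are borrowed from the functions documented above. The helper accepts any object with a dict-style get method, which includes both a plain dict and a pandas.Series.

```python
import math
from typing import Any, Mapping


def _is_missing(value: Any) -> bool:
    """True for None, NaN, or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def candidate_text_sketch(row: Mapping[str, Any]) -> str:
    """Join the available profile fields into one formatted string."""
    fields = [
        ("Education", "Study Title"),
        ("Sector", "Sector"),
        ("Last role", "Last Role"),
        ("Years of experience", "Years Experience_int"),
        ("Skills", "Skills"),  # hypothetical column name
    ]
    parts = []
    for label, column in fields:
        value = row.get(column)
        if not _is_missing(value):
            # Skip absent fields so the text carries no "nan" tokens.
            parts.append(f"{label}: {value}")
    return ". ".join(parts)


text = candidate_text_sketch({
    "Study Title": "Master",
    "Sector": "Banking",
    "Last Role": float("nan"),
    "Years Experience_int": 5,
})
```

Dropping missing fields, rather than writing placeholders, keeps filler tokens out of the text that is later embedded.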

utils.feature_engineering.create_job_text(row: Series) → str

Create a text description summarizing a job posting.

Combines job title, department, job description, and requirements into a single formatted string for use in NLP models.

Parameters:

row (pandas.Series) – A row from the job DataFrame.

Returns:

A text summary of the job posting.

Return type:

str

utils.feature_engineering.prepare_nlp_text_columns(df: DataFrame) → DataFrame

Create candidate_text and job_text columns for NLP similarity calculations.

This function adds text summaries for both candidate and job profiles to the DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame with candidate and job information.

Returns:

DataFrame with added ‘candidate_text’ and ‘job_text’ columns.

Return type:

pandas.DataFrame
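
The row-wise pattern this function describes can be sketched with DataFrame.apply. The two builder functions below are simplified, hypothetical stand-ins for create_candidate_text and create_job_text, and returning a copy instead of modifying the input in place is a choice of this sketch, not something the source confirms.

```python
import pandas as pd


def build_candidate_text(row: pd.Series) -> str:
    """Hypothetical stand-in for create_candidate_text."""
    return f"{row['Sector']} - {row['Last Role']}"


def build_job_text(row: pd.Series) -> str:
    """Hypothetical stand-in for create_job_text."""
    return f"{row['Job Family Hiring']} - {row['Job Title Hiring']}"


def add_text_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'candidate_text' and 'job_text' added."""
    out = df.copy()
    # axis=1 passes each row to the builder as a pandas.Series.
    out["candidate_text"] = out.apply(build_candidate_text, axis=1)
    out["job_text"] = out.apply(build_job_text, axis=1)
    return out


pairs = pd.DataFrame({
    "Sector": ["Banking"],
    "Last Role": ["Analyst"],
    "Job Family Hiring": ["Finance"],
    "Job Title Hiring": ["Senior Analyst"],
})
with_text = add_text_columns(pairs)
```

The resulting text columns can then be passed to an embedding model to compute similarity scores.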