Feature Engineering Utilities

utils.feature_engineering.calculate_distance(coord1: Tuple[float, float], coord2: Tuple[float, float]) → float | None

Compute the geodesic distance in kilometers between two coordinate pairs.

Parameters:
  • coord1 (tuple of float) – First coordinate as (latitude, longitude).

  • coord2 (tuple of float) – Second coordinate as (latitude, longitude).

Returns:

Distance in kilometers, or None if the calculation fails.

Return type:

float or None
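
The contract described above (a distance in kilometers, or None on failure) can be illustrated with a small standard-library sketch. It uses the haversine great-circle formula on a spherical Earth, which is a swapped-in approximation rather than a true geodesic on an ellipsoid. The name `haversine_km_or_none` and the choice of radius are assumptions made for this example; the documented function may rely on a different library and Earth model.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; spherical approximation


def haversine_km_or_none(
    coord1: Tuple[float, float], coord2: Tuple[float, float]
) -> Optional[float]:
    """Great-circle distance in km, or None if the inputs are unusable."""
    try:
        # Each coordinate is (latitude, longitude) in degrees.
        lat1, lon1 = (math.radians(float(v)) for v in coord1)
        lat2, lon2 = (math.radians(float(v)) for v in coord2)
    except (TypeError, ValueError):
        # Missing values, wrong tuple length, or non-numeric entries.
        return None
    if not all(math.isfinite(v) for v in (lat1, lon1, lat2, lon2)):
        return None
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding error near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

Returning None instead of raising lets a caller apply the function across a whole column and treat failed rows as missing values afterwards.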

utils.feature_engineering.calculate_experience_match_score(df: DataFrame) → Series

Calculate the normalized difference between candidate and required experience.

The function compares years of experience and returns a normalized score based on the range of values found in the dataset.

Parameters:

df (pandas.DataFrame) – DataFrame containing ‘Years Experience_int’ and ‘Years Experience.1_int’.

Returns:

A Series containing normalized experience difference scores.

Return type:

pandas.Series
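
One plausible reading of "normalized based on the range of values found in the dataset" is min-max scaling of the experience gap. The sketch below follows that reading. It assumes that 'Years Experience_int' is the candidate's value and 'Years Experience.1_int' is the job's requirement, and the function name and the zero-range fallback are hypothetical; the documented function may normalize differently.

```python
import pandas as pd


def experience_match_sketch(df: pd.DataFrame) -> pd.Series:
    """Min-max normalize the experience gap over the rows of df."""
    # Assumption: first column is the candidate, second is the job requirement.
    diff = df["Years Experience_int"] - df["Years Experience.1_int"]
    span = diff.max() - diff.min()
    if pd.isna(span) or span == 0:
        # No spread in the data: return a constant score (arbitrary choice).
        return pd.Series(0.0, index=df.index)
    return (diff - diff.min()) / span
```

Because the scaling depends on the minimum and maximum gap in the frame, the same candidate can receive a different score when the function is run on a different dataset.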

utils.feature_engineering.calculate_professional_similarity_score(df: DataFrame) → Series

Calculate the semantic similarity between a candidate’s background and the job description.

Compares sector and last role against job family and job title using sentence embeddings and cosine similarity.

Parameters:

df (pandas.DataFrame) – DataFrame with ‘Sector’, ‘Last Role’, ‘Job Family Hiring’, and ‘Job Title Hiring’.

Returns:

A Series of professional similarity scores.

Return type:

pandas.Series
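
The combine-then-compare flow described above can be sketched without downloading any model. In the example below, `toy_embed` is a hashed bag-of-words placeholder standing in for a real sentence-embedding model (the page does not name the model used), so the scores it produces reflect word overlap, not meaning. All function names here are hypothetical.

```python
import zlib

import numpy as np
import pandas as pd

DIM = 64  # size of the stand-in vectors; arbitrary


def toy_embed(text: str) -> np.ndarray:
    """Hashed bag-of-words vector: a placeholder for a sentence-embedding model."""
    vec = np.zeros(DIM)
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    return vec


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity; 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def professional_similarity_sketch(df: pd.DataFrame) -> pd.Series:
    """Compare 'Sector' + 'Last Role' with 'Job Family Hiring' + 'Job Title Hiring'."""
    candidate = df["Sector"].fillna("") + " " + df["Last Role"].fillna("")
    job = df["Job Family Hiring"].fillna("") + " " + df["Job Title Hiring"].fillna("")
    scores = [cosine(toy_embed(c), toy_embed(j)) for c, j in zip(candidate, job)]
    return pd.Series(scores, index=df.index)
```

Swapping `toy_embed` for a real encoder would keep the rest of the flow unchanged, since only the vectors fed into `cosine` differ.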

utils.feature_engineering.calculate_salary_fit_score(df: DataFrame, is_expected: bool = True) → Series

Calculate the salary fit score between a candidate’s salary and the job’s salary range.

Returns 1.0 if the candidate’s salary is within the range; otherwise, returns a normalized score based on how far the salary is from the closest bound.

Parameters:
  • df (pandas.DataFrame) – DataFrame with salary information, including candidate and job salary columns.

  • is_expected (bool, optional) – If True, uses ‘Expected Ral’; if False, uses ‘Current Ral’. Default is True.

Returns:

A Series of salary fit scores.

Return type:

pandas.Series
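
The scoring rule described above can be sketched as follows. Two things in this example are assumptions: the job-side column names 'Minimum Ral' and 'Maximum Ral' are hypothetical (the page only names the candidate columns), and the normalization outside the range (gap divided by the nearest bound, floored at 0) is one possible choice, not necessarily the one the function uses.

```python
import numpy as np
import pandas as pd


def salary_fit_sketch(df: pd.DataFrame, is_expected: bool = True) -> pd.Series:
    """1.0 inside [min, max]; otherwise decays with distance to the nearest bound."""
    salary = df["Expected Ral" if is_expected else "Current Ral"].astype(float)
    # Hypothetical column names for the job's salary range.
    low = df["Minimum Ral"].astype(float)
    high = df["Maximum Ral"].astype(float)

    below = (low - salary).clip(lower=0)   # shortfall under the lower bound
    above = (salary - high).clip(lower=0)  # excess over the upper bound
    nearest_bound = pd.Series(np.where(salary < low, low, high), index=df.index)
    # Assumed normalization: gap relative to the nearest bound, limited to [0, 1].
    score = 1.0 - (below + above) / nearest_bound
    return score.clip(lower=0.0, upper=1.0)
```

With a range of 25,000 to 40,000, a salary of 20,000 scores 0.8 under this rule and a salary of 50,000 scores 0.75, while anything inside the range scores 1.0.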

utils.feature_engineering.calculate_study_area_score(df: DataFrame) → Series

Calculate semantic similarity between candidate and required study areas.

Uses sentence embeddings and cosine similarity to quantify alignment between study fields.

Parameters:

df (pandas.DataFrame) – DataFrame with ‘Study area’ and ‘Study Area.1’ columns.

Returns:

A Series of cosine similarity scores.

Return type:

pandas.Series
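
Once both study-area columns have been encoded, the per-row cosine similarity can be computed in one vectorized step. The sketch below assumes the embeddings already exist as two matrices with one row per record; the small hand-written arrays stand in for model output, and `rowwise_cosine` is a hypothetical helper name.

```python
import numpy as np


def rowwise_cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between a[i] and b[i] for every row i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dots = np.einsum("ij,ij->i", a, b)  # row-aligned dot products
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    # Rows with a zero vector get similarity 0.0 instead of NaN.
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


# Stand-in embeddings for 'Study area' and 'Study Area.1' (three rows, two dimensions).
candidate_vecs = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
required_vecs = np.array([[2.0, 0.0], [1.0, -1.0], [1.0, 0.0]])
scores = rowwise_cosine(candidate_vecs, required_vecs)
```

The first pair points in the same direction, the second pair is orthogonal, and the third contains a zero vector, which the helper maps to 0.0 by design.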

utils.feature_engineering.calculate_study_title_score(df: DataFrame) → Series

Calculate the normalized difference between candidate and required study levels.

This function maps education levels to a numerical ranking and computes the normalized difference between a candidate’s level and the job’s requirement.

Parameters:

df (pandas.DataFrame) – DataFrame containing ‘Study Title’ and ‘Study Level’ columns.

Returns:

A Series of normalized score differences between candidate and job study levels.

Return type:

pandas.Series
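
The rank-then-normalize idea described above might look like the sketch below. The ranking in `LEVEL_RANK` is hypothetical, since the labels and order used by the real function are not listed on this page. The sketch also assumes that 'Study Title' holds the candidate's level and 'Study Level' the job's requirement, and that the gap is divided by the width of the ranking scale.

```python
import pandas as pd

# Hypothetical ranking; the real labels and order are not documented here.
LEVEL_RANK = {"High School": 1, "Bachelor": 2, "Master": 3, "PhD": 4}


def study_title_sketch(df: pd.DataFrame) -> pd.Series:
    """Signed rank gap scaled to [-1, 1]; NaN when a label is not in LEVEL_RANK."""
    candidate = df["Study Title"].map(LEVEL_RANK)  # assumed: candidate's level
    required = df["Study Level"].map(LEVEL_RANK)   # assumed: job's requirement
    span = max(LEVEL_RANK.values()) - min(LEVEL_RANK.values())
    return (candidate - required) / span
```

Under these assumptions a positive score means the candidate exceeds the requirement, a negative score means the candidate falls short, and unknown labels propagate as NaN rather than being silently ranked.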

utils.feature_engineering.create_candidate_text(row: Series) → str

Create a text description summarizing a candidate’s profile.

Combines fields such as education, sector, last role, experience, and skills into a single formatted string.

Parameters:

row (pandas.Series) – A row from the candidate DataFrame.

Returns:

A text summary of the candidate.

Return type:

str
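
A minimal sketch of this kind of row-to-text helper is shown below. The column names, labels, and output format are assumptions for illustration ('Skills' in particular is a hypothetical column), so the string produced by the documented function will likely differ.

```python
import pandas as pd


def candidate_text_sketch(row: pd.Series) -> str:
    """Join the available fields into one string, skipping missing values."""
    # (label, column) pairs; the column names are assumptions for this sketch.
    fields = [
        ("Education", "Study Title"),
        ("Sector", "Sector"),
        ("Last role", "Last Role"),
        ("Experience (years)", "Years Experience_int"),
        ("Skills", "Skills"),
    ]
    parts = []
    for label, column in fields:
        value = row.get(column)  # None when the column is absent
        if pd.notna(value):
            parts.append(f"{label}: {value}")
    return ". ".join(parts)
```

Skipping missing fields keeps placeholder strings such as "nan" out of the text that is later passed to an embedding model.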

utils.feature_engineering.create_job_text(row: Series) → str

Create a text description summarizing a job posting.

Combines job title, department, job description, and requirements into a single formatted string for use in NLP models.

Parameters:

row (pandas.Series) – A row from the job DataFrame.

Returns:

A text summary of the job posting.

Return type:

str

utils.feature_engineering.prepare_nlp_text_columns(df: DataFrame) → DataFrame

Create candidate_text and job_text columns for NLP similarity calculations.

This function adds text summaries for both candidate and job profiles to the DataFrame.

Parameters:

df (pandas.DataFrame) – The input DataFrame with candidate and job information.

Returns:

DataFrame with added ‘candidate_text’ and ‘job_text’ columns.

Return type:

pandas.DataFrame
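
The step described above amounts to applying one text builder per side to every row. The sketch below shows that pattern with a single stand-in helper, `build_text`, used in place of create_candidate_text and create_job_text. The column groupings, the separator, and the decision to return a copy are assumptions of this example.

```python
import pandas as pd


def build_text(row: pd.Series, columns: list) -> str:
    """Stand-in text builder: join the non-missing fields of a row."""
    return " | ".join(
        str(row[c]) for c in columns if c in row.index and pd.notna(row[c])
    )


def prepare_text_columns_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with 'candidate_text' and 'job_text' added."""
    out = df.copy()  # leave the caller's frame untouched (choice made in this sketch)
    # Column groupings are assumptions made for illustration.
    out["candidate_text"] = df.apply(
        lambda r: build_text(r, ["Sector", "Last Role"]), axis=1
    )
    out["job_text"] = df.apply(
        lambda r: build_text(r, ["Job Family Hiring", "Job Title Hiring"]), axis=1
    )
    return out
```

Whether the documented function modifies the input in place or returns a new frame is not stated on this page, so callers may want to rely only on the returned DataFrame.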