Methods for Skill Extraction from Resumes and Job Postings
Automatic skill extraction is a key task in recruitment systems, job recommendation, and labor market analysis. The input is unstructured text, such as the "Requirements" section of a job posting or the "Experience/Skills" block of a resume. The output should be a normalized list of competencies that can be used for search, comparison, and analytics.
This article describes the pipeline implemented in iskillmatching, which combines three complementary approaches:
- LLM-based NER: neural named entity recognition.
- Pattern matching with spaCy: searching the text for entries from a predefined skill dictionary.
- Normalization with vector representations: mapping extracted variants to canonical forms by semantic similarity.
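How the three stages fit together can be sketched in plain Python. This is a minimal sketch under loudly stated assumptions: a regex whole-word search stands in for spaCy phrase matching, character-trigram vectors stand in for the semantic embeddings used for normalization, and the function names, the `ner_fn` callable, and the 0.5 similarity threshold are hypothetical rather than taken from iskillmatching.

```python
import math
import re
from collections import Counter
from typing import Callable, Iterable, List, Optional


def char_trigrams(text: str) -> Counter:
    # ASSUMPTION: toy stand-in for a semantic embedding (bag of character trigrams).
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def dictionary_match(text: str, skill_dict: Iterable[str]) -> List[str]:
    # ASSUMPTION: stand-in for spaCy phrase matching (case-insensitive whole-word search).
    lowered = text.lower()
    return [
        skill for skill in skill_dict
        if re.search(r"(?<!\w)" + re.escape(skill.lower()) + r"(?!\w)", lowered)
    ]


def normalize(candidate: str, canonical: List[str], threshold: float = 0.5) -> Optional[str]:
    # Map a raw mention to its most similar canonical skill; reject weak matches.
    if not canonical:
        return None
    vec = char_trigrams(candidate)
    scores = {c: cosine(vec, char_trigrams(c)) for c in canonical}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


def extract_skills(
    text: str,
    ner_fn: Callable[[str], List[str]],
    skill_dict: Iterable[str],
    canonical: List[str],
    threshold: float = 0.5,
) -> List[str]:
    # Stages 1 (NER) and 2 (dictionary) produce raw candidates; stage 3 normalizes them.
    candidates = list(ner_fn(text)) + dictionary_match(text, skill_dict)
    skills: List[str] = []
    for cand in candidates:
        canon = normalize(cand, canonical, threshold)
        if canon is not None and canon not in skills:
            skills.append(canon)
    return skills


if __name__ == "__main__":
    fake_ner = lambda t: ["Postgres"]  # hypothetical NER output
    print(extract_skills("Experience with Python and Postgres", fake_ner,
                         ["Python", "Docker"], ["PostgreSQL", "Python", "Docker"]))
    # prints ['PostgreSQL', 'Python']
```

Taking the union of both extractors before normalization lets dictionary matching catch well-known skills the model misses, while normalization collapses variants such as "Postgres" and "PostgreSQL" into a single entry.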
1. NER based on LLM (Neural Network Named Entity Recognition)
What is NER
Named Entity Recognition (NER) is a sequence labeling (token classification) task: each token in a text receives a label saying whether it belongs to a named entity of some type (e.g., "technology," "skill," "organization") or to none. NER was traditionally solved with rules and conditional random fields (CRFs), but modern transformer-based language models (LLMs) typically achieve higher quality because they interpret each token in the context of the whole sentence.
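Token labels are commonly encoded with the BIO scheme: B- marks the first token of an entity, I- marks a continuation, and O marks a token outside any entity. The minimal sketch below groups word-level BIO labels into entity spans; the function name and the SKILL label are hypothetical illustrations, not the label set of the model used in iskillmatching. The aggregation_strategy="simple" option of the Hugging Face pipeline performs a comparable grouping (over sub-word tokens) automatically.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged word tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the span being built, if any.
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        elif label.startswith("I-"):
            # Ill-formed I- without a matching B-: start a new span anyway.
            flush()
            current_type, current_tokens = label[2:], [token]
        else:
            # "O": the token is outside any entity.
            flush()
    flush()
    return spans


tokens = ["Experience", "with", "Apache", "Kafka", "and", "Python"]
labels = ["O", "O", "B-SKILL", "I-SKILL", "O", "B-SKILL"]
print(bio_to_spans(tokens, labels))
# prints [('SKILL', 'Apache Kafka'), ('SKILL', 'Python')]
```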
Model Used
In ner_utils.py, the HuggingFace Transformers pipeline is used:
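```python
from transformers import pipeline

def get_ner_extractor(model_name="dondosss/rubert-finetuned-ner"):
    # Token-classification pipeline backed by a ruBERT model fine-tuned for NER.
    # aggregation_strategy="simple" merges sub-word tokens into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_name,
        aggregation_strategy="simple",
    )
```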
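With aggregation_strategy="simple", the pipeline returns one dictionary per detected entity, with the keys entity_group, score, word, start, and end. A possible post-processing step, sketched below, drops low-confidence predictions and removes case-insensitive duplicates. The function name entities_to_skills, the 0.5 threshold, and filtering by entity_group are assumptions for illustration; the actual label names depend on the fine-tuned model.

```python
from typing import Dict, Iterable, List, Optional, Set


def entities_to_skills(
    entities: Iterable[Dict],
    min_score: float = 0.5,  # ASSUMPTION: illustrative confidence threshold
    allowed_groups: Optional[Set[str]] = None,  # label names depend on the model
) -> List[str]:
    """Turn aggregated pipeline output into a de-duplicated list of skill strings."""
    seen: Set[str] = set()
    skills: List[str] = []
    for ent in entities:
        # Drop low-confidence predictions (score may be a NumPy float).
        if float(ent["score"]) < min_score:
            continue
        # Optionally keep only the entity groups that denote skills.
        if allowed_groups is not None and ent["entity_group"] not in allowed_groups:
            continue
        word = ent["word"].strip()
        key = word.lower()
        if word and key not in seen:
            seen.add(key)
            skills.append(word)
    return skills


# Usage with the extractor defined above (downloads the model on first call):
# extractor = get_ner_extractor()
# skills = entities_to_skills(extractor(text), min_score=0.7)
```

A higher threshold trades recall for precision; the right value is best chosen on a labeled sample of resumes and job postings.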