Person Retrieval using Soft Biometrics
Read the full paper on IEEE Xplore
Research Motivation
Person retrieval systems must interpret human descriptions of appearance (such as clothing, build, or accessories) to identify individuals in surveillance footage. However, real-world datasets annotate appearance in different ways: some provide natural language descriptions (NLD), while others contain fixed discrete annotations (DA). This inconsistency makes datasets hard to combine and prevents models from learning jointly from complementary sources.
Our research addresses this gap by designing a dataset-merging framework that transforms NLDs into DA-style, numerically encoded attributes, enabling unified training across datasets with different annotation styles.
This work supports soft-biometric person retrieval where faces are unavailable and visual identity must be inferred from descriptive traits.
Core Contribution
We propose a hybrid dataset integration pipeline that:
- Extracts adjective–noun attribute pairs from natural language queries using a combination of rule-based and customized NLP parsing.
- Maps descriptive expressions to target soft-biometric attributes (e.g., "dark hair" → hair_color: dark).
- Encodes attribute values numerically following the AVSS schema used in surveillance datasets.
- Constructs a mapping matrix per description, allowing retrieval models to index which soft-biometric traits appear in each query.
- Generates a unified, tabular representation enabling joint training across datasets with different annotation formats.
This approach enables NLD → DA conversion while preserving descriptive nuance, giving retrieval systems access to richer language-driven variation without sacrificing structured consistency.
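The first two steps (pair extraction and attribute mapping) can be illustrated with a minimal sketch. It is not the framework's actual parser: the lexicons `SYNONYMS`, `NOUN_TO_ATTRIBUTE`, and `ADJECTIVES`, and both function names, are hypothetical stand-ins, and a simple adjacent-token rule replaces the customized NLP parsing described above.

```python
import re

# Hypothetical lexicons for illustration only; the framework's real
# dictionaries are not reproduced here.
SYNONYMS = {"trousers": "pants", "sneakers": "shoes"}
NOUN_TO_ATTRIBUTE = {
    "hair": "hair_color",
    "shirt": "torso_color",
    "pants": "leg_color",
    "shorts": "leg_color",
    "shoes": "footwear_color",
}
ADJECTIVES = {"dark", "black", "white", "blue", "yellow", "red"}


def extract_pairs(description):
    """Return (adjective, noun) pairs found as adjacent tokens."""
    tokens = re.findall(r"[a-z]+", description.lower())
    # Normalize vocabulary before matching, as a synonym dictionary would.
    tokens = [SYNONYMS.get(tok, tok) for tok in tokens]
    pairs = []
    for adj, noun in zip(tokens, tokens[1:]):
        if adj in ADJECTIVES and noun in NOUN_TO_ATTRIBUTE:
            pairs.append((adj, noun))
    return pairs


def map_to_attributes(pairs):
    """Map each extracted pair to its target soft-biometric attribute."""
    return {NOUN_TO_ATTRIBUTE[noun]: adj for adj, noun in pairs}


pairs = extract_pairs("A man with dark hair wearing blue trousers and yellow sneakers")
attributes = map_to_attributes(pairs)
```

Here `attributes` holds one label per recognized attribute (for example `hair_color` set to `dark`). A real parser would rely on part-of-speech information rather than a fixed adjective list.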
Framework Overview
| Component | Role |
|---|---|
| NLD Parsing | Extracts appearance traits from natural language descriptions |
| Synonym Dictionaries (DSC) | Normalizes vocabulary and merges equivalent expressions |
| Value Encoding Dictionaries (DA) | Assigns numeric values to descriptive attribute levels |
| Mapping Matrix (MM) | Tracks presence of each attribute across descriptions |
| Dataframe Generation | Produces structured output for training and evaluation |
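As a rough illustration of the Mapping Matrix component, the sketch below builds one binary row per description, with a 1 wherever an attribute was extracted. The attribute list and the function name `mapping_matrix` are assumptions made for this example; the framework's actual matrix layout may differ.

```python
# Hypothetical attribute order; the real schema may contain more columns.
ATTRIBUTES = ["height", "build", "hair_color", "torso_color", "leg_color", "footwear_color"]


def mapping_matrix(parsed_descriptions):
    """Build a binary matrix: rows are descriptions, columns are attributes.

    Each input item is a dict of attributes extracted from one description.
    A cell is 1 if the attribute appears in that description, otherwise 0.
    """
    return [
        [1 if attr in parsed else 0 for attr in ATTRIBUTES]
        for parsed in parsed_descriptions
    ]


parsed = [
    {"height": "tall", "leg_color": "blue"},
    {"hair_color": "dark"},
]
matrix = mapping_matrix(parsed)
```

Each row can then serve as an index of which traits a given query mentions, independent of the values those traits take.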
Example: “a very slim tall man wearing blue shorts and yellow shoes”
→ height: tall (3), build: very slim (0), skin: unknown (-1), torso_color: unknown (-1), leg_color: blue, footwear_color: yellow
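The encoding step in this example can be sketched as follows. Only the two codes shown above (tall = 3, very slim = 0) and the -1 marker for unknown come from the example; the schema list and the function name `encode_row` are assumptions, and values without a known code (such as the colors here) are passed through as labels rather than given invented codes.

```python
UNKNOWN = -1

# Assumed column order, taken from the attributes in the example above.
SCHEMA = ["height", "build", "skin", "torso_color", "leg_color", "footwear_color"]

# Only the codes that appear in the example; other levels are omitted
# because their numeric values are not stated.
VALUE_CODES = {
    "height": {"tall": 3},
    "build": {"very slim": 0},
}


def encode_row(attributes):
    """Encode one parsed description into a fixed-schema row."""
    row = {}
    for name in SCHEMA:
        value = attributes.get(name)
        if value is None:
            # Attribute not mentioned in the description.
            row[name] = UNKNOWN
        else:
            # Use the numeric code when one is known, else keep the label.
            row[name] = VALUE_CODES.get(name, {}).get(value, value)
    return row


row = encode_row({
    "height": "tall",
    "build": "very slim",
    "leg_color": "blue",
    "footwear_color": "yellow",
})
```

Because every row carries the same keys, a list of such rows can be handed directly to a tabular structure for joint training.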
Why This Matters
Merging datasets can improve retrieval coverage, reduce bias, and strengthen generalization, but prior work lacked a programmatic way to unify different annotation styles.
Our framework enables:
- Scalable dataset growth without manual relabeling
- Attribute-aware person retrieval even with vague or noisy language
- Interoperability between NLD- and DA-style datasets
- Training signals that reflect how humans naturally describe people
Practical Challenges & Solutions
| Challenge | Solution |
|---|---|
| Coordinated adjectives (“black and white shirt”) | Preprocessing normalizes phrasing to extract both attributes (black shirt, white shirt) |
| Vocabulary drift & synonyms | Expandable synonym dictionaries allow iterative refinement and controlled normalization |
| Missing attribute values | Attributes absent from a description are encoded as -1 (unknown), so every record keeps the same schema |
| Noisy or ambiguous language | Filtering, part-of-speech checks, and secondary extraction passes improve robustness |
| Scaling to additional attributes | Dictionary-based extensibility supports new features without restructuring the pipeline |
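The coordinated-adjective case from the table can be handled with a small preprocessing pass. The following is a sketch under simplifying assumptions: input is lowercased, only two coordinated color words are expanded, and the `COLORS` list and the function name `expand_coordinated` are hypothetical.

```python
import re

# Hypothetical color vocabulary used to recognize coordinated adjectives.
COLORS = ("black", "white", "blue", "yellow", "red", "green")
_COLOR = "|".join(COLORS)

# Matches "<color> and <color> <noun>", e.g. "black and white shirt".
COORDINATED = re.compile(rf"\b({_COLOR}) and ({_COLOR}) (\w+)")


def expand_coordinated(text):
    """Rewrite 'black and white shirt' as 'black shirt and white shirt'."""
    return COORDINATED.sub(r"\1 \3 and \2 \3", text)


normalized = expand_coordinated("a man in a black and white shirt")
```

After this rewrite, each color sits directly before the noun, so a later adjective–noun extraction pass can pick up both attributes. Phrases such as "blue shorts and yellow shoes" are left unchanged because the word before "and" is not a color.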
Key Takeaways
- First modular framework to merge datasets with contrasting annotation styles for person retrieval.
- Transforms unstructured descriptions into model-ready structured attributes.
- Reduces reliance on manual dataset construction while broadening the applicability of soft-biometric retrieval systems.
Acknowledgment
Funded by the Gujarat Council of Science and Technology (GUJCOST) under STI Policy research grant GUJCOST/STI/2021-22/3858. Research completed in collaboration with Ahmedabad University, BITS Pilani, RyDOT Infotech, and Pandit Deendayal Energy University.