Skip to content
View nguyennampfiev's full-sized avatar
🎯
Focusing
🎯
Focusing

Block or report nguyennampfiev

Block user

Prevent this user from interacting with your repositories and sending you notifications. Learn more about blocking users.

You must be logged in to block users.

Please don't include any personal information such as legal names or email addresses. Maximum 100 characters, markdown supported. This note will be visible to only you.
Report abuse

Contact GitHub support about this user’s behavior. Learn more about reporting abuse.

Report abuse
nguyennampfiev/README.md

Hi there, I'm Nam πŸ‘‹

I'm a PostDoc with a passion for ML, DL and Computer Vision. I enjoy working on ANR Chamdoc project. Check out some of my work below!

  • πŸ”­ I’m currently working on ANR CHAMdoc project.
  • 🌱 I’m currently learning NLP as well as LLMs, RAG.
  • πŸ‘― I’m looking to collaborate on NLP as well as Computer Vision problem.
  • πŸ’¬ Ask me about Document Processing, OCR, Denoising, Generative Modeling, ASR, Action Recognition.
  • πŸ“« How to reach me: [email protected]

πŸš€ Projects

Overview

The goal of this project is to develop a comprehensive, automated workflow for analyzing Cham documents, an ancient script with cultural and historical significance. The workflow consists of three primary stages: Image Enhancement, Text Line Segmentation, and Text Line Transliteration (OCR). Each stage is designed to address the unique challenges presented by Cham manuscripts, including their age, script complexity, and the varying quality of preserved documents.

Duration: Sep 2023 - Now

1. Image Retrieval

  • Objective: Retrieval all the similar stamps given input stamp.

  • Solution:

    • Grounding DINO for segmentation.
    • Triplet loss with customize miner based on SwinTransformer.
    • I created a simple interface so user can easily to interact with system based on Streamlib and Docker.

Duration: Feb 2020 - Jul 2023

2. Image Enhancement
  • Objective: Improve the quality of scanned Cham document images.
  • Solution:
    • I proposed Pix2Pix model with Multi scale Attention for further improve both quantitative and qualitative of documents.
3. Text Line Segmentation
  • Objective: Accurately segment text lines from enhanced images.
  • Solution:
    • I proposed finetuning docExtractor for base line extraction.
    • I modified the typical Seam Carving algorithm by adding the additional cost fucntion for correcly adjust bounding box.
4. Text Line Transliteration (OCR)
  • Objective: Convert segmented Cham text lines into machine-readable text.
  • Solution: Seq2Seq model with Transformer
    • I proposed two step sequence transformer by first using Seq2Seq model to translate text from Cham alphabet to Latin alphabet then using Transformer to adjust the results
Final Deliverable

This workflow automates the analysis of Cham documents (inscription and manuscript), enhancing images, segmenting text, and converting it into digital text. This process aids in the preservation and further study of Cham cultural heritage.

Duration: 2020 - 2021

Overview

This project focused on developing an automated system for extracting key information from invoices. The project involved creating a robust pre-processing algorithm to handle rotated invoice images and integrating a micro AI service into a complete information extraction pipeline.

Key Activities

1. Pre-Processing Algorithm Development

  • Objective: Correctly rotate invoice images to standardize orientation for accurate data extraction.
  • Tasks:
    • Developed a pre-processing algorithm to detect and correct the orientation of scanned or photographed invoices.
    • Utilized image processing techniques to identify text and layout patterns that indicate the correct orientation.
    • Integrated the algorithm into the pipeline, ensuring that all invoices are properly aligned before further processing.

2. Micro AI Service Development

  • Objective: Create a micro AI service to handle the extraction of key information from invoices.
  • Tasks:
    • Designed and implemented a microservice using AI models capable of identifying and extracting relevant fields (e.g., invoice number, date, total amount) from invoices.
    • Integrated the micro AI service into the broader pipeline, ensuring seamless data flow and processing.

Tools & Technologies

  • Image Processing: Used for developing the rotation correction algorithm, focusing on text detection and orientation analysis.
  • AI/ML Models: Implemented to recognize and extract key invoice fields, adaptable to different invoice templates and layouts.
  • Microservices Architecture: Designed the AI service as a microservice to enable modular and scalable integration with the extraction pipeline.

Automatic Speech Recognition (ASR) Project

Project: Voice Trigger System for Russian, Spanish, and French

Duration: August 2018 - October 2019

Overview

This project focused on developing a Voice Trigger System tailored for Russian, Spanish, and French languages. The system utilizes Automatic Speech Recognition (ASR) technologies to detect specific voice commands or "triggers." The primary tools used were HTK (Hidden Markov Model Toolkit) and Kaldi, both widely recognized in the speech recognition community.

Key Activities

1. Speech Recognition Investigation

  • Objective: Explore and evaluate ASR methodologies based on HTK and Kaldi tools.
  • Tasks:
    • Conducted a the use of mixing tool between HTK and Kaldi for voice trigger systems.
    • Investigated acoustic and language modeling techniques to optimize recognition accuracy for each target language.

2. Voice Trigger Model Fine-Tuning

  • Objective: Adapt and fine-tune the Voice Trigger models for Russian, Spanish, and French.
  • Tasks:
    • Customized and trained models using language-specific datasets to enhance trigger detection accuracy.
    • Addressed language-specific challenges such as phonetic variability and acoustic differences.

Tools & Technologies

  • HTK (Hidden Markov Model Toolkit): Used for initial ASR model training and evaluation.
  • Kaldi: Employed for advanced modeling, including deep learning-based approaches for speech recognition.
  • Datasets: Language-specific datasets for Russian, Spanish, and French to train and validate the models.

Outcomes

  • Successfully developed and fine-tuned Voice Trigger models for Russian, Spanish, and French languages.
  • Achieved high accuracy in detecting voice triggers across different languages by leveraging the strengths of both HTK and Kaldi.
  • The project laid the groundwork for further advancements in multilingual ASR systems, particularly in voice-activated applications.

πŸ› οΈ Technologies & Tools

  • Tools: Git, Flask, Gradio, Streamlit, Docker, Kubernetes, Postman
  • Computer Vision: Document Image Analysis, Image translation, OCR, Object Detection Fine-grained Image Retrieval, Segmentation, Human Action Recognition, GANs, VAEs.
  • GenAI: Hugging Face, Langchain, Unstructured Io, Knowledge Graphs
  • NLP: LLMs, PEFT, LORA, Text classification, RAG
  • Languages: Python, Matlab, Java Script, Bash Script, C++
  • Frameworks/Libraries: Numpy, Scikit-learn, Pytorch, Tensorflow, Transformers, Pandas, OpenCV
  • Cloud/DevOps: Docker, Kubernetes, RESTful APIs, Microservices Architecture, GCP

Pinned Loading

  1. DCLSIR DCLSIR Public

    Jupyter Notebook 1

  2. honghanhh/semeval8 honghanhh/semeval8 Public

    L3i++ at SemEval2024-task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

    Python 2 1

  3. MSAED MSAED Public

    Multi-scale Attention for document image enhancement

    Python 1

  4. StampRetrieval StampRetrieval Public

    Full pipeline for document retrieval based on stamp information

    Python 1

  5. honghanhh/llama-as-a-judge honghanhh/llama-as-a-judge Public

    Python 3

  6. Advance_RAG_document Advance_RAG_document Public

    Python