Enhancing Multilingual Access To Medical Terminology Through NLP-Driven Extraction And Translation
Abstract
This paper outlines a methodology for improving the usability and readability of clinical reports through automated entity recognition, term matching, and translation of medical terminology. Three customized spaCy models were used: one for chemical and disease identification, scispaCy's scientific NER model for general scientific entities, and a JNLPBA model for biomedical entities. All relevant terms in the PDF report were extracted automatically and then matched against a predefined CSV dictionary of medical terms. Using exact and fuzzy matching, the system identifies abbreviations and partial matches and annotates the matched terms. Output can also be generated in page mode, in which the term descriptions are printed so that they can be downloaded and reviewed as a compact report. Additionally, the output can be translated into Marathi, Hindi, or Gujarati, which improves usability in multilingual healthcare settings, aids patient comprehension, and makes complex medical information more accessible across linguistic contexts.
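The dictionary-matching step can be illustrated with a minimal sketch. The abstract does not specify the CSV layout or the fuzzy-matching algorithm, so this example assumes hypothetical `term`, `abbreviation`, and `description` columns and uses the similarity ratio from Python's standard-library `difflib` as a stand-in for fuzzy matching. The entity strings passed to `match_entity` would come from the spaCy NER step; the sample dictionary and the names `load_dictionary`, `match_entity`, and `Match` are illustrative only.

```python
import csv
import io
from dataclasses import dataclass
from difflib import get_close_matches
from typing import Dict, Optional, Tuple


@dataclass
class Match:
    entity: str       # entity text as produced by the NER step
    term: str         # canonical dictionary term it was matched to
    description: str  # description taken from the CSV dictionary
    kind: str         # annotation of how it matched: "exact" or "fuzzy"


def load_dictionary(csv_text: str) -> Dict[str, Tuple[str, str]]:
    """Map lower-cased terms and abbreviations to (term, description).

    Assumes hypothetical CSV columns: term, abbreviation, description.
    """
    index: Dict[str, Tuple[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = (row["term"], row["description"])
        index[row["term"].strip().lower()] = entry
        abbr = (row.get("abbreviation") or "").strip()
        if abbr:  # abbreviations resolve to the same dictionary entry
            index[abbr.lower()] = entry
    return index


def match_entity(entity: str, index: Dict[str, Tuple[str, str]],
                 cutoff: float = 0.85) -> Optional[Match]:
    key = entity.strip().lower()
    # 1) exact, case-insensitive lookup on a term or an abbreviation
    if key in index:
        term, desc = index[key]
        return Match(entity, term, desc, "exact")
    # 2) fuzzy fallback: closest key by difflib similarity, if >= cutoff
    close = get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        term, desc = index[close[0]]
        return Match(entity, term, desc, "fuzzy")
    return None  # no dictionary entry is similar enough


# Hypothetical sample dictionary for illustration only.
SAMPLE_CSV = """term,abbreviation,description
Myocardial infarction,MI,Death of heart muscle due to blocked blood flow
Hypertension,HTN,Persistently high blood pressure
Paracetamol,,Common pain reliever and fever reducer
"""

if __name__ == "__main__":
    index = load_dictionary(SAMPLE_CSV)
    for ent in ["MI", "hypertension", "paracetamoll", "aspirin"]:
        print(ent, "->", match_entity(ent, index))
```

The exact lookup runs first so that abbreviations such as `MI` resolve directly; the fuzzy fallback only applies when no exact key exists, and the `cutoff` value (0.85 here, an arbitrary choice) trades tolerance for misspellings such as `paracetamoll` against the risk of false matches.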