DataLab Workshop – Optical Character Recognition (OCR) and Working with Messy Text Data

Thursday, May 19, 2022, 2 – 4pm
Location: Online
Event Type: Workshops and Training
Audience Type: Students: Graduate and Professional

Optical Character Recognition (OCR) involves computational techniques for converting scanned images of printed or handwritten text into computer-readable formats. OCR helps make documents more searchable and can allow for analyses including text mining and natural language processing. This workshop will provide an overview of existing and emerging tools for unlocking the text in printed images and will demonstrate practical techniques for OCR with Python using the Tesseract OCR engine. Additionally, this workshop will include a discussion and practical examples of evaluating OCR viability, as well as tips for using OCR-extracted data in NLP pipelines.

This workshop qualifies as an elective for the Text Mining and NLP micro-credential through UC Davis GradPathways.

Software: Python
Cost: Free of charge.