Detection and elimination of personal data contained in medical images

Student: 
Yassar Almutairi

Principal goal: evaluating and implementing different techniques for detecting, recognising and eliminating text containing personal data in medical images.

Medical images can contain personal information “burned into” the pixel data. Anonymisation tools sometimes overlook this problem, and those that do address it usually require user input, such as manually marking the zones that contain the personal information.

The objective of the project is to evaluate different possible solutions for the automatic handling of textual data in medical images. This process can be divided into several steps:
- Detecting the presence of text in the image.
- Localising the text.
- Recognising the text.
- Evaluating the risk (in terms of confidentiality) and deciding whether to remove the text or not.
- Removing risky text.
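
The last two steps can be sketched in a few lines of Python. This is only an illustrative outline, not the method used in the project: it assumes that detection, localisation and recognition have already produced a list of bounding boxes with their recognised strings, and that the risk decision is supplied as a function. The names `TextRegion`, `redact` and `is_risky` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# A grayscale image represented as rows of pixel values (assumption for this sketch).
Image = List[List[int]]


@dataclass
class TextRegion:
    """Bounding box of localised text plus the string recognised inside it."""
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive
    text: str    # output of a hypothetical recognition (OCR) step


def redact(image: Image,
           regions: List[TextRegion],
           is_risky: Callable[[str], bool],
           fill: int = 0) -> Image:
    """Return a copy of `image` with every risky text region overwritten by `fill`."""
    out = [row[:] for row in image]  # leave the input image untouched
    for region in regions:
        if not is_risky(region.text):
            continue  # text judged harmless (e.g. an orientation marker) is kept
        # Clamp the box to the image bounds before overwriting pixels.
        for y in range(max(region.top, 0), min(region.bottom, len(out))):
            row = out[y]
            for x in range(max(region.left, 0), min(region.right, len(row))):
                row[x] = fill
    return out
```

Overwriting with a constant value is the simplest form of removal; it destroys any image content under the box, so a real system might restrict the box tightly to the text or fill it from the surrounding background instead.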

Various approaches exist for realising each of these steps, and it is necessary to evaluate which are the most suitable for medical images. Candidate approaches include Artificial Neural Networks (and other machine learning methods) and wavelets.

A major challenge is deciding whether a given piece of text is personal data; the difficulty of this task depends on the additional information (outside the pixels) that is available.

For images in DICOM (Digital Imaging and Communications in Medicine) format, the information in the header can be used to decide whether a given text should be removed. In addition, images generated by the same device usually show the personal information in the same zone of the image; if this is known, it simplifies the problem.
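
One possible way to use the header is to compare each recognised string with the identifying values stored there. The sketch below is a hedged illustration under simplifying assumptions: the header is modelled as a plain dictionary rather than read with a DICOM library, the helper names `tokens` and `matches_header` are hypothetical, and matching is exact, so it would miss text that the recognition step has misread.

```python
import re
from typing import Dict, List


def tokens(value: str) -> List[str]:
    """Split a string into upper-case alphanumeric tokens of at least two characters."""
    return [t for t in re.split(r"[^A-Z0-9]+", value.upper()) if len(t) >= 2]


def matches_header(text: str, header: Dict[str, str]) -> bool:
    """True if any token of an identifying header value appears in the recognised text."""
    text_tokens = set(tokens(text))
    return any(tok in text_tokens
               for value in header.values()
               for tok in tokens(value))


# Example header values, keyed by DICOM attribute keywords (the values are invented).
header = {
    "PatientName": "DOE^JOHN",
    "PatientID": "12345",
    "PatientBirthDate": "19700101",
}

print(matches_header("John Doe", header))      # text shares tokens with PatientName
print(matches_header("LEFT LATERAL", header))  # no identifying token present
```

Splitting on non-alphanumeric characters handles the `^` separator that DICOM uses between name components. A more tolerant comparison (for example an edit-distance threshold) would be needed to cope with recognition errors.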

In the general case of medical images coming from different sources and in different formats, however, there is no easy way to establish a criterion for deciding which text should be removed. There has been work on anonymising free-text medical data, but these methods normally rely on syntactic and lexical information that is not available in images. The objective here is to investigate whether rules can be established for deciding when text should be removed in the general case.

Project status: 
Finished
Degree level: 
MSc
Background: 
Good programming skills; experience with image processing desirable
Supervisors @ NeSC: 
Student project type: 
References: 
- James Z. Wang, Michel Bilello and Gio Wiederhold, “A Textual Information Detection and Elimination System for Secure Medical Image Distribution”, Journal of the American Medical Informatics Association, Proceedings of the AMIA Annual Symposium, vol. 1997 symposium suppl., pp. 896, Nashville, TN, October 1997.
- Datong Chen, Jean-Marc Odobez and Hervé Bourlard, “Text detection and recognition in images and video frames”, Pattern Recognition 37(3): 595-608 (2004).
- I. Neamatullah et al., “Automated de-identification of free-text medical records”, BMC Medical Informatics and Decision Making 2008, 8:32.