Fwd: Looking for lightweight tool to identify PII

From: Kimberly Kennedy <kimberlymkennedy_at_nyob>
Date: Wed, 24 Apr 2019 12:34:22 -0400
To: CODE4LIB_at_LISTS.CLIR.ORG

Hi everyone,

Dr. Kayaalp asked me to forward his email about the NLM-Scrubber, a tool
for removing PII and health information, to the list.

Thanks!

Kim



Kimberly Kennedy
kimberlymkennedy_at_gmail.com



---------- Forwarded message ---------
From: Kayaalp, Mehmet (NIH/NLM/LHC) [E] <mkayaalp_at_mail.nih.gov>
Date: Mon, Apr 22, 2019 at 10:37 AM
Subject: RE: Looking for lightweight tool to identify PII
To: kimberlymkennedy_at_gmail.com <kimberlymkennedy_at_gmail.com>


Hi Kimberly,

A colleague of mine forwarded your email to me.

You may find NLM-Scrubber, https://scrubber.nlm.nih.gov/, helpful to you,
but it is not a turnkey approach for your problem.

Although NLM-Scrubber does not deal with anything but ASCII format, you may
not be able to find better freeware anywhere to de-identify your documents.
There should be a number of tools to convert PDF to ASCII format. If you
are willing to work on that prerequisite on your own, I would be happy to
help you solve your de-identification problem.
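
Once the documents are in plain text, a first-pass screen for the identifiers
named in the original question (social security numbers, credit card numbers)
could look roughly like the sketch below. This is not how NLM-Scrubber works;
the patterns and the names find_pii and luhn_ok are hypothetical and purely
illustrative. It assumes US-style SSNs written as NNN-NN-NNNN and card numbers
of 13 to 16 digits, and it will miss other layouts. Scanned PDFs generally
need OCR before any text can be extracted at all.

```python
import re

# Illustrative patterns only; not taken from NLM-Scrubber or any other tool.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 16 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> dict:
    """Collect SSN-shaped strings and Luhn-valid card-number candidates."""
    ssns = SSN_RE.findall(text)
    cards = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):  # drop digit runs that cannot be card numbers
            cards.append(match.group())
    return {"ssn": ssns, "card": cards}
```

Running find_pii over the text of each document and flagging any file with a
non-empty result gives a list of candidates for manual review; the Luhn check
only reduces false positives and does not confirm that a number is a real card.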

Best,

--mehmet


Mehmet Kayaalp, M.D., Ph.D.
Lister Hill National Center for Biomedical Communications
Building 38A
National Institutes of Health
8600 Rockville Pike
Bethesda, MD 20894-3828

Mehmet.Kayaalp_at_nih.gov







Date:    Fri, 19 Apr 2019 13:26:22 -0400
From:    Kimberly Kennedy <kimberlymkennedy_at_GMAIL.COM>
Subject: Looking for lightweight tool to identify PII

Hello!

We are beginning a digitization project at my institution that involves
scanning archival documents that may contain personal identifying
information, such as social security numbers or credit card numbers.  I'm
looking for a tool that will examine the PDFs and identify the ones that
may contain PII, so we can then redact them.

I've experimented a bit with Bulk Extractor Viewer but haven't been able to
get it to work on the scanned PDFs I've created.  I talked to a sales rep
at Spirion and that program seems like overkill for our purposes.  Any
suggestions for other things to try would be appreciated!

Thanks,

Kim

Kimberly Kennedy
Digital Production Coordinator
Northeastern University Library
kimberlymkennedy_at_gmail.com
Received on Wed Apr 24 2019 - 12:38:44 EDT