Re: tool for finding close matches in vocabulary list

From: Francis Kayiwa <fkayiwa_at_nyob>
Date: Fri, 21 Mar 2014 14:46:21 -0400
To: CODE4LIB_at_LISTSERV.ND.EDU

On 3/21/2014 2:34 PM, Andrew Gordon wrote:
> Ken,
> 
> A group in Chicago has been working for a few years now on a
> deduplication toolkit that might do what you are looking for. They
> also have a couple of versions that work with an Excel file or .csv
> file.
> 
> https://github.com/datamade/dedupe 
> https://github.com/datamade/dedupe-web 
> https://github.com/datamade/csvdedupe
> 
> I have not worked with them extensively, but I have heard others
> find these very useful for entity recognition and resolution.



+1

I attended a very interesting talk on just that:

http://pyvideo.org/video/973/big-data-de-duping

./fxk

-- 
QOTD:
	"A child of 5 could understand this!  Fetch me a child of 5."
Received on Fri Mar 21 2014 - 14:47:04 EDT