Re: Automatic Content Classification recommendations?

From: Jason Stirnaman <jstirnaman_at_nyob>
Date: Mon, 28 Nov 2011 11:51:07 -0600
To: CODE4LIB_at_LISTSERV.ND.EDU

ConceptSearch (http://www.conceptsearching.com/web/) is a commercial search engine and classification tool. Perhaps like TemaTres, it doesn't use machine learning; instead it extracts "concepts" from your documents that can be mapped to vocabulary terms. The vocabulary is then exposed to the end user as a facet in search results. It's all driven by MS SQL Server and exposed as web services.
We've used it here to map medical school lectures to the licensing exam outlines, and have experimented a little with auto-classifying the same lecture content using MeSH.
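
To give a rough idea of what non-machine-learning, vocabulary-driven classification can look like, here is a minimal Python sketch. It is only an illustration and not how ConceptSearch itself works: the vocabulary, the phrase lists, and the classify function are all made up for the example, and it does nothing smarter than count phrase matches.

```python
import re
from collections import Counter

# Hypothetical vocabulary: preferred term -> phrases that should map to it.
# A real thesaurus (e.g. MeSH) would supply these from its entry terms.
VOCABULARY = {
    "Myocardial Infarction": ["myocardial infarction", "heart attack"],
    "Hypertension": ["hypertension", "high blood pressure"],
}


def classify(text, vocabulary=VOCABULARY):
    """Return (term, match_count) pairs for terms whose phrases occur in text.

    Results are ordered by descending match count; terms with no matches
    are omitted.
    """
    # Lowercase and collapse whitespace so multi-word phrases match reliably.
    normalized = re.sub(r"\s+", " ", text.lower())
    counts = Counter()
    for term, phrases in vocabulary.items():
        for phrase in phrases:
            # Whole-word match so "hypertension" does not match inside
            # a longer word.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            counts[term] += len(re.findall(pattern, normalized))
    return [(term, n) for term, n in counts.most_common() if n > 0]


if __name__ == "__main__":
    lecture = (
        "The patient had a heart attack. "
        "High blood pressure raises the risk of heart attack."
    )
    for term, n in classify(lecture):
        print(term, n)
```

The ranked terms could then be stored with each document and exposed as a search facet.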

Jason


Jason Stirnaman
Biomedical Librarian, Digital Projects
A.R. Dykes Library, University of Kansas Medical Center
jstirnaman_at_kumc.edu
913-588-7319


>>> On 11/28/2011 at 12:00 AM, in message <OF1513EA09.C0A3AA92-ONCA257956.001E9BCD-CA257956.00210A32_at_parliament.vic.gov.au>, Peter Neish <Peter.Neish_at_PARLIAMENT.VIC.GOV.AU> wrote:


Hi there,

Just wondering if anyone has any recommendations for systems that will do
automatic content classification through machine learning? We want to
classify newspaper articles using terms from our existing thesaurus and
have a fairly big set of articles already tagged that could be used as a
training set. Services like OpenCalais don't really fit our need because
we want to use our own thesaurus. Happy to look at both open source and
commercial software.

Thanks,

Peter

--
Peter Neish
Systems Officer
Victorian Parliamentary Library
Ph: 03 9651 8638
peter.neish_at_parliament.vic.gov.au

Received on Mon Nov 28 2011 - 12:53:29 EST