Re: indexing pdf files

From: Cindy Harper <charper_at_nyob> Date: Wed, 16 Sep 2009 16:01:16 -0400 To: CODE4LIB_at_LISTSERV.ND.EDU

We're just talking about creating an index, not a separate copy of the
works, right? I ask because I imagine that copyright has a lot to do with
why this type of thing doesn't already exist.

On Wed, Sep 16, 2009 at 3:08 PM, Eric Lease Morgan <emorgan_at_nd.edu> wrote:

> Eric Morgan wrote:
>
>> http://infomotions.com/highlights/
>>
>
>
> Rosalyn Metz wrote:
>
>> I have librarians that would kill for this.  In fact I was talking to
>> one about it the other day.  She felt there must be a way to handle
>> active reading and make it portable.  This would be great in
>> conjunction with RefWorks or Zotero or something along those lines.
>>
>
>
> Yep, when I was creating this application for myself I was wondering what
> it would be like if a whole group, say, an academic department, were to
> systematically contribute to such a thing? I thought the output would be
> pretty exciting.
>
>
> Mark A. Matienzo wrote:
>
>> Have you considered using Solr's ExtractingRequestHandler [1] for the
>> PDFs? We're using it at NYPL with pretty great success.
>>
>> [1] http://wiki.apache.org/solr/ExtractingRequestHandler
>>
>
> Nope, never saw that previously. Thanks for the pointer.
>
>
> Peter Kiraly wrote:
>
>> I would like to suggest an API for extracting text (including highlighted
>> or annotated text) from PDF files: iText (http://www.lowagie.com/iText/).
>> It is a Java API (with a C# port), and it helped me a lot when we worked
>> with extraordinary PDF files.
>>
>
> More tools! Thank you.
>
>
> danielle plumer wrote:
>
>> My (much more primitive) version of the same thing involves reading and
>> annotating articles using my Tablet PC. Although I do get a variety of
>> print publications, I find I don't tend to annotate them as much anymore.
>> I used to use EndNote to do the metadata, then I switched to Zotero. I
>> hadn't thought to try to create a full-text search of the articles -- hmm.
>>
>
> Yes, for a growing number of the tools I create I need to be thinking about
> Zotero as a way of "remembering" content. Thanks for... reminding me.
>
>
> Erik Hatcher wrote:
>
>> Here's a post on how easy it is to send PDF documents to Solr from Java:
>>
>> <http://www.lucidimagination.com/blog/2009/09/14/posting-rich-documents-to-apache-solr-using-solrj-and-solr-cell-apache-tika/>
>>
>
> I'm looking forward to the arrival of my Solr books any day now. After
> reading them I hope to have a better handle on the guts of Solr and to be
> better able to do the sorts of things discussed at the URL above.
>
>
> Thank you, one and all for your replies.
>
> --
> Eric Morgan
>

-- 
Cindy Harper, Systems Librarian
Colgate University Libraries
charper_at_colgate.edu
315-228-7363