Re: Book-scanning projects - a question

From: Eric Lease Morgan <emorgan_at_nyob>
Date: Thu, 1 Jul 2010 14:38:36 -0400
To: NGC4LIB_at_LISTSERV.ND.EDU

Someone asked me:

>> 3. Search - Use the result of Step #2 to create
>>    REST-like searches of the Internet Archive
>>    making sure results are returned as XML (or
>>    some other machine-readable format).
> 
> My library has been trying to do something similar with both Internet Archive... Unfortunately, I am not familiar with REST searches.  Could you give more details on how exactly you performed Step 3?

REST-like searches are queries sent as URLs whose responses come back as XML (or some other machine-readable format such as JSON). The following URL queries the Internet Archive for the words "plato" and "republic" and limits the results to text media types. Line breaks have been added for readability; the actual URL is a single line:

 http://www.archive.org/advancedsearch.php?
 q=plato+republic+AND+mediatype%3Atexts&
 fl%5B%5D=identifier&
 sort%5B%5D=&
 sort%5B%5D=&
 sort%5B%5D=&
 rows=50&
 page=1&
 callback=callback&
 output=xml

Here's the same URL but in "tiny" format:

 http://tinyurl.com/22sop8j

When you submit the second URL you should get back a stream of XML which can be easily parsed for validation purposes.
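
For anyone who would rather script this than paste URLs into a browser, here is a minimal sketch in Python. It rebuilds the URL shown above from its parameters and pulls the identifiers out of the returned XML. The function names are made up for illustration, and the parsing step assumes each hit comes back as a `<str name="identifier">` element, so check a real response before relying on it:

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

BASE_URL = "http://www.archive.org/advancedsearch.php"


def build_search_url(words, rows=50, page=1):
    """Return the search URL for the given words, limited to text media types."""
    params = [
        ("q", f"{words} AND mediatype:texts"),
        ("fl[]", "identifier"),           # the field to return for each hit
        ("sort[]", ""),                   # the three empty sort slots are
        ("sort[]", ""),                   # kept only to mirror the URL
        ("sort[]", ""),                   # shown in the message
        ("rows", rows),
        ("page", page),
        ("callback", "callback"),
        ("output", "xml"),
    ]
    # urlencode() escapes ":" as %3A, "[]" as %5B%5D, and spaces as "+".
    return BASE_URL + "?" + urlencode(params)


def extract_identifiers(xml_text):
    """Return the identifier values found in a search response.

    Assumption: each hit is reported as <str name="identifier">...</str>.
    """
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("str") if el.get("name") == "identifier"]


def search(words):
    """Send the query over the network and return the matching identifiers."""
    with urlopen(build_search_url(words)) as response:
        return extract_identifiers(response.read())
```

Calling `search("plato republic")` sends the query and returns a list of identifiers, while `build_search_url("plato republic")` alone is handy for checking the URL before anything is sent.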

You can reverse-engineer these REST-like queries by going to the advanced search page of the Internet Archive and scrolling to the second half of the screen:

 http://www.archive.org/advancedsearch.php
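
If the URL the form produces is hard to read, one way to take it apart is to decode its query string. The following is only a sketch using Python's standard library; `query_parameters` is a made-up helper name, and the example URL is the one from earlier in this message joined onto a single line:

```python
from urllib.parse import urlsplit, parse_qsl


def query_parameters(url):
    """Return the decoded (name, value) pairs from a URL's query string."""
    # keep_blank_values=True preserves empty parameters such as "sort[]=".
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


url = (
    "http://www.archive.org/advancedsearch.php?"
    "q=plato+republic+AND+mediatype%3Atexts&"
    "fl%5B%5D=identifier&"
    "sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&"
    "rows=50&page=1&callback=callback&output=xml"
)

# Print one parameter per line, with the percent-escapes decoded.
for name, value in query_parameters(url):
    print(f"{name} = {value}")
```

Seeing the parameters one per line makes it easier to tell which ones carry the query (`q`), the returned fields (`fl[]`), and the paging (`rows`, `page`).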

HTH.

-- 
Eric Lease Morgan