Jonathan,
Amazon.com doesn't seem to allow HEAD requests -- it returns a 405 METHOD
NOT ALLOWED status. What's more, GET responses don't seem to include
Content-Length headers.
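
If you want to re-check this later, here is a rough sketch in Python (my choice of language, not something from this thread). The names `head_is_usable` and `probe_head` are made up for illustration. It only reports whether a HEAD request succeeds and advertises a length, so treat it as a starting point rather than a finished tool:

```python
import urllib.error
import urllib.request


def head_is_usable(status, headers):
    """True only if a HEAD response could support a length-based check."""
    names = {name.lower() for name in headers}
    return status == 200 and "content-length" in names


def probe_head(url, timeout=10):
    """Send a HEAD request; return (status, headers) without raising on 4xx/5xx."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 405 and other error statuses; the status code
        # and headers are still available on the exception object.
        return err.code, dict(err.headers.items())
```

Something like `head_is_usable(*probe_head(url))` would then tell you whether the content-length trick is even possible for a given URL.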
One thing I've noticed, though, is that the "unavailable" response doesn't
include a <title> element, while the regular reader page does. You may be
able to use that for a check that's quicker and more reliable than grepping
the full text.
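
A minimal sketch of that idea, again in Python and not tested against Amazon itself: read the GET response a chunk at a time and stop as soon as a <title> tag shows up or the <head> closes. The URL pattern is the one from your message. The function names, the 64 KB cap, and the chunk size are my own assumptions, and the whole thing depends on the <title> difference continuing to hold:

```python
import re
import urllib.request

TITLE_RE = re.compile(r"<title\b[^>]*>", re.IGNORECASE)
HEAD_END_RE = re.compile(r"</head\s*>", re.IGNORECASE)


def has_title(html_prefix):
    """True if the HTML seen so far contains an opening <title> tag."""
    return TITLE_RE.search(html_prefix) is not None


def has_search_inside(asin, max_bytes=65536, chunk_size=4096, timeout=10):
    """Guess whether search inside is available, reading as little as possible."""
    url = "http://www.amazon.com/gp/reader/" + asin
    seen = b""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        while len(seen) < max_bytes:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of response without finding a <title>
            seen += chunk
            # latin-1 decoding never fails, and the tags we look for are ASCII.
            text = seen.decode("latin-1")
            if has_title(text):
                return True
            if HEAD_END_RE.search(text):
                break  # <head> closed with no <title>: treat as unavailable
    return False
```

Since the <title> normally sits near the top of the page, this should usually stop after the first chunk or two instead of downloading the whole reader page.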
Michael
--
Michael B. Klein
Digital Initiatives Technology Librarian
Boston Public Library
(617) 859-2391
mklein_at_bpl.org
> From: Jonathan Rochkind <rochkind_at_JHU.EDU>
> Reply-To: "Code for Libraries <CODE4LIB_at_LISTSERV.ND.EDU>"
> <CODE4LIB_at_LISTSERV.ND.EDU>
> Date: Fri, 27 Jun 2008 12:00:54 -0400
> To: <CODE4LIB_at_LISTSERV.ND.EDU>
> Subject: Re: [CODE4LIB] Amazon Web Services and search-inside-the-book
>
> Excellent, thanks Charles.
>
> I can tell you that my technique seems to be working fine, if you want
> to try it too.
>
> Construct a URL:
>
> http://www.amazon.com/gp/reader/ASIN
>
> Request the URL. Grep the response for "book is temporarily
> unavailable"--if you get it, there's no search inside the book. If you
> don't get it, there is search inside the book. (Sadly, it's still a 200
> HTTP status in response, either way).
>
> I want to see whether I can just do a HEAD request and tell the
> difference between presence and absence of search inside by the
> advertised length of the response. That's Terry Reese's preferred way of
> doing a check for legitimate content at the end of a URL, trying to
> guess from content length with just a HEAD request. Not sure if that
> will work here or not. Would potentially be somewhat more efficient if
> it would.
>
> Jonathan
> --
> Jonathan Rochkind
> Digital Services Software Engineer
> The Sheridan Libraries
> Johns Hopkins University
> 410.516.8886
> rochkind (at) jhu.edu
Received on Fri Jun 27 2008 - 11:04:53 EDT