Re: The "A" in RDA

From: Alexander Johannesen <alexander.johannesen_at_nyob>
Date: Tue, 30 Jul 2013 11:02:35 +1000
To: NGC4LIB_at_LISTSERV.ND.EDU

Hi Karen,

I'm not saying they do subject indexing as such (even though I *suspect*
they do some of it), just that they attach properties to image contexts
(though we can also debate what the difference between a subject index and
a property in an ontology truly is). Nor can I plonk down references to
this right now; how Google discovers context might be clever enough to
render any example I give moot (a picture of a sunset often has the word
'sunset' in the file name/URI).

As to your reference [2], that's 6 years (!!) old; I uploaded pictures
yesterday and the faces were recognized and tagged, automatically. I can't
point to a paper, but I can do it on my computer right now.

Anyway, I doubt you'll disagree that Google is continually doing stuff to
improve search. A lot of the stuff we complained Google didn't do in the
past has changed, as they now do it, and now we're saying some other
particular aspect of search is "what they're not doing." Until, of course,
they do it.

The point here isn't what Google has right now in their various info
portals. It is where they are going vs. where libraries are going. Google
is spending tons of funds on AI and search technology and infrastructure.
How much are libraries spending on these things? Minuscule amounts on
infrastructure, and nothing at all on search technology and AI. We're still
tinkering with FRBR, for goodness' sake, some 20 years after someone had a
good idea, and it still isn't mainstream in libraries.

I no longer understand why libraries think they are relevant to the future
of information management, I seriously don't. Librarians are generally good
at staying together and retaining a lot of smarts within their domain, but
coming together to pull off large-scale improvements to the library sector
just doesn't happen anymore. This is why MARC still is the de facto lingua
franca; it's the last time you all got together and agreed on a way to do
it (the history of MARC is fraught with variations and quibbles, but at
least you got *somewhere*).

Am I being unfair? Probably. Am I making assumptions left, right and
center? Yeah, I'll admit to projection. But it is not completely unfounded.
I've pushed for putting library IP into popular open-source technology
projects for so many years, but it usually amounts to nothing because
libraries "don't do that", where "that" is investing in technology or time
for improving open software. Sure, as individual units libraries are not
full of funding, but you have the smart people. With a bit of
international maneuvering you could pull off something amazing; however,
the fragmented nature of the sector is making this difficult, not to
mention the political issues with the diversity within the library world.
You know better than most how well-suited to semantic technologies the
library sector is, and yet their introduction to it is still in baby steps.
You haven't even united on a common (or merged) upper ontology that is open
and shareable, which I consider the most telling tale of all. FRBR *is* an
ontology that should be used by you all, and yet ... its expression is
mostly in the model of whatever software is (often experimentally) used,
instead of doing what you should have done all along: use a sem web stack
with an FRBR ontology on
top. (You even have the massive advantage of being able to deal with
persistent identification, to boot; the library sector could be the de
facto place for identification management on a global scale, forever
cementing your existence in the future of technology, but noooooo .....
libraries don't do that).

I don't get it.

I'm not slamming the work that has gone into these fields (and I know
you've done good work there), but they are small and will probably amount
to very little. Not because the work isn't good, but because the library
world doesn't know what to do with it, at least not on a scale that matters
to most people.

The main issues the library world has traditionally dealt with (access and
availability) are both gone. What's left is subject-matter expertise, and
the collection itself. Subject-matter expertise is in and of itself a
controversial subject (har, har), as in "provide information, not opinion",
but you know what? If librarians could harvest those opinions, you might
actually have something really, really valuable to counter with. The
buildings and the collection are, well, warehousing that politicians will
find cheaper ways of doing.

Oh, and library vendors. They lag. Terribly.

Pessimistically yours,

Alex

On Tue, Jul 30, 2013 at 9:53 AM, Karen Coyle <lists_at_kcoyle.net> wrote:

> Alexander, I'd like some references on that, as well as some examples.
> E.g. a photo with no related text that can be found in a Google search by
> subject matter, or a photo that can be found by subject even though that
> subject is not in the pages related to the photo or image.[1] I realize
> that once images are retrieved, they can be analyzed for patterns that
> imply nudity - this is fairly standard stuff. You can also do a "more like
> this" once you have retrieved images, and you get ones with the same
> subject terms (applied by humans) and similar image patterns (color, lines,
> major areas). Their facial recognition algorithm detects that there is a
> face, not whose face it is. [2] That's why when you do a search for a
> person you get lots of images that aren't of that person, but that were on
> the same page as that person's name.
>
> What I see is that images go through an interpreter for certain
> characteristics (lots of flesh color, looks like a nipple, there's a face
> in here), but that isn't subject indexing.
>
> kc
>
> [1] https://en.wikipedia.org/wiki/Google_Images
> [2] http://arstechnica.com/uncategorized/2007/05/facial-recognition-slipped-into-google-image-search/
>
>
> On 7/29/13 3:54 PM, Alexander Johannesen wrote:
>
>> Hiya,
>>
>> So, um, to play the part of the sour-puss;
>>
>> Karen says;
>>
>>> Google uses the text provided by web page creators to interpret
>>> the meaning of the images; it doesn't interpret the images themselves.
>>>
>> Just a quick correction; this was probably true a couple of years ago, but
>> nowadays that's simply not true. All (well, most; there are filters that
>> apply) images you find through Google are indexed after an image
>> interpreter has gone through them. This won't tell you what's going on in
>> super detail, of course, like interpreting the meaning of some scene, but
>> it can detect people, recognize them, tag them, detect female nipples
>> (important to the US for some reason :) ), some attributes about the
>> weather (sunny, raining, etc.), similarities to other pictures (yesterday
>> I uploaded a bunch of pictures of our local classical music ensemble, and
>> similar pictures were automatically merged to form animated samples, for
>> example, in addition to automatically tagging the faces it recognized),
>> dominant colors, some shape recognition and a few other bits. And note;
>> this is only the beginning.
>>
>> I find it odd, though, to use pictures as an example of how Google isn't
>> as good. Does the library truly do better? Last I remember, the library,
>> too, didn't interpret pictures.
>>
>> Now, if I walked into the library and asked for pictures of happy people
>> playing in the sun, could you give that? No, not a chance. Or a picture of
>> a cat that looks like Hitler? (in your face, Library!) Could you show me
>> all existing pictures of Wittgenstein? Pictures of Einstein hanging
>> washing? Or the engine block of a Volvo V70 2001 model? There are far more
>> instances of Google giving me the right answer with pictures than not, and
>> I can't for the life of me understand what service you actually think
>> you're bringing to the table anymore.
>>
>> And of course, if Google is doing this to images now (and voice; have you
>> tried the latest voice search? Pretty cool), what else are they also doing
>> in the written text area? We all know about citation in Google Scholar.
>>
>> And you guys are still talking about the "A" in RDA? Good grief.
>>
>
> --
> Karen Coyle
> kcoyle@kcoyle.net http://kcoyle.net
> ph: 1-510-540-7596
> m: 1-510-435-8234
> skype: kcoylenet
>

-- 
 Project Wrangler, SOA, Information Alchemist, UX, RESTafarian, Topic Maps
--- http://shelter.nu/blog/ ----------------------------------------------
------------------ http://www.google.com/profiles/alexander.johannesen ---