Re: video to text

From: Martin, Will <william.d.martin_at_nyob>
Date: Fri, 21 Oct 2022 19:16:37 +0000
To: CODE4LIB_at_LISTS.CLIR.ORG

I haven't tried it yet, but OpenAI recently released an open-source neural net called Whisper for transcribing speech and translating it into English.  Here's the page:

https://openai.com/blog/whisper/


And here's an article about it:

https://arstechnica.com/information-technology/2022/09/new-ai-model-from-openai-automatically-recognizes-speech-and-translates-to-english/
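
For anyone who wants a starting point, here is an untested sketch of how a transcript with timestamps might be produced. It assumes the `whisper` Python package from the project's GitHub repository is installed along with ffmpeg, and that `load_model` and `transcribe` behave as the project README describes. The helper names, the "base" model choice, and the file name are my own placeholders.

```python
def format_timestamp(seconds):
    """Turn a number of seconds into an HH:MM:SS string."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_lines(segments):
    """Format segment dicts (with 'start' and 'text' keys) as readable lines."""
    return [
        f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}"
        for seg in segments
    ]


def transcribe(path, model_name="base"):
    """Transcribe one media file; needs the third-party whisper package."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    # Assumption: the result dict carries a "segments" list with start times.
    return segments_to_lines(result["segments"])
```

Calling `transcribe("session1.mp4")` (a hypothetical file name) should then return one line per segment, each prefixed with its start time.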


I'm kind of interested to try it.

Will

-----Original Message-----
From: Code for Libraries <CODE4LIB_at_LISTS.CLIR.ORG> On Behalf Of Dan Johnson
Sent: Friday, October 21, 2022 1:21 PM
To: CODE4LIB_at_LISTS.CLIR.ORG
Subject: Re: [CODE4LIB] video to text

Thank you, Peter Murray, for the fascinating AWS Transcribe writeup. In case someone is interested in going down that route, I did have success, some years ago, taking a JSON file someone else had generated from AWS Transcribe, and converting it into a very readable .docx. It requires only a simple two-line Python script with the tscribe library. Information here:
https://github.com/kibaffo33/aws_transcribe_to_docx
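
If all you need is the plain text rather than a formatted .docx, a minimal sketch using only the Python standard library might look like the following. This is not the tscribe approach; it assumes the usual AWS Transcribe output layout, where the full text sits under `results` > `transcripts`, and the function name is a placeholder of mine.

```python
import json
from pathlib import Path


def extract_transcript(json_path):
    """Return the transcript text stored in an AWS Transcribe result file."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Assumed layout: {"results": {"transcripts": [{"transcript": "..."}]}}
    transcripts = data.get("results", {}).get("transcripts", [])
    # Join with newlines in case the file holds more than one transcript entry.
    return "\n".join(entry.get("transcript", "") for entry in transcripts)
```

Something like `print(extract_transcript("output.json"))` would then dump the text to the terminal, where "output.json" stands in for whatever file the transcription job produced.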


On Fri, Oct 21, 2022 at 2:10 PM Peter Murray < 000000ab738da05e-dmarc-request_at_lists.clir.org> wrote:

> I did something like this last month for creating transcripts from 
> podcasts using Amazon Transcribe.  Details and links to the code here:
> https://dltj.org/article/generating-podcast-transcripts/
>
>
> Peter
>
> From: Dan Johnson <djohns27_at_nd.edu>
> Reply: Code for Libraries <code4lib_at_lists.clir.org>
> Date: October 21, 2022 at 2:01:30 PM
> To: code4lib_at_lists.clir.org
> Subject: Re: [CODE4LIB] video to text
>
> If your university gives you an Office 365 account (and Notre Dame
> does), Word 365 will transcribe up to 300 minutes of audio per month
> from a sound file in the .wav, .mp4, .m4a, or .mp3 formats. In my own
> (admittedly minor) tinkering, I've been surprised at how good the
> transcript is. Microsoft has a 90-second tutorial here:
> <https://support.microsoft.com/en-us/office/transcribe-your-recordings-7fc2efec-245e-45f0-b053-2a97531ecf57>.
>
> If you're handy with AWS, you can also use Amazon Transcribe ( 
> https://aws.amazon.com/transcribe/), but that is much more involved. I 
> have no experience myself, though some colleagues have had success 
> with larger projects there.
>
> Best,
> Dan
>
>
> On Fri, Oct 21, 2022 at 1:58 PM Lolis, John <jlolis_at_whiteplainsny.gov>
> wrote:
>
> > I don't have technology to offer for that purpose, but if you decide to go
> > with a service, I can tell you that I've found Amara to be very affordable
> > and of excellent quality, with fantastic customer service. I used them not
> > only to caption videos but also to translate them from several languages.
> > I couldn't have asked for a better experience with them, and that was
> > after some back and forth working things out over the extra languages.
> >
> > https://amara.org/
> >
> > John Lolis
> > Coordinator of Computer Systems
> >
> > 100 Martine Avenue
> > White Plains, NY 10601
> >
> > tel: 1.914.422.1497
> > fax: 1.914.422.1452
> >
> > https://whiteplainslibrary.org/
> >
> > *“I would rather have questions that can’t be answered than answers 
> > that can’t be questioned.”* — Richard Feynman <
> > https://click.fourhourmail.com/5qure95xkf7hvvo93wh2/7qh7h8h05vr4zrtz/aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvUmljaGFyZF9GZXlubWFu
> > >,
> > theoretical physicist and recipient of the Nobel Prize in Physics in 
> > 1965
> >
> >
> > On Fri, 21 Oct 2022 at 13:20, Eric Lease Morgan <emorgan_at_nd.edu> wrote:
> >
> > > Do you know of any video-to-text applications? A colleague asked me:
> > >
> > > I have four video recordings of conference sessions and wonder if 
> > > there is a tool or technology that will help me transcribe these 
> > > into the written word?
> > >
> > > Do y'all have any suggestions or experience in this regard?
> > >
> > > --
> > > Eric Morgan
> > > University of Notre Dame
> > >
> >
>
>
> --
> *Daniel Johnson, Ph.D.*
> *English; Digital Humanities; and Film, Television, and Theatre Librarian*
> *Navari Family Center for Digital Scholarship, Hesburgh Libraries*
> *University of Notre Dame*
> 250C Hesburgh Library
> Notre Dame, IN 46556
> o: 574-631-3457
> e: djohns27_at_nd.edu
>


Received on Fri Oct 21 2022 - 14:53:21 EDT