Amazon Transcribe converting .json transcript - amazon-transcribe

I'm trying to convert the Amazon Transcribe .json transcript into a more readable transcript (i.e., one that separates the text by speaker). How can this be done?
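One possible approach, sketched below in Python, is to walk the word items in the JSON and group consecutive words by speaker label. This assumes the job was run with speaker identification enabled and that the output has the usual `results.items` and `results.speaker_labels.segments` layout; the function names `transcript_by_speaker` and `format_transcript` are made up for this illustration.

```python
def transcript_by_speaker(data):
    """Return a list of (speaker_label, text) turns from a Transcribe result dict."""
    results = data["results"]

    # Map each word's start time to the speaker label assigned to it.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for word in segment["items"]:
            speaker_at[word["start_time"]] = word["speaker_label"]

    turns = []  # each entry is [speaker_label, text]
    for item in results["items"]:
        content = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation items carry no timestamps; attach them to the current turn.
            if turns:
                turns[-1][1] += content
            continue
        # Assumption: fall back to "unknown" if a word has no speaker entry.
        speaker = speaker_at.get(item["start_time"], "unknown")
        if turns and turns[-1][0] == speaker:
            turns[-1][1] += " " + content
        else:
            turns.append([speaker, content])
    return [(speaker, text) for speaker, text in turns]


def format_transcript(data):
    """Render the turns as one 'speaker: text' line per turn."""
    return "\n".join(
        f"{speaker}: {text}" for speaker, text in transcript_by_speaker(data)
    )
```

To use it, load the transcript file with `json.load` and pass the resulting dict to `format_transcript`.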

Related

Google Cloud Video intelligence and embedded video content

Does Google Cloud Video Intelligence work with embedded video content from Vimeo or YouTube? Will it be able to create tags, detect faces, etc., given that the content is not directly uploaded?
No, it's not possible. As the main page says:
It quickly annotates videos stored in Google Cloud Storage, and helps
you identify...
Moreover, the REST API docs specify the following for the inputUri field:
Input video location. Currently, only Google Cloud Storage URIs are
supported, which must be specified in the following format:
gs://bucket-id/object-id (other URI formats return
google.rpc.Code.INVALID_ARGUMENT). For more information, see Request
URIs. A video URI may include wildcards in object-id, and thus
identify multiple videos. Supported wildcards: '*' to match 0 or more
characters; '?' to match 1 character. If unset, the input video should
be embedded in the request as inputContent. If set, inputContent
should be unset.
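As a rough illustration of the rule quoted above, the following Python sketch builds a request body for the annotate call and rejects anything that is not in `gs://bucket-id/object-id` form. The helper name `build_annotate_request` is hypothetical, and `LABEL_DETECTION` is used only as an example feature; nothing here contacts the API.

```python
def build_annotate_request(input_uri, features=("LABEL_DETECTION",)):
    """Build an annotate request body, accepting only gs://bucket-id/object-id URIs."""
    prefix = "gs://"
    if not input_uri.startswith(prefix):
        raise ValueError(f"only Google Cloud Storage URIs are supported: {input_uri!r}")
    # Split the remainder into the bucket and the object path.
    bucket, _, object_id = input_uri[len(prefix):].partition("/")
    if not bucket or not object_id:
        raise ValueError(f"expected gs://bucket-id/object-id, got {input_uri!r}")
    return {"inputUri": input_uri, "features": list(features)}
```

A YouTube or Vimeo page URL fails the first check, which mirrors the INVALID_ARGUMENT behaviour described in the docs.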
There is a group for discussing Cloud Video Intelligence API features and this question was asked here.

Extract text of OneDrive documents using Graph Api

I have been using the MS Graph API to download files from OneDrive successfully.
I was looking for a way to read only the text content (for indexing in my application) of different file types (PDF, XLS, ZIP, images, etc.) through the Graph API. The conventional approach of downloading the complete file, extracting the text with some text-extraction API, and then indexing the file would be time-consuming. I am aware that the Graph API has its own search features, but it lacks the ability to do complicated searches such as regular-expression search (please correct me if I am wrong). I am sure OneDrive does its own indexing of each file, which lets a user do basic searches.
So, is there any way I can get the text content of the documents using the Graph API?
I don't believe getting a 'preview' of text-based documents is currently available through the API. You will need to make a GET request to fetch the content. If you don't want the full document, you can request a partial range of bytes that you believe would be enough for the document. In addition, to make it easier to handle different file types, we currently support converting common file formats to PDF (to possibly standardize your file parsing logic).
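A minimal Python sketch of the two suggestions above, a ranged GET and a PDF conversion request, might look like this. It only constructs the requests and does not send them; the helper names, the `/me/drive/items/{item-id}/content` path, and the 64 KiB default are assumptions chosen for illustration, and a valid access token is assumed to be obtained elsewhere.

```python
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_partial_content_request(item_id, access_token, first_bytes=65536):
    """Prepare a GET for only the first `first_bytes` bytes of a drive item."""
    url = f"{GRAPH_BASE}/me/drive/items/{item_id}/content"
    headers = {
        "Authorization": f"Bearer {access_token}",
        # HTTP byte ranges are inclusive, so the last index is first_bytes - 1.
        "Range": f"bytes=0-{first_bytes - 1}",
    }
    return urllib.request.Request(url, headers=headers)


def build_pdf_conversion_request(item_id, access_token):
    """Prepare a GET that asks for the item converted to PDF."""
    url = f"{GRAPH_BASE}/me/drive/items/{item_id}/content?format=pdf"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )
```

Passing either request to `urllib.request.urlopen` would perform the download; whether a partial range contains enough text depends on the file format.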

Get info about video that not finished encoding

Where does all the metadata about a video file come from? Does it come from the original video or from the encoded one?
I'm trying to write the video duration into a database, but I'm not sure whether I can get this value during the POST to my server.
The question is hard to read, but I'll take my best shot. I also have zero experience with pandastream, but the API looks easy enough. It looks like pandastream requires you to upload the video file first and then work off an ID.
In the api docs at: https://github.com/pandastream/panda_gem, it looks like
encodings = Panda::Video.find("1234").encodings
means that you have to have encoded it first. They are just sending the attributes down to you over the web API. After you query for the attributes, you can do anything you want with them, like inserting them into a database.
You could even loop through all the videos on your account and get the video duration and bulk insert them.
videos = Panda::Video.all
videos.each do |upload|
  # look up the full record for each uploaded video
  video = Panda::Video.find(upload.id)
  # insert upload.id and video.attributes["duration"] into your database here
end
Good luck!

Parse word docs heroku/s3

I want to implement functionality that parses Word docs, which will be uploaded by users and stored on Amazon S3. The application will run on Heroku. I tried catdoc, but it doesn't parse URLs. Can anyone suggest a tool that can be used on Heroku to parse Word documents?
UPDATE
I want to scan an uploaded MS Word (.doc) file for particular words and tag it accordingly.
If you're just wanting to upload the word document you could take a look at something like the paperclip gem.
This would allow you to save the file on amazon S3 and simply download it, but you could also extend paperclip and run post-processing on the file. This is slightly more complicated.
As willglynn says, it would be good to know exactly what parsing you need to do.

Can't Download from youtube

I have a script that downloads mp4 files from YouTube. It generates a link of the form http://youtube.com/get_video?video_id=*VIDEO_ID*&&t=*THE_TOKEN*=&fmt=18&asv=2, but it doesn't work anymore (I noticed it today). What do you think?
Instead of trying to use get_video to get the video, try parsing fmt_url_map (format-url map) instead.
You should be able to find the fmt_url_map in the same place you found the token (like in the flashvars of the YouTube flash video player or inside the YouTube page somewhere). If you can't find it, send a request to http://www.youtube.com/get_video_info?video_id=VIDEO_ID and you should get a really long result that is in the format of name=value&name=value&... Find "fmt_url_map" inside this result (search through the result for a string that starts with "&fmt_url_map=" and ends with "&").
After you get this value (you may have to url-decode it), it will be something like (without the line breaks):
22|http://blah.youtube.com/videoplayback?blah,
35|http://blah.youtube.com/videoplayback?blah,
...
where each comma-separated entry starts with the fmt value (22 or 35 in the example), followed by a pipe character, which is then followed by the URL you can use to download the video in that format. (This URL is client-specific, so a URL for a certain client most likely won't work with another client due to YouTube checking IPs. Also, the URLs do expire after a while.)
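For illustration only, the parsing steps described above could be sketched in Python as follows. It assumes the response body is a plain `name=value&name=value` string containing a `fmt_url_map` field in the comma-and-pipe format shown; the function name `parse_fmt_url_map` is made up, and the sample URLs are placeholders rather than real YouTube output.

```python
from urllib.parse import parse_qs


def parse_fmt_url_map(response_body):
    """Return {fmt: url} parsed from a name=value&... response body."""
    # parse_qs splits the fields and url-decodes each value.
    fields = parse_qs(response_body)
    raw_map = fields.get("fmt_url_map", [""])[0]

    formats = {}
    for entry in raw_map.split(","):
        # Each entry looks like "<fmt>|<url>"; skip anything without a pipe.
        fmt, separator, url = entry.partition("|")
        if separator:
            formats[fmt] = url
    return formats


# Placeholder input, not real YouTube output.
sample = (
    "status=ok"
    "&fmt_url_map=22%7Chttp%3A%2F%2Fexample.com%2Fa%2C35%7Chttp%3A%2F%2Fexample.com%2Fb"
    "&token=x"
)
print(parse_fmt_url_map(sample))
```

A dictionary keyed by fmt value makes it easy to pick a preferred format and fall back to another when it is missing.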
For a list of the different fmt values, see: http://en.wikipedia.org/wiki/YouTube#Quality_and_formats and show the "Comparison of YouTube media encoding options". NOTE: not all formats may be available for all videos.
Deprecated: won't work anymore!
If you want to download to a server you can use youtube-dl which still works.
Well it seems like they have removed the fmt option. See http://en.wikipedia.org/wiki/YouTube#Quality_and_codecs.
I've created a node.js server that can stream YouTube videos directly to the client and it works. See https://github.com/licson0729/node-YouTubeStreamer for details.
