How can I read sections of a large remote file (via TCP/IP?) - url

A client has a system which reads large files (up to 1 GB) of multiple video images. Access is via an indexing file which "points" into the larger file. This works well on a LAN. Does anyone have any suggestions as to how I can access these files over the internet if they are held on a remote server? The key constraint is that we cannot afford the time needed to download the whole file before accessing individual images within it.

You could put your big file behind an HTTP server like Apache, then have your client use HTTP Range headers to fetch only the chunks it needs.
Another option would be to write a simple script in PHP, Perl or the server language of your choice which takes the required offset (and length) as input and returns the chunk of data you need, again over HTTP.
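A minimal sketch of the Range-header approach, using Python's requests library; the URL and the offsets (which would normally come from the index file) are placeholders:

    import requests

    def fetch_chunk(url, offset, length):
        """Fetch `length` bytes starting at `offset` without downloading the whole file."""
        headers = {"Range": f"bytes={offset}-{offset + length - 1}"}
        resp = requests.get(url, headers=headers, timeout=30)
        resp.raise_for_status()
        # 206 Partial Content means the server honoured the Range header;
        # a plain 200 would mean it ignored it and sent the entire file.
        if resp.status_code != 206:
            raise RuntimeError("Server does not support byte-range requests")
        return resp.content

    # e.g. read 1 MB starting 100 MB into the file
    chunk = fetch_chunk("http://example.com/videos/big_file.bin", offset=100 * 2**20, length=2**20)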

If I understand the question correctly, it depends entirely on the container format used to hold the images as a video. If the container was designed so that the information about each image sits just before or just after the image itself, rather than at the end of the container, you can extract images and their metadata from whatever portion of the file has been downloaded so far. You will need to know the binary format being used.

FTP does let you use 'paged files', where sections of the file can be transferred independently:
To transmit files that are discontinuous, FTP defines a page
structure. Files of this type are sometimes known as
"random access files" or even as "holey files".
In FTP, the sections of the file are called pages. -- RFC 959
I've never used it myself though.
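As an aside, page structure is rarely implemented by modern FTP servers or clients; a more widely supported way to read just a section of a remote file over FTP is the REST (restart) command, which Python's ftplib exposes. This is a different mechanism from page structure, and the host, credentials and filename below are placeholders:

    from ftplib import FTP, all_errors

    def read_section(host, user, password, remote_name, offset, length):
        """Read roughly `length` bytes starting at `offset` of a remote file."""
        chunks, received = [], 0
        with FTP(host) as ftp:
            ftp.login(user, password)
            # REST makes the server start the RETR transfer at `offset`.
            conn = ftp.transfercmd(f"RETR {remote_name}", rest=offset)
            while received < length:
                data = conn.recv(min(8192, length - received))
                if not data:
                    break
                chunks.append(data)
                received += len(data)
            conn.close()
            try:
                ftp.voidresp()
            except all_errors:
                pass  # server may report 426 because we cut the transfer short
        return b"".join(chunks)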

Related

Reading video during cloud dataflow, using GCSfuse, download locally, or write new Beam reader?

I am building a Python cloud video pipeline that will read video from a bucket, perform some computer vision analysis and return frames back to a bucket. As far as I can tell, there is no Beam read method for passing GCS paths to OpenCV, similar to TextIO.read(). My options seem to be downloading the file locally (they are large), using GCS Fuse to mount it on a local worker (is that possible?), or writing a custom source method. Does anyone have experience with what makes most sense?
My main confusion came from this question:
Can google cloud dataflow (apache beam) use ffmpeg to process video or image data
How would ffmpeg have access to the path? It's not just a question of uploading the binary? There needs to be a Beam method to pass the item, correct?
I think that you will need to download the files first and then pass them through.
However, instead of saving the files locally, is it possible to pass bytes straight to OpenCV? Does it accept any sort of byte stream or input stream?
You could have one ParDo which downloads the files using the GCS API, then passes them to OpenCV through a stream, byte channel, stdin pipe, etc.
If that is not available, you will need to save the files to local disk and pass OpenCV the filename. This could be tricky because you may end up using too much disk space, so make sure to garbage-collect the files properly and delete them from local disk after OpenCV processes them.
I'm not sure, but you may also need to select a certain VM machine type to ensure you have enough disk space, depending on the size of your files.
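A minimal sketch of that download-then-process ParDo, assuming the google-cloud-storage and opencv-python packages are available on the workers; the path handling and what gets yielded downstream are illustrative assumptions, not part of the original answer:

    import os
    import tempfile

    import apache_beam as beam
    import cv2
    from google.cloud import storage


    class ReadVideoFrames(beam.DoFn):
        """Download a gs:// video to local disk, read frames with OpenCV,
        then delete the temporary file so worker disk space is reclaimed."""

        def process(self, gcs_path):
            bucket_name, blob_name = gcs_path[len("gs://"):].split("/", 1)
            blob = storage.Client().bucket(bucket_name).blob(blob_name)

            # OpenCV's VideoCapture wants a filename, so stage to a temp file.
            with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as tmp:
                blob.download_to_file(tmp)
                local_path = tmp.name

            try:
                cap = cv2.VideoCapture(local_path)
                while True:
                    ok, frame = cap.read()
                    if not ok:
                        break
                    # In a real pipeline you would probably yield something
                    # smaller than raw frames (e.g. analysis results).
                    yield frame
                cap.release()
            finally:
                os.remove(local_path)  # free local disk for the next element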

Capture a websites objects in separate pcap files

A website usually consists of multiple objects (e.g. a text file, a few PNG files, etc.). I would like to know if there's a tool that can capture the individual requests/responses in different pcap files?
So for example if I browse to http://somewebsite.com , and http://somewebsite.com consists of, say, {index.html, image1.png, somestylefile.css, image2.png}, the tool would capture the entire load of http://somewebsite.com but generate {index.html.pcap, image1.png.pcap, somestylefile.css.pcap, image2.png.pcap}.
I don't know of any tool that can do this. Or is it possible using scapy or something similar?
An HTTP connection can carry multiple requests inside the same TCP connection, and browsers make heavy use of this (HTTP keep-alive). With HTTP pipelining the requests/responses don't even need to be fully separated in time, i.e. a client can send another request even though the response to the previous one has not arrived. And with HTTP/2 the data can also be interleaved, i.e. several responses transferred at the same time inside the same connection.
Therefore it is not always possible to capture the data as separate pcap files, because they might not be separable at the packet level. But if you don't need the original packet boundaries, it is possible to create separate pcap files for each request which do not necessarily reflect the original packets but do reflect the application layer, i.e. the response matching the request.
One tool which does this is httpflow.pl, which can extract HTTP/1.x request/response pairs from an existing pcap (or sniff directly) and writes each request/response into a separate pcap file, as if it had been a separate TCP connection. It can also clean up the data for easier analysis, i.e. unchunk and uncompress the HTTP body.
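For comparison, the closest split that is always possible at the packet level is one pcap per TCP connection rather than per request. A rough scapy sketch (filenames are placeholders, and this is not what httpflow.pl does):

    from scapy.all import IP, TCP, rdpcap, wrpcap

    packets = rdpcap("capture.pcap")
    flows = {}
    for pkt in packets:
        if IP in pkt and TCP in pkt:
            ip, tcp = pkt[IP], pkt[TCP]
            # Direction-independent key so both halves of a connection match.
            key = tuple(sorted([(ip.src, tcp.sport), (ip.dst, tcp.dport)]))
            flows.setdefault(key, []).append(pkt)

    # One pcap per TCP connection; splitting per request/response would
    # additionally require reassembling and parsing the HTTP streams.
    for i, pkts in enumerate(flows.values()):
        wrpcap(f"connection_{i}.pcap", pkts)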

Flash HTTP Streaming - Multiple Files

With Flash 10.1+ and the ability to use appendBytes on a NetStream, it's possible to use HTTP streaming in Flash for video delivery. But it seems that the delivery method requires the segments to be stored in a single file on disk, which can only be broken into discrete segment files with an FMS or an Apache module. You can cache the individual segment files once they're created, but the documentation indicates that you still must always use an FMS / Apache module to produce those files in the first instance.
Is it possible to break the single on-disk file into multiple on-disk segments without using an FMS, Wowza product or Apache?
There was an application which decompiled the output of the F4fpackager to allow it to be hosted anywhere, without the Apache Module. Unfortunately this application was withdrawn.
It should be possible to use a proxy to cache the fragments. Then you can use these cached files on any webserver.

How to monitor File Uploads without using Flash?

I've been looking for a way to monitor file upload progress without using Flash, probably using Ajax, I suppose. I want to monitor the speed and the percentage of the upload that has finished.
Do you know of any resource that describes how to do that, or what I should follow to do it?
In the pre-HTML5 world I believe this requires web-server support. I've used this Apache module successfully in the past:
http://piotrsarnacki.com/2008/06/18/upload-progress-bar-with-mod_passenger-and-apache/
The only way without Flash is to do it on the server. The gist is (see the sketch below):
Start the file upload
Open a streaming connection to the server
Have the server read the POST headers to tell you how large the file is going to be
Have the server repeatedly check the file size (in /tmp generally) to see how complete it is
Stream the % done back to the client
I've done it before in other languages, but never in Ruby, so I'm not sure of a project that's done it, sorry.
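A minimal sketch of the progress-check part, assuming a Flask app in which the upload handler (not shown) streams the incoming file into UPLOAD_TMP_DIR under a known upload_id and records the Content-Length; all of those names are placeholders:

    import os

    from flask import Flask, jsonify

    app = Flask(__name__)
    UPLOAD_TMP_DIR = "/tmp/uploads"
    EXPECTED_SIZES = {}  # upload_id -> total bytes, taken from Content-Length


    @app.route("/progress/<upload_id>")
    def progress(upload_id):
        """Polled by the client (e.g. via Ajax) while the upload is in flight."""
        expected = EXPECTED_SIZES.get(upload_id)
        path = os.path.join(UPLOAD_TMP_DIR, upload_id)
        written = os.path.getsize(path) if os.path.exists(path) else 0
        percent = round(100 * written / expected, 1) if expected else None
        return jsonify({"bytes_written": written, "percent": percent})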

Transferring a file using FTP

First of all, this is not a question seeking programming help. I am doing a project using FTP in which I employed a particular approach, and I would like people to comment on whether the logic is OK or whether I should employ a better one. I transfer files using FTP; for example, if the file size is 10 MB, I split that file into "X" files of "Y" size depending on my network speed, then send these files one by one and merge them on the client machine.
The network speed on my client side is very low (1 kbps), so I want to split the files into pieces of 512 bytes and send them.
It would be better not to split the file but to use a client and server that both support resuming file transfers.
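A minimal sketch of resume-based transfer on the client side, using Python's ftplib; host, credentials and filenames are placeholders. The idea is that after a dropped connection you simply re-run the download and it continues from the bytes already on disk, rather than re-sending a split-up file.

    import os
    from ftplib import FTP

    def download_with_resume(host, user, password, remote_name, local_name):
        # Resume from however many bytes we already have locally.
        offset = os.path.getsize(local_name) if os.path.exists(local_name) else 0
        with FTP(host) as ftp, open(local_name, "ab") as out:
            ftp.login(user, password)
            # rest=offset issues a REST command before RETR, so the server
            # skips the bytes already on disk (the server must support REST).
            ftp.retrbinary(f"RETR {remote_name}", out.write, rest=offset)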

Resources