I'm using a library to download media. The download function returns a stream. Occasionally, partway through the download, an error occurs in one of the HTTP requests used to yield each item of the stream.
When such an error occurs, I currently discard the entire download and retry from the beginning. Without being able to change the library's code, is it possible to retry just the last item in the stream, so the HTTP request can be retried and the download can continue from there?
I realize this is probably impossible, since the library's state would somehow need to be rewound to before the error occurred. But maybe someone has a suggestion for how to handle library errors like these, which can happen many minutes into downloading a large file, and take advantage of the fact that I already have all the data successfully downloaded up to the point of the error.
I have a really specific use case: in production I have to play back a continuously appended WAV file in my Java application (I have no way of modifying this scenario).
My problem is that when I open this WAV file for playback, libvlc handles the duration calculation, and after a while it detects EOF, even though the file's actual length is much larger than it was when we opened it (my guess is this is because of buffered playback). This causes the player to stop (raise the finished event). As of today I restart the playback and set the time to the end position from this finished event. This causes small pauses and is unacceptable in the final product.
I'm trying to implement logic in libvlc (version 3.0.6) that will handle this problem, but I can't figure out the proper way to do it.
My approach would be to recalculate the duration of the WAV file whenever EOF is detected. If the new duration equals the old one, then it really is EOF; otherwise playback can continue.
I have tried to modify the VLC_DEMUXER_EOF handling in input.c, follow the file-end trace, modify the demux-run.c and wav.c processing, and play around with the event handling (finished/stopped), but I cannot get much closer to a valid solution.
I would really appreciate some help with this one, because I have been losing my hair rapidly over the last couple of days. (I'm open to alternatives too, if you have ideas.)
I'm not sure which binding you are using, but I'm assuming vlcj since you mentioned Java.
Anyway, one solution could be to use libvlc_media_new_callbacks. Doc: https://www.videolan.org/developers/vlc/doc/doxygen/html/group__libvlc__media.html#ga591c3cbe56444f1949165b2b9b75d8e2
Implementing these custom callbacks will allow you to tell libvlc explicitly to wait. You can do this in the libvlc_media_read_cb callback, where the documentation states:
If no data is immediately available, then the callback should sleep.
You should find how this API is exposed through whichever binding you use and then use it from Java code.
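For reference, a rough sketch against the raw C API (vlcj exposes an equivalent); the FILE*-based plumbing, the unknown-size handling, and the missing "has the writer really finished?" check are assumptions you would have to adapt:

/* Sketch: media callbacks that never report EOF for a file that is still
 * being appended to. Pass an open FILE* as the opaque pointer. */
#include <vlc/vlc.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static int open_cb(void *opaque, void **datap, uint64_t *sizep)
{
    *datap = opaque;        /* the FILE* given to libvlc_media_new_callbacks */
    *sizep = UINT64_MAX;    /* size unknown: the file keeps growing */
    return 0;
}

static ssize_t read_cb(void *opaque, unsigned char *buf, size_t len)
{
    FILE *f = opaque;
    for (;;) {
        size_t n = fread(buf, 1, len, f);
        if (n > 0)
            return (ssize_t)n;
        /* No data yet: per the docs, sleep instead of returning 0 (EOF). */
        clearerr(f);              /* clear the EOF flag so later reads see new data */
        usleep(100 * 1000);
        /* A real implementation needs some external signal to know when the
         * writer has truly finished, and should then return 0. */
    }
}

static int seek_cb(void *opaque, uint64_t offset)
{
    return fseeko((FILE *)opaque, (off_t)offset, SEEK_SET);
}

static void close_cb(void *opaque)
{
    fclose((FILE *)opaque);
}

/* usage:
 * libvlc_media_t *m = libvlc_media_new_callbacks(vlc, open_cb, read_cb,
 *                                                seek_cb, close_cb, fp);
 */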
My solution was to modify the WAV file handling in the libvlc source and build a new VLC player for this specific problem.
By updating i_data_size in the wav.c Control method with stream_Size(p_demux->s), the player was able to handle the appending and plays the files back as if they were streams (this possibly causes issues with the length-changed event).
Secondly, I had to manage the occasional collision when the appender allocates the file and stream block operations cannot be executed. I handled this by adding a retry mechanism to the vlc_stream_ReadRaw method in stream.c: it retries the s->pf_read(s, buf, len) call x times with a short sleep whenever the call returns 0 (which normally means EOF, but here can indicate a failed operation).
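Roughly, the retry looks like the helper below (the helper name, retry count, and sleep interval are illustrative choices, not the exact patch):

/* Sketch of the retry around the raw pf_read call used by vlc_stream_ReadRaw
 * in VLC 3.0's src/input/stream.c. */
static ssize_t read_with_retry(stream_t *s, void *buf, size_t len)
{
    for (int attempt = 0; attempt < 10; attempt++)
    {
        ssize_t ret = s->pf_read(s, buf, len);
        if (ret != 0)        /* got data, or a genuine error (ret < 0) */
            return ret;
        /* 0 normally means EOF; while the appender holds the file it can also
         * mean a temporarily failed read, so back off briefly and retry. */
        msleep(10000);       /* 10 ms (msleep takes microseconds in VLC 3.0) */
    }
    return 0;                /* still nothing after all retries: treat as EOF */
}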
This is NOT a proper solution by any means, but I was in a hurry, and had to make it work. I will accept the solution described by mfkl.
I am using a downloadTask of URLSession to download a large file. The problem I am facing is how to provide pause and resume functionality. I have read that cancelling the downloadTask with resumeData gives back resumeData which can be used next time to resume the download. But for a really large file, this resumeData can be very large (I think; depending on the file size and at what stage the download is paused, it can be very large). How do I persist this large resumeData so that I can use it next time to resume the download?
Also, there can be multiple downloads at the same time, which compounds the problem.
The resume data blob does not contain the actual received data. If it did, you wouldn't be able to resume downloading a multi-gigabyte file on a 32-bit architecture.
What it contains is a bunch of metadata for the transfer:
A URL on disk where the temporary file is stored.
A time stamp indicating when the fetch began so it can ask the server if the file has changed since then.
The request URL.
The set of request headers.
Probably some UUIDs.
And it might just be the URL of a metadata file on disk that contains the things listed above. I'm not sure which.
Either way, you will not get a multi-gigabyte resumeData blob. That would be devastating. The whole point of a download task is that it downloads to disk, not to memory.
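To address the persistence part of the question: since the blob is just metadata, you can write it straight to disk and hand it back to downloadTask(withResumeData:) later. A minimal sketch in Swift (the file location and name are arbitrary, and delegate/completion handling for the finished download is omitted):

import Foundation

final class DownloadController {
    private let session = URLSession(configuration: .default)
    private var task: URLSessionDownloadTask?

    // Where the small resume-data blob is persisted between pauses/launches.
    private let resumeFile = FileManager.default.temporaryDirectory
        .appendingPathComponent("download.resume")

    func start(url: URL) {
        task = session.downloadTask(with: url)
        task?.resume()
    }

    func pause() {
        let file = resumeFile
        task?.cancel(byProducingResumeData: { data in
            // This blob is metadata only, not the downloaded bytes themselves.
            if let data = data {
                try? data.write(to: file)
            }
        })
    }

    func resumeDownload() {
        guard let data = try? Data(contentsOf: resumeFile) else { return }
        task = session.downloadTask(withResumeData: data)
        task?.resume()
    }
}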
We use an NSURLSession to download data in the background, and have timeoutIntervalForResource defined so it will time out if it takes too long, but if, for whatever reason, the source server doesn't exist then it still sits and waits. Is there any way to get it to abort immediately, or 'ask' the NSURLSessionDownloadTask if anything has been downloaded yet?
Failing that, what would be the best way of performing a pre-check to ensure a server exists before trying to download data from it?
These servers may be out of our control, so we can't place a small file there to download as an availability check. The only file we may know about could be a sizeable video, for example.
You can indeed ask the task about its status. First, check the response property. If that is nil, then you haven't gotten the first packet from the server. If that is non-nil, use countOfBytesExpectedToReceive and countOfBytesReceived as needed to determine progress.
I should also note that these properties all support KVO, AFAIK.
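For example, a small Swift sketch of querying an in-flight task directly (the helper names are mine):

import Foundation

// These properties can also be observed via KVO rather than polled.
func hasStartedReceiving(_ task: URLSessionDownloadTask) -> Bool {
    // A nil response means not even the first packet has arrived from the server.
    guard task.response != nil else { return false }
    return task.countOfBytesReceived > 0
}

func fractionDownloaded(of task: URLSessionDownloadTask) -> Double? {
    let expected = task.countOfBytesExpectedToReceive
    guard expected > 0 else { return nil }   // server hasn't reported a length yet
    return Double(task.countOfBytesReceived) / Double(expected)
}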
You could also perform an explicit DNS lookup prior to scheduling the background request if you'd prefer, with the caveat that doing so would prevent you from scheduling something that might actually work if the user's Internet connection comes back online in the meantime. :-)
First of all, let me say that I am not asking "how to check the status of a previously uploaded video". My question is about getting the status of a video from the response of an upload. I am using the dotnet client, and right after an upload is completed the response is a Google.Apis.Youtube.v3.Data.Video object. That object has a property called Status that contains the following fields, among others: RejectionReason, PrivacyStatus and UploadStatus. The problem is that only PrivacyStatus and UploadStatus have values. RejectionReason is null. Jeff Posnick mentioned that (see the whole thread here)
There's no way to determine whether a video is a duplicate or not as part of the upload response, because YouTube doesn't know whether the video is a duplicate until it has processed the video, and processing takes place after the upload has completed
That's a bit strange, because when I issue a video.list right after the upload, the API returns a status for the uploaded video. So even though the video is not published and YouTube still seems to be indexing/processing it, it already knows the status of the just-uploaded video. So why can it not return the status as part of the response?
It's important that the response include the status, because otherwise, in the code, we have to make two API calls for every upload: (1) insertmediaupload, then (2) video.list. That's a very costly operation, especially since not all uploads will be duplicates.
EDIT
In response to Jeff Posnick's comment below, the question is: "can the API wait for a few seconds, check if the processing is done, and then include the status as part of the response?".
I came up with that question because of the behavior I've seen (as noted above, when I issue a video.list right after the upload, the API returns a status for the uploaded video). But I've been playing around with the API and got inconsistent results: I have uploaded the same video over and over, and sometimes there is a "duplicate" status and sometimes there is none. Please take note of the steps I took, #1 and #2 above; there is no other code between those two API calls.
I'm not sure what the question is here.
You seem to understand the limitations of the way uploads work with the YouTube API, and those limitations still apply with v3 of the YouTube Data API. At the time that a response is returned from the videos.insert() request, the status of the video is not known, because it hasn't been processed yet. The actual processing might happen a second or two after the video has been uploaded, or it might happen a few minutes (or longer) after the video has been uploaded, especially for larger video files. It's not done in real time, and it's not reasonable to expect the videos.insert() API call to block waiting for the processing to be finished.
I'd disagree with your assessment that performing a videos.list(id=...,part=status) is a "very costly operation". The amount of bandwidth and YouTube API quota that consumes is minimal compared to an actual video upload. It would be nice to provide a way to communicate back the processing status independent of the videos.insert() call via some sort of callback or push update mechanism, but we don't have anything available like that at this time. You have to poll videos.list(id=...,part=status).
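For illustration, that poll with the v3 .NET client might look roughly like this (service/auth construction is omitted, the helper name is mine, and the caller decides how often to repeat the call):

using System.Linq;
using Google.Apis.YouTube.v3;
using Google.Apis.YouTube.v3.Data;

// One poll of videos.list(id=..., part=status) for a just-uploaded video.
static VideoStatus GetUploadStatus(YouTubeService youtubeService, string videoId)
{
    var listRequest = youtubeService.Videos.List("status");
    listRequest.Id = videoId;                        // the Id returned by videos.insert()
    var response = listRequest.Execute();
    // Once processing finishes, Status.UploadStatus moves past "uploaded"
    // and RejectionReason (e.g. "duplicate") is populated if it was rejected.
    return response.Items.FirstOrDefault()?.Status;
}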
I am using Nokogiri for parsing XML.
The problem is the response time of the external resource. Sometimes it works fine; sometimes the response time can be over 30 seconds; sometimes it returns different error codes. What I need is the fastest way to find out whether my XML is ready to be requested by open-uri, and only then make the actual request.
What I am doing now is setting a timeout of 5 seconds to prevent delays.
begin
  Timeout::timeout(5) do
    link = URI.escape("http://domain.org/timetable.xml")
    @doc = Nokogiri::HTML(open(link))
  end
rescue Timeout::Error
  @error = "Data Server is offline"
end
For checks at the level your code shows, you'll need cooperation from the remote service, e.g., conditional HEAD requests and/or ETag comparison (those together would be my own preference). It looks like you may have some of this already, since you say it sometimes returns error codes, though if those error codes are in the XML payload they're not going to help. And of course, if the remote service's responsiveness is variable, it will probably fluctuate between your check and the subsequent main GET request.
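As an illustration of that kind of probe, a rough Ruby sketch (cached_etag and cached_doc are placeholders for whatever cache you keep, and the 5-second timeouts are arbitrary):

require 'net/http'
require 'open-uri'
require 'nokogiri'

cached_etag = nil   # the ETag you stored from the previous successful fetch, if any
cached_doc  = nil   # the Nokogiri document parsed from that fetch

uri = URI("http://domain.org/timetable.xml")

# Cheap HEAD probe before committing to the full GET and parse.
response = Net::HTTP.start(uri.host, uri.port, open_timeout: 5, read_timeout: 5) do |http|
  request = Net::HTTP::Head.new(uri)
  request['If-None-Match'] = cached_etag if cached_etag
  http.request(request)
end

case response
when Net::HTTPNotModified    # 304: reuse the XML you already parsed
  @doc = cached_doc
when Net::HTTPSuccess        # 200: worth doing the real GET and parse now
  @doc = Nokogiri::HTML(URI.open(uri.to_s))
  cached_etag = response['ETag']
else
  @error = "Data Server is offline"
end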
FWIW: if you're just looking to improve your app's responsiveness when using this data, there are cache approaches you can use, e.g., use a soft-TTL lower than the main TTL that, when expired, causes your cache code to return the cached XML and kick off an async job to refetch the data so it's fresher for the next request. Or use a repeating worker to keep the cache fresh.