What I am trying to accomplish:
Capture RTP Stream in C#
Forward that stream to the System.Speech.SpeechRecognitionEngine
I am creating a Linux-based robot which will take microphone input and send it to a Windows machine, which will process the audio using Microsoft Speech Recognition and send the response back to the robot. The robot might be hundreds of miles from the server, so I would like to do this over the Internet.
What I have done so far:
Have the robot generate an RTP stream encoded in MP3 format (other formats available) using FFmpeg (the robot is running on a Raspberry Pi running Arch Linux)
Captured stream on the client computer using VLC ActiveX control
Found that the SpeechRecognitionEngine has the available methods:
recognizer.SetInputToWaveStream()
recognizer.SetInputToAudioStream()
recognizer.SetInputToDefaultAudioDevice()
Looked at using JACK to send the output of the app to line-in, but was completely confused by it.
What I need help with:
I'm stuck on how to actually send the stream from VLC to the SpeechRecognitionEngine. VLC doesn't expose the stream at all. Is there a way I can just capture a stream and pass that stream object to the SpeechRecognitionEngine? Or is RTP not the solution here?
Thanks in advance for your help.
After much work, I finally got the SpeechRecognitionEngine to accept a WAVE audio stream. Here's the process:
On the Pi, I have ffmpeg running. I stream the audio using this command:
ffmpeg -ac 1 -f alsa -i hw:1,0 -ar 16000 -acodec pcm_s16le -f rtp rtp://XXX.XXX.XXX.XXX:1234
On the server side, I create a UdpClient and listen on port 1234. I receive the packets on a separate thread. First, I strip off the RTP header (header format explained here) and write the payload to a special stream. I had to use the SpeechStreamer class described in Sean's response in order for the SpeechRecognitionEngine to work. It wasn't working with a standard MemoryStream.
The only thing I had to do on the speech recognition side was set the input to the audio stream instead of the default audio device.
recognizer.SetInputToAudioStream( rtpClient.AudioStream,
new SpeechAudioFormatInfo(WAVFile.SAMPLE_RATE, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
I haven't done extensive testing on it (i.e. letting it stream for days and seeing if it still works), but I'm able to save off the audio sample in the SpeechRecognized event and it sounds great. I'm using a sample rate of 16 kHz. I might bump it down to 8 kHz to reduce the amount of data transfer, but I will worry about that once it becomes a problem.
I should also mention, the response is extremely fast. I can speak an entire sentence and get a response in less than a second. The RTP connection seems to add very little overhead to the process. I'll have to try a benchmark and compare it with just using MIC input.
EDIT: Here is my RTPClient class.
/// <summary>
/// Connects to an RTP stream and listens for data
/// </summary>
public class RTPClient
{
private const int AUDIO_BUFFER_SIZE = 65536;
private UdpClient client;
private IPEndPoint endPoint;
private SpeechStreamer audioStream;
private bool writeHeaderToConsole = false;
private bool listening = false;
private int port;
private Thread listenerThread;
/// <summary>
/// Returns a reference to the audio stream
/// </summary>
public SpeechStreamer AudioStream
{
get { return audioStream; }
}
/// <summary>
/// Gets whether the client is listening for packets
/// </summary>
public bool Listening
{
get { return listening; }
}
/// <summary>
/// Gets the port the RTP client is listening on
/// </summary>
public int Port
{
get { return port; }
}
/// <summary>
/// RTP Client for receiving an RTP stream containing a WAVE audio stream
/// </summary>
/// <param name="port">The port to listen on</param>
public RTPClient(int port)
{
Console.WriteLine(" [RTPClient] Loading...");
this.port = port;
// Initialize the audio stream that will hold the data
audioStream = new SpeechStreamer(AUDIO_BUFFER_SIZE);
Console.WriteLine(" Done");
}
/// <summary>
/// Creates a connection to the RTP stream
/// </summary>
public void StartClient()
{
// Create new UDP client. The IP end point tells us which IP is sending the data
client = new UdpClient(port);
endPoint = new IPEndPoint(IPAddress.Any, port);
listening = true;
listenerThread = new Thread(ReceiveCallback);
listenerThread.Start();
Console.WriteLine(" [RTPClient] Listening for packets on port " + port + "...");
}
/// <summary>
/// Tells the UDP client to stop listening for packets.
/// </summary>
public void StopClient()
{
// Clear the flag, then close the socket so the blocking Receive() call returns
listening = false;
if (client != null) client.Close();
Console.WriteLine(" [RTPClient] Stopped listening on port " + port);
}
/// <summary>
/// Handles the receiving of UDP packets from the RTP stream
/// </summary>
private void ReceiveCallback()
{
// Begin looking for the next packet
while (listening)
{
// Receive packet (blocks until one arrives or the socket is closed)
byte[] packet;
try
{
packet = client.Receive(ref endPoint);
}
catch (SocketException)
{
if (listening) throw;
break; // Socket was closed by StopClient()
}
catch (ObjectDisposedException)
{
break; // Socket was closed by StopClient()
}
// Ignore anything too short to contain the fixed 12-byte RTP header
if (packet.Length < 12) continue;
// Decode the header of the packet
int version = GetRTPHeaderValue(packet, 0, 1);
int padding = GetRTPHeaderValue(packet, 2, 2);
int extension = GetRTPHeaderValue(packet, 3, 3);
int csrcCount = GetRTPHeaderValue(packet, 4, 7);
int marker = GetRTPHeaderValue(packet, 8, 8);
int payloadType = GetRTPHeaderValue(packet, 9, 15);
int sequenceNum = GetRTPHeaderValue(packet, 16, 31);
int timestamp = GetRTPHeaderValue(packet, 32, 63);
int ssrcId = GetRTPHeaderValue(packet, 64, 95);
if (writeHeaderToConsole)
{
Console.WriteLine("{0} {1} {2} {3} {4} {5} {6} {7} {8}",
version,
padding,
extension,
csrcCount,
marker,
payloadType,
sequenceNum,
timestamp,
ssrcId);
}
// Skip the 12-byte fixed header plus any CSRC identifiers (4 bytes each),
// then write the payload to the audio stream
int headerLength = 12 + 4 * csrcCount;
if (packet.Length > headerLength)
{
audioStream.Write(packet, headerLength, packet.Length - headerLength);
}
}
}
/// <summary>
/// Grabs a value from the RTP header in Big-Endian format
/// </summary>
/// <param name="packet">The RTP packet</param>
/// <param name="startBit">Start bit of the data value</param>
/// <param name="endBit">End bit of the data value</param>
/// <returns>The value</returns>
private int GetRTPHeaderValue(byte[] packet, int startBit, int endBit)
{
int result = 0;
// Values in the RTP header are big endian: read the most significant bit first
// and shift it in. 32-bit fields (timestamp, SSRC) wrap into the sign bit of the int.
for (int i = startBit; i <= endBit; i++)
{
int byteIndex = i / 8;
int bitShift = 7 - (i % 8);
result = (result << 1) | ((packet[byteIndex] >> bitShift) & 1);
}
return result;
}
}
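The class above was tested against ffmpeg, which sends plain 12-byte headers. As a rough illustration of where the payload starts in the general case, here is a minimal Java sketch following the RTP fixed-header layout (RFC 3550): the header grows by 4 bytes per CSRC entry and by an optional extension block. The class and method names are hypothetical, and trailing padding (the P bit) is not handled.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: locates the payload inside a single RTP packet. */
final class RtpPayload {
    static final int FIXED_HEADER_BYTES = 12;

    /** Returns the offset of the first payload byte, or -1 if the packet looks malformed. */
    static int payloadOffset(byte[] packet) {
        if (packet.length < FIXED_HEADER_BYTES) return -1;
        int version = (packet[0] >> 6) & 0x03;          // top two bits of byte 0
        if (version != 2) return -1;
        boolean hasExtension = (packet[0] & 0x10) != 0; // X bit
        int csrcCount = packet[0] & 0x0F;               // CC field
        int offset = FIXED_HEADER_BYTES + 4 * csrcCount;
        if (hasExtension) {
            if (packet.length < offset + 4) return -1;
            // Extension header: 16-bit profile id, then 16-bit length in 32-bit words
            int words = ByteBuffer.wrap(packet, offset + 2, 2).getShort() & 0xFFFF;
            offset += 4 + 4 * words;
        }
        return offset <= packet.length ? offset : -1;
    }

    /** Sequence number (bytes 2-3, big endian); useful for spotting lost or reordered packets. */
    static int sequenceNumber(byte[] packet) {
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```

Since UDP gives no ordering guarantee, comparing consecutive sequence numbers is one cheap way to notice gaps before the audio reaches the recognizer.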
I think you should keep it simpler. Why use RTP and a special library to capture the RTP? Why not just take the audio data from the Raspberry Pi and use HTTP POST to send it to your server?
Keep in mind that System.Speech does not support MP3 format. This might be helpful - Help with SAPI v5.1 SpeechRecognitionEngine always gives same wrong result with C#. For System.Speech audio must be in PCM, ULaw, or ALaw format. The most reliable way to determine which formats your recognizer supports is to interrogate it with RecognizerInfo.SupportedAudioFormats.
Then you can post the data to your server (and use ContentType = "audio/x-wav"). We've used a URL format like
http://server/app/recognize/{sampleRate}/{bits}/{isStereo}
to include the audio parameters in the request. Send the captured wav file in the POST body.
One catch we ran into is we had to add a WAV file header to the data before sending it to System.Speech. Our data was PCM, but not in WAV format. See https://ccrma.stanford.edu/courses/422/projects/WaveFormat/ in case you need to do this.
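To illustrate the header step, here is a minimal Java sketch that prepends the canonical 44-byte RIFF/WAVE header to raw PCM samples, following the layout described at the link above. The class and method names are hypothetical, and it assumes uncompressed PCM (format tag 1) with little-endian samples.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: wraps raw PCM samples in a canonical 44-byte WAV header. */
final class WavHeader {
    static byte[] wrapPcm(byte[] pcm, int sampleRate, int bitsPerSample, int channels) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second
        ByteBuffer b = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + pcm.length);                     // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                                  // size of the fmt chunk for PCM
        b.putShort((short) 1);                         // audio format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(pcm.length);                          // size of the sample data
        b.put(pcm);
        return b.array();
    }
}
```

For example, `WavHeader.wrapPcm(samples, 16000, 16, 1)` would describe 16 kHz, 16-bit mono audio, matching the format used elsewhere in this thread.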
It's an old thread, but it was useful for a project I was working on. However, I had the same issues as some other people trying to use dgreenheck's code with a Windows PC as the source.
I got FFmpeg working with no changes to the code using the following parameters:
ffmpeg -ac 1 -f dshow -i audio="{recording device}" -ar 16000 -acodec pcm_s16le -f rtp rtp://{hostname}:{port}
In my case, the recording device name was "Microphone (Realtek High Definition Audio)", but I used the following to get the recording device name:
ffmpeg -list_devices true -f dshow -i dummy
IMPORTANT EDIT
After further investigation, I found out that the packet size is in fact much larger than the stated 1024 bytes; the 1024 bytes were just the limit of the standard output I was using (Android Studio / Flutter).
Some of the packets received are now up to ~27,000 bytes large; however, that is nowhere near the actually transmitted size, which is more than 10x that.
I am trying to send singular packets of up to 5 MB in length over a Socket connection in Dart. For this, I am using the following code:
Socket socket = await Socket.connect(globals.serverUrl, globals.serverPort);
Stream<Uint8List> stream = socket.asBroadcastStream();
Uint8List response = await stream.first;
String responseString = String.fromCharCodes(response);
Note that my Server is running Java while the Client is using Dart.
After the Server sends the data packet, the Client successfully receives exactly the first 1024 bytes of the packet, and the rest is nowhere to be found. Even after reading stream.first multiple times, I only get the newly sent packet and not the remaining bytes of the old one.
So my question is: how do I make the Socket stream read ALL bytes of the packet until it is finished, and not just the first 1024?
EDIT:
The received packet on the client is parsed using:
String? decrypt(String cipherText, String keyString) {
final key = Key.fromUtf8(keyString);
final iv = IV.fromBase64(cipherText.split(":")[1]);
final encrypter = Encrypter(AES(key, mode: AESMode.cbc, padding: null));
final encrypted = Encrypted.fromBase64(cipherText.split(":")[0]);
final decrypted = encrypter.decrypt(encrypted, iv: iv);
globals.log.i("DECRYPTED: $decrypted");
return decrypted;
}
The error that I am getting stems from getting the IV, since the message is cut off at 1024 bytes and the ":" appears much later in the String.
The problem is that TCP is a byte stream, so the Dart socket delivers a large message as multiple chunks (of 1024 bytes in your first observation, but the chunk size is not fixed). So there are some approaches you can use to combine them in the client:
By extending Socket class
I do not believe this is the right solution:
It is hard to extend, since it is a platform implementation (in the SDK implementation of dart:io, almost every class method is external).
It is hard to maintain.
Since it depends on custom platform implementations, you would need to do it for multiple platforms.
It is easy to create undocumented memory leaks.
Let me know if you still prefer this approach and I'll do further research.
By using Stream<T>.reduce function
The problem with this approach in your context is that Sockets do not emit a done event when a message is sent using socket.write('Your message').
So unless you are using the socket to send a single message, this function can't help you, because it returns a Future<T> that only completes when the socket connection is closed.
By emitting a EOF message from the server
This is the solution I found. It is not very elegant, so improvements are welcome.
The idea is to concatenate all client received packets into a single one and stop receiving when a pre-determined termination (EOF) string is received.
Implementation
Below is a server implementation that emits a 5 MB message followed by a message:end string every time a new client connects.
import 'dart:io';
Future<void> main() async {
final ServerSocket serverSocket =
await ServerSocket.bind(InternetAddress.anyIPv4, 5050);
final Stream<Socket> serverStream = serverSocket.asBroadcastStream();
serverStream.listen((client) async {
print(
'New client connected: ${client.address}:${client.port} ${client.done} Remote address: ${client.remoteAddress}');
const int k1kb = 1000;
const int k1mb = k1kb * 1000;
const int k5mb = k1mb * 5;
// Create a 5 MB string that follows: '1000.....0001'.
// Every character is ASCII, so each one is sent as a single byte.
final String k5mbMessage = '1${'0' * (k5mb - 2)}1';
client.write(k5mbMessage);
client.write('message:end');
});
print('Listening on: ${serverSocket.address} ${serverSocket.port}');
}
And here is a client implementation that uses 'our own reduce' function, which combines all packets until the termination string is found.
import 'dart:io';
Future<void> main() async {
final Socket server = await Socket.connect('localhost', 5050);
final Stream<String> serverSocket =
server.asBroadcastStream().map(String.fromCharCodes); // Map to String by default
const kMessageEof = 'message:end';
String message = '';
await for (String packet in serverSocket) {
// If you are using [message] as a List of bytes (Uint8List):
// message = Uint8List.fromList([...message, ...packet]);
message += packet;
// Do not compare directly packet == kMessageEof
// because it can be 'broken' into multiple packets:
// -> 00000 (packet 1)
// -> 00000 (packet 2)
// -> 00mes (packet 3)
// -> sage: (packet 4)
// -> end (packet 5)
if (message.endsWith(kMessageEof)) {
// remove termination string
message = message.replaceRange(
message.length - kMessageEof.length,
message.length,
'',
);
print('Received: $message'); // Prints '1000000......0000001'
break; // The message is complete; stop reading
}
}
}
You can make it more generic if you want by using an extension:
import 'dart:io';
/// This was created since the native [reduce] says:
/// > When this stream is done, the returned future is completed with the value at that time.
///
/// The problem is that socket connections do not emit the [done] event after
/// each message, only after the socket disconnects.
///
/// So here is a implementation that combines [reduce] and [takeWhile].
extension ReduceWhile<T> on Stream<T> {
Future<T> reduceWhile({
required T Function(T previous, T element) combine,
required bool Function(T) combineWhile,
T? initialValue,
}) async {
T initial = initialValue ?? await first;
await for (T element in this) {
initial = combine(initial, element);
if (!combineWhile(initial)) break;
}
return initial;
}
}
Future<void> main() async {
final Socket server = await Socket.connect('localhost', 5050);
final Stream<String> serverSocket =
server.asBroadcastStream().map(String.fromCharCodes);
const kMessageEof = 'message:end';
// Reduce with a condition [combineWhile]
String message = await serverSocket.reduceWhile(
combine: (previous, element) => '$previous$element',
combineWhile: (message) => !message.endsWith(kMessageEof),
);
// Remove termination string
message = message.replaceRange(
message.length - kMessageEof.length,
message.length,
'',
);
print('Received: $message');
}
Since the socket itself doesn't send a done event per message, the way I found to reduce all packets into a single one was to emit 'our own done event'.
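A common alternative to a terminator string is to prefix every message with its length, which also works when the payload could itself contain the terminator. Since the question's server is written in Java, here is a minimal Java sketch of that framing. It is a different technique from the EOF marker above, the class and method names are hypothetical, and the 4-byte big-endian length is an assumed wire format that the Dart client would have to mirror by buffering chunks until it has received that many bytes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: frames each message as a 4-byte big-endian length plus payload. */
final class LengthPrefixedFraming {
    /** Writes one framed message, e.g. to socket.getOutputStream(). */
    static void writeMessage(OutputStream out, byte[] payload) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length); // the receiver learns how many bytes to expect
        data.write(payload);
        data.flush();
    }

    /** Reads exactly one framed message, however many TCP chunks it arrives in. */
    static byte[] readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt();
        if (length < 0) throw new IOException("Negative message length: " + length);
        byte[] payload = new byte[length];
        data.readFully(payload); // keeps reading until the whole payload has arrived
        return payload;
    }
}
```

With this framing the receiver never has to scan the data for a marker, and several messages can share one connection without ambiguity.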
I am hoping someone has suggestions about this issue.
We have a custom driver taken from https://learn.microsoft.com/en-us/samples/microsoft/windows-driver-samples/xpsdrv-driver-and-filter-sample/
The print driver works well and outputs XPS when documents are printed from MS Word or a PDF viewer. But when a document is printed through an HP5Si series printer, the driver returns 0 bytes. The job is sent from the HP5Si printer to the XPS driver. Why is the driver rejecting this input when the source is an HP series printer? What can I do to fix it?
The printer on the AS400 is set up with an IBM HP5Si driver and sends the job to a Windows service on a server. This Windows service routes the job to the XPS driver as if it were an HP series printer. The XPS driver processes this job and returns XPS to the Windows service. The Windows service then converts it to a TIFF file.
For some reason, if printing is done using this workflow, the XPS driver returns 0 bytes.
If the same document is opened in Word or Notepad, or anything other than the AS400 + HP path, it works and XPS is returned.
To prove my theory, I sent a PCL file in C# code to the driver and it returned 0 bytes.
public static void SendBytesToPrinterPCL(string printerName, string szFileName) {
IntPtr lhPrinter;
OpenPrinter(printerName, out lhPrinter, new IntPtr(0));
if (lhPrinter == IntPtr.Zero) return; // Printer not found (ToInt32() can overflow in a 64-bit process)
var rawPrinter = new DOCINFOA() {
pDocName = "My Document",
pDataType = "RAW"
};
StartDocPrinter(lhPrinter, 1, rawPrinter);
using(var b = new BinaryReader(File.Open(szFileName, FileMode.Open))) {
var length = (int) b.BaseStream.Length;
const int bufferSize = 8192;
var numLoops = length / bufferSize;
var leftOver = length % bufferSize;
for (int i = 0; i < numLoops; i++) {
var buffer = new byte[bufferSize];
int dwWritten;
b.Read(buffer, 0, bufferSize);
IntPtr unmanagedPointer = Marshal.AllocHGlobal(buffer.Length);
Marshal.Copy(buffer, 0, unmanagedPointer, buffer.Length);
WritePrinter(lhPrinter, unmanagedPointer, bufferSize, out dwWritten);
Marshal.FreeHGlobal(unmanagedPointer);
}
if (leftOver > 0) {
var buffer = new byte[leftOver];
int dwWritten;
b.Read(buffer, 0, leftOver);
IntPtr unmanagedPointer = Marshal.AllocHGlobal(buffer.Length);
Marshal.Copy(buffer, 0, unmanagedPointer, buffer.Length);
WritePrinter(lhPrinter, unmanagedPointer, leftOver, out dwWritten);
Marshal.FreeHGlobal(unmanagedPointer);
}
}
EndDocPrinter(lhPrinter);
ClosePrinter(lhPrinter);
}
string filePath = @"C:\Users\tom\Desktop\form.PCL";
string szPrinterName = @"\\server\xpsdrv";
Print.SendBytesToPrinterPCL(szPrinterName, filePath);
Then I sent a regular text file to the driver and it successfully converted to XPS.
public static void SendToPrinterNonPCL(string filePath)
{
ProcessStartInfo info = new ProcessStartInfo();
info.Verb = "print";
info.FileName = filePath;
info.CreateNoWindow = true;
info.WindowStyle = ProcessWindowStyle.Hidden;
Process p = new Process();
p.StartInfo = info;
p.Start();
p.WaitForInputIdle();
System.Threading.Thread.Sleep(3000);
if (false == p.CloseMainWindow())
p.Kill();
}
string filePath = @"C:\Users\tom\Desktop\form.txt";
string szPrinterName = @"\\server\xpsdrv";
Print.SendToPrinterNonPCL(filePath);
Why doesn't the driver in the Microsoft samples accept PCL? What should I do? I am not a driver developer. This project was given to me.
EDIT:
Initially I didn't know about this printing from the AS400. Our legacy driver was built 15 years ago. The developer wrote a custom print driver that outputs PCL and a custom converter to TIFF, but the driver only supported monochrome. I am not a driver expert, a PCL expert, or a converter expert. In order to support color and a less pixelated final TIFF, I decided to change it to an XPS driver. It also needs less custom code and can use Microsoft's XPS conversion in WPF. For someone who is not a driver developer, that is a much smaller learning curve than learning PCL and then changing the converter to accommodate color TIFF. But I guess it is falling apart, since the users also print from the AS400, which sends PCL.
Do you know of any good products that we could purchase a license for? We need a PCL driver and a converter to TIFF.
Thank you
I was following the article at http://blogs.visigo.com/chriscoulson/easy-handling-of-http-range-requests-in-asp-net and wrote a simple MVC application to stream large video files.
Here is my code, with slight modifications to the code in that tutorial:
internal static void StreamVideo(string fullpath, HttpContextBase context)
{
long size, start, end, length, fp = 0;
using (StreamReader reader = new StreamReader(fullpath))
{
size = reader.BaseStream.Length;
start = 0;
end = size - 1;
length = size;
// Now that we've gotten so far without errors we send the Accept-Ranges header
/* At the moment we only support single ranges.
* Multiple ranges require some more work to ensure they work correctly
* and comply with the specification: http://www.w3.org/Protocols/rfc2616/rfc2616-sec19.html#sec19.2
*
* Multirange content must be sent with the multipart/byteranges media type
* (media type = MIME type), with a boundary parameter separating the chunks of data.
*/
// Accept-Ranges takes a range unit ("bytes"), not a byte span
context.Response.AddHeader("Accept-Ranges", "bytes");
if (!String.IsNullOrEmpty(context.Request.ServerVariables["HTTP_RANGE"]))
{
long anotherStart = start;
long anotherEnd = end;
string[] arr_split = context.Request.ServerVariables["HTTP_RANGE"].Split(new char[] { Convert.ToChar("=") });
string range = arr_split[1];
// Make sure the client hasn't sent us multiple ranges
if (range.IndexOf(",") > -1)
{
// (?) Should this be issued here, or should the first
// range be used? Or should the header be ignored and
// we output the whole content?
context.Response.AddHeader("Content-Range", "bytes " + start + "-" + end + "/" + size);
throw new HttpException(416, "Requested Range Not Satisfiable");
}
// If the range starts with a '-', the last n bytes are requested
// If not, we forward the file pointer
// and make sure to get the end byte if specified
if (range.StartsWith("-"))
{
// The n-number of the last bytes is requested
anotherStart = size - Convert.ToInt64(range.Substring(1));
}
else
{
arr_split = range.Split(new char[] { Convert.ToChar("-") });
anotherStart = Convert.ToInt64(arr_split[0]);
long temp = 0;
anotherEnd = (arr_split.Length > 1 && Int64.TryParse(arr_split[1].ToString(), out temp)) ? Convert.ToInt64(arr_split[1]) : size;
}
/* Check the range and make sure it's treated according to the specs.
* http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
*/
// End byte can not be larger than end.
anotherEnd = (anotherEnd > end) ? end : anotherEnd;
// Validate the requested range and return an error if it's not correct.
if (anotherStart > anotherEnd || anotherStart > size - 1 || anotherEnd >= size)
{
context.Response.ContentType = MimeMapping.GetMimeMapping(fullpath);
context.Response.AddHeader("Content-Range", "bytes " + start + "-" + end + "/" + size);
throw new HttpException(416, "Requested Range Not Satisfiable");
}
start = anotherStart;
end = anotherEnd;
length = end - start + 1; // Calculate new content length
fp = reader.BaseStream.Seek(start, SeekOrigin.Begin);
context.Response.StatusCode = 206;
}
}
// Notify the client the byte range we'll be outputting
context.Response.AddHeader("Content-Range", "bytes " + start + "-" + end + "/" + size);
context.Response.AddHeader("Content-Length", length.ToString());
// Start buffered download
context.Response.WriteFile(fullpath, fp, length);
context.Response.End();
}
When I stream a large video (around 700 MB) over the network using the above code, the video is very slow to start (around 1-2 minutes). At this stage I checked the network requests, and it seems the browser asks for the video and waits for a response from the server. This is really annoying for the user.
Once it has started, the video plays smoothly (it is a 720p video, and my network connection is good).
But when I seek with the controls on the HTML video player, the same issue happens and I have to wait another 1-2 minutes until the response completes.
I am using IIS7 (MVC4). If I play the same video from inside the IIS directory, it plays without the delay. There are also no issues if the video is outside the IIS folder but on the same machine that hosts IIS.
I only have this issue when the video is in a network location on a different machine from the one hosting IIS.
So the conclusion is:
This is not caused by the user's browser trying to load a large video. It is something between the video share machine and IIS.
Any idea about resolving this?
Regards,
-Lasith
I have created a WCF Service that allows uploading large files via BasicHttpBinding using streaming, and it is working great! I would like to extend this to show a progress bar (UIProgressView) so that when a large file is being uploaded in 64 KB chunks, the user can see that it is actively working.
The client code calling the WCF Service is:
BasicHttpBinding binding = CreateBasicHttp ();
BTSMobileWcfClient _client = new BTSMobileWcfClient (binding, endPoint);
_client.UploadFileCompleted += ClientUploadFileCompleted;
byte[] b = File.ReadAllBytes (zipFileName);
using (new OperationContextScope(_client.InnerChannel)) {
OperationContext.Current.OutgoingMessageHeaders.Add(System.ServiceModel.Channels.MessageHeader.CreateHeader("SalvageId","",iBTSSalvageId.ToString()));
OperationContext.Current.OutgoingMessageHeaders.Add(System.ServiceModel.Channels.MessageHeader.CreateHeader("FileName","",Path.GetFileName(zipFileName)));
OperationContext.Current.OutgoingMessageHeaders.Add(System.ServiceModel.Channels.MessageHeader.CreateHeader("Length","",b.LongLength));
_client.UploadFileAsync(b);
}
On the server side, I read the file stream in 64 KB chunks and report "bytes read", etc. back to the calling routine. A snippet of code for that is:
using (FileStream targetStream = new FileStream(filePath, FileMode.CreateNew,FileAccess.Write)) {
// read from the input stream in 65536-byte chunks
const int chunkSize = 65536;
byte[] buffer = new byte[chunkSize];
do {
// read bytes from input stream
int bytesRead = request.FileData.Read(buffer, 0, chunkSize);
if (bytesRead == 0) break;
// write bytes to output stream
targetStream.Write(buffer, 0, bytesRead);
} while (true);
targetStream.Close();
}
But I don't know how to hook into the callback on the Xamarin side to receive the "bytes read" versus "total bytes to send" so I can update the UIProgressView.
Has anyone tried this or is this even possible?
Thanks In Advance,
Bo
I want to stream a video to my iPad via the HTML5 video tag with Tapestry 5 (5.3.5) on the backend. Usually the server-side framework shouldn't even play a role in this, but somehow it does.
Anyway, hopefully someone here can help me out. Please keep in mind that my project is very much a prototype and that what I describe is simplified / reduced to the relevant parts. I would very much appreciate it if people didn't respond with the obligatory "you want to do the wrong thing" or security/performance nitpicks that aren't relevant to the problem.
So here it goes:
Setup
I have a video taken from the Apple HTML5 showcase so I know that format isn't an issue. I have a simple tml page "Play" that just contains a "video" tag.
Problem
I started by implementing a RequestFilter that handles the request from the video control by opening the referenced video file and streaming it to the client. That's basic "if path starts with 'file' then copy file inputstream to response outputstream". This works very well with Chrome but not with the iPad. Fine, I thought, there must be some headers I'm missing, so I looked at the Apple Showcase again and included the same headers and content type, but no joy.
Next, I thought, well, let's see what happens if I let T5 serve the file. I copied the video to the webapp context, disabled my request filter and put the simple filename in the video's src attribute. This works in Chrome AND on the iPad.
That surprised me and prompted me to look at how T5 handles static files / context requests. Thus far I've only gotten far enough to feel like there are two different paths, which I've confirmed by switching out the hardwired "video src" for an Asset with a @Path("context:"). This, again, works in Chrome but not on the iPad.
So I'm really lost here. What's the secret juice in the "simple context" requests that allows them to work on the iPad? There is nothing special going on, and yet it's the only way this works. The problem is, I can't really serve those videos from my webapp context ...
Solution
So, it turns out that there is an HTTP header called "Range" and that the iPad, unlike Chrome, uses it for video. The "secret sauce", then, is that the servlet handler for static resource requests knows how to deal with range requests while T5's doesn't. Here is my custom implementation:
OutputStream os = response.getOutputStream("video/mp4");
InputStream is = new BufferedInputStream( new FileInputStream(f));
try {
String range = request.getHeader("Range");
if( range != null && !range.equals("bytes=0-")) {
logger.info("Range response _______________________");
String[] ranges = range.split("=")[1].split("-");
int from = Integer.parseInt(ranges[0]);
int to = Integer.parseInt(ranges[1]);
int len = to - from + 1 ;
response.setStatus(206);
response.setHeader("Accept-Ranges", "bytes");
String responseRange = String.format("bytes %d-%d/%d", from, to, f.length());
logger.info("Content-Range:" + responseRange);
response.setHeader("Connection", "close");
response.setHeader("Content-Range", responseRange);
response.setDateHeader("Last-Modified", new Date().getTime());
response.setContentLength(len);
logger.info("length:" + len);
byte[] buf = new byte[4096];
is.skip(from);
while( len > 0) {
int read = is.read(buf, 0, Math.min(buf.length, len));
if( read == -1) break; // end of file reached before the range was satisfied
os.write(buf, 0, read);
len -= read;
}
} else {
response.setStatus(200);
IOUtils.copy(is, os);
}
} finally {
os.close();
is.close();
}
I want to post my refined solution from above. Hopefully this will be useful to someone.
So basically the problem seemed to be that I was disregarding the "Range" HTTP request header, which the iPad didn't like. In a nutshell, this header means that the client only wants a certain part (in this case a byte range) of the response.
This is what an iPad HTML video request looks like:
[INFO] RequestLogger Accept:*/*
[INFO] RequestLogger Accept-Encoding:identity
[INFO] RequestLogger Connection:keep-alive
[INFO] RequestLogger Host:mars:8080
[INFO] RequestLogger If-Modified-Since:Wed, 10 Oct 2012 22:27:38 GMT
[INFO] RequestLogger Range:bytes=0-1
[INFO] RequestLogger User-Agent:AppleCoreMedia/1.0.0.9B176 (iPad; U; CPU OS 5_1 like Mac OS X; en_us)
[INFO] RequestLogger X-Playback-Session-Id:BC3B397D-D57D-411F-B596-931F5AD9879F
It means that the iPad only wants the first two bytes (bytes 0 and 1). If you disregard this header and simply send a 200 response with the full body, the video won't play. So you need to send a 206 (Partial Content) response and set the following response headers:
[INFO] RequestLogger Content-Range:bytes 0-1/357772702
[INFO] RequestLogger Content-Length:2
This means "I'm sending you byte 0 through 1 of 357772702 total bytes available".
When you actually start playing the video, the next request will look like this (everything except the range header omitted):
[INFO] RequestLogger Range:bytes=0-357772701
So my refined solution looks like this:
OutputStream os = response.getOutputStream("video/mp4");
try {
String range = request.getHeader("Range");
/** if there is no range requested we will just send everything **/
if( range == null) {
InputStream is = new BufferedInputStream( new FileInputStream(f));
try {
// set the status before writing the body; it cannot change once the response is committed
response.setStatus(200);
IOUtils.copy(is, os);
} finally {
is.close();
}
return true;
}
requestLogger.info("Range response _______________________");
String[] ranges = range.split("=")[1].split("-");
int from = Integer.parseInt(ranges[0]);
/**
* some clients, like chrome will send a range header but won't actually specify the upper bound.
* For them we want to send out our large video in chunks.
*/
int to = HTTP_DEFAULT_CHUNK_SIZE + from;
if( to >= f.length()) {
to = (int) (f.length() - 1);
}
if( ranges.length == 2) {
to = Integer.parseInt(ranges[1]);
}
int len = to - from + 1 ;
response.setStatus(206);
response.setHeader("Accept-Ranges", "bytes");
String responseRange = String.format("bytes %d-%d/%d", from, to, f.length());
response.setHeader("Content-Range", responseRange);
response.setDateHeader("Last-Modified", new Date().getTime());
response.setContentLength(len);
requestLogger.info("Content-Range:" + responseRange);
requestLogger.info("length:" + len);
long start = System.currentTimeMillis();
RandomAccessFile raf = new RandomAccessFile(f, "r");
raf.seek(from);
byte[] buf = new byte[IO_BUFFER_SIZE];
try {
while( len > 0) {
int read = raf.read(buf, 0, Math.min(buf.length, len));
if( read == -1) break; // reached end of file before the range was satisfied
os.write(buf, 0, read);
len -= read;
}
} finally {
raf.close();
}
logger.info("r/w took:" + (System.currentTimeMillis() - start));
} finally {
os.close();
}
This solution is better than my first one because it handles all cases for "Range" requests, which seems to be a prerequisite for clients like Chrome to be able to support skipping within the video (at which point they'll issue a range request for that point in the video).
It's still not perfect, though. Further improvements would be setting the "Last-Modified" header correctly and properly handling clients that request an invalid range or a range in a unit other than bytes.
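As a rough sketch of that missing validation, the Java helper below parses a single byte range, including the open-ended ("bytes=100-") and suffix ("bytes=-200") forms, and returns null for anything it cannot satisfy so the caller can answer with 416. The class name and the null convention are assumptions made for illustration. Unlike the code above, it serves open-ended ranges to the end of the file instead of capping them at HTTP_DEFAULT_CHUNK_SIZE, and it uses long so files over 2 GB do not overflow.

```java
/** Hypothetical helper: one inclusive byte range parsed from a "Range" request header. */
final class ByteRange {
    final long from;
    final long to; // inclusive

    ByteRange(long from, long to) {
        this.from = from;
        this.to = to;
    }

    long length() {
        return to - from + 1;
    }

    /** Value for the Content-Range response header. */
    String contentRange(long fileLength) {
        return "bytes " + from + "-" + to + "/" + fileLength;
    }

    /** Returns null if the header is not a single, satisfiable byte range. */
    static ByteRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || fileLength <= 0) return null;
        String spec = header.substring("bytes=".length()).trim();
        if (spec.contains(",")) return null; // multiple ranges are not supported in this sketch
        int dash = spec.indexOf('-');
        if (dash < 0) return null;
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long from;
            long to;
            if (first.isEmpty()) {
                // Suffix form: the last N bytes of the file
                if (last.isEmpty()) return null;
                long suffix = Long.parseLong(last);
                if (suffix <= 0) return null;
                from = Math.max(0, fileLength - suffix);
                to = fileLength - 1;
            } else {
                from = Long.parseLong(first);
                to = last.isEmpty() ? fileLength - 1
                                    : Math.min(Long.parseLong(last), fileLength - 1);
            }
            if (from < 0 || from > to) return null;
            return new ByteRange(from, to);
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For the iPad request shown earlier, `ByteRange.parse("bytes=0-1", 357772702L)` yields a range of length 2 whose `contentRange` matches the logged header.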
I suspect this is more about iPad than about Tapestry.
I might invoke Response.disableCompression() before writing the stream to the response; Tapestry may be trying to GZIP your stream, and the iPad may not be prepared for that, as video and image formats are usually already compressed.
Also, I don't see a content type header being set; again the iPad may simply be more sensitive to that than Chrome.