Streaming

Input (HTTPRequest) streaming

Request parsing is deferred until the destination stream is known. This allows a file to be uploaded directly from the TCP socket to a file on disk, without intermediate buffering of the entire file in memory.

Example

We would like to save an uploaded file under the same name on the local disk. The browser will send the file as a MIME part named 'file'. The request content type will be multipart/form-data.

Site>>answerTo: aHTTPRequest
	| stream |
	aHTTPRequest
		postDataAt: 'file'
		beforeStreamingDo: [:datum |
			stream := (SpFilename named: datum filename) writeStream.
			datum writeStream: stream].
	aHTTPRequest ensureFullRead.   "to be sure streaming really occurs"
	stream notNil ifTrue: [stream close].   "mandatory, to close the open file! (nil if no file was posted)"
	^HTTPResponse ok

aHTTPRequest postDataAt: aName beforeStreamingDo: aBlock sets a callback block that is called just before streaming begins. That is the right moment to prepare a write stream on the output file, because by then the filename of the original file is already known (from the block argument 'datum').

Streaming begins when any other post data is first accessed or when the response is sent back. Parsing of post data is thus deferred until it is needed, which allows us to prepare an output stream before the post data is parsed.
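
The order of events can be illustrated with a small workspace sketch. It is only a toy model of deferred parsing, not Swazoo code: the block names, the event strings and the filename 'report.pdf' are all made up for illustration.

```smalltalk
"Toy model of deferred parsing with a before-streaming callback.
 All names here are illustrative assumptions; none of them are Swazoo's."
| parsed beforeStreaming events ensureFullRead |
parsed := false.
events := OrderedCollection new.

"1. The handler registers a callback. Nothing is read from the socket yet."
beforeStreaming := [:filename | events add: 'open output file for ', filename].

"2. Parsing runs at most once, on first demand."
ensureFullRead := [
	parsed ifFalse: [
		parsed := true.
		"the filename becomes known from the headers of the mime part"
		beforeStreaming value: 'report.pdf'.
		events add: 'stream part body to the output file']].

ensureFullRead value.   "first access triggers the callback, then the streaming"
ensureFullRead value.   "later calls do nothing"
events
```

Evaluating the last line shows that the callback ran before the body was streamed, and that the second call added nothing.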

Output (HTTPResponse) streaming

For large file downloads a streamed response is the only viable option. The same holds for any response whose data length is not known in advance. Such a response is sent in the so-called chunked transfer encoding, which does not need a content length in advance; instead it is terminated by a special zero-sized chunk. See the HTTP standard for more.
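
As a rough illustration of what chunked transfer encoding looks like on the wire, the following workspace sketch frames two pieces of data as chunks and then adds the closing zero-sized chunk. It uses only plain Smalltalk streams, not Swazoo classes; the temporary names and the sample strings are assumptions made for this example.

```smalltalk
"Workspace sketch: frame two pieces of data as HTTP/1.1 chunks.
 Plain Smalltalk streams only; no Swazoo classes are involved."
| crlf out writeChunk |
crlf := String with: (Character value: 13) with: (Character value: 10).
out := WriteStream on: String new.

"One chunk = size in hex, CRLF, the data itself, CRLF."
writeChunk := [:data |
	out
		nextPutAll: (data size printString: 16);
		nextPutAll: crlf;
		nextPutAll: data;
		nextPutAll: crlf].

writeChunk value: 'Hello, '.            "7 bytes"
writeChunk value: 'streamed world!'.    "15 bytes, size written as hex F"

"A zero-sized chunk followed by an empty line closes the response body."
out nextPutAll: '0'; nextPutAll: crlf; nextPutAll: crlf.

out contents
```

Because every chunk carries its own size, the sender never has to know the total length, which is why no Content-Length header is needed.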

To send a streamed response:

  1. Get a response prepared for streaming from the request:
    • aRequest streamedResponse.
  2. Add all headers to the response. This must be done before any streaming occurs!
  3. Stream the contents to the response using nextPut: and nextPutAll:. You can send both string and binary data.
    • response nextPut: aCharacterOrByte; nextPutAll: aByteStringOrArray
  4. Signal the end of streaming by closing the response. This is mandatory!
    • response close.
  5. Return that response as usual.

Example

This is a hypothetical example. We get a new response from the request, prepared in advance for streaming, then add all headers and start streaming our file directly to the response. Closing the response signals the end of streaming.

	Site>>answerTo: aHTTPRequest
		| response stream |
		response := aHTTPRequest streamedResponse.
		self addResponseHeadersTo: response forPage: self.
		stream := 'myfile' asFilename readStream binary.
		"copy the file to the response, closing the file even if an error occurs"
		[[stream atEnd] whileFalse: [response nextPut: stream next]]
			ensure: [stream close].
		response close.    "mandatory!"
		^response

 

Benchmarking

Upload of 6MB/20MB file, from local browser to local Swazoo server (on 3GHz Pentium)

  1. before streaming 4s
  2. after SwazooStream refactoring 3s
  3. after input streaming but not yet directly to output file 2s/6s
  4. after direct streaming to output file 2s/6s

Upload 700MB movie locally:

  1. after direct streaming to output file 3:10 min (3.6MB/sec), VW CPU full, 4.5MB local file throughput (both read and write). No GC activity. Other requests were blocked during that upload!
  2. for comparison: direct file copying locally to another directory: 40s (17.5MB/sec) , 20-30MB local file throughput

Upload 22MB on 100Mbps line to collocated server (2x 1.5GHz Pentium, SCSI discs)

  1. 25s (about 1MB/sec), 100% CPU, Other requests blocked during download

Download 20MB/200MB file locally (local Swazoo server (on 3GHz Pentium) to local browser)

  • 7s/42s 3MB/sec, no GC activity (streamed/chunked transfer)

Development notes

Input streaming

  1. SwazooStream is the only stream, remove HTTPStream, HTTPReadStream, HTTPWriteStream, because of unnecessary and costly copying of content from SwazooStream to those intermediary ones.
  2. SwazooStream gets a little more functionality from removed streams
  3. SwazooStream works as a buffer to external TCP socket and buffers about 1024 bytes at once
  4. many tests need to be changed to use SwazooStream, also HTTPReadStream removed
  5. HTTPPostDataArray new instvars parsed and stream, method #isParsed, default is false
  6. HTTPRequest, HTTPPost ensureFullRead to parse the POST to the end even if no one else triggered that parsing before. Important to call before sending back the HTTPResponse. Call it in HTTPConnection>>produceResponseFor:
  7. HTTPPost for:readFrom: reads only headers and stops before post body.
  8. HTTPPost accessing methods (#postDataAt: etc.) call #ensureFullRead to read and parse post data if not yet done. That way reading of post data is deferred until first use. Deferring is necessary in cases where a web resource wants to direct post data to its own stream.
  9. Be sure that you don't access HTTPPost postData directly but always through the accessing methods! Otherwise you may get nil, because the data is not yet read and parsed!
  10. all tests pass, also Aida/Web apps work
  11. published as Swazoo 1.1.20 on VW, Squeak also synced
  1. Aida FileUploadTest prepared, with test for normal file upload and for streamed one
  2. new method WebFileInputField toStream: aStream filenameAspect: aSymbol2 forObject: anObject
  3. WebForm: streamed fields are specially registered in a streamedFieldSet
  4. WebFormElement isStreamed, only WebFileInputField can return true for now
  5. WebForm registerStreamedFieldsInto: aHTTPRequest
  6. WebForm acceptFormInputFrom: no need to read a file input field if it is streamed
  7. WebApplication printWebPage : request isPost ifTrue: [self form registerStreamedFieldsInto: request].
  1. Swazoo HTTPPostDatum instvar writeStream added, #isStreamed
  2. HTTPPost postDataAt: aKey streamTo: aWriteStream to announce streaming post data to output stream
  3. HTTPPost isPostDataStreamedAt: aKey
  4. published as Swazoo 1.1.21 on VW, Squeak also synced. Aida also published
  1. Swazoo output optimization SwazooBuffer introduced for low garbage buffering , #flush GC bug solved
  2. HTTPPost 4byte buffering of input stream in mime part parser, to skip 'crl
  3. HTTPPostDatum writeBlock to be called just before start of streaming to writeStream. It can be used to open the writeStream, because at that time we already know the filename of the uploaded file. The post datum itself is passed as the parameter
  4. HTTPPost postDataAt: aKey beforeStreamingDo: aBlockClosure announces that you want to receive post data directly to a binary stream, which will be set by aBlockClosure. That block must receive an argument, which is a HTTPPostDatum, and there it can set a writeStream.
  5. HTTPPost readEntityFrom:datum:boundary: calls a block (if any) just before streaming begins, with a datum as parameter. This block can then set a write stream in datum (for instance open an output file and stream on it).
  1. Aida WebFileInputField beforeStreamingDo: aBlock filenameAspect: aSymbol2 forObject: anObject
    • A block will be called just before streaming begins, with a HTTPPostDatum as parameter. Block must set a write stream in datum (for instance open an output file and stream on it)
    • Example block: [:datum | datum writeStream: datum filename writeStream binary]
    • Don't forget to close such a stream afterwards! This can be done in the App action method.
  2. WebForm registerStreamedFieldsInto: registers the writeBlock (if any) with the post request too.
  1. BiArt (aka AidaCMS) DocumentApp file upload preparation.
  2. DocumentApp prepareAttachmentForStreaming and return write stream to output file. This is called from callback block in WebFileInputField before streaming of uploaded file directly to attachment begins
  3. DocumentApp uploadElement - file input field changed to stream to attachment:

    e add: (WebFileInputField
        beforeStreamingDo:
            [:datum |
            self attachFilename: datum filename.
            datum writeStream: self prepareAttachmentForStreaming]
        filenameAspect: #attachFilename contentTypeAspect: #attachContentType forObject: self)
  4. DocumentApp attachFileIfAny - uploaded content is already streamed to attachment (prepared by calling #prepareAttachmentForStreaming just before streaming begins, then by direct streaming from HTTPPost). Close it and add to a document

Output streaming

  1. HTTPStreamedResponse introduced, as subclass of HTTPResponse
  2. response state: #header #streaming #closed
  3. aRequest streamedResponse prepares (if not yet) and returns a HTTPStreamedResponse to stream a response in
  4. HTTPConnection task
  5. HTTPResponse, HTTPStreamedResponse #isStreamed
  6. HTTPResponse produceResponseFor: no need to call #nextPutResponse: for streamed responses, because they do that by themselves.
  7. late deciding to stream or not (for instance, if file is small and cached: do not stream, if large: stream). Not streamed until you call aRequest streamedResponse (should this be enforced?)
  8. SwazooBuffer
    • flushChunkTo: a buffer will be sent as a chunk, with hex size in first line then crlf, then a buffer, then crlf
    • closeChunkTo: a zero-sized chunk marks the end of the response
  9. SwazooStream
    • isChunked, setChunked, resetChunked
    • flush calls chunked or normal flush to write buffer
    • SwazooStream closeResponse to send closing chunk and release buffers
  10. HTTPStreamedResponse
    • close is mandatory !!!
    • HTTPStreamedResponse sendHeaderAndStartStreaming is called on the first nextPut: or nextPutAll:
  11. HTTPConnection use #closeResponse instead of #flush in nextPutMessage:, nextPutError:
  1. Aida/Web - FileProxy will be the first to be streamed, for files larger than sizeAboveMark (40K)
  2. FileProxy>>printHTMLPageOn: aStream forSession: aSession
        | response |
        self isRespondingStreamed
            ifTrue: 
                [response := aSession lastRequest streamedResponse.
                self site addResponseHeadersTo: response forPage: self.
                self streamFileTo: response.
                response close]
            ifFalse: [aStream nextPutAll: self content asByteString]. 
  3. Sending a test file 123test.txt with 1234567890 (just 10 bytes)
  4. FileProxy isRespondingStreamed changed to allow above file to stream
  1. Swazoo optimizations of chunked sending, to avoid splitting each chunk into three TCP packets and to send it as only one
  2. SwazooBuffer
    • initialize with 6 byte room at the start - so called preamble to store chunk length and crlf if necessary
    • flushTo:chunked: write directly from stream's collection (no more self contents), also skips preamble if not chunked
    • flushChunkedTo: prepares the preamble (which is always fixed width 6) with hex length and spaces in first 4 bytes, then crlf. Then adds another crlf at the end of chunk, then sends all that as one TCP packet!
    • closeChunkTo: also packs the zero chunk with two crlf and sends it directly to the socket as one packet.
  3. Chunked response is now sent in (at least) three TCP packets:
    1. status line, headers and empty line in first packet
    2. chunked data in second (and next ones, each one about 16KB long) packet
    3. closing chunk in last packet
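
The preamble idea described above can be sketched in a workspace as follows. This is not the SwazooBuffer code; it is a minimal illustration that assumes the hex length is left-aligned in the first 4 bytes and padded with trailing spaces. The payload and the temporary names are made up.

```smalltalk
"Workspace sketch of a buffer with a fixed 6 byte preamble.
 Assumption: hex length left-aligned in bytes 1-4, space padded, CRLF in bytes 5-6."
| cr lf payload buffer hex |
cr := Character value: 13.
lf := Character value: 10.
payload := 'Hello, streamed world!'.

"Reserve 6 bytes in front of the data and 2 bytes after it for the trailing CRLF."
buffer := String new: 6 + payload size + 2 withAll: Character space.
buffer replaceFrom: 7 to: 6 + payload size with: payload startingAt: 1.

"Fill the preamble only at flush time, when the chunk length is known."
hex := payload size printString: 16.
buffer replaceFrom: 1 to: hex size with: hex startingAt: 1.
buffer at: 5 put: cr.
buffer at: 6 put: lf.
buffer at: buffer size - 1 put: cr.
buffer at: buffer size put: lf.

"The whole buffer can now go to the socket in a single write."
buffer
```

Since the room for the chunk header is reserved up front, the data never has to be copied into a second collection just to prepend the length, and header, data and trailing crlf leave in one write. Four hex digits are enough for chunks of up to 65535 bytes, which covers the roughly 16KB chunks mentioned above.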



Updated: 1.4.2009