Unicode and Character Sets in WebStack
--------------------------------------

Unicode text should be converted to the chosen character set (encoding) when
written to the response stream.

Classic Python strings (byte strings) are written directly to the response
stream without any encoding being applied.
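
The distinction above can be sketched as follows. This is only an
illustration, written for Python 3, where str holds Unicode text and bytes
plays the role of a classic string. The name write_response_data and its
parameters are hypothetical and are not part of the WebStack API.

```python
import io

def write_response_data(stream, data, encoding="utf-8"):
    """Write data to a binary stream, encoding only Unicode text."""
    if isinstance(data, str):
        # Unicode text: convert to the chosen character set first.
        stream.write(data.encode(encoding))
    else:
        # Byte string: assumed to be encoded already, so written unchanged.
        stream.write(data)

out = io.BytesIO()
write_response_data(out, "caf\u00e9", "iso-8859-1")  # encoded on the way out
write_response_data(out, b" \xe9")                   # passed through as-is
```

Since byte strings bypass the encoding step, the caller is responsible for
making sure that they already match the character set declared for the
response.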

Character Set Semantics in WebStack
-----------------------------------

Character sets (or encodings) are relevant in two areas:

 * The encoding of output data.
 * The processing of input data.

When producing HTML pages containing form fields and interpreting the values of
such fields from a request body, it is necessary to know:

 * The character set used to encode the values sent by the browser. This is
   typically determined by the character set of the page containing the form
   (see the next item).

 * The character set used to encode the HTML page from which the field values
   originated.

It is therefore also necessary to remain consistent in the usage of character
sets when specifying content types. WebStack enforces the following rules:

 * Where the request content type specifies a character set, this is used to
   decode the request body parameters unless explicitly overridden.

 * Where the request content type does not specify a character set, a default
   character set is used to decode the request body parameters unless
   overridden.

 * No conversion is done at the request stream level, since information about
   the character set may be missing and the application may wish to override
   any default explicitly at a higher level (such as when it gets request body
   parameters).

 * Where the response content type specifies a character set, this is used to
   encode Unicode response data (e.g. HTML pages).

 * Where the response content type does not specify a character set, a default
   character set is used to encode Unicode response data (e.g. HTML pages).
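
The selection rules above might be sketched as follows. This is a minimal
illustration rather than WebStack's actual implementation: the function names,
the override parameter and the choice of iso-8859-1 as the default are all
assumptions made for the example.

```python
DEFAULT_CHARSET = "iso-8859-1"  # assumed default, chosen for illustration only

def charset_from_content_type(content_type):
    """Return the charset parameter of a content type value, or None."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.strip().partition("=")
        if name.strip().lower() == "charset" and value:
            return value.strip().strip('"')
    return None

def select_charset(content_type, override=None, default=DEFAULT_CHARSET):
    """Apply the rules: explicit override, then declared charset, then default."""
    if override is not None:
        return override
    return charset_from_content_type(content_type or "") or default

declared = select_charset("text/html; charset=UTF-8")
fallback = select_charset("application/x-www-form-urlencoded")
forced = select_charset("text/html; charset=UTF-8", override="iso-8859-15")
```

The same selection can serve both directions: the result is used to decode
request body parameters, or to encode Unicode response data.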

Restrictions in and Omissions from Standards
--------------------------------------------

The use of character encodings such as UTF-16 in HTTP POST request bodies of
content/media type application/x-www-form-urlencoded is not properly
standardised. Therefore, it is highly recommended that UTF-8 be used should
the various single-byte encodings (e.g. ISO-8859-1) not cover the range of
characters to be displayed and received.
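
As an illustration of this recommendation, the following sketch uses the
Python 3 standard library (urllib.parse) rather than WebStack itself to encode
and decode a form field whose value cannot be represented in ISO-8859-1. The
field name and value are invented for the example.

```python
from urllib.parse import urlencode, parse_qs

# "Zoë" followed by a snowman character, which ISO-8859-1 cannot represent.
fields = {"name": "Zo\u00eb \u2603"}

# Encode the field as a browser would for a UTF-8 page.
body = urlencode(fields, encoding="utf-8")

# Decode the request body parameters using the same character set.
decoded = parse_qs(body, encoding="utf-8")

# A single-byte encoding fails for characters outside its range.
try:
    urlencode(fields, encoding="iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
```

The value survives the round trip only because the same character set is used
on both sides, which is the consistency requirement described earlier.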

Framework Behaviour
-------------------

The Java Servlet API imposes restrictions on decoding request body parameters
by requiring that the character encoding be set (using
ServletRequest.setCharacterEncoding) before any reading of the request body is
attempted.
