About Encodings 1.1

From Aviberry API





This section covers the API method startConversion and applies only to file names that contain non-Latin characters. When file names contain national characters, the conversion may fail to be created, and the required output will not be produced.

Consider how the service processes source_url, i.e. the source file name. As described above, the source file name must be formed according to the URL standard: all special characters must be "percent-encoded". On receiving this parameter, the service decodes it back and obtains the plain source file name, which it uses in subsequent operations.
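A minimal Python sketch of the round trip described above, using the standard library's urllib.parse: the client percent-encodes the file name, and decoding restores the original name. It only models the encoding step; how the service decodes source_url internally is not documented here, and the file name is an illustrative value.

```python
from urllib.parse import quote, unquote

# Illustrative file name with a space and non-Latin characters.
name = "my пример.flv"

# Client side: percent-encode the name (quote() uses UTF-8 by default).
encoded = quote(name)
print(encoded)  # my%20%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv

# Service side (modelled): decode the percent-encoded name back to text.
decoded = unquote(encoded)
print(decoded == name)  # True
```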

One such operation is checking whether the source file actually exists; the decoded name is used for this check. However, the name under which the file is stored on the server file system may be in a different charset from the transferred file name. In that case, the service cannot find the file by the decoded name.
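The mismatch can be reproduced with a toy model in which the server file system is a set of raw byte names. The `stored_names` set and the `exists` helper are hypothetical stand-ins, not the service's code; the point is that percent-decoding yields bytes, and those bytes must match the stored name exactly.

```python
from urllib.parse import unquote_to_bytes

# Toy "file system": one name stored as raw bytes in Windows-1251 (cp1251).
stored_names = {"пример.flv".encode("cp1251")}

def exists(percent_encoded_name: str) -> bool:
    """Hypothetical existence check: decode %XX escapes to bytes and look them up."""
    return unquote_to_bytes(percent_encoded_name) in stored_names

# The same visible name, written as two different byte sequences:
print(exists("%EF%F0%E8%EC%E5%F0.flv"))                    # True  (Windows-1251 bytes)
print(exists("%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv"))  # False (UTF-8 bytes)
```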

The pair of parameters source_url_encoding and source_filename_encoding lets you tell the server: "the source file name passed to the method is in the source_url_encoding encoding, but to access the file on the server file system, the name in the source_filename_encoding encoding (the actual encoding) must be used". In other words, you define a conversion "rule" for the transferred source file name, and the service performs the conversion.
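A sketch of what such a rule amounts to, assuming the service decodes the name as text in source_url_encoding and re-encodes it in source_filename_encoding. The function `actual_name_bytes` is hypothetical, and Python codec names (which accept spellings such as "UTF-8" and "Windows-1251") stand in for the API's encoding names.

```python
from urllib.parse import unquote, unquote_to_bytes

def actual_name_bytes(transferred: str, url_encoding: str = "UTF-8",
                      filename_encoding: str = "UTF-8") -> bytes:
    """Hypothetical model of the rule: return the byte name used on the file system."""
    # Simple case-insensitive comparison of the two encoding names.
    if url_encoding.lower() == filename_encoding.lower():
        # Same encoding on both sides: no conversion, use the name as given.
        return unquote_to_bytes(transferred)
    # Decode %XX escapes as text in the transferred encoding; "strict" makes a
    # wrong url_encoding raise instead of silently inserting replacement characters.
    text = unquote(transferred, encoding=url_encoding, errors="strict")
    # Re-encode that text in the actual file system encoding.
    return text.encode(filename_encoding)

name = actual_name_bytes("%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv",
                         url_encoding="UTF-8", filename_encoding="Windows-1251")
print(name)  # b'\xef\xf0\xe8\xec\xe5\xf0.flv'
```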

Consider an example. The file пример.flv (a file name in Russian characters) was created on Windows and is stored in the root directory of the FTP server ftp.example.com. The correct URL for passing this file to the service is ftp://ftp.example.com/%EF%F0%E8%EC%E5%F0.flv. Suppose an HTML form lets users type a file name manually, and a user enters this exact file. Being human, the user will type ftp://ftp.example.com/пример.flv rather than ftp://ftp.example.com/%EF%F0%E8%EC%E5%F0.flv, and will be right :-).

If the form page is in UTF-8, the file URL received from the form will be ftp://ftp.example.com/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv. If this URL is passed as the source file to convert, the service returns the "File does not exist" error, because on the server file system the file name is actually stored in Windows-1251, which percent-encodes as %EF%F0%E8%EC%E5%F0.flv. To create the conversion successfully, the user must either convert the file name received from the form into the actual encoding, or set the additional parameters when creating the conversion. In this case, set source_filename_encoding to Windows-1251 (source_url_encoding can stay at its default, UTF-8).
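The two URLs in this example can be checked with urllib.parse.quote from the Python standard library, whose `encoding` argument selects the charset used for the %XX bytes. This is only a verification sketch; the host name is the example value from the text.

```python
from urllib.parse import quote

base = "ftp://ftp.example.com/"
name = "пример.flv"

# What a UTF-8 form page produces for the typed name.
utf8_url = base + quote(name, encoding="utf-8")
# What matches the file name stored on the Windows-1251 file system.
cp1251_url = base + quote(name, encoding="windows-1251")

print(utf8_url)    # ftp://ftp.example.com/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv
print(cp1251_url)  # ftp://ftp.example.com/%EF%F0%E8%EC%E5%F0.flv
```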

The conversion "rule" for the source file name applies only when source_url_encoding and source_filename_encoding are set to different values. If the values are the same, no conversion is performed: the parameters are effectively ignored, and the service uses the source file name as given. This is the default behaviour, since both parameters default to UTF-8.

The same applies to the target file. The supported encodings are listed below.

The main rule, therefore, for passing a source or target file name that contains national characters: the file name must be "percent-encoded" in the actual encoding of the file system where the file is stored. Otherwise, the API user must either convert the file name to the actual encoding before passing it, or use the startConversion parameters described above to tell the server how to derive the actual-encoding name from the given one.
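For the first option (converting on the client side), a sketch using only the Python standard library: it re-percent-encodes the path of a URL in the file system's actual encoding and leaves the scheme and host untouched. The helper `to_filesystem_encoding` is hypothetical, and it assumes the caller knows both encodings; it handles both a percent-encoded form URL and a raw typed one.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

def to_filesystem_encoding(url: str, url_encoding: str, fs_encoding: str) -> str:
    """Hypothetical client-side helper: re-encode the URL path in fs_encoding."""
    parts = urlsplit(url)
    # Decode existing %XX escapes as text in the encoding the URL was built with
    # (characters that are not escaped, e.g. typed Cyrillic, pass through as-is).
    text_path = unquote(parts.path, encoding=url_encoding, errors="strict")
    # Percent-encode the text again, this time in the actual file system encoding.
    new_path = quote(text_path, safe="/", encoding=fs_encoding)
    return urlunsplit(parts._replace(path=new_path))

form_url = "ftp://ftp.example.com/%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80.flv"
print(to_filesystem_encoding(form_url, "UTF-8", "Windows-1251"))
# ftp://ftp.example.com/%EF%F0%E8%EC%E5%F0.flv
```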

Supported encodings

byte2be, byte2le, byte4be, byte4le, BASE64, UUENCODE, HTML-ENTITIES, Quoted-Printable, 7bit, 8bit, UCS-4,
UCS-4BE, UCS-4LE, UCS-2, UCS-2BE, UCS-2LE, UTF-32, UTF-32BE, UTF-32LE, UTF-16, UTF-16BE, UTF-16LE,
UTF-8, UTF-7, UTF7-IMAP, ASCII, EUC-JP, SJIS, eucJP-win, SJIS-win, CP51932, JIS, ISO-2022-JP,
ISO-2022-JP-MS, Windows-1252, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9,
ISO-8859-10, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16, EUC-CN, CP936, HZ, EUC-TW, BIG-5, EUC-KR,
UHC, ISO-2022-KR, Windows-1251, CP866, KOI8-R, ArmSCII-8



