UTF-8

In English-language contexts, text files can be purely ASCII, whereas in international contexts text files usually permit 8-bit data, which allows text in native languages to be stored.

In such international contexts, a byte order mark (BOM) can appear at the start of a file to distinguish UTF-8 encoding from legacy regional encodings.
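As a minimal sketch of how a program might check for that mark, the following Python snippet looks for the three-byte UTF-8 BOM (EF BB BF) at the start of raw file data. The function name strip_utf8_bom is a hypothetical helper chosen for illustration; codecs.BOM_UTF8 and the "utf-8-sig" codec are part of the Python standard library.

```python
import codecs
from typing import Tuple


def strip_utf8_bom(data: bytes) -> Tuple[bool, bytes]:
    """Return (had_bom, payload) for the raw bytes of a file.

    Hypothetical helper for illustration only.
    """
    if data.startswith(codecs.BOM_UTF8):  # the three bytes EF BB BF
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


had_bom, payload = strip_utf8_bom(b"\xef\xbb\xbfhello")
print(had_bom, payload)

# The "utf-8-sig" codec skips a leading BOM if one is present.
print(b"\xef\xbb\xbfhello".decode("utf-8-sig"))
```

Note that the absence of a BOM does not prove a file is in a legacy encoding, since many UTF-8 files are written without one.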

ASCII

The ASCII standard allows ASCII-only text files (unlike most other file types) to be freely interchanged and read on Unix, Macintosh, Microsoft Windows, DOS, and other systems. These systems differ in their preferred line-ending convention and in their interpretation of values outside the ASCII range (their character encoding).
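The two points of difference can be illustrated with a short Python sketch. The helper names is_ascii_only and normalize_newlines are hypothetical, and the choice to normalize to a single line feed is an assumption made for the example rather than a requirement of any standard.

```python
def is_ascii_only(data: bytes) -> bool:
    """True if every byte falls in the 7-bit ASCII range (0x00-0x7F)."""
    return all(byte < 0x80 for byte in data)


def normalize_newlines(text: str) -> str:
    """Convert CR LF (DOS/Windows) and bare CR (classic Mac OS) to LF (Unix)."""
    # Replace CR LF first so that its CR is not converted a second time.
    return text.replace("\r\n", "\n").replace("\r", "\n")


print(is_ascii_only(b"plain text"))
print(repr(normalize_newlines("one\r\ntwo\rthree\n")))
```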

MIME

Text files usually have the MIME type “text/plain”, often with additional information indicating the encoding. Prior to the advent of Mac OS X, the Mac OS system regarded the content of a file (the data fork) as a text file when its resource fork indicated that the type of the file was “TEXT”. Under the Microsoft Windows operating system, a file is regarded as a text file if the suffix of the name of the file (the “extension”) is “txt”. However, many other suffixes are used for text files with specific purposes. For example, source code for computer programs is usually kept in text files that have file name suffixes indicating the programming language in which the source is written.
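A suffix-based lookup of this kind can be sketched with Python's standard mimetypes module. The function name looks_like_plain_text is hypothetical; the result depends on the type tables available on the system where the code runs, so it should be read as an illustration rather than a definitive test.

```python
import mimetypes


def looks_like_plain_text(filename: str) -> bool:
    """Guess from the file name suffix alone whether the MIME type is text/plain."""
    # guess_type returns (type, encoding); here "encoding" refers to a
    # content encoding such as gzip, not to a character encoding.
    mime_type, _content_encoding = mimetypes.guess_type(filename)
    return mime_type == "text/plain"


print(looks_like_plain_text("notes.txt"))
print(looks_like_plain_text("photo.png"))
```

Because only the name is inspected, this approach says nothing about the actual content of the file or its character encoding.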

.TXT

.txt is a file format for files consisting of text usually containing very little formatting (e.g., no bolding or italics). The precise definition of the .txt format is not specified, but typically matches the format accepted by the system terminal or simple text editor. Files with the .txt extension can easily be read or opened by any program that reads text and, for that reason, are considered universal (or platform independent).

The ASCII character set is the most common format for English-language text files, and is generally assumed to be the default file format in many situations. For accented and other non-ASCII characters, it is necessary to choose a character encoding. In many systems, this is chosen on the basis of the default locale setting on the computer it is read on. Common character encodings include ISO 8859-1 for many European languages.
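The need to choose an encoding can be seen by decoding the same bytes in more than one way. In this Python sketch the byte string is an invented sample, and ISO 8859-5 (Cyrillic) is used only as a contrasting encoding that the text above does not mention.

```python
import locale

raw = b"caf\xe9"  # four bytes; the last one lies outside the ASCII range

as_latin1 = raw.decode("iso-8859-1")    # byte E9 becomes U+00E9 (e with acute)
as_cyrillic = raw.decode("iso-8859-5")  # the same byte becomes a Cyrillic letter

print(as_latin1)
print(as_cyrillic)

# Strict ASCII decoding rejects the byte instead of guessing.
try:
    raw.decode("ascii")
except UnicodeDecodeError as error:
    print("not ASCII:", error.reason)

# The encoding that the current locale would select by default.
print(locale.getpreferredencoding(False))
```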

Because many encodings have only a limited repertoire of characters, they are often only usable to represent text in a limited subset of human languages. Unicode is an attempt to create a common standard for representing all known languages, and most known character sets are subsets of the very large Unicode character set. Although there are multiple character encodings available for Unicode, the most common is UTF-8, which has the advantage of being backwards-compatible with ASCII: that is, every ASCII text file is also a UTF-8 text file with identical meaning.
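The backwards compatibility described above can be checked directly. This Python sketch uses invented sample strings; it shows that ASCII text produces identical bytes under both encodings, while a non-ASCII character needs more than one byte in UTF-8.

```python
ascii_text = "plain text"

# Encoding ASCII-only text as ASCII or as UTF-8 yields the same bytes.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)

# A non-ASCII character such as U+00E9 takes two bytes in UTF-8 (C3 A9).
accented = "\u00e9".encode("utf-8")
print(accented, len(accented))

# Every ASCII file is valid UTF-8, but the reverse does not hold.
print(ascii_text.encode("ascii").decode("utf-8") == ascii_text)
```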

For a file that contains only ASCII characters, the difference between pure ASCII and pure UTF-8 is limited to the presence or absence of the BOM. Microsoft's convention for .txt files is to use UTF-8.

.TEXT

.text is an alternative file extension to .txt.


Copyright © CCJK Technologies Co., Ltd. 2000-2017. All rights reserved.