ABSTRACT

Lossless encoding is a method of data representation that allows the original data to be perfectly reconstructed from the compressed data. To achieve this, the encoding function must be one-to-one (injective), ensuring every unique piece of data maps to a unique binary string.


Efficiency in Encoding

When evaluating an encoding algorithm, we measure efficiency in two primary dimensions:

  1. Space: The size of the encoded bitstring (compression ratio).
  2. Time: The computational complexity required to encode and decode the information.

The Mathematical Foundation

For an encoding to be lossless, it must function as a one-to-one mapping. This requirement dictates the minimum number of bits needed to represent a set of data.

Minimum Bit Requirement

If we have a set of data , and we want to encode it into binary strings of length :

  1. The number of unique data items must be less than or equal to the total number of available binary strings:

  2. Solving for gives the theoretical lower bound:

There will always exist an encoding of length bits for any finite data set.


Encoding as a Function

An encoding of a set of Data is a map where the domain is the Set of Data and the codomain is a set of binary strings.

Fixed Length Encoding

  • Codomain: (The finite set of binary strings of length ).
  • Mechanism: Maps every data element to a string of exactly bits.
  • Usage: Best for random access and simple hardware implementations (e.g., standard ASCII).

Variable Length Encoding

  • Codomain: (The infinite set of all binary strings of all possible lengths).
  • Mechanism: Maps data elements to strings of varying lengths.
  • Usage: Optimized for efficiency based on complexity or frequency.
    • More complex/rare data: Requires more bits.
    • Simpler/frequent data: Requires fewer bits.

NOTE

Variable length encoding is essential for effective data compression, as it allows us to minimize the “average” length of the encoded data.


Strategies for Encoding Strings

There are several ways to apply these functional mappings to strings of characters: