Abstract:
Storing very large amounts of unstructured textual
data requires huge storage capacity. Compression
is an effective technique for reducing the storage
space required. Most unstructured data are random
or nearly random, and a file that has no
redundancy cannot be compressed.
Transformation is a back-end pre-processing
algorithm for data compression. It is intended to
introduce more redundancy into the data, which
makes the data more compressible. It does not
compress the data by itself; it is applied to the
original text to obtain more redundant data. This
paper proposes a new transformation method for big
unstructured text data.
After transformation, the data file is compressed
using appropriate compression algorithms.
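
The transform-then-compress pipeline described above can be illustrated with a minimal sketch. It does not implement the method proposed in this paper; it assumes the well-known Burrows-Wheeler transform as a stand-in for a reversible transformation and zlib as the back-end compressor. The function names (bwt, inverse_bwt, transform_then_compress, decompress_then_restore) and the 4-byte index header are hypothetical choices made for illustration only.

```python
import zlib
from typing import Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform (stand-in transformation).

    Returns the last column of the sorted rotation matrix and the row
    index at which the original input appears. The output has the same
    length as the input: the transform rearranges bytes, it does not
    compress them.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data
    # Sort rotation start positions by the rotation they denote.
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, index: int) -> bytes:
    """Invert bwt() by following the stable sort permutation."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value links each row of the
    # first column to its position in the last column.
    perm = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = index
    for _ in range(n):
        row = perm[row]
        out.append(last[row])
    return bytes(out)


def transform_then_compress(text: bytes) -> bytes:
    """Apply the transformation first, then a general-purpose compressor."""
    last, index = bwt(text)
    # Hypothetical container layout: 4-byte row index, then the payload.
    return index.to_bytes(4, "big") + zlib.compress(last)


def decompress_then_restore(blob: bytes) -> bytes:
    """Undo the compressor, then undo the transformation."""
    index = int.from_bytes(blob[:4], "big")
    return inverse_bwt(zlib.decompress(blob[4:]), index)


if __name__ == "__main__":
    sample = b"unstructured text data, unstructured text data, more text"
    packed = transform_then_compress(sample)
    assert decompress_then_restore(packed) == sample
    print(len(sample), len(zlib.compress(sample)), len(packed))
```

The sketch only shows the structure of the pipeline: a lossless, size-preserving transformation followed by a separate compression stage. Whether the transformed data actually compresses better depends on the input and on the compressor chosen, so the sizes it prints should not be read as results.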