Abstract:
In the era of Big Data, the Hadoop Big Data Platform has been embraced by both
individuals and organizations because it offers cost-effective, large-capacity storage and
multi-functional services on a wide range of devices. Accessing Hadoop services via
client devices is rapidly gaining popularity. The widespread usage of the Hadoop Big Data
Platform could create an environment that is conducive to malicious
activities and illegal operations. Thus, forensic investigation of the Hadoop Big Data
Platform is becoming an emerging field for the digital forensic community. There is also
a need for a digital forensic framework for the forensic analysis of the Hadoop
Platform, to guide forensic work on it and to discover
potential evidence that identifies how the platform was used.
Hadoop produces a large amount of backlog per operation, which has led to
cumulative backlogs of evidence awaiting analysis. The major forensic
challenges in the Hadoop Big Data Platform environment arise from its complex
infrastructure, the large amount of Hadoop backlog, and the lack of knowledge
about where digital evidence is located. Not knowing where evidential data may reside
can impede an investigation.
This research proposes a forensic investigation framework to guide
forensic work on the Hadoop Big Data Platform. Moreover, as proactive research
carried out ahead of forensic investigations, it discovers residual artifacts (potential evidence)
on the server and attached client devices of popular Hadoop Big Data Platforms:
Ambari Hortonworks Data Platform (Ambari HDP), Non-Ambari Hortonworks Data
Platform (Non-Ambari HDP), Cloudera Distribution of Hadoop (CDH) and MapR
Hadoop Platform (MapR).
The experiments examine the use of these popular Hadoop Big
Data Platforms when accessed from client devices running different Operating Systems
(OS). Residual artifacts are also extracted from the attached client devices of
each OS. The attached client devices are a Windows PC and an
Android smartphone.
The study examines a user accessing the Hadoop platforms, and also
examines any differences that arise when using different browsers: Internet Explorer, Mozilla
Firefox, Google Chrome, and Android browsers. The file operations are tested with
the different client devices for each browser to identify the different circumstances of
usage.
A variety of circumstances are examined, including the different types of
operation used to access, upload and download data in Hadoop. By determining the
residual artifacts on server and client components, this research contributes to a better
understanding of the types of artifacts that are likely to remain. The extracted artifacts
can assist forensic examiners in future forensic investigations of the Hadoop Big Data
Platform.
Popular crime scenarios, extended from the Forensic Copra's crime
cases and CYFOR cases, are examined under the guidance of the proposed forensic
investigation framework for the Hadoop Big Data Platform.