Quick and Secure Clustering Labelling for Digital Forensic Analysis

A.Sudarsana Rao

Department of CSE,

ASCET, Gudur.

A.P, India

Prof C.Rajendra

 Department of CSE

ASCET, Gudur.

A.P, India



In digital forensic analysis, seized digital devices can provide valuable information and evidence about the facts and/or individuals under investigation. In particular, algorithms for clustering documents can facilitate the discovery of new and useful knowledge from the documents under analysis. This work applies document clustering algorithms to the forensic analysis of computers seized in police investigations. The approach is investigated through extensive experimentation with clustering algorithms (K-means, K-medoids, Single Link, Complete Link, Average Link, and CSPA) applied to datasets. The proposed work involves investigating automatic approaches for cluster labelling. Assigning labels to clusters may enable the expert examiner to identify the semantic content of each cluster more quickly, possibly even before examining its contents. Finally, algorithms that induce overlapping partitions (e.g., Fuzzy C-Means and Expectation-Maximization for Gaussian Mixture Models) are worth investigating for seized computer devices in digital forensic investigations.

Index Terms – Clustering, forensics analysis, digital investigation.



Digital evidence, defined as information and data of investigative value that are stored on, received, or transmitted by a digital device [1],[2], has lately become a crucial component of law enforcement investigations. The relevance of this kind of evidence, collected when electronic data and devices are seized, is established by digital forensics analysts, who increasingly have to deal with massive amounts of data that keep growing with the capacity of mass storage devices.

In a more practical and realistic scenario, domain experts (e.g., forensic examiners) are scarce and have limited time available for performing examinations. Thus, it is reasonable to assume that, after finding a relevant document, the examiner could prioritize the analysis of other documents belonging to the cluster of interest, because it is likely that these are also relevant to the investigation [3]. Such an approach, based on document clustering, can indeed improve the analysis of seized computers, as will be discussed in more detail later.
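The prioritization idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `prioritize`, the list of cluster labels, and the set of already-confirmed relevant document indices are hypothetical and are not taken from the paper.

```python
def prioritize(labels, relevant_ids):
    """Order unexamined documents so that those sharing a cluster with a
    known relevant document come first.

    labels       -- cluster label of each document (index = document id)
    relevant_ids -- ids of documents the examiner already marked as relevant
    """
    relevant_ids = set(relevant_ids)
    # Clusters that contain at least one relevant document.
    clusters_of_interest = {labels[i] for i in relevant_ids}
    remaining = [i for i in range(len(labels)) if i not in relevant_ids]
    # sorted() is stable, so the original order is kept inside each group;
    # the key is False (sorts first) for documents in a cluster of interest.
    return sorted(remaining, key=lambda i: labels[i] not in clusters_of_interest)


# Hypothetical example: six documents in three clusters, document 2 is relevant.
reading_order = prioritize([0, 1, 0, 2, 1, 0], {2})
```

The examiner would then read the documents in `reading_order`, reaching the rest of the cluster of interest before any other cluster.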

This paper addresses police investigations through forensic data analysis. Clustering algorithms are typically used for exploratory data analysis, where there is little or no prior knowledge about the data. This is exactly the case in several applications of computer forensics, including the one addressed in this paper [3]. Clustering algorithms have been studied for decades, and the literature on the subject is huge. Therefore, we chose a set of six representative algorithms to show the potential of the proposed approach, namely: the partitional K-means [4] and K-medoids [5], the hierarchical Single/Complete/Average Link [6], and the cluster ensemble algorithm known as CSPA [7]. These algorithms were run with different combinations of their parameters, resulting in sixteen different algorithmic instantiations, as shown in Table I. Thus, as a contribution of our work, we compare their relative performance on the studied application domain, using real-world investigation cases conducted by the Brazilian Federal Police Department. To make the comparative analysis of the algorithms more realistic, two relative validity indexes (Silhouette [5] and its simplified version [8]) were used to estimate the number of clusters automatically from data.
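As a rough illustration of how a relative validity index can estimate the number of clusters, the sketch below runs K-means for several candidate values of k and keeps the k with the highest average Silhouette. It is a minimal sketch, not the authors' implementation: the toy 2-D points, the Euclidean distance, the farthest-first initialization, and all function names are assumptions made for the example. Real documents would first be converted to term vectors.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, iters=50):
    """Lloyd's K-means with deterministic farthest-first initialization
    (an assumption of this sketch). Returns one cluster label per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest chosen centroid.
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        # Update step: centroid = mean of members (unchanged if cluster is empty).
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Average silhouette width over all points (higher is better)."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treated as worst case here
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0 by convention
        # a: mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_min=2, k_max=6):
    """Return (best k, {k: silhouette}) over the candidate range."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: three small, well-separated groups of 2-D points.
toy_points = [(0, 0), (1, 0), (0, 1), (1, 1),
              (10, 0), (11, 0), (10, 1), (11, 1),
              (0, 10), (1, 10), (0, 11), (1, 11)]
best_k, scores = estimate_k(toy_points, 2, 5)
```

The simplified Silhouette mentioned in the text replaces the pairwise distances with distances to cluster centroids, which is cheaper to compute. It is not implemented in this sketch.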



Volume 02, Issue 07, July 2014.

