Mining of Forensic Data from File Fragments

LANTERNA, DARIO
2017-09-18

Abstract

The widespread use of digital technology means that data and information useful for an investigation must be extracted from digital devices. Digital devices are rarely the corpus delicti; they are usually analysed to define the digital crime scene and to reconstruct the timeline of events. Fragments are pervasive in digital environments, because digital devices manage data by splitting it into small units called blocks, clusters, pages, chunks or packets. When an investigation requires in-depth analysis, fragments are the primary source of information, and the deeper the analysis, the greater the number of fragments to be examined. Digital forensic analysis recovers large quantities of data, most of it in the form of fragments, so automatic methods for information mining are welcome: data mining techniques help to surface the information contained in the data.

This work starts with an introduction to the legal aspects that affect forensic analysis, then addresses the technological aspects of digital device analysis, and finally focuses on the analysis of file fragments. I studied the structure of fragments, and this work proposes two methods for their classification: the first uses grammar analysis to extract features from the fragments; the second uses grammar induction and string distance metrics.
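As a minimal illustration of the second, distance-based method, the sketch below classifies an unknown fragment by comparing it against labelled reference fragments with a nearest-reference rule. The normalized compression distance used here is an assumed stand-in, since the specific metric is not spelled out above, and the reference data is invented for the example.

import zlib

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 = similar, near 1 = unrelated."""
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(fragment: bytes, references: dict) -> str:
    """Assign the fragment the label of its nearest reference fragment."""
    return min(references, key=lambda label: ncd(fragment, references[label]))

# Hypothetical reference fragments, one per candidate file type;
# in practice these would be sampled from known files of each type.
references = {
    "html": b"<html><head><title>a</title></head><body><p>b</p></body></html>" * 8,
    "csv":  b"id,name,value\n1,alpha,3.14\n2,beta,2.71\n3,gamma,1.41\n" * 10,
}
unknown = b"<html><body><h1>c</h1><ul><li>d</li></ul></body></html>" * 9
print(classify(unknown, references))  # -> "html"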
The evolution of storage technologies has changed the way fragments are generated, and knowledge of the new generation processes enables effective recovery algorithms. Storage deduplication generates fragments based on the Rabin algorithm; I studied these storage technologies in order to understand real implementations and to define how to handle fragments coming from such devices. Deduplication technology needs thorough study, using experimental data and physical acquisition, in order to identify markers that help recognize the underlying storage technology. In this work I propose a detailed analysis of two deduplication engines, and I demonstrate that, by combining fragments present in the chunk store, it is possible to generate files that never existed in the file system. This problem was marginal when fragmentation was due to fixed-size block allocation, but the algorithms used to split files in order to identify chunks common to similar files may amplify it. When recovering files from a deduplicated system, we have to identify the structure that holds the ordered list of chunks for each file; this element is called the hash sequence, and without it, it is impossible to demonstrate that a file really existed.

The evolution of technology also allows virtual desktops to be delivered as cloud services. A virtual desktop infrastructure centralizes storage and computing power, and users can connect from anywhere using their own network-connected devices. This technology changes the procedures of digital forensic investigations: reaching the fragments requires the analysis of a whole infrastructure, the user disks must be identified starting from a study of the virtual infrastructure, and the investigation must reconstruct a virtual crime scene from all the traces left by virtual desktop usage. The work ends with the analysis of a virtual desktop infrastructure that uses fragments to store user activity in a differential disk. The analysis of these infrastructures is quite different from the analysis of physical desktop computers; the work shows the phases of a virtual desktop delivery and the traces left during this activity.
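To make the deduplication discussion above concrete, here is a minimal sketch of content-defined chunking with a Rabin-style rolling hash, together with the chunk store and the hash sequence. The window size, boundary mask and hash constants are illustrative assumptions, and a production engine would use true Rabin fingerprints over GF(2) rather than this simple polynomial hash.

import hashlib

WINDOW = 48        # rolling-window size in bytes (assumed)
MASK = 0x1FFF      # boundary pattern: ~8 KiB average chunks (assumed)
PRIME, MOD = 31, 1 << 32

def chunks(data: bytes):
    """Yield content-defined chunks: a boundary is cut wherever the
    hash of the last WINDOW bytes matches the MASK pattern, so equal
    content produces equal chunks regardless of its offset."""
    top = pow(PRIME, WINDOW - 1, MOD)   # weight of the byte leaving the window
    h, start = 0, 0
    for i, b in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * PRIME + b) % MOD
        if i + 1 >= WINDOW and (h & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]              # trailing chunk

chunk_store = {}                        # fingerprint -> chunk bytes

def dedup_store(data: bytes) -> list:
    """Store a file; identical chunks are kept only once. The returned
    ordered fingerprint list is the file's hash sequence."""
    seq = []
    for c in chunks(data):
        fp = hashlib.sha256(c).hexdigest()
        chunk_store.setdefault(fp, c)
        seq.append(fp)
    return seq

def rebuild(hash_sequence: list) -> bytes:
    """Reassemble a file from its hash sequence. Joining stored chunks in
    any other order yields a byte stream that never existed as a file,
    which is why recovery must locate the hash sequence itself."""
    return b"".join(chunk_store[fp] for fp in hash_sequence)

# Round-trip: a file is recoverable exactly only via its hash sequence.
blob = bytes(range(256)) * 64
assert rebuild(dedup_store(blob)) == blob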
Files in this item:

File: Tesi-Dario_Lanterna.pdf (open access)
Description: doctoral thesis
Size: 2.97 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1203352