Deep dive into file system analytics by leveraging TreeSize and ETL: Part 1
What we’ll cover
In this piece, I want to dig deeper into file systems: analyzing them and understanding their intricacies through data. It continues from my previous post, where I detailed how to generate TreeSize audits with PowerShell and then use that data to construct a knowledge graph.
That post was my first deep dive into making sense of file systems, with the goal of either migrating them or making them more efficient. There’s a lot more to dig into with knowledge graphs, but for this piece I’ll keep it straightforward and focus on stats and trends for quick, clear results, without getting into complex simulations.
Objectives and applications
Moving a large amount of data from one place to another, say from a file share to SharePoint Online, isn’t as simple as directing Microsoft to your files and expecting an immediate migration. It’s a more nuanced process that involves understanding and mapping out the structure of your data, identifying complexities, and mitigating any potential issues that arise from limitations or the need for modernization.
In this post, I’ll skip the nitty-gritty of the full migration process and instead zero in on my approach to exploring and understanding the critical aspects of a file share. This focus matters because it lets us examine important metrics such as the number of files, their size, and how deeply they are nested. Understanding these factors is key to planning a smooth, well-informed data migration and modernization strategy.
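To make those three metrics concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the tooling from this series: the `summarize` function, the `(path, size)` record shape, and the sample paths are all hypothetical, and "depth" is assumed to mean the number of path components below the drive or share root.

```python
from pathlib import PureWindowsPath


def summarize(records):
    """Summarize (path, size_in_bytes) pairs into count, total size, and max depth."""
    count = 0
    total_bytes = 0
    max_depth = 0
    for path, size in records:
        p = PureWindowsPath(path)
        # Depth counts the components below the drive or share root,
        # so \\server\share\a\b.txt has depth 2.
        depth = len(p.parts) - (1 if p.anchor else 0)
        count += 1
        total_bytes += size
        max_depth = max(max_depth, depth)
    return {"count": count, "total_bytes": total_bytes, "max_depth": max_depth}


# Hypothetical sample records; a real audit export would supply these.
sample = [
    (r"\\server\share\Finance\2023\report.xlsx", 2048),
    (r"\\server\share\HR\policy.docx", 1024),
    (r"\\server\share\readme.txt", 512),
]
print(summarize(sample))
```

Using `PureWindowsPath` keeps the sketch runnable on any operating system, since it only parses path strings and never touches the disk.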
Learning new things
My journey began with a fascination for data and the challenge of deciphering complex logic from a high-level perspective. In my professional life, I analyze processes and people to devise solutions — sometimes to boost efficiency, other times to establish clear rules around a process; the scope is broad. This is why I have a constant desire to learn more and develop in this space.
I’ve been using PowerShell and experimenting with JavaScript/Node.js for my tools, though they’re not without their limitations. Like many, I’ve been swayed by the chorus of data analysts praising Python for its effectiveness and speed on these kinds of tasks. The more I delved into data science and Python, the more I encountered the concept of ETL (Extract, Transform, and Load).
My aim wasn’t to master Python and data processing through and through, but rather to explore how AI tools like ChatGPT can help quickly assemble a project to explore the possibilities. I thrive on learning through context and understanding best practices, which helps set me on the right path. Once I grasp the basics, I have a clear roadmap to achieve my goals.
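For readers new to the Extract, Transform, Load idea mentioned above, the three stages can be sketched in a few lines of standard-library Python. This is a generic illustration, not the project described in this series: the column names `path` and `size_bytes`, the `files` table, and the sample rows are assumptions made up for the example, and a real TreeSize export will have its own layout.

```python
import csv
import io
import os
import sqlite3

# Hypothetical sample standing in for an exported audit file.
SAMPLE_CSV = """path,size_bytes
Finance/2023/Report.XLSX,2048
HR/policy.docx,1024
"""


def extract(text):
    """Extract: read raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast sizes to integers and derive a lowercase file extension."""
    return [
        {
            "path": row["path"],
            "extension": os.path.splitext(row["path"])[1].lower(),
            "size_bytes": int(row["size_bytes"]),
        }
        for row in rows
    ]


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table for later querying."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(path TEXT, extension TEXT, size_bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO files VALUES (:path, :extension, :size_bytes)", rows
    )
    conn.commit()


# Run the three stages against an in-memory database.
conn = sqlite3.connect(":memory:")
load(transform(extract(SAMPLE_CSV)), conn)
for row in conn.execute("SELECT extension, size_bytes FROM files"):
    print(row)
```

Keeping each stage as its own function means the source (a CSV export) or the destination (SQLite here) can be swapped later without touching the other two steps.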