
IBM Research has just set a data storage record by building a disk array holding 120 petabytes. It was made at the request of an unnamed research group, which needs this unprecedented amount of space to run a simulation of some kind. These simulations keep expanding in size not only because datasets grow, but because more copies, snapshots, and redundancies get added.
How did they do it? Well, the easy part was plugging in the 200,000 individual hard drives that make up the array. The racks are extra-dense with drives and water-cooled, but apart from that the hardware is fairly straightforward.
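For a quick sanity check, here's some back-of-the-envelope math (my numbers, not IBM's) on what 120 PB spread across 200,000 drives works out to per drive:

```python
# Back-of-envelope check (my own arithmetic, not from IBM): how much
# capacity each of the ~200,000 drives would need to contribute to
# reach 120 petabytes in total.
TOTAL_PB = 120
DRIVES = 200_000

total_tb = TOTAL_PB * 1000            # 1 PB = 1,000 TB (decimal units)
per_drive_tb = total_tb / DRIVES      # capacity needed per drive

print(f"{per_drive_tb:.2f} TB per drive")  # -> 0.60 TB, i.e. ~600 GB drives
```

About 600 GB apiece, which is a perfectly ordinary drive size; the record is in wiring that many of them together, not in the individual disks.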
The trouble comes when you actually have to start indexing all that space. Some file systems have trouble with single files over 4 GB, and some can't handle individual drives much bigger than about 3 TB. That's because they simply weren't designed to track that many files or that much space. Imagine if your job were to give everything in the world a different name: easy at first, but after a billion or so you start running out of permutations. It's much the same with file systems, though modern ones are far more forward-looking in their design, and I doubt you'll ever run into this problem, unless you're IBM Research.
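If you're wondering where those oddly specific ceilings come from, here's a rough illustration (my example, not from the article): older on-disk formats store sizes and block addresses in fixed-width fields, and a 32-bit field can only count so high.

```python
# A rough illustration (my numbers, not IBM's) of where such limits come
# from: older formats keep sizes and block addresses in fixed-width
# fields, so the largest thing they can describe is 2**bits.
size_field_bits = 32              # e.g. FAT32 records file size as a 32-bit value
max_file_bytes = 2**size_field_bits
print(max_file_bytes / 10**9)     # ~4.29 GB -> the familiar "4 GB" file cap

lba_bits, sector_bytes = 32, 512  # 32-bit sector addresses, 512-byte sectors
max_volume_bytes = 2**lba_bits * sector_bytes
print(max_volume_bytes / 10**12)  # ~2.2 TB -> roughly the single-drive ceiling
```

The exact numbers vary by format, but the pattern is the same: once the address field is full, the file system simply has no way to name anything bigger.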
120 petabytes is an insane amount, eight times more than the 15 PB arrays already out there, and even those have had to cope with space issues. On IBM's giant array, simply tracking the locations and names of the files takes up fully 2 PB on its own. You'd need a next-generation file system just to index the index!
Their homegrown file system is called the General Parallel File System, or GPFS. It's designed with huge volumes and parallelism in mind: think RAID across thousands of disks. Files are striped across however many disks they need to be, reducing or eliminating reads and writes as a performance bottleneck. And boy, does it work: IBM recently set another record by indexing 10 billion files in 43 minutes. The previous record? One billion files, in three hours. So yes, it scales.
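To make the striping idea concrete, here's a toy sketch (not GPFS itself, just the general principle) of dealing a file's blocks out round-robin across several disks so they can all be read and written in parallel:

```python
# A toy sketch of striping -- not GPFS, just an illustration of splitting
# one file's blocks round-robin across many disks so I/O can happen on
# all of them at once.
from collections import defaultdict

def stripe(data: bytes, num_disks: int, block_size: int = 4):
    """Assign consecutive fixed-size blocks of `data` to disks in round-robin order."""
    disks = defaultdict(list)
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)   # block i lands on disk (i mod N)
    return disks

layout = stripe(b"abcdefghijklmnopqrstuvwxyz", num_disks=4)
for disk, blocks in sorted(layout.items()):
    print(disk, blocks)
```

Each disk ends up holding only every fourth block, so all four can be read simultaneously and the pieces reassembled in order. Scale that same idea to thousands of disks and the drives themselves stop being the bottleneck.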
The array, built at IBM's Almaden storage systems research lab, will be used by the unnamed client for modeling "real-world phenomena." That suggests the natural sciences, though it could be anything from subatomic particles to planetary modeling. Projects like this are usually undertaken as much to push the state of the art forward as to provide a service, though. And of course IBM now gets to boast that it built the thing, at least until an even bigger one comes along.