Money mangled the megabyte


Some time ago, I was in an engineering meeting and we were discussing bandwidth and data rates. I was waxing eloquent about how various IP blocks could deliver data at a rate of such-and-such megabytes per second. I had just started my discourse and was building up to a full head of steam to deliver my prima facie case when one of the engineers in the room interrupted me and said: “Do you mean mebibytes per second?”

I stopped for a moment, squinted, wrinkled up my face, and said to him whilst giving my very best 'snooty waiter' impression, “Excuse me?”

He then demurely repeated himself because I was giving him 'that look' that bespoke me thinking quite loudly “Seriously?!? You just interrupted me when I was about to deliver a masterful and insightful comment?” Those of you who know me know 'the look' I am referring to that precedes me sucking all the air out of the room, but I digress…

He said again: “Do you mean mebibytes per second… you know… power of two and all that?” Everyone in the meeting then turned to look at me and — not unlike watching a tennis tournament — waited to see how I was going to return the rather clumsily-served ball. So I did what every well-heeled, hot-blooded engineer would do in that case and simply volleyed with a “Yeah…” and continued on my merry way with my brilliant, although now somewhat tarnished, point.

Now, two possibilities were running through my head during this rather odd exchange. Either this engineer had just said the word 'megabytes' twice the way Elmer Fudd would say it, and therefore he had a speech impediment I had never noticed before. In this case it would be prudent of me to not mock him and say very slowly and distinctly “Yes… M-E-G-A-B-Y-T-E-S” with the emphasis on the 'G'. Alternatively, it might be that I had no idea what the heck he was talking about, in which case it would be best to just nod and move on; after all, whilst unlikely, it was possible that… well, you know.

When I returned to my office, I decided to “research” this complete mispronunciation of the word 'megabytes.' Most of you probably thought I ran to Google as fast as a frat boy to a keg; admit it… there's no shame in that… but I say 'nay nay' (RIP John Pinette). Instead, I went to the place I always go when I want 'real' answers — not answers that are paid for by the highest bidder (ouch… too soon?). Yes, folks… I went to the 'Dark Web,' which filled the void when the Silk Road was shut down.

The answer to the query “Are mebibytes a real thing?” came back very slowly with a “deet deet deet” sound, you know, the way computers work in the movies (because of The Matrix, Hollywood directors still think real computers connect to each other through telephone hard lines and run at 1200 baud (n,8,1)).

Surprisingly, my answer from the Dark Web did come back at like two characters per second and in green text — I guess it took that long because the folks running the dark web can't see what the heck they are doing and are always running around looking for a flashlight. Here is the answer — are you ready? NO — it was not 42 — stop that! It was “B-L-A-M-E-T-H-E-I-E-C.” Ooh… now I was getting somewhere!

So, just what is this IEC… this oozing behemoth, this fibrous tumor, this monster of power and expense hatched from the simple human desire for engineering order? How did allegedly smart people spawn a vast, rampant cuttlefish of dominion with its tentacles in every orifice of the body of engineering policy? (My apologies to P.J. O'Rourke.)

OK… so the International Electrotechnical (not a real word, I fear) Commission, or IEC as they like to be called (possibly because they are tired of people telling them that Electrotechnical is not a real word), has been around for 108 years (by way of contrast, my Uncle Milton was around for 100 years and he always left “well enough alone,” which is probably why he was around for 100 years). The IEC is comprised of 10,000+ engineers from 82 countries, and they historically gave us the SI units from which such illustrious units as meter, kilogram, second, kelvin, and candela were derived. OK… so they also gave us radian, hertz, newton, pascal, joule, watt, volt, farad, Celsius, lumen, and… (I get it… they are smart people…) but this time they have gone too far I tell you!!! The megabyte is sacrosanct. How in the world did this come to be? What in the name of all things binary happened?

Blame the hard drive
There is a famous saying that I applied liberally to this tragedy: “Follow the money!” Yep; in this case, it absolutely turns out to be true: “Money Mangled the Megabyte.”

(Source: Rick Tewell)

Let's hop into our “way back” machine and take a trip back through time to the heady days of PCs when people bought hard drives to upgrade their computers — a time of flux capacitors, proton packs (don't cross the proton streams), Donkey Kong, and Teddy Ruxpin (don't judge).

It is the good old 1980s when hard drive sizes were measured in mere megabytes. As amazing as it might seem today, there was at one time a 32MB hard drive size limit 'barrier' in MS-DOS 3.x. I won't go into a long-winded discussion about this barrier here because someone else has already done so (click here for more info on this mind-numbingly fascinating subject).

Not surprisingly, users had a huge hunger for bigger hard drives. They could buy them from peddlers advertising in Byte or PC Magazine, but when their hard drives arrived they often found that they had to concern themselves with a dizzying array of complications, such as the fact that their new drives were physically organized into things like cylinders, heads, and sectors.

Now, I know that this is a bit difficult, but try to imagine such a thing as a 40MB hard drive (which cost ~$1000 at the time). Our example 40MB hard drive could have a 'physical' layout of 640 cylinders, 8 heads, and 17 sectors per track, with 512 bytes per sector. This means that our drive had 4 'platters' with eight 'heads' (one for the top and one for the bottom of each platter) that would swing into and out of the gaps between the platters and 'fly' just above the magnetic surfaces, kind of like an old phonograph but without physically touching the media.

The data was stored in concentric circles and each of these concentric circles was called a 'track.' So if you had 640 'cylinders,' then the heads could seek discretely to one of the 640 concentric tracks on the platter. Each 'track' was further subdivided into 17 sectors, with each sector storing 512 bytes.

So, let's do the math: 640 cylinders x 8 heads x 17 sectors x 512 bytes per sector = 44,564,480 bytes. Now, stay with me here, because this is the important bit. The hard drive manufacturers would divide 44,564,480 by 1,048,576 (which in my book is a MB) because it is a power of two (2^20), which is what we computer folks like to think in when we think of computery types of numbers, and this results in a quotient of ~42MB, which is what the hard drive manufacturers would then claim was the capacity of the drive. (I am now climbing up onto my soapbox, so beware.)
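For anyone who wants to check that arithmetic without a calculator, here is a minimal Python sketch. The function name `chs_capacity_bytes` and the two constants are made up for illustration; they do not come from any real disk utility.

```python
BYTES_PER_BINARY_MB = 2**20   # 1,048,576 bytes, the power-of-two megabyte
BYTES_PER_DECIMAL_MB = 10**6  # 1,000,000 bytes, the power-of-ten megabyte


def chs_capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Raw capacity of a drive described by cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * bytes_per_sector


# The example drive from the text: 640 cylinders, 8 heads, 17 sectors per track.
total = chs_capacity_bytes(640, 8, 17)
print(f"{total:,} bytes")
print(f"{total / BYTES_PER_BINARY_MB:.1f} MB (power of two)")
print(f"{total / BYTES_PER_DECIMAL_MB:.1f} MB (power of ten)")
```

The same byte count gives a smaller number when divided by 1,048,576 than when divided by 1,000,000, which is the whole story in miniature.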

In the world that I come from, things are quite simple:

1KB = 1,024 bytes or 2^10 bytes
1MB = 1,048,576 bytes or 2^20 bytes
1GB = 1,073,741,824 bytes or 2^30 bytes
1TB = 1,099,511,627,776 bytes or 2^40 bytes

I don't want to argue about this, because this is the way it was from the beginning, is now, and should be until the end of time or until we no longer need these units — whichever comes first. (I am now climbing down off of my soapbox.)

OK, back to our regularly scheduled programming. The only rub was that, in the good old days, hard drives were fairly non-intelligent and they had to be 'physically' formatted by the hard disk controller. This little gem of technology was typically sold separately, or already pre-installed in the computer (in the case of an upgrade). Again, we won't go into the lengthy discussion surrounding this here, but you can find out more by clicking here should you so desire.

Once the physical format was completed, it was up to the user to then apply the operating system's file system to the hard drive in a process called 'logical' formatting. Those readers who are over forty and used to use DOS surely remember the venerable fdisk and format utilities, right?

During this logical formatting process, bad sectors might be found on the hard drive. This was because hard drives are notorious for having some contaminants or imperfections on the platters that prevent data from being read or written to a particular physical location. In this case, these bad spots were simply marked 'bad' by the logical formatting utility and the file system just worked around them. Yes, this process resulted in 'diminished' storage capacity, but the amount of space reduction was minuscule because it represented the actual physical bad spots on the disk at the time, which were typically negligible in a brand new hard drive. OK… got all that?

Now, another key point to note here is that DOS and Windows report (and likely always will?) storage device capacities as a “power of two,” which means 1MB = 1,048,576 bytes and so on (see my 'soapbox' list above). This is a very important piece of information to remember.

Time passes and hard drive technology advances at an almost breakneck pace. Now we are in the 1990s. A time of Furbys, Beanie Babies, Pokémon, the rise of the hipster, and the swelling of the dot com bubble. Hard drives have become more and more intelligent until, eventually, they start handling their own bad sectors instead of having to rely on the operating system to do this for them. The end result, in short, is that the hard drive simply reports back to the operating system as to how much storage is available on the hard drive, and there is no longer any tedious mucking about in hyperspace with cylinders, heads, and sectors.

How was all this accomplished? Well, in several ways, actually, but the practical upshot was that the hard drive simply 'reserves' sectors that are good — taking them away from the true physical storage capacity of the device — and makes them available for bad sector 'remapping,' meaning that a few sectors of each track are reserved for the remapping of bad blocks.
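The general idea of remapping can be sketched in a few lines of Python. This is a toy model under loud assumptions, not real drive firmware: the class name `SpareSectorRemapper`, the choice to put the spares at the end of the range, and the single 17-sector track with one spare are all hypothetical.

```python
class SpareSectorRemapper:
    """Toy model of drive-side bad-sector remapping (not real firmware)."""

    def __init__(self, total_sectors, spare_sectors):
        if spare_sectors >= total_sectors:
            raise ValueError("spares must be fewer than total sectors")
        # The host only ever sees the non-reserved sectors.
        self.visible_sectors = total_sectors - spare_sectors
        # Assumption: spares sit at the end of the physical range.
        self._free_spares = list(range(self.visible_sectors, total_sectors))
        self._remap = {}  # retired sector number -> spare sector number

    def mark_bad(self, sector):
        """Retire a failing sector by pointing it at the next free spare."""
        if not 0 <= sector < self.visible_sectors:
            raise IndexError("sector outside the host-visible range")
        if sector in self._remap:
            return self._remap[sector]
        if not self._free_spares:
            raise RuntimeError("spare pool exhausted")
        self._remap[sector] = self._free_spares.pop(0)
        return self._remap[sector]

    def physical(self, sector):
        """Translate a host-visible sector number to where the data lives."""
        if not 0 <= sector < self.visible_sectors:
            raise IndexError("sector outside the host-visible range")
        return self._remap.get(sector, sector)


# One 17-sector track with a single reserved spare (assumed numbers).
drive = SpareSectorRemapper(total_sectors=17, spare_sectors=1)
drive.mark_bad(5)
print(drive.visible_sectors)  # 16
print(drive.physical(5))      # 16
print(drive.physical(4))      # 4
```

The host keeps asking for sector 5 and never learns that the data now lives in the spare; the price is that only 16 of the 17 sectors are ever advertised.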

Now, another interesting thing about hard drives is that bad spots on a hard drive can begin to 'grow' and start showing up in places where they weren't detected before. That means that you can't just reserve a fixed number of good sectors ahead of time to replace the actual number of bad sectors on the hard drive. Instead, you must come up with a very good estimate as to how many sectors are likely to go bad over the life of the drive and reserve enough good sectors for those “sure to appear” bad spots as well. Ah… did the light bulb just come on? Now we start to get a glimpse of the crux of the problem — right?

This reservation of sectors on every track to handle bad block remapping obviously results in a reduction of the actual usable space on the hard drive. These new, more-intelligent hard drives report how much 'real' storage space is on the hard drive, taking into account the reserved blocks for error handling. So, once you connect the hard drive to your computer, the 100MB drive that you paid so much money for now actually shows up to the operating system as ~93MB because ~7MB was reserved for bad block handling. Your 100GB hard drive transmogrifies itself into a 93GB drive, while your 1TB hard drive becomes a 931GB hard drive. Are you mad yet? Well you might not have been, but a whole lot of people were hopping mad — mad enough to actually file a lawsuit against Western Digital (a leading hard drive manufacturer) over this very issue (click here to add a little additional color to this matter).
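To get a feel for how per-track reservation eats into capacity, here is a rough sketch that reuses the 640 x 8 x 17 geometry of the example 40MB drive. The reserve of one sector per track is purely an assumption for illustration (real drives picked their own reserves), and the function name `usable_bytes` is hypothetical.

```python
def usable_bytes(cylinders, heads, sectors_per_track, spares_per_track,
                 bytes_per_sector=512):
    """Capacity left for the host after reserving spare sectors on every track."""
    tracks = cylinders * heads
    usable_sectors = tracks * (sectors_per_track - spares_per_track)
    return usable_sectors * bytes_per_sector


raw = usable_bytes(640, 8, 17, spares_per_track=0)
usable = usable_bytes(640, 8, 17, spares_per_track=1)  # assumed reserve
reserved = raw - usable

print(f"raw:      {raw:,} bytes")
print(f"usable:   {usable:,} bytes")
print(f"reserved: {reserved:,} bytes ({reserved / raw:.1%} of the drive)")
```

With these assumed numbers, the usable figure works out to exactly 40 x 1,048,576 bytes, so a single spare per track is enough to turn the 44,564,480-byte drive into an honest power-of-two 40MB.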

So, the hard drive engineers went to talk to the marketing folks, and the whole thing kind of went down like this:

Engineers: “We have good news and we have bad news…”

Marketing: “Let's have the good news first…” (They are ever the optimists).

Engineers: “Our customers will never see bad blocks ever again on our hard drives… we've fixed all that.”

Marketing: “That's great! So what's the bad news?”

Engineers: “The hard drives are going to show up to the operating system smaller than you want them to be.”

Marketing (now furrowing their brows): “Wait, how much smaller?”

Engineers: “Small enough that you'd better not go out there saying that they are 512MB hard drives any more. They will be more like 498MB hard drives.”

Marketing: “Seriously?! We can't do that. We can't say 498. We have to at least say 510. We always round up. It's the way of our people.”

Engineers: “Well, don't say we didn't warn you…”

Marketing: “Fine, we'll figure it out” (which is code for “we are going to round up anyway,” but they still round the price down from $800 to $799.99, which makes the consumer think it's really only $700 — right?)

However, somewhere on this long dusty road of storage technology, a very smart marketing person figured out that a megabyte doesn't really have to be 1,048,576 bytes. What if it was… say… only 1,000,000 bytes? Now the logic behind this goes as follows: “If we say 100MB, then since the hard drive thinks in powers of two, which means 104,857,600 bytes, this should give us enough room factoring in bad block handling to say: 'Well Mr. or Ms. Customer, you DO actually have an effective 100,000,000 bytes of storage, which in our book is 100MB!'” Based on this reasoning, they went ahead and stuck “100MB” in the ads and on the box. In effect, the marketing folks simply changed the definition of a megabyte from 1,048,576 bytes to 1,000,000 bytes. Voila! Problem solved and off to lunch.

Now those of you who are 'engineering minded' will do the math and say: “Wait a minute, the 4,857,600 bytes you gain between 100,000,000 and 104,857,600 is still less than the 7MB delta in your previous paragraph, so you still would have less than 100,000,000 actual usable bytes.” That's a clever catch. I kinda hoped you wouldn't notice that, but since you did… I would like to point out that the actual physical storage is still based on cylinders, heads, and sectors, even though these are no longer exposed to the user.

Based on this method of storage organization, a 100MB hard drive would usually have more storage capacity than a perfect 100MB, and oftentimes would hold 107MB or more once the numbers were run. Thus, redefining 100MB to mean 100,000,000 bytes held sway (this is particularly true as you get to bigger drive sizes).
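The remark about bigger drive sizes can be checked with a little arithmetic: the gap between the power-of-two and power-of-ten reading of a prefix widens at every step up the ladder. A minimal sketch follows, in which the helper name `headroom` is hypothetical.

```python
PREFIXES = ["KB", "MB", "GB", "TB"]


def headroom(step):
    """Bytes gained by reading a prefix as 2**(10*step) rather than 10**(3*step)."""
    binary = 2 ** (10 * step)
    decimal = 10 ** (3 * step)
    extra = binary - decimal
    return extra, extra / decimal


for step, name in enumerate(PREFIXES, start=1):
    extra, fraction = headroom(step)
    print(f"1{name}: binary exceeds decimal by {extra:,} bytes ({fraction:.1%})")
```

The slack is about 2.4% at the kilobyte, just under 5% at the megabyte, and roughly 10% by the time you reach the terabyte, so the redefinition buys the marketing folks more room with every generation of drive.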

Now, when we move to the really big hard drives, the problem compounds. As I mentioned earlier, in the case of a 1TB hard drive, when you factor in error handling and improper definitions, you might actually end up with a 1,000,000,000,000 byte drive, instead of the 1,099,511,627,776 byte drive you might have been expecting. Worse still, Windows is going to tell you that the drive you just hooked up to your PC is actually a 931GB hard drive… well, ain't that special? That's a delta of almost 100 billion bytes that you thought you bought no matter how you slice it (assuming you think 1TB is a power of two… which it is). This is a pretty big flippin' deal, which explains why the proverbial fur started to fly.
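One way to reproduce the 93GB and 931GB figures is to assume that the number on the box counts in powers of ten while the operating system divides by powers of two. That assumption, and the names `DECIMAL`, `BINARY`, and `reported`, are illustrative only; the sketch ignores any sectors held back for remapping.

```python
DECIMAL = {"MB": 10**6, "GB": 10**9, "TB": 10**12}  # as printed on the box
BINARY = {"MB": 2**20, "GB": 2**30, "TB": 2**40}    # as the OS counts


def reported(label_value, label_unit, report_unit):
    """Size shown when decimal-labelled bytes are divided by a binary unit."""
    return label_value * DECIMAL[label_unit] / BINARY[report_unit]


print(f"100GB on the box -> {reported(100, 'GB', 'GB'):.1f} GB reported")
print(f"1TB on the box   -> {reported(1, 'TB', 'GB'):.1f} GB reported")
print(f"1TB shortfall    -> {BINARY['TB'] - DECIMAL['TB']:,} bytes")
```

The shortfall on the last line is the "almost 100 billion bytes" in question.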

Enter the IEC
In 1998, this issue was somehow brought to the attention of the girls and boys at the IEC. Perhaps they could smell it coming from 5,823 miles away. At any rate, I suspect that some rather high paid lawyer was looking for some legal justification to use at trial to defend the hard drive marketing folks' declaration that they unilaterally changed the term MB from what we all know to be 1,048,576 bytes to 1,000,000 bytes.

After searching high and low, they came across the folks at the IEC and dared to ask their opinion (this, of course, after asking everyone from the plumber's union to their favorite chefs in San Francisco what they thought). After some highfalutin debate by the appropriately ordained IEC committee, I am quite sure that some engineer (not a computer programmer, mind you) said something like: “Well y'all… like we have defined kilo to mean 10^3 and mega to mean 10^6 and giga to mean 10^9 for like something like 100 years give or take… like, you know… the kilometer, kilogram, and so on… you feel me here?”

I imagine the debate went something exactly like that… led by a quite brilliant California valley girl who went to a very nice engineering college in Alabama and has hip hop leanings (the girl… not the college).

At this point, I am sure the illustrious committee handling this weighty matter agreed and said unanimously: “Yep… you are exactly right our brilliant fearless leader; your logic is flawless. A megabyte is now, hereby, and forever 1,000,000 bytes… (at this point somebody banged a gavel to add weightiness to the proceedings), …so now that's settled, what's for lunch?”

It was at this point that someone else must have spoken up and said: “Wait! We have to come up with a unit that the computer-minded people can use to mean 1,048,576 bytes… after all, we just took away their precious megabyte.” Anxious to get to their sumptuous buffet, they must have not spent too much time discussing this particular point, because what they came up with was (and it is hard to type this with a straight face, much less say it with a straight face) the mebibyte, which they also cleverly abbreviated as 'MiB' (silly me; I always thought this was an abbreviation for “Men in Black”).

So what was the rationale for the made up word mebibyte? The IEC folks just took the starting letters of the word 'binary' and jammed them into the word megabyte, replacing the 'g' with the 'b' and inexplicably replacing the 'a' with an 'i', possibly in some feeble attempt at engineering humor. I mean… perhaps 'mebabyte' would have been better than 'mebibyte'. It's arguable, but any improvement on the term mebibyte would have been… well… an improvement.

As an aside… I am using Microsoft Word to write this column, and it keeps on telling me that 'mebibyte' isn't actually a word. The fact that Word's spell checker is having fits sort of tells you just how well accepted this silly unit is. But the IEC didn't stop there; oh no, they went “all in” and messed up the rest of our venerable units as shown in the following table.

(Source: Rick Tewell)
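For readers who want the full set of IEC binary prefixes in one place, the following minimal sketch prints each name, symbol, and byte count. The list `IEC_PREFIXES` and the helper `iec_table` are made up for illustration; the prefix names and powers of two are the standard IEC ones.

```python
IEC_PREFIXES = [
    ("kibibyte", "KiB"), ("mebibyte", "MiB"), ("gibibyte", "GiB"),
    ("tebibyte", "TiB"), ("pebibyte", "PiB"), ("exbibyte", "EiB"),
    ("zebibyte", "ZiB"), ("yobibyte", "YiB"),
]


def iec_table():
    """Return (name, symbol, exponent, bytes) rows; each step is another 2**10."""
    rows = []
    for step, (name, symbol) in enumerate(IEC_PREFIXES, start=1):
        exponent = 10 * step
        rows.append((name, symbol, exponent, 2 ** exponent))
    return rows


for name, symbol, exponent, value in iec_table():
    print(f"{name:<9} {symbol:<3} 2^{exponent:<2} = {value:,} bytes")
```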

Do you want to know my take on all this nonsense? Well, 'Kibibyte' sounds like a treat for dogs; 'Yobibyte' sounds like a cartoon character; 'Mebibyte' and 'Tebibyte' sound like cartoon characters trying to say megabyte and terabyte; 'Gibibyte' and 'Zebibyte' sound like folks who used to mess with the twelve tribes of Israel from the Bible; and don't get me started on what 'pebibyte' sounds like…

So, there we have it, this is the story of how “Money Mangled the Megabyte” (I was going to title this column “How I Met Your Mebibyte,” but for some reason an alliterative title appealed to me more).

Now we have this “official” silly sounding set of units that more and more computery people are thinking we have to use. Worse still, there is an increasing number of people who think a megabyte is 1,000,000 bytes. This maketh me sad. “Viva La Megabyte!!!!” I say; that is, the power of two megabyte… you know, 1,048,576 bytes… Oh, never mind!

8 thoughts on “Money mangled the megabyte”

  1. “Thanks for the welcome. I have another blog entry planned for submission soon! Soon, however, is a seriously relative word. Hopefully, I don't mean “soon” in geologic scale…but rather engineering scale – which is different from marketing and / or exe

  2. “Rick, that was fun to read, thanks! About the matter, I believe you lost this battle. The MiB is silly, I agree, I kinda feel embarrassed when I use it, but there's no way we can rollback people's head, such as we'd do with a git repository. There was a

  3. “Alas…I fear you are correct. Battle lost. Makes for a good story though. Apparently, even the venerable Dr. Donald Knuth made an alternative suggestion which was rebuffed. If they didn't listen to him…Then! But hey guess what? A megabyte is 1,048,576

  4. “You suggest that reserved sectors complicated the counting of actual storage space. About that same time, the “sector” changed from 17/cylinder to a variable number per cylinder. Each sector was now the same physical length, so the outer cylinders ha

  5. “Indeed – over time, many tricks were employed to help mitigate / solve the issue. A good friend of mine was responsible for the firmware development team at Maxtor and the techniques / algorithms they came up with to boost storage capacities and handle
