I read up on the history of the ancient convention that 1024 Bytes are called 1 Kilobyte. The problem with the convention is that it’s totally unintuitive unless you know it.
Unfortunately, Microsoft decided to use the following conventions and now the whole world uses them:
- 1 KB = 1024 Bytes
- 1 MB = 1024 * 1024 Bytes
- 1 GB = 1024 * 1024 * 1024 Bytes.
Basically, that is a mish-mash of the ancient 70s convention of using kB for 1000 Bytes and KB for 1024 Bytes, and an abuse of the SI definitions of the M and G prefixes. Actually, there is no mB or gB convention, although that would have been logical in the original convention. This is due to the fact that in the 70s, the age of large and expensive computers, nobody believed that mass storage would actually be achievable at all.
Just assume you never used a computer, ancient UNIX tools or listened to a computer science lecture, or were taught anything about computers. Wouldn’t you expect that
- 1 KB = 1000 Bytes
- 1 MB = 1000 * 1000 Bytes
- 1 GB = 1000 * 1000 * 1000 Bytes?
I filed a bug report against glib, with a historical analysis of the usage of all conventions and formalized nomenclatures in existence (slightly wrong), demanding that g_format_size_for_display() use the latter conventions. This actually matches IEC recommendations.
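To make the difference between the two conventions concrete, here is a minimal sketch in Python. It is not glib's actual implementation; the function name `format_size`, the unit labels and the one-decimal rounding are assumptions chosen purely for illustration.

```python
def format_size(num_bytes, base=1000):
    """Render a byte count for display (hypothetical helper, not glib code).

    base=1000 uses decimal units, base=1024 uses binary units.
    """
    if base == 1000:
        units = ["bytes", "kB", "MB", "GB", "TB"]
    elif base == 1024:
        units = ["bytes", "KiB", "MiB", "GiB", "TiB"]
    else:
        raise ValueError("base must be 1000 or 1024")

    value = float(num_bytes)
    index = 0
    # Divide by the base until the value fits the current unit.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1

    if index == 0:
        return f"{num_bytes} bytes"
    return f"{value:.1f} {units[index]}"


print(format_size(100_000_000))        # decimal convention
print(format_size(100_000_000, 1024))  # binary convention
```

With the decimal convention a file of 100,000,000 bytes is shown as a round 100.0 MB, while the binary convention shows the same file as 95.4 MiB.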
Important side effects of the conventions are:
- K=1000: Memory sticks and main memory cells are made in powers of two – because the address lines use binary logic. Historically, their size is advertized with K=1024 to get nice, non-fractional values. Below the 1 GB limit, they were probably advertized with kB rather than KB – but that shouldn’t be relevant anymore. With K=1000, memory (and memory sticks) shows up on your computer screen LARGER than advertized.
- K=1024: Hard disks do not have such cell architectures, and they are advertized with K=1000. It was some kind of marketing trick in the very beginning, making the disk look larger than you would expect if you assume K=1024 as an old-fashioned “IT geek”. The effect is that with K=1024, hard disks look SMALLER on your computer screen than advertized.
Compare for yourself: Which of the two statements is positive, psychologically:
- In contrast to Windows, under Linux my 70 GB hard disk has 70 GB as advertized, and my 1 GB memory sticks grow to 1,07 GB
- Like under Windows, under Linux my 70 GB hard disk shrinks to 65,1 GB and my 1 GB memory sticks have 1 GB as advertized
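The numbers in the two statements can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the "70 GB" disk holds exactly 70 * 1000^3 bytes and the "1 GB" memory stick exactly 1024^3 bytes.

```python
GB = 1000**3   # decimal gigabyte, as used on hard disk packaging
GIB = 1024**3  # binary "gigabyte", as used for memory

disk_bytes = 70 * GB    # assumed exact capacity of the 70 GB disk
stick_bytes = 1 * GIB   # assumed exact capacity of the 1 GB stick

# Disk as displayed with K=1024
disk_in_binary_units = disk_bytes / GIB
# Memory stick as displayed with K=1000
stick_in_decimal_units = stick_bytes / GB

print(f"{disk_in_binary_units:.2f}")
print(f"{stick_in_decimal_units:.2f}")
```

The disk comes out at about 65.19 (the 65,1 quoted above is that value cut off after one decimal), and the stick at about 1.07.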
Wouldn’t it also be nice to have a 100 MB file with 100 * 1000 Kilobytes? No more calculator I/O or right-clicking required for estimating the “actual” size in byte units!
I am mostly writing this blog entry to get some feedback from our users, rather than from programmers. Please also mention your background in your blog comments! Further concrete information regarding historic conventions and IEC and SI standards is available in the bug report mentioned above.
Also note that I do NOT demand to use the additional odd kibi, mebi, gibi (KiB, MiB, GiB) IEC convention, which in fact makes the current situation worse by using prefixes nobody knows while still defining Ki = 1024. My guess is that it was just introduced to offer an alternative for traditionalists who probably wanted “some convention with the beloved 1024”. But it is a non-traditional measurement prefix for a traditional concept, which makes it unattractive both for old(-fashioned) traditionalists and young pragmatists.
Update
I removed the possibly intimidating roundhouse kicks against the IT community, and the somewhat out-of-context IRC log excerpts. Sorry if anybody felt insulted – some certainly did. You can find an interesting collection of opinions and personal backgrounds in the blog comments.
I am a part-time hobby developer but mostly a user and I agree with you. I believe that introducing K=1024 when K has been 1000 since the dawn of time is a punch in the nuts on the consistency and should be stopped.
If 100 Kb of data is equal to 102 400 bytes, then should not 100 Km be equal to 102 400 meters and 2 Kg be equal to two thousand and forty-eight grams?
Centi, deci, kilo, mega, giga, etc are already defined! Do not change and confuse!
I am with you, Christian.
It’s not just that people don’t care that K=1024. They don’t care if K=1000 either.
K = 1024, and generally always has done, except for HDs, in my experience. If you want clarity, use IEC units (Kibibytes, Gibibytes, etc.)
In a way it does not matter too much, so not much need to change, and not much harm from changing. But the problem gets bigger as the numbers get bigger, so the sooner we fix it the better.
It is a pain figuring out how to partition a hard disk (100GB disk, i want to install 4 distros. how big do i make the partitions?).
It’s a pain when putting photos in folders to go onto a DVD. DVD says 4.7 GB, but if i let a folder get that big it won’t fit. I end up doing 4GB, as that seems safe.
I have never had to do a ‘will this application fit in RAM’ calculation. so i dont think it will matter that RAM sizes are not round numbers. (top says that i have 1294884k 🙂 )
using base 10 everywhere would be good. it makes the maths easier. it is consistent with SI.
(I am a PhD physics student, and a nerd)
This has already been solved…
Use the Binary prefix instead….
http://en.wikipedia.org/wiki/Binary_prefix#Specific_units_of_IEC_60027-2_A.2
Geeky users are the ones most likely to complain loudly. Are you prepared for a storm of complaints?
I certainly see your point, and in some ways, I agree. It would be a very bold, daring, and risky move.
And it would confuse the hell out of users. That is, the inconsistency with applications the users are already used to, as well as the conventions used everywhere else in the system (every single application that display file/folder sizes in KB/MB/GB; ls, mc, image viewers, Firefox, etc.).
Tip: saying “GET A LIFE” is rarely the best way to win an argument.
I’m not an ‘average user’. I’m rather a PU, if not a programmer. I prefer the purist approach, and therefore for me:
k = 1000 as defined in SI – consistent with ‘everything’
Ki = 1024 as defined somewhere else
I guess that the default mode should be k = 1000, as it is more positive (bigger HD, connection speed etc.), with Ki = 1024 as an additional option (for some geeks like me 😉 ).
A productive way to solve the problem is to use something else entirely. For instance size of files? How about… pages, minutes, whatever? Use percentages too where applicable. I don’t know why the users really should know about bytes and bits at all.
I totally agree with you in this subject. 1024 is pure insanity and odd tradition. We should respect the SI, not computer tradition. (On my background: I do some occasional UI and graphics programming (not web), but I mostly try to do other stuff, like movies.)
I am an ASIC design engineer. I use 1K = 1024 on a daily basis. If you talk about sizes of memories all the time, you have to have a shorthand to discuss them with your peers. Saying “1K” or “16K” is a little easier than saying the actual number of bits / bytes. These things are designed in powers of 2, so that’s just a natural shorthand.
On the other hand, I agree that this just confuses your average computer user or consumer. That’s why recent consumer goods are sized like “holds x hours of mp3’s” or “records about y hours of HD video.” The actual size in bytes is meaningless to consumers. I think we should try as hard as possible to get away from showing average users the numbers at all, never mind the distinction of K = 1000 vs. K = 1024.
They should be aware maybe of the actual size of their total memory or disk, but only if they go out of their way to look for it. Other than that, they should be able to judge the relative size of files (disk) and processes (memory). To do that, they need a visual indicator of some kind, or a number. What the number is based on is irrelevant, as long as they can see which files are eating their disk space and which programs are consuming their memory.
So I think you’re pissed off on behalf of the average user, but average users are rarely, if ever, bitten by this. They just don’t care. If they’re smart enough to look at their memory usage, they’ll maybe kill the biggest one. If their disk fills up, they’ll run a cleanup wizard or get a bigger disk. Only pseudo-geeks get bitten by the 1000 vs. 1024 thing.
I’m a normal user. I don’t care about filesizes/memory sizes/disk sizes. _AT ALL_
Well, at least in Spain, in school and high school, kids learn that 1 KB = 1024 bytes instead of 1000 (they also learn a bit of Windows usage…), so I am not sure if making GNOME use 1KB=1000 bytes would really be an improvement :-/
Another option I have seen in some places is to use Kb for kilobit and KB for kilobyte
I’ve been taught K=1024 but you are right that this is totally counter-intuitive. So I’m fine with changing this. GF’s opinion was “K means 1000, right?”
My background would be strictly IT but i actually think you have a point. At least interface wise it would probably be nicer to work with 1000 instead of 1024. The head math when dealing with that would be easier.
Talking about the actual “size” of a file is moot anyway, because are we talking about the number of bytes it contains or how many sectors it occupies on disk/cd/flash? I believe windows (and probably *NIX as well but not sure) shows the size it occupies, which is why stuff shrinks when burned on cd. Which to a normal user must be odd. So i doubt anyone (should) become upset because of using 1000 instead of 1024 in a size way.
However i doubt it will ever get done. Don’t underestimate the grey bearded community that still views gnome/kde as giant useless memory/cpu hogs full of bloat.
Computer Technology Student here and i agree.
(perhaps you should’ve just left the channel instead of attacking these people though).
Choose battles that have a point and that you have a chance of winning. Or get a life, either one.
just use the IEC 60027-2 standard – it’s the only unambiguous one currently. When you use KiB, MiB, GiB, at least you know for sure what you’re working with.
Using conventions that “nobody knows” is still better than conventions with overloaded meaning.
If most software starts adopting KiB, MiB etc, there will be pressure for the rest to conform, and GB will naturally take on the SI meaning.
BTW, for me K=1024 is more natural because of historical reasons. When I started out with computers, K was 1024 everywhere. Disks didn’t matter because all I had was RAM and ROM (measured in K=1024), and tapes, maybe. All the different systems I was exposed to back then used this convention.
Sometimes history and common sense clash. Ignoring either will probably get you nowhere.
Note that the “1024 scheme” got broken with floppy disks manufacturers, advertising 1.44MB 3.5″ disks that were actually 1440KB (with 1KB = 1024B), so 1MB was 1024000B…
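The mixed convention behind the "1.44 MB" floppy label can be reproduced in a few lines. This is just a sketch of the arithmetic described in the comment above; the variable names are made up for illustration.

```python
# A "1.44 MB" floppy holds 1440 KB, with 1 KB = 1024 bytes.
floppy_bytes = 1440 * 1024

# Marketing "MB": 1000 * 1024 bytes (neither decimal nor binary)
mixed_mb = floppy_bytes / (1000 * 1024)
# Decimal megabytes: 1000 * 1000 bytes
decimal_mb = floppy_bytes / 1000**2
# Binary mebibytes: 1024 * 1024 bytes
binary_mib = floppy_bytes / 1024**2

print(floppy_bytes, mixed_mb, decimal_mb, binary_mib)
```

The same 1,474,560 bytes are 1.44 in the mixed unit, about 1.47 in decimal megabytes and about 1.41 in binary ones.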
I think that the format should be 1000 whichever way you show it. I’m personally sick to death at having to guess what unit is used and I think that we should have gotten rid of the 1024 after we rose above the megabyte mark. It’s just a legacy of not wanting to deal with floating points.
I’m mostly a user but I do program, but mostly I don’t have time for that unless I’m working and not in school.
I do think there should be a gconf option for people that are still stuck in their ways.
And regarding the IRC, chill, people don’t have to agree with you all the time.
Not that my word means anything to anyone, but I think k-, M-, G-, … should be used as defined in SI, i.e. in powers of ten, but since computers are (usually) binary-based, it makes sense to use powers of two as well – and for that we have KiB, MiB, GiB, … Just set up the conventions (whatever you deem to be the right conventions) and use them everywhere, otherwise they are useless. KB is a misuse, and mB and gB are weird; those small letters would confuse people even more.
And not that I care much, but for *people* it’s more natural to count in powers of ten, so in gui I think we should use the SI prefixes unless there is a strong reason to use the binary ones.
But definitely I’d prefer it if you would get rid of JEDEC wherever possible, it’s just confusing.
Murray: it’s not that people don’t care, they just (non-geeks) assume K=1000. The discrepancy is never big enough for them to think something is wrong.
When the HD manufacturer uses K=1000 but their OS uses K=1024, they just assume that manufacturers advertise bigger drives than they are really selling.
No fucking way. Don’t even think of touching my measurement units. 2^10, 2^20, 2^30 were good in the 90s, they’re good now.
I don’t think people care for file sizes or disk sizes at all… At most they want to know whether some file fits on their USB key drive or not. The people who proudly look at the size of their new 500GB drive are the geeks and nerds who already know this 1000/1024 quirk 🙂
a few quotations from the post:
“In contrast, I claim that K=1024 it is just something for geeks and nerds and antisocial people who want to distinguish themselves from the masses by using conventions nobody understands.”
“I quit in grief – and as a social being, I went out. Shaking my head about those guys!”
Are you serious or ironical? Calling someone “antisocial” is not going to get you a meaningful discussion. Neither will “… who want to distinguish themselves from the masses …”.
I’ve been pushing for some time to have file sizes displayed using powers-of-10, even creating patches for glib and Nautilus to do so. But nobody else seemed interested, since K=1024 seems so ingrained into the programmer mindset. Besides giving one less thing for my non-programmer friends to remember about computers — when Kilo is “1000” or “1024” — it would finally help put to death that silly urban myth about HDD manufacturers.
I do disagree with one part of this post though — memory sizes should be measured with power-of-2 units, using the proper MiB/GiB suffix. In a few non-scientific tests on friends with power-of-10 RAM sizes, the different values + unusual decimal places caused them to think there is a bug in the system monitor. Using MiB lets the displayed numbers match what’s on the box (the only thing users care about).
Therefore, the solution I’ve adopted (and patched into my local GNOME installation) is to have two separate functions for rendering values — one for file/disk sizes (power-of-10), and one for memory (power-of-2).
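A two-function split like the one described above could look roughly as follows. This is a sketch in Python rather than the actual glib/Nautilus patch; the names `format_file_size` and `format_memory_size`, the unit labels and the rounding are assumptions for illustration.

```python
def _scale(num_bytes, base, units):
    """Divide by base until the value fits a unit (hypothetical helper)."""
    value = float(num_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.1f} {unit}"
        value /= base
    return f"{value:.1f} {units[-1]}"


def format_file_size(num_bytes):
    """File and disk sizes: powers of 10."""
    return _scale(num_bytes, 1000, ["B", "kB", "MB", "GB", "TB"])


def format_memory_size(num_bytes):
    """Memory sizes: powers of 2, with the IEC suffixes."""
    return _scale(num_bytes, 1024, ["B", "KiB", "MiB", "GiB", "TiB"])


ram = 2 * 1024**3
print(format_memory_size(ram))  # matches what is on the box
print(format_file_size(ram))    # the "unusual decimal places" case
```

For 2 * 1024^3 bytes of RAM the memory function prints 2.0 GiB, while the file-size function would print 2.1 GB, which is the kind of odd-looking value that made the test users suspect a bug.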
+1 Martin
(switch to 1000 Bytes = 1 KB, drop JEDEC, offer IEC kibi/mebi/etc. for purists)
fwiw, I didn’t find your blog very convincing though. I think you would have been better off just asking the question “Why don’t we use 1000 Bytes = 1 KB as the default unit instead of 1024?” and let people arrive at the logical & sensible conclusion both of us did. 😉
People like you make Martian probes crash.
When I say gigabyte, I really mean gibibyte, I just don’t want to sound like a retard.
Same goes for everyone else.
I agree that people don’t care, 1024 or 1000, people will read 65,1GB as “65G” and probably round that up to 70 anyway, that’s how people have always done it. Sizing the icons proportionally would probably be better than stating the size precisely, people just want to know if it will fit or not 🙂 .
It’s all part of a secret plan to turn users into programs…. powers of 2 FTW !
Agreed. The SI prefixes don’t change their meanings just because they’re being used on a computer. “Kilo”=”1000”. “Mega”=”1,000,000”, “Giga”=”1,000,000,000”. The end.
On the other hand, the binary measurements do have their uses. It’s useful sometimes, especially when doing low level driver hacking and such like, to have units of measure that are powers of two. This is similar to how particle physicists sometimes measure things in eV, for example – there’s already the Joule, which is the SI unit of energy, but the electron-volt is more useful for what they are doing. However, Kilo- is still Kilo-. If you want binary prefixes, use the Kibi-, Mebi-, Gibi-, binary prefixes.
Yes! Powers of ten please! I have always been confused with the 1024 for kilobyte, and with storage growing over the years it becomes worse and worse… my harddrive is now 1024 x 1024 x 1024 x 1024 bytes… that doesn’t make any sense at all.
I’ve been using computers all my life, so I’ve never had a *personal* problem with 1024 – but it has always felt a bit stupid, and more specifically, it has *always* been awkward to explain both how and *why* to others, no matter if they are non-technical or really good at say math or physics but not programmers. I would love for GNOME to take the lead in putting these things right. Go 1000!
As someone who’s only vaguely familiar with these subjects, I do know that generally a KB is believed to be 1024. I do believe that 1000 makes more sense, and I wish that everyone would use that standard, however. This is what I believe(d) with nothing more than what I read on your blog here.
Having done some research at this point though, and more thinking than I ever really wanted to do on the subject, I think it may be best to stick with 1024. I checked several sites using “KB” and “MB”, and they were assuming 1024. Western Digital’s site does use 1000 for their drives, but at the bottom says so, although not in terms anyone would really care about that isn’t already familiar with the issue. SanDisk also seem to use 1000 for their flash drives. (On their website at least.)
More importantly than what’s technically right or wrong though, what does a user expect and care about? Using 1000 does make HDs sized as advertised, and flash drives (some apparently, at least) larger than advertised. It also makes files listed using 1024 sizes seem larger than advertised when measured in 1000, however. So a file downloaded from the web will be getting stored on a disk that is as-advertised or larger, but the file itself will be taking up more space. Files copied between Windows and Linux will appear larger on Linux. I don’t know about OS X unfortunately, but would love to hear from someone that does.
I think it would be better to be technically wrong but meet user expectations and follow accepted conventions than it would be to be technically right, but introduce possible confusion and appear to be technically worse. I would keep the “wrong” version of KB before continuing to use KB by default but making it 1000.
If this was my decision to make, I would use the “KiB”, “MiB”, “GiB” measurements by default. Many people probably won’t care, and the sizes they’re actually getting will be consistent with their expectations. Those who do notice and care about the odd label may actually look into it and find out what it all actually means. Hardcore geeks will probably already be familiar with it, and I have found places online which measure files using the “GiB” label.
I would love to hear why I am wrong, for the record. Especially if OS X does in fact use 1000 for KB.
Checked sites:
lugradio.org (season four episode one hq mp3 specifically)
download.com
random legitimate torrent site
Hello,
I’m a geek myself and knew about that K=1024 for a long time. Here are my comments:
1) The base 1024 is a hardware detail that should not be known by users.
2) KiB is stupid, nobody knows what it means, not even geeks.
3) Displaying more capacity is better than less, otherwise users will think that the capacity written on the USB Key is lying.
4) I observed that displaying less capacity is understood by some users as a deterioration of the hardware, just like bad sectors, etc…
5) My GF said “AH! That’s the reason!!! euh… no sorry I don’t understand a single word of what you said…”
+1 for K=1000 😉
i’m a user and a person who knows many computer-newbies …. 1000! It’s a really natural choice. k = 1000, whether k or K
Why not kilo etc. for HDD and gibibytes for memory?
But more importantly, when writing MB it has to be the powers of ten, not 2.
Otherwise use MiB in the UI!
It doesn’t matter if ‘1024’ is unintuitive, it’s what it is. RAM *has* to be measured in powers of 2, as do any storage media that use flash chips (as you rightly state). Hard drives don’t have to be measured in powers of 2, but they should be for consistency with the other two, and saying that they shouldn’t is accepting the marketing gimmick that allows what is essentially false advertising. People that care enough to calculate filesizes starting from bits and bytes (and, by extension, kilobytes) care enough to know how many smaller units are in the larger, and, as with so much else in computing, you just have to learn it (it’s not exactly obvious what a mouse is for just by looking at it).
If it’s really that hard, Google “10 megabytes in kilobytes” for the right answer.
I have a background in physics, and the constant and ongoing online abuse of the SI system bothers me to no end. Also, as Martin correctly points out: capitalization matters!
k is kilo
K is kelvin
m is milli
M is mega
b is bit
B is byte
s is second
S is siemens
For instance if someone is talking about bandwidth, and uses the term “mbps” it’s really difficult to take that person seriously. The “m” is obviously intended to be “mega” but what’s the “b”?
The most elegant solution is of course to start using the standardized binary prefixes (Ki, Mi, Gi, etc.).
yeah, k=1000 should be used! I really don’t see any reason to keep 1024 going.
Also then a terabyte would be 1024^4, which is around 100 gb more than a tb of 1000^4, so as capacities grow every vendor will eventually switch to k=1000.
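The growing gap between the two conventions can be tabulated with a short Python sketch. The helper name `binary_excess_percent` is made up for this example.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def binary_excess_percent(power):
    """How much larger 1024**power is than 1000**power, in percent."""
    return (1024**power / 1000**power - 1) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(f"{name:>4}: {binary_excess_percent(power):5.2f} %")

# Absolute difference at the terabyte level, in bytes
tera_gap_bytes = 1024**4 - 1000**4
print(tera_gap_bytes)
```

The gap is 2.4 % for kilo and just under 10 % for tera, where it amounts to 99,511,627,776 bytes, i.e. the "around 100 gb" mentioned above.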
Also you are probably one of those people, like myself, that should avoid irc.
It’s wrong to redefine an accepted term, regardless of ideological/consistency-fetish issues. It only leads to confusion. It has been done two times — first, by redefining kilo to mean 1024 in the context of bits and bytes. Second, by re-redefining kB to mean 1000B. In the first case there was not too much confusion — everyone learning computers was told it means 2^n and that’s it. Nobody ever used k to mean 1000, so while there was a quirk for newcomers, it only had to be learnt once and from then on they would know exactly what it means. The second case is a lot worse: now nobody can truly know what is meant by kB.
Personal attack: And it’s thanks to people like you, blog poster 😉
1) This wasn’t Microsoft’s idea.
2) It’s a convention. Like assuming log() is base 2 in computer science.
Thank you for fighting for this.
The 1024 thing is so dumb. The number displayed is for humans to read, so why choose to use a system that people are not familiar with? People should not have to adapt to computers. Exposing random technical details to the user is just bad design.
So what if ram has to be measured in powers of two. That is an implementation detail that regular users do not care about. Why would a regular non technical user care that the current way ram is implemented requires it to have a size in bytes that is a power of 2?
And yes it is way harder to google “10 megabytes in kilobytes” than it is to move a decimal point in your head. One is a simple placewise conversion that we have been taught since grade school, is really easy to do in your head, and have reinforced throughout our daily lives. And one involves either dividing large numbers in your head, or using some technological means to have something compute it for you.
I’m surprised you just happened upon this issue, this is an old one.
The blunt truth is that 2^10 is a very useful number for storage, not counting overwhelming historical convention. Hard-disk manufacturers are the only ones measuring storage in base 10. Absolutely the only ones.
However, note that network speeds at the device level are base 10. 100 megabits per second ethernet cards work at a theoretical low-level speed of 100,000,000 bits per second (then you subtract the overhead of the various layers). Of course most people (including me) like to see transfer rates in applications in KiB/s.
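As a rough illustration of the mix of units in the network example, the following Python sketch converts the theoretical 100 megabit line rate into the units applications tend to show. Protocol overhead is ignored, and the variable names are assumptions for this example.

```python
# 100 megabit/s ethernet, decimal prefix at the device level
bits_per_second = 100 * 1000**2

bytes_per_second = bits_per_second // 8
mb_per_second = bytes_per_second / 1000**2    # decimal MB/s
kib_per_second = bytes_per_second / 1024      # binary KiB/s
mib_per_second = bytes_per_second / 1024**2   # binary MiB/s

print(bytes_per_second, mb_per_second, kib_per_second, mib_per_second)
```

So the same link is 12.5 MB/s in decimal units, but roughly 12207 KiB/s or 11.9 MiB/s in binary units, before any overhead is subtracted.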
I’m a fan of using KiB and KB, but I think there is merit in discussion of it, I’m not absolutely sure that’s the best way. I do however think that the idea that using base 10 for everything is logical and the obvious user-friendly choice is in fact the silliest idea in the debate. Everyone’s files would now have a different size than they’ve had on every remotely mainstream operating system in the last 20 years, over what is merely a convention.
If you think users should have an abstraction away from that, perhaps the best one is hiding the filesize altogether, and showing “% of device used” or something.
If your argument is that K=1024 is wrong because people have needed to be taught that, point me to a person who intuitively knew without being told that K=1000 for other things.
Well, just because RAM has to be measured in powers of 2 doesn’t mean that it has to be labeled with MB/GB. Using the proper units will probably not cause a lot of chaos. XiB = 1024, XB=1000. People will ask 1 time before they get it. About as many times as they ask why their new harddrive is nowhere near the size they paid for.
As a user I find the 1K = 1024 convention confusing, but I think that having to deal with different conventions for different programs would be much worse… 99% of the users don’t care if 10M is 10’000’000 byte or 10’485’760 and the ones who care would be really confused/annoyed to see that the size is not the expected one.
My 0.02 $
1000, 1024, 1050, 1012, 989… it doesn’t matter.
What people do care about is that it matches what they expect, i.e. what’s most used outside of GNOME. So when they go buy a camera or device of some sort that can hold X, the system also says that its capacity is X.
In fact, I would guess most people aren’t even too interested in knowing their device can hold X most of the time. It is probably more interesting to them how many more pictures they can put on the camera. Or how many percent storage there is left on a disk. Etc etc…
In truth, I find this whole discussion a little bit… pointless. 🙂
But if you have to choose between 1000 or 1024, go with what’s mostly used by the rest of the world. Not what some standard says nor how it has been historically.
You get a gold star if you don’t show any X and instead show what people actually care about. 😉 (And yes, sometime that might be kilo/mega bytes but I’m guessing mostly, it is not.)