Turbo TrueCrypt 5.0 - AES in assembler
Two days ago the long-awaited new version of TrueCrypt was released. The new version supports pre-boot authentication and is available for Windows, Linux and Mac OS X.
The developers claim that read/write speed has improved by up to 100% compared to the previous version, 4.3. In this post I will describe how to get an additional performance improvement by replacing TrueCrypt’s AES implementation (which is C code) with a highly optimized assembler version.
TrueCrypt uses the AES implementation from Brian Gladman. On his home page you can download the full source code. The archive contains a C implementation and assembler implementations for the Intel x86 and AMD x86-64 architectures. To compile the assembler sources you will need the YASM assembler, which can be freely downloaded here.
The good news is that the AES-ECB encrypt and decrypt functions have the same interface as the C version which is used in TrueCrypt. So you don’t need to adapt the interface to get it working :-).
Let’s get hands-on! You will need the following compilers and development kits in order to compile TrueCrypt for Windows:
- Microsoft Visual Studio 2005 with SP1
- Microsoft Visual C++ 1.52
- Windows Driver Development Kit (DDK) Vista Build 6000
After installing the compilers and the Vista DDK, you will need to set two environment variables:
- ‘MSVC16_ROOT’ should point to the installation directory of MS Visual C++ 1.52
- ‘WINDDK_6000_ROOT’ should point to the installation directory of the Vista DDK
Download the Windows version (with CR/LF line endings) of the TrueCrypt source code here.
Unzip the ‘TrueCrypt 5.0 Source.zip’, open the ‘TrueCrypt.sln’ solution file in Visual Studio 2005, select ‘All’ as the active solution configuration and build the project.
As mentioned above, you will need the YASM assembler to compile the AES assembler implementation. Unzip Brian Gladman’s AES implementation and assemble ‘aes_x86_v1.asm’:
yasm-0.6.2-win32.exe -f win32 aes_x86_v1.asm
There is also a 64 bit implementation ‘aes_amd64.asm’ provided in the package. I did not build an optimized TrueCrypt 64 bit Windows driver yet. I also did not try to compile the Linux or Mac OS X versions. The steps are very similar and YASM is available for other platforms and does support other object formats. You simply need to specify a different object format by a yasm command line switch (e.g. -f win64 to build a 64 bit COFF object from the ‘aes_amd64.asm’ listing).
Rename the ‘aes_x86_v1.obj’ to ‘aescrypt.obj’. Copy this file into your TrueCrypt source code directory under ‘TrueCrypt/Crypto/Release’. Please note that you will need to overwrite the existing ‘aescrypt.obj’ which was built earlier from the C code.
If you now build the solution in Visual Studio, it will link the new AES object file into ‘TrueCrypt.exe’. Please note that the new object file under ‘TrueCrypt/Crypto/Release’ will only be used when creating TC volumes, for benchmarking the performance and for testing the code against the test vectors. If you run the freshly built ‘TrueCrypt.exe’ and run the benchmark, you will see a performance increase of around +15% to +40%, depending on your hardware.
However, we want this performance in the TrueCrypt driver, so we also need to assemble ‘aes_x86_v1.asm’ for the TrueCrypt driver.
This task is a bit more complicated: you will need to modify the assembler source code to change the calling convention and name mangling to __stdcall. Open ‘aes_x86_v1.asm’ and search for:
; AES Encryption Subroutine
do_name _aes_encrypt
Change the function name in the above code into _aes_encrypt@12:
; AES Encryption Subroutine
do_name _aes_encrypt@12
At the end of this subroutine you will see the following code:
add esp,stk_spc
do_exit
Replace the ‘do_exit’ with ‘ret 0ch’:
add esp,stk_spc
ret 0ch
You need to do both changes for the ‘aes_decrypt’ subroutine, too.
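The ‘@12’ suffix and the ‘ret 0ch’ both come from the 32-bit __stdcall convention: the decorated name carries the total size of the arguments in bytes, and the callee pops that many bytes when it returns. The following sketch only reproduces that arithmetic. The helper names are made up for illustration, and it assumes that each of the three arguments (input block, output block, key context) takes one 4-byte stack slot, which is what the @12 implies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of bytes a 32-bit __stdcall callee pops,
 * i.e. the immediate operand of "ret N". Assumes 4 bytes per argument. */
static unsigned stdcall_ret_bytes(unsigned arg_count)
{
    return arg_count * 4u;
}

/* Hypothetical helper: builds the decorated name "_<name>@<bytes>".
 * Returns 0 on success, -1 if the buffer is too small. */
static int stdcall_decorated_name(char *buf, size_t buflen,
                                  const char *name, unsigned arg_count)
{
    int written;

    assert(buf != NULL && name != NULL);
    written = snprintf(buf, buflen, "_%s@%u", name,
                       stdcall_ret_bytes(arg_count));
    return (written >= 0 && (size_t)written < buflen) ? 0 : -1;
}
```

For three arguments this yields ‘_aes_encrypt@12’ and a pop count of 12 (0ch), matching the two edits above.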
After doing this, you can rebuild the ‘aes_x86_v1.obj’ with yasm:
yasm-0.6.2-win32.exe -f win32 aes_x86_v1.asm
Unfortunately, YASM doesn’t provide a switch which will add a linker comment to the COFF object, telling the linker that this file uses SAFESEH (safe structured exception handling). The ‘truecrypt.sys’ driver is linked with SAFESEH, so the linking step with the newly created COFF object will fail.
Rename the new ‘aes_x86_v1.obj’ to ‘aescrypt.obj’. Copy this file into your TrueCrypt source code directory under ‘TrueCrypt/Crypto/obj_driver_release’. Again, you will overwrite the existing ‘aescrypt.obj’ C version with the assembler version.
I couldn’t find a specification which explains how the information that an object uses SAFESEH is reflected in the COFF object. So, my solution to get the stuff linking was to disassemble the new ‘aescrypt.obj’ using IDA Pro in order to convert the YASM code into MASM syntax. Then I used ML.EXE (MASM), which is included in Visual Studio 2005, to assemble the listing generated by IDA Pro. MASM supports a /SAFESEH switch, which will add this information to the COFF object:
ml.exe -c -safeseh aes_x86_v1.asm
Rename the ‘aes_x86_v1.obj’ to ‘aescrypt.obj’ and copy it into the ‘TrueCrypt/Crypto/obj_driver_release’ directory.
You can now build the solution once more; it should link the optimized AES implementation and create a new ‘truecrypt.sys’ driver. You will need to overwrite ‘Truecrypt.exe’ and the ‘truecrypt.sys’ driver in TrueCrypt’s program files folder with the newly built versions. You will also need to replace the driver in the ‘Windows/system32/drivers’ directory.
That’s it! Simply reboot your machine and enjoy the improved AES performance in TrueCrypt.
You can download an archive which contains the modified object files here.
Enjoy!
53 Comments so far
Many people have already asked me for a precompiled version.
Unfortunately I’m not allowed to distribute the modified version of TrueCrypt. I respect their license, so I will not offer any precompiled binaries for download. And yes, I already asked the TrueCrypt foundation to use the assembly version from Brian Gladman, but so far I didn’t get any response.
So, if you want to benefit from the optimized AES implementation, here are 3 solutions for you:
If you plan to use DiskCryptor you might want to wait for the next version. I found a bug in the current 0.2.5 beta sources (buffer overflow when creating dynamic AES code) and already informed the author.
Thx! It worked nice. 100 MB Buffer with “normal” TrueCrypt: AES 56 MB/s
and the ASM-Version: 92,9 MB/s
That’s really heavy!
At first I couldn’t believe it:
http://www.webtemp.org/images/tc5.png
My notebook hard drive. You can actually *see* the encrypted system partition.
Now with your modification:
http://www.webtemp.org/images/tc5asm.png
Heavy speed.
But the fastest original TC algo on my 3 GHz AthlonXP (wtf?) and my 3.2 GHz C2D is… TWOFISH.
The slowest is SERPENT (TC w/o cascades) and the one which I never use is… AES. Slower than Twofish (256-bit) and only with “sufficient” security (NIST). Twofish and Serpent have “very high” security.
Have you also found good asm versions of Twofish? I mean… trustworthy ones :-\ Gladman is one. And perhaps you are one too.
Please don’t forget, guys: the biggest weakness in cryptography is not the method, but the IMPLEMENTATION!
Truecrypt 5 isn’t faster on my E6600 than Truecrypt 4.
Really fast is DiskCryptor http://freed0m.org/?index=dcrypt (russian), http://www.google.com/translate?u=http://freed0m.org/?index=dcrypt&langpair=ru%7Cen&hl=en&ie=UTF8 (google translation)
It is compatible with Truecrypt 4.3
Original Truecrypt 5:
Enc: AES 66,0 MB/s Dec: 54,7 MB/s Mean: 60.3 MB/s
Modified Truecrypt 5:
Enc: AES 82,0 MB/s Dec: 81,7 MB/s Mean: 81,8 MB/s
Could you please post the listing generated by IDA Pro?
Wow, that is really a very useful enhancement of TC - thanks for your work!!!
Can I use the truecrypt.sys for 64-bit Vista? I guess not. Could you make a precompiled 64-bit .sys, too? It would be very much appreciated.
I was searching for “Microsoft Visual C++ 1.52” and saw it’s a very old compiler. I pointed the environment variable to the C++ compiler in VS2008, but then I get an error during compilation. Can you tell me if I really need this very, very old compiler?!
Thanks a lot!
TC5 - normal:
http://img352.imageshack.us/img352/752/beforehu7.bmp
TC5 - mod:
http://img183.imageshack.us/img183/6207/aftercb2.bmp
Original TC 5:
Enc: 80,2MB/s Dec: 65,4MB/s Mean: 72,8MB/s
Modified TC 5:
Enc: 98,3MB/s Dec: 99MB/s Mean: 98,7MB/s
[E6400 @ 2800MHz, 2GB RAM @ 875MHz]
Thanks for this modification!
@Frank:
You will need Microsoft Visual C++ 1.52 to compile the full TrueCrypt project. VC++ 1.52 includes a 16 bit compiler and assembler for DOS. This is required in order to compile the boot sector and loader.
However, it is not required to compile the drivers or main application. Simply remove the boot loader from the project and you will not need it.
TC 4.3 - 68.2/55.3
TC with AES-ASM - 75.1/73.2
DiskCryptor 0.2.5 - 111.4/108.6
would be great if someone could give a tutorial for doing this on linux..
or provide a precompiled package for linux..
thanks and greetz
@iceman:
aes.h
#ifdef ASM_CRYPTO
typedef void (fastcall *aescode)(unsigned char *in, unsigned char *out);
#define aes256_decrypt(in, out, key) ((aescode)(key)->dk_code)(in, out)
#define aes256_encrypt(in, out, key) ((aescode)(key)->ek_code)(in, out)
#else
typedef void (*aescode)(unsigned char *in, unsigned char *out, aes256_key *key);
void aes256_decrypt(unsigned char *in, unsigned char *out, aes256_key *key);
void aes256_encrypt(unsigned char *in, unsigned char *out, aes256_key *key);
#endif
@Peter:
Wow … code is dynamically generated. The AES round-keys get injected in dynamically generated code.
gen_cipher_coder() is some real Russian hacker code: 0x4567xxxx 0x1234xxxx
Yes, this should improve performance.
Hi,
is there an optimized 64 bit version around? I downloaded your 32 bit and it works fine on my wife’s notebook, now I need something for my home pc :]
I tried a 10 MB benchmark using TC 5:
(My machine: Core 2 Duo, 2GHz)
asm version:
encr: 65,2 MB/s
decr: 64,9 MB/s
c implementation:
encr: 53,0 MB/s
decr: 44,5 MB/s
Hi,
excellent contribution! The performance boost averages about 20-40% (for instance 80 MB/s instead of 60 MB/s). Unfortunately I am not a C++ programmer and don’t have a C++ compiler. But if you continue your work and adapt the assembler AES code to future versions of TrueCrypt as well, your site will become very valuable and important for everybody, especially as privacy becomes more and more important!
Thank you very much and best regards,
Jost
Thank you very very much
Is it (pleeeasssssse %-) possible,
that you compile a 64-bit version for Vista?
Thanks in Advance
Great Job!
Speed is crucial if one intends to encrypt the system partition, so for that it is indeed very valuable.
However, I do think that TC programmers have a valid point to weigh legibility over performance.
CPU: CoreDuo T2300 @ 1.6GHz
(tested with 200MB Buffer Size)
“c version”:
Enc.: 38,1 MB/s
Mean: 41.8 MB/s
“asm version”:
Enc.: 55,8 MB/s —> ~46% improvement!
Mean: 55,9 MB/s —> ~34% improvement
(TwoFish; Mean: 40,4 MB/s)
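For anyone who wants to compare their own benchmark numbers the same way, the percentages are plain relative improvement. A minimal sketch follows; the function name is made up, and it assumes the “c version” mean of 41,8 MB/s is the baseline for the 34% figure:

```c
#include <assert.h>

/* Relative improvement of `after` over `before`, in percent.
 * Hypothetical helper for comparing two benchmark throughputs (MB/s). */
static double improvement_percent(double before_mb_s, double after_mb_s)
{
    assert(before_mb_s > 0.0);
    return (after_mb_s - before_mb_s) / before_mb_s * 100.0;
}
```

With the figures above, improvement_percent(38.1, 55.8) comes out between 46 and 47, and improvement_percent(41.8, 55.9) between 33 and 34.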
Thanks man I gonna test it this weekend when I’m reinstalling my System (and a BSOD won’t hurt).
I already have done some testing with your precompiled 32bit binaries and can confirm that you indeed did a great job, everything is working flawlessly!
You might be interested in the fact that I used your precompiled binaries for a massive full system drive encryption (300GB SATA RAID 0). I got no errors at all, everything is working as it should!
I was a little nervous as I ran the encryption (glad I had some balls in my pants and ran it anyway, because encrypting the drive worked flawlessly).
Next thing I did, as the encryption finished: I fired up some huge file copy testing including huge amount of data (15- 17 GB of data, copied several times vice versa).
Therefore I used an external drive, that was encrypted by using a triple cascade (done a while ago through TC 4.3a). First thing I noticed was a huge speed improvement of the external drive that had the old encryption! Apart from that, even the old TC 4.3a AES encryption seems to be 100% compatible with your assembly version. The copying tests also turned out to be not a problem at all.
By far the most impressive thing is the AES speed of my system drive: using the regular TC binaries, I got a maximum mean speed of 57 MB/s. Now, using your ASM version of AES, it’s 100 MB/s!
Been using your precompiled binaries for about two days now and did not have a single problem!
Excellent job you’ve done mate, congrats on your assembly version of the aes algo!
Thank you for your effort and please: Keep up the good work for upcoming versions of TC!
One strange thing you should also know about: If I create a rescue disk using your precompiled binaries, it has a lot of data missing if I do a hex compare with rescue disks made by the original TC version!
Maybe you should have a closer look at this yourself!
That’s the only strange thing I have discovered so far, everything else is working great!
Thanks again - You rock (really I mean it)!
I do get 80 MB/s with the ASM implementation on an AMD64 X2 4400, which is a 100% speed increase, BUT only with buffers > 5 MB. For 1 MB (and smaller) the ASM version is slower than the C version from vanilla Truecrypt, at least on my machine. If someone cares to check on their machine, I would be very interested in the results.
The Question is, which Buffersizes are actually relevant in daily Truecrypt usage?
Cascades with AES seem to be faster though, even with small buffer sizes.
Yes, great job brother, my system has an improvement of nearly 300% (34 MB/s to 96 MB/s). Don’t know why, but the two cores are used much less, and there are no peaks anymore. I wish I had had that over a year ago. My movies never played smoothly; sometimes there were “hiccups” when the peaks appeared and the buffer ran out. That’s history, thanks to you.
I don’t really write comments often, but your blog is simply great
Hopefully more posts will follow
Best regards, simon
Just reinstalled Windows XP x64 and the AES encryption is running. A benchmark with 5 MB reported 120 MB/s. I’ll let you know if I experience any problems with it…
Great,
it’s really fast - without problems. Will there be a precompiled version with this “turbo” of TrueCrypt 5.0a?
@ iceman:
TC 5.0a is out. Would be great to see your updated binaries. Thanks again!
Same here, I need the “new” x64 binary then. Thank you
Thanks, didn’t see it
@Br0ken
Did the rescue-disk work?
@sluggish:
I suppose it’s working now from what I can say doing a rough hex compare between 5.0 and 5.0a ISO … A lot of data has changed, looks like 5.0a Rescue Disk is much more complete now!
I’ll run some tests on it in ~ 2 hours!
Hello There,
Does anybody have a good Linux-Howto?
Thanks!
Rescue Disks created by TC 5.0a are working flawlessly now!
But I got another problem, if someone please could help me out: I’m looking for WDK 6000 to compile TC 5.0a by myself, but the only thing I can find is 6001 which gives me errors if I try compiling the drivers in mvs 2005 …
WDK 6000 download is gone at the microsoft website, anyone got a clue where to still get it?
I don’t want to take away your pleasure with this version, but what you are doing is so dangerous.
TrueCrypt normally operates in XTS mode. Thanks to your version, it is now ECB, the worst of all modes.
http://en.wikipedia.org/wiki/Electronic_code_book
“In some senses, it doesn’t provide serious message confidentiality, and it is not recommended for use in cryptographic protocols at all.”
There is no plausible deniability at all with this version. Identical plaintext blocks will be stored as identical ciphertext blocks.
It was nice that you wanted to help, but your modification affects security very severely.
@Dangerous:
The modified version still operates in XTS mode. Any block-cipher operates in ECB (Electronic CodeBook).
You have a certain key size and a certain data block size, and the cipher performs an N-bytes in -> N-bytes out operation.
A block-cipher mode of operation (e.g. CBC, CFB, OFB, LRW, …, XTS) is working on top of the ECB operation of any block-cipher (e.g. Rijndael=AES, Serpent, Blowfish, DES etc).
So, this is still valid for the modified version of TrueCrypt. If that were not the case, how could you mount an existing TC container with the modified version of TrueCrypt?
The change does not affect security in any manner - it just improves performance significantly by using a faster AES operation.
Optimizing the XTS operation would also increase the performance a bit, but you can’t save as many cycles there as on the AES operation itself.
Hope this explains how things are working
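To make the layering concrete, here is a small sketch. It is NOT AES and NOT real XTS: the “cipher” is a toy byte transform, and the tweak is handed in by the caller, whereas real XTS derives it from the sector number and block index. All names are made up for illustration. It only shows two things: the mode sits on top of a one-block primitive, and swapping that primitive for a faster implementation of the same function changes nothing in the output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16

/* One-block ("ECB") primitive: exactly one 16-byte block in, one out. */
typedef void (*block_encrypt_fn)(const uint8_t in[BLOCK], uint8_t out[BLOCK],
                                 const uint8_t key[BLOCK]);

/* Toy stand-in for the plain C implementation. NOT AES, NOT secure. */
static void toy_encrypt_ref(const uint8_t in[BLOCK], uint8_t out[BLOCK],
                            const uint8_t key[BLOCK])
{
    for (size_t i = 0; i < BLOCK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) * 5u + 1u);
}

/* Toy stand-in for the "optimized" implementation: same function,
 * computed through a lookup table instead of arithmetic. */
static void toy_encrypt_fast(const uint8_t in[BLOCK], uint8_t out[BLOCK],
                             const uint8_t key[BLOCK])
{
    static uint8_t table[256];
    static int ready = 0;

    if (!ready) {
        for (unsigned v = 0; v < 256; v++)
            table[v] = (uint8_t)(v * 5u + 1u);
        ready = 1;
    }
    for (size_t i = 0; i < BLOCK; i++)
        out[i] = table[in[i] ^ key[i]];
}

/* Simplified tweaked mode in the XEX/XTS style: C = E(P xor T) xor T. */
static void tweaked_encrypt(block_encrypt_fn enc, const uint8_t key[BLOCK],
                            const uint8_t tweak[BLOCK],
                            const uint8_t in[BLOCK], uint8_t out[BLOCK])
{
    uint8_t tmp[BLOCK];

    assert(enc != NULL);
    for (size_t i = 0; i < BLOCK; i++)
        tmp[i] = in[i] ^ tweak[i];
    enc(tmp, out, key);
    for (size_t i = 0; i < BLOCK; i++)
        out[i] ^= tweak[i];
}

/* Convenience comparison for two blocks. */
static int blocks_equal(const uint8_t a[BLOCK], const uint8_t b[BLOCK])
{
    return memcmp(a, b, BLOCK) == 0;
}
```

Encrypting the same plaintext block with two different tweaks gives different ciphertext, while toy_encrypt_ref and toy_encrypt_fast give identical ciphertext for the same tweak.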
@Broken:
Which errors do you get when compiling the project? I don’t think the error is related to the WDK version you are using.
Hello, can you write a Howto for Linux and OSX Leopard?
Or if any Howto exist can you give us the URL.
Thank you, very great job!
@iceman:
No problem at all using WDK 6001, it was my fault
Source code only needed minor modifications to be compiled properly using MVS 2005 SP1 and WDK 6001 under Windows Vista.
PS:
You’re absolutely right in what you wrote about XTS mode!
BTW:
Why do you recommend using Gladman’s new assembly version of the AES if TC 5.x has to be compiled under MVS 2005 SP1?
It’s pretty much the same source code, with one exception:
Gladman Source Code (16-04-07) (MVS 2005)
section .text align=32
Gladman Source Code (17-01-08) (MVS 2008)
section .text
As far as I know:
“section .text” is aligned to a multiple of 32 by this command. Please have a look at it and explain whether and how it may affect the application’s stability!
“Beg the developers of TrueCrypt to link against Brian Gladmans AES implementation and hope that they change their mind.”
There doesn’t seem to be much chance of that happening. Whole threads are being deleted from the Truecrypt forums where folks dared to even discuss such an idea.
iceman, I was not informed of a buffer overflow.
There is no buffer overflow when creating the dynamic AES code. The decryptor code size is 3572 bytes, the encryptor code size is 3596 bytes, and the code buffer is 4096 bytes.
However, there is a small bug in the line “a = p32(s)[0]; b = a & 0xff;”. This code reads 3 bytes of data past the end of the static data array. It is not a critical bug because it does not affect how the program works. My AES implementation passes all AES test vectors and is fully compatible with TrueCrypt encrypted volumes.
This small bug will be fixed in the next version.
@ntldr:
I emailed you a couple of days ago about a glitch in the gen_cipher_code() function in sys/aes.c.
You are using a do/while loop until srclen is 0. You are right about buffer size and input length: the buffer size is 4096 bytes (1 page), srclen is only 3572/3596 bytes.
However, when you find the magic 0x4567 or 0x1234, you move the dst and s pointers by 4, but you don’t decrement srclen by the additional 3.
This way you copy too much: if I’m not wrong, (16 + 60) * 3 bytes = 228 bytes too many are copied. The total is still less than 4096 bytes, so I agree that this bug is not severe.
However, I implemented your code for some tests in TrueCrypt and found this buffer overflow (my buffer size was sizeof encryptor/decryptor and not 4096 ;-)).
To get it working I changed the code to perform while (s < sEnd) and calculate sEnd before looping.
This is my fixed code:
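static void gen_cipher_code(u8 *dst, void *src, int srclen, u32 *rk)
{
    u32 a, b;
    u32 off;

    /* copy AES code to code buffer */
    memcpy(dst, src, srclen);

    /* patch round keys and all AES table references in code */
    /* reconstructed: the loop header and the two assignments below are
       missing from the posted listing and are inferred from the discussion */
    for (off = 0; off + sizeof(u32) <= (u32)srclen; )
    {
        a = p32(dst + off)[0]; /* candidate 32-bit placeholder */
        b = a & 0xff;          /* index into the table list or round keys */

        switch (a >> 16)
        {
            case 0x4567: /* patch AES table reference */
                ppv(dst + off)[0] = rel_tab[b];
                off += sizeof(void*);
                break;
            case 0x1234: /* patch round key */
                p32(dst + off)[0] = rk[b];
                off += sizeof(void*);
                break;
            default: off++;
        }
    }
}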
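The idea discussed here, copying a code template and then scanning it for magic placeholders while bounding the scan by the end of the buffer instead of a separately decremented length, can also be shown in isolation. The sketch below is not DiskCryptor’s code. The function name, the fixed 0x1234 marker handling and the index mask are assumptions made for illustration, and it patches round-key slots only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAGIC_ROUND_KEY 0x1234u /* high 16 bits mark a round-key slot */

/* Copies `len` bytes of template code to `dst` and replaces every 32-bit
 * placeholder 0x1234nnnn with round_keys[nn & 0xff].
 * The scan stops as soon as fewer than 4 bytes remain, so it never reads
 * or writes past dst + len. Returns the number of patched slots. */
static size_t patch_round_keys(uint8_t *dst, const uint8_t *tmpl, size_t len,
                               const uint32_t *round_keys, size_t key_count)
{
    size_t off = 0;
    size_t patched = 0;

    memcpy(dst, tmpl, len);
    while (off + sizeof(uint32_t) <= len) {
        uint32_t word;

        /* memcpy avoids unaligned access when reading the candidate word */
        memcpy(&word, dst + off, sizeof word);
        if ((word >> 16) == MAGIC_ROUND_KEY) {
            size_t index = word & 0xffu;

            assert(index < key_count);
            memcpy(dst + off, &round_keys[index], sizeof(uint32_t));
            off += sizeof(uint32_t); /* skip the slot just patched */
            patched++;
        } else {
            off++;
        }
    }
    return patched;
}
```

Because the loop condition depends only on the offset and the buffer length, skipping 4 bytes after a patch cannot desynchronize the bound, which is the problem described for the srclen-counting version.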
The SAFESEH stuff is very trivial: Just add NO_SAFESEH=1 to all SOURCES files.
Oh, and one more thing:
do_exit already specifies the ret %x code, you just need to add the define DLL_EXPORT to the yasm/nasm command line.
Going even further, why not use aes_x86_v2.asm and add the appropriate definition in aesopt.h?
Oh, just one last thing: The TrueCrypt license does indeed allow you to distribute the modified version, but you’re required to remove all references to the product name “TrueCrypt”. That is, you just need to make the GUI print a different name (f.e. “FastCrypt”), and in the Readme you should mention that it’s based on TrueCrypt, but one shouldn’t harass their developers with questions/support/bugs.
I’ve now come so far to have replaced the Serpent implementation with Gladman’s implementation, and removed all the legacy stuff (BlowFish, DES, 3DES, SHA1, CAST, AES-LRW). Including various optimizations, the driver is still down to 171 KB, and of course amazing speed. I also included some fixes for some errors (well, even vulnerabilities) which I’ve reported to the developers.
“I’ve now come so far to have replaced the Serpent implementation with Gladman’s implementation, and removed all the legacy stuff (BlowFish, DES, 3DES, SHA1, CAST, AES-LRW). Including various optimizations, the driver is still down to 171 KB, and of course amazing speed. I also included some fixes for some errors (well, even vulnerabilities) which I’ve reported to the developers.”
You planning to share the goodness (=driver) with us?
@ anonymous / iceman
Interesting to be able to follow the somewhat technical discussion here - may the less gifted hope for a ready made “FastCrypt” version?
Anyone tested “FastCrypt” or is there a download for testing it?
Hi all,
seems that TrueCrypt foundation has implemented AES in ASM in its latest Version 5.1 - so, let’s thank Andreas for kicking off the process!
It seems they included ASM-AES in the newest (5.1) version of Truecrypt … so no need to fiddle with the sources (thank goodness!); but thanks 1000 times for getting the process started!