Algorithms - Encryption/Cryptography
Wiki defines the encryption - "Encryption is the process of transforming information (referred to as plaintext) using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key".
As long as both sides of communication have the correct cipher/algorithm, they can decode/decipher any message the other sent.
Encryption systems belong in one of two groups:
- Symmetric-key encryption or Private-key encryption
- The encryption and decryption keys are the same.
- Thus communicating parties must agree on a secret key before they wish to communicate. In other words, both ends must know the key before communication can take place. This implies that there is a secure channel to pass the key between the two ends. If a secure channel existed there would be no need for encryption in the first place.
- Secret-key cryptography operates about 1000 times faster than public-key cryptography.
- Data Encryption Standard (DES): The most common Symmetric-key encryption scheme used today.
- Advanced Encryption Standard (AES): The official successor to DES since December 2001.
- However, it is important to note that symmetric-key still plays a major role in the implementation of a Public-key encryption.
- Asymmetric-key encryption or Public-key encryption
- It was first described publicly by Stanford University professor Martin Hellman and graduate student Whitfield Diffie in 1976. Their paper described a two-key crypto system in which two parties could engage in a secure communication over a non-secure communications channel without having to share a secret key.
For a simplified algorithm, How come the private key is not involved in communication? - Simple description:
Suppose I tell you that I have two prime numbers, 3 and 7, and that I want to calculate the product; it should take almost no time to calculate that value, which is 21. Now suppose, instead, that I tell you that I have a number, 21, and I need you tell me which pair of prime numbers I multiplied together to obtain that number. You will eventually come up with the solution but whereas calculating the product took milliseconds, factoring will take longer. The problem becomes much harder if I start with primes that have 400 digits or so, because the product will have ~800 digits. - from An Overview of Cryptography - For another concrete example, How come the private key is not involved in communication?
- Public-key cryptography uses asymmetric key algorithms where the key used to encrypt a message is not the same as the key used to decrypt it.
- Each user has a pair of cryptographic keys - a public encryption key and a private decryption key.
- In public-key cryptosystems, the public key may be freely distributed, while its paired private key must remain secret only known to recipient.
- The encryption key is public: that is, anyone (friend or foe) has access to the encryption key, and can encrypt messages.
- However, only the receiving party has access to the decryption key and thus is the only one capable of reading the encrypted messages.
- Sender don't need to know the recipient's private key, so there is no need for a secure channel to pass a message to the recipient.
- The two keys are different but mathematically related although knowledge of one key does not allow someone to easily determine the other key. In other words, the parameters are chosen so that determining the private key from the public key is either impossible or prohibitively expensive.
- A very popular public-key encryption programs are Pretty Good Privacy (PGP), GNU Privacy Guard (GnuPG or GPG), and Transport Layer Security (TLS).
- RSA: The first, and still most common named for the three MIT mathematicians who developed it - Ronald Rivest, Adi Shamir, and Leonard Adleman.
- Public key encryption also allows some measure of message authentication. If a person encrypts a message with their private key, then recipients can check that it came from the claimed sender by decrypting it with the sender's public key. If the message decrypts successfully then it must have been created by someone who knows the sender's private key - hopefully only the sender.
- Public key encrpytion is generally too slow to be used for extended exchanges, so once the ends of a connection have authenticated themselves to each other, a random symmetric key is generated and sent via public key encryption. This key is then used to encrypt the connection with a faster, symmetric cipher.
- It was first described publicly by Stanford University professor Martin Hellman and graduate student Whitfield Diffie in 1976. Their paper described a two-key crypto system in which two parties could engage in a secure communication over a non-secure communications channel without having to share a secret key.
I use TrueCrypt full-disk encryption on my laptop, makes me feel better when I travel with my laptop. If the laptop is stolen I feel confident they will only get hardware, they won't get the files. I also use Dropbox for backups of some critical stuff, but I don't want my critical stuff stored in the cloud unencrypted, so I use GPG to encrypt it before moving it into the Dropbox folder. Also, I use a program called KeePass to store usernames and passwords to websites. The encryption on the KeePass database itself is pretty tight. As long as I can remember the KeePass master passphrase, I'm not worried about loosing the other passwords. I think anything that you type a password into every day is fairly safe from forgetting, you don't forget a password you type every day. The real risk is when you make an encrypted backup that you won't look at for a year, you'll forget that password for sure. That's where KeePass really comes in handy. For more, visit Gizmodo Chatroom
Cryptography relies on having complex keys to encrypt your data, and obviously those keys should be random. But if you know anything about computers, you know they're horrible at generating random numbers. They just can't do it. Instead, they'll take obscure variables like your computer's clock time, and spin those out into something pseudorandom. If somehow you can find out the variable though, it's not random at all.
Mega, Kim Dotcom's big, flashy new copyright-dismantling file-sharing/storage site with encryption up the wazoo has finally launched. Mega's taking that a step further by adding you to the equation; the way you twitch your hand on the mouse, or how you type out your username will get wrapped into your cryptokeys as well. And those are variables that are unlikely to be traced and damn near impossible to reproduce.
For more, visit Random Data From Your Mouse and Keyboard To Beef Up Its Already Insane Encryption (Updated), Jan 19, 2013
The key in public-key encryption is based on a hash value which is a value that is computed from a base input number using a hashing algorithm. Actually, the hash value is a summary of the original value, and it is nearly impossible to derive the original input number without knowing the data used to create the hash value.
Good hash function should meet the following conditions:
- It should be very difficult to guess the input string based on the output string.
- It should be very difficult to find two different input strings having the same hash output.
- It should be very difficult to modify the input string without modifying the output hash value.
Input | Hashing | Value |
---|---|---|
2012 | Input x 317835289 | 639484601468 |
The example above is just to give the basic idea. However, actually the public-key encryption is much more complicated.
Hash functions
Hash Function | Hash Output Size (bits) | Secure? |
---|---|---|
MD4 | 128 | No |
MD5 | 128 | No |
SHA-1 | 160 | No |
SHA-256 | 256 | Yes |
SHA-512 | 512 | Yes |
More info on hash.
Though encoding and encryption are different, they share the similarities:
- Both transform data into another format.
- Both of them are reversible
In terms of the purpose they are different:
- Encoding
The primary purpose of encoding is to transform data so that it can be consumed properly by a different type of system, e.g. binary data being sent over email, or viewing special characters on a web page (URL Encoding, Base64 for example). The goal is not to keep information secret, but rather to ensure that it's able to be properly consumed.
Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it.
- Encryption
The purpose of encryption is to transform data in order to keep it secret from others, e.g. sending a secret letter that only the targeted receiver should be able to read, or sending a password over the Internet securely. Rather than focusing on usability, the goal is to ensure the data cannot be consumed by anyone other than the intended recipient(s).
Encryption transforms data into another format in such a way that only specific individual(s) can reverse the transformation. It uses a key, which is kept secret, in conjunction with the plaintext and the algorithm, in order to perform the encryption operation. As such, the ciphertext, algorithm, and key are all required to return to the plaintext.
No single algorithm is ideal for all situations but the following general principles apply (from http://technet.microsoft.com/en-us/library/ms345262.aspx)
- Strong encryption generally consumes more CPU resources than weak encryption.
- Long keys generally yield stronger encryption than short keys.
- Asymmetric encryption is stronger than symmetric encryption using the same key length, but it is relatively slow.
- Block ciphers with long keys are stronger than stream ciphers.
- Long, complex passwords are stronger than short passwords.
- If we need to encrypt lots of data, we should encrypt the data using a symmetric key, and encrypt the symmetric key with an asymmetric key.
- Encrypted data cannot be compressed, but compressed data can be encrypted. If we use compression, we compress data before encrypting it.
Here is the list of encryption algorithms.
The most simplest one is an encryption using symmetric key, and the following example shows how to encrypt/decrypt using GPG on linux. GNU Privacy Guard is a Free Software GNU GPLed implementation of the crypto standards OpenPGP and CMS (used by S/MIME). Also known as GnuPG or GPG. Almost all linux flavor have GPG installed by default.
For more info on GPG visit
- python-gnupg - A Python wrapper for GnuPG
- Using GPG to Encrypt Your Data - NASA
- SaltyCrane Blog
- Beginner's Guide to GnuPG
First we have a file, myfile which has simple text "My File", and want it to be encrypted.
$ gpg -c myfile Enter passphrase: Repeat passphrase:
Then, we will have addition file, myfile.gpg
To decrypt the file:
$ gpg -d myfile.gpg gpg: CAST5 encrypted data Enter passphrase: My File gpg: WARNING: message was not integrity protected
It prints out the file content to the screen, but if we want to save it to another file, then
$ gpg --output myfile.decrypted -d myfile.gpg gpg: CAST5 encrypted data gpg: encrypted with 1 passphrease gpg: WARNING: message was not integrity protected
Note that this encryption is using a symmetric cryptographic algorithm, one that uses the same key on both sides. In other words, we decrypt the file using the same passphrase that we encrypt with.
Sender - The one who encrypts a document
Recipient - The one who receives the encrypted document
Here is the simplest description of the process:
- Recipient send a public key to sender.
- Sender encrypt a document using the public key.
- Recipient decrypt the document using a private key.
1. Recipient should generate/export key.
import gnupg # gen_key() gpg = gnupg.GPG(gnupghome='~/gpghome') input_data = gpg.gen_key_input( name_email='receiver@company.com', passphrase='receiver_passwd') key = gpg.gen_key(input_data) # export_keys() ascii_armored_public_keys = gpg.export_keys(key) ascii_armored_private_keys = gpg.export_keys(key, True) with open('mykeyfile.asc', 'w') as f: f.write(ascii_armored_public_keys) f.write(ascii_armored_private_keys)
It will produce a keyfile mykeyfile.asc:
-----BEGIN PGP PUBLIC KEY BLOCK----- Version: GnuPG v2.0.14 (GNU/Linux) mI0EUOU/PAEEAK6BI1nHnMCerOEeS9CpemtEeXNIbJYDw9Jx+THmvmM+FmqUHgRM uSCBnEDT1fxnrtVn+577LYfun4hxQRUJ1hBB2f+2SudOGBVDTqnlXrSLsH1eMtki R/car7f1IOqHYYFN8sqPO3dwLj0HgY0frxDWs4k75EcyB1SYh8mseABnABEBAAG0 QEF1dG9nZW5lcmF0ZWQgS2V5IChHZW5lcmF0ZWQgYnkgZ251cGcucHkpIDxyZWNl aXZlckBjb21wYW55LmNvbT6IuAQTAQIAIgUCUOU/PAIbLwYLCQgHAwIGFQgCCQoL BBYCAwECHgECF4AACgkQ6NgbQaqJprFlEgQAlmFmb4SSOP4TjbWZpsGHvsoW4K7U Mqelhe/nrXxx9YO8QOkErzOQJXNQ9clnPX9T6mKEVuG3drmkJq3guzy997auBhYz ALHgc4dp6wab1G1C9nuGefU+yer0KT4+oMd8DFC3YbClV0TOtT5733lavu77NtiE gfRdt4fQebctkfU= =spP5 -----END PGP PUBLIC KEY BLOCK----- -----BEGIN PGP PRIVATE KEY BLOCK----- Version: GnuPG v2.0.14 (GNU/Linux) lQH+BFDlPzwBBACugSNZx5zAnqzhHkvQqXprRHlzSGyWA8PScfkx5r5jPhZqlB4E TLkggZxA09X8Z67VZ/ue+y2H7p+IcUEVCdYQQdn/tkrnThgVQ06p5V60i7B9XjLZ Ikf3Gq+39SDqh2GBTfLKjzt3cC49B4GNH68Q1rOJO+RHMgdUmIfJrHgAZwARAQAB /gIDAs6GUPxejXNS03rcnPxjG0NaxI7iPjZ9cuR0ohfHLI3b0rc4icYcub9OASSl pXLAXsaovgvwtUR7k4QA1uIhG+Lq/dL2QwPbEtMU9WTtMN+sUDXRVuGN5jnNYEJr Oz+SSPU5aFE5Ls8X/kxPgZbk/b56fQExP8PfdPRyF0ORA5Y5/Df2acFa8glbMXK2 Hfq1Yiv0VF58U9lAibs1nZGedZc2UCpWLqIFqcfGYxGaDnhbK3n1UPYfyy/oUuBA OVl0cofYjXPTIzTXzMI40f6PdpJbE6vVu26lmQ9M65jy4xvcAeyqZcFAkVQJGBb+ 1FWwV4hHCe1/FrKT4fqzlYwvvWzsUFobjpu5qx55Gs2jrGrrOCUpF5GKDZaaYDHW /DQbCP71FGy+kXzEOAy6A39hh9FVpkmXRd3fZswtP2eof1d7AGqYbfpNKsdDjHoG j302OSqLAUlPoRnKnjcOP6p09p0h0MF4dHkYirGeu8YktEBBdXRvZ2VuZXJhdGVk IEtleSAoR2VuZXJhdGVkIGJ5IGdudXBnLnB5KSA8cmVjZWl2ZXJAY29tcGFueS5j b20+iLgEEwECACIFAlDlPzwCGy8GCwkIBwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJ EOjYG0GqiaaxZRIEAJZhZm+Ekjj+E421mabBh77KFuCu1DKnpYXv5618cfWDvEDp BK8zkCVzUPXJZz1/U+pihFbht3a5pCat4Ls8vfe2rgYWMwCx4HOHaesGm9RtQvZ7 hnn1Psnq9Ck+PqDHfAxQt2GwpVdEzrU+e995Wr7u+zbYhIH0XbeH0Hm3LZH1 =DIY4 -----END PGP PRIVATE KEY BLOCK-----
In this example, though the receiver exported private-key file as well just for demonstration purpose,
we do not want expose that info to the outside.
2.1 Sender imports the key from the file just received.
import gnupg from pprint import pprint gpg = gnupg.GPG(gnupghome='~/gpghome') key_data = open('mykeyfile.asc').read() import_result = gpg.import_keys(key_data) pprint(import_result.results)
This will print out:
[{'fingerprint': u'629531E4B2A447D97A007C7BE8D81B41AA89A6B1', 'ok': u'0', 'text': 'Not actually changed\n'}, {'fingerprint': u'629531E4B2A447D97A007C7BE8D81B41AA89A6B1', 'ok': u'16', 'text': 'Contains private key\nNot actually changed\n'}]
2.2 The sender can encrypt a document using the public key from the receiver.
import gnupg gpg = gnupg.GPG(gnupghome='~/gpghome') open('my-unencrypted.txt', 'w').write('This will be the content of the file') with open('my-unencrypted.txt', 'rb') as f: status = gpg.encrypt_file( f, recipients=['receiver@company.com'], output='my-encrypted.txt.gpg') print 'ok: ', status.ok print 'status: ', status.status print 'stderr: ', status.stderr
If it ran successfully, we get the following message:
ok: True status: encryption ok stderr: [GNUPG:] BEGIN_ENCRYPTION 2 9 [GNUPG:] END_ENCRYPTION
So, the sender generated my-encrypted.txt.gpg from my-unencrypted.txt.
3. Now, the receiver can decrypt the file using his private key.
import gnupg gpg = gnupg.GPG(gnupghome='~/gpghome') with open('my-encrypted.txt.gpg', 'rb') as f: status = gpg.decrypt_file(f, passphrase='receiver_passwd', output='my-decrypted.txt') print 'ok: ', status.ok print 'status: ', status.status print 'stderr: ', status.stderr
If successful, the message should look like this:
ok: True status: decryption ok stderr: [GNUPG:] ENC_TO E94624CF0A1E0A55 1 0 [GNUPG:] USERID_HINT E94624CF0A1E0A55 Autogenerated Key (Generated by gnupg.py)[GNUPG:] NEED_PASSPHRASE E94624CF0A1E0A55 E94624CF0A1E0A55 1 0 [GNUPG:] GOOD_PASSPHRASE gpg: encrypted with 1024-bit RSA key, ID 0A1E0A55, created 2013-01-03 "Autogenerated Key (Generated by gnupg.py) " [GNUPG:] BEGIN_DECRYPTION [GNUPG:] PLAINTEXT 62 1357202776 [GNUPG:] PLAINTEXT_LENGTH 36 [GNUPG:] DECRYPTION_OKAY [GNUPG:] GOODMDC [GNUPG:] END_DECRYPTION
Finally, receiver decrypted the document and got my-decrypted.txt.
Note: among the two key identifiers, the passphrase was used only on recipient side for key generation and decryption.
How can we detect the encrypted file? I mean, how do we know if a file has been encrypted or not?
Here is the python code doing it for us. It's using unix command file profile --mime -b. As one of the sample types of file, I included plain text/encrypted/decrypted/pyc files:
import subprocess files = ['myfile.py','encrypted_myfile.gpg','decrypted_myfile','myfile.pyc'] for f in files: mimetype = subprocess.Popen(['file', f, '--mime-type', '-b'], stdout=subprocess.PIPE).stdout.read().strip() print "mime-type of %s = %s" %(f, mimetype)
Output from the run is:
$ python mimeType.py mime-type of myfile.py = text/plain mime-type of encrypted_myfile.gpg = application/pgp mime-type of decrypted_myfile = text/plain mime-type of myfile.pyc = application/octet-stream
Note: depending on the system, we may have to use just --mime instead of --mime-type.
This is a simplified algorithm of Whitfield Diffie and Martin Hellman's dual-key (public key) published in 1976. Short answer is that we can encrypt and decrypt with the same key without telling the other side what the the key is. That's because after some manipulation of prime number and mod operation (%), both sides can come up with the same key, and no need to share that information.
OK. Let's start.
- First of all, each side should agree with the two numbers: a base number and a big prime number.
- Then, each side should keep a secret (private) number.
- Suppose, the base = 3, the prime = 47.
- Receiver's private number = 4, and sender's = 8.
- Algorithm is like this:
The info they should send to the other side = base^private % 47 - So, the number from the receiver to the sender = 3^4 % 47 = 34.
- Likewise, the number from the sender to the receiver = 3^8 % 47 = 28.
- Then, after the following operations, they will have the same key number.
- The formula = (number they got from the other side)^private % 47.
- Receiver: 28^4 % 47 = 37.
- Sender: 34^8 % 47 = 37.
- Wow, they got the same number 37!
- That's the key that they can use to encrypt/decrypt.
- They do not have to inform that key to the other side!
- This is the primary advantage of public-key encryption over private-key encryption.
- Here, in this example, we used a very small prime number, 47. But we need to use huge (hundreds of digits of) prime number. With modulo 47, the key is between 0 - 46, and anybody can hack it.
Ph.D. / Golden Gate Ave, San Francisco / Seoul National Univ / Carnegie Mellon / UC Berkeley / DevOps / Deep Learning / Visualization