Creating Security: Encoding vs. Encryption vs. Hashing

Posted by hambrice on January 9, 2019

I was recently involved in a discussion about the difference between encryption and hashing, which made me realize that I didn’t have a great understanding of that difference, so I decided to do a little more digging on the topic. As I was searching, I also came upon the topic of encoding, which added another wrinkle to the question of when each concept should be used and what the benefits of each are. While all three deal with the conversion of data into a different form or scheme, the process and the reasoning for their use differ greatly.

First, a brief overview of each of these topics is helpful. In the case of encoding, while we are altering the form of the data, the goal is typically not security. Instead, we are simply trying to convert data into a form that is more useful. For example, compressing a file to save storage space can be considered a type of encoding. In this example, we aren’t trying to keep the original information secure or private.
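To make the encoding idea concrete, here is a minimal Python sketch using the standard library’s zlib and base64 modules. The sample data and variable names are my own for illustration. The point it tries to show is that both transformations are public and keyless, so anyone who knows the scheme can reverse them.

```python
import base64
import zlib

# Hypothetical sample data, repetitive on purpose so it compresses well.
original = b"hello " * 50

# Compression: a public, keyless transformation that shrinks the data.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Base64: another public encoding, turning raw bytes into printable ASCII.
b64_text = base64.b64encode(original).decode("ascii")
decoded = base64.b64decode(b64_text)

print(len(original), "bytes before,", len(compressed), "bytes after compression")
print("round trips intact:", restored == original and decoded == original)
```

Nothing here is secret: no key is involved, which is why encoding on its own offers no protection for the data.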

Encryption and hashing are the topics that deal more with security, but the method for achieving that and the end result can differ. Encryption makes use of keys to convert or ‘encrypt’ data into ciphertext. This means that the data can be decrypted, or returned to its original form, as long as a person has access to the correct key. Hashing makes use of hashing algorithms, such as MD5 (which is now considered broken for security purposes), that are designed so that the data cannot be converted back to its original form. Hashing algorithms typically should have several features. First, they should be predictable (deterministic), meaning that given the exact same input, they should return the exact same value. They should also avoid collisions, meaning it should be extremely hard to find two different inputs that produce the same output, and even a slight change to the input should generate a completely different output. Most importantly, hashing algorithms should be irreversible in practice, meaning that once the data has been converted, it is computationally infeasible to recover the original input from the output.
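The hashing properties described above can be seen in a small Python sketch. Note that I’m using SHA-256 from the standard hashlib module here rather than MD5; the helper name sha256_hex and the sample strings are just illustrative choices.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 digest of text as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = sha256_hex("hello world")
b = sha256_hex("hello world")   # same input again
c = sha256_hex("hello world!")  # one extra character

# Count how many of the 64 hex characters differ between the two digests.
differing = sum(x != y for x, y in zip(a, c))

print("same input, same hash:", a == b)
print("hex characters that differ after a one-character change:", differing)
```

The same input always gives the same digest, while a one-character change scrambles most of the output. There is no function in hashlib to go from the digest back to the input, which matches the goal of irreversibility.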

Ultimately, the big difference between these topics comes down to the process of reversing the change that has occurred. In the case of encoding, we aren’t worried about someone being able to convert the data back to its original form, so typically the scheme for conversion is not kept secret. Encryption is similar in concept, except that a component of the conversion process, typically referred to as the key, is kept secret. The idea behind encryption is that without the key, you should not be able to gain access to the original form of the data. In the case of hashing, the goal is that the original data can never be gleaned from the resulting hash.
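To illustrate the role of the key, here is a toy sketch of a one-time pad in Python, where each byte of the message is XORed with a random key byte. This is only meant to show the idea that the same secret reverses the transformation; it is not how you would encrypt real data (a vetted library and algorithm such as AES would be the usual choice), and the function name xor_bytes and the sample message are my own.

```python
import secrets


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte (toy one-time pad)."""
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))


message = b"quarterly numbers"

# The key is random, as long as the message, and must be kept secret.
key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, key)    # encrypt
recovered = xor_bytes(ciphertext, key)  # decrypt with the same key

print("recovered with the right key:", recovered == message)
```

Unlike the encoding case, knowing the scheme (XOR) is not enough. Without the key, the ciphertext is just random-looking bytes.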

Once you understand the concepts and differences, it’s a little easier to see their applications. As I mentioned, encoding can be useful if, for example, you want to compress a file. With encryption, we need the data to be readable, but only by those we want to have access to it (by giving those entities the associated key), so this could be used for the transmission of sensitive business data. The downside is that anyone who gains possession of the key can then access that information. With hashing, there’s no concern about keeping keys private, as the information cannot (in theory) ever be converted back. This is useful for the storage of passwords: we never want to give anyone the ability to know the original form of the data, yet by using a predictable hashing algorithm, we can still compare inputs to see whether they were originally the same.
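As a rough sketch of the password use case, the Python below stores only a hash and later checks a login attempt by hashing it again and comparing. It goes a little beyond what I described above: it assumes a random per-user salt and a deliberately slow function (PBKDF2 from hashlib), which is common practice for passwords. The function names and the iteration count are illustrative assumptions, not recommendations.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative value; tune for your own hardware


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare the digests."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because hashing with the same salt is deterministic, the comparison works without the original password ever being stored or recovered.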