In a world where hundreds of millions of terabytes of data are generated daily, the need for cybersecurity is crucial and constant. Data needs to be preserved and transmitted in its original form across several layers of applications, devices, and countries. Hence, a reliable verification method is required to check that the data received matches the data sent. A checksum is one such method.
But what is a checksum?
As an integral part of the Internet protocol suite, the checksum is a must-know concept for aspiring cybersecurity professionals. This article explores the fundamentals of checksums, the various algorithms associated with them, and their use cases and calculations. This is a glimpse into the knowledge you’ll get in greater depth when you join a solid cybersecurity program.
What is Checksum?
A checksum is an error detection method that uses a short sequence of numbers and letters to detect errors introduced during data transmission. It is the value obtained by running an algorithm, such as a cryptographic hash function or a cyclic redundancy check, over the data, and a mismatch indicates that bits were lost or altered in transit. This is crucial for data integrity and security.
For example, if you want to check if a file has been uploaded completely, you can cross-verify the checksum of the original and uploaded files. If there is a mismatch, the concerned parties are alerted to possible data loss or third-party tampering.
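The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the function names `file_checksum` and `files_match` are hypothetical, and SHA-256 is assumed as the algorithm simply because it is widely available.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(original_path, uploaded_path):
    """True when both files produce the same checksum."""
    return file_checksum(original_path) == file_checksum(uploaded_path)
```

Calling `files_match("report.pdf", "uploaded_report.pdf")` (both paths are placeholders) returns False if even a single byte differs or is missing.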
Checksums are built into common network protocols. Both the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) carry a checksum field in their headers. TCP additionally tracks data packets and retransmits those that are lost or corrupted, while UDP skips that tracking to avoid slowing down transmission.
Why Apply Checksums?
A checksum is a short sequence of bytes that acts as a digital fingerprint for data. It can be created for entire files or for parts of a file, such as particular data types stored in a cloud. With a cryptographic hash, any change in the data, however minute, produces a completely different checksum.
Thus, it is a dependable technique for checking the integrity of data in transit and in storage. Here are a few reasons to apply checksums:
- Ensure data integrity during transmission via network, cloud, or hard drives.
- Verify data security during storage on private hard drives, clouds, and organization networks.
- Detect malicious and deliberate tampering and hacking of data stored and transmitted over private and public networks and cloud servers.
- Detect data loss caused by sudden operational glitches in computers or servers, such as a random shutdown or a freeze while data is being copied or transferred.
- Identify incomplete data transfers caused by accidental editing or deletion of files during transmission, or by process interruptions due to IT issues.
- Check possible data modification during storage, especially for shared drives and cloud servers when multiple people work simultaneously on the same file.
- Flag minor errors that may have occurred due to improper process steps, so that the affected data can be re-sent or restored.
- Detect data duplication in multiple files stored with different names at different locations.
- Create an inventory of data for future archiving and storage when data from obsolete storage media, such as compact discs, records, cassettes, and film reels, is ingested and uploaded using newer storage modes.
- Verify and confirm that entities have handed over files in totality without any data modification.
- Assure the security and authenticity of data stored over long periods.
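Two of the points above, the fingerprint property and duplicate detection, can be illustrated with a short Python sketch. The helper names `fingerprint` and `find_duplicates`, the file names, and the file contents are all made up for illustration, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents share the same checksum."""
    groups = defaultdict(list)
    for name, content in files.items():
        groups[fingerprint(content)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Hypothetical files: two identical copies under different names, one near-copy.
sample_files = {
    "reports/q1.csv": b"id,total\n1,100\n",
    "backup/q1_copy.csv": b"id,total\n1,100\n",
    "reports/q2.csv": b"id,total\n1,101\n",
}
duplicates = find_duplicates(sample_files)
print(duplicates)  # [['backup/q1_copy.csv', 'reports/q1.csv']]
```

The two identical files are grouped together even though their names and locations differ, while the file that differs by a single character gets its own fingerprint.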
Types of Checksum Algorithms
Several checksum algorithms exist, and the choice depends on the purpose and application. Here’s a list of some of the most common algorithms:
- MD5 (Message Digest 5)
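MD5 is an algorithm that receives an input and produces a checksum of 128 bits displayed as 32 hexadecimal digits. However, it is prone to collisions, in which two different inputs produce the same checksum, so it should not be relied on to detect deliberate tampering.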
- SHA-1 (Secure Hash Algorithm-1)
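Once commonly used in digital signatures, SHA-1 generates an output hash value of 160 bits. This checksum is displayed as a 40-digit hexadecimal string. Practical collision attacks have been demonstrated against it, and it is being phased out in favor of SHA-2 and SHA-3.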
- SHA-2 (Secure Hash Algorithm-2)
This is a family of algorithms recommended by the National Institute of Standards and Technology (NIST). It consists of:
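- SHA-224 produces a checksum of 224 bits (28 bytes) presented as 56 hexadecimal characters, and SHA-256 produces a checksum of 256 bits (32 bytes) presented as 64 hexadecimal characters.
- SHA-384 produces a checksum of 384 bits (48 bytes) shown as 96 hexadecimal digits, and SHA-512 produces a checksum of 512 bits (64 bytes) shown as 128 hexadecimal digits. SHA-512/224 and SHA-512/256 are truncated variants of SHA-512 that produce 224-bit and 256-bit checksums, respectively.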
- SHA-3 family (Secure Hash Algorithm-3)
This family of algorithms is based on the Keccak sponge construction, an internal design that differs from SHA-2. The major algorithms include:
- SHA3-224 generates a checksum of 224 bits (28 bytes) displayed as a string of 56 hexadecimal characters.
- SHA3-256 produces a checksum of 256 bits (32 bytes) composed as an output of 64 hexadecimal characters.
- SHA3-384 generates a checksum of 384 bits (48 bytes) displayed as a string of 96 hexadecimal characters.
- SHA3-512 produces a checksum of 512 bits (64 bytes) displayed as an output of 128 hexadecimal digits.
- CRC (cyclic redundancy check)
This family of algorithms uses polynomial division to ascertain the values and is based on cyclic codes. Typically used to identify accidental data modifications, they consist of the following algorithms:
- CRC-16 produces a 16-bit checksum (2 bytes) composed as a string of 4 hexadecimal digits.
- CRC-32 generates a 32-bit checksum (4 bytes) displayed as an output of 8 hexadecimal digits.
- CRC-64 produces a 64-bit checksum (8 bytes) composed of a string of 16 hexadecimal characters.
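The output sizes listed above can be checked directly with Python's standard library. The sketch below is only a quick sanity check: the helper `digest_sizes` and the sample input are illustrative, and it covers only the algorithms that `hashlib` and `zlib` provide out of the box (CRC-16 and CRC-64 are not included).

```python
import hashlib
import zlib

HASH_NAMES = ["md5", "sha1", "sha224", "sha256", "sha384", "sha512",
              "sha3_224", "sha3_256", "sha3_384", "sha3_512"]


def digest_sizes(data: bytes = b"example") -> dict[str, tuple[int, int]]:
    """Map each algorithm name to (bits, hex digits) of its output."""
    sizes = {}
    for name in HASH_NAMES:
        hex_digest = hashlib.new(name, data).hexdigest()
        # Each hexadecimal digit encodes 4 bits.
        sizes[name] = (len(hex_digest) * 4, len(hex_digest))
    # zlib.crc32 returns an integer; format it as 8 zero-padded hex digits.
    crc_hex = format(zlib.crc32(data), "08x")
    sizes["crc32"] = (len(crc_hex) * 4, len(crc_hex))
    return sizes


for name, (bits, hex_digits) in digest_sizes().items():
    print(f"{name:9s} {bits:4d} bits  {hex_digits:3d} hex digits")
```

The output length is fixed by the algorithm, so the same figures appear whatever input is passed in.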
Checksum Use Cases and Examples
Checksums play a major role in cybersecurity, and their use varies with the specific requirements of each sector. Let’s take a look at some use cases:
A bank employs online onboarding of customers and Know Your Customer (KYC) verification. The customer sends the digital files of their identity proofs. The checksum algorithm verifies that the uploaded files have complete data and there has been no data loss during transmission.
A hospital with patient records employs checksums to verify that the test results stored against a patient number have not been altered or corrupted. This helps ensure that the results communicated to the patient are the ones the laboratory produced and that the diagnosis rests on correct data.
E-commerce websites use checksums to verify that product and order data exchanged between the website and the inventory database arrives unaltered. This helps ensure that the customer receives exactly what they ordered.
The energy sector is security-heavy, as the electricity grids are prone to malicious hacking. Hence, checksums detect any tampering with incoming documents via emails or uploads. Credit card information entered to pay the electricity bills is verified by checksums when it passes through the payment gateways to completion.
A more detailed example is the use of checksums in a hospital to confirm data integrity during data transfer. A hospital has laboratories and software for tracking patients’ history and diagnostics. Typically, the test results from diagnostic equipment are stored as an Extensible Markup Language (XML) file. When the file is created, a checksum is generated for it using the SHA-1 algorithm.
Next, the data is converted into a CSV file via a validated tool like Microsoft Excel. The tool generates one more checksum for this CSV file using the SHA-1 algorithm. Finally, the SAS program required for the analysis of the CSV file creates a checksum for the file selected to be imported. It compares the checksum to that of the CSV file.
If the two match, the SAS program imports the CSV file and creates a data set. Thus, they confirm that the data generated in the instrument has successfully reached the SAS program and has been converted into a dataset for analysis.
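The final comparison step can be sketched as follows. This is a Python stand-in for illustration only, not the SAS program described above: the function `import_if_verified`, the column names, and the sample record are all hypothetical.

```python
import csv
import hashlib
import io


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 checksum of the data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def import_if_verified(csv_bytes: bytes, expected_sha1: str) -> list[dict[str, str]]:
    """Parse the CSV only if its checksum matches the one recorded at export."""
    actual = sha1_hex(csv_bytes)
    if actual != expected_sha1.lower():
        raise ValueError(f"checksum mismatch: expected {expected_sha1}, got {actual}")
    reader = csv.DictReader(io.StringIO(csv_bytes.decode("utf-8")))
    return list(reader)


# Hypothetical export: the checksum is recorded when the CSV file is created.
exported = b"patient_id,result\nP-001,4.2\n"
recorded = sha1_hex(exported)

rows = import_if_verified(exported, recorded)
print(rows)  # [{'patient_id': 'P-001', 'result': '4.2'}]
```

If the file changes between export and import, `import_if_verified` raises an error instead of creating a data set from altered data.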
Further details of the process with the use of specific algorithms and the generation of checksums are included in the curriculum of several online resources, such as cybersecurity bootcamps that take you through the nitty-gritty of the process.
What is Checksum: Working Steps
Checksum generation and verification require several steps, as described below:
Step 1: Checksum sender
- The original data is divided into ‘k’ blocks with ‘n’ data bits per block.
- All the ‘k’ data blocks are added, and any carry beyond ‘n’ bits is wrapped around and added back to the result.
- The checksum is obtained by taking the 1’s complement of the addition result.
Step 2: Transmission of data
- The checksum value is attached to the original data bits.
- The data and the checksum are transmitted to the receiver.
Step 3: Checksum receiver
- The received data is divided into ‘k’ blocks.
- The ‘k’ data blocks and the checksum value are added, again wrapping any carry around.
- The 1’s complement of the addition result is taken.
- If the result is 0, no error was detected in the received data.
- If the result is a non-zero value, the received data has errors. In this case, the receiver asks for retransmission of the data.
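The sender and receiver steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, blocks are represented as integers, the block width defaults to 8 bits, and the sample blocks are arbitrary values chosen for the demonstration.

```python
def ones_complement_sum(blocks, bits=8):
    """Add blocks, folding any carry beyond `bits` back into the sum."""
    mask = (1 << bits) - 1
    total = 0
    for block in blocks:
        total += block
        # End-around carry: add the overflow bits back into the low bits.
        total = (total & mask) + (total >> bits)
    return total


def make_checksum(blocks, bits=8):
    """Sender side: 1's complement of the wrapped sum."""
    mask = (1 << bits) - 1
    return ~ones_complement_sum(blocks, bits) & mask


def verify(blocks, checksum, bits=8):
    """Receiver side: the complemented sum of data plus checksum must be zero."""
    mask = (1 << bits) - 1
    total = ones_complement_sum(list(blocks) + [checksum], bits)
    return (~total & mask) == 0


# Arbitrary sample blocks, not taken from the article.
data_blocks = [0x4A, 0xF1, 0x3C]
checksum = make_checksum(data_blocks)

print(verify(data_blocks, checksum))         # True
print(verify([0x4A, 0xF0, 0x3C], checksum))  # False: one bit was flipped
```

Folding the carry after every addition keeps the running total within the block width, which gives the same result as adding everything first and wrapping the carry at the end.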
A familiar analogy is copying a file from one folder to another. If the copy is interrupted or corrupted, a checksum computed over the destination file does not match that of the source, which signals that the transfer should be retried. Once the copy completes without errors, the two checksums match, indicating that the complete file has reached the destination folder.
How to Calculate Checksums?
Following the steps described above, let us look at an example to clarify the concept better.
Suppose we have to implement the checksum method for the data value 10011001 11100010 00100100 10000100.
Step 1: We divide the data into four blocks of 8 bits each. Then, we add them as shown below:
10011001 + 11100010 + 00100100 + 10000100 = 10 00100011
The sum has two carry bits beyond the 8-bit block size, so we wrap them around and add them back:
00100011 + 10 = 00100101
Step 2: We generate the checksum value by performing the 1’s complement on the bit addition result.
00100101 → 1’s complement → 11011010 → Checksum value
Step 3: We combine the original data bits and the checksum value and begin data transmission.
11011010 10011001 11100010 00100100 10000100
Step 4: The receiver performs the checksum check with the bit addition and the 1’s complement.
11011010 + 10011001 + 11100010 + 00100100 + 10000100 = 10 11111101
11111101 + 10 = 11111111
↓ 1’s complement
00000000
Step 5: As the complement has zero value, the data has been successfully transmitted.
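The arithmetic in this example can be reproduced with a few lines of Python. The sketch takes the second block to be 11100010, the 8-bit value that is consistent with the checksum 11011010; the variable names are illustrative.

```python
blocks = [0b10011001, 0b11100010, 0b00100100, 0b10000100]

# Sender: add the blocks, wrap the carry bits around, then complement.
raw_sum = sum(blocks)
wrapped = (raw_sum & 0xFF) + (raw_sum >> 8)  # one fold is enough for these values
checksum = ~wrapped & 0xFF

# Receiver: add the data blocks and the checksum, wrap, then complement.
received_sum = sum(blocks) + checksum
received_wrapped = (received_sum & 0xFF) + (received_sum >> 8)
result = ~received_wrapped & 0xFF

print(format(raw_sum, "b"))            # 1000100011
print(format(wrapped, "08b"))          # 00100101
print(format(checksum, "08b"))         # 11011010
print(format(received_wrapped, "08b")) # 11111111
print(format(result, "08b"))           # 00000000
```

A result of all zeros at the receiver means no error was detected.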
Checksums support the integrity pillar of the CIA triad (Confidentiality, Integrity, and Availability), a key concept in cybersecurity. Well-structured online cybersecurity training will dive deeper into such concepts and equip you with the ins and outs of cybersecurity, including the fundamentals, network systems, and system administration, to name a few.
Enroll today to enter the exciting world of cybersecurity!
- What is checksum with an example?
A checksum is a string of numeric or alphanumeric characters that distinguishes one data set from another. For example, the sentence ‘The quick brown fox jumped over the lazy dog’ has a 128-bit checksum of 08a008a01d498c404b0c30852b39d3b8. However, adding a period at the end, ‘The quick brown fox jumped over the lazy dog.’, generates a checksum of 5c6ffbdd40d9556b73a21e63c3e0e904, which is entirely different from the earlier one.
- Why is checksum used?
It is used to verify the data integrity during storage and transmission. It cross-checks the data for bit loss, duplication, tampering, and incomplete transmission.
- How is a checksum calculated?
In the simple scheme described in this article, it is calculated by dividing the data into ‘k’ blocks of ‘n’ bits each, adding the blocks (wrapping any carry back into the sum), and then taking the 1’s complement of the result.
- What is the size of a checksum?
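The size of a checksum depends on the algorithm rather than on the data. For example, CRC-32 always produces 32 bits and SHA-256 always produces 256 bits, whatever the size of the input. In the simple additive scheme, the checksum has the same number of bits as each data block.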
- What is the difference between CRC and checksum?
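CRC stands for cyclic redundancy check. It is one family of algorithms for generating a checksum, based on polynomial division rather than simple addition, which makes it better at catching errors such as reordered or burst-corrupted bits.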
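One practical difference is that a plain additive checksum ignores the order of the bytes, while a CRC does not. The Python sketch below illustrates this with a made-up message; `additive_checksum` is a hypothetical helper, and `zlib.crc32` stands in for the CRC family.

```python
import zlib


def additive_checksum(data: bytes) -> int:
    """Sum of all bytes, kept to 8 bits."""
    return sum(data) & 0xFF


# Hypothetical messages that differ only in the order of two digits.
original = b"PAY 12"
swapped = b"PAY 21"

same_sum = additive_checksum(original) == additive_checksum(swapped)
same_crc = zlib.crc32(original) == zlib.crc32(swapped)

print(same_sum)  # True: the additive checksum misses the swap
print(same_crc)  # False: CRC-32 detects it
```

Because addition is commutative, swapping two bytes leaves the sum unchanged, whereas the polynomial division used by a CRC depends on where each bit sits in the message.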