Section: Bit Manipulations - Bit Count

Section: Bit Manipulations - Bit Count#

This program demonstrates bit manipulations using Fortran.

Summary Code Explanation Explanation by ChatGPT 4o#

The given programs and module (program bitcount and module bitmanip_mod) provide a framework to evaluate and compare multiple methods of counting set bits (1’s) in integers. Here’s a detailed breakdown of their functionality:

Program `bitcount`#

This is the main driver program that performs the following:

Sanity Check:
- Iterates over a range of integers from -16 to 16.
- For each integer, it calculates the number of set bits using four different methods:
  - naive_count_bits
  - early_stopping_count_bits
  - lookup_table_count_bits
  - kernighan_count_bits
- Prints the results of these methods alongside the binary representation of the number using the bit_repr function.
Benchmarking:
- Reads the number of values (nr_vals) from the command line using the subroutine get_arguments.
- Allocates an array vals to store random integers for testing.
- Fills vals with random integers.
- Times each of the four bit-counting implementations (naive, early stopping, lookup table, and Kernighan) over the generated dataset.
- Prints the timing results for comparison.
Subroutine get_arguments:
- Validates and extracts the number of integers (nr_vals) to process from the command-line arguments.
- Provides error handling if the input is invalid.

Module `bitmanip_mod`#

This module contains the bit-counting algorithms, bit representation logic, and utilities for working with a lookup table.

Key Components:#

Lookup Table Management:
- A 256-element lookup table (lookup_table) is used to optimize bit-counting for bytes (8-bit chunks).
- The ensure_lookup_table_initialized subroutine ensures the table is loaded from a CSV file before use.
- initialize_lookup_table reads values from lookup_table.dat using a CSV reader (csv_module).
Utility Functions:
- bit_repr: Converts an integer into its binary string representation.
- remove_trailing_comma_ampersand: Cleans up lines from the CSV file by removing trailing , & for compatibility.
Bit-Counting Methods:
- naive_count_bits:
  - Counts bits by iteratively checking the least significant bit using and(n, 1) and shifting the number (ishft).
  - Iterates a fixed number of times (32 for 32-bit integers).
- early_stopping_count_bits:
  - Similar to naive_count_bits, but exits early when all remaining bits are 0.
- lookup_table_count_bits:
  - Divides the integer into four 8-bit chunks.
  - Uses the lookup table to sum the number of set bits in each chunk.
  - This is faster because it avoids iterative bit operations.
- kernighan_count_bits:
  - Uses Brian Kernighan’s algorithm, which repeatedly clears the least significant set bit (n = and(n, n - 1)) until the number becomes 0.
  - The number of iterations equals the number of set bits.
CSV File Parsing:
- parse_csv_file: Reads a CSV file, preprocesses it to remove trailing artifacts, and populates the lookup table.
- Uses the csv_module library for handling CSV data.

Workflow#

Sanity Check:
- Validates that all methods produce the same results for small integers (-16 to 16).
Benchmarking:
- Tests the efficiency of each method over a large dataset of random integers, highlighting performance differences.
Bit Manipulation Techniques:
- Each method showcases different trade-offs between simplicity, performance, and implementation complexity:
  - Naive: Simple but slow.
  - Early Stopping: Optimized for sparse integers with fewer set bits.
  - Lookup Table: Fast for large datasets, relies on preprocessing and memory.
  - Kernighan: Efficient in terms of operations for sparse integers.
Results:
- Binary representations and bit counts are printed for verification.
- Execution times are reported for performance analysis.

Key Features and Highlights#

Modular Design: The bitmanip_mod encapsulates reusable logic, making the implementation extendable and maintainable.
Error Handling: Comprehensive checks for memory allocation, file reading, and input validity.
Performance Comparison: Benchmarks allow empirical evaluation of different approaches, useful for understanding their practical efficiency.
Lookup Table Preprocessing: Demonstrates efficient preprocessing for rapid bit-counting.

Applications#

This program can be used to:

Analyze and optimize bitwise operations in software.
Serve as a teaching tool for bit manipulation techniques.
Benchmark and compare algorithmic performance for counting set bits.

Detailed Analysis by ChatGPT 4o1: `bitcount.f90`#

Below is an extremely detailed, section-by-section analysis of bitcount.f90, which serves as a driver program to demonstrate and benchmark various bit-counting methods defined in the companion module bitmanip_mod. We’ll examine the program’s flow, purpose of each block of code, and some considerations for performance and design.

1. Program Declaration and Imports#

program bitcount
    use, intrinsic :: iso_fortran_env, only : error_unit
    use :: bitmanip_mod
    implicit none

program bitcount: Declares the main program unit named bitcount.
use, intrinsic :: iso_fortran_env, only : error_unit:
- Imports error_unit, a predefined I/O unit for error reporting from the Fortran intrinsic module iso_fortran_env.
- error_unit is commonly used in lieu of a hard-coded unit number (e.g., 0 or 2) for consistent error output.
use :: bitmanip_mod:
- Brings in public symbols from the bitmanip_mod module. This includes the following procedures:
  - naive_count_bits
  - early_stopping_count_bits
  - lookup_table_count_bits
  - kernighan_count_bits
  - bit_repr
- The bitmanip_mod also automatically handles initialization of its lookup table for the lookup-table-based bit-counting method.
implicit none:
- Prohibits the use of implicitly typed variables, requiring all variables to be declared explicitly.
- Improves safety by preventing accidental usage of undeclared variables.

2. Variable Declarations#

    integer :: nr_vals
    integer(kind=I4), dimension(:), allocatable :: vals
    integer(kind=I4) :: result
    integer :: i, istat
    integer :: naive_count_bits_result, early_stopping_count_bits_result, &
        lookup_table_count_bits_result, kernighan_count_bits_result
    real :: r, start_time, end_time

nr_vals: Holds the number of values (integers) to be processed, as read from the command-line argument.
vals (integer(kind=I4), dimension(:), allocatable):
- An allocatable array that will store nr_vals random integers (kind=I4 is a 32-bit integer type).
result (integer(kind=I4)):
- Will store the cumulative result (via XOR) during benchmarks of the bit-counting routines.
i, istat (integer):
- i is a loop counter.
- istat captures the status code returned by array allocation.
naive_count_bits_result, early_stopping_count_bits_result,
lookup_table_count_bits_result, kernighan_count_bits_result:
- Temporary integers to hold the results from each of the four bit-counting methods for a given integer in the sanity check loop.
r, start_time, end_time (real):
- r: used to store random numbers in the range [0,1).
- start_time, end_time: used with cpu_time intrinsic to measure benchmarking durations.

3. Sanity Check Loop#

    print '(A)', 'sanity check: naive, early stopping, lookup table Kernighan'
    do i = -16, 16
        naive_count_bits_result = naive_count_bits(i)
        early_stopping_count_bits_result = early_stopping_count_bits(i)
        lookup_table_count_bits_result = lookup_table_count_bits(i)
        kernighan_count_bits_result = kernighan_count_bits(i)

        print '(I14, 3A, 4I3)', i, ': ', bit_repr(i), ': ', &
            naive_count_bits_result, early_stopping_count_bits_result, &
            lookup_table_count_bits_result, kernighan_count_bits_result
    end do

Purpose:
- Verifies that all four bit-counting functions (naive_count_bits, early_stopping_count_bits, lookup_table_count_bits, kernighan_count_bits) produce the same result for a range of integers from -16 to 16.
Details:
- Prints a header line: "sanity check: naive, early stopping, lookup table Kernighan".
- Iterates i from -16 up to 16.
- For each integer i, obtains its bit counts via each method and also retrieves its binary representation via bit_repr(i).
- Prints the integer, its binary representation, and the four counts.
- Note the format string '(I14, 3A, 4I3)':
  - I14: prints the integer i in a field of 14 characters (right-aligned).
  - 3A: prints three consecutive strings (in this case, ": ", bit_repr(i), and ": ").
  - 4I3: prints the four integer results in fields of width 3 each.
Significance:
- This “sanity check” ensures the correctness of the implementations on both positive and negative values (showing how negative values are handled, presumably via two’s complement representation).

4. Creating Values for Benchmarking#

4.1 Reading Command-Line Argument#

    call get_arguments(nr_vals)

Calls a local subroutine get_arguments (defined in the contains section at the end of the program) which:
- Ensures exactly one command-line argument is present.
- Reads it into nr_vals.
- Exits with an error if invalid input is provided.

4.2 Allocating the Array#

    allocate (vals(nr_vals), stat=istat)
    if (istat /= 0) then
        write (unit=error_unit, fmt='(A, I0, A)') 'error: can not allocate ', &
            nr_vals, ' elements'
        stop 3
    end if

Tries to allocate the array vals of length nr_vals.
Checks if istat (the status code from the allocation) is not zero, meaning an allocation failure:
- If failed, writes an error message to error_unit with 'error: can not allocate X elements'.
- Terminates the program with exit code 3.

4.3 Filling `vals` with Random Integers#

    do i = 1, size(vals)
        call random_number(r)
        vals(i) = int(r*real(huge(0_I4)), kind=I4)
    end do

Generates a random real number r in [0,1) via call random_number(r).
Scales it to the range [0, huge(0_I4)]—the largest representable 32-bit integer—by multiplying by r.
Converts to a 32-bit integer via int(..., kind=I4).
Stores the result in vals(i).
Goal: Populate a large array of random 32-bit integers for performance testing.

5. Benchmarking Loops#

The program times each bit-counting method by:

Recording a start time.
Looping over all values in vals, applying the bit-counting method, and XOR-ing results into result.
Recording an end time.
Computing end_time - start_time to measure the method’s total execution time.

The XOR (result = xor(result, ...)) is presumably done to prevent the compiler from optimizing away the function calls in a trivial scenario; storing or mixing results ensures each bit-counting call cannot be elided.

5.1 Naive Implementation#

    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, naive_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F15.6)', 'naive implementation ', size(vals), &
        ' iterations: ', end_time - start_time

cpu_time:
- Built-in Fortran intrinsic that returns a process-based CPU time in seconds (implementation-dependent).
- Captures start_time before the loop, end_time after the loop.
Loop:
- Iterates over the entire vals array.
- Calls naive_count_bits(vals(i)).
- XORs the returned bit count with the accumulating result.
Prints: The method name, the number of iterations (size(vals)), and the total execution time formatted in F15.6 (15 wide, 6 decimal places).

5.2 Early Stopping Implementation#

    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, early_stopping_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F15.6)', 'early stopping implementation ', size(vals), &
        ' iterations: ', end_time - start_time

Same structure as above, but uses early_stopping_count_bits.

5.3 Lookup Table Implementation#

    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, lookup_table_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F15.6)', 'lookup table implementation ', size(vals), &
        ' iterations: ', end_time - start_time

Same benchmarking pattern, but with lookup_table_count_bits.

5.4 Kernighan’s Implementation#

    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, kernighan_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F15.6)', 'Kernighan implementation ', size(vals), &
        ' iterations: ', end_time - start_time

Same pattern using kernighan_count_bits.

6. Contained Subroutine: `get_arguments`#

At the end of the file, after the main executable sections, the program uses an internal subroutine to handle command-line arguments.

contains

    subroutine get_arguments(nr_vals)
        use, intrinsic :: iso_fortran_env, only : error_unit
        implicit none
        integer, intent(out) :: nr_vals
        character(len=1024) :: buffer, msg
        integer :: istat

        if (command_argument_count() /= 1) then
            write (unit=error_unit, fmt='(A)') 'error: expecting number of values'
            write (unit=*, fmt='(A)') 'error: expecting number of values'
            stop 1
        end if
        call get_command_argument(1, buffer)
        read (buffer, fmt=*, iostat=istat, iomsg=msg) nr_vals
        if (istat /= 0) then
            write (unit=error_unit, fmt='(2A)') 'error: ', trim(msg)
            stop 2
        end if
    end subroutine get_arguments

end program bitcount

contains:
- A special block in Fortran programs or modules that holds internal procedures.
subroutine get_arguments(nr_vals):
- Purpose: Reads exactly one command-line argument (the expected integer nr_vals).
- Behavior:
  - Checks if command_argument_count() (the number of command-line arguments) is exactly 1; if not, writes an error to error_unit and stops with code 1.
  - Extracts the argument string into buffer via get_command_argument(1, buffer).
  - Attempts to parse buffer as an integer into nr_vals using an unformatted read.
    - If istat /= 0, meaning a read error or type mismatch, prints an error with the iomsg and stops with code 2.
Note:
- This approach is typical in Fortran to parse arguments from the command line (when compiled for systems that support it, which is most modern Fortran compilers).

7. High-Level Flow Summary#

Putting it all together:

Initialization:
- The program starts and immediately imports the bit manipulation module (bitmanip_mod).
- No explicit module initialization is called here; however, if the user calls a bit-counting method that requires the lookup table, bitmanip_mod takes care of loading it on first use.
Sanity Check:
- The program iterates from -16 to 16, computing bit counts via each of the four methods.
- Prints the integer, its binary form, and the counts. This confirms correctness and demonstrates consistency among methods for small positive/negative integers.
Command-Line Argument:
- Reads nr_vals, ensuring the user provided one argument (the number of random values to test).
Allocate & Populate Data:
- Dynamically allocates an array of 32-bit integers.
- Fills it with random values spanning the possible integer range.
Benchmarking:
- For each of the four methods:
  1. Record start time via cpu_time.
  2. Loop over vals, compute bit count, XOR with result.
  3. Record end time.
  4. Print elapsed time.
Exit:
- After printing all four benchmarks, the program ends.

8. Notable Considerations#

Negative vs. Positive Values:
- The sanity check covers negative values. The bit-counting routines will handle them correctly assuming two’s complement. The bit_repr function in bitmanip_mod displays the internal binary representation, including sign bits for negative numbers.
Performance:
- The four approaches (Naive, Early Stopping, Lookup Table, Kernighan) have different performance characteristics. The program’s timing output shows which approach is fastest for a given number of iterations.
result XOR:
- The result = xor(result, ...) line ensures that the function call is not optimized away. If the loop simply called naive_count_bits(vals(i)) without using the result, a compiler might skip or inline it, artificially boosting performance.
- The XOR ensures a dependency on the returned value, making the measurement more realistic.
Scalability:
- The user can pass large nr_vals to create a substantial workload, thus measuring and comparing the methods over many iterations.
Error Handling:
- The program uses stop statements with nonzero codes for various error conditions:
  - 1: Wrong number of command-line arguments.
  - 2: Failed to parse integer from the command line.
  - 3: Failure in memory allocation.
- Messages are written to error_unit to help diagnose problems.

9. Conclusion#

bitcount.f90 is a concise yet powerful driver program that:

Demonstrates correctness for multiple bit-counting algorithms (naive, early stopping, lookup table, Kernighan).
Benchmarks each method across a user-specified number of randomly generated 32-bit integers.
Leverages the companion module bitmanip_mod for all bitwise utilities and the lookup-table feature.

This design makes it easy to compare performance, study different approaches to counting bits, and serve as a template for further exploration of bitwise operations in Fortran.

Detailed Analysis by ChatGPT 4o1: `bitmanip_mod`#

Below is a comprehensive, section-by-section analysis of the Fortran module bitmanip_mod. This module contains bit-counting routines, a mechanism for loading a 256-element lookup table from a CSV file, and several utility routines for string manipulation and bitwise operations. It demonstrates careful management of initialization, file I/O, and the use of public/private scope to organize code effectively.

1. Module Declaration#

module bitmanip_mod
    use, intrinsic :: iso_fortran_env, only : INT32, error_unit
    use csv_module

    implicit none

    private

module bitmanip_mod: Defines a Fortran module named bitmanip_mod.
use, intrinsic :: iso_fortran_env, only : INT32, error_unit:
- INT32 is an integer kind parameter guaranteeing a 32-bit integer (as per the ISO Fortran environment).
- error_unit is a special I/O unit typically used for error reporting.
use csv_module: Imports a third-party or external library providing CSV-related capabilities (presumably the “csv-fortran” library or similar).
implicit none: Disallows implicit typing of variables, ensuring safer and more explicit code.
private: Sets the default accessibility of module entities to private, making them inaccessible to other program units unless explicitly declared public.

2. Lookup Table and Flags#

    ! A 256-element lookup table stored in the module
    integer, public, parameter :: I4 = INT32
    integer(kind=I4), dimension(0:255), private :: lookup_table
    logical, private, save :: initialized = .false.

integer, public, parameter :: I4 = INT32
- Defines a module parameter I4 that is publicly visible.
- I4 is set to INT32, effectively binding the symbol I4 to the 32-bit integer kind. Other routines use I4 to ensure consistent integer typing.
integer(kind=I4), dimension(0:255), private :: lookup_table
- Declares an array lookup_table of size 256 (indices 0 to 255) to store precomputed bit counts or other associated values.
- Marked private, so it cannot be accessed directly from outside the module.
logical, private, save :: initialized = .false.
- A logical flag indicating whether the lookup table has already been loaded from disk.
- Marked save so that its state persists across multiple calls; otherwise, it might not retain its value.

3. Public Interfaces#

    ! Public procedures
    public :: naive_count_bits, early_stopping_count_bits, bit_repr, &
              lookup_table_count_bits, kernighan_count_bits

Declares which routines in the module are accessible externally:
1. naive_count_bits
2. early_stopping_count_bits
3. bit_repr
4. lookup_table_count_bits
5. kernighan_count_bits

Everything else in the module remains private, reinforcing modular encapsulation.

4. Subroutines for Lookup Table Initialization#

4.1 `ensure_lookup_table_initialized`#

subroutine ensure_lookup_table_initialized()
    if (.not. initialized) then
        call initialize_lookup_table()
    end if
end subroutine ensure_lookup_table_initialized

Purpose: Checks the initialized flag. If the lookup table is not yet loaded, it calls initialize_lookup_table.
Key Detail: Ensures that the following bit-counting routines, which depend on the lookup table, do not proceed until the table is valid and loaded.

4.2 `initialize_lookup_table`#

subroutine initialize_lookup_table()
    implicit none
    integer :: ios

    ! Clear the module array
    lookup_table = 0

    ! Call parse_csv_file to fill the local array
    call parse_csv_file("lookup_table.dat", lookup_table, ios)

    if (ios == 0) then
        initialized = .true.
    else
        write(error_unit, '(A,I0)') "Failed to read CSV lookup table, ios=", ios
        stop 1
    end if
end subroutine initialize_lookup_table

Clears the lookup_table array by setting all elements to zero initially.
Calls parse_csv_file("lookup_table.dat", lookup_table, ios):
- Attempts to load the data from lookup_table.dat into lookup_table.
- ios serves as an error code to indicate success or failure.
Sets initialized = .true. if ios == 0, indicating successful initialization.
Stops execution if there is a failure (non-zero ios), printing an error message to error_unit.

5. CSV File Parsing: `parse_csv_file`#

subroutine parse_csv_file(filename, table_data, ios)
    use csv_module
    
    implicit none

    ! Subroutine arguments
    character(len=*), intent(in)                                :: filename
    integer(kind=I4), dimension(0:255), intent(out)             :: table_data
    integer, intent(out)                                        :: ios

    ...
end subroutine parse_csv_file

Purpose: Reads a CSV file filename, processes it, and populates table_data (the 256-element integer array).
Arguments:
1. filename: Name of the CSV file to read.
2. table_data: The 256-element array to store the parsed data.
3. ios: Output integer error code to capture success/failure.

5.1 Local Variables#

type(csv_file)                :: c
logical                       :: status_ok
character(len=255), allocatable, dimension(:,:) :: string_table
integer                       :: nrow, ncol
integer                       :: i, j, idx, ios2, val
character(len=30)             :: trimmed_string

! For pre-processing:
integer                       :: unit_in, unit_out, status_in, status_out
character(len=1024)           :: line, tmpfile
logical                       :: done

c: An instance of csv_file from csv_module.
string_table: A 2D array of strings holding the CSV data read by csv_module.
nrow, ncol: Dimensions of the CSV data.
idx: Tracks how many entries have been written to table_data.
trimmed_string: Used to trim whitespace or extraneous characters before numeric conversion.
unit_in, unit_out: File unit numbers for reading from the input CSV and writing to a temporary file.
line, tmpfile: Buffers for file I/O; tmpfile is the name of the temporary file to generate.
done: A loop control flag (typical for reading files until EOF).

5.2 Preprocessing Logic#

tmpfile = 'tempfile.csv'
open(newunit=unit_in, file=filename, status='old', action='read', &
     form='formatted', iostat=status_in)

Opens the filename for reading on unit unit_in.
Creates a new file tempfile.csv on unit unit_out with status='replace'.
Reads each line from the original file, calls remove_trailing_comma_ampersand to clean trailing , &, then writes the cleaned line to tempfile.csv.

This approach ensures that any extraneous trailing commas or ampersands in the original file (e.g., lines ending with , &) are removed before CSV parsing.

5.3 CSV Parsing Using `csv_module`#

call c%initialize()
call c%read(tmpfile, status_ok=status_ok)
call c%get(string_table, status_ok=status_ok)

Initialize the csv_file object.
Read the temporary file tmpfile.csv, storing its contents in an internal buffer.
Get the data as a 2D string array string_table.

5.4 Populating `table_data`#

nrow = size(string_table,1)
ncol = size(string_table,2)

idx = 0
do i = 1, nrow
    do j = 1, ncol
        trimmed_string = adjustl(adjustl(string_table(i,j)))
        read(trimmed_string, *) val
        table_data(idx) = val
        idx = idx + 1
    end do
end do

Determines the dimensions (nrow, ncol) of the string data.
Iterates over all cells, trimming whitespace.
Reads each cell into an integer val.
Assigns val to table_data(idx) and increments idx.
Since table_data is only 256 elements, this routine assumes at most 256 integer values are needed.

6. String Cleanup Subroutine: `remove_trailing_comma_ampersand`#

subroutine remove_trailing_comma_ampersand(line)
    implicit none
    character(len=*), intent(inout) :: line
    integer :: n
    
    n = len_trim(line)
    ...
end subroutine remove_trailing_comma_ampersand

Purpose: Removes trailing & and , from a line. This addresses cases like:
```
42, &
```
which should become simply
```
42
```
Implementation:
1. Find the trimmed length n = len_trim(line).
2. While the last character is '&', remove it.
3. Then check if the last two characters are ", ", remove them as well.

This makes the file lines more compatible with standard CSV parsing.

7. Bit-Counting Routines#

Each of these functions is declared public and returns an integer bit count or a string representation (in the case of bit_repr). They all call ensure_lookup_table_initialized() first, although for certain algorithms (like naive or Kernighan), the lookup table is not strictly needed. This ensures consistency and that the module’s data is ready.

7.1 `naive_count_bits(n) result(bit_count)`#

function naive_count_bits(n) result(bit_count)
    integer(kind=I4), value :: n
    integer :: bit_count
    integer :: i

    call ensure_lookup_table_initialized()

    bit_count = 0
    do i = 1, 32
        bit_count = bit_count + and(n, 1_I4)
        n = ishft(n, -1)
    end do
end function naive_count_bits

Parameters:
- n: a 32-bit integer (kind=I4), passed by value.
- bit_count: the returned result (total number of 1 bits).
Logic:
- Loops exactly 32 times for a 32-bit integer.
- Checks the least significant bit and(n, 1_I4).
- Accumulates the bit count.
- Shifts n right by 1 each iteration.
Performance: Straightforward but not the most efficient.

7.2 `early_stopping_count_bits(n) result(bit_count)`#

function early_stopping_count_bits(n) result(bit_count)
    integer(kind=I4), value :: n
    integer :: bit_count
    integer :: i

    call ensure_lookup_table_initialized()

    bit_count = 0
    do i = 1, 32
        bit_count = bit_count + and(n, 1_I4)
        n = ishft(n, -1)
        if (n == 0) exit
    end do
end function early_stopping_count_bits

Difference from Naive:
- After shifting, if n becomes 0, it exits early. This optimization helps when n is small or has bits set early in the lower portion.
Performance: Faster for numbers with few high-order bits set, as it avoids unnecessary loop iterations.

7.3 `lookup_table_count_bits(n) result(bit_count)`#

function lookup_table_count_bits(n) result(bit_count)
    integer(kind=I4), value :: n
    integer :: bit_count

    call ensure_lookup_table_initialized()

    bit_count = lookup_table(and(n, 255_I4)) + &
                lookup_table(and(ishft(n, -8), 255_I4)) + &
                lookup_table(and(ishft(n, -16), 255_I4)) + &
                lookup_table(and(ishft(n, -24), 255_I4))
end function lookup_table_count_bits

Approach:
- Splits the 32-bit integer n into four 8-bit segments using right shifts and masks:
  - The lowest 8 bits: and(n, 255_I4)
  - The next 8 bits: and(ishft(n, -8), 255_I4)
  - And so forth up to 24 bits.
Lookup:
- Each 8-bit value serves as an index into lookup_table, which presumably stores the number of set bits for all numbers from 0 to 255.
- Summing these four partial counts yields the total set bits in 32 bits.
Performance:
- Very efficient, since each 8-bit chunk’s bit count is retrieved in O(1) time.
- Requires memory for the table and a one-time file read to initialize.

7.4 `kernighan_count_bits(n) result(bit_count)`#

function kernighan_count_bits(n) result(bit_count)
    integer(kind=I4), value :: n
    integer :: bit_count

    call ensure_lookup_table_initialized()

    bit_count = 0
    do while (n /= 0)
        n = and(n, n - 1_I4)
        bit_count = bit_count + 1
    end do
end function kernighan_count_bits

Brian Kernighan’s Algorithm:
- Each iteration clears the lowest set bit of n by doing n = and(n, n - 1_I4).
- Increments the count of bits for each iteration.
Performance:
- Efficient when n has relatively few set bits.
- The loop runs as many times as there are set bits, rather than the total number of bits.

8. Binary Representation: `bit_repr(val) result(repr)`#

function bit_repr(val) result(repr)
    integer(kind=I4), value :: val
    character(len=32) :: repr
    integer :: i

    do i = 32, 1, -1
        if (and(val, 1_I4) == 1_I4) then
            repr(i:i) = '1'
        else
            repr(i:i) = '0'
        end if
        val = ishft(val, -1)
    end do
end function bit_repr

Purpose: Converts a 32-bit integer into a fixed 32-character binary string.
Implementation:
- Iterates from bit position 32 down to 1.
- Checks the least significant bit of val and sets repr(i:i) accordingly to '1' or '0'.
- Shifts val right by 1.
- The result is stored in a 32-character string, where repr(1:1) is the leftmost bit and repr(32:32) is the rightmost bit.
Sign Bit:
- For negative values, the leading bits will be the sign bits (in two’s complement representation). Hence, you’ll often see ones at the higher-order bits for negative integers.

9. Key Observations and Takeaways#

Initialization and Encapsulation:
- The approach of calling ensure_lookup_table_initialized() inside each function ensures that external callers do not need to worry about whether the table is ready.
- By marking the table and initialization logic private, the module retains control over how the data is loaded and accessed.
Performance Trade-Offs:
- Naive and Early Stopping methods rely on shifting all or some bits in a loop, straightforward but generally slower.
- Kernighan method iterates only as many times as there are set bits.
- Lookup Table method is typically the fastest for large volumes of data, because it reduces the problem to four table lookups.
Error Handling:
- The module checks for I/O issues (ios codes) when reading files.
- If CSV reading fails, it uses stop statements to terminate the program with a nonzero exit code.
CSV Preprocessing:
- Because the CSV lines might contain trailing , &, the subroutine remove_trailing_comma_ampersand cleans each line. This is a helpful technique to ensure the data is in a standard CSV format when eventually read by csv_module.
Memory Usage:
- The lookup table is only 256 integers in size, which is negligible in modern contexts. However, the code’s design is scalable and well-structured if someone wanted to extend the concept to 16-bit or 64-bit segmented lookups.

10. Conclusion#

bitmanip_mod.f90 is a well-structured module providing:

Multiple Bit-Counting Implementations:
1. Naive (simple, guaranteed 32 iterations).
2. Early Stopping (slightly optimized naive approach).
3. Lookup Table (fast, relies on precomputed 8-bit partial counts).
4. Kernighan’s (iterates by the number of set bits).
A Portable CSV-Loading Mechanism that leverages a third-party CSV library and local string-manipulation utilities to populate the lookup table.
Utility Function (bit_repr) for integer-to-binary-string conversion.
Robust Initialization: The separation of concerns between ensure_lookup_table_initialized and initialize_lookup_table is clean, ensuring the lookup table is always valid when the public routines are called.

Overall, this module exemplifies good Fortran design practices: careful scoping, explicit initialization, and a range of performance-tuned subroutines for a specific bitwise manipulation problem.

Program Code#

bitcount.f90#

program bitcount
    use, intrinsic :: iso_fortran_env, only : error_unit
    use :: bitmanip_mod
    implicit none
    integer :: nr_vals
    integer(kind=I4), dimension(:), allocatable :: vals
    integer(kind=I4) :: result
    integer :: i, istat
    integer :: naive_count_bits_result, early_stopping_count_bits_result, &
        lookup_table_count_bits_result, kernighan_count_bits_result
    real :: r, start_time, end_time

    print '(A)', 'sanity check: naive, early stopping, lookup table Kernighan'
    do i = -16, 16
        naive_count_bits_result = naive_count_bits(i)
        early_stopping_count_bits_result = early_stopping_count_bits(i)
        lookup_table_count_bits_result = lookup_table_count_bits(i)
        kernighan_count_bits_result = kernighan_count_bits(i)

        !print '(I14, 3A, 4I3)', i, ': ', bit_repr(i), ': ', &
        !    naive_count_bits(i), early_stopping_count_bits(i), &
        !    lookup_table_count_bits(i), kernighan_count_bits(i)

        print '(I14, 3A, 4I3)', i, ': ', bit_repr(i), ': ', &
            naive_count_bits_result, early_stopping_count_bits_result, &
            lookup_table_count_bits_result, kernighan_count_bits_result
    end do

    ! create values upfront for benchmarking
    call get_arguments(nr_vals)
    allocate (vals(nr_vals), stat=istat)
    if (istat /= 0) then
        write (unit=error_unit, fmt='(A, I0, A)') 'error: can not allocate ', &
            nr_vals, ' elements'
        stop 3
    end if
    do i = 1, size(vals)
        call random_number(r)
        vals(i) = int(r*real(huge(0_I4)), kind=I4)
    end do

    ! time naive implmentation
    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, naive_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F35.15)', 'naive implementation ', size(vals), &
        ' iterations: ', end_time - start_time

    ! time early stopping implmentation
    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, early_stopping_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F35.15)', 'early stopping implementation ', size(vals), &
        ' iterations: ', end_time - start_time

    ! time lookup table implmentation
    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, lookup_table_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F35.15)', 'lookup table implementation ', size(vals), &
        ' iterations: ', end_time - start_time

    ! time Kernighan's algorithm
    call cpu_time(start_time)
    do i = 1, size(vals)
        result = xor(result, kernighan_count_bits(vals(i)))
    end do
    call cpu_time(end_time)
    print '(A, I0, A, F35.15)', 'Kernighan implementation ', size(vals), &
        ' iterations: ', end_time - start_time
contains

    subroutine get_arguments(nr_vals)
        use, intrinsic :: iso_fortran_env, only : error_unit
        implicit none
        integer, intent(out) :: nr_vals
        character(len=1024) :: buffer, msg
        integer :: istat

        if (command_argument_count() /= 1) then
            write (unit=error_unit, fmt='(A)') 'error: expecting number of values'
            write (unit=*, fmt='(A)') 'error: expecting number of values'
            stop 1
        end if
        call get_command_argument(1, buffer)
        read (buffer, fmt=*, iostat=istat, iomsg=msg) nr_vals
        if (istat /= 0) then
            write (unit=error_unit, fmt='(2A)') 'error: ', trim(msg)
            stop 2
        end if
    end subroutine get_arguments

end program bitcount

bitmanip_mod.f90#

module bitmanip_mod
    use, intrinsic :: iso_fortran_env, only : INT32, error_unit
    use csv_module

    implicit none

    private

    ! A 256-element lookup table stored in the module
    integer, public, parameter :: I4 = INT32
    integer(kind=I4), dimension(0:255), private :: lookup_table
    logical, private, save :: initialized = .false.

    ! Public procedures
    public :: naive_count_bits, early_stopping_count_bits, bit_repr, &
              lookup_table_count_bits, kernighan_count_bits

contains

    !----------------------------------------------------------------------
    !> Main routine that ensures the module lookup_table is initialized
    subroutine ensure_lookup_table_initialized()
        if (.not. initialized) then
            call initialize_lookup_table()
        end if
    end subroutine ensure_lookup_table_initialized

    !----------------------------------------------------------------------
    !> Initialize the module lookup_table by reading 256 integers from the CSV
    subroutine initialize_lookup_table()
        implicit none
        integer :: ios

        ! Clear the module array
        lookup_table = 0

        ! Call parse_csv_file to fill the local array
        call parse_csv_file("lookup_table.dat", lookup_table, ios)

        if (ios == 0) then
            initialized = .true.
        else
            write(error_unit, '(A,I0)') "Failed to read CSV lookup table, ios=", ios
            stop 1
        end if
    end subroutine initialize_lookup_table

    !----------------------------------------------------------------------
    !> Reads up to 256 integers from the CSV file using csv-fortran’s get_csv_data_as_str
    subroutine parse_csv_file(filename, table_data, ios)
        !! Reads a CSV file that has trailing ", &" on each line,
        !! removing them before calling csv-fortran's get_csv_data_as_str.
        use csv_module
        
        implicit none

        ! Subroutine arguments
        character(len=*), intent(in)                                :: filename     ! Holds the name of the CSV file
        integer(kind=I4), dimension(0:255), intent(out)             :: table_data
        integer, intent(out)                                        :: ios          ! Error code

        ! Local variables
        type(csv_file)                                              :: c            ! csv-fortran object
        logical                                                     :: status_ok    ! Status flag for csv-fortran objects
        character(len=255), allocatable, dimension(:,:)             :: string_table ! Holds the CSV data as a table of strings
        integer                                                     :: nrow, ncol   ! Number of rows and columns in the CSV data
        integer :: i, j, idx, ios2, val
        character(len=30)                                           :: trimmed_string ! Holds the trimmed string which is a data cell

        !! For pre-processing:
        integer :: unit_in, unit_out, status_in, status_out
        character(len=1024) :: line, tmpfile
        logical :: done

        ! Initialize outputs
        ios = 0
        table_data = 0

        ! Generate a temp file name
        tmpfile = 'tempfile.csv'

        !-------------------------------------------------
        !  STEP 1: Pre-process 'filename' => 'tempfile.csv'
        !-------------------------------------------------
        open(newunit=unit_in, file=filename, status='old', action='read', &
            form='formatted', iostat=status_in)
        if (status_in /= 0) then
            write(*,*) "Error opening input file:", filename
            ios = 101 ! Error return code for opening input file
            return
        end if

        open(newunit=unit_out, file=tmpfile, status='replace', action='write', &
            form='formatted', iostat=status_out)
        if (status_out /= 0) then
            write(*,*) "Error creating temp file:", tmpfile
            ios = 102 ! Error return code for creating temp file
            close(unit_in)
            return
        end if

        do
            read(unit_in, '(A)', iostat=status_in) line
            if (status_in < 0) exit   ! EOF encountered
            if (status_in /= 0) then
                ios = 103 ! Error reading input file
                exit
            end if

            call remove_trailing_comma_ampersand(line)
            write(unit_out, '(A)') trim(line)
        end do

        close(unit_in)
        close(unit_out)

        if (ios /= 0) return  ! Error occurred

        !----------------------------------
        ! STEP 2: Use csv-fortran on tmpfile
        !----------------------------------

        call c%initialize()
        call c%read(tmpfile, status_ok=status_ok)
        if (ios /= 0) then
            write(*,*) "Error reading temp CSV file:", tmpfile
            call c%close(status_ok)
            return
        end if

        call c%get(string_table, status_ok=status_ok)
        if (ios /= 0) then
            write(*,*) "Error extracting data as strings."
            call c%close(status_ok)
            return
        end if

        ! Parse up to 256 integers
        nrow = size(string_table,1)
        ncol = size(string_table,2)

        idx = 0
        do i = 1, nrow
            do j = 1, ncol
                trimmed_string = adjustl(adjustl(string_table(i,j)))
                read(trimmed_string, *) val
                table_data(idx) = val
                idx = idx + 1
            end do
        end do

        ! Close CSV
        call c%close(status_ok)

        deallocate(string_table)

    end subroutine parse_csv_file

    !------------------------------------------
    subroutine remove_trailing_comma_ampersand(line)
        implicit none
        character(len=*), intent(inout) :: line
        integer :: n
    
        n = len_trim(line)
        if (n < 1) return ! Nothing to do
    
        ! Remove trailing "&"
        do while (n > 0)
            if (line(n:n) == "&") then
                line(n:n) = ""
                n = n - 1
            else
                exit
            end if
        end do
    
        ! Remove trailing ", "
        do while (n >= 2)
            if (line(n-1:n) == ", ") then
                line(n-1:n) = ""
                n = n - 2
            else
                exit
            end if
        end do
    end subroutine remove_trailing_comma_ampersand    

    !----------------------------------------------------------------------
    !> Bit-counting routines that rely on the lookup_table
    function naive_count_bits(n) result(bit_count)
        implicit none
        integer(kind=I4), value :: n
        integer :: bit_count
        integer :: i

        call ensure_lookup_table_initialized()

        bit_count = 0
        do i = 1, 32
            bit_count = bit_count + and(n, 1_I4)
            n = ishft(n, -1)
        end do
    end function naive_count_bits

    function early_stopping_count_bits(n) result(bit_count)
        implicit none
        integer(kind=I4), value :: n
        integer :: bit_count
        integer :: i

        call ensure_lookup_table_initialized()

        bit_count = 0
        do i = 1, 32
            bit_count = bit_count + and(n, 1_I4)
            n = ishft(n, -1)
            if (n == 0) exit
        end do
    end function early_stopping_count_bits

    function lookup_table_count_bits(n) result(bit_count)
        implicit none
        integer(kind=I4), value :: n
        integer :: bit_count

        call ensure_lookup_table_initialized()

        ! The lookup table is now loaded
        bit_count = lookup_table(and(n, 255_I4)) + &
                    lookup_table(and(ishft(n, -8), 255_I4)) + &
                    lookup_table(and(ishft(n, -16), 255_I4)) + &
                    lookup_table(and(ishft(n, -24), 255_I4))
    end function lookup_table_count_bits

    function kernighan_count_bits(n) result(bit_count)
        implicit none
        integer(kind=I4), value :: n
        integer :: bit_count

        call ensure_lookup_table_initialized()

        bit_count = 0
        do while (n /= 0)
            n = and(n, n - 1_I4)
            bit_count = bit_count + 1
        end do
    end function kernighan_count_bits

    function bit_repr(val) result(repr)
        implicit none
        integer(kind=I4), value :: val
        character(len=32) :: repr
        integer :: i

        do i = 32, 1, -1
            if (and(val, 1_I4) == 1_I4) then
                repr(i:i) = '1'
            else
                repr(i:i) = '0'
            end if
            val = ishft(val, -1)
        end do
    end function bit_repr

end module bitmanip_mod

The above program is compiled and run using Fortran Package Manager (fpm). The following FPM configuration file (fpm.toml) was used:

name = "Section_Bit_Manipulations_Bit_Count"

[build]
auto-executables = true
auto-tests = true
auto-examples = true

[install]
library = false

[dependencies]
csv-fortran = {git="https://github.com/jacobwilliams/csv-fortran.git"}

[[executable]]
name="Section_Bit_Manipulations_Bit_Count"
source-dir="app"
main="section_bit_manipulations_bit_count.f90"

Build the Program using FPM (Fortran Package Manager)#

import os
root_dir = ""
root_dir = os.getcwd()

code_dir = root_dir + "/" + "Fortran_Code/Section_Bit_Manipulations_Bit_Count"

os.chdir(code_dir)

build_status = os.system("fpm build 2>/dev/null")

Run the Program using FPM (Fortran Package Manager)#

The program is run and the output is saved into a file named ‘data.dat

exec_status = \
    os.system("fpm run -- 1024 2>/dev/null")

sanity check: naive, early stopping, lookup table Kernighan
           -16: 11111111111111111111111111110000:  28 28 28 28
           -15: 11111111111111111111111111110001:  29 29 29 29
           -14: 11111111111111111111111111110010:  29 29 29 29
           -13: 11111111111111111111111111110011:  30 30 30 30
           -12: 11111111111111111111111111110100:  29 29 29 29
           -11: 11111111111111111111111111110101:  30 30 30 30
           -10: 11111111111111111111111111110110:  30 30 30 30
            -9: 11111111111111111111111111110111:  31 31 31 31
            -8: 11111111111111111111111111111000:  29 29 29 29
            -7: 11111111111111111111111111111001:  30 30 30 30
            -6: 11111111111111111111111111111010:  30 30 30 30
            -5: 11111111111111111111111111111011:  31 31 31 31
            -4: 11111111111111111111111111111100:  30 30 30 30
            -3: 11111111111111111111111111111101:  31 31 31 31
            -2: 11111111111111111111111111111110:  31 31 31 31
            -1: 11111111111111111111111111111111:  32 32 32 32
             0: 00000000000000000000000000000000:   0  0  0  0
             1: 00000000000000000000000000000001:   1  1  1  1
             2: 00000000000000000000000000000010:   1  1  1  1
             3: 00000000000000000000000000000011:   2  2  2  2
             4: 00000000000000000000000000000100:   1  1  1  1
             5: 00000000000000000000000000000101:   2  2  2  2
             6: 00000000000000000000000000000110:   2  2  2  2
             7: 00000000000000000000000000000111:   3  3  3  3
             8: 00000000000000000000000000001000:   1  1  1  1
             9: 00000000000000000000000000001001:   2  2  2  2
            10: 00000000000000000000000000001010:   2  2  2  2
            11: 00000000000000000000000000001011:   3  3  3  3
            12: 00000000000000000000000000001100:   2  2  2  2
            13: 00000000000000000000000000001101:   3  3  3  3
            14: 00000000000000000000000000001110:   3  3  3  3
            15: 00000000000000000000000000001111:   4  4  4  4
            16: 00000000000000000000000000010000:   1  1  1  1
naive implementation 1024 iterations:                   0.000078999903053
early stopping implementation 1024 iterations:                   0.000065000029281
lookup table implementation 1024 iterations:                   0.000004000030458
Kernighan implementation 1024 iterations:                   0.000035000033677

Program Output Interpretation by ChatGPT 4o1#

1. Sanity Check Results#

1.1 Verification of Four Methods#

From \(-16\) through \(16\), the program prints:

Integer value (e.g., -16, -15, 0, 1, 2, etc.).
32-bit binary representation of the integer (two’s complement for negatives).
Four integers: the bit-count results for each method
- Naive (naive_count_bits)
- Early Stopping (early_stopping_count_bits)
- Lookup Table (lookup_table_count_bits)
- Kernighan (kernighan_count_bits)

For instance:

-16: 11111111111111111111111111110000:  28 28 28 28
...
 16: 00000000000000000000000000010000:   1  1  1  1

Significance:

All methods produce identical counts for each integer, confirming the correctness and consistency of the four implementations.
Negative integers (e.g., \(-1\), \(-2\), etc.) appear in two’s complement form, hence the leading bits are 1. That explains large bit counts for values like \(-1\) (32 ones → bit count = 32).

1.2 Specific Observations#

\(-1\):
Binary is all ones (111111...1111), yielding a bit count of 32 for each method.
\(-16\):
Binary ends in 11110000. Because \(-16\) has four zeros at the lower end and ones in the high positions, the bit count is 28.
Small Positive Values:
For example, \(3\) is 000000...0011 with a count of 2 bits.
For \(15\) (00000000000000000000000000001111), the bit count is 4.

These results ensure that the range from negative to positive integers is correctly handled.

2. Timing and Performance#

After the sanity check, the program runs 1024 iterations of each bit-counting method on some dataset (in this snippet, presumably 1024 integers), printing the elapsed CPU time. Example output:

naive implementation 1024 iterations:                   0.000079999910668
early stopping implementation 1024 iterations:          0.000055999960750
lookup table implementation 1024 iterations:            0.000004000030458
Kernighan implementation 1024 iterations:               0.000035000033677

Interpretation:

Naive (0.00008s)
- Checks each of 32 bits unconditionally for each integer.
- Takes the most time (though still small), indicative of a higher overhead approach.
Early Stopping (0.00006s)
- Similar logic to Naive, but exits the loop early if the remaining bits are zero.
- Slightly faster than Naive in many cases.
Lookup Table (0.000004s)
- Significantly faster because it counts bits by splitting each 32-bit integer into four 8-bit chunks and uses a precomputed table of the bit counts for each byte.
- Fast constant time lookups dominate here.
Kernighan (0.000035s)
- Uses n = n & (n - 1), clearing one set bit per iteration.
- Performance depends on the number of set bits, not all 32.
- Often quite efficient, though typically not quite as fast as table lookups for a large batch of random data.

Hence, the performance ordering in this particular test shows Lookup Table as the fastest, followed by Kernighan, Early Stopping, and then Naive.

3. Overall Significance#

Correctness Check: The first part (the “sanity check”) assures that for any integer in \([-16,16]\), all four methods return the same bit-count. This validates the integrity of each algorithm.
Performance Comparison: The second part (timing) highlights the efficiency trade-offs between the methods. It illustrates how a lookup table can drastically reduce execution time when counting bits repeatedly, while Kernighan can also be quite efficient depending on the data distribution.

Thus, the output both confirms that each bit-counting approach works correctly and demonstrates the relative performance of each algorithm over 1024 iterations.