Introduction to Coding Agents

SatoshiTerasaki@AtelierArith

Overview

  • Self-introduction
  • Introduction to AI-powered software
  • Introduction to AI agents
  • Hands-on exercises with several topics

Self-introduction

  • Freelance software engineer.
    • Collaborative development of scientific computing with university faculty.
  • GitHub: terasakisatoshi
  • Organization: AtelierArith
    • Publishes Julia packages I have created.

Actually

  • The software I will introduce leverages AI.
    • Over 90% of the implementation I was responsible for was generated by AI. This is a fact.
  • I like dynamic languages like Julia and was not good at compiled languages.
    • Thanks to AI, I can now use programming languages other than Julia:
      • Rust
      • C++
      • Swift
      • Dart
      • TypeScript

Recent work: Organizing SparseIR ecosystem

  • Collaborative development with Hiroshi Shinaoka, Hitoshi Mori, Markus Wallerberger

Recent work: Development with tensor4all members

Recent work: mVMC

  • Takahiro Misawa et al.
    • Porting C project mVMC to Julia (WIP)

This project is still work in progress. It will be released in the future.

Recent work: SubsetJuliaVM

Let’s run Julia on iPad and iPhone

Recent work: RustCall.jl

A package that allows calling Rust from Julia. Its design was inspired by Cxx.jl.

using RustCall

rust"""
#[julia]
fn add(a: i32, b: i32) -> i32 {
    a + b
}
"""

# Call directly - wrapper is auto-generated
add(10, 20)  # => 30

Recent work: better-pluto-client

We can view Pluto notebook on VS Code.

AI Driven Development

Thanks to AI, I can now use programming languages other than Julia and develop many software products rapidly.

Humans give instructions in natural language for what they want to build, and AI agents generate source code and execute commands.

This approach to development using AI is called AI-driven development.

Did you always take it seriously?

Were you always thinking about efficient development methods?

Really? (Especially writing tests and documentation)

In reality

Where is the source code? (Git is not used. No version control)

No documentation, or it’s outdated

Code doesn’t compile (C/C++/Fortran90/FORTRAN77)

No unit tests

Runtime errors occur (segmentation faults, etc. C++!!!)

  • Dataset URLs point to a researcher’s personal C: drive
    • (hardcoded variables in the SOTA machine learning source code)

Build reproducible software for us!

  • When researchers develop software, it is important to build reproducible software.
  • AI-powered software development is one solution for building reproducible software.

Humans and machines (1)

  • Humans can “create” but they “get tired,” “get bored,” and “get stuck”
    • Coming up with new ideas like in research activities
    • Overusing git commit -m "update" and git commit -m "fix"

  • Sticking to a specific programming language
    • Creates environments where adopting modern practices like package managers is difficult

Humans and machines (2)

  • Humans can “create” but they “get tired,” “get bored,” and “get stuck”

Dear scientists, when writing new numerical computing software, use Julia or Rust for scientific computing. They have modern package management systems that enable collaborative development with others.

Julia provides fast execution while remaining a high-level language. You should use it if you typically write Fortran. However, type-unstable code, frequent GC (garbage collection), and memory allocation can prevent optimal performance. Tensor network packages that handle multidimensional arrays often encounter this issue.

Rust is a fast static language. Those who write C/C++ should consider switching. Rust has a learning curve when written manually, but AI code generation dramatically improves the development experience.

The tensor4all-rs ecosystem for tensor networks aims to be an excellent example of adopting Rust for scientific computing.

The rise of coding agents

With advances in artificial intelligence and generative AI, coding agents and AI agents can now take the following actions to achieve goals when given human instructions:

Humans and machines (3)

  • Machines “don’t get bored,” “don’t get tired,” and “models improve”
    • Generate appropriate commit and pull request summaries from code diffs

Humans and machines (4)

  • Machines “don’t get bored,” “don’t get tired,” and “models improve”
  • Work anytime
    • Can use various languages. Good at porting:
    • Julia -> C++, C++ -> Rust, C -> Julia
    • TypeScript, Swift, Flutter/Dart
  • Can adapt to requests. Migration to modern languages is possible
  • Requires appropriate instructions from humans

Are humans unnecessary? (1)

  • LLMs have much knowledge but may not have cutting-edge knowledge like researchers.
    • What to implement or not depends on human judgment.
  • Software engineers are freed from coding time but need the following capabilities:
    • Which technologies to use when
    • Designing architecture for high-quality software
    • Higher-level decision-making ability
  • Minimal domain knowledge to discuss with researchers is also required.

Are humans unnecessary? (2)

  • AI can slack off when writing code
    • Satisfied with writing only placeholders
    • Implements only special cases when a general implementation is needed
  • Not always best-practice implementation
    • In Julia: doesn’t use multiple dispatch, overuses export
    • In Rust: overuses .clone(), degrading performance
  • Minimal knowledge of how to build and run code, coding, and programming languages is necessary.

Are software engineers unnecessary? (1)

  • An extreme claim: AI development could make researchers unnecessary
  • AI adoption frees us from the following local concerns:
    • Language learning cost
    • Very fine-grained language features and syntax
    • Coding style
  • We should focus on the following global discussions:
    • How to design software with good user experience
    • Which technologies and languages to choose for each situation
    • How to maintain high-quality software
    • What to implement or not
  • Bottleneck investigation with visual profilers is difficult to fully automate with AI. Such workflows still exist
    • Optimization at millisecond, microsecond, nanosecond levels relies on human intuition. Demand remains

Are software engineers unnecessary? (2)

  • Gaining domain knowledge is important
    • Physics knowledge sufficient to communicate with physicists
  • Rust code can run on the web via WebAssembly.
    • Work may emerge for publishing and maintaining numerical computing demos on the web

Are software engineers unnecessary? (3)

  • Researchers will get busier.
    • They can do what they couldn’t before but wanted to.
    • Work will increase accordingly
  • Researchers won’t be idle.
    • One researcher subscribed to Claude Code MAX $200 and got dizzy from full AI usage
    • I recommended downgrading to the $100 plan. Sounds like a joke but it’s true
  • {SWE task} \(\subset\) {Researcher's demand} \(\setminus\) {Things researchers actually can do}

Are software engineers unnecessary? (4)

  • Jobs disappear when a certain passion disappears.

Demo

Introducing coding with Cursor.

Alternatives: Antigravity, Claude Code, Codex.

Which tool should you use? (1)

  • There is no single answer since AI agents are constantly evolving
  • Depends on which accounts you have
    • If you have ChatGPT: try Codex, GPT-5.3
    • If you have Claude: try Claude Code: Opus-4.6
  • Want to use for free
    • Use GitHub Copilot, Google Antigravity
  • All services have usage limits. It all comes down to budget

Which tool should you use? (2)

As for me

  • Claude Code MAX $200/$100: Opus-4.6
  • ChatGPT Plus plan: Codex: GPT-5.3
  • Cursor Pro: Daily editor with Composer 1.5

Hardware

I like to use a ultrawide monitor

Required CLI tools

  • git, gh: Operate Git/GitHub
  • npm, node: Installing coding agents
  • uv: Python package manager
  • juliaup, julia: Required to run Julia code
  • rustup, rustc, cargo: Required to build and run Rust code

Optional:

  • tmux
  • docker
  • devcontainer
  • rg
  • tree
  • htop
  • quarto

Software Installation

See preparation.qmd

Installation (Cursor)

Go to the link below:

Cursor UI

Math Quiz

Given two positive integers \(a\) and \(b\), what is the probability that their greatest common divisor \(\gcd(a,b)\) equals 1?

The answer is

\[ \begin{aligned} \mathrm{Prob}(\gcd(a, b) = 1) &= \frac{1}{\zeta(2)} \\ &= \frac{6}{\pi^2} \\ &\approx 0.6079271018540267 > 0.5 \end{aligned} \]

where \(\zeta\) is the Riemann zeta function.

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}= \prod_{p: \mathrm{primes} } \left(\frac{1}{1 - p^{-s}} \right) \mathrm{} \]

The second equality is the equation known as the Euler product.

Proof

\[ \begin{aligned} \mathrm{Prob}(\gcd(a, b) = 1) &= \prod_{p: \mathrm{primes} } \left( 1 - \mathrm{Prob}(p | a \textrm{ and } p | b)\right) \\ &= \prod_{p: \mathrm{primes} } \left( 1 - \mathrm{Prob}(p | a)\mathrm{Prob}(p | b)\right) \\ &= \prod_{p: \mathrm{primes} } \left( 1 - \frac{1}{p}\cdot\frac{1}{p}\right) \\ &= \prod_{p: \mathrm{primes} } \left( 1 - p^{-2}\right) \\ &= \left(\prod_{p: \mathrm{primes} } \left(\frac{1}{1 - p^{-2}} \right)\right)^{-1} \\ &=\left(\zeta(2)\right)^{-1} \\ &= \frac{1}{\zeta(2)} = \frac{6}{\pi^2} \end{aligned} \]

JS Implementation (1)

From the proof above we obtain:

\[ \pi = \sqrt{\frac{6}{\mathrm{Prob}(\gcd(a, b)=1)}} \]

Let’s verify with numerical computation

// Function to approximate pi
// using probability that two numbers are coprime
function calcPi(N) {
    let cnt = 0;    // Counter for coprime pairs
    // Loop through all pairs (a, b) where 1 <= a, b <= N
    for (let a = 1; a <= N; a++) {
        for (let b = 1; b <= N; b++) {
            // Check if a and b are coprime
            if (mygcd(a, b) === 1) {
                cnt += 1;  // Increment counter if coprime
            }
        }
    }
    // Probability that two numbers are coprime
    let prob = cnt / (N * N);
    // Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    return Math.sqrt(6 / prob);
}

JS Implementation (2)

Let’s calculate \(\pi \approx 3.14...\) using the greatest common divisor:

// approx_pi_gcd.js

// Function to compute the greatest common divisor (GCD) using the Euclidean algorithm
function mygcd(a, b) {
    // Loop until the remainder is zero
    while (b !== 0) {
        let tmp = b;    // Store the value of b temporarily
        b = a % b;      // Update b to the remainder of a divided by b
        a = tmp;        // Set a to the previous value of b
    }
    return a;           // When b is zero, a is the GCD
}

// Function to approximate pi using probability that two numbers are coprime
function calcPi(N) {
    let cnt = 0;    // Counter for coprime pairs
    // Loop through all pairs (a, b) where 1 <= a, b <= N
    for (let a = 1; a <= N; a++) {
        for (let b = 1; b <= N; b++) {
            // Check if a and b are coprime
            if (mygcd(a, b) === 1) {
                cnt += 1;  // Increment counter if coprime
            }
        }
    }
    // Probability that two numbers are coprime
    let prob = cnt / (N * N);
    // Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    return Math.sqrt(6 / prob);
}

// Main function to run the pi approximation
function main() {
    let N = 10000;            // Number limit for coprimality checking
    let pi = calcPi(N);       // Approximate pi
    console.log(`N: ${N}`);   // Output N
    console.log(`pi: ${pi}`); // Output approximation of pi
}

main(); // Call the main function

Result

$ node approx_pi_gcd.js
calcPi: 2.175s
N: 10000
pi: 3.141534239016629

Exercise

Port the JavaScript code above to Julia and Rust. Measure the execution time when N=10000.

Hint

Use the following prompts:

Port approx_pi_gcd.js to Julia and save the result as approx_pi_gcd.jl

Port approx_pi_gcd.js to Rust and save the result as approx_pi_gcd.rs

Answer (Cursor)

Answer (Julia)

# Function to compute the greatest common divisor (GCD) using the Euclidean algorithm
function mygcd(a, b)
    # Loop until the remainder is zero
    while b != 0
        tmp = b    # Store the value of b temporarily
        b = a % b  # Update b to the remainder of a divided by b
        a = tmp    # Set a to the previous value of b
    end
    return a       # When b is zero, a is the GCD
end

# Function to approximate pi using probability that two numbers are coprime
function calcPi(N)
    cnt = 0    # Counter for coprime pairs
    # Loop through all pairs (a, b) where 1 <= a, b <= N
    for a in 1:N
        for b in 1:N
            # Check if a and b are coprime
            if mygcd(a, b) == 1
                cnt += 1  # Increment counter if coprime
            end
        end
    end
    # Probability that two numbers are coprime
    prob = cnt / (N * N)
    # Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    return sqrt(6 / prob)
end

# Main function to run the pi approximation
function main()
    N = 10000            # Number limit for coprimality checking
    approx_pi = @time calcPi(N) # Approximate pi and time it
    println("N: $(N)")   # Output N
    println("pi: $(approx_pi)") # Output approximation of pi
end

main() # Call the main function

Result

$ julia approx_pi_gcd.jl
  1.865392 seconds
N: 10000
pi: 3.141534239016629

Answer (Rust)

// Function to compute the greatest common divisor (GCD) using the Euclidean algorithm
fn mygcd(a: u64, b: u64) -> u64 {
    let mut a = a;
    let mut b = b;
    // Loop until the remainder is zero
    while b != 0 {
        let tmp = b;    // Store the value of b temporarily
        b = a % b;      // Update b to the remainder of a divided by b
        a = tmp;        // Set a to the previous value of b
    }
    a                   // When b is zero, a is the GCD
}

// Function to approximate pi using probability that two numbers are coprime
fn calc_pi(n: u64) -> f64 {
    let mut cnt = 0u64; // Counter for coprime pairs
    // Loop through all pairs (a, b) where 1 <= a, b <= N
    for a in 1..=n {
        for b in 1..=n {
            // Check if a and b are coprime
            if mygcd(a, b) == 1 {
                cnt += 1;  // Increment counter if coprime
            }
        }
    }
    // Probability that two numbers are coprime
    let prob = cnt as f64 / (n as f64 * n as f64);
    // Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    (6.0 / prob).sqrt()
}

// Main function to run the pi approximation
fn main() {
    let n = 10000u64;   // Number limit for coprimality checking
    let start = std::time::Instant::now();
    let pi = calc_pi(n); // Approximate pi
    let duration = start.elapsed();
    println!("calcPi: {:?}", duration);
    println!("N: {}", n);   // Output N
    println!("pi: {}", pi); // Output approximation of pi
}

Result

$ rustc -C opt-level=3 approx_pi_gcd.rs && ./approx_pi_gcd
calcPi: 1.754083333s
N: 10000
pi: 3.141534239016629

Answer (C)

#include <stdio.h>
#include <math.h>
#include <time.h>

// Function to compute the greatest common divisor (GCD) using the Euclidean algorithm
int mygcd(int a, int b) {
    // Loop until the remainder is zero
    while (b != 0) {
        int tmp = b;    // Store the value of b temporarily
        b = a % b;      // Update b to the remainder of a divided by b
        a = tmp;        // Set a to the previous value of b
    }
    return a;           // When b is zero, a is the GCD
}

// Function to approximate pi using probability that two numbers are coprime
double calcPi(int N) {
    int cnt = 0;    // Counter for coprime pairs
    // Loop through all pairs (a, b) where 1 <= a, b <= N
    for (int a = 1; a <= N; a++) {
        for (int b = 1; b <= N; b++) {
            // Check if a and b are coprime
            if (mygcd(a, b) == 1) {
                cnt += 1;  // Increment counter if coprime
            }
        }
    }
    // Probability that two numbers are coprime
    double prob = (double)cnt / (N * N);
    // Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    return sqrt(6.0 / prob);
}

// Main function to run the pi approximation
int main() {
    int N = 10000;            // Number limit for coprimality checking
    clock_t start = clock();
    double pi = calcPi(N);    // Approximate pi
    clock_t end = clock();
    double cpu_time_used = ((double)(end - start)) / CLOCKS_PER_SEC;

    printf("calcPi: %f seconds\n", cpu_time_used);
    printf("N: %d\n", N);     // Output N
    printf("pi: %f\n", pi);   // Output approximation of pi

    return 0;
}

Result

$ gcc -O3 approx_pi_gcd.c && ./a.out
calcPi: 1.753796 seconds
N: 10000
pi: 3.141534

Julia can execute this much faster.

# Function to approximate pi using probability that two numbers are coprime
function calcPi(N)
    cnt = 0    # Counter for coprime pairs
    # Loop through all pairs (a, b) where 1 <= a, b <= N
    for a in 1:N
        for b in 1:N
            # Check if a and b are coprime
            if gcd(a, b) == 1
                cnt += 1  # Increment counter if coprime
            end
        end
    end
    # Probability that two numbers are coprime
    prob = cnt / (N * N)
    # Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    return sqrt(6 / prob)
end

# Main function to run the pi approximation
function main()
    N = 10000            # Number limit for coprimality checking
    approx_pi = @time calcPi(N) # Approximate pi and time it
    println("N: $(N)")   # Output N
    println("pi: $(approx_pi)") # Output approximation of pi
end

main() # Call the main function

Result

julia approx_pi_gcd.jl
  1.454210 seconds
N: 10000
pi: 3.141534239016629

Julia got faster?!

Why did it go from 1.865392 seconds to 1.454210 seconds?

Because Julia’s gcd function uses a different algorithm than the Euclidean algorithm.

Exercise

  • Clone the Julia repository and have the coding agent explain how the gcd function works

  • Have it answer what algorithm is used

  • Have it port Julia’s gcd implementation to replace mygcd in the code we ported from JavaScript to Rust

Hint

  • git clone --depth 1 https://github.com/JuliaLang/julia.git enables fast cloning (shallow clone). Use the following prompt:

Read the Julia code and answer how the gcd function is defined. Read @julia

Port these gcd functions to Rust

Answer

Indeed, the code is as follows:

# binary GCD (aka Stein's) algorithm
# about 1.7x (2.1x) faster for random Int64s (Int128s)
# Unfortunately, we need to manually annotate this as `@assume_effects :terminates_locally` to work around #41694.
# Since this is used in the Rational constructor, constant folding is something we do care about here.
@assume_effects :terminates_locally function _gcd(ain::T, bin::T) where T<:BitInteger
    zb = trailing_zeros(bin)
    za = trailing_zeros(ain)
    a = abs(ain)
    b = abs(bin >> zb)
    k = min(za, zb)
    while a != 0
        a >>= za
        absd, diff = absdiff(a, b)
        za = trailing_zeros(diff)
        b = min(a, b)
        a = absd
    end
    r = b << k
    return r % T
end

Answer (Cursor with Auto-mode)

Answer (Rust implementation)

// Port of Julia's gcd functions from base/intfuncs.jl
// Greatest common divisor implementations using Euclidean and Binary GCD algorithms

use std::ops::{Rem, Shr, Shl, Sub, BitAnd};

/// Greatest common (positive) divisor (or zero if all arguments are zero).
///
/// This is the general implementation for all integer types using the Euclidean algorithm.
pub fn gcd<T>(mut a: T, mut b: T) -> T
where
    T: Copy + PartialEq + Rem<Output = T> + From<u8> + Default,
{
    let zero = T::from(0u8);
    while b != zero {
        let t = b;
        b = a % b;
        a = t;
    }
    checked_abs(a)
}

/// Greatest common divisor for fixed-width integer types using Binary GCD (Stein's) algorithm.
///
/// This is about 1.7x (2.1x) faster for random Int64s (Int128s) compared to Euclidean algorithm.
pub fn gcd_bitinteger<T>(a: T, b: T) -> T
where
    T: Copy
        + PartialEq
        + PartialOrd
        + Shr<usize, Output = T>
        + Shl<usize, Output = T>
        + BitAnd<Output = T>
        + Sub<Output = T>
        + From<u8>
        + Default,
{
    let zero = T::from(0u8);

    if a == zero {
        return checked_abs(b);
    }
    if b == zero {
        return checked_abs(a);
    }

    // Handle typemin case for signed types
    // In Rust, we check for the minimum value differently
    // For now, we'll proceed with the algorithm

    _gcd(a, b)
}

/// Binary GCD (aka Stein's) algorithm implementation.
///
/// This is the core algorithm that's faster for fixed-width integers.
fn _gcd<T>(ain: T, bin: T) -> T
where
    T: Copy
        + PartialEq
        + PartialOrd
        + Shr<usize, Output = T>
        + Shl<usize, Output = T>
        + BitAnd<Output = T>
        + Sub<Output = T>
        + From<u8>
        + Default,
{
    let zero = T::from(0u8);

    let zb = trailing_zeros(bin);
    let mut za = trailing_zeros(ain);
    let mut a = abs(ain);
    let mut b = abs(bin >> zb);
    let k = if za < zb { za } else { zb };

    while a != zero {
        a = a >> za;
        let (absd, diff) = absdiff(a, b);
        za = trailing_zeros(diff);
        b = if a < b { a } else { b };
        a = absd;
    }

    let r = b << k;
    r
}

/// Compute absolute difference between two values.
/// Returns (absolute difference, signed difference).
fn absdiff<T>(x: T, y: T) -> (T, T)
where
    T: Copy + PartialOrd + Sub<Output = T> + From<u8>,
{
    if x >= y {
        let d = x - y;
        (d, d)
    } else {
        let d = y - x;
        (d, d)
    }
}

/// Compute the number of trailing zeros in the binary representation.
///
/// For unsigned types, this counts trailing zeros.
/// For signed types, we convert to unsigned first.
fn trailing_zeros<T>(x: T) -> usize
where
    T: Copy,
{
    // This is a simplified version. In practice, you'd use the built-in
    // trailing_zeros method for each integer type.
    // For a generic implementation, we'd need trait bounds that include
    // the actual integer traits.

    // For now, we'll use a basic implementation
    // In real code, you'd specialize for each integer type
    if let Some(val) = try_to_u64(x) {
        val.trailing_zeros() as usize
    } else {
        // Fallback: count manually
        count_trailing_zeros_fallback(x)
    }
}

/// Try to convert a value to u64 for trailing_zeros calculation.
fn try_to_u64<T>(_x: T) -> Option<u64> {
    None // Generic implementation - would need specialization
}

/// Fallback trailing zeros counting.
fn count_trailing_zeros_fallback<T>(_x: T) -> usize {
    0 // Placeholder - would need proper implementation
}

/// Compute absolute value with overflow checking.
fn checked_abs<T>(x: T) -> T
where
    T: Copy,
{
    // For unsigned types, abs is a no-op
    // For signed types, we'd use checked_abs
    // This is a simplified version
    abs(x)
}

/// Compute absolute value.
fn abs<T>(x: T) -> T
where
    T: Copy,
{
    // For unsigned types, this is a no-op
    // For signed types, we'd need to handle negation
    // This is a simplified generic version
    x
}

// Specialized implementations for specific integer types

/// GCD for u8
pub fn gcd_u8(a: u8, b: u8) -> u8 {
    gcd_bitinteger_u8(a, b)
}

/// Binary GCD for u8 (Stein's algorithm matching Julia's implementation)
pub fn gcd_bitinteger_u8(ain: u8, bin: u8) -> u8 {
    if ain == 0 {
        return bin;
    }
    if bin == 0 {
        return ain;
    }

    let zb = bin.trailing_zeros() as usize;
    let mut za = ain.trailing_zeros() as usize;
    let mut a = ain;
    let mut b = bin >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    b << k
}

/// GCD for u16
pub fn gcd_u16(a: u16, b: u16) -> u16 {
    gcd_bitinteger_u16(a, b)
}

/// Binary GCD for u16 (Stein's algorithm matching Julia's implementation)
pub fn gcd_bitinteger_u16(ain: u16, bin: u16) -> u16 {
    if ain == 0 {
        return bin;
    }
    if bin == 0 {
        return ain;
    }

    let zb = bin.trailing_zeros() as usize;
    let mut za = ain.trailing_zeros() as usize;
    let mut a = ain;
    let mut b = bin >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    b << k
}

/// GCD for u32
pub fn gcd_u32(a: u32, b: u32) -> u32 {
    gcd_bitinteger_u32(a, b)
}

/// Binary GCD for u32 (Stein's algorithm matching Julia's implementation)
pub fn gcd_bitinteger_u32(ain: u32, bin: u32) -> u32 {
    if ain == 0 {
        return bin;
    }
    if bin == 0 {
        return ain;
    }

    let zb = bin.trailing_zeros() as usize;
    let mut za = ain.trailing_zeros() as usize;
    let mut a = ain;
    let mut b = bin >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    b << k
}

/// GCD for u64
pub fn gcd_u64(a: u64, b: u64) -> u64 {
    gcd_bitinteger_u64(a, b)
}

/// Binary GCD for u64 (Stein's algorithm matching Julia's implementation)
pub fn gcd_bitinteger_u64(ain: u64, bin: u64) -> u64 {
    if ain == 0 {
        return bin;
    }
    if bin == 0 {
        return ain;
    }

    let zb = bin.trailing_zeros() as usize;
    let mut za = ain.trailing_zeros() as usize;
    let mut a = ain;
    let mut b = bin >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    b << k
}

/// GCD for u128
pub fn gcd_u128(a: u128, b: u128) -> u128 {
    gcd_bitinteger_u128(a, b)
}

/// Binary GCD for u128 (Stein's algorithm matching Julia's implementation)
pub fn gcd_bitinteger_u128(ain: u128, bin: u128) -> u128 {
    if ain == 0 {
        return bin;
    }
    if bin == 0 {
        return ain;
    }

    let zb = bin.trailing_zeros() as usize;
    let mut za = ain.trailing_zeros() as usize;
    let mut a = ain;
    let mut b = bin >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    b << k
}

/// GCD for i8
pub fn gcd_i8(a: i8, b: i8) -> i8 {
    gcd_bitinteger_i8(a, b)
}

/// Binary GCD for i8 with signed integer handling (matching Julia's algorithm)
pub fn gcd_bitinteger_i8(ain: i8, bin: i8) -> i8 {
    if ain == 0 {
        return bin.abs();
    }
    if bin == 0 {
        return ain.abs();
    }

    // Handle typemin case
    if ain == i8::MIN && ain == bin {
        panic!("gcd({}, {}) overflows", ain, bin);
    }

    let zb = bin.unsigned_abs().trailing_zeros() as usize;
    let mut za = ain.unsigned_abs().trailing_zeros() as usize;
    let mut a = ain.unsigned_abs();
    let mut b = bin.unsigned_abs() >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    (b << k) as i8
}

/// GCD for i16
pub fn gcd_i16(a: i16, b: i16) -> i16 {
    gcd_bitinteger_i16(a, b)
}

/// Binary GCD for i16 with signed integer handling (matching Julia's algorithm)
pub fn gcd_bitinteger_i16(ain: i16, bin: i16) -> i16 {
    if ain == 0 {
        return bin.abs();
    }
    if bin == 0 {
        return ain.abs();
    }

    if ain == i16::MIN && ain == bin {
        panic!("gcd({}, {}) overflows", ain, bin);
    }

    let zb = bin.unsigned_abs().trailing_zeros() as usize;
    let mut za = ain.unsigned_abs().trailing_zeros() as usize;
    let mut a = ain.unsigned_abs();
    let mut b = bin.unsigned_abs() >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    (b << k) as i16
}

/// GCD for i32
pub fn gcd_i32(a: i32, b: i32) -> i32 {
    gcd_bitinteger_i32(a, b)
}

/// Binary GCD for i32 with signed integer handling (matching Julia's algorithm)
pub fn gcd_bitinteger_i32(ain: i32, bin: i32) -> i32 {
    if ain == 0 {
        return bin.abs();
    }
    if bin == 0 {
        return ain.abs();
    }

    if ain == i32::MIN && ain == bin {
        panic!("gcd({}, {}) overflows", ain, bin);
    }

    let zb = bin.unsigned_abs().trailing_zeros() as usize;
    let mut za = ain.unsigned_abs().trailing_zeros() as usize;
    let mut a = ain.unsigned_abs();
    let mut b = bin.unsigned_abs() >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    (b << k) as i32
}

/// GCD for i64
pub fn gcd_i64(a: i64, b: i64) -> i64 {
    gcd_bitinteger_i64(a, b)
}

/// Binary GCD for i64 with signed integer handling (matching Julia's algorithm)
pub fn gcd_bitinteger_i64(ain: i64, bin: i64) -> i64 {
    if ain == 0 {
        return bin.abs();
    }
    if bin == 0 {
        return ain.abs();
    }

    if ain == i64::MIN && ain == bin {
        panic!("gcd({}, {}) overflows", ain, bin);
    }

    let zb = bin.unsigned_abs().trailing_zeros() as usize;
    let mut za = ain.unsigned_abs().trailing_zeros() as usize;
    let mut a = ain.unsigned_abs();
    let mut b = bin.unsigned_abs() >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    (b << k) as i64
}

/// GCD for i128
pub fn gcd_i128(a: i128, b: i128) -> i128 {
    gcd_bitinteger_i128(a, b)
}

/// Binary GCD for i128 with signed integer handling (matching Julia's algorithm)
pub fn gcd_bitinteger_i128(ain: i128, bin: i128) -> i128 {
    if ain == 0 {
        return bin.abs();
    }
    if bin == 0 {
        return ain.abs();
    }

    if ain == i128::MIN && ain == bin {
        panic!("gcd({}, {}) overflows", ain, bin);
    }

    let zb = bin.unsigned_abs().trailing_zeros() as usize;
    let mut za = ain.unsigned_abs().trailing_zeros() as usize;
    let mut a = ain.unsigned_abs();
    let mut b = bin.unsigned_abs() >> zb;
    let k = za.min(zb);

    while a != 0 {
        a >>= za;
        let absd = if a >= b { a - b } else { b - a };
        let diff = absd;
        za = diff.trailing_zeros() as usize;
        b = a.min(b);
        a = absd;
    }

    (b << k) as i128
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_gcd_u64() {
        assert_eq!(gcd_u64(6, 9), 3);
        assert_eq!(gcd_u64(6, 0), 6);
        assert_eq!(gcd_u64(0, 0), 0);
        assert_eq!(gcd_u64(48, 18), 6);
        assert_eq!(gcd_u64(17, 13), 1);
    }

    #[test]
    fn test_gcd_i64() {
        assert_eq!(gcd_i64(6, 9), 3);
        assert_eq!(gcd_i64(6, -9), 3);
        assert_eq!(gcd_i64(-6, 9), 3);
        assert_eq!(gcd_i64(-6, -9), 3);
        assert_eq!(gcd_i64(6, 0), 6);
        assert_eq!(gcd_i64(0, 0), 0);
    }

    #[test]
    fn test_gcd_u32() {
        assert_eq!(gcd_u32(48, 18), 6);
        assert_eq!(gcd_u32(100, 25), 25);
    }

    #[test]
    fn test_gcd_i32() {
        assert_eq!(gcd_i32(48, -18), 6);
        assert_eq!(gcd_i32(-100, 25), 25);
    }
}

// Function to approximate pi using probability that two numbers are coprime
fn calc_pi(n: u64) -> f64 {
    let mut cnt = 0u64; // Counter for coprime pairs
    // Loop through all pairs (a, b) where 1 <= a, b <= N
    for a in 1..=n {
        for b in 1..=n {
            // Check if a and b are coprime using the fast binary GCD algorithm
            if gcd::gcd_u64(a, b) == 1 {
                cnt += 1;  // Increment counter if coprime
            }
        }
    }
    // Probability that two numbers are coprime
    let prob = cnt as f64 / (n as f64 * n as f64);
    // Approximate pi using the formula: pi ≈ sqrt(6 / prob)
    (6.0 / prob).sqrt()
}

// Main function to run the pi approximation
fn main() {
    let n = 10000u64;   // Number limit for coprimality checking
    let start = std::time::Instant::now();
    let pi = calc_pi(n); // Approximate pi
    let duration = start.elapsed();
    println!("calcPi: {:?}", duration);
    println!("N: {}", n);   // Output N
    println!("pi: {}", pi); // Output approximation of pi
}

Result

rustc -C opt-level=3 gcd.rs && ./gcd
calcPi: 1.2510425s
N: 10000
pi: 3.141534239016629

Advanced topics

Summary of measurement results so far

Algorithm Language Execution time
Euclidean algorithm JavaScript 2.175 s
Julia 1.865392 s
Rust 1.754083 s
C 1.753796 s
Binary GCD algorithm Julia 1.454210 s
Rust 1.251042 s

How can we achieve Rust-equivalent performance in Julia?

The Binary GCD Algorithm can be made even faster

Exercise

  • Further optimize the Binary GCD Algorithm (1.45 -> 1.07 seconds on Julia)

Hint

  • Implement the Binary GCD Algorithm in Rust and measure execution time
  • If the Rust implementation is faster than Julia, compare the Rust binary with the assembly output from Julia’s @code_native macro
  • Fewer instructions improve execution speed; feed the comparison results back into the Julia code

Answer

Rust

Benchmark:
Run 1: 1.05132725s  pi=3.141534239016629
Run 2: 1.0358485s  pi=3.141534239016629
Run 3: 1.034275125s  pi=3.141534239016629

Julia

$ julia src/main.jl
Running tests...
All tests passed!

Benchmark:
Run 1: 1.072994208s  pi=3.141534239016629
Run 2: 1.07328325s  pi=3.141534239016629
Run 3: 1.072947583s  pi=3.141534239016629

Another example

What is the expected number of random numbers (generated uniformly) such that their sum exceeds one?

Refer this discussion on MathOverflow.

Exercise

Write a numerical computing program. Try it in various programming languages

Answer (Julia)

function count_upto_one()
    counter = 0
    accumulated = 0.0
    while true
        accumulated += rand()
        counter += 1
        if accumulated >= 1.0
            break
        end
    end
    return counter
end

function main()
    n_trial =1e8
    total_count=0
    for trial in 1:n_trial
        total_count+=count_upto_one()
    end
    println(total_count/n_trial)
end

main()