Lumagraph

Stress testing ChatGPT

March 11, 2024
Waves crashing against lighthouse
Photo by Michael Krahn on Unsplash

I tried to compare the performance of ChatGPT 3.5 and 4 with a quick stress test. The result was a pleasant surprise.

In 2023, I tried stress-testing ChatGPT with some simple programming tasks. It took the simplest ones in its stride, but broke down with more complex ones like 'Write an implementation of DPLL'; it knew what I meant, but would exceed maximum output length and hang in the middle of the attempt. Version 4 seemed likely to do better than 3.5.

Today, then, I first tried 3.5 on the following tasks:

  1. Write an implementation of DPLL
  2. Write an implementation of CDCL
  3. Write an SQL parser
  4. Write an implementation of DPLL in 6502 assembly
  5. Write a program to translate C++ to Rust

Task 1, ChatGPT 3.5 correctly accomplished, writing a working program with a test case.
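For readers who haven't met DPLL, here is a rough sketch of what task 1 is asking for. To be clear, this is not the program ChatGPT produced, which isn't reproduced in this post; the function name `dpll` and the DIMACS-style clause encoding (a list of lists of nonzero integers, where `3` means x3 and `-3` means not-x3) are assumptions made purely for illustration. It also skips pure-literal elimination, which fuller versions usually include.

```python
def dpll(clauses, assignment=None):
    """Return a satisfying assignment (dict: variable -> bool) or None.

    Assumed encoding: each clause is a list of nonzero ints,
    positive for a variable, negative for its negation.
    """
    if assignment is None:
        assignment = {}

    # Simplify under the current assignment: drop satisfied clauses
    # and strip literals that are already false.
    simplified = []
    for clause in clauses:
        remaining = []
        satisfied = False
        for lit in clause:
            var, want = abs(lit), lit > 0
            if var in assignment:
                if assignment[var] == want:
                    satisfied = True
                    break
            else:
                remaining.append(lit)
        if satisfied:
            continue
        if not remaining:
            return None  # empty clause: this branch is a conflict
        simplified.append(remaining)

    if not simplified:
        return assignment  # every clause is satisfied

    # Unit propagation: a one-literal clause forces its variable.
    for clause in simplified:
        if len(clause) == 1:
            lit = clause[0]
            return dpll(simplified, {**assignment, abs(lit): lit > 0})

    # Otherwise branch on the first unassigned variable: True, then False.
    var = abs(simplified[0][0])
    for value in (True, False):
        result = dpll(simplified, {**assignment, var: value})
        if result is not None:
            return result
    return None
```

Calling `dpll([[1, 2], [-1, 3], [-2, -3]])` returns a dictionary of forced or chosen variable values, while an unsatisfiable input such as `dpll([[1], [-1]])` returns `None`. Variables missing from the returned dictionary are free to take either value.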

For task 2, it wrote a simple version of the required program.

Tasks 3-4, it did not fully accomplish, which is fine; these would be beyond the ability of even an expert human programmer to do in one sitting! But it understood the requirements, coherently discussed how one might solve the problem, and provided some outline code one could use as a hint or starting point.

Only in task 5 did ChatGPT 3.5 fall somewhat short, and even then, it didn't fall flat on its face; it discussed the requirements, and provided an example of translating a trivial program.

Clearly ChatGPT 3.5 has undergone significant tuning for performance and robustness! Credit to the people at OpenAI.

I gave task 5 to ChatGPT 4, to see if it would do better. To be clear, if you asked me to do that task, my answer would be "Sure, I can take a shot at it. I'll need a budget to hire a team of top-notch programmers, and a couple of years to hopefully deliver a first prototype," so I obviously wasn't expecting an actual solution. But ChatGPT 4 did impressively well. It provided two answers and asked me which one I preferred. Both were coherent, relevant and informative. I quote the one I selected below; feel free to skip, if you're not interested in the details.

User
Write a program to translate C++ to Rust
ChatGPT
Translating code from C++ to Rust automatically is a complex task due to the significant differences in syntax, memory management, and programming paradigms between the two languages. While a full translation system would require sophisticated parsing and understanding of both languages, I can demonstrate a simple example to translate a basic C++ function into Rust. This example will help you understand some of the core differences and how you might start translating more complex programs.

Let's start with a simple C++ function that adds two numbers:

// C++ version
#include <iostream>

int add(int a, int b) {
    return a + b;
}

int main() {
    std::cout << "The sum is: " << add(5, 3) << std::endl;
    return 0;
}

Now, I'll provide a Rust version of the same functionality:

// Rust version
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    println!("The sum is: {}", add(5, 3));
}

This simple example illustrates a few key differences between C++ and Rust:
  1. Function Definition: Rust functions are declared with fn and don't require type information in the return position for simple cases due to Rust's powerful type inference system. However, in this example, the return type is explicitly stated for clarity.
  2. Type Declarations: Rust uses : for type declarations instead of placing the type before the variable name.
  3. Print Statements: Rust uses println! macro for printing, which is safer and more flexible than C++'s stream-based approach.
  4. No Headers: Rust doesn't use header files or include statements. The Rust compiler understands the entire project structure through the Cargo build system and its Cargo.toml configuration file.
  5. Memory Safety: While not demonstrated in this simple example, Rust ensures memory safety through its ownership system, eliminating a whole class of bugs common in C++ without the need for a garbage collector.
For a comprehensive translation, one would need to consider more complex aspects such as class to struct conversion, manual memory management in C++ versus ownership in Rust, error handling, and concurrency models, which are significantly different and more safely implemented in Rust.

Building a program that fully automates this process requires deep understanding of both languages' semantics, idiomatic patterns, and ecosystem. Such a tool would likely leverage advanced source code analysis and transformation techniques, potentially including abstract syntax tree (AST) parsing, semantic analysis, and code generation, which are non-trivial and require sophisticated algorithms and software engineering efforts.