Testing Autonomous Coders (or AI SWE)

I created this list of AI Software “Engineers”.

I used my project django-template as a test. It has a straightforward issue that any AI/developer should be able to solve: Makefile #488

GitHub Copilot coding agent

Pricing: $39/month or $390/year

Tested in May 2025

Setup

Signing up: https://github.com/features/copilot/plans
Allowing access to my repos: https://github.com/settings/copilot/coding_agent

Testing

Issue: Update psycopg from 2 to 3 · Issue #509 · lucasrcezimbra/django-template · GitHub

It created this WIP PR: [WIP] Update psycopg from 2 to 3 · Pull Request #524 · lucasrcezimbra/django-template · GitHub
It started this agent session: Update psycopg from version 2 to 3
✅ It checked and understood the repo structure
❌ It tried to run the tests, which failed because pytest was not installed; it then tried to install pytest, which failed due to a PyPI read timeout; then it gave up.

✅ It “thought” and came up with a plan:

- [ ] Update `{{cookiecutter.project_slug}}/pyproject.toml` to use psycopg 3.x instead of psycopg2-binary
- [ ] Update test assertions in `tests/test_database.py` to check for psycopg instead of psycopg2-binary
- [ ] Check for any other files that might need updates for psycopg3 compatibility
- [ ] Verify changes work with the existing database connection configuration

⚠ It updated psycopg by editing {{cookiecutter.project_slug}}/pyproject.toml by hand instead of running poetry. As a result, it installed psycopg==3.2.0 rather than the latest release, 3.2.9.
✅ It updated tests/test_database.py correctly.
✅ It searched for other occurrences of psycopg2 and didn’t find any.
✅ It checked the settings.py file and understood it’s using dj-database-url, so no need to change.
⚠ It “thought” that dj-database-url supports psycopg3, which is true, but it cited a release note that doesn’t exist:
“version 2.0.0 (released in August 2023) added support for psycopg3 … Reference: https://github.com/jazzband/dj-database-url/blob/master/CHANGELOG.md”
In reality, version 2.0.0 was released in April 2023 and its changelog doesn’t mention psycopg3.
✅ It pushed everything; the tests passed on CI.
🧑 I asked it to update psycopg to the latest version. I purposely didn’t say what the latest version was, because I wanted to see if it would find out (PR Comment)
✅ It tried twice to install and use pip-search, which didn’t work; then it ran pip index versions psycopg, found the latest version, updated the pyproject.toml, and pushed the change.
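Once a list of available versions is in hand (for example from `pip index versions psycopg`), picking the latest one has to compare the numbers, not the strings. Here is a minimal sketch, assuming plain dotted numeric versions only; the helper names and the sample lists are illustrative, not a full release history. For pre-releases and other real-world version strings, `packaging.version.Version` would be the more robust choice.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn '3.2.9' into (3, 2, 9) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: list[str]) -> str:
    """Return the highest version from a list of dotted numeric versions."""
    return max(versions, key=parse_version)


# Illustrative sample, not the complete list of psycopg releases.
sample = ["3.1.18", "3.2.0", "3.2.9"]
print(latest_version(sample))

# Plain string comparison goes wrong once a component reaches two digits:
# max(["1.9.0", "1.10.0"]) picks "1.9.0", while latest_version picks "1.10.0".
```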

OpenHands

Homepage | GitHub | Docs

OpenHands was the first AI SWE I tested, in December 2024.

Attempt 1. Using Docker

First I tried the Quick Start approach from the README that uses Docker:

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.16-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands:/home/openhands/.openhands \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.16

Because I hadn’t reviewed the code and the Docker command requires broad permissions (it mounts the host’s Docker socket), I created a free-tier EC2 instance on AWS to run it:

OS: Amazon Linux 2023 AMI
Instance type: t2.micro
Storage: 16GB (I started with 8GB, but it wasn’t enough because I pulled multiple versions of the Docker image).
Tailscale to expose the service to my local network

It didn’t work, and the logs were not helpful. I tried different versions (0.16, 0.15, and 0.14). One of the errors was: AttributeError: 'NoneType' object has no attribute 'logs'. I found similar issues on GitHub, but none of them solved my problem, so I gave up.

Attempt 2. GitHub Actions

My second attempt was to run it on GitHub Actions.

I followed the instructions from Docs - Using the OpenHands GitHub Action and README - OpenHands Github Issue Resolver.

Setting up and fixing Anthropic API Key

My first commit added the GitHub Actions workflow. The run failed with the following error:

ERROR:root:<class 'litellm.exceptions.AuthenticationError'>: litellm.AuthenticationError: AnthropicException - {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"}}

I had two guesses:

The Anthropic API key was wrong or not working, so I tested it locally and updated it on GitHub.
The OpenHands action was broken, so I pinned the version to 0.16.1 (it had been using the main branch).
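One possible way to check a key locally, which is not necessarily how I did it, is to send an authenticated request to Anthropic’s model-listing endpoint and see whether it comes back with a 401. The sketch below uses only the standard library; the function names are hypothetical, and the endpoint URL and `anthropic-version` value are assumptions to verify against the current API docs.

```python
import urllib.error
import urllib.request

# Assumed endpoint: listing models is a cheap authenticated call.
API_URL = "https://api.anthropic.com/v1/models"


def build_request(api_key: str) -> urllib.request.Request:
    """Build a GET request carrying the key in the x-api-key header."""
    return urllib.request.Request(
        API_URL,
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
    )


def key_is_valid(api_key: str) -> bool:
    """Return False on HTTP 401 (invalid key); re-raise any other HTTP error."""
    try:
        with urllib.request.urlopen(build_request(api_key), timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 401:
            return False
        raise


# Usage (needs network access and a real key):
# import os
# print(key_is_valid(os.environ["ANTHROPIC_API_KEY"]))
```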

I ran the workflow again and it partially worked:

✅ created the Makefile, committed, and pushed it into a new branch
❌ didn’t open the PR
❌ didn’t run make commands to test the Makefile
❌ didn’t understand that the Makefile was supposed to be for the template and not for the root project.

Opening PR

Next, I opened the PR myself:

Manually opened the PR using the description written by OpenHands.
Waited for CodeRabbit to run (idea: I could configure CodeRabbit to mention @openhands-agent in the PR comments and let the AIs talk).
Deleted all CodeRabbit comments because I didn’t want them to interfere with my comments.

Asking for fixes

After the PR was open, I asked OpenHands to fix it by moving the Makefile:

@openhands-agent, the Makefile is supposed to belong to the project created by this template. So, it should be inside the {{cookiecutter.project_slug}}/ folder.

Then I found some bugs:

Error: Unhandled error: SyntaxError: Unexpected token '{'. and Error: Unhandled error: SyntaxError: Unexpected identifier 'cookiecutter'

I guessed it was because my comment contained backticks (`), so I removed them and it worked. OpenHands added a new commit to the PR.
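The workaround amounts to stripping backticks from the comment before posting it. Here is a minimal sketch; the helper name `strip_backticks` is hypothetical, and it only mirrors my guess about the cause, which I never confirmed.

```python
def strip_backticks(comment: str) -> str:
    """Remove every backtick so the comment contains no Markdown code spans."""
    return comment.replace("`", "")


original = "So, it should be inside the `{{cookiecutter.project_slug}}/` folder."
print(strip_backticks(original))
```

The curly braces stay in the comment; only the backticks are removed, which matches what I did by hand.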

Back and forth

After some back and forth, I gave up on this issue and tried others. After more back and forth, I gave up on OpenHands altogether and solved all the issues myself.

You can follow the back and forth in the PRs: feat(#488): Makefile · #500, Fix issue #501: style: add ruff rules “N804”, “N805” · #503, and Fix issue #504: Run lint and tests for generated projects · #505