Can Amazon's CodeWhisperer Write Better Python than You?

By
Brian Tarbox
August 31, 2022

The first record of code completion appears to be in a Pascal editor called Alice back in 1985. It had auto-indent, auto-completion of BEGIN/END control structures and even syntax coloring. Controversy ensued: in the early days of Alice, there was concern that code completion made writing software too easy. But it’s really just a syntactic assistant. Code completion helps you write code that compiles because the syntax is correct, but it can’t help you write code that does the right thing, or even anything useful.

Alice

GitHub Copilot and now Amazon CodeWhisperer change that by going well beyond syntax assistance to generate semantically correct code. They don’t just provide the outline of an if-statement; they will create entire routines for you. But how good is a code assistant in 2022? This article will focus on CodeWhisperer to help answer that question.

Trial Run: Reading from S3 with Python

Amazon launched CodeWhisperer as a preview this past June and today it supports Python, Java, and JavaScript. In a recent post on the AWS compute blog Mark Richman explains that CodeWhisperer’s models are trained on a “variety of data sources including Amazon open source code.” With that corpus (which apparently does exist) informing CodeWhisperer’s models, writing code to read a file from S3 should be a good use case to take it for a spin.

To work with CodeWhisperer (CW), you need to start with a comment that describes what you want your function to do. The more descriptive and precise the comment, the better the system can infer the logic you’d like.

You need to begin your comment with the word “Function” so that CW knows that you want to create one. In general, when a comment sits outside of a function, CW needs that hint.

CW will then take the comment and generate a function definition. At this point you’re given a chance to modify it before the body of the function is generated. CW may also present a few options of function definitions for you to choose from.


Screen shot of CodeWhisperer's IntelliJ integration

Press “Insert Code” and your function is created and dropped below the comment. Notice that CodeWhisperer not only inserted the code but also created a docstring.

Looks good! This code does what you’d expect given the comment and it was generated in seconds. Time spent looking up boto3 APIs is replaced by reviewing code to make sure the semantics check out.
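For readers who can't see the screenshot, here is a hand-written sketch of the kind of function being described. It is not CW's actual output. The function name, parameter names, the optional `s3_client` argument and the UTF-8 assumption are all illustrative; only `boto3.client("s3")` and `get_object` are real boto3 calls.

```python
# Function to read a file from S3
def read_s3_file(bucket_name, key, s3_client=None):
    """Return the contents of an S3 object as text.

    Assumes the object is UTF-8 encoded. The optional s3_client
    parameter is a hypothetical addition that makes the function
    easy to test with a stub.
    """
    if s3_client is None:
        # Imported lazily so a stub client can be used without boto3 installed.
        import boto3
        s3_client = boto3.client("s3")
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    # response["Body"] is a streaming object; read() returns bytes.
    return response["Body"].read().decode("utf-8")
```

Calling `read_s3_file("my-bucket", "notes.txt")` with real credentials would fetch and decode that object; both names here are placeholders.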

Let’s see what happens when you ask it for a bit more. It might be useful to have a function that returns the first ‘n’ lines from a file in S3.

Impressive! CodeWhisperer generated a correct function using the function it previously created as a helper method.
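The helper-reuse pattern can be sketched by hand as follows. Again, this is an illustration rather than the generated code: the names `read_s3_file` and `read_first_n_lines` and the `s3_client` parameter are assumptions.

```python
def read_s3_file(bucket_name, key, s3_client=None):
    """Return the contents of an S3 object as text (assumed UTF-8)."""
    if s3_client is None:
        import boto3  # lazy import so a stub client works without boto3
        s3_client = boto3.client("s3")
    response = s3_client.get_object(Bucket=bucket_name, Key=key)
    return response["Body"].read().decode("utf-8")


# Function to return the first n lines of a file in S3
def read_first_n_lines(bucket_name, key, n, s3_client=None):
    """Return the first n lines of an S3 object as a list of strings."""
    # Reuse the earlier function as a helper, then slice the lines.
    text = read_s3_file(bucket_name, key, s3_client=s3_client)
    return text.splitlines()[:n]
```

Note that this simple version downloads the whole object before slicing, which is fine for small files but wasteful for large ones.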

Before CW generates the body of the function, you could take the opportunity to improve the readability of the code by modifying the parameter names. For example, if we type:

The system will propose:

But we can modify the function definition to be more descriptive before the function body is generated:

If we accept the function definition CodeWhisperer builds the function body using both the function signature and comment as input. The resulting function will include our improved parameter names.
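As a purely hypothetical illustration of that renaming step (the comment, function and parameter names below are invented for this sketch and are not the ones from the original screenshots), a terse proposed signature can be edited into a more descriptive one before the body is filled in:

```python
# Function to copy a file from one S3 bucket to another

# A terse signature such as:
#     def copy_file(bucket1, bucket2, key):
# could be edited to the more descriptive form below.
def copy_file(source_bucket, destination_bucket, key, s3_client=None):
    """Copy an object between buckets, keeping the same key."""
    if s3_client is None:
        import boto3  # lazy import so a stub client works without boto3
        s3_client = boto3.client("s3")
    s3_client.copy_object(
        Bucket=destination_bucket,
        Key=key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
```

With names like `source_bucket` and `destination_bucket`, the direction of the copy is clear at every call site.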

CodeWhisperer can do more than create whole functions; it can also insert code snippets within functions, and will infer relevant variables in the process.

CodeWhisperer as a Serious Productivity Booster

Using CodeWhisperer, I got working code completed much faster than if I had written it myself. The biggest value so far showed up in the S3 examples. If I had written that code myself, most of the time would have gone to looking up boto3 API docs and making sure I got the calls right. All of that was done for me in three seconds.

This got me thinking about an annoying chunk of code I’ve found myself spending too much time on: sending metrics to CloudWatch. Let’s see if CodeWhisperer can help if I provide it the comment: "Function to emit a CloudWatch metric."

That is extremely useful! CW saved me many keystrokes and several minutes of reviewing API docs. I'd probably end up refactoring this code, but even if I coded it up from scratch, my first pass would look something like this. In three seconds, I’ve avoided a significant chunk of time writing boilerplate code and now have a good starting point to customize or refactor.
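For reference, a minimal hand-written version of such a function might look like the sketch below. It is not the generated code: the function name, its parameters, the default unit and the optional `cloudwatch_client` argument are assumptions, while `put_metric_data` is the real boto3 CloudWatch call.

```python
# Function to emit a CloudWatch metric
def emit_metric(namespace, metric_name, value, unit="Count", cloudwatch_client=None):
    """Publish a single data point to CloudWatch."""
    if cloudwatch_client is None:
        import boto3  # lazy import so a stub client works without boto3
        cloudwatch_client = boto3.client("cloudwatch")
    cloudwatch_client.put_metric_data(
        Namespace=namespace,
        MetricData=[
            {
                "MetricName": metric_name,
                "Value": value,
                "Unit": unit,
            }
        ],
    )
```

A call such as `emit_metric("MyApp", "OrdersProcessed", 1)` would then record one data point; the namespace and metric name are placeholders.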

So is CodeWhisperer a Better Programmer than Me?

Despite the clickbait-y title, whether CW’s code is better or worse than mine is a marginal question and not really important. What is significant is that it has the potential to save me a ton of time and free up mental space to focus on improving, refactoring and testing. It’s making me a better programmer by taking on some of the undifferentiated heavy lifting.

But the examples above are use cases I’d expect an Amazon tool, trained on Amazon open source code, to handle well. Surely CW is not going to be very useful where most developers spend (or should be spending) most of their time: writing domain logic. Let’s see how CW might break down there. We can start by pulling an example from the Python docs for dataclasses.

I wonder if CodeWhisperer can help add a method to this class. Let's see what happens if I append the comment: "Function that returns whether this item costs more than $10"

Very cool. Notice that CW gave the function an intuitive name and included a reference to self. I wonder if it would be pushing the envelope if I tried to use CW to help me test.
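To make this concrete, here is the `InventoryItem` example from the Python dataclasses documentation with a hand-written method of the kind described. The method name `costs_more_than_ten` and the choice to compare `unit_price` (rather than the total cost) are assumptions made for this sketch, not the tool's actual output.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    # Function that returns whether this item costs more than $10
    def costs_more_than_ten(self) -> bool:
        # Assumption: "costs" refers to the price of a single unit.
        return self.unit_price > 10
```

The comment is ambiguous about unit price versus total cost, which is exactly the kind of detail worth checking when reviewing generated code.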

🤯 Wow. In the above code, I typed the comment and CW did the rest. Testing seems to be a fantastic example of where CW can save time. I didn’t need to waste brain cycles on test values and typing out all the member variables and methods.
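A test of this kind could look like the sketch below, written by hand with the standard `unittest` module. The test class name, test values and the `costs_more_than_ten` method are illustrative assumptions rather than what CW produced.

```python
import unittest
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def costs_more_than_ten(self) -> bool:
        # Hypothetical method; assumes the check applies to unit_price.
        return self.unit_price > 10


# Function to test the InventoryItem class
class InventoryItemTest(unittest.TestCase):
    def test_total_cost(self):
        item = InventoryItem("widget", unit_price=2.5, quantity_on_hand=4)
        self.assertEqual(item.total_cost(), 10.0)

    def test_default_quantity(self):
        item = InventoryItem("widget", unit_price=2.5)
        self.assertEqual(item.quantity_on_hand, 0)
        self.assertEqual(item.total_cost(), 0.0)

    def test_costs_more_than_ten(self):
        self.assertTrue(InventoryItem("gadget", 12.0).costs_more_than_ten())
        self.assertFalse(InventoryItem("widget", 10.0).costs_more_than_ten())
```

Saved in a file, these tests can be run with `python -m unittest`.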

CodeWhisperer’s Limitations

It’s still early days for code assistants, and there are plenty of kinks. Researchers have found that GitHub Copilot generated code with security vulnerabilities about 40% of the time in the scenarios they tested. Comparable data is not yet available for CodeWhisperer, but AWS appears to emphasize security.

In my testing I experienced some examples where CW generated functions with bugs or the result did not reflect my intention. This example is supposed to return the longest matching line between two files but just returns the first line that matches:
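The gap between the two behaviors can be sketched by hand as follows. This is an illustration of the bug class, not the code CW generated. The function names are invented, and both functions take iterables of lines (open file objects qualify) so the difference is easy to see.

```python
def first_matching_line(lines_a, lines_b):
    """Buggy behavior: return the first line of lines_a also found in lines_b."""
    seen = {line.rstrip("\n") for line in lines_b}
    for line in lines_a:
        line = line.rstrip("\n")
        if line in seen:
            return line  # stops at the first match, whatever its length
    return None


def longest_matching_line(lines_a, lines_b):
    """Intended behavior: return the longest line present in both inputs."""
    seen = {line.rstrip("\n") for line in lines_b}
    matches = [line.rstrip("\n") for line in lines_a if line.rstrip("\n") in seen]
    # max() with default=None handles the case where nothing matches.
    return max(matches, key=len, default=None)
```

With real files, the same functions can be called as `with open(path_a) as a, open(path_b) as b: longest_matching_line(a, b)`, where `path_a` and `path_b` are placeholders.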

At other times, CW fell short because it didn’t have enough context to understand my intention. But upon reflection, I think it could have come through if the surrounding code had been structured better. If you’ve designed your code with classes that effectively represent the nouns of your domain, it’s easy to imagine CW would be able to help create domain-specific logic given well-defined comments. And as for the bugs, I expect these to become less frequent over time.

Closing Thoughts

If you get the chance to experiment with CW, it might lead you to imagine a day when someone will write the very last human-written line of code in history. Until then, CW can help you become a better programmer so that, if the world’s last coder is you, humanity’s last line of code won’t have a bug.
