Controversy Surrounds chardet's License Change Amid AI-Driven Rewrite

Computer engineers and programmers have long used reverse engineering to replicate the functionality of computer programs without directly copying the copyrighted code. AI coding tools now complicate this 'clean room' rewrite process, raising legal, ethical, and practical questions.

These complications came into focus last week with the release of a new version of chardet, a widely used open-source Python library for automatic character encoding detection. Originally written by Mark Pilgrim in 2006, chardet was distributed under the LGPL, which places strict conditions on how the code can be reused and redistributed.

Dan Blanchard, who has maintained the repository since 2012, released chardet version 7.0 as a 'ground-up, MIT-licensed rewrite'. He said the new version, produced with the help of Claude Code, is 'much faster and more accurate' than its predecessors.

Blanchard told The Register that he has long wanted to get chardet into the Python standard library, but that its license, speed, and accuracy had stood in the way. With Claude Code, he said, he was able to overhaul the library 'in roughly five days' and achieve a 48-fold performance improvement.

Not everyone is pleased with the outcome. Someone posting under the name Mark Pilgrim appeared on GitHub to argue that releasing the new version under the MIT license amounts to an unauthorized relicensing of Pilgrim's original code. Pilgrim contends that the new version is a modification of LGPL-licensed code and must therefore remain under the LGPL.