
Undefined Behavior in _PyUnicodeWriter_WriteASCIIString: NULL pointer passed to memcpy when len is 0 #146196

@ashm-dev


Bug report

Bug description:

The function _PyUnicodeWriter_WriteASCIIString in Objects/unicodeobject.c (Objects/unicode_writer.c in some versions) can invoke undefined behavior (UB): when len is 0 and ascii is NULL, it calls memcpy with a NULL source pointer.

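According to the C standard, passing NULL to memcpy is undefined behavior even if the count is zero: a zero length does not relax the requirement that pointer arguments be valid (C11 7.24.1p2).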

Proof of Concept

Clang's UndefinedBehaviorSanitizer (UBSan) reports:
../Objects/unicode_writer.c:494:36: runtime error: null pointer passed as argument 2, which is declared to never be null

This happens in the following block:

    case PyUnicode_1BYTE_KIND:
    {
        const Py_UCS1 *str = (const Py_UCS1 *)ascii;
        Py_UCS1 *data = writer->data;
        memcpy(data + writer->pos, str, len); // <--- UB if str is NULL and len is 0
        break;
    }

Mitigation

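The function should return early when len == 0, once the len == -1 case has been resolved with strlen. Returning early on empty input is a common pattern in CPython: it avoids unnecessary work and keeps NULL pointers away from memory functions such as memcpy.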

    if (len == -1)
        len = strlen(ascii);
    if (len == 0)
        return 0;
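
To illustrate why the position of the early return matters, here is a minimal, self-contained sketch of the same guard. It is not CPython's code: toy_writer and toy_write_ascii are hypothetical names, and the fixed 64-byte buffer is an assumption made only to keep the example short. The point is that the len == 0 check sits before memcpy, so a (NULL, 0) call never reaches it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the writer; NOT CPython's _PyUnicodeWriter. */
typedef struct {
    unsigned char data[64];  /* fixed-size buffer, for illustration only */
    size_t pos;              /* number of bytes written so far */
} toy_writer;

/* Append len bytes of ascii to the writer. len == -1 means "ascii is
   NUL-terminated". Returns 0 on success, -1 if the buffer is too small. */
static int
toy_write_ascii(toy_writer *w, const char *ascii, ptrdiff_t len)
{
    if (len == -1) {
        assert(ascii != NULL);  /* strlen(NULL) would itself be UB */
        len = (ptrdiff_t)strlen(ascii);
    }
    if (len == 0) {
        /* Early return: memcpy is never reached with a NULL source. */
        return 0;
    }
    assert(ascii != NULL && len > 0);
    if ((size_t)len > sizeof(w->data) - w->pos) {
        return -1;
    }
    memcpy(w->data + w->pos, ascii, (size_t)len);
    w->pos += (size_t)len;
    return 0;
}
```

With this guard, toy_write_ascii(&w, NULL, 0) succeeds and leaves the writer untouched. Building such a sketch with -fsanitize=undefined should produce no null-argument report for that call.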

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux


Labels: interpreter-core (Objects, Python, Grammar, and Parser dirs), type-bug (an unexpected behavior, bug, or error)
