Skip to content

Potential Data Truncation in Task Logs Leading to Errors with php spark tasks:list #187

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
b1tc0re opened this issue May 19, 2025 · 5 comments · May be fixed by #188
Open

Potential Data Truncation in Task Logs Leading to Errors with php spark tasks:list #187

b1tc0re opened this issue May 19, 2025 · 5 comments · May be fixed by #188
Labels
bug Something isn't working

Comments

@b1tc0re
Copy link

b1tc0re commented May 19, 2025

Description:

In the current implementation, the exception written to $taskLog->error is serialized in its entirety using serialize($taskLog->error ?? null) and stored in the database. The field used to store this data is of type TEXT. The serialized exception object can become quite large, especially when the exception includes a long stack trace or nested exceptions. This can lead to data truncation when writing to the TEXT field, as its size is limited. This data truncation causes errors when calling php spark tasks:list

Solution:

Instead of storing the entire serialized exception, you can store only the critical exception data. This will reduce the size of the data that needs to be stored and will ensure that the exception information is stored more reliably. However, this will not completely solve the problem.

Example Implementation:

   protected function updateLogs(TaskLog $taskLog)
    {
        if (setting('Tasks.logPerformance') === false) {
            return;
        }

        // "unique" name will be returned if one wasn't set
        $name = $taskLog->task->name;

        $errorData = null;
        if ($taskLog->error) {
            $errorData = [
                'message' => $taskLog->error->getMessage(),
                'code'    => $taskLog->error->getCode(),
                'file'    => $taskLog->error->getFile(),
                'line'    => $taskLog->error->getLine(),
            ];
            // Consider limiting stack trace if necessary:
            // $errorData['trace'] = array_slice($taskLog->error->getTrace(), 0, 5);
        }

        $data = [
            'task'     => $name,
            'type'     => $taskLog->task->getType(),
            'start'    => $taskLog->runStart->format('Y-m-d H:i:s'),
            'duration' => $taskLog->duration(),
            'output'   => $taskLog->output ?? null,
            'error'    => serialize($errorData ?? null),
        ];

        // Get existing logs
        $logs = setting("Tasks.log-{$name}");
        if (empty($logs)) {
            $logs = [];
        }

        // Make sure we have room for one more
        /** @var int $maxLogsPerTask */
        $maxLogsPerTask = setting('Tasks.maxLogsPerTask');
        if ((is_countable($logs) ? count($logs) : 0) > $maxLogsPerTask) {
            array_pop($logs);
        }

        // Add the log to the top of the array
        array_unshift($logs, $data);

        setting("Tasks.log-{$name}", $logs);
    }
@michalsn michalsn added the bug Something isn't working label May 20, 2025
@michalsn
Copy link
Member

We have two options to consider:

1. Continue storing logs via the settings library

This approach requires truncating logs based on the available space in the TEXT field, which varies between database engines.
We'll need to calculate the maximum log length based on the field size and the configured limit for the number of stored logs.

However, this method has limitations:

  • If the maximum log count is set too high (e.g., 100), it may effectively prevent logs from being stored at all.
  • This issue is especially problematic with certain drivers like OCI8, where field size for TEXT is 4000 bytes.

2. Create a dedicated table for task logs

This option allows for a more appropriate and scalable log storage approach:

  • We can use a large field to store the complete error object and stack trace.
  • Avoids field size limitations and makes querying logs more efficient and flexible.

What do you think @lonnieezell @paulbalandan @samsonasik ?

@paulbalandan
Copy link
Member

Sorry, I'm not familiar with why is the full error object needed to be serialized. Can we not just get the exception class, message, file and line?

@michalsn
Copy link
Member

Actually, I forgot about this one, but yes - we can store just the basic info. I don't see a lot of benefits in storing everything.

@michalsn
Copy link
Member

michalsn commented May 21, 2025

This is exactly what we do in the queue library: https://github.com/codeigniter4/queue/blob/develop/src/Handlers/BaseHandler.php#L171

@michalsn michalsn linked a pull request May 21, 2025 that will close this issue
5 tasks
@michalsn
Copy link
Member

I made a PR: #188

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants