
[Bug]: file data error #275

Open
1 task done
ywapple opened this issue Feb 6, 2025 · 3 comments

ywapple commented Feb 6, 2025

bit7z version

4.0.x

Compilation options

BIT7Z_7ZIP_VERSION, BIT7Z_AUTO_FORMAT, BIT7Z_USE_NATIVE_STRING

7-zip version

v23.01

7-zip shared library used

7z.dll / 7z.so

Compilers

MSVC

Compiler versions

vs2022

Architecture

x86

Operating system

Windows

Operating system versions

windows11

Bug description

BitFileExtractor extractor{ lib, BitFormat::Auto };
I have a damaged archive file, about 1.5 GB in size.
If I call extractor.extract(zipFile, strOutDir), only a few files are extracted before it exits abnormally.
If I instead call extractor.extractMatching for each file in sequence, many more files get extracted.
But the performance is really poor and extraction is very slow!
    try
    {
        extractor.extractMatching(zipFile, iten.first, strTemp.GetString());
    }
    catch (const bit7z::BitException& ex)
    {
        string strMsg = ex.what();
        auto failedFiles = ex.failedFiles();
        //auto errorType = ZipErrorType(strMsg);
        OutputDebugStringA(strMsg.c_str());
    }
Is there any way to solve this problem?
The exception reported in this file is:
Failed to extract the archive: Data error.

Steps to reproduce

No response

Expected behavior

No response


rikyoz (Owner) commented Feb 10, 2025

Hi!
Sorry for the late reply.

> I have a damaged archive file, about 1.5 GB in size.
> If I call extractor.extract(zipFile, strOutDir), only a few files are extracted before it exits abnormally.
> If I instead call extractor.extractMatching for each file in sequence, many more files get extracted.
> But the performance is really poor and extraction is very slow!

The reason for the low performance might be that extractMatching first reads the whole archive, stores the indices of the items that match the pattern, and then does the actual extraction, passing the vector of indices to 7-Zip. This preprocessing can be expensive, especially for large archives.
I have some ideas on how to avoid the preprocessing, but I need to test them, hopefully before the next v4.1-beta.

Also, you didn't post all the code, but if you happen to call extractMatching in a loop, or multiple times, the preprocessing will be done multiple times, further degrading performance. Ideally you should only call extractMatching once.
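
To make the cost argument above concrete, here is a toy model of it. Everything in it (ToyArchive, collectMatching, makeToyArchive and the two cost functions) is hypothetical and only imitates the "scan all items, collect matching indices" step described above; it is not bit7z's actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for an archive. NOT bit7z's real data structures.
struct ToyArchive {
    std::vector<std::string> names;  // item paths stored in the archive
    std::size_t itemsScanned = 0;    // how many entries were visited while filtering
};

// Builds a toy archive with `count` uniquely named items.
ToyArchive makeToyArchive(std::size_t count) {
    ToyArchive archive;
    for (std::size_t i = 0; i < count; ++i) {
        archive.names.push_back("file" + std::to_string(i) + ".txt");
    }
    return archive;
}

// Models the preprocessing step: walk every item and keep the indices
// of the items whose name satisfies the predicate.
template <typename Pred>
std::vector<std::size_t> collectMatching(ToyArchive& archive, Pred matches) {
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < archive.names.size(); ++i) {
        ++archive.itemsScanned;
        if (matches(archive.names[i])) {
            indices.push_back(i);
        }
    }
    return indices;
}

// One filtered call per wanted file: every call rescans the whole archive.
std::size_t scanCostPerFile(ToyArchive archive) {
    const std::vector<std::string> wanted = archive.names;
    for (const auto& name : wanted) {
        const auto hits = collectMatching(
            archive, [&](const std::string& n) { return n == name; });
        assert(!hits.empty());  // every wanted name came from the archive itself
    }
    return archive.itemsScanned;
}

// A single filtered call that accepts every item: one scan in total.
std::size_t scanCostSingleCall(ToyArchive archive) {
    collectMatching(archive, [](const std::string&) { return true; });
    return archive.itemsScanned;
}
```

In this model, an archive with N items costs N item visits for a single call and N * N visits when the filter runs once per file, which is one plausible reason a per-file loop gets very slow on a large archive.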

> Is there any way to solve this problem?
> The exception reported in this file is:
> Failed to extract the archive: Data error.

As I said, I have some ideas for improving the performance of extractMatching that I need to test.
I also want to see if 7-Zip can ignore errors while performing an operation (something like skipping bad files) to allow bit7z users to choose the best error handling approach for their use case. But even if this is possible, I probably won't implement it anytime soon.

ywapple (Author) commented Feb 11, 2025

Your analysis inspired me.
I now use BitInputArchive directly, so the archive is opened only once:

    BitInputArchive inputArchive(extractor, zipFile);

    for (auto& iten : mapIdxFiles)
    {
        try
        {
            // Extract one item at a time so a damaged item does not abort the rest.
            vecItems.clear();
            vecItems.push_back(iten.first->index);
            inputArchive.extractTo(strTemp, vecItems);
            //extractor.extractMatching(zipFile, iten.first, strTemp);
        }
        catch (const bit7z::BitException& ex)
        {
            strMsg = ex.what();
            errorType = ZipErrorType(strMsg, &strError);
            ....
        }
    }
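
The loop above is a "continue on error" pattern. Below is a library-independent sketch of it that also records which items failed. ExtractionReport, extractBestEffort and the extractOne callback are hypothetical names introduced for illustration; the callback stands in for a per-item call such as the inputArchive.extractTo(strTemp, vecItems) line above and is assumed to throw an exception derived from std::exception on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Outcome of a best-effort pass over a (possibly damaged) archive.
struct ExtractionReport {
    std::vector<std::uint32_t> extracted;                       // indices that succeeded
    std::vector<std::pair<std::uint32_t, std::string>> failed;  // index and error message
};

// Tries every index in turn. A failure on one item is recorded and
// does not stop the remaining items from being attempted.
ExtractionReport extractBestEffort(
        const std::vector<std::uint32_t>& indices,
        const std::function<void(std::uint32_t)>& extractOne) {
    assert(static_cast<bool>(extractOne));  // the callback must be callable

    ExtractionReport report;
    for (std::uint32_t index : indices) {
        try {
            extractOne(index);  // hypothetical: wraps the real per-item extraction
            report.extracted.push_back(index);
        } catch (const std::exception& ex) {
            // Keep the message so the caller can classify the error later.
            report.failed.emplace_back(index, ex.what());
        }
    }
    return report;
}
```

Collecting the failures in one place keeps the extraction loop short and lets the caller decide afterwards how to report or retry the damaged items.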

rikyoz (Owner) commented Feb 11, 2025

Good!
As a side note, your code made me realize that it is actually possible to implement an overload of the extractTo method that doesn't require allocating a std::vector object, which would let you call inputArchive.extractTo(strTemp, iten.first->index) directly in your case.
I'll implement it as soon as possible, either in the next v4.0.10 or in the v4.1-beta.
