extracting message size from BODYSTRUCTURE - extra layer of brackets in result #475

Closed
mungewell opened this issue Jun 22, 2022 · 1 comment

@mungewell

I am trying to extract the size of a message from the BODYSTRUCTURE response. I notice that it exposes an is_multipart attribute, which lets me decide how to evaluate the response.

However, it seems that I get a third type of result, which contains an extra 'layer'.

---
([(b'text', b'plain', (b'charset', b'US-ASCII', b'format', b'flowed'), None, None, b'7bit', 39, 0, None, None, None, None), (b'message', b'rfc822', (b'name', b'Dell Order Has Been Acknowledged.eml'), None, None, b'8bit', 37758, (b'7 May 2020 19:23:25 -0500', b'=?utf-8?B?RGVsbCBPcmRlciBIYXMgQmVlbiBBY2tub3dsZWRnZWQ=?=', ((b'Dell Canada Online Sales', None, b'dell_automated_email', b'dell.com'),), ((b'Dell Automated Email', None, b'automated_email', b'dell.com'),), ((b'Dell Canada Online Sales', None, b'dell_automated_email', b'dell.com'),), ((None, None, b'simon', b'mungewell.org'),), None, None, None, b'<[email protected]>'), (b'text', b'html', (b'charset', b'utf-8'), None, None, b'7bit', 33956, 500, None, None, None, None), 574, None, (b'attachment', (b'filename', b'Dell Order Has Been Acknowledged.eml', b'size', b'37758')), None, None)], b'mixed', (b'boundary', b'=_5b7507bbac3255404153ce14dd599572'), None, None, None)
multipart: 37797
---
(b'text', b'plain', (b'charset', b'UTF-8', b'format', b'flowed'), None, None, b'8bit', 4368, 219, None, None, None, None)
singlepart: 4368
---
(b'text', b'plain', (b'charset', b'UTF-8', b'format', b'flowed'), None, None, b'8bit', 3387, 90, None, None, None, None)
singlepart: 3387
---
([([(b'text', b'plain', (b'charset', b'utf-8'), None, None, b'quoted-printable', 41, 2, None, None, None, None), (b'text', b'html', (b'charset', b'utf-8'), None, None, b'quoted-printable', 275, 5, None, None, None, None)], b'alternative', (b'boundary', b'----5C9PD0MHM0TEX79M75OIIFNDOBHL76'), None, None, None), (b'image', b'jpeg', (b'name', b'20200514_112103.jpg'), None, None, b'base64', 2657010, None, (b'attachment', (b'filename', b'20200514_112103.jpg', b'size', b'1941660')), None, None)], b'mixed', (b'boundary', b'----RLFQJ3Q7Y8M4ND8TW7OSOP76VYXBTO'), None, None, None)
Traceback (most recent call last):
  File "move_too_big_imap.py", line 36, in <module>
    size += part[6]
IndexError: tuple index out of range

Why does that last message start with ([([(b'text' rather than ([(b'text'?

An extract of my code:

    for msgid, data in mail.fetch(smaller, ['BODYSTRUCTURE']).items():
        bodystructure = data[b'BODYSTRUCTURE']
        print("---")
        print(bodystructure)

        if bodystructure:
            if bodystructure.is_multipart:
                size = 0
                for part in bodystructure[0]:
                    size += part[6]
                print("multipart: %d" % size)
            else:
                size = bodystructure[6]
                print("singlepart: %d" % size)

            if size > maxsize:
                print("Size: %d" % size)
@mjs
Owner

mjs commented Jul 9, 2022

Your program assumes that an email has at most one level of multipart nesting, but email parts can be arbitrarily nested. The message that is breaking your code is fairly complex: its first part is itself a multipart tuple with only six elements, so part[6] raises the IndexError. Reformatting the BODYSTRUCTURE for readability:

(
  [
    (
      [
        (b'text', b'plain', (b'charset', b'utf-8'), None, None, b'quoted-printable', 41, 2, None, None, None, None),
        (b'text', b'html', (b'charset', b'utf-8'), None, None, b'quoted-printable', 275, 5, None, None, None, None),
      ],
      b'alternative', (b'boundary', b'----5C9PD0MHM0TEX79M75OIIFNDOBHL76'), None, None, None,
    ),
    (
      b'image', b'jpeg', (b'name', b'20200514_112103.jpg'), None, None, b'base64', 2657010, None,
      (b'attachment', (b'filename', b'20200514_112103.jpg', b'size', b'1941660')), None, None,
    ),
  ],
  b'mixed', (b'boundary', b'----RLFQJ3Q7Y8M4ND8TW7OSOP76VYXBTO'), None, None, None,
)

At the outer layer, the email is multipart/mixed. Within that, the first part is a multipart/alternative part which itself contains plain text and HTML versions of the main email text. That's then followed by a single image/jpeg part whose disposition is attachment (the name parameter and the attachment filename are the same).
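To make that nesting easier to see, here is a rough sketch that turns a BODYSTRUCTURE value into an indented outline of MIME types. The function name outline is made up for illustration, and the check for a list in position 0 is an assumption based on the printed output in this issue, where every multipart node starts with a list of its child parts.

```python
def outline(part, depth=0):
    """Return indented 'type/subtype' lines describing a BODYSTRUCTURE tree.

    Assumption: a multipart node is a tuple whose first element is a list of
    child parts, as in the output printed earlier in this issue.
    """
    indent = "  " * depth
    if isinstance(part[0], list):
        # Multipart node: part[1] is the subtype (b'mixed', b'alternative', ...).
        lines = [f"{indent}multipart/{part[1].decode()}"]
        for child in part[0]:
            lines.extend(outline(child, depth + 1))
        return lines
    # Leaf node: part[0] is the MIME type and part[1] the subtype.
    return [f"{indent}{part[0].decode()}/{part[1].decode()}"]


# The BODYSTRUCTURE of the message that triggered the IndexError.
structure = (
    [
        (
            [
                (b'text', b'plain', (b'charset', b'utf-8'), None, None,
                 b'quoted-printable', 41, 2, None, None, None, None),
                (b'text', b'html', (b'charset', b'utf-8'), None, None,
                 b'quoted-printable', 275, 5, None, None, None, None),
            ],
            b'alternative',
            (b'boundary', b'----5C9PD0MHM0TEX79M75OIIFNDOBHL76'),
            None, None, None,
        ),
        (
            b'image', b'jpeg', (b'name', b'20200514_112103.jpg'), None, None,
            b'base64', 2657010, None,
            (b'attachment',
             (b'filename', b'20200514_112103.jpg', b'size', b'1941660')),
            None, None,
        ),
    ],
    b'mixed',
    (b'boundary', b'----RLFQJ3Q7Y8M4ND8TW7OSOP76VYXBTO'),
    None, None, None,
)

print("\n".join(outline(structure)))
# multipart/mixed
#   multipart/alternative
#     text/plain
#     text/html
#   image/jpeg
```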

To handle any email structure, your code needs to walk the BODYSTRUCTURE response recursively. I hope that makes sense.
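A minimal sketch of such a recursive walk, under a couple of assumptions: the name total_size is hypothetical, and a multipart node is detected by checking whether its first element is a list, which matches the structures printed above. For a single part, index 6 is the size of the body in its transfer encoding, so a base64 attachment counts at its encoded size rather than its original file size. A message/rfc822 part is treated as a leaf here, since its own size already covers the embedded message.

```python
def total_size(part):
    """Sum the encoded body sizes (in octets) of all leaf parts.

    Assumption: a multipart node is a tuple whose first element is a list of
    child parts; any other node is a single part with its size at index 6.
    """
    if isinstance(part[0], list):
        # Multipart: recurse, because children may themselves be multipart.
        return sum(total_size(child) for child in part[0])
    # Single part (including message/rfc822): index 6 is the body size.
    return part[6]


# The nested message from this issue.
nested = (
    [
        (
            [
                (b'text', b'plain', (b'charset', b'utf-8'), None, None,
                 b'quoted-printable', 41, 2, None, None, None, None),
                (b'text', b'html', (b'charset', b'utf-8'), None, None,
                 b'quoted-printable', 275, 5, None, None, None, None),
            ],
            b'alternative',
            (b'boundary', b'----5C9PD0MHM0TEX79M75OIIFNDOBHL76'),
            None, None, None,
        ),
        (
            b'image', b'jpeg', (b'name', b'20200514_112103.jpg'), None, None,
            b'base64', 2657010, None,
            (b'attachment',
             (b'filename', b'20200514_112103.jpg', b'size', b'1941660')),
            None, None,
        ),
    ],
    b'mixed',
    (b'boundary', b'----RLFQJ3Q7Y8M4ND8TW7OSOP76VYXBTO'),
    None, None, None,
)

# One of the single-part messages from this issue.
single = (b'text', b'plain', (b'charset', b'UTF-8', b'format', b'flowed'),
          None, None, b'8bit', 4368, 219, None, None, None, None)

print(total_size(nested))  # 41 + 275 + 2657010 = 2657326
print(total_size(single))  # 4368
```

In the original loop, the value returned by total_size(bodystructure) could replace both branches of the is_multipart check, since the function handles single-part and multipart messages alike.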

@mjs mjs closed this as completed Jul 9, 2022