Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python (continues issue/discussion #3) #7

Open
kevinlekiller opened this issue Aug 17, 2016 · 89 comments
Open

Python (continues issue/discussion #3) #7

kevinlekiller opened this issue Aug 17, 2016 · 89 comments

Comments

@kevinlekiller
Copy link
Owner

No description provided.

@WhitePeter
Copy link
Collaborator

Yeah, I did it. Now I just need to rename things. Hopefully nothing breaks. But I am quite pleased with my solution. What do you think?

@kevinlekiller
Copy link
Owner Author

That's a good solution, much simpler than what I had in mind, just 2 things I'd change, the help (the numbers are not useful anymore) and moving the dict inside the class (I try to avoid global variables, in this case they are only used in this class). I'm not sure if it works but you could also try assigning choices= with sorted(LOGLEVELS.keys(), key=LOGLEVELS.get, reverse=True), saves a bit of memory.

@WhitePeter
Copy link
Collaborator

WhitePeter commented Aug 17, 2016

I'm not sure if it works but you could also try assigning choices= with sorted(LOGLEVELS.keys(), key=LOGLEVELS.get, reverse=True)

Yes, that works, to, but I figured, we might need it again. A previous version did reuse it, so I wanted to cache this.

BTW, how can I get rid of this, we set the loglevel, so why is there a default in this place? simply removing the assignment and only leaving loglevel in gets me complaints, that there is one too few arguments.

@kevinlekiller
Copy link
Owner Author

You'll need to refactor a bit, the issue is handler.setLevel(verbosity). Maybe make handler a property and change the level of it in the method set_verbosity, we only have 1 handler anyways, so it should work.

@WhitePeter
Copy link
Collaborator

Yes, I just noticed, set_verbosity() or now set_loglevel() is only used once. Does that really need to be a method?
Also, it is a bit too long for my taste, how about set_level()? The log part is clear already from the context.

@WhitePeter
Copy link
Collaborator

Hmm, and I now still get at least one INFO at -v warning

@kevinlekiller
Copy link
Owner Author

You could do for example, from the main function : log.logger.setLevel(20) log.handler.setLevel(20), but if you move the handler.setLevel into the Log.set_level, then you kill 2 birds with 1 stone, since you just need to call log.set_level(20) and it will set both.

@kevinlekiller
Copy link
Owner Author

Hmm, and I now still get at least one INFO at -v warning

The issue is the Log class is the first instanced, since it's required by everything else, but at that point we've not parsed any arguments, so we set the default loglevel to 20 until we've parsed the arguments.

@WhitePeter
Copy link
Collaborator

WhitePeter commented Aug 17, 2016

That doesn't sound right. We should make sure the arguments get parsed first, then. argparse emits its own messages anyway.

@kevinlekiller
Copy link
Owner Author

That's what I originally wanted, but I wasn't sure if you were set on using the logger for the argument parsing class so I tried to accommodate, but if we both agree, then we should remove logger from the argument parser to make things a bit simpler.

@kevinlekiller
Copy link
Owner Author

If we change it, then we can do in the main function: Utils, checkArgs, Log, then set the Log instance on Utils.log

@WhitePeter
Copy link
Collaborator

So, you're saying everything related to logging needs to go from the argparser, right?

@WhitePeter
Copy link
Collaborator

But wait, how can we then set the loglevel at the argparsing stage?

@kevinlekiller
Copy link
Owner Author

Oops, was going to edit (deleted).

I think it would make the code a bit more simple.

@WhitePeter
Copy link
Collaborator

Oh crap, how can I reopen this?

@WhitePeter
Copy link
Collaborator

Ah, found it ;)

@WhitePeter WhitePeter reopened this Aug 17, 2016
@WhitePeter
Copy link
Collaborator

Yes, but since we need to set verbosity by a method, Log must be instanciated at the argparsing stage already, or am I missing something? Is this maybe a Catch 22 situation? And if so, how to solve it?

@kevinlekiller
Copy link
Owner Author

kevinlekiller commented Aug 17, 2016

Basically, we make a Utils instance in main (utils = Utils()) then we check the args (checkArgs(utils)) in main again, then we set the Utils instance log property: utils.log = Log(utils.loglevel) in main also.

utils.loglevel is a property we set in checkArgs (self.utils.loglevel = args.loglevel)

Something like that?

@WhitePeter
Copy link
Collaborator

Um, OK, have to digest this first I think. This OOP stuff still makes my head hurt. ;)

@kevinlekiller
Copy link
Owner Author

kevinlekiller commented Aug 17, 2016

Think of it like a array kind of:

x = [0 =[0 = "z"], 1 = [0 = "a", 1 = "b"]]

x[0][0] is "z"
x[1][0] is "a"
x[1][1] is "b"

class X:
def init(self):
self.0 = 0()
self.1 = 1()

class 0:
0 = "z"

class 1:
0 = "a"
1 = "b"

x = X()
x.0.0 is "z"
x.0.0 is "a"
x.0.1 is "b"

@kevinlekiller
Copy link
Owner Author

In a more practical example:

class Options:
maximum_size = 100

def init(self):

Here I change the Options.maximum_size from 100 to 50

self.maximum_size = 50

def change_max_size(self, size):
self.maximum_size = size

options = Options() # I create a instance of options
print(options.maximum_size) # This will print 50
options.maximum_size = 100 # I set it to 100
options.change_max_size(200) # Here's another way to change the size.

@WhitePeter
Copy link
Collaborator

You're making it worse, ow. :-D Maybe I just need a break.

BTW, what does __get_ref_loudness have to do in Checkargs()?

@kevinlekiller
Copy link
Owner Author

kevinlekiller commented Aug 17, 2016

It doesn't but I didn't see a better place to put it, I figured it was something that should only be run once, and Checkargs() is erased from memory after it's ran (since we don't store it in a variable), if we put it in Utils() for example, then it will remain in memory until the program exits (since we pass utils instance to ThreadMkvrg).

@WhitePeter
Copy link
Collaborator

WhitePeter commented Aug 17, 2016

OK, but from a logical POV it does not make sense. It makes it really hard for me to grasp, what's going on and why.

@kevinlekiller
Copy link
Owner Author

We might want to rename CheckArgs, since it also checks binaries, maybe call it Setup or something (you're better at names than me).

@WhitePeter
Copy link
Collaborator

Oh no, it does too many things I think. Let's make smaller objects. And have a real structure. Isn't that the point of OOP? Have something parse the arguments, then something that checks those for sanity and if the program has all it needs. Or check first, if the binaries are in place. If not, why parse arguments? Fail then and there.

@kevinlekiller
Copy link
Owner Author

b"matroska" in data works and doesn't need to convert to string

@WhitePeter
Copy link
Collaborator

Thx, I am just putting it together.

@WhitePeter
Copy link
Collaborator

Done. Looks good so far.

@kevinlekiller
Copy link
Owner Author

I found a PHP library which reads matroska files (and tags), I was trying to figure out how it works, I think I got a decent idea, but it's a pain, it had to go through every byte checking if the bytes match those in the page you listed above, there's no order to how those can be done in a matroska so it has to do conditionals on every byte and change byte offset on the file handle every time accordingly. I was hoping I could just write something simple in python just to read the tags, but that's not possible, going through everything is required, to make sure the byte offset is at the right position.

@WhitePeter
Copy link
Collaborator

Oh my, looks like it's not worth it. I am still convinced that using the json from mkvinfo and conversion to and from xml can work. We should have everything we need to find the right place to put our data into the json object. Then convert it back to XML and feed that to mkvpropedit. Maybe in the beginning just write all tags back with --tags all:....

@kevinlekiller
Copy link
Owner Author

Yeah if you want to see the nightmare, here it is : https://github.com/JamesHeinrich/getID3/blob/master/getid3/module.audio-video.matroska.php#L497

@WhitePeter
Copy link
Collaborator

LOL, it's only ~800 lines of code. No biggy. :P

@kevinlekiller
Copy link
Owner Author

@005a58d, I think that does not work as expected, I think the first conditional is always true.

@kevinlekiller
Copy link
Owner Author

@WhitePeter
Copy link
Collaborator

@005a58d, I think that does not work as expected, I think the first conditional is always true.

Yeah, I thought so myself, reverted it already.

@WhitePeter
Copy link
Collaborator

Didn't use any() though. I think that is overkill for only 2 strings.

WhitePeter added a commit that referenced this issue Aug 21, 2016
Actually use a method of MkxFile().
WhitePeter added a commit that referenced this issue Aug 21, 2016
Again, the code does not get used. I hope however that the idea gets across.
WhitePeter added a commit that referenced this issue Aug 24, 2016
Now we actually leverage MatroskaFile(), but only, for the time being,
to basically replicate the prior behaviour. The goal remains to
multiprocess on an audio track level, so multiple tracks of one and the
same file can be processed concurrently.
WhitePeter added a commit that referenced this issue Aug 24, 2016
Think I am getting the hang of super(), hopefully. Also, some minor cleanup, i.e. use
self.utils.log.. instead of self.log.. to avoid confusion.
WhitePeter added a commit that referenced this issue Aug 25, 2016
since to be processed it must be an actual file of type matroska, which this class guarantees for
its objects, move the methods that deal with something involving a path here.
@kevinlekiller
Copy link
Owner Author

Just had a look at your commits, fine work!

I'm still alive, just a bit busy right now.

@WhitePeter
Copy link
Collaborator

Thanks, I am glad you like it. I've learned a great deal in the past few days. And I have failed, a lot. ;)
And I think I might have been sloppy on the way. Let's see, if I can "harden" the class attributes a bit more, like proper accessors and mutators for data attributes, make the latter as private as python allows, hide as much as possible.
You think it is worth it, or should we keep something like that for later?

@kevinlekiller
Copy link
Owner Author

I think it's a good idea, if you do it now it might be easier than trying to do it later.

WhitePeter added a commit that referenced this issue Aug 26, 2016
Make the path of an MkxFile private and provide accessor. Also, check on initialization that we
have a file at our hands, raise exception if not. This way we can just try and except through the
list of arguments, could maybe leverage that for the recursive search maybe.
WhitePeter added a commit that referenced this issue Aug 26, 2016
I consider this a semi-major change. Prior to this we always created two instances of
MatoroskaFile(), one while filling utils.files and checking if it is a matroska and then when
actually processing in process_thread(). But at that stage we have all the information we need
already. So pass the path of the file, which is guaranteed to be a matroska into the queue. Since
utils.files has now become an ordered dictionary, it also holds an entire MatroskaFile object with
all the methods attached. Before we just threw away that object when we only tried initializing the
object and then only take its path and put that in utils.files.
While that might hurt memory usage I think it is worth it for not having to do the checking twice.
@WhitePeter
Copy link
Collaborator

Huh, somewhere along the way loglevel setting went missing. I've put in some debug messages which I still cannot see at -v debug. :(

@WhitePeter
Copy link
Collaborator

FYI, I am currently experimenting with the json module to get the old tags. I think, once we get to the stage where we can just add our tags without destroying prior unrelated ones, we have something to work with.
I think this tagging thing even deserves its own module. It can come in quite handy if one wants to manipulate mkx tags of any kind.

@kevinlekiller
Copy link
Owner Author

Indeed, there's a lack of tools when it comes to that.

Btw, just some quick testing with mkvinfo:

mkvinfo -z file

This shows the size of all the EBML entries:

+ EBML head size 40
|+ EBML version: 1 size 4
|+ EBML read version: 1 size 4
|+ EBML maximum ID length: 4 size 4
|+ EBML maximum size length: 8 size 4
|+ Doc type: matroska size 11
|+ Doc type version: 4 size 4
|+ Doc type read version: 2 size 4
+ Segment, size 1802186950 size 1802186962
|+ Seek head (subentries will be skipped) size 86
|+ EbmlVoid (size: 4004) size 4013
|+ Segment information size 121
| + Timecode scale: 1000000 size 7
| + Muxing application: libebml v1.3.4 + libmatroska v1.4.5 size 38
| + Writing application: mkvmerge v9.4.0 ('Knurl') 64bit size 34
| + Duration: 2579.953s (00:42:59.953) size 7
| + Date: Thu Aug 25 02:52:28 2016 UTC size 11
| + Segment UID: 0x8e 0xd9 0xfe 0x37 0x01 0xbc 0xa4 0x89 0xa3 0x71 0xcb 0xd1 0x79 0xa4 0x04 0x50 size 19
|+ Segment tracks size 210
| + A track size 121
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0) size 3
|  + Track UID: 18308047031429458341 size 11
|  + Track type: video size 3
|  + Lacing flag: 0 size 3
|  + MinCache: 1 size 4
|  + Codec ID: V_MPEG4/ISO/AVC size 17
|  + CodecPrivate, length 43 (h.264 profile: High @L4.0) size 46
|  + Default duration: 41.708ms (23.976 frames/fields per second for a video track) size 8
|  + Video track size 24
|   + Pixel width: 1916 size 4
|   + Pixel height: 1076 size 4
|   + Display width: 1916 size 7
|   + Display height: 1076 size 7
| + A track size 45
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1) size 3
|  + Track UID: 13819279113119135370 size 11
|  + Track type: audio size 3
|  + Codec ID: A_AC3 size 7
|  + Default duration: 32.000ms (31.250 frames/fields per second for a video track) size 8
|  + Audio track size 11
|   + Sampling frequency: 48000 size 6
|   + Channels: 6 size 3
| + A track size 38
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2) size 3
|  + Track UID: 1016340927463145369 size 11
|  + Track type: subtitles size 3
|  + Default flag: 0 size 3
|  + Lacing flag: 0 size 3
|  + Codec ID: S_TEXT/UTF8 size 13
|+ EbmlVoid (size: 1147) size 1150
|+ Cluster size 54904

By seeking, we can get some data, for example "Segment information":

import binascii
handle = open(path, "rb")
# 40 is EBML head size
# 8 is the amount of bytes between the EBML head and SeekHead, this was the same for all files I tested.
# 86 is Seek head
# 4 is amount of bytes between Seek head and EbmlVoid, same for all files I tested
# 4013 is EbmlVoid size
handle.seek(40 + 8 + 86 + 4 + 4013)
data = handle.read(121)
hex = str(binascii.hexlify(data), 'ascii')
hex = ':'.join(hex[i:i+2] for i in range(0, len(hex), 2))
print(data)
print(hex)
handle.close()

Here's the output (string representation and hex under):

b"\x15I\xa9f\xf4*\xd7\xb1\x83\x0fB@M\x80\xa3libebml v1.3.4 + libmatroska v1.4.5WA\x9fmkvmerge v9.4.0 ('Knurl') 64bitD\x89\x84J\x1dw\xc4Da\x88\x06\xdaH\x11;z8\x00s\xa4\x90\x8e\xd9\xfe7\x01\xbc\xa4\x89\xa3q\xcb\xd1y\xa4\x04P"

15:49:a9:66:f4:2a:d7:b1:83:0f:42:40:4d:80:a3:6c:69:62:65:62:6d:6c:20:76:31:2e:33:2e:34:20:2b:20:6c:69:62:6d:61:74:72:6f:73:6b:61:20:76:31:2e:34:2e:35:57:41:9f:6d:6b:76:6d:65:72:67:65:20:76:39:2e:34:2e:30:20:28:27:4b:6e:75:72:6c:27:29:20:36:34:62:69:74:44:89:84:4a:1d:77:c4:44:61:88:06:da:48:11:3b:7a:38:00:73:a4:90:8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50

As we can see, the 15:49:a9:66 matches with segment info on: https://matroska.org/technical/specs/index.html

Also near the end we have: 73:a4:90:8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50, 7a:a4 is SegmentUID according to https://matroska.org/technical/specs/index.html, 8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50 matches the 0x8e 0xd9 0xfe 0x37 0x01 0xbc 0xa4 0x89 0xa3 0x71 0xcb 0xd1 0x79 0xa4 0x04 0x50 from the mkvinfo output.

Maybe we can get the tags like this, by seeking to the appropriate location.

@WhitePeter
Copy link
Collaborator

WhitePeter commented Aug 28, 2016

I am using the json data one gets when running mkvmerge -i --idfentification-format json .... It's all in there. I can output the tags as 'REPLAY_GAIN_PEAK': '1.00123'. It is only a matter of time til I figure out where to hook that in. You should have put everything in place on the XML front, I noticed. It just lacked a way to preserve the original tags. Think I have found that way:

#!/usr/bin/env python
"""Find and manipulate matroska tags"""
from __future__ import print_function
import sys
import json
import os
import subprocess
from StringIO import StringIO


def main(args):
    for arg in args[1:]:
        if os.path.isfile(arg):
            mkvmerge_id_json_data = get_mkxident(arg)
            # print(repr(mkvmerge_id_json_data))
            # print("{}".format(mkvmerge_id_json_data.keys()))
            # print("Track tags: {}".format(mkvmerge_id_json_data["track_tags"]))
            tracks = mkvmerge_id_json_data["tracks"]
            # print("Tracks: {}".format(tracks))
            for track_id in range(len(tracks)):
                assert track_id == tracks[track_id]["id"]
                # This track_id is what bs1770gain uses as index for the --audio option.
                properties = tracks[track_id]["properties"]
                print("track_id: {}, track_uid: {}".format(track_id, properties["uid"]))
                # print("""'tracks[{}]["properties"]': {}""".format(tracks.index(track),
                # properties))
                for key in properties:
                    if key.startswith("tag_"):
                        print("""We've found tag '{}', with value '{}'."""
                              .format(key[4:].upper(), properties[key]))


def get_mkxident(path):
    """
    This gets us all the relevant info, including tags(!), there is to a matroska file, any
    mediafile for that matter. This is actually a list of dicts. We will be interested in the
    "tracks" dict, like get_mkxident(path)[u"tracks"][track_id][u"properties"], where we will find
    the tags, if any exist.
    """
    return json.load(
        StringIO(
            subprocess.check_output(
                ["mkvmerge", "-i", "--identification-format", "json", os.path.normpath(path)]
            )
        )
    )


if __name__ == "__main__":
    main(sys.argv)

@WhitePeter
Copy link
Collaborator

WhitePeter commented Aug 28, 2016

OK, updated my double-post.

@WhitePeter
Copy link
Collaborator

Just FYI, I am still alive. I experimented with above code a little and it looked very promising until I tried to run it on python3. It seems there is fundamental change between python2 and 3 when it comes to processing the json data. I still haven't figured out exactly, where it goes wrong. Strictly speaking StringIO should not be necessary for this, so I went without it, because it was the first to complain. The code runs fine without it in python2 but now json.decode() trips over something else.
Having read only a minimum about the differences and incompatibilities between the major python versions I can only assume that it must have something to do with how python3 "thinks" about file I/O.

I'll keep at it, though, but maybe python2 compatibility was not the best of ideas, after all. :(

On the plus side, I am able to store the original tags in a nice dict construct like {track_id: {tag_name: tag_value}. Eventually that will become an attribute of MatroskaFile() or even MatroskaTrack(). If only there wasn't this teeny problem of running on the wrong python version. :P But there must be a way and I am going to find it.

@kevinlekiller
Copy link
Owner Author

Yeah the inconsistencies between python 2 and 3 are annoying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants