Python (continues issue/discussion #3) #7

kevinlekiller · 2016-08-17T17:04:17Z

No description provided.

WhitePeter · 2016-08-17T18:17:42Z

Yeah, I did it. Now I just need to rename things. Hopefully nothing breaks. But I am quite pleased with my solution. What do you think?

kevinlekiller · 2016-08-17T18:30:33Z

That's a good solution, much simpler than what I had in mind, just 2 things I'd change, the help (the numbers are not useful anymore) and moving the dict inside the class (I try to avoid global variables, in this case they are only used in this class). I'm not sure if it works but you could also try assigning choices= with sorted(LOGLEVELS.keys(), key=LOGLEVELS.get, reverse=True), saves a bit of memory.

WhitePeter · 2016-08-17T18:48:14Z

I'm not sure if it works but you could also try assigning choices= with sorted(LOGLEVELS.keys(), key=LOGLEVELS.get, reverse=True)

Yes, that works, to, but I figured, we might need it again. A previous version did reuse it, so I wanted to cache this.

BTW, how can I get rid of this, we set the loglevel, so why is there a default in this place? simply removing the assignment and only leaving loglevel in gets me complaints, that there is one too few arguments.

kevinlekiller · 2016-08-17T18:53:08Z

You'll need to refactor a bit, the issue is handler.setLevel(verbosity). Maybe make handler a property and change the level of it in the method set_verbosity, we only have 1 handler anyways, so it should work.

WhitePeter · 2016-08-17T18:57:53Z

Yes, I just noticed, set_verbosity() or now set_loglevel() is only used once. Does that really need to be a method?
Also, it is a bit too long for my taste, how about set_level()? The log part is clear already from the context.

WhitePeter · 2016-08-17T19:00:27Z

Hmm, and I now still get at least one INFO at -v warning

kevinlekiller · 2016-08-17T19:02:05Z

You could do for example, from the main function : log.logger.setLevel(20) log.handler.setLevel(20), but if you move the handler.setLevel into the Log.set_level, then you kill 2 birds with 1 stone, since you just need to call log.set_level(20) and it will set both.

kevinlekiller · 2016-08-17T19:04:39Z

Hmm, and I now still get at least one INFO at -v warning

The issue is the Log class is the first instanced, since it's required by everything else, but at that point we've not parsed any arguments, so we set the default loglevel to 20 until we've parsed the arguments.

WhitePeter · 2016-08-17T19:06:36Z

That doesn't sound right. We should make sure the arguments get parsed first, then. argparse emits its own messages anyway.

kevinlekiller · 2016-08-17T19:08:35Z

That's what I originally wanted, but I wasn't sure if you were set on using the logger for the argument parsing class so I tried to accommodate, but if we both agree, then we should remove logger from the argument parser to make things a bit simpler.

kevinlekiller · 2016-08-17T19:10:53Z

If we change it, then we can do in the main function: Utils, checkArgs, Log, then set the Log instance on Utils.log

WhitePeter · 2016-08-17T19:11:42Z

So, you're saying everything related to logging needs to go from the argparser, right?

WhitePeter · 2016-08-17T19:14:06Z

But wait, how can we then set the loglevel at the argparsing stage?

kevinlekiller · 2016-08-17T19:14:12Z

Oops, was going to edit (deleted).

I think it would make the code a bit more simple.

WhitePeter · 2016-08-17T19:14:40Z

Oh crap, how can I reopen this?

WhitePeter · 2016-08-17T19:14:59Z

Ah, found it ;)

WhitePeter · 2016-08-17T19:17:06Z

Yes, but since we need to set verbosity by a method, Log must be instanciated at the argparsing stage already, or am I missing something? Is this maybe a Catch 22 situation? And if so, how to solve it?

kevinlekiller · 2016-08-17T19:17:14Z

Basically, we make a Utils instance in main (utils = Utils()) then we check the args (checkArgs(utils)) in main again, then we set the Utils instance log property: utils.log = Log(utils.loglevel) in main also.

utils.loglevel is a property we set in checkArgs (self.utils.loglevel = args.loglevel)

Something like that?

WhitePeter · 2016-08-17T19:18:43Z

Um, OK, have to digest this first I think. This OOP stuff still makes my head hurt. ;)

kevinlekiller · 2016-08-17T19:25:11Z

Think of it like a array kind of:

x = [0 =[0 = "z"], 1 = [0 = "a", 1 = "b"]]

x[0][0] is "z"
x[1][0] is "a"
x[1][1] is "b"

class X:
def init(self):
self.0 = 0()
self.1 = 1()

class 0:
0 = "z"

class 1:
0 = "a"
1 = "b"

x = X()
x.0.0 is "z"
x.0.0 is "a"
x.0.1 is "b"

kevinlekiller · 2016-08-17T19:32:03Z

In a more practical example:

class Options:
maximum_size = 100

def init(self):

Here I change the Options.maximum_size from 100 to 50

self.maximum_size = 50

def change_max_size(self, size):
self.maximum_size = size

options = Options() # I create a instance of options
print(options.maximum_size) # This will print 50
options.maximum_size = 100 # I set it to 100
options.change_max_size(200) # Here's another way to change the size.

WhitePeter · 2016-08-17T19:32:28Z

You're making it worse, ow. :-D Maybe I just need a break.

BTW, what does __get_ref_loudness have to do in Checkargs()?

kevinlekiller · 2016-08-17T19:33:56Z

It doesn't but I didn't see a better place to put it, I figured it was something that should only be run once, and Checkargs() is erased from memory after it's ran (since we don't store it in a variable), if we put it in Utils() for example, then it will remain in memory until the program exits (since we pass utils instance to ThreadMkvrg).

WhitePeter · 2016-08-17T19:37:07Z

OK, but from a logical POV it does not make sense. It makes it really hard for me to grasp, what's going on and why.

kevinlekiller · 2016-08-17T19:39:23Z

We might want to rename CheckArgs, since it also checks binaries, maybe call it Setup or something (you're better at names than me).

WhitePeter · 2016-08-17T19:44:17Z

Oh no, it does too many things I think. Let's make smaller objects. And have a real structure. Isn't that the point of OOP? Have something parse the arguments, then something that checks those for sanity and if the program has all it needs. Or check first, if the binaries are in place. If not, why parse arguments? Fail then and there.

kevinlekiller · 2016-08-20T20:17:17Z

b"matroska" in data works and doesn't need to convert to string

WhitePeter · 2016-08-20T20:17:44Z

Thx, I am just putting it together.

WhitePeter · 2016-08-20T20:22:21Z

Done. Looks good so far.

kevinlekiller · 2016-08-20T20:24:20Z

I found a PHP library which reads matroska files (and tags), I was trying to figure out how it works, I think I got a decent idea, but it's a pain, it had to go through every byte checking if the bytes match those in the page you listed above, there's no order to how those can be done in a matroska so it has to do conditionals on every byte and change byte offset on the file handle every time accordingly. I was hoping I could just write something simple in python just to read the tags, but that's not possible, going through everything is required, to make sure the byte offset is at the right position.

WhitePeter · 2016-08-20T20:28:56Z

Oh my, looks like it's not worth it. I am still convinced that using the json from mkvinfo and conversion to and from xml can work. We should have everything we need to find the right place to put our data into the json object. Then convert it back to XML and feed that to mkvpropedit. Maybe in the beginning just write all tags back with --tags all:....

kevinlekiller · 2016-08-20T20:30:10Z

Yeah if you want to see the nightmare, here it is : https://github.com/JamesHeinrich/getID3/blob/master/getid3/module.audio-video.matroska.php#L497

WhitePeter · 2016-08-20T20:33:26Z

LOL, it's only ~800 lines of code. No biggy. :P

kevinlekiller · 2016-08-20T20:36:07Z

@005a58d, I think that does not work as expected, I think the first conditional is always true.

kevinlekiller · 2016-08-20T20:38:44Z

This might be a way to do it: https://stackoverflow.com/questions/3389574/check-if-multiple-strings-exist-in-another-string

WhitePeter · 2016-08-20T21:15:10Z

@005a58d, I think that does not work as expected, I think the first conditional is always true.

Yeah, I thought so myself, reverted it already.

WhitePeter · 2016-08-20T21:16:28Z

Didn't use any() though. I think that is overkill for only 2 strings.

Actually use a method of MkxFile().

Again, the code does not get used. I hope however that the idea gets across.

Now we actually leverage MatroskaFile(), but only, for the time being, to basically replicate the prior behaviour. The goal remains to multiprocess on an audio track level, so multiple tracks of one and the same file can be processed concurrently.

Think I am getting the hang of super(), hopefully. Also, some minor cleanup, i.e. use self.utils.log.. instead of self.log.. to avoid confusion.

since to be processed it must be an actual file of type matroska, which this class guarantees for its objects, move the methods that deal with something involving a path here.

kevinlekiller · 2016-08-25T09:56:01Z

Just had a look at your commits, fine work!

I'm still alive, just a bit busy right now.

WhitePeter · 2016-08-25T12:44:10Z

Thanks, I am glad you like it. I've learned a great deal in the past few days. And I have failed, a lot. ;)
And I think I might have been sloppy on the way. Let's see, if I can "harden" the class attributes a bit more, like proper accessors and mutators for data attributes, make the latter as private as python allows, hide as much as possible.
You think it is worth it, or should we keep something like that for later?

kevinlekiller · 2016-08-25T22:14:36Z

I think it's a good idea, if you do it now it might be easier than trying to do it later.

Make the path of an MkxFile private and provide accessor. Also, check on initialization that we have a file at our hands, raise exception if not. This way we can just try and except through the list of arguments, could maybe leverage that for the recursive search maybe.

I consider this a semi-major change. Prior to this we always created two instances of MatoroskaFile(), one while filling utils.files and checking if it is a matroska and then when actually processing in process_thread(). But at that stage we have all the information we need already. So pass the path of the file, which is guaranteed to be a matroska into the queue. Since utils.files has now become an ordered dictionary, it also holds an entire MatroskaFile object with all the methods attached. Before we just threw away that object when we only tried initializing the object and then only take its path and put that in utils.files. While that might hurt memory usage I think it is worth it for not having to do the checking twice.

WhitePeter · 2016-08-26T16:43:39Z

Huh, somewhere along the way loglevel setting went missing. I've put in some debug messages which I still cannot see at -v debug. :(

WhitePeter · 2016-08-28T09:21:49Z

FYI, I am currently experimenting with the json module to get the old tags. I think, once we get to the stage where we can just add our tags without destroying prior unrelated ones, we have something to work with.
I think this tagging thing even deserves its own module. It can come in quite handy if one wants to manipulate mkx tags of any kind.

kevinlekiller · 2016-08-28T12:37:48Z

Indeed, there's a lack of tools when it comes to that.

Btw, just some quick testing with mkvinfo:

mkvinfo -z file

This shows the size of all the EBML entries:

+ EBML head size 40
|+ EBML version: 1 size 4
|+ EBML read version: 1 size 4
|+ EBML maximum ID length: 4 size 4
|+ EBML maximum size length: 8 size 4
|+ Doc type: matroska size 11
|+ Doc type version: 4 size 4
|+ Doc type read version: 2 size 4
+ Segment, size 1802186950 size 1802186962
|+ Seek head (subentries will be skipped) size 86
|+ EbmlVoid (size: 4004) size 4013
|+ Segment information size 121
| + Timecode scale: 1000000 size 7
| + Muxing application: libebml v1.3.4 + libmatroska v1.4.5 size 38
| + Writing application: mkvmerge v9.4.0 ('Knurl') 64bit size 34
| + Duration: 2579.953s (00:42:59.953) size 7
| + Date: Thu Aug 25 02:52:28 2016 UTC size 11
| + Segment UID: 0x8e 0xd9 0xfe 0x37 0x01 0xbc 0xa4 0x89 0xa3 0x71 0xcb 0xd1 0x79 0xa4 0x04 0x50 size 19
|+ Segment tracks size 210
| + A track size 121
|  + Track number: 1 (track ID for mkvmerge & mkvextract: 0) size 3
|  + Track UID: 18308047031429458341 size 11
|  + Track type: video size 3
|  + Lacing flag: 0 size 3
|  + MinCache: 1 size 4
|  + Codec ID: V_MPEG4/ISO/AVC size 17
|  + CodecPrivate, length 43 (h.264 profile: High @L4.0) size 46
|  + Default duration: 41.708ms (23.976 frames/fields per second for a video track) size 8
|  + Video track size 24
|   + Pixel width: 1916 size 4
|   + Pixel height: 1076 size 4
|   + Display width: 1916 size 7
|   + Display height: 1076 size 7
| + A track size 45
|  + Track number: 2 (track ID for mkvmerge & mkvextract: 1) size 3
|  + Track UID: 13819279113119135370 size 11
|  + Track type: audio size 3
|  + Codec ID: A_AC3 size 7
|  + Default duration: 32.000ms (31.250 frames/fields per second for a video track) size 8
|  + Audio track size 11
|   + Sampling frequency: 48000 size 6
|   + Channels: 6 size 3
| + A track size 38
|  + Track number: 3 (track ID for mkvmerge & mkvextract: 2) size 3
|  + Track UID: 1016340927463145369 size 11
|  + Track type: subtitles size 3
|  + Default flag: 0 size 3
|  + Lacing flag: 0 size 3
|  + Codec ID: S_TEXT/UTF8 size 13
|+ EbmlVoid (size: 1147) size 1150
|+ Cluster size 54904

By seeking, we can get some data, for example "Segment information":

import binascii
handle = open(path, "rb")
# 40 is EBML head size
# 8 is the amount of bytes between the EBML head and SeekHead, this was the same for all files I tested.
# 86 is Seek head
# 4 is amount of bytes between Seek head and EbmlVoid, same for all files I tested
# 4013 is EbmlVoid size
handle.seek(40 + 8 + 86 + 4 + 4013)
data = handle.read(121)
hex = str(binascii.hexlify(data), 'ascii')
hex = ':'.join(hex[i:i+2] for i in range(0, len(hex), 2))
print(data)
print(hex)
handle.close()

Here's the output (string representation and hex under):

b"\x15I\xa9f\xf4*\xd7\xb1\x83\x0fB@M\x80\xa3libebml v1.3.4 + libmatroska v1.4.5WA\x9fmkvmerge v9.4.0 ('Knurl') 64bitD\x89\x84J\x1dw\xc4Da\x88\x06\xdaH\x11;z8\x00s\xa4\x90\x8e\xd9\xfe7\x01\xbc\xa4\x89\xa3q\xcb\xd1y\xa4\x04P"

15:49:a9:66:f4:2a:d7:b1:83:0f:42:40:4d:80:a3:6c:69:62:65:62:6d:6c:20:76:31:2e:33:2e:34:20:2b:20:6c:69:62:6d:61:74:72:6f:73:6b:61:20:76:31:2e:34:2e:35:57:41:9f:6d:6b:76:6d:65:72:67:65:20:76:39:2e:34:2e:30:20:28:27:4b:6e:75:72:6c:27:29:20:36:34:62:69:74:44:89:84:4a:1d:77:c4:44:61:88:06:da:48:11:3b:7a:38:00:73:a4:90:8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50

As we can see, the 15:49:a9:66 matches with segment info on: https://matroska.org/technical/specs/index.html

Also near the end we have: 73:a4:90:8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50, 7a:a4 is SegmentUID according to https://matroska.org/technical/specs/index.html, 8e:d9:fe:37:01:bc:a4:89:a3:71:cb:d1:79:a4:04:50 matches the 0x8e 0xd9 0xfe 0x37 0x01 0xbc 0xa4 0x89 0xa3 0x71 0xcb 0xd1 0x79 0xa4 0x04 0x50 from the mkvinfo output.

Maybe we can get the tags like this, by seeking to the appropriate location.

WhitePeter · 2016-08-28T14:28:33Z

I am using the json data one gets when running mkvmerge -i --idfentification-format json .... It's all in there. I can output the tags as 'REPLAY_GAIN_PEAK': '1.00123'. It is only a matter of time til I figure out where to hook that in. You should have put everything in place on the XML front, I noticed. It just lacked a way to preserve the original tags. Think I have found that way:

#!/usr/bin/env python
"""Find and manipulate matroska tags"""
from __future__ import print_function
import sys
import json
import os
import subprocess
from StringIO import StringIO


def main(args):
    for arg in args[1:]:
        if os.path.isfile(arg):
            mkvmerge_id_json_data = get_mkxident(arg)
            # print(repr(mkvmerge_id_json_data))
            # print("{}".format(mkvmerge_id_json_data.keys()))
            # print("Track tags: {}".format(mkvmerge_id_json_data["track_tags"]))
            tracks = mkvmerge_id_json_data["tracks"]
            # print("Tracks: {}".format(tracks))
            for track_id in range(len(tracks)):
                assert track_id == tracks[track_id]["id"]
                # This track_id is what bs1770gain uses as index for the --audio option.
                properties = tracks[track_id]["properties"]
                print("track_id: {}, track_uid: {}".format(track_id, properties["uid"]))
                # print("""'tracks[{}]["properties"]': {}""".format(tracks.index(track),
                # properties))
                for key in properties:
                    if key.startswith("tag_"):
                        print("""We've found tag '{}', with value '{}'."""
                              .format(key[4:].upper(), properties[key]))


def get_mkxident(path):
    """
    This gets us all the relevant info, including tags(!), there is to a matroska file, any
    mediafile for that matter. This is actually a list of dicts. We will be interested in the
    "tracks" dict, like get_mkxident(path)[u"tracks"][track_id][u"properties"], where we will find
    the tags, if any exist.
    """
    return json.load(
        StringIO(
            subprocess.check_output(
                ["mkvmerge", "-i", "--identification-format", "json", os.path.normpath(path)]
            )
        )
    )


if __name__ == "__main__":
    main(sys.argv)

WhitePeter · 2016-08-28T18:01:02Z

OK, updated my double-post.

WhitePeter · 2016-09-01T22:09:40Z

Just FYI, I am still alive. I experimented with above code a little and it looked very promising until I tried to run it on python3. It seems there is fundamental change between python2 and 3 when it comes to processing the json data. I still haven't figured out exactly, where it goes wrong. Strictly speaking StringIO should not be necessary for this, so I went without it, because it was the first to complain. The code runs fine without it in python2 but now json.decode() trips over something else.
Having read only a minimum about the differences and incompatibilities between the major python versions I can only assume that it must have something to do with how python3 "thinks" about file I/O.

I'll keep at it, though, but maybe python2 compatibility was not the best of ideas, after all. :(

On the plus side, I am able to store the original tags in a nice dict construct like {track_id: {tag_name: tag_value}. Eventually that will become an attribute of MatroskaFile() or even MatroskaTrack(). If only there wasn't this teeny problem of running on the wrong python version. :P But there must be a way and I am going to find it.

kevinlekiller · 2016-09-01T23:11:35Z

Yeah the inconsistencies between python 2 and 3 are annoying.

kevinlekiller mentioned this issue Aug 17, 2016

Python version. #3

Closed

WhitePeter added a commit that referenced this issue Aug 17, 2016

make --verbosity take strings instead of int (#7)

099aa81

WhitePeter closed this as completed Aug 17, 2016

WhitePeter reopened this Aug 17, 2016

WhitePeter added a commit that referenced this issue Aug 20, 2016

Do a proper check for matroska file type (#7)

52058b8

WhitePeter added a commit that referenced this issue Aug 20, 2016

Simplified the matroska type check (#7)

005a58d

WhitePeter added a commit that referenced this issue Aug 21, 2016

Another step in refactoring (#7)

12391b7

Actually use a method of MkxFile().

WhitePeter added a commit that referenced this issue Aug 21, 2016

Some prototypes to outline the idea of tracks (#7)

3357609

Again, the code does not get used. I hope however that the idea gets across.

WhitePeter added a commit that referenced this issue Aug 24, 2016

Cleaner version of inheritance (#7)

85d445a

Think I am getting the hang of super(), hopefully. Also, some minor cleanup, i.e. use self.utils.log.. instead of self.log.. to avoid confusion.

WhitePeter added a commit that referenced this issue Aug 25, 2016

Moved some methods into MatroskaFile (#7)

ff3f7ed

since to be processed it must be an actual file of type matroska, which this class guarantees for its objects, move the methods that deal with something involving a path here.

Python (continues issue/discussion #3) #7

Python (continues issue/discussion #3) #7

Comments

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016 • edited Loading

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016 • edited Loading

kevinlekiller commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016 • edited Loading

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016 • edited Loading

kevinlekiller commented Aug 17, 2016

Here I change the Options.maximum_size from 100 to 50

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 17, 2016 • edited Loading

WhitePeter commented Aug 17, 2016 • edited Loading

kevinlekiller commented Aug 17, 2016

WhitePeter commented Aug 17, 2016

kevinlekiller commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

kevinlekiller commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

kevinlekiller commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

kevinlekiller commented Aug 20, 2016

kevinlekiller commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

WhitePeter commented Aug 20, 2016

kevinlekiller commented Aug 25, 2016

WhitePeter commented Aug 25, 2016

kevinlekiller commented Aug 25, 2016

WhitePeter commented Aug 26, 2016

WhitePeter commented Aug 28, 2016

kevinlekiller commented Aug 28, 2016

WhitePeter commented Aug 28, 2016 • edited Loading

WhitePeter commented Aug 28, 2016 • edited Loading

WhitePeter commented Sep 1, 2016

kevinlekiller commented Sep 1, 2016

WhitePeter commented Aug 17, 2016 •

edited

Loading

WhitePeter commented Aug 17, 2016 •

edited

Loading

kevinlekiller commented Aug 17, 2016 •

edited

Loading

kevinlekiller commented Aug 17, 2016 •

edited

Loading

kevinlekiller commented Aug 17, 2016 •

edited

Loading

WhitePeter commented Aug 17, 2016 •

edited

Loading

WhitePeter commented Aug 28, 2016 •

edited

Loading

WhitePeter commented Aug 28, 2016 •

edited

Loading