Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MIDI file decoder #1004

Merged
wader:master from
Sep 8, 2024

Conversation

twystd
Copy link
Contributor

@twystd twystd commented Aug 27, 2024

Hi,

First draft of a MIDI file decoder for your comments and suggestions - I've decoded the file in the way that seemed most useful to me but there are a couple things I'm a little undecided about so it seemed like a good time to get some feedback :-).

twystd added 30 commits August 10, 2024 18:25
@twystd
Copy link
Contributor Author

twystd commented Aug 30, 2024

Nice! With a dog I hope!! :-) :-)

I'm busy on the changes you've suggested but they're going to take a little while, so happy moving ..

Copy link
Owner

Nice! With a dog I hope!! :-) :-)

No dogs yet :) but we have two cats!

I'm busy on the changes you've suggested but they're going to take a little while, so happy moving ..

Thanks and no stress 👍

- fixed remaining snake-cased event names in tests and examples (cf. wader#1004 (comment))
- report 'status' and 'event' seperately (cf. wader#1004 (comment))
- decoded TimeSignature metaevent as length + byte fields (cf. wader#1004 (comment))
- decoded SequencerSpecificEvent metaevent as length + byte fields (cf. wader#1004 (comment))
- reworked Errorf's as Fatalf's for data reads (cf. wader#1004 (comment))
…(cf. wader#1004 (comment))

- restructured and simplified metaevent decoding
- restructured and simplified  MIDI event decoding
- added sample events for remaining metaevents
- moved MIDI test and sample files to testdata/midi
- fixed fuzzing fail 8450de010e750ed5
- delineated NoteOn and NoteOff events
- simplified MIDI event decoding with function map
- decoded MIDI event and channel as nibbles
- reworked system exclusive event decoding
- fixed peekEvent to correctly decode end of file events
- reduced format 0, format 1 and format 2 test files to basic MIDI skeleton
- reworked reference.mid file as a test/development sample only
- added fq tests for the single event MIDI files
- updated examples to use twinkle.mid
@twystd
Copy link
Contributor Author

twystd commented Sep 1, 2024

<smile> for when the cats have forgiven you for moving them!

I've restructured the decoding along the lines you suggested and TBH am a lot happier with it - the code is a lot cleaner and the output makes more sense. But .. see what you think :-).

Copy link
Owner

for when the cats have forgiven you for moving them!

One of them recovered fast but the other one is still a bit grumpy, he also has some teeths removed some days ago so that did not help :)

I've restructured the decoding along the lines you suggested and TBH am a lot happier with it - the code is a lot cleaner and the output makes more sense. But .. see what you think :-).

Yeap looks better i think 👍 i've been in similar situations with decodes, overthinking things a bit and using value/sythenetic fields a bit too much, good to step back and try be more "basic" maybe use more structs and/or split into a bit more. I guess a good guideline is always use normal fields if there is at least some sense of "these bits are involved to produce the value" and only value fields for derived or "extra" things that don't correspond to anything in the source bits.

Will have a deeper look later in the week

format/midi/midi.go Outdated Show resolved Hide resolved
// ... decode tracks
d.FieldArray("tracks", func(d *decode.D) {
for d.BitsLeft() > 0 {
if err := skipTo(d, "MTrk"); err != nil {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a midi files can have "garbage" or unknown things between tracks?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is d.(Try)PeekFind function that can be used to peek in various ways, maybe can be used?

Copy link
Contributor Author

@twystd twystd Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not garbage no - it's a little bit ambiguous but at the very least it has to be a valid chunk with a 4 character tag and a length. The theory being that it would allow extensions and customisations without breaking existing MIDI file decoders.

But .. I've only ever seen this referenced once and can't remember having ever come across a MIDI file that had anything other than MThd and MTrk chunks, so I've removed it. If anybody ever raises an issue and can supply a MIDI file with other chunks it's easy enough to put back.

Removed in 9f057b6.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, could it make sense to have a case that decodes unknown tags as just tag+data?

Copy link
Contributor Author

@twystd twystd Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, just to make sure I dug out the spec and this is what it says:

MIDI Files are made up of chunks. Each chunk has a 4-character type and a 32-bit length,
which is the number of bytes in the chunk. This structure allows future chunk types to be
designed which may easily be ignored if encountered by a program written before the chunk
type is introduced. Your programs should expect alien chunks and treat them as if they
weren't there.

So tag + data makes sense - it's not a string so I've also included the length:

    |                                               |                |  tracks[0:5]: 0xe-0x56 (72)
    |                                               |                |    [0]{}: other 0xe-0x1e (16)
0x00|                                          4d 54|              MT|      tag: "MTrx" 0xe-0x12 (4)
0x10|72 78                                          |rx              |
0x10|      00 00 00 08                              |  ....          |      length: 8 0x12-0x16 (4)
0x10|                  01 23 45 67 89 ab cd ef      |      .#Eg....  |      data: raw bits 0x16-0x1e (8)
    |                                               |                |    [1]{}: track 0x1e-0x2a (12)
0x10|                                          4d 54|              MT|      tag: "MTrk" 0x1e-0x22 (4)

Not sure that it's really correct to call the section "tracks" any longer though - what do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 reclutant to call them chunks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with content in the end - as in header + content. It seemed to make the most sense :-)

Fixed in 9c7f7f9.

format/midi/testdata/format-0.fqtest Outdated Show resolved Hide resolved
format/midi/metaevents.go Outdated Show resolved Hide resolved
- Removed handling for non-MIDI chunks (cf. wader#1004 (comment))
- Removed redundant chunk length asserts (cf. wader#1004 (comment))
- Removed redundant header FieldArray (cf. wader#1004 (comment))
- Replaced FieldUintFn with FieldUnn for SMTPEOffsetEvent (cf. wader#1004 (comment))
- Added note to midi.md re. supported chunks
- Reworked event types as snake-case (cf. wader#1004 (comment))
- Decoded 'alien' chunks (cf. wader#1004 (comment))
@twystd
Copy link
Contributor Author

twystd commented Sep 5, 2024

Quick question - I'm considering removing any remaining checks that aren't actually catastrophic and deferring them to something like a --validate option (to be implemented later), e.g.:

...
	d.FramedFn(length*8, func(d *decode.D) {
		format := d.FieldU16("format")
		if format != 0 && format != 1 && format != 2 {
			d.Errorf("invalid MThd format %v (expected 0,1 or 2)", format)
		}
...

Any thoughts?

Copy link
Owner

Quick question - I'm considering removing any remaining checks that aren't actually catastrophic and deferring them to something like a --validate option (to be implemented later), e.g.:

Yeah i think that is better. The original idea was to keep decoder code free from as much explicit error checking as possible to make them be very straight forward. But think i forgot a bit about making the common error reporting nicer, i think it's a bit of a mess atm, feel free to look into :) i remember i spent some time on trying to improve on preserving "partial" decode trees on errors but it gets a bit tricky to combine with panic/recover control flow.

BTW here is a overflow of how requre/asser/validate works https://github.com/wader/fq/blob/master/pkg/decode/decode_gen.go.tmpl#L89-L123 (note fq was started pre generics :) i have some ideas how to make the decode API nicer by using generic but is quite a bit of effort, we'll see!)

Copy link
Owner

I think it's starting to look ready for merge, something you want to add? (except possible rename to chunks)

- Renamed 'tracks' section to 'content' (cf. wader#1004 (comment))
- Removed 'validation only' errors (cf. wader#1004 (comment))
- Decoded MThd divisions field with SMPTE timecode
@twystd
Copy link
Contributor Author

twystd commented Sep 6, 2024

Agreed, it makes much more sense not to fail because at least one use case for this thing is finding and fixing errors in files:

  • I've removed the validation errors
  • And also tweaked the division field in the header to be decoded more correctly as a struct

re. merge. I've run it on a folder of about 100 files and it seems fine and even found one file with an unsuspected error, and fuzzing it for about an hour turned up nothing (whew!) and I can't think of anything else I want to add/change, so unless you find anything it's probably good to go ... but maybe let me give it another once through just in case?

Copy link
Owner

Sounds good, just let me know when you think it's ready. I might spend some time reviewing if i get some time. Also remember that we can always do additional PR:s later on.

@twystd
Copy link
Contributor Author

twystd commented Sep 7, 2024

And that's it from me I think - I tightened up the status/event decoding logic which was the thing that was subconsciously nagging at me. Anything else is largely cosmetic and I'm sure your cats would like to have your attention :-).

Copy link
Owner

👍 Let's merge! thanks and of course feel tree to open more PRs!

dcbb94d into wader:master Sep 8, 2024
5 checks passed
@twystd
Copy link
Contributor Author

twystd commented Sep 8, 2024

Woohoo :-) ... thanks for all the reviews and comments! A second set of eyes makes such a difference ..

Copy link
Owner

No problem, always nice to help and learn a bit about a new format. Also it's a weird satisfying feeling seeing what every bit in a file is used for 😄

Copy link
Owner

https://fosstodon.org/@wader/113102715574487149

twystd pushed a commit to transcriptaze/midiasm that referenced this pull request Sep 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants