Introduction
Looking at Rock Band and Guitar Hero, I thought that they were great games but that they could accomplish so much more if they were more educational. So, for my course project at Simon Fraser University, I decided to make a similar game based on MIDI files. MIDI files potentially provide limitless songs to choose from, and provide data in a format suitable for emulating Rock Band and Guitar Hero. Furthermore, MIDI allows MIDI Star to use real instruments for input (I use a Yamaha Clavinova and Yamaha DTXplorer Drum Set).
I worked for Electronic Arts for 20 years, and at work, I had to use EA's internal libraries and tools to produce games. Having retired, I had no libraries to work with, and had to search for working code snippets to create MIDI Star. The obvious choice for me was to use DirectX because I could access more hardware than using OpenGL. About half of the sample code used MFC and the other half used .NET. Some of the code used wrappers, and others didn't. DirectX can be hard to compile because everyone has a different version of the SDK installed, and if you don't have the correct definitions, compiling can be very frustrating. Hopefully, I have all the correct defines so that later versions of DirectX will compile. And after some self-debating, I decided that I could probably get things working more quickly using MFC. Furthermore, I don't have to deal with installing the proper .NET runtimes. However, I may try to port this to .NET later, if I feel inspired.
Why use MFC or .NET at all? The simple reason is that the game is the most important feature, not the GUI leading up to the game. GUIs are also highly susceptible to design changes until you have a clear idea in your head. Lastly, I only had 6 weeks to design and program this for my course. If you are new to programming, my advice is always work on the important stuff first and the risky stuff next. This way, you can abort the project without wasting too much time. There's no point in spending time to create a front end GUI for a non-existent product.
So, what can you learn from this article that's different from all the other good articles on this site? The code provided is a complete game, so you can use it as a framework for your own game. I have found that a good way to learn how to program is to look at lots of complete products to see how they are put together. Before creating my own games, I spent years porting other people's work from one machine onto other machines. You may like the way I have structured this game, or you may like how others have structured their game, but you can always learn something either way.
However, the biggest problem I ran into in creating this game was parsing a MIDI file and sending MIDI messages to the drivers, which is what this article focuses on.
The August 2008 version of the SDK is required to compile the source code, and the equivalent runtimes need to be installed for the standalone executable. You may need to add these libraries to the Additional Dependencies box in Linker/Input: strsafe.lib d3dxof.lib dxguid.lib d3dx9.lib d3d9.lib winmm.lib dinput8.lib comctl32.lib to get the compiler working.
Background
If you are competent in C++, you should be able to read all the code. The most complicated C++ features are using derived classes with virtual functions and pointers to pointers. Most library calls that I use are well documented in MSDN. I rely heavily on MFC and DirectX, so if you understand these libraries, you will have no problems.
Most machines have a lot of RAM these days, and have no trouble playing high quality digital audio. So, why use MIDI? Let's start with the acronym. MIDI stands for Musical Instrument Digital Interface, a standard that was developed in the early 1980s. It allows instruments to communicate and synchronize with other instruments, controllers, and computers. Today, it is mainly used for sequencing music on computers. Back to the question, why use MIDI? The first reason is that MIDI is compact: you only need tens of kilobytes to store a song, versus megabytes for an MP3. This is why many ring tones are stored as MIDI files. The second reason is that you can change the tempo of a song without affecting the pitch of the song. If you play an MP3 or WAV slower, the pitch lowers. But MIDI simply sends messages for notes to be played and stopped, so the pitch remains true. For games, you can change the mood of the game by changing the tempo of the music. This can be more seamless than trying to transition between digital songs. The third reason is that you have to be a good musician to record digital music, but you don't have to be one to sequence MIDI music. The last reason is that you can insert meta events into a MIDI file which can be used to trigger events in your game such as a special animation or special logic.
Although I do not believe there are any copyrights for the MIDI files listed in my tutorial, I have left them out of my zip package just in case they do exist. MIDI files are copyrightable in the U.S.; you can find this on the MIDI Manufacturers Association's website. If you click on Help when running the game, there are the links to download the suggested MIDI files.
Using the Code - Parsing
In order to process a MIDI file, you will need to extract the following files (source and headers): MIDIInterface, MIDISong, MIDITrack, MIDIEvents, and Helper. Unless you are modifying MIDI Star, you will want to strip out all references to Mapper and Game. I won't show too much code because I will be explaining the key concepts, which in turn will make it easier to understand the code, which in turn will allow you to modify it to suit your needs. In other words, I will be using a top-down approach to my explanations. The current code should be easily modified and extended if you wish to create your own MIDI sequencer. ExtractNotes in MIDISong.cpp is an example of how to write a filter for MIDI events.
Here's a quick primer on MIDI files; we will limit this to SMF Type 1 (SMF stands for Standard MIDI File) because it is the most common and general. The first thing to note is that any multi-byte value is stored in Big Endian format. Intel chips are Little Endian, so the bytes have to be swapped after reading; Helper has routines to do this: ChangeEndianShort and ChangeEndianLong. A MIDI file contains any number of tracks which contain any number of MIDI events. MIDI tracks have nothing to do with the MIDI hardware standard; they are merely used for organizing music when sequencing. MIDI allows for 16 different instruments (timbres) to be used simultaneously. These 16 instruments correspond to channels where channel 10 is normally used for drums. FYI, you can have more than 16 instruments in a song (but not simultaneously) by using patch changes. Each channel is polyphonic, so you can play multiple notes (voices) simultaneously, that is, a chord.
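As a rough illustration of the Big Endian point, here is a minimal sketch that reads the 14-byte song header ("MThd", a 4-byte length, then three 2-byte fields) without caring about the host byte order. The names ReadBE16, ReadBE32, SmfHeader, and ParseSmfHeader are hypothetical and are not the Helper routines from MIDI Star.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read a 16-bit Big Endian value from a byte buffer.
inline std::uint16_t ReadBE16(const unsigned char *p)
{
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Hypothetical helper: read a 32-bit Big Endian value from a byte buffer.
inline std::uint32_t ReadBE32(const unsigned char *p)
{
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Hypothetical container for the three header fields.
struct SmfHeader
{
    std::uint16_t format;        // 0, 1, or 2
    std::uint16_t numTracks;     // number of track chunks that follow
    std::uint16_t timeDivision;  // ticks per quarter note, or SMPTE if bit 15 is set
};

// Parse the "MThd" chunk at the start of a file buffer.
// Returns false if the buffer is too short or the chunk is malformed.
inline bool ParseSmfHeader(const unsigned char *buf, std::size_t size, SmfHeader &out)
{
    assert(buf != nullptr);
    if (size < 14)
        return false;
    if (buf[0] != 'M' || buf[1] != 'T' || buf[2] != 'h' || buf[3] != 'd')
        return false;
    if (ReadBE32(buf + 4) < 6)   // header data is at least 6 bytes long
        return false;
    out.format = ReadBE16(buf + 8);
    out.numTracks = ReadBE16(buf + 10);
    out.timeDivision = ReadBE16(buf + 12);
    return true;
}
```

Assembling the value byte by byte avoids a separate swap step, at the cost of a few shifts per field.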
There are two MIDI event types: channel and system. There are 7 channel event types to control an instrument, and the system event type is further broken down into system events and meta events. System events need to be sent to the MIDI interface, but meta events are meant for the sequencer, and therefore they are not sent to the MIDI interface. Examples of channel events are note on, note off, and pitch bend. Examples of system events are download a wavetable definition, reset the system and time clock. Examples of meta events are name of track, name of instrument, and end of track.
Where are the MIDI messages sent? In the past (the 1980's), processing sound required too much CPU bandwidth (you were laughing if you had a 33 MHz CPU), so hardware manufacturers developed sound cards with multiple frequency modulation (FM) voice generators. Better sound cards had more FM generators, so you could play more instruments and more chords simultaneously. You would simply send your MIDI messages to the sound card through a device driver. Today, computers are fast enough to process sound, and Windows can do this through software, Microsoft GS Wavetable SW Synth. If you have a good sound card with drivers, the synthesizer can take advantage of the audio acceleration.
You may have noticed that I use MIDI events and MIDI messages. I want to make the distinction that events become messages when they are parsed from the file and sent to the MIDI interface. Messages become events when they are encoded and put into a file buffer. A major distinction is that events have an associated time whereas messages happen in real-time so time is not attached to a message. For more information about MIDI, refer to Wikipedia for an overview, or The Sonic Spot for more in-depth information, or the de-facto reference MIDI.org. For some good general information about audio and multimedia in general, you can read Fundamentals of Multimedia by Ze-Nian Li and Mark S. Drew (a shameless plug for my professor).
The parsing of the file is not particularly tricky if you understand a couple of peculiarities: variable length data and running status. MIDI interfaces only run at 31.25 Kbps, so data going through the interface needs to be as compact as possible; this is the reason why variable length data and running status are necessary. All events are composed of a delta time, a status byte, and parameters. The delta time is the amount of time between events on a track, and the delta times need to be accumulated to calculate the absolute time within a song (0 is a valid delta time). Delta times are stored in variable length data format. Variable length data is stored as 7 bit groups, one group per byte, in Big Endian order, with the high order zero groups discarded. Every byte except the last has the high bit set; the last (lowest order) byte has the high bit cleared. Variable length data also has a limit of 4 bytes so the maximum value is 0x0FFFFFFF (yes, this contradicts The Sonic Spot). Here are a few examples of the conversion:
Value | Value (in binary) | Variable Length Data Encoding | Variable Length Data Encoding (in binary) |
---|
0x46 | 1000110 | 0x46 | 01000110 |
0x3404 | 1101000 0000100 | 0xE804 | 11101000 00000100 |
0x03000000 | 0011000 0000000 0000000 0000000 | 0x98808000 | 10011000 10000000 10000000 00000000 |
Here is the code for decoding and encoding. As you can see, decoding is far easier.
long DecodeVarLen(UCHAR **p)
{
    UCHAR c;
    int len = 0;
    do
    {
        // Take the next byte and advance the caller's pointer.
        c = **p;
        *p = (*p)+1;
        // Append the low 7 bits; the high bit means more bytes follow.
        len = (len << 7) + (c & 0x7f);
    } while (c & 0x80);
    return len;
}
void EncodeVarLen(UCHAR **p, UINT len)
{
    UINT mask = 0x0fe00000;   // selects the highest 7 bit group first
    int shift = 7*3;
    bool setit = false;       // true once the first non-zero group is written
    do
    {
        if (((len & mask) != 0) || setit || (!setit && shift == 0))
        {
            if (shift == 0)
                **p = (UCHAR)len;   // last byte: high bit cleared
            else
                **p = (UCHAR)((len >> shift) & 0x7F) + 0x80;
            *p = (*p)+1;
            setit = true;
            len -= (len & mask);
        }
        mask >>= 7;
        shift -= 7;
    } while (shift >= 0);
}
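To check the table of examples above, here is a small standalone sketch of the same encoding using standard C++ types. EncodeVlq and DecodeVlq are hypothetical names, not part of MIDI Star; the decoder differs from the one above in that it takes a buffer size and refuses to read past it or past 4 bytes, which is an assumption about what a defensive parser might want.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode a value (at most 0x0FFFFFFF) as MIDI variable length data.
inline std::vector<std::uint8_t> EncodeVlq(std::uint32_t value)
{
    assert(value <= 0x0FFFFFFFu);
    std::uint8_t tmp[4];
    int n = 0;
    // Collect 7 bit groups from lowest to highest order.
    tmp[n++] = static_cast<std::uint8_t>(value & 0x7F);
    while ((value >>= 7) != 0)
        tmp[n++] = static_cast<std::uint8_t>((value & 0x7F) | 0x80);
    // Emit them highest order first.
    std::vector<std::uint8_t> out;
    while (n > 0)
        out.push_back(tmp[--n]);
    return out;
}

// Decode variable length data from a bounded buffer.
// On success, 'value' holds the result and 'used' the number of bytes consumed.
inline bool DecodeVlq(const std::uint8_t *data, std::size_t size,
                      std::uint32_t &value, std::size_t &used)
{
    value = 0;
    used = 0;
    while (used < size && used < 4)
    {
        std::uint8_t c = data[used++];
        value = (value << 7) | (c & 0x7Fu);
        if ((c & 0x80) == 0)
            return true;    // high bit cleared: this was the last byte
    }
    return false;           // ran out of input or exceeded 4 bytes
}
```

With these, 0x3404 encodes to the two bytes 0xE8 0x04 and 0x03000000 to 0x98 0x80 0x80 0x00, matching the table.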
Status bytes always have the high bit set. Parameters for events always have the high bit cleared. The upper nibble of the status byte holds the event type, and since the high bit is set, there are only 8 event types (the event type is for events and messages). The first 7 event types are channel event types, where the lower nibble is the channel number and the extra parameters are also listed:
Status Byte (Upper Nibble) | Channel Event Type | First Parameter | Second Parameter |
---|
0x8 | Note Off | Note Number | Velocity |
0x9 | Note On | Note Number | Velocity |
0xA | Polyphonic Key Pressure | Note Number | Pressure |
0xB | Control Change | Controller Number | Controller Value |
0xC | Program Change | Program Number | Not used |
0xD | Channel Pressure | Pressure | Not used |
0xE | Pitch Wheel Change | Most significant 7 bits | Least significant 7 bits |
The last event type is the system event type - 0xF. So, what is a running status byte? Recall that the MIDI interface is slow and data needs to be compact. If you examine a stream of messages, most of them will be note on's. If the event type and channel number are the same as in the previous event, the status byte would simply be repeated for consecutive events. In this case, the status byte is not sent in the MIDI message, nor is it saved in the event for MIDI files, in order to save bandwidth. MIDI sequencers and parsers can detect this because status bytes have the high bit set and data bytes have the high bit cleared. You might be asking why the MIDI stream only has note on's without the corresponding note off's. MIDI treats a note on with a velocity of 0 as a note off, which lets a whole run of notes share one running status. I found this out the hard way when hooking up my MIDI drums to a free MIDI capture program that didn't process this properly, and I got no sound (you get what you pay for). This made it extremely difficult to find out what part of my hardware/software chain was broken. A possible factor in their error is that with drums, the note off comes very quickly after the note on.
From the previous table, you can see that program change and channel pressure only have a single data byte while the other channel events have two data bytes. System events can be a little confusing depending on how you interpret them. System events can have zero, a fixed amount, or a variable amount of data bytes. Regardless of this, the number of data bytes is stored in variable length data format (the same as the delta time). It is then followed by the actual data bytes. We now have enough information to parse the MIDI file.
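The running status and data byte rules can be sketched in isolation. The following is a simplified illustration, not the MIDI Star parser: it decodes a raw stream of channel messages with no delta times, stops at the first system status byte, and uses the hypothetical names ChannelMessage, DataBytesFor, and DecodeChannelStream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical decoded channel message.
struct ChannelMessage
{
    std::uint8_t type;     // upper nibble of the status byte, 0x8 to 0xE
    std::uint8_t channel;  // lower nibble of the status byte
    std::uint8_t param1;
    std::uint8_t param2;   // 0 when the event has a single data byte
};

// Program change (0xC) and channel pressure (0xD) carry one data byte;
// the other channel events carry two.
inline int DataBytesFor(std::uint8_t type)
{
    return (type == 0xC || type == 0xD) ? 1 : 2;
}

// Decode channel messages, honoring running status.
inline std::vector<ChannelMessage> DecodeChannelStream(const std::uint8_t *data,
                                                       std::size_t size)
{
    std::vector<ChannelMessage> out;
    std::uint8_t running = 0;            // 0 means no status byte seen yet
    std::size_t i = 0;
    while (i < size)
    {
        std::uint8_t status = running;
        if (data[i] & 0x80)              // high bit set: a new status byte
        {
            status = data[i++];
            if (status >= 0xF0)
                break;                   // system events are out of scope here
            running = status;
        }
        if (status == 0)
            break;                       // data byte before any status byte
        std::uint8_t type = static_cast<std::uint8_t>(status >> 4);
        std::size_t need = static_cast<std::size_t>(DataBytesFor(type));
        if (i + need > size)
            break;                       // truncated message
        ChannelMessage m;
        m.type = type;
        m.channel = static_cast<std::uint8_t>(status & 0x0F);
        m.param1 = data[i];
        m.param2 = (need == 2) ? data[i + 1] : static_cast<std::uint8_t>(0);
        i += need;
        if (m.type == 0x9 && m.param2 == 0)
            m.type = 0x8;                // note on with velocity 0 is a note off
        out.push_back(m);
    }
    return out;
}
```

For example, the nine bytes 90 3C 64 3E 64 3C 00 C1 05 decode to two note on's, one note off, and a program change on channel 1, even though only two status bytes appear in the stream.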
It is possible to pass a stream into my classes and parse the stream, but I prefer to have the calling routine load the MIDI file completely into memory and pass a pointer to that buffer. I find it easier and more efficient to use char pointers than using the stream function because I can simply examine a buffer and compare the pointer to where it should be relative to the parsing algorithm. Parsing the song header is simple and self-explanatory. Each track has its data copied into its own buffer (this is so that the file buffer can be freed after parsing). Parsing is done with three passes. Fewer passes are possible, but using three passes makes the code more readable. The first pass, Parse, does a syntax check to verify that the MIDI is valid and counts the number of notes, events, and tempo changes. The second pass, Parse2, copies the events into my data structures and copies the tempo changes into an array. The third pass, Parse3, calculates all the real time of the events relative to the start of the song. Parse and Parse2 use DecodeEvent and NextEvent to decode and iterate through the track events; they aren't terribly interesting because they are just big case statements. The interesting routine is DecodeEvent (all the previous routines can be found in MIDITrack.cpp):
void MIDITrack::DecodeEvent()
{
    UCHAR *p = this->mpEvent;
    this->mDeltaTime = DecodeVarLen(&p);
    if (*p < 128)
    {
        // Running status: there is no status byte, so reuse the previous one
        // and back up so that the parameters still sit at p+1.
        p--;
        this->mStatus = this->mRunningStatus;
    }
    else
    {
        this->mStatus = *p >> 4;
        this->mRunningStatus = this->mStatus;
        this->mChannel = *p & 0xf;
    }
    if (this->mStatus == 0xF)
    {
        if (this->mChannel == 0xF)
        {
            // Meta event: type byte, variable length size, then the data.
            this->mParam1 = *(p+1);
            UCHAR *p2 = p+2;
            this->mLength = DecodeVarLen(&p2);
            // p2 now points just past the size, which may span several bytes.
            this->mpData = p2;
            this->mpNextEvent = p2+this->mLength;
        }
        else
        {
            // System event: variable length size, then the data.
            UCHAR *p2 = p+1;
            this->mLength = DecodeVarLen(&p2);
            this->mpData = p2;
            this->mpNextEvent = p2+this->mLength;
        }
    }
    else if (this->mStatus == 0xC || this->mStatus == 0xD)
    {
        // Program change and channel pressure have one data byte.
        this->mParam1 = *(p+1);
        this->mpNextEvent = p+2;
    }
    else
    {
        this->mParam1 = *(p+1);
        this->mParam2 = *(p+2);
        // A note on with zero velocity is treated as a note off.
        if (this->mStatus == 9 && this->mParam2 == 0)
            this->mStatus = 8;
        this->mpNextEvent = p+3;
    }
    this->mDecoded = true;
}
During the second pass, events are copied into the derived classes of MIDIEvent: MIDINote, MIDIChanSingle, MIDIChanDouble, MIDISystem, and MIDIMeta. These are the minimum classes required to process events; you can derive more specialized classes if your MIDI requirements are different. Here is the base class found in MIDIEvents.h:
class MIDIEvent
{
private:
    double mRealTime;
    UCHAR mEventType;
public:
    MIDIEvent() {};
    virtual ~MIDIEvent() {};
    double Time() {return this->mRealTime;};
    void Time(double time) {this->mRealTime = time;};
    UCHAR EventType() {return this->mEventType;};
    void EventType(UCHAR e) {this->mEventType = e;};
    virtual void SendMessage(UCHAR *vlbuff) = 0;
    virtual void Trace() = 0;
    virtual UCHAR Channel() {return 255;};
    virtual bool IsNoteEv() {return false;};
};
As you can see from the base class, there are only two common attributes for all events; they come with the appropriate set and get functions. The IsNoteEv function should only be overridden if the derived class contains note on or note off event types. The Channel function should be overridden for channel event types, and should simply return the channel number. The Trace function needs to be overridden if you want to display useful debug information. Finally, the SendMessage function needs to be handled in the derived classes; channel event types should send a short MIDI message, and system event types should send a long MIDI message (SendShrtMsg and SendLongMsg in MIDIInterface). Obviously, for your derived functions, you want to add appropriate attributes with set and get functions.
Originally, I wrote the parser in one pass, and the reason for the rewrite was tempo changes. For some bizarre reason, a downloaded MIDI file that I was testing had a tempo change for each note rather than varying the delta times. There is also nothing in the specifications to force all the tempo changes to occur on the same track. Theoretically, all tracks should be processed in parallel, and if tempo changes are placed in the higher tracks, slight timing errors can occur when the lower tracks are processed before the tempo changes in higher tracks. Prior to the third pass executing, the cumulated delta times of the tempo changes are sorted and then the real times calculated. I haven't mentioned it up until now, but delta times are not real-time values; they must be converted to real-time, but it can be in one of two formats depending on a flag in the song header. It's not too interesting, so I will just supply the routine without explanation:
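double MIDISong::TempoTime(long absTime)
{
    if (this->mTimeDivision & 0x8000)
    {
        // SMPTE format: the high byte holds the negated frame rate
        // (-24, -25, -29, or -30) and the low byte holds the ticks per frame.
        double fps = (double)(-(signed char)((this->mTimeDivision >> 8) & 0xff));
        if (fps == 29.0)
            fps = 29.97;
        return (double)absTime/(fps*(double)(this->mTimeDivision&0xff));
    }
    // Ticks per quarter note format, with mTempo in beats per minute.
    return (double)absTime*60.0/(double)(this->mTempo*this->mTimeDivision);
}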
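Since the routine is given without explanation, here is a short worked sketch of the two time division formats. It assumes the standard SMF layout: if bit 15 is clear, the field is ticks per quarter note; if it is set, the high byte is the negated SMPTE frame rate in two's complement and the low byte is ticks per frame. SecondsPerTick and BpmFromMicrosPerQuarter are hypothetical names, and the second helper assumes the tempo meta event value is in microseconds per quarter note.

```cpp
#include <cassert>
#include <cstdint>

// Convert a tempo meta event value (microseconds per quarter note) to BPM.
inline double BpmFromMicrosPerQuarter(std::uint32_t micros)
{
    assert(micros > 0);
    return 60000000.0 / micros;
}

// Duration of one tick in seconds. 'bpm' is only used by the
// ticks-per-quarter-note format.
inline double SecondsPerTick(std::uint16_t division, double bpm)
{
    if (division & 0x8000)
    {
        int hi = division >> 8;          // 0xE8, 0xE7, 0xE3, or 0xE2
        int fpsCode = 256 - hi;          // undo two's complement: 24, 25, 29, 30
        double fps = (fpsCode == 29) ? 29.97 : fpsCode;
        int ticksPerFrame = division & 0xFF;
        assert(ticksPerFrame > 0);
        return 1.0 / (fps * ticksPerFrame);
    }
    assert(division > 0 && bpm > 0.0);
    // One quarter note lasts 60/bpm seconds and holds 'division' ticks.
    return 60.0 / (bpm * division);
}
```

At 120 BPM with 480 ticks per quarter note, one tick lasts 60 / (120 * 480) seconds, or about 1.04 ms. A division of 0xE728 (25 frames per second, 40 ticks per frame) gives exactly 1 ms per tick regardless of tempo.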
You may wonder why I chose to use a bubble sort for the tempo changes. The tempo changes should mostly be sorted; in fact, if they are all in one track, they will already be sorted, and on sorted input a bubble sort that stops when a pass makes no swaps is O(n) versus O(n log n) for the fast sorts. This routine is CalcTempoChanges. Finally, the third pass uses CalcRealTime to resolve the real time for all the events:
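double MIDISong::CalcRealTime(int abstime)
{
    int i = 0;
    CTempo *ct = this->mpTempos;
    // Check the index first so that ct is never read past the end of the array.
    while (i < this->mNumTempos && abstime >= ct->mAbsTime)
    {
        ct++;
        i++;
    }
    // Step back to the last tempo change at or before abstime.
    ct--;
    i--;
    this->Tempo(ct->mTempo);
    return ct->mRealTime + this->TempoTime(abstime-ct->mAbsTime);
}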
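To make the third pass more concrete, here is a self-contained sketch of a tempo map: the tempo changes are sorted with an early-exit bubble sort, each change is given a real time, and a lookup resolves any tick count to seconds. TempoChange, BuildTempoMap, and RealTimeAt are hypothetical stand-ins rather than the actual CTempo and MIDISong code, and the sketch assumes the ticks per quarter note format with a tempo change at tick 0.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for one tempo change.
struct TempoChange
{
    long absTicks;    // accumulated delta time of the change
    double bpm;       // tempo in effect from this point on
    double realTime;  // seconds from the start of the song, filled in below
};

// Sort the changes by tick and compute the real time of each one.
inline void BuildTempoMap(std::vector<TempoChange> &tempos, int ticksPerQuarter)
{
    assert(ticksPerQuarter > 0);
    // Bubble sort that stops as soon as a pass makes no swaps,
    // so already-sorted input costs a single pass.
    bool swapped = true;
    for (std::size_t pass = 0; swapped && pass + 1 < tempos.size(); ++pass)
    {
        swapped = false;
        for (std::size_t i = 0; i + 1 < tempos.size() - pass; ++i)
        {
            if (tempos[i].absTicks > tempos[i + 1].absTicks)
            {
                std::swap(tempos[i], tempos[i + 1]);
                swapped = true;
            }
        }
    }
    // Each segment runs at the tempo of the change that starts it.
    double t = 0.0;
    for (std::size_t i = 0; i < tempos.size(); ++i)
    {
        if (i > 0)
            t += (tempos[i].absTicks - tempos[i - 1].absTicks) * 60.0 /
                 (tempos[i - 1].bpm * ticksPerQuarter);
        tempos[i].realTime = t;
    }
}

// Resolve an absolute tick count to seconds using a built tempo map.
inline double RealTimeAt(const std::vector<TempoChange> &tempos,
                         int ticksPerQuarter, long absTicks)
{
    assert(!tempos.empty() && tempos.front().absTicks <= absTicks);
    std::size_t i = 0;
    while (i + 1 < tempos.size() && tempos[i + 1].absTicks <= absTicks)
        ++i;
    return tempos[i].realTime +
           (absTicks - tempos[i].absTicks) * 60.0 / (tempos[i].bpm * ticksPerQuarter);
}
```

With 480 ticks per quarter note, 120 BPM at tick 0, and 60 BPM at tick 960, tick 960 falls at 1.0 seconds and tick 1440 at 2.0 seconds.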
Using the Code - MIDI Messages
A physical MIDI interface is not required if you are just using the keyboard or a gamepad. As previously stated, the Microsoft GS Wavetable SW Synth will process any MIDI output. Depending on your hardware setup, you may be able to choose a different MIDI output device. If you have a MIDI instrument, then you probably already have a hardware MIDI interface for your computer. These used to be available on older sound cards, but newer low end cards usually don't have them anymore. An inexpensive option is to get a USB MIDI interface; mine is from M-Audio.
MIDIInterface is my class that interfaces to MIDI at the driver level. This class requires MFC; this could be good or bad depending on how you want to use it. It uses CComboBox to populate the list of MIDI devices, but it is easily modifiable to be an array or vector of strings. The following is the usage order for MIDI input (MIDI output is the same except there is no StartOut):
EnumerateIn - get a list of the MIDI devices
InitializeIn - initialize a specific MIDI device
StartIn - enable capture of messages
IsDeviceIn - test if the MIDI device is valid for input
GetChanMess - get an input message
StopIn - disable capture of messages
CloseIn - close the MIDI device
The messages are stored in a circular 512 byte buffer. It doesn't need to be this large, but it is better to play it safe. The reason is that the messages should be processed fairly quickly so that the buffer won't fill up. Windows generates the following Windows messages for MIDI messages: MIM_OPEN, MIM_CLOSE, MIM_ERROR, MIM_MOREDATA, MIM_LONGDATA, MIM_LONGERROR, and MIM_DATA. All messages are ignored except for MIM_DATA, which captures all channel event types. You will need to write your own handlers to capture system event types, and you will also need to create new structures to store this info. Capturing MIDI messages is not tricky; you just have to know that you cannot call most system routines (Windows) from within the MIDI message callback function, and you want the function to be fast so you don't miss any messages; see MidiInProc in MSDN for the list of system routines that are callable. Although the time (in milliseconds) is stored, I do not use it because I want to quantize the messages with the game clock. The callback stores the message and posts a message to the game message pump in game.cpp where it can be evaluated against the MIDI track and channel that is selected for play. Here are the relevant code snippets:
void MIDIInterface::Callback(HMIDIIN hmidiIn, UINT wMsg, DWORD ,
                             DWORD dwParam1, DWORD dwParam2)
{
    MIDIMessage md;
    if (hmidiIn == this->mhMidiIn)
    {
        switch (wMsg)
        {
        case MIM_OPEN:
        case MIM_CLOSE:
        case MIM_ERROR:
        case MIM_MOREDATA:
        case MIM_LONGDATA:
        case MIM_LONGERROR:
            break;
        case MIM_DATA:
            md.mTime = dwParam2;
            md.mStatus = (UCHAR)(dwParam1 & 0xFF);
            md.mParam1 = (UCHAR)((dwParam1>>8) & 0xFF);
            md.mParam2 = (UCHAR)((dwParam1>>16) & 0xFF);
            assert(MIDI_BUFFER_SIZE == 512);
            if ((((this->mTail+sizeof(MIDIMessage)) &
                (MIDI_BUFFER_SIZE-1)) != this->mHead) &&
                (md.mStatus & 0xF0) != 0xF0)
            {
                memcpy(this->mBuffer+this->mTail, &md, sizeof(MIDIMessage));
                this->mTail = (this->mTail+sizeof(MIDIMessage))&(MIDI_BUFFER_SIZE-1);
                PostMessage(this->mHWnd, MM_MIM_DATA, 0, 0);
            }
            break;
        default:
            break;
        }
    }
}
LRESULT Game::MsgProc(UINT msg, WPARAM wParam, LPARAM lParam)
{
    ...
    case MM_MIM_DATA:
        if (midi->IsDeviceIn() && !this->mSongOver && !this->mReplaySong)
        {
            do
            {
                MIDIMessage *message = midi->GetChanMess();
                if (message != NULL)
                {
                    UCHAR evStat = message->mStatus & 0xF0;
                    if (evStat == 0x90 && message->mParam2 == 0)
                        evStat = 0x80;
                    if (evStat == 0x90 || evStat == 0x80)
                    {
                        evStat = evStat + this->mSong->PlayChannel();
                        if (this->mGameInterface == 1)
                        {
                            UCHAR ind = 0xFF;
                            if (this->mPlayUnmarked ||
                                ((ind = map->FindMIDI(message->mParam1)) !=
                                0xFF && map->Use(ind)))
                            {
                                UCHAR note;
                                if (ind != 0xFF)
                                    note = map->GetNote(ind);
                                else
                                    note = map->CodeMIDI(message->mParam1);
                                if (note != 0xFF)
                                {
                                    UCHAR midiVel;
                                    if (this->mPlayMIDIVel)
                                        midiVel = message->mParam2;
                                    else
                                        midiVel = this->FindMIDIVel(note);
                                    this->RecordNote(evStat, note, midiVel);
                                }
                            }
                        }
                    }
                }
            } while (midi->NextChanMess());
        }
        return 0;
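The circular buffer behind the callback can be looked at on its own. Below is a minimal sketch of a power-of-two ring buffer with one writer and one reader. RawMessage and MessageRing are hypothetical names, the sketch stores whole messages rather than raw bytes, and it leaves out the synchronization (for example atomic indices) that a real callback running on another thread would likely need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical captured message, similar in spirit to MIDIMessage.
struct RawMessage
{
    std::uint32_t time;
    std::uint8_t status, param1, param2;
};

// Fixed-size ring buffer. One slot is always left empty so that
// head == tail can only mean "empty".
template <std::size_t N>
class MessageRing
{
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
public:
    // Called by the producer (the MIDI callback). Returns false when full,
    // in which case the message is dropped.
    bool Push(const RawMessage &m)
    {
        std::size_t next = (mTail + 1) & (N - 1);   // wrap with a mask
        if (next == mHead)
            return false;
        mSlots[mTail] = m;
        mTail = next;
        return true;
    }
    // Called by the consumer (the message pump). Returns false when empty.
    bool Pop(RawMessage &m)
    {
        if (mHead == mTail)
            return false;
        m = mSlots[mHead];
        mHead = (mHead + 1) & (N - 1);
        return true;
    }
private:
    RawMessage mSlots[N];
    std::size_t mHead = 0;
    std::size_t mTail = 0;
};
```

Requiring a power-of-two size is what allows the wrap-around to be a single AND with N-1 instead of a modulo, which is the same trick the assert on MIDI_BUFFER_SIZE protects in the callback.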
Sending MIDI messages is much simpler than receiving them. Just call the initialization routines, and use the following routines to output:
void MIDIInterface::SendShrtMsg(UCHAR status, UCHAR param1, UCHAR param2)
{
    if (this->mhMidiOut != NULL)
        midiOutShortMsg(this->mhMidiOut,
            (((UINT)param2)<<16)+(((UINT)param1)<<8)+status);
}
void MIDIInterface::SendLongMsg(void *buffer, int len)
{
    if (this->mhMidiOut != NULL)
    {
        MIDIHDR mh;
        mh.lpData = (LPSTR)buffer;
        mh.dwBufferLength = len;
        mh.dwFlags = 0;
        UINT err = midiOutPrepareHeader(this->mhMidiOut,
            &mh, sizeof(MIDIHDR));
        if (!err)
        {
            err = midiOutLongMsg(this->mhMidiOut, &mh, sizeof(MIDIHDR));
            if (err)
            {
                TCHAR errMsg[120];
                midiOutGetErrorText(err, errMsg, 120);
            }
            // Wait until the driver has finished with the buffer.
            while (MIDIERR_STILLPLAYING ==
                midiOutUnprepareHeader(this->mhMidiOut, &mh, sizeof(MIDIHDR)))
                ;
        }
    }
}
Well, I lied a little. Sending long messages is not that simple. You have to know what to fill the long messages with. This could be simple fixed length data, or it could be variable length data such as wavetable information. Fortunately for my application, this is all provided in the MIDI file, and I just send what is there without having to know what I am sending. Again, you will have to consult MMA's website for the specific system messages that you want to process.
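For short messages, the only thing to get right is the packing: the status byte goes in the lowest byte of the 32-bit value, followed by the first and second parameters. The sketch below shows that layout without any Windows dependency. MakeStatus, PackShortMessage, ShortMessage, and UnpackShortMessage are hypothetical names used for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Combine an event type (0x8 to 0xE) and a channel (0 to 15) into a status byte.
inline std::uint8_t MakeStatus(std::uint8_t type, std::uint8_t channel)
{
    assert(type >= 0x8 && type <= 0xE && channel < 16);
    return static_cast<std::uint8_t>((type << 4) | channel);
}

// Pack a channel message the way midiOutShortMsg expects it:
// status in bits 0-7, param1 in bits 8-15, param2 in bits 16-23.
inline std::uint32_t PackShortMessage(std::uint8_t status,
                                      std::uint8_t param1,
                                      std::uint8_t param2)
{
    return (static_cast<std::uint32_t>(param2) << 16) |
           (static_cast<std::uint32_t>(param1) << 8) |
            static_cast<std::uint32_t>(status);
}

// Hypothetical unpacked form, matching what the input callback extracts.
struct ShortMessage
{
    std::uint8_t status, param1, param2;
};

inline ShortMessage UnpackShortMessage(std::uint32_t packed)
{
    ShortMessage m;
    m.status = static_cast<std::uint8_t>(packed & 0xFF);
    m.param1 = static_cast<std::uint8_t>((packed >> 8) & 0xFF);
    m.param2 = static_cast<std::uint8_t>((packed >> 16) & 0xFF);
    return m;
}
```

A note on for middle C (note 60) at velocity 100 on the first channel packs to 0x00643C90.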
Problems
Although MIDI Star is functional, there are some minor technical problems that I want to try to resolve.
The first problem is timer interrupts. Although it works, it is a bit of a hack. I am using the Windows SetTimer routine to generate interrupts, and I am asking for a 200 Hz interrupt rate. However, I am only getting a 58 Hz interrupt rate, so I have to multiply the elapsed time for my interrupt by a factor of 3.44 (200/58). When I lower the interrupt rate, the multiplier changes because the effective interrupt rate is no longer 58 Hz. If anyone has a solution to this, I would love to see it. No doubt it will be something simple that I have overlooked. This is the code from game.cpp:
this->mpTimer = SetTimer(d3di->HWnd(), 1, 1000/MUSIC_RATE, 0);
...
case WM_TIMER:
{
...
if (!this->mSongOver)
{
double oldTime = this->mSong->PlaySongTime();
this->mSong->PlayUpdate(RATE_MULTIPLIER/MUSIC_RATE);
if (this->mReplaySong)
this->InstantReplay(oldTime);
}
...
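One likely explanation is that SetTimer cannot deliver a 5 ms period: its documentation states that intervals below USER_TIMER_MINIMUM (10 ms) are raised to that minimum, and WM_TIMER is a low-priority message, so the real rate depends on the system. A possible way around the multiplier is to measure how much time actually passed between ticks instead of assuming a fixed rate. The sketch below uses std::chrono::steady_clock for this; ElapsedTimer is a hypothetical class, and it assumes the update routine accepts an elapsed time in seconds.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper that reports the real time between successive ticks.
class ElapsedTimer
{
public:
    using Clock = std::chrono::steady_clock;

    explicit ElapsedTimer(Clock::time_point start = Clock::now())
        : mLast(start)
    {
    }

    // Returns the seconds elapsed since the previous call (or construction).
    // 'now' is a parameter only so that the class can be tested without waiting.
    double Tick(Clock::time_point now = Clock::now())
    {
        assert(now >= mLast);
        std::chrono::duration<double> dt = now - mLast;
        mLast = now;
        return dt.count();
    }

private:
    Clock::time_point mLast;
};
```

In the WM_TIMER handler, the idea would be to pass the value returned by Tick() to the update call in place of RATE_MULTIPLIER/MUSIC_RATE, so the song advances by the measured time whatever rate Windows actually delivers. This has not been tested against MIDI Star itself.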
The second problem is with the joystick/gamepad input. I didn't have time to fully digest the code snippet from MSDN, and as a result, I am unable to refresh the joystick/gamepad list of devices. Therefore, a gamepad is only recognized if a gamepad is plugged in during the application startup.
Points of Interest
Matching up note on's with note off's is tricky, and you can't assume that the MIDI file is error free. You may get extraneous note on's or off's, so a perfect one-to-one matching is not always possible. It is also possible to get two note on's for the same note number before getting a note off; this happens with my drum set, which simply sends the note off message a fixed amount of time after the note on message. You have to decide how you want your code to handle these situations. For MIDI Star, overlapping times for a particular note number are neither permitted nor desirable, so I just force them not to overlap.
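One possible policy for these cases is sketched below; it is not the MIDI Star code. A second note on for a note that is already sounding ends the first note at that moment, a note off with nothing open is ignored, and notes still open at the end are dropped. NoteEvent, NoteSpan, and PairNotes are hypothetical names, and the input is assumed to be sorted by time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical input: one note on or note off at a given time.
struct NoteEvent
{
    double time;
    unsigned char note;   // MIDI note number, 0 to 127
    bool on;              // true for note on, false for note off
};

// Hypothetical output: one sounding note with no overlap per note number.
struct NoteSpan
{
    double start;
    double end;
    unsigned char note;
};

// Pair note on's with note off's, tolerating stray and doubled events.
inline std::vector<NoteSpan> PairNotes(const std::vector<NoteEvent> &events)
{
    double openStart[128] = {0.0};
    bool isOpen[128] = {false};
    std::vector<NoteSpan> spans;
    for (const NoteEvent &e : events)
    {
        if (e.note > 127)
            continue;                         // not a valid note number
        if (e.on)
        {
            // A second note on closes the first span here to prevent overlap.
            if (isOpen[e.note])
                spans.push_back({openStart[e.note], e.time, e.note});
            isOpen[e.note] = true;
            openStart[e.note] = e.time;
        }
        else if (isOpen[e.note])
        {
            spans.push_back({openStart[e.note], e.time, e.note});
            isOpen[e.note] = false;
        }
        // A note off with no matching note on is silently ignored.
    }
    return spans;
}
```

Other policies are just as defensible, such as closing leftover notes at the end of the track; the point is only that the choice has to be made explicitly.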
This project required about 100 hours to prototype. There was about another 50 hours spent on a partial rewrite and final debugging. Commenting and cleaning up the code to conform to this article and writing this article required about another 50 hours.
You may find these additional tools useful when working with MIDI devices or files: MIDI-OX and XVI32 (a hex file editor).
History
I was a senior software engineer for Electronic Arts Canada for 20 years. Some of the projects that I have developed are Evolution, Test Drive, NHL '94, NBA Live series and Need For Speed. The platforms that I have developed on are Apple II, C64, Atari 8 bit, PC, NES, SNES, Genesis and XBOX. I have now obtained my teaching certificate for grade school.