Generative techniques in AWS DeepComposer
AWS DeepComposer currently offers two music studio experiences. In both experiences, you can explore the generative models that are available. In this topic, you can learn about the different generative models and how to use them together to create unique compositions.
To start using AWS DeepComposer, you need a trained model and an input melody. To create a composition, AWS DeepComposer performs inference using the trained model and your input track. During the inference process, the trained model generates a prediction. You can also modify the available inference hyperparameters and model options in the AWS DeepComposer Music studio experiences to fine-tune your musical creation process.
If you are new to machine learning, generative AI, or the different AWS DeepComposer Music studio experiences, see the Getting started with AWS DeepComposer topic, or the tutorial about the basics of generative AI. For a deeper introduction to the different generative AI techniques supported by AWS DeepComposer, see the Learning capsules.
Important - Browser requirements
AWS DeepComposer fully supports the Chrome browser. Other browsers offer limited support for the AWS DeepComposer console and hardware. For more information about browser compatibility, see Browser support for AWS DeepComposer.
Using the AR-CNN technique in the AWS DeepComposer Music studio
The autoregressive convolutional neural network (AR-CNN) technique is supported in both AWS DeepComposer Music studio experiences, classic and remixed.
The AR-CNN generative technique uses a U-Net architecture originally developed for image generation tasks. In AWS DeepComposer, the AR-CNN learns to compose music by first detecting notes that sound missing or out of place during training, and then replacing those notes with notes that are likely to appear in the dataset it was trained on. The AR-CNN in AWS DeepComposer was trained on a dataset of chorales by Johann Sebastian Bach. To train this model, the audio inputs were first converted to piano roll images. In each piano roll image, the horizontal axis represents time and the vertical axis represents pitch.
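To make the piano-roll representation concrete, the following minimal Python sketch builds an image-like pitch-by-time matrix from a list of notes. The note format, time resolution, and pitch range here are illustrative assumptions, not the exact representation that AWS DeepComposer uses.

```python
import numpy as np

# Assumed, illustrative settings -- not the resolution AWS DeepComposer uses.
TIME_STEPS_PER_SECOND = 4      # e.g. 16th notes at 60 BPM
NUM_PITCHES = 128              # standard MIDI pitch range

def to_piano_roll(notes, total_seconds):
    """Convert (pitch, start, duration) tuples into a 2D piano-roll array.

    Rows are pitches (vertical axis), columns are time steps (horizontal axis),
    matching the image-like representation described above.
    """
    num_steps = int(total_seconds * TIME_STEPS_PER_SECOND)
    roll = np.zeros((NUM_PITCHES, num_steps), dtype=np.uint8)
    for pitch, start, duration in notes:
        begin = int(start * TIME_STEPS_PER_SECOND)
        end = int((start + duration) * TIME_STEPS_PER_SECOND)
        roll[pitch, begin:end] = 1   # 1 = note is sounding at this time step
    return roll

# Example: a C major triad held for one second starting at t=0.
melody = [(60, 0.0, 1.0), (64, 0.0, 1.0), (67, 0.0, 1.0)]
print(to_piano_roll(melody, total_seconds=2.0).shape)  # (128, 8)
```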
In the classic music studio experience, you can use a sample melody, record a melody, or import a melody. When you select your input melody in the music studio and choose Enhance input melody, the AR-CNN technique modifies the notes in your input melody.
You can use the Advanced parameters to adjust how much your input track is modified during inference. If you are unhappy with some of the decisions that the model made during inference, you can help the model by editing notes in the melody. Use the Edit melody tool to add or remove notes, or to change the pitch or the length of notes in the track that was generated. You can then perform inference again on the edited track.
Edit melody changes are not saved until inference is performed again. To save these changes prior to performing inference again, choose Download melody.
To edit your input melody
1. Open the AWS DeepComposer console.
2. In the navigation pane, choose Music studio.
3. To open the Input melody section, choose the right arrow (▶).
4. Choose Edit melody.
5. Under Source of input melody, choose Sample melody, Custom recorded melody, or Imported track.
6. On the Edit melody page, you can edit your track in the following ways:
   - Choose (double-click) a cell to add or remove a note.
   - Drag a cell up or down to change the pitch of a note.
   - Drag the edge of a cell left or right to change the length of a note.
7. To listen to your changes, choose Play (▶).
8. When you have finished, choose Apply changes.
In the remixed music studio experience, you can use a sample track, record a custom track, or import a track. You can choose the AR-CNN technique on the ML technique page.
The AR-CNN technique modifies the notes in your input melody. After you create your first new melody, you can modify the AR-CNN parameters that are available for this technique.
This model adds and removes notes in your input track. If you are unhappy with some of the decisions that the model made during inference, you can help the model by editing the notes in the melody. Use the Edit melody tool to add or remove notes, or to change the pitch or the length of notes in the track that was generated. You can then perform inference on the edited track.
On the Inference output page, you can choose Enhance again to perform inference multiple times. You can also access the Edit melody tool from this page.
Edit melody changes are not saved until inference is performed again. To save these changes prior to performing inference again, choose Download melody.
To edit your input melody
1. Choose Edit melody.
2. On the Edit melody page, you can edit your track in the following ways:
   - Choose (double-click) a cell to add or remove a note.
   - Drag a cell up or down to change the pitch of a note.
   - Drag the edge of a cell left or right to change the length of a note.
3. To listen to your changes, choose Play (▶).
4. When you have finished, choose Apply changes.
Model parameter options for the AR-CNN technique
The model parameters for the AR-CNN technique are the same in both AWS DeepComposer Music studio experiences.
The AR-CNN technique uses the AutoregressiveCNN Bach sample model. To help create unique musical tracks, you can modify the different parameters. These parameters, also known as inference hyperparameters, control how much the model changes your melody.
- Sampling iterations
  Controls the number of times your input melody is passed through the model. Increasing the number of iterations results in more notes being added to and removed from the melody.
- Maximum input notes to remove
  Controls the percentage of the input melody that can be removed during inference. By increasing this parameter, you allow the model to use less of the input melody as a reference during inference. After performing inference, you can use the Edit melody tool to further modify your melody.
- Maximum number of notes to add
  Controls the number of notes that can be added to the input melody. By increasing this number, you might introduce notes that sound out of place into your melody. It's also a creative way to experiment with your chosen melody. After performing inference, you can use the Edit melody tool to further modify your melody.
- Creative risk
  Controls how much the model can deviate from the music that it was trained on. More technically, when you change this value, you change the shape of the output probability distribution. If you set this value too low, the model chooses only high-probability notes. If you set this value too high, the model is more likely to choose lower-probability notes.
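The following conceptual sketch shows how these four parameters might interact during an iterative inference loop. The model interface (predict_note_probabilities) and the treatment of Creative risk as a softmax temperature are assumptions made for illustration; this is not the actual AWS DeepComposer implementation.

```python
import numpy as np

def enhance_melody(piano_roll, model, sampling_iterations=100,
                   max_notes_to_remove=0.3, max_notes_to_add=100,
                   creative_risk=1.0):
    """Conceptual AR-CNN-style inference loop (not the real implementation).

    `model.predict_note_probabilities` is a hypothetical helper that returns
    a probability for every (pitch, time step) cell of the piano roll.
    """
    roll = piano_roll.copy()
    removal_budget = int(roll.sum() * max_notes_to_remove)  # % of input notes
    added = removed = 0
    for _ in range(sampling_iterations):              # Sampling iterations
        probs = model.predict_note_probabilities(roll)
        # In this sketch, Creative risk behaves like a softmax temperature:
        # higher values flatten the distribution, so lower-probability cells
        # are chosen more often.
        logits = np.log(probs + 1e-9) / creative_risk
        probs = np.exp(logits) / np.exp(logits).sum()
        flat_index = np.random.choice(probs.size, p=probs.ravel())
        pitch, step = np.unravel_index(flat_index, probs.shape)
        if roll[pitch, step] and removed < removal_budget:
            roll[pitch, step] = 0                     # remove an input note
            removed += 1
        elif not roll[pitch, step] and added < max_notes_to_add:
            roll[pitch, step] = 1                     # add a new note
            added += 1
    return roll
```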
How the AR-CNN technique works with other generative techniques in the music studio
In both AWS DeepComposer Music studio experiences, you can use the AR-CNN technique in combination with the GANs technique or the Transformers technique. When combining techniques, you must start with the AR-CNN technique.
With the GANs technique, you can create accompaniment tracks, but when you return to your edited melody, those accompaniment tracks will no longer be available.
After you switch to the Transformers technique, you cannot return to the AR-CNN technique to further edit your melody.
Note
Each time you perform inference with the AR-CNN technique, your new composition is saved automatically. If you want to save a melody that you have modified using the Edit melody tool prior to performing inference another time, choose Download melody.
Using the GANs technique in the AWS DeepComposer Music studio
The generative adversarial networks (GANs) technique is supported in both AWS DeepComposer Music studio experiences, remixed and classic.
GANs are neural networks that consist of a generator and a discriminator. In AWS DeepComposer, the generator learns to compose music that is as realistic as possible, guided by feedback from the discriminator. The discriminator tries to classify the generator's output as unrealistic while treating the input training samples as the ground truth.
In the music studio, two different GAN architectures are available, MuseGAN and U-Net. Both architectures use a convolutional neural network (CNN) because the first step in training these models is to convert the input audio into an image-based representation of music called a piano roll.
- MuseGAN
  The MuseGAN architecture was built specifically to generate music. AWS DeepComposer comes with five genre-based sample models that use the MuseGAN architecture. In AWS DeepComposer, you can also train a custom MuseGAN model and use it in either music studio experience.
- U-Net
  The U-Net architecture was originally developed for image-generation tasks and has been adapted for music generation. The name of this architecture stems from its unique U shape, which allows the CNN to pass information from the layers on the left side (the encoder) to the layers on the right side (the decoder) without passing through the entire neural network. AWS DeepComposer does not include a sample model that uses the U-Net architecture. Instead, you can train a custom U-Net model and use it in either music studio experience.
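The adversarial training idea behind both architectures can be summarized in a short sketch. The following PyTorch code uses tiny fully connected networks on flattened piano rolls purely for illustration; the real MuseGAN and U-Net models are convolutional and considerably larger.

```python
import torch
import torch.nn as nn

PIANO_ROLL_SIZE = 128 * 32   # pitches x time steps (illustrative)
NOISE_SIZE = 100

# Generator: maps random noise to a fake piano roll.
generator = nn.Sequential(
    nn.Linear(NOISE_SIZE, 256), nn.ReLU(),
    nn.Linear(256, PIANO_ROLL_SIZE), nn.Sigmoid())

# Discriminator: outputs the probability that a piano roll is real.
discriminator = nn.Sequential(
    nn.Linear(PIANO_ROLL_SIZE, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid())

loss_fn = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_rolls):
    batch = real_rolls.size(0)
    real_labels = torch.ones(batch, 1)
    fake_labels = torch.zeros(batch, 1)

    # Discriminator: real samples are ground truth, generated samples are fake.
    fake_rolls = generator(torch.randn(batch, NOISE_SIZE))
    d_loss = (loss_fn(discriminator(real_rolls), real_labels) +
              loss_fn(discriminator(fake_rolls.detach()), fake_labels))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator: try to make the discriminator label generated rolls as real.
    g_loss = loss_fn(discriminator(fake_rolls), real_labels)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```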
To learn more about training a custom model using AWS DeepComposer, refer to the topic on training a custom model.
Performing inference with the GANs technique in the AWS DeepComposer Music studio experiences
In either AWS DeepComposer Music studio experience, you can use the GANs technique to perform inference, which creates new accompaniment tracks for your composition.
You can use a sample melody, record a custom melody, or import a melody. For more information, see the topic on creating compositions with the AWS DeepComposer Music studio.
The GANs technique creates four accompaniment tracks based on your input melody. Each accompaniment track initially comes from a broad class of musical instruments. You can modify the instrument type and the specific instrument for each accompaniment track except for drums.
To modify your accompaniment tracks, you can use the same steps in either AWS DeepComposer Music studio experience. The only difference is where this feature is located. In the classic music studio experience, you can access your newly generated tracks on the music studio landing page. In the remixed music studio experience, you can access the accompaniments on the Inference output page.
To change an instrument type after generating a composition
1. To open an Instrument type, such as String ensemble 1, choose the right arrow (▶).
2. Choose an instrument type.
3. Choose the down arrow (▼) to open the Instrument menu.
4. Choose an instrument.
Model parameter options for the GANs technique
When using the GANs generative technique, you can select from either Sample models or Custom models. In either music studio experience, you can choose from five different genre-based Sample models, which were trained using the MuseGAN architecture.
If you trained a custom U-Net or MuseGAN model, you can find it and the Sample models under Model.
In the remixed music studio, you can find the Model option on both the ML technique and the Inference output pages.
How the GANs technique works with other generative techniques in the music studio
In either AWS DeepComposer Music studio experience, you can use the GANs technique in collaboration with the AR-CNN technique. The GANs technique is not compatible with the Transformers technique.
To use the GANs technique with the AR-CNN technique, you must first use the AR-CNN technique to enhance your input melody. When ready, you can switch to the GANs technique and create accompaniment tracks.
Note
In the classic music studio experience, you can return to the AR-CNN technique after creating accompaniment tracks with the GANs technique. If you do so, the tracks that were generated with the GANs technique will be automatically saved as a new composition. When you modify your input melody using the AR-CNN technique, you must generate new accompaniment tracks, because the previously created accompaniment tracks cannot be accessed.
Using the Transformers technique in the AWS DeepComposer Music studio
The Transformers generative technique is used to solve sequence modeling problems. In sequence modeling, the model takes previous outputs into account when generating the next output. To do this, the Transformers technique uses the concept of attention. This concept allows the model, when given a specific input sequence, to better understand which other parts of the sequence are important.
Unlike the GANs and AR-CNN techniques, the Transformers technique doesn't treat music generation as an image-generation problem. Instead, it treats music generation as a text-generation problem. To solve this problem, the model creates tokens that represent the musical inputs. The tokens are used when predicting which notes should come next. In AWS DeepComposer, the Transformers technique uses the style, pattern, or musical motifs found in the input melody when generating the extended output track.
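The following sketch illustrates the text-generation framing: musical events are mapped to tokens and the model extends the sequence one token at a time. The vocabulary and the predict_next_token_probs method are hypothetical placeholders, not the actual AWS DeepComposer tokenization or API.

```python
import numpy as np

# Illustrative event-style vocabulary: each musical event becomes one token.
VOCAB = (["NOTE_ON_%d" % p for p in range(128)] +
         ["NOTE_OFF_%d" % p for p in range(128)] +
         ["TIME_SHIFT_%dms" % ms for ms in range(10, 1010, 10)])
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def extend_melody(model, input_tokens, num_new_tokens=200):
    """Autoregressively extend a tokenized melody.

    Each new token is predicted from everything generated so far, which is
    where the attention mechanism decides which earlier tokens matter most.
    `model.predict_next_token_probs` is a hypothetical helper that returns a
    probability for every token in the vocabulary.
    """
    sequence = list(input_tokens)
    for _ in range(num_new_tokens):
        probs = model.predict_next_token_probs(sequence)
        next_id = int(np.random.choice(len(probs), p=probs))
        sequence.append(next_id)
    return sequence
```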
Edit melody changes are not saved until inference is performed again. To save these changes prior to performing inference again, choose Download melody.
Performing inference with the Transformers technique
To get started with the Transformers technique, we recommend using one of the Sample melodies, specifically one marked Recommended for the Transformers technique. These options represent the complex classical melodies that work best with the model. After selecting your input melody, you can create your first composition by choosing Extend input melody. The model extends your input melody by up to 30 seconds. As with the AR-CNN technique, after performing inference you can use the Edit melody tool to add or remove notes, or to change the pitch or the length of notes in the extension that was generated.
Unlike the AR-CNN technique, the Transformers technique doesn't modify the changes that you make to the input melody when you perform inference again. However, changes that you make to the input melody do influence the extension that the Transformers technique generates.
In the classic music studio experience, you can find the Edit melody tool on the music studio landing page.
To edit your input melody
1. Open the AWS DeepComposer console.
2. In the navigation pane, choose Music studio.
3. To open the Input melody section, choose the right arrow (▶).
4. Under Source of input melody, choose Sample melody, Custom recorded melody, or Imported track.
5. Choose Edit melody.
6. On the Edit melody page, you can edit your track in the following ways:
   - Choose (double-click) a cell to add or remove notes.
   - Drag a cell up or down to change the pitch of a note.
   - Drag the edge of a cell left or right to change the length of a note.
7. To listen to your changes, choose Play (▶).
8. When you have finished, choose Apply changes.
Edit melody changes are not saved until inference is performed again. To save these changes prior to performing inference again, choose Download melody.
In the remixed music studio experience, you can find the Edit melody tool on the Input track, Inference output, and Next steps pages.
To edit your melody in the remixed music studio experience
1. Choose Edit melody.
2. On the Edit melody page, you can edit your track in the following ways:
   - Choose (double-click) a cell to add or remove notes.
   - Drag a cell up or down to change the pitch of a note.
   - Drag the edge of a cell left or right to change the length of a note.
3. To listen to your changes, choose Play (▶).
4. When you have finished, choose Apply changes.
Edit melody changes are not saved until inference is performed again. To save these changes prior to performing inference again, choose Download melody.
Model parameter options for the Transformers technique
To create unique-sounding musical creations, you can modify the available inference hyperparameters. In the classic music studio experience, these are called Advanced parameters; in the remixed music studio experience, they are called Transformers parameters. These parameters control how your input track is modified and can be broadly categorized into three groups.
Group 1
Modifying these parameters directly affects how inference is performed on your input melody. As mentioned previously, during model training, the musical inputs are converted into tokens. The full set of available tokens represents the total musical knowledge learned during training, and each token also has an associated probability.
- Sampling technique
  The Transformers technique in AWS DeepComposer supports three different sampling techniques. The sampling technique determines how the new melody notes are chosen.
  - TopK: The list of available notes is limited based on the value selected for Sampling threshold. The next note is chosen from that shortened list.
  - Nucleus: The list of available notes is limited based on the value selected for Sampling threshold. The value set for the Sampling threshold represents the maximum allowable cumulative sum of the tokens' individual probabilities when ranked from greatest to least. The next note is chosen from that new list of available tokens.
  - Random: Unlike the previous two techniques, when Random is selected, the list of available tokens isn't modified. The model can choose from any available note at any point while inference is being performed.
- Sampling threshold
  Sets the number of available notes that the model can choose from during inference. How the value is interpreted depends on the Sampling technique selected. When the Random technique is selected, this parameter is not available.
- Creative risk
  Increasing this value allows the model to deviate from the music it was trained on, and the generated music sounds more experimental.
TopK: This sampling technique uses the full set of tokens learned during training to predict an upcoming note. When a new note needs to be predicted, the tokens are ranked from greatest to least based on their probability at that moment. When you modify the Sampling threshold, you change the number of tokens from which the model can choose.
For example, the model that AWS DeepComposer uses, TransformerXLClassical, learned 310 tokens during training. Setting the Sampling threshold to 0.80 means that the model can randomly choose from the top 248 tokens (310 tokens x 0.80) during inference. Setting the Sampling threshold to values closer to the maximum input value (0.99) means that the model has a greater chance of choosing lower-probability tokens. Setting values close to the minimum input value (0.1) restricts the model to higher-probability tokens.
Nucleus: This sampling technique uses the probabilities associated with the tokens to predict the upcoming note. When a new note needs to be predicted, all 310 tokens are ranked from greatest to least by their probabilities. When you modify the Sampling threshold, you set the threshold equal to the maximum allowable cumulative sum of the ranked probabilities.
For example, imagine that at the top of your descending list of token probabilities are the following five probabilities: [0.4, 0.3, 0.2, 0.05, 0.05]. If you take the cumulative sum of these probabilities, you end up with this list: [0.4, 0.7, 0.9, 0.95, 1.0]. In this case, each probability represents a note (which is represented by a token). Setting the Sampling threshold to 0.96 means that, at that moment, the model can pick a note randomly from the remaining list of partially summed probabilities, [0.4, 0.7, 0.9, 0.95], which represents the original list of ranked probabilities, [0.4, 0.3, 0.2, 0.05]. So, in the moment, the model can pick from two higher-probability notes, 0.4 and 0.3, a slightly lower-probability note, 0.2, and a low-probability note, 0.05.
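The following Python sketch shows how the three sampling techniques and the Creative risk value could be applied to a vector of token probabilities. Treating Creative risk as a softmax temperature, and the exact cutoff rules, are assumptions made for illustration rather than the exact AWS DeepComposer implementation.

```python
import numpy as np

def sample_next_token(probs, technique="topk", threshold=0.8, creative_risk=1.0):
    """Pick the next token from a probability vector (illustrative only)."""
    probs = np.asarray(probs, dtype=float)

    # In this sketch, Creative risk is applied as a softmax temperature:
    # higher values flatten the distribution so lower-probability tokens
    # are picked more often.
    logits = np.log(probs + 1e-9) / creative_risk
    probs = np.exp(logits) / np.exp(logits).sum()

    order = np.argsort(probs)[::-1]                # rank greatest to least
    if technique == "topk":
        # Keep the top fraction of the ranked tokens, e.g. 0.80 * 310 = 248.
        keep = order[: max(1, int(len(probs) * threshold))]
    elif technique == "nucleus":
        # Keep the longest prefix whose cumulative probability stays at or
        # below the threshold.
        cumulative = np.cumsum(probs[order])
        keep = order[: max(1, int(np.sum(cumulative <= threshold)))]
    else:                                          # "random": no restriction
        keep = order

    kept_probs = probs[keep] / probs[keep].sum()   # renormalize, then sample
    return int(np.random.choice(keep, p=kept_probs))

# The nucleus example above: cumulative sums [0.4, 0.7, 0.9, 0.95, 1.0] with a
# threshold of 0.96 leave the first four tokens available.
print(sample_next_token([0.4, 0.3, 0.2, 0.05, 0.05],
                        technique="nucleus", threshold=0.96))
```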
Group 2
These parameters tell the model how much new track it should attempt to generate and what portion of the input track should be used during inference.
- Input duration
  Selects the portion of your track, counting in seconds from the end, that is used for inference.
- Track extension duration
  Selects the amount of time, in seconds, of new music that the model attempts to generate.
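As an illustration, the following sketch shows how these two values could map onto a list of notes, assuming (pitch, start time, duration) tuples with times in seconds; the parameter handling inside AWS DeepComposer itself may differ.

```python
def select_inference_window(notes, track_end, input_duration, extension_duration):
    """Illustrate the Group 2 parameters on a (pitch, start, duration) note list."""
    # Input duration: use only the notes that start within the last N seconds.
    window_start = track_end - input_duration
    context = [note for note in notes if note[1] >= window_start]
    # Track extension duration: the model is asked to generate this many
    # seconds of new material after the end of the existing track.
    generation_window = (track_end, track_end + extension_duration)
    return context, generation_window
```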
Group 3
During inference, the Transformers model can create musical artifacts. Both of the following parameters can help remove either long periods of silence or moments when a note is held for an unexpectedly long period of time.
- Maximum rest time
  Silence is compressed when this value is exceeded.
- Maximum note length
  Held notes are compressed when this value is exceeded.
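A possible cleanup step along these lines is sketched below, again assuming (pitch, start time, duration) note tuples; the actual post-processing in AWS DeepComposer is not documented here.

```python
def compress_artifacts(notes, max_rest_time, max_note_length):
    """Shorten long silences and overly long held notes (illustrative only).

    `notes` is a list of (pitch, start, duration) tuples with times in seconds.
    """
    cleaned, shift, prev_end = [], 0.0, None
    for pitch, start, duration in sorted(notes, key=lambda n: n[1]):
        start -= shift                                   # apply earlier shifts
        if prev_end is not None and start - prev_end > max_rest_time:
            extra = (start - prev_end) - max_rest_time   # silence beyond the cap
            shift += extra                               # pull later notes forward
            start -= extra
        duration = min(duration, max_note_length)        # cap held notes
        cleaned.append((pitch, start, duration))
        prev_end = start + duration if prev_end is None else max(prev_end, start + duration)
    return cleaned
```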
How the Transformers technique works with other generative techniques in the music studio
Use the Transformers technique as either the last step or the only step in your music creation process. To use the AR-CNN technique with the Transformers technique, you must start with the AR-CNN technique. Because the Transformers technique extends your input melody, the output is not compatible with the GANs technique.