AI Music Generation: MusicGen

Researchers at Meta AI recently released a paper, “Simple and Controllable Music Generation”, along with its model, MusicGen. They describe it as a model that “is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models”. In practice, this means a single language model can generate the music, where earlier approaches chained several models together. It is one more sign that generative models keep getting more efficient as research progresses.
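To make “token interleaving” a bit more concrete, here is a minimal sketch of the “delay” pattern from the paper, written in C# to match the rest of this blog. MusicGen’s audio tokenizer (EnCodec) produces several parallel streams of tokens (codebooks) per audio frame. The delay pattern shifts codebook k right by k steps, so one transformer can predict one token from every codebook at each step. The `Pad` marker and the method names are my own and do not come from the MusicGen codebase, which is written in Python.

```csharp
using System;

// Sketch of the "delay" token-interleaving pattern: codebook k is shifted
// right by k steps, so a single model can emit all K codebooks per step.
public static class DelayPattern
{
    // Hypothetical marker for positions that hold no token yet.
    public const int Pad = -1;

    // codes[k, t] = token of codebook k at audio frame t (K codebooks, T frames).
    // Returns a K x (T + K - 1) grid in which row k is delayed by k steps.
    public static int[,] Interleave(int[,] codes)
    {
        int k = codes.GetLength(0), t = codes.GetLength(1);
        var result = new int[k, t + k - 1];
        for (int row = 0; row < k; row++)
        {
            for (int step = 0; step < t + k - 1; step++)
            {
                int frame = step - row; // which frame this grid cell holds, if any
                result[row, step] = (frame >= 0 && frame < t) ? codes[row, frame] : Pad;
            }
        }
        return result;
    }

    // Inverse: undo the per-codebook delay to recover the original K x T grid.
    public static int[,] Deinterleave(int[,] grid)
    {
        int k = grid.GetLength(0), t = grid.GetLength(1) - k + 1;
        var codes = new int[k, t];
        for (int row = 0; row < k; row++)
        {
            for (int frame = 0; frame < t; frame++)
            {
                codes[row, frame] = grid[row, frame + row];
            }
        }
        return codes;
    }
}
```

MusicGen uses 4 codebooks, so generating T frames this way takes T + 3 decoding steps. Flattening all codebooks into one long sequence would take 4T steps.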

I expect AI to reach every industry at an increasingly rapid pace, as more research becomes available and new models leapfrog by building on earlier ones. MusicGen was trained on about 20K hours of licensed music, and the results are impressive.

Here are some interesting generations I thought sounded nice. As more models trained on massive datasets are released publicly, we will likely see more community efforts and models, just as happened with AI art.

Medium Model

I used the less capable medium model (1.5B parameters, about 3.7 GB) to show that even relatively modest hardware can produce reasonable results. Here is some lo-fi generated with the medium model.

Large Model

A step up is the large model (3.3B parameters, about 6.5 GB), which produces slightly better-sounding results.

What is that melody?

There is also a 1.5B-parameter ‘Melody’ model, which can be conditioned on a reference melody (an audio clip) in addition to a text description.

Limitations

The model has a few limitations, the most noticeable being the lack of vocals. The official model card lists the following:

  • The model is not able to generate realistic vocals.
  • The model has been trained with English descriptions and will not perform as well in other languages.
  • The model does not perform equally well for all music styles and cultures.
  • The model sometimes generates end of songs, collapsing to silence.

Future models and community efforts will likely address these points. Given how quickly machine learning is advancing, it is only a matter of time before a model that can generate realistic vocals is released.


It helps me if you share this post

Published 2023-06-10 18:36:40

AI

AI will boost developer efficiency, not replace developers.

One of the most significant use cases I’ve found for AI in my development work is automating repetitive tasks, such as declaring a batch of similarly named, related variables. I was recently creating a ‘Human’ class that needed a variable for every body part. After I wrote a couple of lines, Copilot picked up the pattern almost immediately and suggested the rest, so the whole class took seconds instead of a few minutes. This adds up, and it means I can focus on more creative tasks, such as developing new features, coming up with new UI ideas, or responding to user feedback. The result is higher productivity and faster software development.

I imagine a future where I can describe the architecture of my Android app in as much detail as possible, then go in and clean up the resulting code by hand to match a specific vision. Developers will be fast-tracked into a more active management role.



Published 2023-05-17 01:05:41

How to create a simple voice-activated assistant in C#.

This is really old. I will release another tutorial updating this eventually. Follow my blog to get an update when that happens. Thanks!

While this sounds advanced (and it can be), it’s not hard to build a very basic version: a custom C# application that runs in the background and uses the speech recognition libraries built into Windows 10.

Taking this idea further, I personally have a “Jarvis” that runs on my computer and automates basically all of my common actions, including launching games, playing music, putting my computer to sleep, adjusting the volume, minimizing windows, controlling the lights, and (best of all) sending emails and messages. If you’re serious about building something similar, I recommend using an external speech recognition API, as Microsoft’s built-in recognizer is not very accurate. You can build your own, or try something like Google’s API.

Anyway, here’s some simple C# code that should get some ideas flowing.


```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows.Forms;           // Clipboard; add a reference to System.Windows.Forms
using Microsoft.Speech.Recognition;   // Microsoft Speech Platform SDK. System.Speech.Recognition
                                      // (part of the .NET Framework) exposes the same types.
using Process = System.Diagnostics.Process;

namespace VoiceAssistant
{
    class Program
    {
        #region Native Stuff
        const int Hide = 0;
        const int Show = 1;

        [DllImport("Kernel32.dll")]
        private static extern IntPtr GetConsoleWindow();

        [DllImport("User32.dll")]
        private static extern bool ShowWindow(IntPtr hWnd, int cmdShow);

        [DllImport("PowrProf.dll", CharSet = CharSet.Auto, ExactSpelling = true)]
        public static extern bool SetSuspendState(bool hibernate, bool forceCritical, bool disableWakeEvent);
        #endregion

        static SpeechRecognitionEngine speechRecognitionEngine;
        static bool speechOn = true;
        private static string clipboardText;
        private static bool shouldLog = true;

        // Every phrase the engine listens for. SpeechRecognition() looks them up by index,
        // so keep the indices in the comments in sync when adding or reordering commands.
        private static readonly string[] commands =
        {
            "assistant mute",                                // 0
            "assistant open clipboard",                      // 1
            "assistant new tab",                             // 2
            "assistant work music",                          // 3
            "assistant new github",                          // 4
            "assistant sleep computer confirmation 101",     // 5 (long phrase: hard to trigger by accident)
            "assistant shut down computer confirmation 101", // 6
            "assistant open story",                          // 7
            "assistant open rocket league"                   // 8
        };

        static void HideWindow()
        {
            // Hide the console window so the assistant runs in the background.
            IntPtr hWndConsole = GetConsoleWindow();
            if (hWndConsole != IntPtr.Zero)
            {
                ShowWindow(hWndConsole, Hide);
                shouldLog = false; // no visible console, so stop logging
                //ShowWindow(hWndConsole, Show);
            }
        }

        static void Main(string[] args)
        {
            HideWindow();
            //Console.WriteLine("[ASSISTANT AI INITIALIZED]");
            CultureInfo cultureInfo = new CultureInfo("en-us");
            speechRecognitionEngine = new SpeechRecognitionEngine(cultureInfo);
            speechRecognitionEngine.SetInputToDefaultAudioDevice();
            speechRecognitionEngine.SpeechRecognized += SpeechRecognition;
            speechRecognitionEngine.SpeechDetected += SpeechDetected;
            speechRecognitionEngine.SpeechHypothesized += SpeechHypothesized;
            LoadCommands();

            // Recognition events arrive on background threads; keep the main thread alive.
            while (true)
            {
                Thread.Sleep(60000);
            }
        }

        static void LoadCommands()
        {
            // One exact-phrase grammar per command.
            foreach (string command in commands)
            {
                speechRecognitionEngine.LoadGrammarAsync(new Grammar(new GrammarBuilder(command)));
            }
            // Keep listening after each recognized phrase instead of stopping after the first one.
            speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
            // Two short beeps signal that the assistant is listening.
            Console.Beep(600, 200);
            Console.Beep(600, 200);
        }

        static void SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
        {
            //Log(e.Result.Text);
        }

        static void SpeechDetected(object sender, SpeechDetectedEventArgs e)
        {
            //Log("Detected speech.");
        }

        static void SpeechRecognition(object sender, SpeechRecognizedEventArgs e)
        {
            string resultText = e.Result.Text.ToLower();
            float confidence = e.Result.Confidence;
            Log("\nRecognized: " + resultText + " | Confidence: " + confidence);

            // Ignore low-confidence matches rather than run the wrong command.
            if (confidence < 0.6)
            {
                Log("Not sure if you said that. Not proceeding.", ConsoleColor.Red);
                return;
            }

            if (resultText == commands[0]) // Toggle mute; this works even while muted
            {
                speechOn = !speechOn;
                Log("Speech on: " + speechOn);
                if (speechOn)
                {
                    Console.Beep(600, 200);
                    Console.Beep(600, 200);
                }
                else
                {
                    Console.Beep(400, 400);
                }
                return;
            }

            if (!speechOn)
            {
                Log("AI is muted. Not doing any commands.");
                Console.Beep(400, 200);
                return;
            }

            if (resultText == commands[1]) // Open the link on the clipboard
            {
                clipboardText = null;
                // The Clipboard API must be called from a single-threaded apartment (STA) thread.
                Thread clipboardThread = new Thread(() =>
                {
                    if (Clipboard.ContainsText(TextDataFormat.Text))
                    {
                        clipboardText = Clipboard.GetText(TextDataFormat.Text);
                    }
                });
                clipboardThread.SetApartmentState(ApartmentState.STA);
                clipboardThread.Start();
                clipboardThread.Join();

                if (string.IsNullOrWhiteSpace(clipboardText))
                {
                    Log("Clipboard has no text. Not proceeding.", ConsoleColor.Red);
                    return;
                }
                Log(clipboardText);
                Process.Start(clipboardText);
            }
            // On .NET Framework, Process.Start(url) opens the default browser. On .NET Core/.NET 5+,
            // use a ProcessStartInfo with UseShellExecute = true instead.
            if (resultText == commands[2]) // Open browser
            {
                Process.Start("https://google.com");
            }
            if (resultText == commands[3]) // Open work music
            {
                Process.Start("https://youtu.be/Qku9aoUlTXA?list=PLESPkMaANzSj91tvYnQkKwgx41vkxp6hs");
            }
            if (resultText == commands[4]) // Open a new GitHub repository
            {
                Process.Start("https://github.com/new");
            }
            if (resultText == commands[5]) // Sleep computer (hibernate = false)
            {
                SetSuspendState(false, true, true);
            }
            if (resultText == commands[6]) // Shut down computer immediately
            {
                Process.Start("shutdown", "/s /t 0");
            }
            if (resultText == commands[7]) // Open story (new Google Doc)
            {
                Process.Start("https://docs.new");
            }
            if (resultText == commands[8]) // Open Rocket League (was commands[9], which is out of range)
            {
                // Adjust this path to your own launcher.
                Process.Start("C:\\Users\\USER\\Documents\\SteamLauncher\\RocketLeague.exe");
            }
        }

        static void Log(string input, ConsoleColor color = ConsoleColor.White)
        {
            if (shouldLog)
            {
                Console.ForegroundColor = color;
                Console.WriteLine(input);
                Console.ResetColor();
            }
        }
    }
}
```
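The index-based `if` chain above is easy to break. An out-of-range index like `commands[9]` only fails once that phrase is recognized, and adding or reordering phrases shifts every index after the change. One alternative is to map each phrase directly to its action. `CommandDispatcher` and its members below are hypothetical names of mine, not part of any speech library. The sketch assumes the same 0.6 confidence threshold and mute toggle as the program above, and leaves out the beeps and logging for brevity.

```csharp
using System;
using System.Collections.Generic;

// Sketch: each spoken phrase maps straight to its action, so there is no
// index to get out of sync when commands are added or reordered.
public sealed class CommandDispatcher
{
    private readonly Dictionary<string, Action> actions =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);
    private readonly string muteCommand;
    private readonly float minConfidence;

    public bool SpeechOn { get; private set; } = true;

    public CommandDispatcher(string muteCommand, float minConfidence = 0.6f)
    {
        this.muteCommand = muteCommand;
        this.minConfidence = minConfidence;
    }

    // Phrases to load into the speech grammar (the mute command is added separately).
    public IEnumerable<string> Phrases => actions.Keys;

    public void Register(string phrase, Action action) => actions[phrase] = action;

    // Returns true if the phrase was handled (an action ran or mute was toggled).
    public bool Dispatch(string text, float confidence)
    {
        if (confidence < minConfidence)
            return false;                 // too unsure: do nothing

        if (string.Equals(text, muteCommand, StringComparison.OrdinalIgnoreCase))
        {
            SpeechOn = !SpeechOn;         // mute toggle works even while muted
            return true;
        }

        if (!SpeechOn)
            return false;                 // muted: ignore every other command

        if (actions.TryGetValue(text, out Action action))
        {
            action();
            return true;
        }
        return false;                     // not a registered phrase
    }
}
```

With this in place, `SpeechRecognition` could shrink to a single `dispatcher.Dispatch(e.Result.Text, e.Result.Confidence)` call. The commands would then be registered once in `Main`, for example `dispatcher.Register("assistant new github", () => Process.Start("https://github.com/new"));`.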



Published 2019-05-22 18:10:00