{"id":2775,"date":"2023-06-10T18:36:40","date_gmt":"2023-06-11T01:36:40","guid":{"rendered":"https:\/\/rose.dev\/blog\/?p=2775"},"modified":"2023-06-10T18:48:47","modified_gmt":"2023-06-11T01:48:47","slug":"ai-music-generation-musicgen","status":"publish","type":"post","link":"https:\/\/rose.dev\/blog\/2023\/06\/10\/ai-music-generation-musicgen\/","title":{"rendered":"AI Music Generation: MusicGen"},"content":{"rendered":"\n<p>Researchers have recently released a new paper and accompanying model, &#8220;Simple and Controllable Music Generation&#8221;, where they highlight that it &#8220;is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models&#8221;. In practice, this means music generation can now be completed in fewer steps, and it is becoming more efficient as progress is made across different kinds of models. <\/p>\n\n\n\n<p>I expect AI to hit every industry at an increasingly rapid pace as more research becomes available and each new model builds on the progress of the last. MusicGen was trained on about 20K hours of licensed music, and the results are impressive.<\/p>\n\n\n\n<p>Here are some generations I thought sounded nice. As more models trained on massive datasets reach the public, we will see more community efforts and derivative models as well, just as with AI art. <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Medium Model<\/h2>\n\n\n\n<p>I used the less performant <a rel=\"noreferrer noopener\" href=\"https:\/\/huggingface.co\/facebook\/musicgen-medium\" target=\"_blank\">medium model<\/a> (1.5B parameters and approximately 3.7 GB) to demonstrate that you can achieve reasonable results even on relatively modest hardware. 
Here is some lofi generated from the medium model.<\/p>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/ai_cry_lofi.wav\"><\/audio><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Large Model<\/h2>\n\n\n\n<p>A step up is <a rel=\"noreferrer noopener\" href=\"https:\/\/huggingface.co\/facebook\/musicgen-large\" target=\"_blank\">the large model<\/a> (3.3B parameters and approximately 6.5 GB). It produces slightly better-sounding results.<\/p>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/ai_large_lofi.wav\"><\/audio><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What is that melody?<\/h2>\n\n\n\n<p>There is also <a href=\"https:\/\/huggingface.co\/facebook\/musicgen-melody\" target=\"_blank\" rel=\"noreferrer noopener\">a &#8216;Melody&#8217; model<\/a>, a 1.5B parameter version that can be conditioned on a reference melody in addition to a text description. <\/p>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/ai_whatmelody_lofi.wav\"><\/audio><\/figure>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/ai_whatmelody_lofi2.wav\"><\/audio><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Limitations<\/h2>\n\n\n\n<p>This model has a few limitations, most notably its inability to generate vocals.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p><strong>Limitations:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The model is not able to generate realistic vocals.<\/li>\n\n\n\n<li>The model has been trained with English descriptions and will not perform as well in other languages.<\/li>\n\n\n\n<li>The model does not perform equally well for all music styles and cultures.<\/li>\n\n\n\n<li>The model sometimes generates end of songs, collapsing to silence.<\/li>\n<\/ul>\n<\/blockquote>\n\n\n\n<p>However, future models and efforts will likely remedy these 
points. Given how quickly machine learning advancements are accelerating, it&#8217;s only a matter of time before a model trained to generate vocals is released. <\/p>\n\n\n\n<div data-wp-interactive=\"core\/file\" class=\"wp-block-file\"><object data-wp-bind--hidden=\"!state.hasPdfPreview\" hidden class=\"wp-block-file__embed\" data=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/2306.05284.pdf\" type=\"application\/pdf\" style=\"width:100%;height:600px\" aria-label=\"Embed of 2306.05284_research.pdf.\"><\/object><a id=\"wp-block-file--media-dcae7123-efce-4d62-8a83-05eaaf3949be\" href=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/2306.05284.pdf\">2306.05284_research.pdf<\/a><a href=\"https:\/\/rose.dev\/blog\/wp-content\/uploads\/2023\/06\/2306.05284.pdf\" class=\"wp-block-file__button wp-element-button\" download aria-describedby=\"wp-block-file--media-dcae7123-efce-4d62-8a83-05eaaf3949be\">Download<\/a><\/div>\n<hr>\r\nIt helps me if you share this post\r\n<br\/>\r\n<br\/>\r\nPublished 2023-06-10 18:36:40 ","protected":false},"excerpt":{"rendered":"<p>Researchers have recently released a new paper and accompanying model, &#8220;Simple and Controllable Music Generation&#8221;, where they highlight that it &#8220;is comprised of a single-stage transformer LM together with efficient token interleaving patterns, which eliminates the need for cascading several models&#8221;. 
What this essentially means in practice is that music generation can now be completed in &hellip; <a href=\"https:\/\/rose.dev\/blog\/2023\/06\/10\/ai-music-generation-musicgen\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">AI Music Generation: MusicGen<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"footnotes":""},"categories":[835,832,833],"tags":[889,911,1186],"class_list":["post-2775","post","type-post","status-publish","format-standard","hentry","category-misc","category-software","category-technology","tag-ai","tag-generation","tag-musicgen"],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/posts\/2775","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/comments?post=2775"}],"version-history":[{"count":9,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/posts\/2775\/revisions"}],"predecessor-version":[{"id":2789,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/posts\/2775\/revisions\/2789"}],"wp:attachment":[{"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/media?parent=2775"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/categories?post=2775"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/rose.dev\/blog\/wp-json\/wp\/v2\/tags?post=2775"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}