Let’s talk about REGEX.

Cloudflare is one of the biggest companies on the internet in terms of reach and control. They specialize in internet security, safety, and general privacy, and they’ve been named the leading DDoS prevention source. In fact, I use them for my own DDoS protection needs.

A couple weeks ago, for about 30 minutes, a LARGE chunk of the internet was down. This is because Cloudflare serves as the “gateway” or access point for many websites, including some cryptocurrency endeavors. Needless to say, they have quite a bit of control over the internet.

Why did everything break? Because of a little something called REGEX. A Cloudflare engineer apparently screwed up with a “regex” rule.

https://twitter.com/mjos_crypto/status/1146168236393807872

What is REGEX?

It stands for “regular expression”, and is a sequence of characters that defines a search pattern. This search pattern can be compared against strings to find characters or sub-strings that match it. The idea was originally described in 1951 by a mathematician named Stephen Cole Kleene, whose work eventually led to the early implementations of pattern matching.

REGEX is usually a bad idea, unless you’re attempting a pattern match, or you really know what you’re doing. Whenever working with regular expressions, be VERY careful that you double, and then triple check the logic of what you’re writing.

([A-Z])\w+

This is a pattern that matches a capital letter followed by one or more “word” characters (letters, digits, or underscores), so the match runs until it hits a space or any other non-word character. Confusing, right?
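As a quick sanity check, here is a minimal sketch that runs the pattern above. Python’s re module is used purely for illustration, and the helper name and sample sentence are made up for this example.

```python
import re

# The pattern from above: one capital letter, then one or more word characters.
CAPITALIZED = re.compile(r"([A-Z])\w+")

def capitalized_words(text):
    """Return every full match of the pattern in text, left to right."""
    return [m.group(0) for m in CAPITALIZED.finditer(text)]

if __name__ == "__main__":
    # Hypothetical sample input. The lone "I" is skipped because \w+ needs
    # at least one more word character after the capital letter.
    print(capitalized_words("I like Regex_101 and Hello-World"))
```

Note that the hyphen ends a match just like a space does, which is why “Hello-World” comes back as two separate matches.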

\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b is a more complex pattern. It describes a series of letters, digits, dots, underscores, percentage signs, plus signs and hyphens, followed by an at sign, followed by another series of letters, digits, dots and hyphens, finally followed by a single dot and two or more letters. In other words: this pattern describes an email address (as written it only lists capital letters, so it has to be run in case-insensitive mode).
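Here is a small sketch of that email pattern in use, again in Python and again only as an illustration. The function name and the sample address are assumptions, not anything from a real system.

```python
import re

# The pattern from above, unchanged. Its character classes only list A-Z,
# so it needs the IGNORECASE flag to accept lowercase addresses.
EMAIL = r"\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b"

def find_email(text, flags=re.IGNORECASE):
    """Return the first substring that looks like an email address, or None."""
    match = re.search(EMAIL, text, flags)
    return match.group(0) if match else None

if __name__ == "__main__":
    sample = "contact me at jane.doe+news@example.co.uk today"  # made-up address
    print(find_email(sample))           # case-insensitive search
    print(find_email(sample, flags=0))  # case-sensitive search finds nothing here
```

This is a loose “looks like an email” check, not a full validator.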

Oh yeah. But not only are they confusing, they can be dangerous. Let me give you an example.

Suppose we have the following, very simple regex pattern:

(x+x+)+y

What this does is match any string with two or more “x”s followed by a “y”. This would match these strings:

“xxxxxy”, “xxxxxxxxxxxxxxxxxxxy”, “xxy”, “xxxy”, and any number of “x”s (two or more) followed by a “y” you can imagine. It will NOT match “xy”. Okay, fair enough. Simple enough, right?
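The claims above are easy to check. This is a minimal sketch in Python (chosen only for illustration); the helper name is made up.

```python
import re

# The nested-quantifier pattern from above.
NESTED = re.compile(r"(x+x+)+y")

def matches(text):
    """True when the whole string is two or more x's followed by a single y."""
    return NESTED.fullmatch(text) is not None

if __name__ == "__main__":
    for sample in ["xxxxxy", "xxy", "xxxy", "xy"]:
        print(sample, matches(sample))
```

Every one of these short inputs finishes instantly. The trouble only shows up on longer strings where the y is missing.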

Let’s take a look at what’s going on when compared to this string:

xxxxxxxxxxy

The first x+ will match all 10 x characters. The second x+ fails, since there is nothing left for it to match. The first x+ then “backtracks” to 9 matches, and the second one picks up the remaining x. The group has now matched once. The group repeats, but fails at the first x+. Since one repetition was sufficient, the group matches. The y character then matches and an overall match is found. The coder sees the correct return value, the regex is declared functional, the code is pushed into the wild, and our computers get just a little closer to exploding.

Except… what happens if the y character didn’t match? Like, what if there wasn’t a y character in that string? Then what would happen?

The regex engine backtracks. Hard. The group has one iteration it can backtrack into. The second x+ matched only one x, so there’s nothing it can do there. “But wait,” the program thinks. “What if the first x+ gives up a matching x character?”

So it matches the first x+ to 8 x’s instead of 9. The second x+ promptly matches xx. The group again has one iteration, fails the next one, and the y fails. Stay with me here. Backtracking again, the interpreter now realizes that the second x+ contains a position it can backtrack into, by giving up one of its x’s and shrinking to a single x. It’s all very confusing, but keep in mind it’s basically just the computer trying all possible combinations to attempt to match the string.

The group tries a second iteration. The first x+ matches the leftover x, but the second x+ has nothing left to match. Backtracking again, the first x+ in the group’s first iteration reduces itself to 7 characters. The second x+ matches xxx. Now maybe it matches? No, there’s still no y character.

Failing, the interpreter tries again, since there are still many more combinations it can try. The second x+ is reduced to xx, it tries again, fails, and then reduces further to a single x. Now, the group can match a second iteration, with one x for each x+. But this wacky combination fails too.
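To put a number on all that backtracking, here is a rough sketch that imitates the walkthrough above by hand and counts how many combinations get tried. It is a simplified model written for this post (Python, hypothetical function names), not how any real regex engine is implemented, and it only tries a match starting at the first character.

```python
def count_attempts(text):
    """Simulate a naive backtracking match of (x+x+)+y starting at index 0.

    Returns (matched, attempts), where attempts counts how many ways of
    splitting x's between the two x+ tokens were tried.
    """
    attempts = 0

    def end_of_x_run(start):
        # Index just past the run of x characters beginning at start.
        end = start
        while end < len(text) and text[end] == "x":
            end += 1
        return end

    def match_group(pos):
        nonlocal attempts
        limit = end_of_x_run(pos)
        # First x+ is greedy: take the longest run, then give back one x at a time.
        for first_end in range(limit, pos, -1):
            # Second x+ is greedy too and needs at least one x of its own.
            for second_end in range(limit, first_end, -1):
                attempts += 1
                # The outer + is greedy: try another group iteration before the y.
                if match_group(second_end):
                    return True
                if second_end < len(text) and text[second_end] == "y":
                    return True
        return False

    return match_group(0), attempts

if __name__ == "__main__":
    print("with y:", count_attempts("x" * 10 + "y"))
    for n in range(10, 17):
        print(n, "x's, no y:", count_attempts("x" * n))
```

In this model, each extra x roughly doubles the number of attempts when the y is missing, while the version with the y succeeds almost immediately.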

Are you starting to see the problem here? This is off ONE expression. If you go to

https://regexr.com/

and put the same string in that I did without the Y, it will actually fail to match after adding a couple Xs because it will time out from taking so long.

With the Y, it completes within a millisecond.
Without the Y and a couple more Xs? Oh boy.

At least it tells us. If this exact situation happened inside a .NET application with no match timeout set, the regex engine would just keep grinding away and pin the CPU, and some other engines would crash with a stack overflow instead.

Now, the first thing you might think is: “Uh, just remove the parentheses to fix the nested quantifiers, genius”, but let’s replace each “x” with something more complicated. Something that might be coded in the workplace:

^(.*?,){19}A

Not too much worse than what we had above. Now some context: the person writing this regex has a text file and is attempting to figure out whether the 20th item on a line starts with an A.

Doesn’t seem like that hard of a task, and the regex works. What’s it doing, though? Same as the above example, it’s just harder to tell.

The regex looks like it works fine. The lazy dot and comma match a single comma-delimited field, and the {19} skips the first 19 fields. Finally, the A checks if the 20th field indeed starts with A. In fact, this is exactly what will happen when the 20th field indeed starts with an A. Straightforward logic.

When the 20th field does not start with an A, that’s when the problems start. Let’s say the string is “1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20”. At that point, the regex engine will backtrack. It will backtrack to the point where ^(.*?,){19} had consumed everything up to “19”, giving up the last match of the comma. The next token is again the dot. The dot matches a comma, since a dot is a wildcard character. The comma in the pattern then fails to match the 2 in the 20th field, so the dot keeps going and swallows “20” as well, looking for another comma that isn’t there, before the engine gives up and starts backtracking into the earlier fields.

You can already see the root of the problem: the part of the regex (the dot) matching the contents of the field also matches the delimiter (the comma). Because of the double repetition (star inside {19}), this leads to a catastrophic amount of backtracking, and it gets worse with every extra field on the line. At this rate, it could take multiple seconds just to check one line of text, which should take less than a tenth of a millisecond.

This can lead to HUGE lagspikes. In Cloudflare’s case, their servers’ CPUs were maxed out to 100% load instantly, causing the internet as we know it to vanish for a solid 30 minutes as a software engineering position was busy being posted on Glassdoor.

How do you fix this?

Like this:

^([^,\r\n]*,){19}A

The trick is to be more specific. [^,\r\n] can never match the delimiter, so each field can only be matched one way. The nested-quantifier pattern from earlier has a worst-case complexity of O(2^n), which is terrible: the time it takes to complete the search grows exponentially with the number of characters in the string, and that leads to terrible workarounds.
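Here is a hedged side-by-side of the two patterns, sketched in Python for illustration only. The helper name and the sample lines are made up. It also shows a second, quieter problem with the lazy-dot version: because the dot can eat commas, it can report a match when it is actually a later field that starts with an A.

```python
import re

# Original pattern: the dot can also match commas.
LAZY = re.compile(r"^(.*?,){19}A")
# Fixed pattern: a field can never contain a comma or a line break.
STRICT = re.compile(r"^([^,\r\n]*,){19}A")

def twentieth_starts_with_a(pattern, line):
    """True when the pattern reports that the 20th field starts with A."""
    return pattern.match(line) is not None

if __name__ == "__main__":
    fields = [str(i) for i in range(1, 20)]      # fields 1 through 19
    good = ",".join(fields + ["Apple"])          # 20th field starts with A
    sneaky = ",".join(fields + ["20", "Apple"])  # only the 21st field starts with A
    for name, line in [("good", good), ("sneaky", sneaky)]:
        print(name,
              "lazy:", twentieth_starts_with_a(LAZY, line),
              "strict:", twentieth_starts_with_a(STRICT, line))
```

On the second line the lazy version lets one of its 19 repetitions swallow two fields, so it happily matches the A in field 21. The stricter pattern is both faster and more correct.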

If you’d like to learn more about complex RegEx expressions, pattern matching, and how they could be vastly improved with the Ken Thompson algorithm, read this.

Now excuse me while I go back to something normal and non-confusing, like JavaScript.


It helps me if you share this post

Published 2019-07-15 18:55:26

How to create a simple voice-activated assistant in C#.

This is really old. I will release another tutorial updating this eventually. Follow my blog to get an update when that happens. Thanks!

While this sounds advanced (and it can be), it’s not that hard to get a very basic version working: a custom C# application that runs in the background and uses the built-in speech recognition libraries in Windows 10.

Taking this idea further, I personally have a “Jarvis” that runs on my computer, automating basically all of my common actions, including launching games, music, sleeping my computer, adjusting the volume, minimizing windows, controlling the lights, and (best of all), sending emails and messages. I recommend using an external API for speech recognition if you’re serious about building something similar, as Microsoft’s sucks. You can build your own, or attempt to use something like Google’s API.

Anyway, here’s some simple C# code that should get some ideas flowing.


using System;
using System.Diagnostics;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Threading;
using System.Windows.Forms;
using Microsoft.Speech.Recognition;
using Process = System.Diagnostics.Process;
namespace VoiceAssistant
{
class Program
{
#region Native Stuff
const int Hide = 0;
const int Show = 1;
[DllImport("Kernel32.dll")]
private static extern IntPtr GetConsoleWindow();
[DllImport("User32.dll")]
private static extern bool ShowWindow(IntPtr hWnd, int cmdShow);
[DllImport("PowrProf.dll", CharSet = CharSet.Auto, ExactSpelling = true)]
public static extern bool SetSuspendState(bool hibernate, bool forceCritical, bool disableWakeEvent);
#endregion
static SpeechRecognitionEngine speechRecognitionEngine;
static bool speechOn = true;
private static string clipboardText;
private static bool shouldLog = true;
private static readonly string[] commands =
{
"assistant mute",
"assistant open clipboard",
"assistant new tab",
"assistant work music",
"assistant new github",
"assistant sleep computer confirmation 101",
"assistant shut down computer confirmation 101",
"assistant open story",
"assistant open rocket league"
};
static void HideWindow()
{
//Hide window
IntPtr hWndConsole = GetConsoleWindow();
if (hWndConsole != IntPtr.Zero)
{
ShowWindow(hWndConsole, Hide);
shouldLog = false;
//ShowWindow(hWndConsole, Show);
}
}
static void Main(string[] args)
{
HideWindow();
//Console.WriteLine("[ASSISTANT AI INITIALIZED]");
CultureInfo cultureInfo = new CultureInfo("en-us");
speechRecognitionEngine = new SpeechRecognitionEngine(cultureInfo);
speechRecognitionEngine.SetInputToDefaultAudioDevice();
speechRecognitionEngine.SpeechRecognized += SpeechRecognition;
speechRecognitionEngine.SpeechDetected += SpeechDetected;
speechRecognitionEngine.SpeechHypothesized += SpeechHypothesized;
LoadCommands();
while (true)
{
Thread.Sleep(60000);
}
}
static void LoadCommands()
{
/*Grammar muteCommand = new Grammar(new GrammarBuilder(commands[0]));
Grammar browserOpenCopiedLink = new Grammar(new GrammarBuilder(commands[1]));
Grammar browserCopyLink = new Grammar(new GrammarBuilder(commands[2]));
speechRecognitionEngine.LoadGrammar(muteCommand);
speechRecognitionEngine.LoadGrammar(browserOpenCopiedLink);
speechRecognitionEngine.LoadGrammar(browserCopyLink);*/
foreach (string command in commands)
{
speechRecognitionEngine.LoadGrammarAsync(new Grammar(new GrammarBuilder(command)));
}
speechRecognitionEngine.RecognizeAsync(RecognizeMode.Multiple);
Console.Beep(600, 200);
Console.Beep(600, 200);
}
static void SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
//Log(e.Result.Text);
}
static void SpeechDetected(object sender, SpeechDetectedEventArgs e)
{
//Log("Detected speech.");
}
static void SpeechRecognition(object sender, SpeechRecognizedEventArgs e)
{
string resultText = e.Result.Text.ToLower();
float confidence = e.Result.Confidence;
SemanticValue semantics = e.Result.Semantics;
Log("\nRecognized: " + resultText + " | Confidence:" + confidence);
if (confidence < 0.6)
{
Log("Not sure if you said that. Not proceeding.", ConsoleColor.Red);
return;
}
if (resultText == commands[0])
{
speechOn = !speechOn;
Log("Speech on: " + speechOn);
if (speechOn)
{
Console.Beep(600, 200);
Console.Beep(600, 200);
}
else
{
Console.Beep(400, 400);
}
return;
}
if (!speechOn)
{
Log("AI is muted. Not doing any commands.");
Console.Beep(400, 200);
return;
}
if (resultText == commands[1]) //Open link on clipboard.
{
Thread clipboardThread = new Thread(param =>
{
if (Clipboard.ContainsText(TextDataFormat.Text))
{
clipboardText = Clipboard.GetText(TextDataFormat.Text);
}
});
clipboardThread.SetApartmentState(ApartmentState.STA);
clipboardThread.Start();
clipboardThread.Join();
if (!string.IsNullOrEmpty(clipboardText)) //Only launch when the clipboard actually held text.
{
Log(clipboardText);
Process.Start(clipboardText);
}
}
if (resultText == commands[2]) //Open browser
{
Process.Start("https://google.com");
}
if (resultText == commands[3]) //Open work music
{
Process.Start("https://youtu.be/Qku9aoUlTXA?list=PLESPkMaANzSj91tvYnQkKwgx41vkxp6hs");
}
if (resultText == commands[4]) //Open Github new repository
{
Process.Start("https://github.com/new");
}
if (resultText == commands[5]) //Sleep computer
{
SetSuspendState(false, true, true);
}
if (resultText == commands[6]) //Shutdown computer
{
Process.Start("shutdown", "/s /t 0");
}
if (resultText == commands[7]) //Open story
{
Process.Start("https://docs.new");
}
if (resultText == commands[8]) //Open Rocket League (index 8 is the last entry in commands)
{
Process.Start("C:\\Users\\USER\\Documents\\SteamLauncher\\RocketLeague.exe");
}
}
static void Log(string input, ConsoleColor color = ConsoleColor.White)
{
if (shouldLog)
{
Console.ForegroundColor = color;
Console.WriteLine(input);
Console.ResetColor();
}
}
}
}



Published 2019-05-22 18:10:00

semver.org

Standards are important, especially with computers. Without standards, you end up with crap like JavaScript, HTML, and CSS. Too little, too late.

And one of the things that’s needed a standard for a very, very, very long time is version numbers. Ever notice some versions for software are like 2019.2.4, while others are like 1.0, 1.1, 0.1, alpha, beta, beta-0rc1, and 89.23x? It’s so confusing to know whether anything is up to date, what you’re updating to, and who is on what. Is version 0.9 of the triangle generator library compatible with version v1.3.0m of the graphics processing library?

Who knows, because everyone just kinda comes up with an arbitrary number to represent the state that their letters of code are currently in.

Here’s how Fortnite does version numbers.

Fortnite does version numbers like this, while Overwatch does version numbers like… this…

Yeah… I dunno either. Epic Games and Blizzard are both major companies, though. Surely they follow some sort of protocol?

Here’s how Steam does version numbers.

Cool. None. Just a date.

Meanwhile, Google Chrome over here is on version “74.0.3729” currently, so good for them.

Anyway, the point is, there needs to be a standard so that it’s simple and clear to see how much something has updated since your version, what version you have in relation to the latest version, and for simplicity’s sake, not five hundred characters.

Introducing semver.org: a global attempt at a standard for software versioning. It’s a simple, clean, and effective method of versioning your software, and since discovering it I have been adopting it in all of my new projects, and in as many of the older ones I’m still working on as I can. If you want to help, simply go to semver.org, read the rules, and share it with other software engineers.
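To give a feel for why a consistent MAJOR.MINOR.PATCH format makes “what version do I have compared to the latest?” an easy question, here is a minimal sketch in Python. The function names are made up for this example, and it deliberately ignores pre-release tags and build metadata, which the full rules on semver.org also cover.

```python
def parse_semver(text):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three ints.

    Simplification: pre-release tags and build metadata are not handled.
    """
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)

def is_newer(candidate, current):
    """True when candidate is a later version than current."""
    return parse_semver(candidate) > parse_semver(current)

def is_breaking_upgrade(candidate, current):
    """Under semver, a higher MAJOR number signals incompatible API changes."""
    return parse_semver(candidate)[0] > parse_semver(current)[0]

if __name__ == "__main__":
    print(is_newer("1.10.0", "1.9.3"))             # compares numbers, not text
    print(is_breaking_upgrade("2.0.0", "1.9.3"))
    print(is_breaking_upgrade("1.10.0", "1.9.3"))
```

Comparing the parts as numbers rather than as text matters: as plain strings, “1.9.3” would sort after “1.10.0”.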



Published 2019-05-03 01:25:27

The future of phones: Is it folding?

The Verge released this video, and I wanted to write a couple of my thoughts down as well.

I don’t think that as of right now, you should buy a folding phone. A folding phone is just too bulky, too big, not refined enough, and way too expensive. Your money would be better spent elsewhere.

The folding technology would be better served, I think, in tablets. Portable tablets that you can fit into your pocket while you’re going to your flight sound much better, to me, than a phone that can turn into a tablet.

And, in the future, phones will be able to pull that off. But for now, the thickness of the device coupled with the gap between the two slabs renders it a choice I’m not even gonna consider this year or the year after, until they get flatter and cheaper.

I’m not saying that this wasn’t expected. It was. This is the natural early release of any revolutionary technology or product. The first iPhone compared to the 5th was a WORLD of difference in only a few short years. The same will be true of the folding phones. I just recommend not jumping on the bandwagon just yet, as I have faith that the technology will make leaps and bounds and be nearly unrecognizable from the clunky device that we have in front of us right now.

EDIT 2/7/2022: The Flip3 from Samsung is almost there. Few more years and it’ll be mainstream 🙂



Published 2019-04-15 15:42:07

How to create custom tiles for your Windows 10 start menu.

Download the program called “Custom Native Tile” from here and run it. Type your search query into the bar at the top to filter for programs. Click on the program path listed in the viewing window and then click SELECT IMAGE. Click the image that you’d like to use for the tile image (recommended: square aspect ratio and about a 300×300 resolution), and hit SAVE. That’s it!

Patrons get early access to programs like this and a bunch of other perks!



Published 2019-04-03 08:38:42

The problems with Unity’s business model.

$125 is a chunk of change. $125/month is even more. When a monthly subscription is offered, it’s because that subscription is consistently bringing in value.

This is exactly what Unity3D, a widely used game engine, is asking from “freelancers”. They recommend using their “Pro” tier, which is $125 per month, if you’re in a team or you’re a “freelancer”, whatever they define that to be.

If you’re a “hobbyist”, you should apparently pay $35 a month, or ~$25 per month if you prepay for a year.

If you’re a “beginner” (or don’t have $300 lying around to pay per year), then you should use the free version.

Now, let’s talk about benefits that these versions give you.

This screenshot may be out of date compared to their current pricing.

The Unity page lists “benefits” of their Pro and Plus versions, while listing nothing for the Personal version. However, in my opinion, the benefits listed are virtually worthless. I have never used or wanted to use any of them, and I own the Plus version.

Here are my “benefits” that I get with my Plus license:

Support to accelerate learning & development

  • Benefits with Prepaid plan only:
  • Learn the essentials of game development with 12 months access to Unity Game Dev Courses ($144 value)
  • Get 25GB Unity Cloud Storage ($60 value)
  • Attend monthly Expert Live Sessions. Speed up your development with technical know-how from Unity engineers ($240 value)
  • Limited access to a Customer Success Advisor: get help finding the tools and resources you need to succeed
  • Save 20% on top-rated assets in the Asset Store*

Personally, I don’t care about any of these things. You might. However, there are two features I DO care about, being a professional software engineer who wants the things they make to look polished.

1) Dark theme

2) Splash screen controls (and ability to disable built-in Unity splash screen)

Theme Comparisons

Light Theme

 

Dark Theme

Some of you may think, “so what?”, but I can tell you that the light theme is an absolute eyesore, especially if you’ve been staring at a screen for 8 hours.

The Splash Screen

And of course, the main reason why everyone who’s serious about developing games purchases a license for Unity: the splash screen.

You see, Unity forces non-subscribers to display an obnoxious “Made with Unity” or “Powered by Unity” splash screen (depending on which version of said engine you have) that looks something like this:

This is a bad move. You may be thinking to yourself right about now: “Well, makes sense, because they want to get at least SOMETHING out of distributing their engine for free. Why not popularity?”

This is true. Except it will be bad popularity. Let’s walk through this.

Let’s imagine there are two people using Unity. Bob, who has never developed anything in his life, and Kyle, who is a professional at developing games. Bob makes a crappy little box simulation with built in assets and it runs like crap because it is crap. No offense to Bob, he’s just completely new to developing games. He’s also using the personal version of Unity, obviously, because he’s brand new and wants to try to make something cool. He happily publishes his creation online, and some people download his game and see what an absolute mess it is. They also notice a very large, long, “Made with Unity” splash screen that displays for five seconds. Their parting thoughts? “Wow, Unity must be for people who don’t know how to make games.”

Kyle, on the other hand, is a professional. He buys Unity Plus for ~$25 a month because he hates the Unity splash screen, and wants to remove it so that he can put in his own splash screen or logo. When Kyle uploads his professionally made, polished game, people enjoy it. And they also don’t know it’s made with Unity, because he removed the splash screen.

Notice a pattern here? Unity has received a very bad reputation among the gamer community (and somehow no one can figure out why), because every terrible game ever has a “Made with Unity” splash screen. What Unity SHOULD be doing, is PAYING developers such as the ones who made Cuphead (which is made with Unity if you didn’t know before) to put the Unity Splash on their game, and letting beginners remove it. Beat Saber is an immensely popular VR game that is made with Unity, but no general consumer is aware of that fact. Unity should be trying to control the positive PR as much as possible to drive more developers to their platform and rid the “terrible game engine” stigma from the engine’s name.

Unity states that they’re “the world’s leading real-time engine”, and are “used to create half of the world’s games”. They might want to start trying to put their name on the good ones.



Published 2019-02-16 01:45:28