Fuzzle🔗

Fuzzle is a search algorithm which uses iterations and calculations to split data into categories as well as sort by coverage and accuracy values to find the most suitable results based on a search.

Due to it being designed as a general use algorithm, it can be written into many object-oriented languages such as Java, C# as well as more scripting-oriented languages such as Python and JavaScript.

Index🔗

  1. Algorithm
  2. Versions
  3. Demo
Overview
Type Search Algorithm
Developer Dan6erbond
Languages Java, JavaScript, Python
Stage Released/Development
Contributors XelaaleX1234,
Dan6erbond
Source Code Open on GitHub

Algorithm🔗

Fuzzle's algorithm can be broken into many little parts allowing it to easily be implemented step-by-step into different languages as well as for new features to be added. The key aspects of the way Fuzzle works are documented in this section. The code examples are written in Python as it's the first language with which Fuzzle was implemented and more readable for programmers that aren't familiar with any of the used languages.

Splitting the Search into Parts🔗

One of the first things Fuzzle does is split the search query into smaller parts for the code to later use to calculate coverage in an efficient matter. Depending on how this part of the algorithm is implemented, certain steps can be skipped, but the most concise version that has been implemented is used in the documentation for clarity.

This process is done with nested for loops we get all the possible sub-strings within a given string:


    

The first for loop iterates over all the possible lengths the sub-strings can have, being the range from 1 to whatever length the string is.
In the Python example range() does not include the last number which is why 1 is being added to len(s).

Within the loop iterating over all the possible lengths, we go through every index in the string until the position where no new sub-strings can be found as the length of the current iteration dictates where that is, we use len(s) - size + 1 to limit that. For good measure, before adding the sub-string to the list of sub-strings (here called parts), we check if the sub-string is already to be found in the list to avoid duplicates and optimize performance in future iterations over this list.

To further optimize, the function sorts the list by each sub-string's length in descending order as the original search algorithm only needed to find the longest sub-string that covered a certain key in the dataset.

With the example of the String "Python", the code will break it into the following list:

['Python', 'Pytho', 'ython', 'Pyth', 'ytho', 'thon', 'Pyt', 'yth', 'tho', 'hon', 'Py', 'yt', 'th', 'ho', 'on', 'P', 'y', 't', 'h', 'o', 'n']

Iterating over all the options🔗

After we have the different parts of a search query as well as calculated some of the more constant values, we can iterate over the given keys. The following variables should be properly initialized before the loop to optimize for performance:

Ensuring input meets the requirements

Depending on what language you're working with, you will want to normalize the given key before moving to the next steps. In the Python and JavaScript version, strings as well as dictionaries (objects in JavaScript) are accepted which means in this step they must all be converted to dictionaries/objects.

In languages such as C# and Java a class can be created to enforce certain parameters for the user whereas in languages like Python and JavaScript this step is necessary.

Iterating over Parts

In this step we iterate over the different parts we previously stored in a list of strings as well as initialize some variables which are set within the loop (or while initalizing) to later check if the current key is eligible to be returned.

For that we need to initialize both the best and construct variable where we store the best part as well as create a pseudo-string of sub-strings that were found in both the key and the search query. Following shows how this is done in the Python version:


    

After looping over all the parts, we can calculate the coverage variable which shows us how much of the search query's parts/letters were found in the key with a certain tolerance.

Iterating over Words

In this step we first initialize some more variables before iterating over the words we previously stored in a list of strings as this iteration will manipulate some of those values:


    

Even though some of these variables could be of the type boolean in typesafe languages, we're using integer for the most part as it allows us to calculate the accuracy later on. With a simple ternary operator the variables can be set to either their maximum value or zero depending on whether the ternary yields True or False.

Now that those variables have been initialized and some of them given a value we can iterate over the words in the search query to set some of those variables' values and then calculate the accuracy:


    

Knowing the coverage value as well as whether the search's words were found in the tags (if there are any for this key), we can decide if the current key is eligible to be added to the list, otherwise return. That is the case if there were either no words found in the tags or the coverage value is below a certain threshold:



    

Splitting into Categories

At this step we know whether the current key will even be returned, so now we can assign it a category and then add it to the list which will be sorted later on and returned at the end of the method/function:


    

The last statement should technically never be executed as the previous if statement should have ensured no more keys make it through, but in the event that one does, the loop will just continue to the next element.

Assigning values to the result

In this step we'll be assigning some of the calculated variables to the object that will be returned as in the next step we'll be using these attributes to sort the list before returning it:



    

Sorting and returning the results

All that we're left with at this point is a list of dictionaries/objects with the attributes key, tags, coverage, accuracy, cat and match. At the end of the function we want to sort the list we'll be returning by accuracy first and then by category as we want to have the list "split" into categories and within the "sub-lists" they're sorted by accuracy:



    

Versions🔗

👷 🚧 Work in progress! 🚧

Python🔗

👷 🚧 Work in progress! 🚧

JavaScript🔗

👷 🚧 Work in progress! 🚧

Java🔗

👷 🚧 Work in progress! 🚧

Demo🔗

Be the first to try Fuzzle over here! The datasets are all open as part of Fuzzle's Git repository.


Search through 1000 movies.

Results