DATATEAM.PL PRESENTS
QueryAnonymizer Logo B&W

QueryAnonymizer is a Python library that allows you to anonymize SQL and DAX queries. The library consists of 2 simple functions that will allow you to effectively anonymize any sensitive data from the queries you are working on before copying it to any online tool you would like to use in your work.

Current Version: 1.1.4
Last Change: 03/2024
PyPI: QueryAnonymizer

QueryAnonymizer Explainer Video Main Page Placeholder
Go to: Instructions
Go to: Case Studies
Go to: FAQ

SECTION SUMMARY
Learn how to

– Install QueryAnonymizer
– Update QueryAnonymizer

installation

Install QueryAnonymizer library

Binary installers for the latest released version are available at the Python Package Index (PyPI). You can download the latest version of QueryAnonymizer library on your device using pip instruction by entering the code below in the command line:

Query Anonymizer Installation
pip install queryanonymizer
provided by DataTeam.pl

The query anonymization library is constantly updated. If you have already downloaded the library to your device and want to update it to the latest version, just enter the following code in the command line:

Query Anonymizer Update
pip install queryanonymizer --upgrade
provided by DataTeam.pl

The source code is currently hosted on GitHub

QueryAnonymizer Application Video Placeholder

Featured DataTeam Solution:

Run_QA_as_app

We’ve crafted a deployable Python script tailored to run QueryAnonymizer seamlessly as an application on your device! This way, you can dedicate your focus to the task at hand without the hassle of dealing with Python code!

Download the script (.py)

SECTION SUMMARY
Learn how to

– Anonymize SQL Query

first demo | Anonymization

SQL Query Anonymization

QueryAnonymizer is a Python library that allows you to anonymize SQL and DAX queries. The following code uses the „anonymize” function to anonymize the sample SQL query contained in the „my_query” parameter. After anonymization, QueryAnonymizer will return the anonymization result – the anonymized query from the parameter: „my_query” and the deanonymization dictionary library to the terminal. Additionally, your anonymized query will be saved in the anonymized_query variable, and the anonymization dictionary will be saved in the decoder_dictionary variable, which is an essential element required during deanonymization, i.e. decoding a previously anonymized query.

SQL Query Anonymization | Example 1
from queryanonymizer import anonymize

my_query = """
SELECT 
  order_id, 
  customer_id, 
  order_date, 
  employee_id,
  first_name, 
  last_name 
FROM 
  orders 
WHERE 
  order_date >= '2023-01-01' 
  AND customer_id > 148615 
  AND order_id between 1641 and 2567
  and first_name in ('Tom','Seth') 
  and last_name like "%ski"
"""

anonymize(query=my_query)
provided by DataTeam.pl

You can change the value of the „my_query” parameter to your own SQL query and then check how your query has been anonymized. By default, QueryAnonymizer anonymizes all query elements that do not belong to the keywords of the anonymized language. This means that by default all texts, numbers and dates contained in the query will be anonymized. This behavior can be changed using the parameters described in the next example.

SECTION SUMMARY
Learn how to

– Anonymize SQL Query
– Deanonymize SQL Query
– Use Custom Parameteres

  • anonymize dates
  • anonymize numbers
  • anonymize strings

– Use Decoder Dictionary File

Second demo | Anonymization / Deanonymization

SQL Query Anonymization with Custom Anonymization Parameters

Each query may be slightly different, which is why parameterizing the way QueryAnonymizer works is so important. The following example anonymizes the sample SQL query (from the previous example), but the code below has been supplemented with four parameters (described below). Test this code to understand how the parameters work.

SQL Query Anonymization | Example 2
from queryanonymizer import anonymize

my_query = """
SELECT 
  order_id, 
  customer_id, 
  order_date, 
  employee_id,
  first_name, 
  last_name 
FROM 
  orders 
WHERE 
  order_date >= '2023-01-01' 
  AND customer_id > 148615 
  AND order_id between 1641 and 2567
  and first_name in ('Tom','Seth') 
  and last_name like "%ski"
"""

anonymized_query = anonymize(query=my_query
          , anonymize_dates=False
          , anonymize_numbers=False
          , anonymize_strings_inside_quotation_marks=True
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

The above code returns an anonymized query contained in the „my_query” parameter, but the result of this anonymization is different from the result of the previous example. The main difference is the exclusion of dates and numbers from anonymization and the inclusion of characters inside quotation marks in full anonymization. All parameters available in the QueryAnonymizer library can be found described: HERE.

The second change that may be unnoticeable at first is the creation of the decoder_dictionary.json file. This file is an essential element of the QueryAnonymizer, which will be used to deanonymize queries in the next step. The decoder_dictionary.json file contains a dictionary of the original query elements and their anonymized equivalents.

To deanonymize a query that was previously anonymized with QueryAnonymizer, use the „deanonymize” function. The following code uses the file: decoder_dictionary.json to decode the query that you should paste inside the parameter: anonymized_text

SQL Query Deanonymization | Example 2
from queryanonymizer import deanonymize

anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

After deanonymization is completed, you will see in the console the query that has been „decoded” according to the previously saved decoder dictionary.

QueryAnonymizer is and always will be free, but if you want, you can support the development of this project with a one-time donation. Like what we do? Buy us a coffee!

black-button

SECTION SUMMARY
Learn how to

– Anonymize SQL from .sql file
– Deanonymize SQL from .txt file
– Use Custom Parameteres

  • Path To Query File
  • Path To Anonymized File

– Use Decoder Dictionary File

third demo | anonymize / deanonymize

SQL Query Anonymization from .sql file
to .txt file

QueryAnonymizer also includes parameters that allow you to work with files that you have saved locally on your device. In the example below, you will no longer find the „my_query” parameter because the query entered using this parameter has been replaced by specifying the path to the file that contains the original query. In the example below, the SQL query you want to anonymize is in a file called „query.sql”. The file from which QueryAnonymizer is to download the query for anonymization is indicated using the parameter: „path_to_query_file”.

SQL Query Anonymization to txt file | Example 3
from queryanonymizer import anonymize

anonymize(  path_to_query_file="query.sql"
          , path_to_anonymized_file="anonymized_file.txt"
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

The above code also includes the parameter: „path_to_anonymized_file” with which you enter the path to the file in which the query is to be saved after the anonymization process. In the case of the above code, the query after encoding will be saved in a file called: „anonymized_file.txt”. Of course, the code also includes the „path_to_decoder_dictionary_file” parameter, which was discussed in the previous example and will be used to deanonymize the query.

In the following case, the deanonymization process is based on a reference to the appropriate files that were created during the anonymization process. The following code uses the parameter: „path_to_anonymized_file” which indicates the path to the file containing the query you want to anonymize. Note that in the example below this file is named: „answer_file.txt” and it was not automatically generated during anonymization. The „answer_file.txt” file is a text file into which you will paste your anonymized query after formatting it or working with AI tools.

SQL Query Deanonymization from txt file to txt file | Example 3
from queryanonymizer import deanonymize

deanonymize(path_to_anonymized_file='answer_file.txt'
          , path_to_deanonymized_file='deanonymized_file.txt'
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

As in the case of anonymization in this example, the deanonymization result is not printed in the terminal and instead is saved to the „deanonymized_file.txt” file. The path to the file in which the deanonymization results are to be saved is indicated using the parameter: „path_to_deanonymized_file”

SECTION SUMMARY
Learn how to

– Anonymize SQL Query
– Deanonymize SQL Query
– Use Decoder Dictionary File
– Use <PROMPT></PROMPT>

fourth demo | Using <PROMPT> tag

SQL Query Anonymization with ChatGPT prompt between queries

QueryAnonymizer was created mainly to enable and facilitate working with ChatGPT during query development. One of the additional functions apart from anonymization is the ability to introduce a prompt that describes how ChatGPT should process the submitted query. The following example shows the use of the <PROMPT> tag directly in the „my_query” parameter. Note that the „my_Query” parameter in the ChataGPT prompts has all the information you might want to pass on to an AI tool.

Each prompt you want to add to the query should be enclosed between <PROMPT> and </PROMPT> tags. The text contained in these tags will be anonymized, but only to the extent of the attached SQL query. This means that if you refer to an SQL query element in the prompt, only the elements of that query will be anonymized.

SQL Query Anonymization with <PROMPT> | Example 4
from queryanonymizer import anonymize

my_query = """
<PROMPT>Combine those two queries to get one SELECT statement with all columns</PROMPT>

SELECT 
  order_id, 
  customer_id, 
  order_date, 
  employee_id,
  product_id   
FROM 
  orders 

<PROMPT>The following query connects to the above using the product_id field</PROMPT>

SELECT 
  product_id,
  product_name,
  product_category,
  product_price
FROM 
  products
"""
anonymize(query=my_query
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

Execute the above code and see how QueryAnonymizer encoded all the elements. Enter ChatGPT and paste the entire „my_query” anonymization result that was returned to the terminal.

Using the <PROMPT> tag makes sense primarily if you expect ChatGPT to write an SQL query for you from the code elements you provide it. You can also use this tag to add comments that you want to include in the ChatGPT prompt so that it better understands what action and what result you expect from it.

SQL Query Anonymization with <PROMPT> | Example 4
from queryanonymizer import deanonymize

anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

Deanonymization works in the same way as in the previous examples already described on this page. Paste the content of the ChataGPT response into the „anonymized_text” parameter and then execute the code to perform deanonymization.

QueryAnonymizer Logo Hash

If you use our query anonymization solution or have developed your own solution using our library, post about it and tag us with our #QueryAnonymizer

SECTION SUMMARY
Learn how to

– Anonymize DAX Code
– Deanonymize DAX Code
– Use Keywords Group
– Use Decoder Dictionary File

Fifth demo | Anonymize DAX

DAX Code Anonymization

QueryAnonymizer allows you to anonymize not only SQL queries but also DAX code. The principle of operation of the script remains unchanged, and all parameters described in the previous examples (and in the parameters section) also apply to anonymizing the DAX code. The anonymization language is selected using the „keywords_group” parameter.

Try the code from the example below. Note that the value of the „my_query” parameter is a DAX query, and the parameters section also consists of active parameters from the „anonymize_string_inside…” group, which we recommend to always be set to TRUE when anonymizing DAX code.

DAX Anonymization | Example 5
from queryanonymizer import anonymize

my_query = """
DEFINE
    MEASURE 'Internet Sales'[Internet Total Sales] =
        SUM ( 'Internet Sales'[Sales Amount] )

EVALUATE
SUMMARIZECOLUMNS (
    'Date'[Calendar Year],
    TREATAS (
        {
            2013,
            2014
        },
        'Date'[Calendar Year]
    ),
    "Total Sales", [Internet Total Sales],
    "Combined Years Total Sales",
        CALCULATE (
            [Internet Total Sales],
            ALLSELECTED ( 'Date'[Calendar Year] )
        )
)
ORDER BY [Calendar Year]
"""
anonymize(query=my_query
          , keywords_group="DAX"
          , anonymize_strings_inside_square_brackets=True
          , anonymize_strings_inside_apostrophes=True
          , anonymize_strings_inside_quotation_marks=True
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

Deanonymizing previously anonymized DAX code is done in the same way as deanonymizing SQL code. When deanonymizing, you no longer need to enter the „keywords_group” parameter – you only need to indicate the code to be deanonymized.

DAX Anonymization | Example 5
from queryanonymizer import deanonymize

anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
          , path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.pl

DAX code anonymization is useful when working with GenAI tools such as ChatGPT, code formatting tools such as DAX Formatter, or when using public forums to get help with a complex problem. Remember that anonymization only encodes sensitive data, the query/code structure remains unchanged, so any tools you would like to use to format or develop your code will still recognize this structure and will be able to work without problems.

SECTION SUMMARY
Learn how to

– Use functions parameters
– Use Anonymize Parameters
– Use DeAnonymize Parameters
– Use KeyWords

PARAMETERS

QueryAnonymizer Parameters

QueryAnonymizer is a library that consists of three basic functions: anonymize, deanonymize and keywords. The default operation of each of these functions can be changed using the parameters described below. Using each parameter will change the logic of QueryAnonymizer, so read the description below before making changes to your code.

Click on the selected function below to expand the list of all available parameters:

Anonymize Function

Query

Query is a parameter with which you can enter the content of the query you want to anonymize. The entire content of this parameter will be anonymized. Parameter type: String

query: str = ""

Keywords Group

Keywords Group is a parameter that allows you to select a predefined group of keywords. Keywords are a dictionary of words that are to be interpreted by the script as query components and by default they are to be excluded from anonymization. By using the appropriate Keywords Group in your queries, only sensitive data will be anonymized and not the entire query entered.

keywords_group=KeywordsGroup.SQL

Custom Keywords

Custom Keywords is a parameter that allows you to enter your own (custom) keywords, which will be added to the default keyword dictionary (Keywords Group) of your choice. Keywords entered into the script using this parameter will not be anonymized.

custom_keywords: List[str] = []

Custom Tokens

Custom Tokens is a parameter that allows you to enter a list of keywords that are to be anonymized even though they are in the anonymization dictionary (Keywords Group). The use of this parameter is used when you want to indicate individual keywords that are to be omitted in the anonymization process because you need to refer to them, e.g. when working with ChatGPT

custom_tokens: List[str] = []

Custom Encoder Dictionary

Custom Encoder Dictionary is a parameter that will allow you to enter your own custom anonymization dictionary (Decoder Dictionary File). Using this parameter is useful when you plan to use one previously created anonymization dictionary multiple times. By using this parameter, you can use the dictionary once created so that anonymization returns the same strings of characters each time as anonymized values for the original query elements.

custom_encoder_dictionary: dict = {}

Prompt

Prompt is a parameter that allows you to enter the content you want to use as a prompt when working with an anonymized query in GenAI tools. The prompt you enter will search for words stored in the decoder dictionary and then the prompt will be anonymized using this dictionary. Thanks to this, when you refer to the elements of the original query in the prompt, you will obtain a prompt that will be compatible with the anonymization in the query.

prompt: str = ""

Anonymize String | Inside Apostrophes

Anonymize String Inside Apostrophes is a parameter that allows you to force anonymization of strings that have been marked with an apostrophe, even if the elements in them prevent anonymization. By default, the value of this parameter is set to „False„. Changing the parameter value to „True” will cause each string inside apostrophes to be anonymized.

anonymize_strings_inside_apostrophes: bool = False

Anonymize String | Inside Quotation Marks

Anonymize String Inside Quotation Marks is a parameter that allows you to force anonymization of strings that have been marked with quotation marks, even if the elements in them prevent anonymization. By default, the value of this parameter is set to „False„. Changing the parameter value to „True” will cause each string inside quotation marks to be anonymized.

anonymize_strings_inside_quotation_marks: bool = False

Anonymize String | Inside Square Brackets

Anonymize String Inside Square Brackets is a parameter that allows you to force anonymization of strings that have been marked with square brackets, even if the elements in them prevent anonymization. By default, the value of this parameter is set to „False„. Changing the parameter value to „True” will cause each string inside square brackets to be anonymized.

anonymize_strings_inside_square_brackets: bool = False

Anonymize String | Inside Curly Brackets

Anonymize String Inside Curly Brackets is a parameter that allows you to force anonymization of strings that have been marked with curly brackets, even if the elements in them prevent anonymization. By default, the value of this parameter is set to „False„. Changing the parameter value to „True” will cause each string inside curly brackets to be anonymized.

anonymize_strings_inside_curly_brackets: bool = False

Anonymize Numbers

Anonymize Numbers is a parameter that allows you to force anonymization of all detected numerical values in an anonymized query. By default, the value of this parameter is set to: True, which means that all numeric values are anonymized by QueryAnonymizer by default. If you want numeric values to remain non-anonymous, set this parameter to False

anonymize_numbers: bool = True

Anonymize Dates

Anonymize Dates is a parameter that allows you to anonymize all detected dates in an anonymized query. By default, the value of this parameter is set to: True, which means that each detected date in the entered query will be replaced with an anonymized equivalent. If you want to leave the dates in non-anonymized form, set the value of this parameter to: False

anonymize_dates: bool = True

Word Length

(min) Word Length is a parameter that specifies the minimum length of a string of characters so that the string of characters is considered by the script as a word to be anonymized. By default, the value of this parameter is set to: 3, which means that every string longer than 3 letters (unless it is a keyword included in the Keywords Group) will be anonymized. You can freely change the value of this parameter, but in most queries QueryAnonymizer works best with the value: 3

min_word_length: int = 3

Path To Query File

Path To Query File is a parameter that allows you to enter the path to the file in which you saved your query that you want to anonymize. You can anonymize your queries directly from .sql or .txt files without having to copy them manually into the script (as the parameter value: query).

path_to_query_file: str = ""

Path To Custom Keywords File

Path To Custom Keywords is a parameter that allows you to enter the path to the file in which you have saved a list of keywords to be added to the Keywords Group and to be omitted when anonymizing your query. Creating your own list of custom keywords is useful when you often anonymize your specific queries and want to be sure that the keywords you choose will always be correctly omitted during anonymization.

path_to_custom_keywords_file: str = ""

Path To Custom Tokens File

Path To Custom Tokens is a parameter that allows you to enter the path to the file in which you saved the list of custom tokens, i.e. words that are always to be anonymized, even though they may be in the Keywords Group. Creating a list of custom tokens is useful when you often anonymize specific queries and want to be sure that the words you choose will be omitted in the anonymization process every time.

path_to_custom_tokens_file: str = ""

Path To Prompt File

Path To Prompt File is a parameter that allows you to enter the path to the file in which you saved the content of the prompt you want to anonymize. The principle of prompt anonymization is described in the section: Prompt and remains unchanged when using the Path To Prompt File parameter.

path_to_prompt_file: str = ""

Path To Anonymized File

Path To Anonymized File is a parameter that will allow you to enter the path to the file in which the anonymization results are to be saved. This is not a parameter used to read data – it is only used to save results locally on your device. Creating a file containing an anonymized query is especially convenient when you are working with extremely long queries (and/or) with queries that have been formatted.

path_to_anonymized_file: str = ""

Path To Custom Encoder Dictionary File

Path To Custom Encoder Dictionary File is a parameter that allows you to indicate the path to the file containing the anonymization dictionary prepared by you, e.g. based on previous anonymization. Indicating a path to a dictionary is also used in conjunction with the Path To Decoder Dictionary File parameter. Setting the same path in Path To Decoder Dictionary File and Path To Custom Encoder Dictionary File causes the system to use the dictionary from the previous anonymization, each time supplementing it with new words, if any appear.

path_to_custom_encoder_dictionary_file: str = ""

Path To Decoder Dictionary File

Path To Decoder Dictionary File is a parameter that allows you to enter the path to the file where the anonymization dictionary will be saved, which QueryAnonymizer uses when deanonymizing queries. An anonymization dictionary is a set of original names and their assigned anonymous equivalents for each anonymized element of your query.

path_to_decoder_dictionary_file: str = ""

Custom Date Format

Custom Date Format is a parameter that allows you to enter your chosen date format, which QueryAnonymizer will use to detect dates in the queries you enter, during the anonymization process.

custom_date_format: str = 'YYYY-MM-DD'

Custom Time Format

Custom Time Format is a parameter that allows you to enter your chosen time format, which QueryAnonymizer will use to detect time elements in the queries you enter, during the anonymization process.

custom_time_format: str = 'HH:mm:ss'

Custom DateTime Format

Custom DateTime Format is a parameter that allows you to enter your chosen datetime format, which QueryAnonymizer will use to detect datetime fields in the queries you enter, during the anonymization process.

custom_datetime_format: str = 'YYYY-MM-DD HH:mm:ss'

Print Result

TBA

print_result: bool = True
Deanonymize Function

Anonymized Text

The value of this parameter is the content of the anonymized query that is to be deanonymized, entered as a string

anonymized_text: str = ""

Path To Anonymized File

The value of this parameter is the path to the file containing the content of the anonymized query that is to be deanonymized

path_to_anonymized_file: str = ""

Path To Deanonymized File

The value of this parameter is the path to the file in which the deanonymization result is to be saved

path_to_deanonymized_file: str = ""

Decoder Dictionary

The value of this parameter is the anonymization decoder that the script will use to decode the anonymized query

decoder_dictionary: dict = {}

Path To Decoder Dictionary

The value of this parameter is the path to the previously created anonymization decoder dictionary

path_to_decoder_dictionary_file: str = ""
Keywords Group

Keywords Group: SQL

Setting the value to „SQL” will read the dictionary keywords assigned to the SQL group.

keywords_group="SQL"

Keywords Group: TSQL

Setting the value to „TSQL” will read the dictionary keywords assigned to the TSQL group.

keywords_group="TSQL"

Keywords Group: PLSQL

Setting the value to „PLSQL” will read the dictionary keywords assigned to the PLSQL group.

keywords_group="PLSQL"

Keywords Group: MySQL

Setting the value to „MySQL” will read the dictionary keywords assigned to the MySQL group.

keywords_group="MySQL"

Keywords Group: DAX

Setting the value to „DAX” will read the dictionary keywords assigned to the DAX group.

keywords_group="DAX"

Keywords Group: CUSTOM_ONLY

Setting the value „CUSTOM_ONLY” will load the dictionary keywords assigned to the custom group created by the user

keywords_group="CUSTOM_ONLY"
QueryAnonymizer CaseStudy Placeholder

featured case studies

Safeguarding Data in the Age of AI Title

In the digital era, AI models like ChatGPT have revolutionized how we interact with technology. However, this advancement brings a critical concern: data safety.

This article delves into these risks and the importance of using tools like QueryAnonymizer.

Read this article
QueryAnonymizer Logo Hash

If you use our query anonymization solution or have developed your own solution using our library, post about it and tag us with our #QueryAnonymizer

Frequently Asked Questions (FAQ)

Why was QueryAnonymizer created and how does it work?

The library allows you to replace sensitive data, such as user ids, emails, or personally identifiable information, with placeholder values. This ensures that sensitive information is not exposed when sharing queries.

Is QueryAnonymizer free? Are there any paid features?

Yes, QueryAnonymizer is a free Python library and we will never expect any payment from you for using it. However, if you want to support the DataTeam developers who created this library, you can buy us a coffee (or a few :D) using THIS LINK

How does QueryAnonymizer allow me to prevent SQL Injection?

By anonymizing queries, QueryAnonymizer helps prevent SQL injection attacks. It sanitizes input values and ensures that user-supplied data cannot be maliciously interpreted as part of the SQL query.

What benefit do I gain by anonymizing my SQL queries?

By anonymizing queries, you reduce the likelihood of unauthorized access or misuse of data, enhancing the overall security posture of your application or system.

Will using QueryAnonymizer make my security team happy?

Yes. Your security team will be very happy 😀 but remember to always comply with the internal policy of the company you work for.

Why it is important to anonymize my SQL query before i copy it to ChatGPT?

Anonymizing SQL queries before sharing them is critical for protecting sensitive information, preventing security breaches, preserving privacy, mitigating the risk of attacks, and ensuring compliance with regulations.

How does QueryAnonymizer help me comply with regulations?

For applications subject to data privacy regulations like GDPR or HIPAA, anonymizing queries is crucial for compliance. QueryAnonymizer library assists in anonymizing queries to ensure that sensitive data is handled in accordance with regulatory requirements.

Watch

drop us a line!

QueryAnonymizer Logo B&W

Hey there! Are you looking to share your thoughts, seek advice, or explore potential collaborations? Our contact form is your direct line to us. Feel free to drop us a line about anything – from inquiries about our services to suggestions for improvement.

Fill out the contact form and let’s kickstart a conversation that could lead to great things. We’re eagerly awaiting your message!

Mariusz Cieciura & Mateusz Cieciura
DataTeam.pl






    QueryAnonymizer Logo Color

    © 2024 | DataTeam | All rights reserved

    www.datateam.pl  |  Szczecin, Poland