QueryAnonymizer is a Python library that allows you to anonymize SQL and DAX queries. The library consists of 2 simple functions that will allow you to effectively anonymize any sensitive data from the queries you are working on before copying it to any online tool you would like to use in your work.
Current Version: 1.1.4
Last Change: 03/2024
PyPI: QueryAnonymizer
SECTION SUMMARY
Learn how to
– Install QueryAnonymizer
– Update QueryAnonymizer
installation
Install QueryAnonymizer library
Binary installers for the latest released version are available at the Python Package Index (PyPI). You can download the latest version of QueryAnonymizer library on your device using pip instruction by entering the code below in the command line:
The query anonymization library is constantly updated. If you have already downloaded the library to your device and want to update it to the latest version, just enter the following code in the command line:
The source code is currently hosted on GitHub
Featured DataTeam Solution:
We’ve crafted a deployable Python script tailored to run QueryAnonymizer seamlessly as an application on your device! This way, you can dedicate your focus to the task at hand without the hassle of dealing with Python code!
SECTION SUMMARY
Learn how to
– Anonymize SQL Query
first demo | Anonymization
SQL Query Anonymization
QueryAnonymizer is a Python library that allows you to anonymize SQL and DAX queries. The following code uses the “anonymize” function to anonymize the sample SQL query contained in the “my_query” parameter. After anonymization, QueryAnonymizer will return the anonymization result – the anonymized query from the parameter: “my_query” and the deanonymization dictionary library to the terminal. Additionally, your anonymized query will be saved in the anonymized_query variable, and the anonymization dictionary will be saved in the decoder_dictionary variable, which is an essential element required during deanonymization, i.e. decoding a previously anonymized query.
from queryanonymizer import anonymize
my_query = """
SELECT
order_id,
customer_id,
order_date,
employee_id,
first_name,
last_name
FROM
orders
WHERE
order_date >= '2023-01-01'
AND customer_id > 148615
AND order_id between 1641 and 2567
and first_name in ('Tom','Seth')
and last_name like "%ski"
"""
anonymize(query=my_query)
provided by DataTeam.plYou can change the value of the “my_query” parameter to your own SQL query and then check how your query has been anonymized. By default, QueryAnonymizer anonymizes all query elements that do not belong to the keywords of the anonymized language. This means that by default all texts, numbers and dates contained in the query will be anonymized. This behavior can be changed using the parameters described in the next example.
SECTION SUMMARY
Learn how to
– Anonymize SQL Query
– Deanonymize SQL Query
– Use Custom Parameteres
- anonymize dates
- anonymize numbers
- anonymize strings
– Use Decoder Dictionary File
Second demo | Anonymization / Deanonymization
SQL Query Anonymization with Custom Anonymization Parameters
Each query may be slightly different, which is why parameterizing the way QueryAnonymizer works is so important. The following example anonymizes the sample SQL query (from the previous example), but the code below has been supplemented with four parameters (described below). Test this code to understand how the parameters work.
from queryanonymizer import anonymize
my_query = """
SELECT
order_id,
customer_id,
order_date,
employee_id,
first_name,
last_name
FROM
orders
WHERE
order_date >= '2023-01-01'
AND customer_id > 148615
AND order_id between 1641 and 2567
and first_name in ('Tom','Seth')
and last_name like "%ski"
"""
anonymized_query = anonymize(query=my_query
, anonymize_dates=False
, anonymize_numbers=False
, anonymize_strings_inside_quotation_marks=True
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plThe above code returns an anonymized query contained in the “my_query” parameter, but the result of this anonymization is different from the result of the previous example. The main difference is the exclusion of dates and numbers from anonymization and the inclusion of characters inside quotation marks in full anonymization. All parameters available in the QueryAnonymizer library can be found described: HERE.
The second change that may be unnoticeable at first is the creation of the decoder_dictionary.json file. This file is an essential element of the QueryAnonymizer, which will be used to deanonymize queries in the next step. The decoder_dictionary.json file contains a dictionary of the original query elements and their anonymized equivalents.
To deanonymize a query that was previously anonymized with QueryAnonymizer, use the “deanonymize” function. The following code uses the file: decoder_dictionary.json to decode the query that you should paste inside the parameter: anonymized_text
from queryanonymizer import deanonymize
anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plAfter deanonymization is completed, you will see in the console the query that has been “decoded” according to the previously saved decoder dictionary.
QueryAnonymizer is and always will be free, but if you want, you can support the development of this project with a one-time donation. Like what we do? Buy us a coffee!
SECTION SUMMARY
Learn how to
– Anonymize SQL from .sql file
– Deanonymize SQL from .txt file
– Use Custom Parameteres
- Path To Query File
- Path To Anonymized File
– Use Decoder Dictionary File
third demo | anonymize / deanonymize
SQL Query Anonymization from .sql file
to .txt file
QueryAnonymizer also includes parameters that allow you to work with files that you have saved locally on your device. In the example below, you will no longer find the “my_query” parameter because the query entered using this parameter has been replaced by specifying the path to the file that contains the original query. In the example below, the SQL query you want to anonymize is in a file called “query.sql”. The file from which QueryAnonymizer is to download the query for anonymization is indicated using the parameter: “path_to_query_file”.
from queryanonymizer import anonymize
anonymize( path_to_query_file="query.sql"
, path_to_anonymized_file="anonymized_file.txt"
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plThe above code also includes the parameter: “path_to_anonymized_file” with which you enter the path to the file in which the query is to be saved after the anonymization process. In the case of the above code, the query after encoding will be saved in a file called: “anonymized_file.txt”. Of course, the code also includes the “path_to_decoder_dictionary_file” parameter, which was discussed in the previous example and will be used to deanonymize the query.
In the following case, the deanonymization process is based on a reference to the appropriate files that were created during the anonymization process. The following code uses the parameter: “path_to_anonymized_file” which indicates the path to the file containing the query you want to anonymize. Note that in the example below this file is named: “answer_file.txt” and it was not automatically generated during anonymization. The “answer_file.txt” file is a text file into which you will paste your anonymized query after formatting it or working with AI tools.
from queryanonymizer import deanonymize
deanonymize(path_to_anonymized_file='answer_file.txt'
, path_to_deanonymized_file='deanonymized_file.txt'
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plAs in the case of anonymization in this example, the deanonymization result is not printed in the terminal and instead is saved to the “deanonymized_file.txt” file. The path to the file in which the deanonymization results are to be saved is indicated using the parameter: “path_to_deanonymized_file”
SECTION SUMMARY
Learn how to
– Anonymize SQL Query
– Deanonymize SQL Query
– Use Decoder Dictionary File
– Use <PROMPT></PROMPT>
fourth demo | Using <PROMPT> tag
SQL Query Anonymization with ChatGPT prompt between queries
QueryAnonymizer was created mainly to enable and facilitate working with ChatGPT during query development. One of the additional functions apart from anonymization is the ability to introduce a prompt that describes how ChatGPT should process the submitted query. The following example shows the use of the <PROMPT> tag directly in the “my_query” parameter. Note that the “my_Query” parameter in the ChataGPT prompts has all the information you might want to pass on to an AI tool.
Each prompt you want to add to the query should be enclosed between <PROMPT> and </PROMPT> tags. The text contained in these tags will be anonymized, but only to the extent of the attached SQL query. This means that if you refer to an SQL query element in the prompt, only the elements of that query will be anonymized.
from queryanonymizer import anonymize
my_query = """
<PROMPT>Combine those two queries to get one SELECT statement with all columns</PROMPT>
SELECT
order_id,
customer_id,
order_date,
employee_id,
product_id
FROM
orders
<PROMPT>The following query connects to the above using the product_id field</PROMPT>
SELECT
product_id,
product_name,
product_category,
product_price
FROM
products
"""
anonymize(query=my_query
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plExecute the above code and see how QueryAnonymizer encoded all the elements. Enter ChatGPT and paste the entire “my_query” anonymization result that was returned to the terminal.
Using the <PROMPT> tag makes sense primarily if you expect ChatGPT to write an SQL query for you from the code elements you provide it. You can also use this tag to add comments that you want to include in the ChatGPT prompt so that it better understands what action and what result you expect from it.
from queryanonymizer import deanonymize
anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plDeanonymization works in the same way as in the previous examples already described on this page. Paste the content of the ChataGPT response into the “anonymized_text” parameter and then execute the code to perform deanonymization.
If you use our query anonymization solution or have developed your own solution using our library, post about it and tag us with our #QueryAnonymizer
SECTION SUMMARY
Learn how to
– Anonymize DAX Code
– Deanonymize DAX Code
– Use Keywords Group
– Use Decoder Dictionary File
Fifth demo | Anonymize DAX
DAX Code Anonymization
QueryAnonymizer allows you to anonymize not only SQL queries but also DAX code. The principle of operation of the script remains unchanged, and all parameters described in the previous examples (and in the parameters section) also apply to anonymizing the DAX code. The anonymization language is selected using the “keywords_group” parameter.
Try the code from the example below. Note that the value of the “my_query” parameter is a DAX query, and the parameters section also consists of active parameters from the “anonymize_string_inside…” group, which we recommend to always be set to TRUE when anonymizing DAX code.
from queryanonymizer import anonymize
my_query = """
DEFINE
MEASURE 'Internet Sales'[Internet Total Sales] =
SUM ( 'Internet Sales'[Sales Amount] )
EVALUATE
SUMMARIZECOLUMNS (
'Date'[Calendar Year],
TREATAS (
{
2013,
2014
},
'Date'[Calendar Year]
),
"Total Sales", [Internet Total Sales],
"Combined Years Total Sales",
CALCULATE (
[Internet Total Sales],
ALLSELECTED ( 'Date'[Calendar Year] )
)
)
ORDER BY [Calendar Year]
"""
anonymize(query=my_query
, keywords_group="DAX"
, anonymize_strings_inside_square_brackets=True
, anonymize_strings_inside_apostrophes=True
, anonymize_strings_inside_quotation_marks=True
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plDeanonymizing previously anonymized DAX code is done in the same way as deanonymizing SQL code. When deanonymizing, you no longer need to enter the “keywords_group” parameter – you only need to indicate the code to be deanonymized.
from queryanonymizer import deanonymize
anonymized_text = """
// Copy Your anonymized query HERE
"""
deanonymize(anonymized_text
, path_to_decoder_dictionary_file="decoder_dictionary.json")
provided by DataTeam.plDAX code anonymization is useful when working with GenAI tools such as ChatGPT, code formatting tools such as DAX Formatter, or when using public forums to get help with a complex problem. Remember that anonymization only encodes sensitive data, the query/code structure remains unchanged, so any tools you would like to use to format or develop your code will still recognize this structure and will be able to work without problems.
SECTION SUMMARY
Learn how to
– Use functions parameters
– Use Anonymize Parameters
– Use DeAnonymize Parameters
– Use KeyWords
PARAMETERS
QueryAnonymizer Parameters
QueryAnonymizer is a library that consists of three basic functions: anonymize, deanonymize and keywords. The default operation of each of these functions can be changed using the parameters described below. Using each parameter will change the logic of QueryAnonymizer, so read the description below before making changes to your code.
Click on the selected function below to expand the list of all available parameters:
featured case studies
In the digital era, AI models like ChatGPT have revolutionized how we interact with technology. However, this advancement brings a critical concern: data safety.
This article delves into these risks and the importance of using tools like QueryAnonymizer.
If you use our query anonymization solution or have developed your own solution using our library, post about it and tag us with our #QueryAnonymizer
Frequently Asked Questions (FAQ)
Why was QueryAnonymizer created and how does it work?
The library allows you to replace sensitive data, such as user ids, emails, or personally identifiable information, with placeholder values. This ensures that sensitive information is not exposed when sharing queries.
Is QueryAnonymizer free? Are there any paid features?
Yes, QueryAnonymizer is a free Python library and we will never expect any payment from you for using it. However, if you want to support the DataTeam developers who created this library, you can buy us a coffee (or a few :D) using THIS LINK
How does QueryAnonymizer allow me to prevent SQL Injection?
By anonymizing queries, QueryAnonymizer helps prevent SQL injection attacks. It sanitizes input values and ensures that user-supplied data cannot be maliciously interpreted as part of the SQL query.
What benefit do I gain by anonymizing my SQL queries?
By anonymizing queries, you reduce the likelihood of unauthorized access or misuse of data, enhancing the overall security posture of your application or system.
Will using QueryAnonymizer make my security team happy?
Yes. Your security team will be very happy 😀 but remember to always comply with the internal policy of the company you work for.
Why it is important to anonymize my SQL query before i copy it to ChatGPT?
Anonymizing SQL queries before sharing them is critical for protecting sensitive information, preventing security breaches, preserving privacy, mitigating the risk of attacks, and ensuring compliance with regulations.
How does QueryAnonymizer help me comply with regulations?
For applications subject to data privacy regulations like GDPR or HIPAA, anonymizing queries is crucial for compliance. QueryAnonymizer library assists in anonymizing queries to ensure that sensitive data is handled in accordance with regulatory requirements.
drop us a line!
Hey there! Are you looking to share your thoughts, seek advice, or explore potential collaborations? Our contact form is your direct line to us. Feel free to drop us a line about anything – from inquiries about our services to suggestions for improvement.
Fill out the contact form and let’s kickstart a conversation that could lead to great things. We’re eagerly awaiting your message!
Mariusz Cieciura & Mateusz Cieciura
DataTeam.pl
© 2024 | DataTeam | All rights reserved
www.datateam.pl | Szczecin, Poland