Michael Anthony PRO
MikeDoes
AI & ML interests
Privacy, Large Language Model, Explainable
Recent Activity
reacted to their post 1 day ago
Can you teach a giant like Google's Gemini to protect user privacy? A new step-by-step guide shows that the answer is a resounding "yes."
While powerful, large language models aren't specialized for privacy tasks. This tutorial by Analytics Vidhya walks through how to fine-tune Gemini into a dedicated tool for PII anonymization.
To teach the model this critical skill, the author needed a robust dataset with thousands of clear 'before' and 'after' examples.
We're thrilled they chose the Ai4Privacy pii-masking-200k dataset for this task. Our data provided the high-quality, paired examples of masked and unmasked text necessary to effectively train Gemini to identify and hide sensitive information accurately.
This is a perfect example of how the community can use open-source data to add a crucial layer of safety to the world's most powerful models. Great work!
Check out the full tutorial here: https://www.analyticsvidhya.com/blog/2024/03/guide-to-fine-tuning-gemini-for-masking-pii-data/
Stay updated on the latest in privacy-preserving AI. Follow us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
#DataPrivacy #AI #LLM #FineTuning #Anonymization #GoogleGemini #Ai4Privacy #WorldsLargestOpenPrivacyMaskingDataset
reacted to their post 2 days ago
What if an AI agent could be tricked into stealing your data, just by reading a tool's description? A new paper reports it's possible.
The "Attractive Metadata Attack" paper details this stealthy new threat. To measure the attack's real-world impact, the researchers needed a source of sensitive data for the agent to leak. We're proud that the Ai4Privacy corpus was used to create the synthetic user profiles, containing standardized PII, for their experiments.
This is a perfect win-win. Our open-source data helped researchers Kanghua Mo, ้พๆฑไธ, Zhihao Li from Guangzhou University and The Hong Kong Polytechnic University to not just demonstrate a new attack, but also quantify its potential for harm. This data-driven evidence is what pushes the community to build better, execution-level defenses for AI agents.
Check out their paper to see how easily an agent's trust in tool metadata could be exploited: https://arxiv.org/pdf/2508.02110
#OpenSource #DataPrivacy #LLM #Anonymization #AIsecurity #HuggingFace #Ai4Privacy #Worldslargestopensourceprivacymaskingdataset