Tools

OpenAI Open-Sources 'Privacy Filter' — a 1.5B Parameter Model That Strips Personal Data From Text

OpenAI has released Privacy Filter under Apache 2.0 — a compact 1.5 billion parameter model that detects and redacts eight categories of PII including names, addresses, phone numbers, and API keys. The model runs locally on a laptop, processes 128K token contexts in a single pass, and is designed as a pre-processing layer before feeding sensitive text to larger AI models.

D.O.T.S AI Newsroom

AI News Desk

5 min read

OpenAI has released Privacy Filter, an open-source model designed to detect and redact personally identifiable information from text before it is processed by downstream AI systems. The model is compact — 1.5 billion parameters total, with only 50 million active parameters per inference request — and is designed to run locally on a laptop or directly in a browser without any cloud dependency. Privacy Filter is available under the Apache 2.0 license on both GitHub and Hugging Face, with commercial use explicitly permitted. The release addresses a practical problem that has limited enterprise adoption of AI tools: many organizations cannot send raw documents, emails, or customer records to cloud AI APIs because those texts contain PII that is subject to GDPR, HIPAA, CCPA, or other data protection requirements.
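
The pre-processing pattern the release targets can be sketched in a few lines. The detector below is a regex stand-in invented for illustration, not Privacy Filter's actual interface, and `send_to_cloud_llm` is a placeholder for whatever hosted API the pipeline would call:

```python
# Sketch of a local PII-scrubbing layer in front of a cloud API.
# The regex "detector" is a hypothetical stand-in for the model.
import re

# Stand-in patterns for two of the eight categories the article lists.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub(text: str) -> str:
    """Redact PII locally before the text ever leaves the machine."""
    for category, pattern in PATTERNS.items():
        text = pattern.sub(f"[{category}]", text)
    return text

def send_to_cloud_llm(prompt: str) -> None:
    # Placeholder: only scrubbed text would be sent over the network.
    print("outbound:", prompt)

send_to_cloud_llm(scrub("Reach me at alice@corp.example or +1 555 867 5309."))
# outbound: Reach me at [EMAIL] or [PHONE].
```

The point of the design is that the sensitive original never crosses the network boundary; only the redacted copy reaches the cloud model.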

What It Detects

Privacy Filter is trained to identify eight categories of sensitive content: names, postal addresses, email addresses, phone numbers, URLs, dates, account numbers (including credit cards and social security numbers), and "other secrets" such as passwords, API keys, and authentication tokens. The model makes a single pass through the input text and labels each span that belongs to one of these categories — it does not generate new text, and it does not attempt to understand the semantic meaning of what it reads. This architecture makes it fast and predictable: a 128,000-token context window means it can process a long document or a substantial chat history in one operation, and the labeling approach produces structured output that downstream systems can act on programmatically rather than requiring text parsing.
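
A downstream system consuming that structured span output might look like the following sketch. The `Span` type and the offsets are hypothetical; the article does not specify the model's output schema, only that it labels spans rather than generating text:

```python
# Consuming hypothetical span-label output from a PII detector.
from dataclasses import dataclass

@dataclass
class Span:
    start: int    # inclusive character offset
    end: int      # exclusive character offset
    category: str

def redact(text: str, spans: list[Span]) -> str:
    """Replace each labeled span with a [CATEGORY] placeholder.

    Spans are applied right-to-left so earlier offsets stay valid."""
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.category}]" + text[span.end:]
    return text

text = "Contact Jane Doe at jane@example.com or +1-555-0199."
spans = [
    Span(8, 16, "NAME"),
    Span(20, 36, "EMAIL"),
    Span(40, 51, "PHONE"),
]
print(redact(text, spans))
# Contact [NAME] at [EMAIL] or [PHONE].
```

Because the output is offsets plus categories rather than rewritten text, callers can redact, tokenize, hash, or log the flagged spans however their compliance rules require.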

Tunable Sensitivity

Users can adjust the model's sensitivity threshold to control the tradeoff between recall and precision. High-recall settings catch more PII but produce more false positives — flagging non-sensitive text as personal data. Conservative settings reduce false positives but risk missing some PII. For regulated industries, OpenAI recommends starting with high-recall settings and using human review to catch errors at the boundary. The model also supports fine-tuning on domain-specific datasets, which is important for industries with specialized PII patterns — healthcare records contain different sensitive-data structures than financial services documents, and a fine-tuned variant will outperform the base model in domain-specific deployments.
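
The tradeoff is easy to see if you imagine thresholding per-span confidence scores. The scores below are made up for illustration, and Privacy Filter's actual scoring interface may differ:

```python
# Illustration of the recall/precision tradeoff when thresholding
# per-span confidence scores. Scores and spans are invented.

def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall

# (span_id, confidence) pairs from a hypothetical detector run
scored = [("name_1", 0.97), ("email_1", 0.92), ("date_1", 0.55), ("url_1", 0.40)]
gold = {"name_1", "email_1", "date_1"}   # spans that really are PII

for threshold in (0.3, 0.6, 0.9):
    kept = {sid for sid, conf in scored if conf >= threshold}
    p, r = precision_recall(kept, gold)
    print(f"threshold={threshold}: precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold drives recall toward 1.0 at the cost of precision (more clean text flagged); raising it does the reverse, which is why OpenAI's guidance for regulated industries is to bias toward recall and let human review absorb the false positives.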

Honest About Limitations

OpenAI's documentation for Privacy Filter is notably candid about what the model cannot do. It does not provide a legal guarantee of anonymization or GDPR compliance — it is a technical tool, not a legal instrument. Known failure modes include reduced accuracy on rare or regionally uncommon names, false positives for well-known public figures and organizations, and degraded performance on non-English text and non-Latin scripts. For sensitive deployments in healthcare, law, finance, or human resources, OpenAI explicitly recommends maintaining human review alongside automated redaction. The label categories are also fixed at inference time — organizations that need custom PII categories (e.g., proprietary product codes or internal ID formats) must fine-tune the model rather than adjusting behavior through prompting. Despite these caveats, Privacy Filter fills a real gap in the open-source tooling ecosystem: a lightweight, locally runnable PII redaction model with a permissive license is genuinely useful infrastructure for organizations building privacy-preserving AI pipelines.


Related Stories

Astropad's Workbench Turns a Mac Mini Into an AI Agent Server You Control From Your Phone
Tools

Astropad, the company behind the Luna Display hardware that lets iPads function as Mac monitors, has built a new product for a new era: Workbench lets users remotely monitor and control AI agents running on Mac Minis from an iPhone or iPad. It is remote desktop software reimagined not for IT support but for the AI agent operator — the person who needs to check on autonomous workflows without being at their desk.

D.O.T.S AI Newsroom
Microsoft's Bing Team Open-Sources Harrier, a Multilingual Embedding Model That Tops the MTEB v2 Benchmark
Tools

Microsoft's Bing search team has released Harrier as an open-source embedding model, and it tops the multilingual MTEB v2 benchmark while supporting over 100 languages. The release is significant not just for the benchmark numbers but for the source: a search team that has spent decades optimizing retrieval systems has built an embedding model for the exact use case — semantic search and retrieval — that underpins most production RAG applications.

D.O.T.S AI Newsroom
Stability AI Pivots to Enterprise With Brand Studio — a Platform for Brand-Consistent AI Image Generation
Tools

Stability AI, the company that made open-source image generation mainstream with Stable Diffusion, is repositioning for enterprise with Brand Studio. The platform lets creative teams train brand-specific image models, automate visual production workflows, and route tasks to the best-suited AI model — a commercial play from a company that built its name on open access.

D.O.T.S AI Newsroom