# AI Data Wrangler Overview

### Overview

**What is the AI Data Wrangler?**\
The **Osmos AI Data Wrangler** is an autonomous data agent that transforms your messiest, most irregular files into clean, structured data, hands-free. It’s purpose-built to automate the wrangling of complex file formats found across SharePoint, FTP servers, Google Drive, and more, enabling faster, more reliable decision-making without scaling up your data engineering team.

Currently available within **Microsoft Fabric**, the AI Data Wrangler helps organizations prepare lakehouse data with precision and minimal effort by leveraging generative AI.

### 🚀 What It Is

The AI Data Wrangler uses GenAI to intelligently and autonomously clean and reshape messy files into structured, SQL-ready datasets. Whether it's inconsistent Excel exports, broken PDFs, or fixed-width legacy system files, the Wrangler selects the most effective processing strategy—writing custom code or chunking through LLMs—so you don’t have to.

No rules. No templates. No manual rework.

### 👥 Who It’s For

* **Primary User Persona:** Business, Operations, and Data Services Teams

### 🎯 Key Use Cases

* Preparing messy source files to deliver SQL-ready data for downstream analytics
* Wrangling input from:
  * Excel files with inconsistent headers and merged rows
  * PDFs with embedded or unstructured data
  * Fixed-width or custom-delimited exports
  * “Not really” CSVs from legacy tools
* Mapping irregular data to a standardized schema (e.g., customer master table)
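
As a rough illustration of the last use case, the sketch below shows what mapping irregular headers onto a standardized schema can involve when done by hand. It is not how the Wrangler works internally; the schema columns, the alias table, and the function names are all hypothetical and exist only for this example.

```python
import re

# Hypothetical golden schema for a customer master table (assumed column names).
GOLDEN_SCHEMA = ["customer_id", "customer_name", "email"]

# Hypothetical aliases seen in source files, stored in normalized form.
ALIASES = {
    "customer_id": {"customer id", "cust id", "id"},
    "customer_name": {"customer name", "name", "client"},
    "email": {"email", "e mail", "email address"},
}


def normalize(header):
    """Lowercase a header and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_headers(source_headers):
    """Return {source header: golden column} for every header with a known alias."""
    mapping = {}
    for header in source_headers:
        key = normalize(header)
        for target, names in ALIASES.items():
            if key in names:
                mapping[header] = target
                break
    return mapping


def to_golden(row, mapping):
    """Project one source row onto the golden schema; unmapped columns are dropped."""
    out = {column: None for column in GOLDEN_SCHEMA}
    for source, value in row.items():
        if source in mapping:
            out[mapping[source]] = value.strip() if isinstance(value, str) else value
    return out


mapping = map_headers(["Cust. ID", "Client", "E-Mail"])
clean_row = to_golden({"Cust. ID": " 42 ", "Client": "Acme", "Notes": "x"}, mapping)
```

A hand-maintained alias table like this has to be updated whenever a source system renames a column, which is the kind of upkeep the Wrangler is meant to remove.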

### 🛠️ How It Works

#### 1. **Submit Your Files**

Upload messy files from sources like SharePoint, Google Drive, FTP servers, or internal systems. The Wrangler supports a wide range of formats, including files with irregular structure.

#### 2. **Provide Instructions—or Don’t**

You can:

* Point to a golden schema or a Fabric destination table
* Let the AI infer expectations from instructions, example files, or even code
* Use *Autoconfigure* to ingest prior docs and extract transformation logic

#### 3. **Leave the Dirty Work to Osmos**

The Wrangler decides:

* Whether to generate transformation code
* Whether to chunk and semantically analyze the file using LLMs
* How best to produce clean, validated tabular data

Each file is processed independently. No brittle code. No manual tuning.

#### 4. **Review, Approve, Repeat**

* Review outputs before committing
* Compare the output side-by-side with the input for validation
* Request changes or reprocess with new instructions
* Accept the result and move on

### 🧩 Key Capabilities

| Capability                | Description                                                               |
| ------------------------- | ------------------------------------------------------------------------- |
| **Fully Autonomous**      | AI decides optimal logic per file—LLM, code, or both                      |
| **Flexible File Support** | Handles PDFs, Excel files, fixed-width, delimited, malformed CSVs, and more |
| **Golden Schema Mapping** | Aligns source data with your lakehouse schema and business expectations   |
| **Instant Review Cycles** | See results in minutes, give feedback, or approve with a click            |
| **Built for Fabric**      | Seamlessly manages and prepares data in your Microsoft Fabric environment |

### ⚙️ AI Decision-Making Logic

The Wrangler processes each file independently and flexibly:

* Infers structure and formatting quirks
* Chooses between LLM chunking and custom code generation
* Validates results through in-process checks
* Supports multiple data types in a single run

The output is always **clean tabular data**, not reusable code, because messy files change constantly, and brittle code breaks.
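
For readers who want a feel for what a check on tabular output can look like, here is a minimal sketch. The Wrangler's actual in-process checks are not documented on this page, so the expected columns, the type rules, and the `validate_rows` function are assumptions made purely for illustration.

```python
from datetime import date

# Hypothetical expectations per output column: name -> (expected type, required?).
EXPECTED = {
    "po_number": (str, True),
    "quantity": (int, True),
    "order_date": (date, False),
}


def validate_rows(rows):
    """Return (row_index, column, problem) tuples; an empty list means every check passed."""
    problems = []
    for index, row in enumerate(rows):
        for column, (expected_type, required) in EXPECTED.items():
            value = row.get(column)
            if value is None:
                # Missing values are only a problem for required columns.
                if required:
                    problems.append((index, column, "missing"))
            elif not isinstance(value, expected_type):
                problems.append((index, column, f"expected {expected_type.__name__}"))
    return problems


rows = [
    {"po_number": "PO-1", "quantity": 3, "order_date": date(2024, 1, 5)},
    {"po_number": None, "quantity": "3"},
]
problems = validate_rows(rows)
```

In this sketch the second row would be flagged twice: once for the missing `po_number` and once because `quantity` arrived as text rather than an integer.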

### 🧪 Example Scenarios

| Input Scenario                               | Wrangler Outcome                                                           |
| -------------------------------------------- | -------------------------------------------------------------------------- |
| Broken Excel with multi-row headers          | Extracted proper columns, standardized formats, aligned to schema          |
| PDF invoices with nested info                | Parsed PO numbers, product descriptions, and quantities cleanly            |
| Fixed-width export with missing headers      | Inferred headers, extracted fields by position, produced structured output |
| Custom-delimited file with inconsistent rows | Detected delimiters, normalized row lengths, created clean flat file       |
| Semi-structured CSV with embedded fields     | Split merged fields into columns, matched values to categories             |
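
To make two of the scenarios above concrete, the sketch below shows what hand-written equivalents of the fixed-width and custom-delimited cases might look like in Python. It is illustrative only and is not the Wrangler's implementation; the function names, the candidate delimiters, the column widths, and the rule for folding overflow fields into the last column are all assumptions made for this example.

```python
from collections import Counter

# Assumed set of delimiters to try; real files may use others.
CANDIDATES = [",", ";", "|", "\t"]


def slice_fixed_width(line, widths):
    """Extract fields by position, given the width of each column."""
    fields, start = [], 0
    for width in widths:
        fields.append(line[start:start + width].strip())
        start += width
    return fields


def detect_delimiter(lines):
    """Pick the candidate whose most common per-line count is nonzero and shared by most lines."""
    if not lines:
        return None
    best, best_score = None, 0
    for delimiter in CANDIDATES:
        counts = Counter(line.count(delimiter) for line in lines)
        count, frequency = counts.most_common(1)[0]
        if count > 0 and frequency > best_score:
            best, best_score = delimiter, frequency
    return best


def normalize_rows(lines, delimiter):
    """Pad short rows and fold overflow into the last column (assumes line 0 is the header)."""
    rows = [line.split(delimiter) for line in lines]
    width = len(rows[0])
    fixed = []
    for row in rows:
        if len(row) < width:
            row = row + [""] * (width - len(row))
        elif len(row) > width:
            row = row[:width - 1] + [delimiter.join(row[width - 1:])]
        fixed.append(row)
    return fixed


# Fixed-width record with hypothetical column widths of 4, 10, and 5 characters.
record = "0001" + "Ann".ljust(10) + "Oslo".ljust(5)
fields = slice_fixed_width(record, [4, 10, 5])

# Custom-delimited file with inconsistent row lengths.
lines = ["id|name|city", "1|Ann|Oslo", "2|Bob", "3|Cy|Rome|extra"]
delimiter = detect_delimiter(lines)
table = normalize_rows(lines, delimiter)
```

Code like this depends on fixed assumptions about widths and delimiters, so it tends to break when the source file's layout changes.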

### Summary

The **Osmos AI Data Wrangler** turns unstructured, irregular data chaos into consistent, actionable insights—fast. With no need for templates or hand-written transformations, it autonomously learns what your data should look like and delivers results you can trust.

Whether you’re prepping data for analytics or just trying to get invoice PDFs into your lakehouse, the AI Data Wrangler is your hands-free, error-free solution.

* From chaos to clean in minutes.
* Powered by generative AI.
* Available now in Microsoft Fabric.


