Analyze CSV data for null counts, unique values, and inferred data types
CSV (Comma-Separated Values) is one of the most common data exchange formats. Analyzing CSV data helps you understand data quality, identify issues, and plan data transformations before loading it into databases or analytics systems.
Null (missing) values indicate incomplete data. High null percentages may indicate:
Rule of thumb: Columns with >50% nulls are often candidates for removal or special handling.
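As a rough illustration of the null check, the sketch below counts empty cells per column and flags columns above the 50% rule of thumb. The function names `null_stats` and `mostly_null_columns` are hypothetical, and the sketch assumes that only empty or whitespace-only cells count as null.

```python
import csv
import io

def null_stats(csv_text):
    """Return {column: (null_count, null_percent)}.

    Assumption: only empty or whitespace-only cells are treated as null.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    stats = {}
    for column in reader.fieldnames or []:
        # Short rows yield None for missing cells, so fall back to "".
        nulls = sum(1 for row in rows if not (row.get(column) or "").strip())
        percent = 100.0 * nulls / len(rows) if rows else 0.0
        stats[column] = (nulls, percent)
    return stats

def mostly_null_columns(stats, threshold=50.0):
    """List columns whose null percentage exceeds the threshold."""
    return [column for column, (_, percent) in stats.items() if percent > threshold]
```

The threshold is a parameter because 50% is a guideline, not a hard limit; a sparse but important column may be worth keeping.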
The number of unique values reveals the cardinality of a column:
Cardinality affects indexing strategies and database performance.
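Cardinality can be measured by collecting the distinct non-empty values of each column. The following is a minimal sketch, not the analyzer's own code; `unique_counts` is a hypothetical name, and empty cells are skipped by assumption.

```python
import csv
import io

def unique_counts(csv_text):
    """Return {column: (distinct_values, total_rows)} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames or []}
    total = 0
    for row in reader:
        total += 1
        for name in seen:
            value = (row.get(name) or "").strip()
            if value:  # assumption: empty cells do not count as a value
                seen[name].add(value)
    return {name: (len(values), total) for name, values in seen.items()}
```

A column whose distinct count equals the row count (an ID, for example) has high cardinality, while one with only a few distinct values (a status flag) has low cardinality.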
The analyzer attempts to infer the data type based on the values:
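The exact inference rules are not listed here, so the sketch below shows one plausible approach: try the strictest type first and fall back to a looser one. The type names, the order of checks, and the function `infer_type` are all assumptions made for illustration.

```python
def infer_type(values):
    """Guess a column type from its string values, ignoring empty cells."""
    present = [v.strip() for v in values if v is not None and v.strip()]
    if not present:
        return "empty"

    def all_parse(parser):
        # True only if every non-empty value is accepted by the parser.
        for v in present:
            try:
                parser(v)
            except ValueError:
                return False
        return True

    if all_parse(int):
        return "integer"
    if all_parse(float):
        # Note: Python's float() also accepts "nan" and "inf".
        return "float"
    if all(v.lower() in {"true", "false"} for v in present):
        return "boolean"
    return "string"
```

Checking integers before floats matters, because every integer string also parses as a float.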
Quality is assessed based on null percentage:
Null values can be represented as empty strings, "NULL", "NA", "N/A", or simply blank cells. Always normalize null representations when cleaning data.
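One way to normalize these representations is to map every known marker to a single value such as `None`. The sketch below assumes the markers listed above and, as an additional assumption, matches them case-insensitively after trimming whitespace; `normalize_null` is a hypothetical helper.

```python
# Markers taken from the list above, stored lowercase for comparison.
NULL_MARKERS = {"", "null", "na", "n/a"}

def normalize_null(value):
    """Return None for any recognized null marker, else the original value."""
    if value is None:
        return None
    if value.strip().lower() in NULL_MARKERS:
        return None
    return value

def normalize_row(row):
    """Apply normalize_null to every cell of a row (a list of strings)."""
    return [normalize_null(cell) for cell in row]
```

For example, `normalize_row(["2", "Jane", "N/A", ""])` yields `["2", "Jane", None, None]`. Be careful with short markers: "NA" can be a legitimate value in some datasets, such as a country code for Namibia.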
A column might contain mixed types (e.g., numbers and text). This causes problems when importing into databases that require a consistent type for each column.
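A simple way to spot a mixed column is to classify each value and count the classes. This sketch uses only two hypothetical classes, "number" and "text", which is an assumption; a real check might distinguish more types.

```python
from collections import Counter

def classify(value):
    """Label a single non-empty string as 'number' or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"

def type_breakdown(values):
    """Count how many non-empty values fall into each class."""
    return Counter(classify(v) for v in values if v.strip())

def is_mixed(values):
    """True if the column contains more than one class of value."""
    return len(type_breakdown(values)) > 1
```

The breakdown is often more useful than the yes/no answer: a column that is 99% numbers with a single text value usually points to one bad row, not a design problem.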
Special characters may display incorrectly if the file's encoding doesn't match the encoding the reader expects. Use UTF-8 encoding when possible.
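A defensive reader can try UTF-8 first and fall back to another encoding only when decoding fails. The sketch below is one possible approach; `decode_csv_bytes` is a hypothetical helper, and the Latin-1 fallback is an assumption that should be replaced with whatever encoding your sources actually use.

```python
def decode_csv_bytes(raw, fallback="latin-1"):
    """Decode raw CSV bytes, returning (text, encoding_used).

    "utf-8-sig" decodes UTF-8 and also drops a leading byte order mark
    if one is present.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 never raises, so a wrong guess here produces garbled
        # characters silently instead of an error.
        return raw.decode(fallback), fallback
```

Returning the encoding that was used makes it possible to log or report files that were not valid UTF-8.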
id,name,age,email,status
1,John,25,john@example.com,active
2,Jane,30,,inactive
3,Bob,35,bob@example.com,active
4,Alice,,alice@example.com,active
5,Charlie,40,charlie@example.com,
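To make the sample concrete, the following sketch computes null and unique counts for each of its columns. It is an illustration only, not the analyzer's implementation; `summarize` is a hypothetical function, and empty cells are assumed to be the only null representation in this sample.

```python
import csv
import io

SAMPLE = """id,name,age,email,status
1,John,25,john@example.com,active
2,Jane,30,,inactive
3,Bob,35,bob@example.com,active
4,Alice,,alice@example.com,active
5,Charlie,40,charlie@example.com,
"""

def summarize(csv_text):
    """Return {column: {"nulls": n, "unique": m}} for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for name in reader.fieldnames or []:
        values = [(row.get(name) or "").strip() for row in rows]
        present = [v for v in values if v]  # non-empty cells only
        summary[name] = {
            "nulls": len(values) - len(present),
            "unique": len(set(present)),
        }
    return summary

for name, info in summarize(SAMPLE).items():
    print(f"{name}: {info['nulls']} null(s), {info['unique']} unique")
```

On this sample, age, email, and status each have one empty cell out of five rows (20% null), and status has only two distinct values, active and inactive.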