Data Migration Blockers: The Pre-Flight Checklist • san.is

Every HubSpot migration starts with optimism. The new CRM will be cleaner, the processes will be better, the team will love it. Then you receive the export files. The optimism fades rather quickly.

This article catalogues the blockers I’ve encountered across dozens of migrations from Dynamics, Salesforce, Marketo, and legacy systems that shall remain nameless to protect the innocent. Use it as a pre-flight checklist before your migration begins.

Critical Blockers

These issues will stop your migration entirely if not addressed. Ask me how I know.

1. CSV Format Issues

The Problem: Files exported from legacy systems often aren’t properly formatted CSVs. They look like CSVs. They have the .csv extension. They are lying.

Common issues:

Semicolon delimiters instead of commas
Data in a single column requiring parsing
BOM characters () in headers
Inconsistent quoting of text fields
Line breaks within fields

Detection:

import csv

with open('contacts.csv', 'r', encoding='utf-8-sig') as f:
    reader = csv.reader(f)
    header = next(reader)

    # Check if data is in single column
    if len(header) == 1 and ';' in header[0]:
        print("WARNING: Semicolon-delimited - needs conversion")

    # Count columns
    print(f"Column count: {len(header)}")

Solution: Convert to proper CSV before import. In Python:

import pandas as pd

df = pd.read_csv('export.csv', delimiter=';', encoding='utf-8-sig')
df.to_csv('import.csv', index=False, encoding='utf-8')

2. Missing Unique Identifiers

The Problem: HubSpot requires unique identifiers to match records. Without them, every row creates a new record.

Object	Required Identifier
Contacts	Email address
Companies	Domain name
Deals	Record ID (for updates)
Custom Objects	Unique value property

Detection: Check fill rates on identifier columns.

df = pd.read_csv('contacts.csv')

email_fill = df['email'].notna().mean() * 100
print(f"Email fill rate: {email_fill:.1f}%")

If contacts have <90% email fill rate, you’ll create orphaned records that can’t be deduplicated later.

Solution:

For contacts without emails: consider whether they should be migrated
For companies without domains: generate placeholder domains (companyname.placeholder.com) and flag for cleanup
For custom objects: designate a unique identifier property before import

3. Missing Association Columns

The Problem: Your contact file has no way to link records to companies. Your deal file has no way to link to contacts.

HubSpot needs shared identifiers between files:

Contact → Company: Company domain or company name
Deal → Contact: Contact email
Deal → Company: Company domain

Detection: Look for association columns in your exports.

Expected columns for contacts:

company_domain or account_domain → for company association
company_name or account_name → fallback association

Expected columns for deals:

contact_email → for contact association
company_domain → for company association

Solution:

If missing, extract associations from the source system separately
Create association mapping files: contact_id,company_id
Import in order: Companies → Contacts → Deals, using IDs from each import

4. Date/Time Format Inconsistencies

The Problem: Dates in formats HubSpot doesn’t recognise cause import failures or data corruption.

HubSpot accepts:

ISO 8601: 2025-03-12T14:30:00Z
Simple date: 2025-03-12
Unix timestamp (milliseconds): 1710252600000

Common problematic formats:

12/03/2025 (ambiguous: MM/DD or DD/MM?)
March 12, 2025
12-Mar-25

Detection:

date_columns = ['createdate', 'closedate', 'last_modified']

for col in date_columns:
    if col in df.columns:
        sample = df[col].dropna().head(5).tolist()
        print(f"{col}: {sample}")

Solution: Standardise to ISO 8601 before import.

df['createdate'] = pd.to_datetime(
    df['createdate'],
    format='%d/%m/%Y'  # Specify source format
).dt.strftime('%Y-%m-%d')

High-Risk Blockers

These won’t stop the import but will cause significant problems that you’ll discover at the worst possible moment.

5. Property Name Mismatches

The Problem: Source system uses emailaddress1, HubSpot expects email. Source uses firstname, HubSpot uses first_name.

Detection: Create a mapping document comparing source columns to HubSpot properties.

Source Column	HubSpot Property	Status
emailaddress1	email	✅ Needs mapping
firstname	firstname	✅ Exact match
jobtitle	jobtitle	✅ Exact match
company	-	⚠️ Not a valid deal property

Solution: Use HubSpot’s import mapping interface, or pre-rename columns:

column_mapping = {
    'emailaddress1': 'email',
    'company_name': 'company',
    'mobilephone': 'mobilephone'
}
df.rename(columns=column_mapping, inplace=True)

The Problem: Your Lifecycle Stage values don’t match HubSpot’s options.

Source: Prospect, Customer, Churned HubSpot: subscriber, lead, marketingqualifiedlead, salesqualifiedlead, opportunity, customer, evangelist, other

Detection: Extract unique values from dropdown columns.

for col in ['lifecycle_stage', 'lead_status', 'industry']:
    if col in df.columns:
        unique_values = df[col].dropna().unique()
        print(f"\n{col}:")
        for v in unique_values[:20]:
            print(f"  - {v}")

Solution: Either:

Map source values to HubSpot options
Create custom property options in HubSpot before import
Use a calculated property post-migration

7. Owner Assignment Issues

The Problem: Owner fields reference source system user IDs that don’t exist in HubSpot.

Source: owner_id: USR00123 HubSpot: Expects user email or HubSpot owner ID

Detection: Check owner column values.

owners = df['owner'].dropna().unique()
print(f"Unique owners: {len(owners)}")
print(owners[:10])

Solution: Create user mapping:

owner_mapping = {
    'USR00123': '[email protected]',
    'USR00456': '[email protected]'
}

df['hubspot_owner_email'] = df['owner'].map(owner_mapping)

Import using the email column, and HubSpot will resolve to owner IDs.

8. Multi-Select Property Values

The Problem: Multi-select values have inconsistent delimiters or formats.

Source: Technology; Healthcare; Finance HubSpot expects: Technology;Healthcare;Finance (no spaces after semicolons)

Detection:

multi_select_cols = ['industries', 'products', 'interests']

for col in multi_select_cols:
    if col in df.columns:
        sample = df[col].dropna().head(3).tolist()
        print(f"{col}: {sample}")

Solution: Standardise delimiters.

df['industries'] = df['industries'].str.replace('; ', ';')

Data Quality Blockers

These won’t fail the import but will contaminate your CRM in ways that make everyone slightly miserable for years.

9. Duplicate Records

The Problem: Source system has duplicate contacts that will all import as separate records.

Detection:

duplicates = df[df.duplicated(subset=['email'], keep=False)]
print(f"Duplicate emails: {len(duplicates)}")

# Sample duplicates
print(duplicates.groupby('email').size().head(10))

Solution: Deduplicate before import, or plan for post-import cleanup using HubSpot’s deduplication tools.

10. Empty Required Fields

The Problem: Records missing data in fields critical to your processes.

Detection:

required_fields = ['email', 'firstname', 'lastname', 'company']

for field in required_fields:
    if field in df.columns:
        fill_rate = df[field].notna().mean() * 100
        empty_count = df[field].isna().sum()
        print(f"{field}: {fill_rate:.1f}% filled ({empty_count} empty)")

Solution:

For essential fields (email): exclude records without values
For optional fields: accept the gap or enrich post-migration

11. Test and Junk Data

The Problem: Production exports include test records, spam entries, and internal accounts.

Common patterns:

Emails: @test.com, @example.com, @company.com (internal)
Names: Test, Asdf, Delete Me
Companies: Test Account, DO NOT USE

Detection:

test_patterns = ['@test.com', '@example.com', 'test@', 'asdf']

for pattern in test_patterns:
    matches = df[df['email'].str.contains(pattern, na=False, case=False)]
    if len(matches) > 0:
        print(f"'{pattern}': {len(matches)} records")

Solution: Filter out test data before import.

test_domains = ['test.com', 'example.com', 'mailinator.com']

df['domain'] = df['email'].str.split('@').str[1]
df_clean = df[~df['domain'].isin(test_domains)]

The Pre-Migration Checklist

Run through this list before every migration:

File Structure

CSV files are comma-delimited
No BOM characters in headers
Consistent encoding (UTF-8)
No line breaks within fields

Identifiers

Contacts have email addresses (>90%)
Companies have domains or names
Association columns exist between objects

Data Quality

Dates in recognised format
Dropdown values match HubSpot options
Owner fields mappable to HubSpot users
Multi-select delimiters standardised
Duplicates identified and planned for
Test data filtered out

Custom Properties

Custom properties created in HubSpot before import
Property types match data (number, date, text)
Dropdown options created for all values

Import Order

Companies first
Contacts second (with company associations)
Deals third (with contact/company associations)
Custom objects last (with appropriate associations)

Key Takeaways

Inspect your files before import - don’t assume exports are clean
Calculate fill rates - know what data you’re actually getting
Map associations explicitly - don’t rely on HubSpot’s automatic matching
Standardise dates and dropdowns - the most common causes of data corruption
Filter test data - it will pollute your reports forever

The best migrations are boring. They’re boring because the work happened upfront, in the data preparation, not during crisis management after a failed import. Boring is good. Aim for boring.

Planning a HubSpot migration and want a second opinion on your data? Connect with me on LinkedIn.