Data Migration Blockers: The Pre-Flight Checklist
/ 6 min read
Table of Contents
Every HubSpot migration starts with optimism. The new CRM will be cleaner, the processes will be better, the team will love it. Then you receive the export files. The optimism fades rather quickly.
This article catalogues the blockers I’ve encountered across dozens of migrations from Dynamics, Salesforce, Marketo, and legacy systems that shall remain nameless to protect the innocent. Use it as a pre-flight checklist before your migration begins.
Critical Blockers
These issues will stop your migration entirely if not addressed. Ask me how I know.
1. CSV Format Issues
The Problem: Files exported from legacy systems often aren’t properly formatted CSVs. They look like CSVs. They have the .csv extension. They are lying.
Common issues:
- Semicolon delimiters instead of commas
- Data in a single column requiring parsing
- BOM characters () in headers
- Inconsistent quoting of text fields
- Line breaks within fields
Detection:
import csv
with open('contacts.csv', 'r', encoding='utf-8-sig') as f: reader = csv.reader(f) header = next(reader)
# Check if data is in single column if len(header) == 1 and ';' in header[0]: print("WARNING: Semicolon-delimited - needs conversion")
# Count columns print(f"Column count: {len(header)}")Solution: Convert to proper CSV before import. In Python:
import pandas as pd
df = pd.read_csv('export.csv', delimiter=';', encoding='utf-8-sig')df.to_csv('import.csv', index=False, encoding='utf-8')2. Missing Unique Identifiers
The Problem: HubSpot requires unique identifiers to match records. Without them, every row creates a new record.
| Object | Required Identifier |
|---|---|
| Contacts | Email address |
| Companies | Domain name |
| Deals | Record ID (for updates) |
| Custom Objects | Unique value property |
Detection: Check fill rates on identifier columns.
df = pd.read_csv('contacts.csv')
email_fill = df['email'].notna().mean() * 100print(f"Email fill rate: {email_fill:.1f}%")If contacts have <90% email fill rate, you’ll create orphaned records that can’t be deduplicated later.
Solution:
- For contacts without emails: consider whether they should be migrated
- For companies without domains: generate placeholder domains (companyname.placeholder.com) and flag for cleanup
- For custom objects: designate a unique identifier property before import
3. Missing Association Columns
The Problem: Your contact file has no way to link records to companies. Your deal file has no way to link to contacts.
HubSpot needs shared identifiers between files:
- Contact → Company: Company domain or company name
- Deal → Contact: Contact email
- Deal → Company: Company domain
Detection: Look for association columns in your exports.
Expected columns for contacts:
company_domainoraccount_domain→ for company associationcompany_nameoraccount_name→ fallback association
Expected columns for deals:
contact_email→ for contact associationcompany_domain→ for company association
Solution:
- If missing, extract associations from the source system separately
- Create association mapping files:
contact_id,company_id - Import in order: Companies → Contacts → Deals, using IDs from each import
4. Date/Time Format Inconsistencies
The Problem: Dates in formats HubSpot doesn’t recognise cause import failures or data corruption.
HubSpot accepts:
- ISO 8601:
2025-03-12T14:30:00Z - Simple date:
2025-03-12 - Unix timestamp (milliseconds):
1710252600000
Common problematic formats:
12/03/2025(ambiguous: MM/DD or DD/MM?)March 12, 202512-Mar-25
Detection:
date_columns = ['createdate', 'closedate', 'last_modified']
for col in date_columns: if col in df.columns: sample = df[col].dropna().head(5).tolist() print(f"{col}: {sample}")Solution: Standardise to ISO 8601 before import.
df['createdate'] = pd.to_datetime( df['createdate'], format='%d/%m/%Y' # Specify source format).dt.strftime('%Y-%m-%d')High-Risk Blockers
These won’t stop the import but will cause significant problems that you’ll discover at the worst possible moment.
5. Property Name Mismatches
The Problem: Source system uses emailaddress1, HubSpot expects email. Source uses firstname, HubSpot uses first_name.
Detection: Create a mapping document comparing source columns to HubSpot properties.
| Source Column | HubSpot Property | Status |
|---|---|---|
| emailaddress1 | ✅ Needs mapping | |
| firstname | firstname | ✅ Exact match |
| jobtitle | jobtitle | ✅ Exact match |
| company | - | ⚠️ Not a valid deal property |
Solution: Use HubSpot’s import mapping interface, or pre-rename columns:
column_mapping = { 'emailaddress1': 'email', 'company_name': 'company', 'mobilephone': 'mobilephone'}df.rename(columns=column_mapping, inplace=True)6. Dropdown Value Mismatches
The Problem: Your Lifecycle Stage values don’t match HubSpot’s options.
Source: Prospect, Customer, Churned
HubSpot: subscriber, lead, marketingqualifiedlead, salesqualifiedlead, opportunity, customer, evangelist, other
Detection: Extract unique values from dropdown columns.
for col in ['lifecycle_stage', 'lead_status', 'industry']: if col in df.columns: unique_values = df[col].dropna().unique() print(f"\n{col}:") for v in unique_values[:20]: print(f" - {v}")Solution: Either:
- Map source values to HubSpot options
- Create custom property options in HubSpot before import
- Use a calculated property post-migration
7. Owner Assignment Issues
The Problem: Owner fields reference source system user IDs that don’t exist in HubSpot.
Source: owner_id: USR00123
HubSpot: Expects user email or HubSpot owner ID
Detection: Check owner column values.
owners = df['owner'].dropna().unique()print(f"Unique owners: {len(owners)}")print(owners[:10])Solution: Create user mapping:
owner_mapping = {}
df['hubspot_owner_email'] = df['owner'].map(owner_mapping)Import using the email column, and HubSpot will resolve to owner IDs.
8. Multi-Select Property Values
The Problem: Multi-select values have inconsistent delimiters or formats.
Source: Technology; Healthcare; Finance
HubSpot expects: Technology;Healthcare;Finance (no spaces after semicolons)
Detection:
multi_select_cols = ['industries', 'products', 'interests']
for col in multi_select_cols: if col in df.columns: sample = df[col].dropna().head(3).tolist() print(f"{col}: {sample}")Solution: Standardise delimiters.
df['industries'] = df['industries'].str.replace('; ', ';')Data Quality Blockers
These won’t fail the import but will contaminate your CRM in ways that make everyone slightly miserable for years.
9. Duplicate Records
The Problem: Source system has duplicate contacts that will all import as separate records.
Detection:
duplicates = df[df.duplicated(subset=['email'], keep=False)]print(f"Duplicate emails: {len(duplicates)}")
# Sample duplicatesprint(duplicates.groupby('email').size().head(10))Solution: Deduplicate before import, or plan for post-import cleanup using HubSpot’s deduplication tools.
10. Empty Required Fields
The Problem: Records missing data in fields critical to your processes.
Detection:
required_fields = ['email', 'firstname', 'lastname', 'company']
for field in required_fields: if field in df.columns: fill_rate = df[field].notna().mean() * 100 empty_count = df[field].isna().sum() print(f"{field}: {fill_rate:.1f}% filled ({empty_count} empty)")Solution:
- For essential fields (email): exclude records without values
- For optional fields: accept the gap or enrich post-migration
11. Test and Junk Data
The Problem: Production exports include test records, spam entries, and internal accounts.
Common patterns:
- Emails:
@test.com,@example.com,@company.com(internal) - Names:
Test,Asdf,Delete Me - Companies:
Test Account,DO NOT USE
Detection:
test_patterns = ['@test.com', '@example.com', 'test@', 'asdf']
for pattern in test_patterns: matches = df[df['email'].str.contains(pattern, na=False, case=False)] if len(matches) > 0: print(f"'{pattern}': {len(matches)} records")Solution: Filter out test data before import.
test_domains = ['test.com', 'example.com', 'mailinator.com']
df['domain'] = df['email'].str.split('@').str[1]df_clean = df[~df['domain'].isin(test_domains)]The Pre-Migration Checklist
Run through this list before every migration:
File Structure
- CSV files are comma-delimited
- No BOM characters in headers
- Consistent encoding (UTF-8)
- No line breaks within fields
Identifiers
- Contacts have email addresses (>90%)
- Companies have domains or names
- Association columns exist between objects
Data Quality
- Dates in recognised format
- Dropdown values match HubSpot options
- Owner fields mappable to HubSpot users
- Multi-select delimiters standardised
- Duplicates identified and planned for
- Test data filtered out
Custom Properties
- Custom properties created in HubSpot before import
- Property types match data (number, date, text)
- Dropdown options created for all values
Import Order
- Companies first
- Contacts second (with company associations)
- Deals third (with contact/company associations)
- Custom objects last (with appropriate associations)
Key Takeaways
- Inspect your files before import - don’t assume exports are clean
- Calculate fill rates - know what data you’re actually getting
- Map associations explicitly - don’t rely on HubSpot’s automatic matching
- Standardise dates and dropdowns - the most common causes of data corruption
- Filter test data - it will pollute your reports forever
The best migrations are boring. They’re boring because the work happened upfront, in the data preparation, not during crisis management after a failed import. Boring is good. Aim for boring.
Planning a HubSpot migration and want a second opinion on your data? Connect with me on LinkedIn.