KeyError: "There is no item named 'word/document.xml' in the archive" #1513

@eromoe

I received .docx files that were created on Windows, copied them to a Mac, and found that docx can't parse them.

I used the code below to check the archive and found that the member paths are not correct (they use backslashes instead of forward slashes):

import zipfile

def check_docx_structure(file_path):
    """Print the archive members and report whether the required parts exist."""
    try:
        with zipfile.ZipFile(file_path, 'r') as zip_file:
            # Member names exactly as stored in the ZIP directory
            file_list = zip_file.namelist()
            print("content:", file_list)

            # Parts that must be present, using '/' as the separator
            required_files = ['word/document.xml', '[Content_Types].xml']
            missing_files = [f for f in required_files if f not in file_list]

            if missing_files:
                print(f"lost: {missing_files}")
                return False
            return True
    except zipfile.BadZipFile:
        print("The file is not a valid ZIP/DOCX archive")
        return False
content: ['word\\footer1.xml', '_rels/.rels', 'docProps\\app.xml', 'docProps\\core.xml', 'word\\document.xml', 'word\\_rels\\document.xml.rels', 'word\\fontTable.xml', 'word\\settings.xml', 'word\\styles.xml', 'word\\webSettings.xml', 'word\\theme\\theme1.xml', '[Content_Types].xml'] 
lost: ['word/document.xml']
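
A possible workaround, sketched here under assumptions rather than taken from the library: repack the archive so every member name uses forward slashes, then open the repacked copy. The function name `normalize_docx_paths` and both path parameters are hypothetical; only standard `zipfile` calls are used. It assumes the only problem with the file is the backslash separators seen in the output above.

```python
import zipfile

def normalize_docx_paths(src_path, dst_path):
    """Copy a .docx archive, replacing backslashes in member names with '/'.

    Hypothetical helper for illustration; not part of any library.
    """
    with zipfile.ZipFile(src_path, "r") as src, \
         zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Read the member through its ZipInfo so the stored name is used as-is
            data = src.read(info)
            # Write it back under a name that uses forward slashes only
            dst.writestr(info.filename.replace("\\", "/"), data)
    return dst_path
```

Calling `normalize_docx_paths("broken.docx", "fixed.docx")` (both file names are placeholders) should produce a copy in which `check_docx_structure` finds `word/document.xml`. Writing to a new file rather than in place keeps the original intact in case the repacked copy still fails to parse.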
