# PDF Upload Validation Issues - Solutions

## Problem
The server validates uploaded PDFs with `pdftk`, which rejects any file it cannot fully parse. As a result, uploads fail even for legitimate PDFs that open normally in standard viewers.

## Root Cause
The `isPdf.sh` script runs `pdftk <file> dump_data`, which:
- Requires the PDF to be fully parseable by pdftk
- Fails on encrypted PDFs, including files that open in a viewer without a password prompt (for example, files protected only by an owner/permissions password)
- Fails on certain PDF versions or features
- May fail because of file permissions or temporary-file issues
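
To see which of these cases applies to a rejected upload, it can help to run the same pdftk command by hand and keep its output instead of discarding it. The sketch below assumes `pdftk` is on the `PATH`; `diagnose_pdf` is a hypothetical helper name and is not part of the existing scripts.

```shell
#!/bin/sh
# Hypothetical diagnostic helper: report why pdftk rejects a file.
diagnose_pdf() {
    file=$1
    # Rule out permission and missing-file problems first.
    if [ ! -r "$file" ]; then
        echo "unreadable: $file"
        return 2
    fi
    if ! command -v pdftk >/dev/null 2>&1; then
        echo "pdftk not found on PATH"
        return 3
    fi
    # Capture everything pdftk prints; on failure this is its error text.
    if out=$(pdftk "$file" dump_data 2>&1); then
        echo "ok"
    else
        echo "pdftk failed: $out"
        return 1
    fi
}
```

Calling `diagnose_pdf /path/to/rejected.pdf` as the same user the web server runs under should help separate permission problems from genuine parse failures.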

## Solutions

### Frontend Solutions (Already Implemented)
1. ✅ Enhanced client-side validation (PDF header check)
2. ✅ Better error messages with troubleshooting steps
3. ✅ File size and MIME type validation

### Backend Solutions (Recommended)

#### Option 1: Add Fallback Validation (Recommended)
Modify `/var/www/html/pacmny-be/lib/Utility.php` to add a fallback validation method:

```php
static public function isPdf($fileName)
{
    // Primary validation with pdftk (escape the path before handing it to the shell)
    $command = Utility::getScriptsDirectory() . DIRECTORY_SEPARATOR . 'isPdf.sh ' . escapeshellarg($fileName);
    $pipeHandle = popen($command, "r");
    if ($pipeHandle !== false) {
        $tf = trim((string) fgets($pipeHandle));
        pclose($pipeHandle);

        if ($tf === 'true') {
            return true;
        }
    }

    // Fallback: check the file header and the end-of-file marker
    $handle = @fopen($fileName, "rb");
    if ($handle === false) {
        return false;
    }

    // Read the first 4 bytes and check for the PDF header
    $header = fread($handle, 4);
    $fileSize = filesize($fileName);
    if ($header !== '%PDF' || $fileSize === false || $fileSize <= 100) { // 100 bytes: assumed minimum plausible PDF size
        fclose($handle);
        return false;
    }

    // Read the last 1024 bytes; PDF files should contain "%%EOF" near the end
    fseek($handle, max(0, $fileSize - 1024));
    $footer = fread($handle, 1024);
    fclose($handle);

    if ($footer !== false && strpos($footer, '%%EOF') !== false) {
        // Log that fallback validation was used
        error_log("PDF validation: pdftk failed but file appears valid (fallback used): " . $fileName);
        return true;
    }

    return false;
}
```

#### Option 2: Improve Error Reporting
Modify `/var/www/html/pacmny-be/controllers/OnlineController.php` to provide more diagnostic information:

```php
protected function uploadPdf($localFileName) {
    $tempFolder = Utility::createTemporaryFolder();
    $tempFile = $tempFolder . DIRECTORY_SEPARATOR . 'uploaded.pdf';
    
    // Check if file copy succeeded
    if (!@copy($localFileName, $tempFile)) {
        $error = error_get_last();
        return [
            'message' => 'invalidPdfMessage',
            'error' => 'File copy failed: ' . ($error['message'] ?? 'Unknown error')
        ];
    }
    
    @chmod($tempFile, 0777);
    
    // Check file exists and is readable
    if (!file_exists($tempFile) || !is_readable($tempFile)) {
        return [
            'message' => 'invalidPdfMessage',
            'error' => 'Temporary file not accessible'
        ];
    }
    
    // Try pdftk validation
    if (!Utility::isPdf($tempFile)) {
        // Log additional diagnostic info
        $fileSize = filesize($tempFile);
        $mimeType = mime_content_type($tempFile);
        
        error_log("PDF validation failed: file={$tempFile}, size={$fileSize}, mime={$mimeType}");
        
        return ['message' => 'invalidPdfMessage'];
    }
    
    // ... rest of upload logic
}
```

#### Option 3: Use Alternative PDF Validator
Modify `/var/www/html/pacmny-be/scripts/isPdf.sh` to try multiple validation methods:

```bash
#!/bin/sh -x

SCRIPT=$(readlink -f "$0")
SCRIPT_PATH=$(dirname "$SCRIPT")
. "$SCRIPT_PATH/ROOTS.sh"

file=$1

# Method 1: Try pdftk (discard stdout and stderr; only the exit status matters)
RESULT=`$COMMAND_PREFIX "pdftk $file dump_data > /dev/null 2>&1 && echo yes" 2>/dev/null`

if [ "$RESULT" = "yes" ]
then
    echo true
    exit 0
fi

# Method 2: Check file header
HEADER=`head -c 4 "$file" 2>/dev/null`
if [ "$HEADER" = "%PDF" ]
then
    # Method 3: Check for PDF footer
    FOOTER=`tail -c 1024 "$file" 2>/dev/null | grep -q "%%EOF" && echo yes`
    if [ "$FOOTER" = "yes" ]
    then
        echo true
        exit 0
    fi
fi

echo false
```
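
The header and footer checks above accept any file that starts with `%PDF` and has `%%EOF` near the end, and that includes encrypted PDFs. If encrypted files should keep being rejected, one possible heuristic is to look for the `/Encrypt` entry that an encrypted PDF declares in its trailer dictionary. This is a sketch under that assumption: `looks_encrypted` is a hypothetical helper, and a plain text search can produce false positives if the string appears elsewhere in the file.

```shell
#!/bin/sh
# Heuristic sketch (hypothetical helper, not part of isPdf.sh):
# succeed if the file mentions an /Encrypt dictionary entry.
looks_encrypted() {
    # LC_ALL=C makes grep treat the file as raw bytes; -q only sets the exit status.
    LC_ALL=C grep -q '/Encrypt' "$1" 2>/dev/null
}
```

In the script, a line such as `if looks_encrypted "$file"; then echo false; exit 0; fi` placed before Method 2 would keep the fallback from accepting these files.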

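#### Option 4: Use a Pure-PHP Structural Check
Add a PHP-only check as the fallback. PHP has no built-in PDF validator, so this only verifies the header, version string, minimum size, and `%%EOF` marker; it is a variant of the Option 1 fallback that also checks the version string: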

```php
static public function isPdf($fileName)
{
    // Try pdftk first (escape the path before handing it to the shell)
    $command = Utility::getScriptsDirectory() . DIRECTORY_SEPARATOR . 'isPdf.sh ' . escapeshellarg($fileName);
    $pipeHandle = popen($command, "r");
    if ($pipeHandle !== false) {
        $tf = trim((string) fgets($pipeHandle));
        pclose($pipeHandle);

        if ($tf === 'true') {
            return true;
        }
    }

    // Fallback: PHP-based validation
    if (!file_exists($fileName) || !is_readable($fileName)) {
        return false;
    }

    // Check file size
    $fileSize = filesize($fileName);
    if ($fileSize === false || $fileSize < 100) {
        return false;
    }

    $handle = @fopen($fileName, "rb");
    if ($handle === false) {
        return false;
    }

    // Read the header and version string ("%PDF-x.y" is 8 bytes)
    $firstBytes = fread($handle, 8);

    // Read the last 2048 bytes to look for the PDF footer (%%EOF)
    fseek($handle, max(0, $fileSize - 2048));
    $lastBytes = fread($handle, 2048);
    fclose($handle);

    // The version marker must sit at the very start of the file
    if ($firstBytes === false || !preg_match('/^%PDF-\d\.\d/', $firstBytes)) {
        return false;
    }

    if ($lastBytes === false || strpos($lastBytes, '%%EOF') === false) {
        return false;
    }

    // File has the basic structural markers of a PDF
    error_log("PDF validation: Using PHP fallback for: " . $fileName);
    return true;
}
```

### User Workarounds (Temporary)
1. **Print to PDF**: Open the file and use the system's "Print to PDF" feature to produce a fresh copy
2. **Re-save with Adobe**: Open the file in Adobe Acrobat and save it as a new PDF
3. **Online Converter**: Use a tool such as SmallPDF or ILovePDF to re-export the file
4. **Remove Security**: Check the PDF's security properties and remove any password or restrictions
5. **Lower Quality**: For scanned PDFs, re-scan at a lower DPI

## Recommended Implementation Order
1. **Immediate**: Implement Option 1 (Fallback Validation) - Low risk, high impact
2. **Short-term**: Implement Option 2 (Better Error Reporting) - Helps diagnose issues
3. **Long-term**: Consider Option 3 or 4 for more robust validation

## Testing
After implementing, test with:
- Simple text PDFs
- Scanned PDFs
- PDFs with embedded fonts
- PDFs created by different tools (Word, Google Docs, etc.)
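- Password-protected PDFs (pdftk still rejects these, but the header/footer fallback does not detect encryption, so they will pass validation unless an explicit encryption check is added)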
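
Before testing with real documents, the header/footer fallback can be sanity-checked on its own with a few throwaway fixtures. The sketch below is an illustration only: `has_pdf_markers` is a hypothetical function that mirrors Methods 2 and 3 of the Option 3 script, and the fixtures carry just the markers the fallback looks for, so they are not real PDFs.

```shell
#!/bin/sh
# Sketch: exercise the header/footer fallback against throwaway fixtures.
# has_pdf_markers is a hypothetical name mirroring Methods 2 and 3.
has_pdf_markers() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "%PDF" ] &&
        tail -c 1024 "$1" 2>/dev/null | LC_ALL=C grep -q '%%EOF'
}

fixtures=$(mktemp -d)
# Header and footer present: the fallback should accept this one.
printf '%%PDF-1.4\n1 0 obj\n<<>>\nendobj\n%%%%EOF\n' > "$fixtures/minimal.pdf"
# Header only, as in an interrupted upload: should be rejected.
printf '%%PDF-1.4\n1 0 obj\n<<>>\nendobj\n' > "$fixtures/truncated.pdf"
# Not a PDF at all: should be rejected.
printf 'plain text, not a PDF\n' > "$fixtures/notes.txt"

for f in minimal.pdf truncated.pdf notes.txt; do
    if has_pdf_markers "$fixtures/$f"; then
        echo "$f: accepted"
    else
        echo "$f: rejected"
    fi
done
rm -rf "$fixtures"
```

This only confirms the marker logic; the document types listed above still need to be uploaded through the real endpoint to exercise pdftk and the PHP code path.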
