Broken file when uploading in chunks via the API

Discussion in 'LiquidFiles General' started by Sven, Nov 8, 2018.

  1. Sven

    Sven New Member

    Hi,
    I use the Attachment API to upload files to a LiquidFiles server. This works fine so far. For big files, however, I run into memory issues with my current implementation, which is why I wanted to change the upload process to send the file in chunks. Doing so, however, results in broken files being uploaded; the problem, I believe, lies in a wrong charset being applied somewhere.

    Correct:
    Code:
    %PDF-1.5
    %ÐÔÅØ
    90 0 obj
    [...]
    Broken:
    Code:
    %PDF-1.5
    %Ã<90>Ã<94>Ã<85>Ã<98>
    90 0 obj
    [...]
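    The broken bytes are exactly the UTF-8 encodings of the original ones (0xD0 becomes 0xC3 0x90, and so on), so the data seems to get UTF-8 encoded a second time somewhere. A few lines of perl reproduce this (a minimal sketch, just to illustrate the suspected double encoding):
    Code:
    use strict;
    use Encode qw(encode);
    
    # The header bytes after the second '%', re-encoded as UTF-8,
    # give exactly the bytes seen in the broken upload.
    my $original  = "\xD0\xD4\xC5\xD8";           # the bytes of ÐÔÅØ
    my $reencoded = encode('UTF-8', $original);   # C3 90 C3 94 C3 85 C3 98
    printf "%v02X\n", $reencoded;                 # prints C3.90.C3.94.C3.85.C3.98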
    My question now is whether this could be an error on the LiquidFiles side. I use a perl script which uses HTTP::Request::Common for all the request processing; I don't do much at all myself. If I rebuild the content of the request and just write it to a file, everything is intact. Below is a minimal example to upload files; changing the variable $DYNAMIC_FILE_UPLOAD toggles the chunking. All suggestions are appreciated.

    Code:
    #!/usr/bin/perl -w
    
    use strict;
    
    use LWP::UserAgent qw( );
    use HTTP::Request::Common qw($DYNAMIC_FILE_UPLOAD);
    use MIME::Base64;
    
    # Here the chunking can be enabled, which breaks certain files
    $DYNAMIC_FILE_UPLOAD = 1;
    
    # Provide Data
    my $APIKey = 'API Key';
    my $LiquidFileServer = 'https://my.lfserver.com';
    my $File = '/path/to/file.ext';
    
    # Pass '' as the line ending so no newline ends up inside the header value
    my $KeyEncoded = encode_base64( "$APIKey:x", '' );
    my $LFAuth = "Basic $KeyEncoded";
    
    my $ua = LWP::UserAgent->new();
    $ua->ssl_opts(verify_hostname => 0);   # disable certificate verification (test setup)
    $ua->default_header( Authorization => $LFAuth );
    
    # Upload File
    my $request = HTTP::Request::Common::POST(
       "$LiquidFileServer/attachments",
       Content_Type => 'form-data',
       Content => [
          Filedata => [ $File ],
       ],
    );
    
    my $response = $ua->request($request);
    
    my $AttachmentID = $response->content;
    
    # Get URL
    $response = $ua->post(
        "$LiquidFileServer/message",
        Content_Type => 'application/json',
        Content =>
    '{"message":
      {
        "recipients":["foo@bar.com"],
        "subject":"test subject",
        "message":"Please let me know what you think!",
        "send_email":false,
        "authorization":0,
        "attachments":["'.$AttachmentID.'"]
      }
    }'
    );
    
    # Non-greedy match so we stop at the first closing quote after the URL
    $response->content =~ /url":"(.+?)"/;
    print "DownloadURL: $1\n";
     
  2. Johan

    Johan Administrator
    Staff Member

    I agree, it looks like perl is encoding the file upload data, which means that it's treating the file as text when it needs to treat it as binary.

    If you run into memory issues, it means that the script reads the whole file into memory and then uploads from there. I don't think switching to chunked uploads is going to change that: the script will likely still read the entire file into memory and then upload it in chunks, so you'll get the same memory issues. I would switch to using curl for the actual file upload, which is going to be faster and isn't going to load the entire file into memory. You may be able to use Net::Curl from perl, or just call curl as an external executable. You can also use curl for the file upload part only and grab the attachment IDs (https://man.liquidfiles.com/api/attachment_upload.html#html_uploads), and then use perl to send the message if you prefer to use perl, as sketched below.
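    For example, a minimal sketch of that split (the curl flags mirror the script above, including -k for the disabled certificate check; adjust to your setup):
    Code:
    #!/usr/bin/perl -w
    
    use strict;
    
    # Let curl stream the upload straight from disk (it never loads the
    # whole file into memory), then keep using perl for everything else.
    my $APIKey = 'API Key';
    my $LiquidFileServer = 'https://my.lfserver.com';
    my $File = '/path/to/file.ext';
    
    # -s: silent, -k: skip certificate verification, -F: multipart form
    # field; the \@ makes curl read the file at that path.
    my $AttachmentID = `curl -s -k --user "$APIKey:x" -F "Filedata=\@$File" $LiquidFileServer/attachments`;
    die "upload failed\n" if $? != 0 || $AttachmentID eq '';
    chomp $AttachmentID;
    
    print "Attachment ID: $AttachmentID\n";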
     
  3. Sven

    Sven New Member

    Thanks Johan,
    I might switch to using curl, but I would rather give it one or two more tries with HTTP::Request::Common first; the actual upload is part of a bigger program in which that module is already used.

    For the memory issue, you are right when it comes to normal usage. LWP::UserAgent loads the whole file, and it doesn't seem very efficient in doing so: even a minimal upload script that doesn't even catch the response uses almost four times the file size in memory when run under valgrind --tool=massif. The chunked upload, however, does just fine at keeping the memory usage low (only a fraction of the file itself). The file gets uploaded to LiquidFiles, too, it just breaks in the process...

    The two upload versions work as follows:
    In normal mode ($DYNAMIC_FILE_UPLOAD = 0), the file is opened as a filehandle (see https://metacpan.org/source/OALDERS/HTTP-Message-6.18/lib/HTTP/Request/Common.pm#L156 for reference), binmode is set on that filehandle on the next line, and later (line 166) the whole content is slurped into a variable that is at some point sent as the content of the request. For small files this works fine, nothing breaks.
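    In isolation, the pattern looks roughly like this (a sketch, not the module's actual code):
    Code:
    # Open the file, switch the handle to binary mode, then slurp the
    # whole content into one scalar - this is what drives the memory use.
    open my $fh, '<', $File or die "open: $!";
    binmode $fh;
    my $content = do { local $/; <$fh> };
    close $fh;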

    In chunk mode ($DYNAMIC_FILE_UPLOAD = 1), a subroutine is passed as the content instead of a content string. As far as I understand it, it is called over and over while the request is being sent, until it no longer returns anything. Each time it is called, the subroutine reads only a part of the file and returns it. (For reference, see https://metacpan.org/source/OALDERS/HTTP-Message-6.18/lib/HTTP/Request/Common.pm#L233 - this is the code path taken with $DYNAMIC_FILE_UPLOAD = 1.) There, the filehandle is opened in line 250, and again, binmode is set on the next line.
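    Stripped down, the mechanism is something like this (a sketch with my own names and buffer size, not the module's code):
    Code:
    # The request content is a code ref; it is invoked repeatedly while
    # the request is sent, and each call returns the next chunk of the
    # file. Returning an empty string ends the body.
    open my $fh, '<', $File or die "open: $!";
    binmode $fh;
    my $content_sub = sub {
        my $n = read($fh, my $buf, 2048);
        die "read: $!" unless defined $n;
        return $buf if $n;    # next chunk
        close $fh;
        return '';            # end of body
    };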

    As it is hard to rule out other errors on the perl side from this alone, I wrote a small script (included below) which just builds the request for the chunked upload method and then calls the content subroutine itself to rebuild the file (with the boundary and filename parts in front; pipe the output into some file if you try it). This gives back the right content. So in my opinion the error must be introduced somewhere in the transfer process, or maybe on the LiquidFiles side. But then again, I might be overlooking something; in any case, I don't really know where to look next.

    Code:
    #!/usr/bin/perl -w
    
    use strict;
    
    use LWP::UserAgent qw( );
    use HTTP::Request::Common qw($DYNAMIC_FILE_UPLOAD);
    
    # provide
    my $File = "/path/to/file.ext";
    
    $DYNAMIC_FILE_UPLOAD = 1;
    my $ua = LWP::UserAgent->new();
    my $request = HTTP::Request::Common::POST(
      'https://example.server/attachments',
       Content_Type => 'form-data',
       Content => [
          Filedata => [ $File ],
       ],
    );
    
    #optional
    #use Data::Dumper;
    #print Dumper($request);
    
    my $contentsub = $request->content();
    
    # With $DYNAMIC_FILE_UPLOAD set, content() returns the code ref.
    # STDOUT must be binary, or the rebuilt file can get mangled again.
    binmode STDOUT;
    
    # Call the subroutine until it signals the end of the body.
    while (1) {
       my $part = $contentsub->();
       last unless defined $part && length $part;
       print $part;
    }
     
  4. Johan

    Johan Administrator
    Staff Member

    The way to test this is also with curl. Curl is great in that there are no smarts trying to guess what you mean, and no encoding gets added that you haven't specified. So if the curl examples in the API documentation don't work, there's something wrong with LiquidFiles. And if the curl examples work, there's something wrong either with your code or with the perl library.

    If it's your code or the perl library, one way of testing would be to set your dev system to allow http and then use Wireshark to capture the raw data that's being sent on the network. You can then compare exactly what curl sends with what your code sends, and that will tell you what's wrong and what's different between the chunked upload and the non-chunked one.
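    If you don't want to set up Wireshark, a throwaway listener gives you the same raw bytes (a sketch; the port is arbitrary, and since it never answers, the client will eventually time out - by then the full request has been written out):
    Code:
    #!/usr/bin/perl -w
    
    use strict;
    use IO::Socket::INET;
    
    # Accept one connection and dump the raw bytes the client sends.
    # Point the upload script at http://localhost:8080/attachments and
    # redirect STDOUT to a file; run once chunked, once not, and diff.
    my $server = IO::Socket::INET->new(
        LocalPort => 8080,
        Listen    => 1,
        ReuseAddr => 1,
    ) or die "listen failed: $!";
    
    binmode STDOUT;
    my $client = $server->accept or die "accept failed: $!";
    binmode $client;
    
    # Read until the client gives up and closes the connection.
    while (read($client, my $buf, 4096)) {
        print $buf;
    }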
     
