Downloading Resources over HTTP (WinINet Implementation)
Date: 2023-07-09

InternetCrackUrl: parsing a URL

BOOL InternetCrackUrl(
  _In_    LPCTSTR          lpszUrl,           // (1)
  _In_    DWORD            dwUrlLength,       // (2)
  _In_    DWORD            dwFlags,           // (3)
  _Inout_ LPURL_COMPONENTS lpUrlComponents    // (4)
);

(1) Pointer to a string that contains the canonical URL to be cracked.
(2) Size of the lpszUrl string, in TCHARs, or zero if lpszUrl is an ASCIIZ (null-terminated) string.
(3) Controls the operation: ICU_DECODE (converts encoded characters back to their normal form), ICU_ESCAPE (converts all escape sequences (%xx) to their corresponding characters).
(4) Pointer to a URL_COMPONENTS structure that receives the URL components.

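InternetOpen: initializing an application's use of WinINet

InternetOpen is the first WinINet function an application calls. It tells the Internet DLL to initialize its internal data structures and prepare for future calls from the application. When the application has finished using the Internet functions, it should call InternetCloseHandle to free the handle and any associated resources.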

HINTERNET InternetOpen(
  _In_ LPCTSTR lpszAgent,               // (1)
  _In_ DWORD   dwAccessType,            // (2)
  _In_ LPCTSTR lpszProxyName,           // (3)
  _In_ LPCTSTR lpszProxyBypass,         // (4)
  _In_ DWORD   dwFlags                  // (5)
);

(1) Pointer to a null-terminated string that specifies the name of the application or entity calling the WinINet functions. This name is used as the user agent in the HTTP protocol.
(2) Type of access required:

  • INTERNET_OPEN_TYPE_DIRECT: Resolves all host names locally;
  • INTERNET_OPEN_TYPE_PRECONFIG: Retrieves the proxy or direct configuration from the registry;
  • INTERNET_OPEN_TYPE_PRECONFIG_WITH_NO_AUTOPROXY: Retrieves the proxy or direct configuration from the registry and prevents the use of a startup Microsoft JScript or Internet Setup (INS) file;
  • INTERNET_OPEN_TYPE_PROXY: Passes requests to the proxy unless a proxy bypass list is supplied and the name to be resolved bypasses the proxy. In this case, the function uses INTERNET_OPEN_TYPE_DIRECT

(3) Pointer to a null-terminated string that specifies the name of the proxy server(s) to use when proxy access is specified by setting dwAccessType to INTERNET_OPEN_TYPE_PROXY.
(4) Pointer to a null-terminated string that specifies an optional list of host names or IP addresses, or both, that should not be routed through the proxy when dwAccessType is set to INTERNET_OPEN_TYPE_PROXY.
(5) Options:

  • INTERNET_FLAG_ASYNC: Makes only asynchronous requests on handles descended from the handle returned from this function;
  • INTERNET_FLAG_FROM_CACHE: Does not make network requests. All entities are returned from the cache;
  • INTERNET_FLAG_OFFLINE: Identical to INTERNET_FLAG_FROM_CACHE

InternetConnect: opening a File Transfer Protocol (FTP) or HTTP session for a given site

HINTERNET InternetConnect(
  _In_ HINTERNET     hInternet,         // (1)
  _In_ LPCTSTR       lpszServerName,    // (2)
  _In_ INTERNET_PORT nServerPort,       // (3)
  _In_ LPCTSTR       lpszUsername,      // (4)
  _In_ LPCTSTR       lpszPassword,      // (5)
  _In_ DWORD         dwService,         // (6)
  _In_ DWORD         dwFlags,           // (7)
  _In_ DWORD_PTR     dwContext          // (8)
);

(1) Handle returned by a previous call to InternetOpen.
(2) Pointer to a null-terminated string that specifies the host name of an Internet server. Alternately, the string can contain the IP number of the site, in ASCII dotted-decimal format (for example, 11.0.1.45).
(3) Transmission Control Protocol/Internet Protocol (TCP/IP) port on the server.
(4) Pointer to a null-terminated string that specifies the name of the user to log on. If this parameter is NULL, the function uses an appropriate default.
(5) Pointer to a null-terminated string that contains the password to use to log on. If both lpszPassword and lpszUsername are NULL, the function uses the default “anonymous” password.
(6) Type of service to access:

  • INTERNET_SERVICE_FTP: FTP service;
  • INTERNET_SERVICE_GOPHER: Gopher service;
  • INTERNET_SERVICE_HTTP: HTTP service

(7) Options specific to the service used.
(8) Pointer to a variable that contains an application-defined value that is used to identify the application context for the returned handle in callbacks.

HttpOpenRequest: creating an HTTP request handle

If a verb other than "GET" or "POST" is specified, HttpOpenRequest automatically sets INTERNET_FLAG_NO_CACHE_WRITE and INTERNET_FLAG_RELOAD for the request.

HINTERNET HttpOpenRequest(
  _In_ HINTERNET hConnect,              // (1)
  _In_ LPCTSTR   lpszVerb,              // (2)
  _In_ LPCTSTR   lpszObjectName,        // (3)
  _In_ LPCTSTR   lpszVersion,           // (4)
  _In_ LPCTSTR   lpszReferer,           // (5)
  _In_ LPCTSTR   *lplpszAcceptTypes,    // (6)
  _In_ DWORD     dwFlags,               // (7)
  _In_ DWORD_PTR dwContext              // (8)
);

(1) A handle to an HTTP session returned by InternetConnect.
(2) A pointer to a null-terminated string that contains the HTTP verb to use in the request. If this parameter is NULL, the function uses GET as the HTTP verb.
(3) A pointer to a null-terminated string that contains the name of the target object of the specified HTTP verb. This is generally a file name, an executable module, or a search specifier (that is, the URI of the requested resource).
(4) A pointer to a null-terminated string that contains the HTTP version to use in the request. If this parameter is NULL, the function uses an HTTP version of 1.1 or 1.0, depending on the value of the Internet Explorer settings (typically set to "HTTP/1.0" or "HTTP/1.1").
(5) A pointer to a null-terminated string that specifies the URL of the document from which the URL in the request (lpszObjectName) was obtained. If this parameter is NULL, no referrer is specified.
(6) A pointer to a null-terminated array of strings that indicates media types accepted by the client. Here is an example:

PCTSTR rgpszAcceptTypes[] = {_T("text/*"), NULL};

(7) Internet options, including INTERNET_FLAG_RELOAD (forces a download of the requested file, object, or directory listing from the origin server, not from the cache) and INTERNET_FLAG_NO_CACHE_WRITE (does not add the returned entity to the cache), among others.
(8) A pointer to a variable that contains the application-defined value that associates this operation with any application data.

HttpAddRequestHeaders: adding header fields to an HTTP request handle

BOOL HttpAddRequestHeaders(
  _In_ HINTERNET hRequest,          // (1)
  _In_ LPCTSTR   lpszHeaders,       // (2)
  _In_ DWORD     dwHeadersLength,   // (3)
  _In_ DWORD     dwModifiers        // (4)
);

(1) A handle returned by a call to the HttpOpenRequest function.
(2) A pointer to a string variable containing the headers to append to the request. Each header must be terminated by a CR/LF (carriage return/line feed) pair.
(3) The size of lpszHeaders, in TCHARs. If this parameter is -1L, the function assumes that lpszHeaders is zero-terminated (ASCIIZ), and the length is computed.
(4) A set of modifiers that control the semantics of this function:

  • HTTP_ADDREQ_FLAG_ADD: Adds the header if it does not exist. Used with HTTP_ADDREQ_FLAG_REPLACE;
  • HTTP_ADDREQ_FLAG_ADD_IF_NEW: Adds the header only if it does not already exist; otherwise, an error is returned;
  • HTTP_ADDREQ_FLAG_COALESCE: Coalesces (merges) headers of the same name;
  • HTTP_ADDREQ_FLAG_COALESCE_WITH_COMMA: Coalesces headers of the same name with a comma. For example, adding "Accept: text/*" followed by "Accept: audio/*" with this flag results in the formation of the single header "Accept: text/*, audio/*";
  • HTTP_ADDREQ_FLAG_COALESCE_WITH_SEMICOLON: Coalesces headers of the same name using a semicolon;
  • HTTP_ADDREQ_FLAG_REPLACE: Replaces or removes a header. If the header value is empty and the header is found, it is removed. If not empty, the header value is replaced.
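
Because each header passed in lpszHeaders must end with a CR/LF pair, it can help to assemble the header block in one place. The following is a minimal sketch in portable C++. The function buildHeaderBlock and its name/value pair representation are hypothetical helpers for illustration and are not part of WinINet.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a WinINet function): joins name/value pairs into
// one header block, terminating every header with CR/LF.
std::wstring buildHeaderBlock(
    const std::vector<std::pair<std::wstring, std::wstring>>& headers)
{
    std::wstring block;
    for (const auto& header : headers) {
        block += header.first;   // field name, e.g. L"Accept"
        block += L": ";
        block += header.second;  // field value, e.g. L"*/*"
        block += L"\r\n";        // required terminator for each header
    }
    return block;
}
```

In a UNICODE build, the resulting string could be passed as lpszHeaders, with its length() (a character count) cast to DWORD for dwHeadersLength.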

HttpSendRequest: sending an HTTP request

BOOL HttpSendRequest(
  _In_ HINTERNET hRequest,              // (1)
  _In_ LPCTSTR   lpszHeaders,           // (2)
  _In_ DWORD     dwHeadersLength,       // (3)
  _In_ LPVOID    lpOptional,            // (4)
  _In_ DWORD     dwOptionalLength       // (5)
);

(1) A handle returned by a call to the HttpOpenRequest function.
(2) A pointer to a null-terminated string that contains the additional headers to be appended to the request. This parameter can be NULL if there are no additional headers to be appended.
(3) The size of the additional headers, in TCHARs. If this parameter is -1L and lpszHeaders is not NULL, the function assumes that lpszHeaders is zero-terminated (ASCIIZ), and the length is calculated.
(4) A pointer to a buffer containing any optional data to be sent immediately after the request headers. This parameter is generally used for “POST” and “PUT” operations.
(5) The size of the optional data, in bytes.

HttpQueryInfo: retrieving response information for an HTTP request

Example: Retrieving HTTP Headers

BOOL HttpQueryInfo(
  _In_    HINTERNET hRequest,           // (1)
  _In_    DWORD     dwInfoLevel,        // (2)
  _Inout_ LPVOID    lpvBuffer,          // (3)
  _Inout_ LPDWORD   lpdwBufferLength,   // (4)
  _Inout_ LPDWORD   lpdwIndex           // (5)
);

(1) A handle returned by a call to the HttpOpenRequest or InternetOpenUrl function.
(2) A combination of an attribute to be retrieved and flags that modify the request. For a list of possible attribute and modifier values, see Query Info Flags.

Commonly used values:

  • HTTP_QUERY_CONTENT_LENGTH: Retrieves the size of the resource, in bytes;
  • HTTP_QUERY_ACCEPT_RANGES: Retrieves the types of range requests that are accepted for a resource;
  • HTTP_QUERY_CONTENT_RANGE: Corresponds to the Content-Range response header;
  • HTTP_QUERY_FLAG_NUMBER: Returns the data as a 32-bit number for headers whose value is a number, such as the status code;
  • HTTP_QUERY_STATUS_CODE: Receives the status code returned by the server

(3) A pointer to a buffer to receive the requested information.
(4) A pointer to a variable that contains, on entry, the size in bytes of the buffer pointed to by lpvBuffer. When the function returns successfully, this variable contains the number of bytes of information written to the buffer. In the case of a string, the byte count does not include the string’s terminating null character.
(5) A pointer to a zero-based header index used to enumerate multiple headers with the same name.

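InternetReadFile: reading data from a handle opened by InternetOpenUrl, FtpOpenFile, or HttpOpenRequest

To make sure all of the data is read, call InternetReadFile in a loop until it returns TRUE and the value returned through the lpdwNumberOfBytesRead parameter is 0.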

BOOL InternetReadFile(
  _In_  HINTERNET hFile,                    // (1)
  _Out_ LPVOID    lpBuffer,                 // (2)
  _In_  DWORD     dwNumberOfBytesToRead,    // (3)
  _Out_ LPDWORD   lpdwNumberOfBytesRead     // (4)
);

(1) Handle returned from a previous call to InternetOpenUrl, FtpOpenFile, or HttpOpenRequest.
(2) Pointer to a buffer that receives the data.
(3) Number of bytes to be read.
(4) Pointer to a variable that receives the number of bytes read. InternetReadFile sets this value to zero before doing any work or error checking.
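
The read-until-zero loop described above can be sketched independently of WinINet. In the sketch below, readAll and the ReadChunk callback type are hypothetical names introduced for illustration. The callback stands in for a call such as InternetReadFile: it returns false on failure and reports the number of bytes read through its last argument.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for InternetReadFile: fills `buffer` with up to `size`
// bytes, stores the count in `bytesRead`, and returns false on failure.
using ReadChunk =
    std::function<bool(char* buffer, std::size_t size, std::size_t& bytesRead)>;

// Calls readChunk repeatedly until it succeeds with zero bytes read,
// appending every chunk to `out`. Returns false if any call fails.
bool readAll(const ReadChunk& readChunk, std::string& out)
{
    char buffer[2048];
    for (;;) {
        std::size_t bytesRead = 0;
        if (!readChunk(buffer, sizeof(buffer), bytesRead)) {
            return false;               // read error: stop and report it
        }
        if (bytesRead == 0) {
            return true;                // success with 0 bytes: end of data
        }
        out.append(buffer, bytesRead);  // keep only the bytes actually read
    }
}
```

With WinINet, the callback would presumably wrap InternetReadFile on the request handle and convert its DWORD byte count. The data is collected as raw bytes because the function does not interpret the character encoding of the response.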

#include <string>
#include <iostream>
#include <windows.h>
#include <WinINet.h>  

using namespace std;  

#pragma comment(lib, "WinINet.lib")  

int main(int argc, char* argv[])
{
    wstring strURL = L"http://blog.csdn.net/yanglingwell/article/details/78258081";
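    // Parse the URL. This sample passes wide strings to the TCHAR-based
    // WinINet macros, so build it with UNICODE defined.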
    URL_COMPONENTS urlComponents;

    ZeroMemory(&urlComponents, sizeof(urlComponents));
    WCHAR lpszHostName[INTERNET_MAX_HOST_NAME_LENGTH]    = {0};
    WCHAR lpszUserName[INTERNET_MAX_USER_NAME_LENGTH]    = {0};
    WCHAR lpszPassword[INTERNET_MAX_PASSWORD_LENGTH]     = {0};
    WCHAR lpszURLPath[INTERNET_MAX_URL_LENGTH]           = {0};
    WCHAR lpszScheme[INTERNET_MAX_SCHEME_LENGTH]         = {0};

    urlComponents.dwStructSize      = sizeof(urlComponents);
    urlComponents.lpszScheme        = lpszScheme;
    urlComponents.dwSchemeLength    = INTERNET_MAX_SCHEME_LENGTH;
    urlComponents.lpszHostName      = lpszHostName;
    urlComponents.dwHostNameLength  = INTERNET_MAX_HOST_NAME_LENGTH;
    urlComponents.lpszUserName      = lpszUserName;
    urlComponents.dwUserNameLength  = INTERNET_MAX_USER_NAME_LENGTH;
    urlComponents.lpszPassword      = lpszPassword;
    urlComponents.dwPasswordLength  = INTERNET_MAX_PASSWORD_LENGTH;
    urlComponents.lpszUrlPath       = lpszURLPath;
    urlComponents.dwUrlPathLength   = INTERNET_MAX_URL_LENGTH;  

    // dwUrlLength is 0 because the URL string is null-terminated; no flags are needed.
    BOOL bSuccess = InternetCrackUrl(strURL.c_str(), 0, 0, &urlComponents);
    if(bSuccess == FALSE)
    {
        wcout << strURL << L" could not be parsed!" << endl;
        return 1;
    }
    else if(urlComponents.nScheme != INTERNET_SCHEME_HTTP)
    {
        wcout << strURL << L" is not an HTTP URL!" << endl;
        return 1;
    }

    HINTERNET hSession  = NULL;
    HINTERNET hInternet = NULL;
    HINTERNET hRequest  = NULL;

    do
    {
        // Initializes an application's use of the WinINet functions.
        // Returns a valid handle that the application passes to subsequent WinINet functions.
        // If InternetOpen fails, it returns NULL.
        hInternet = InternetOpen(L"yanglingwell", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL, 0);
        if(hInternet == NULL)
        {
            cout << "InternetOpen failed. errCode: " << GetLastError() << endl;
            break;
        }

        // Opens an HTTP session for a given site.
        // Returns a valid handle to the session if the connection is successful, or NULL otherwise.
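        // Assign to the outer hSession. Declaring a new local here would shadow it,
        // so the cleanup code after the loop would never close the session handle.
        hSession = InternetConnect(hInternet, urlComponents.lpszHostName, urlComponents.nPort, urlComponents.lpszUserName,
            urlComponents.lpszPassword, INTERNET_SERVICE_HTTP, 0, 0);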
        if(hSession == NULL)
        {
            cout << "InternetConnect failed. errCode: " << GetLastError() << endl;
            break;
        }

        // Creates an HTTP request handle
        // Returns an HTTP request handle if successful, or NULL otherwise.
        hRequest = HttpOpenRequest(hSession, L"GET", urlComponents.lpszUrlPath, NULL, L"", NULL, 0, 0);
        if(hRequest == NULL)
        {
            cout << "HttpOpenRequest failed. errCode: " << GetLastError() << endl;
            break;
        }

        // Set the header fields
        wstring strHeader;
        // Accepted media types
        strHeader += L"Accept: */*\r\n";
        // Disable caching
        strHeader += L"Pragma: no-cache\r\n";
        strHeader += L"Cache-Control: no-cache\r\n";
        // Set other header fields...

        // Adds one or more HTTP request headers to the HTTP request handle.
        if (!HttpAddRequestHeaders(hRequest, strHeader.c_str(), static_cast<DWORD>(strHeader.length()), HTTP_ADDREQ_FLAG_ADD|HTTP_ADDREQ_FLAG_REPLACE))
        {
            cout << "HttpAddRequestHeaders failed. errCode: " << GetLastError() << endl;
            break;
        }

        if (!HttpSendRequest(hRequest, NULL, 0, NULL, 0))
        {
            cout << "HttpSendRequest failed. errCode: " << GetLastError() << endl;
            break;
        }

        DWORD dwStatusCode;
        DWORD dwSizeDW = sizeof(DWORD);
        if (!HttpQueryInfo(hRequest, HTTP_QUERY_FLAG_NUMBER | HTTP_QUERY_STATUS_CODE, &dwStatusCode, &dwSizeDW, NULL))
        {
            cout << "HttpQueryInfo failed. errCode: " << GetLastError() << endl;
            break;
        }
        else
        {
            cout << "StatusCode: " << dwStatusCode << endl;
        }

        // InternetReadFile returns raw bytes, so read into a byte buffer.
        char  buf[2048];
        DWORD bufSize = sizeof(buf);
        DWORD bufRead = 0;
        do
        {
            if(!InternetReadFile(hRequest, buf, bufSize, &bufRead))
            {
                cout << "InternetReadFile failed. errCode: " << GetLastError() << endl;
                break;
            }
            cout << "read " << bufRead << " bytes" << endl;
        } while (bufRead != 0);

    } while (FALSE);

    // Close handles in the reverse order of creation: request, session, then the root handle.
    if(hRequest != NULL)
    {
        InternetCloseHandle(hRequest);
    }
    if(hSession != NULL)
    {
        InternetCloseHandle(hSession);
    }
    if(hInternet != NULL)
    {
        InternetCloseHandle(hInternet);
    }

    return 0;
}