Download all files at once using DownloadFileTaskAsync

Keywords: download, files, at once, DownloadFileTaskAsync, using | Updated: 2023-09-27 18:11:04

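Given an input text file containing URLs, I want to download the corresponding files all at once. I used the answer to the question "UserState using WebClient and TaskAsync download from Async CTP" as a reference.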

public void Run()
{
    List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();
    int index = 0;
    Task[] tasks = new Task[urls.Count()];
    foreach (string url in urls)
    {
        WebClient wc = new WebClient();
        string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index+1);
        Task downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
        Task outputTask = downloadTask.ContinueWith(t => Output(path));
        tasks[index] = outputTask;
        index++; // advance to the next slot; without this every task overwrites tasks[0]
    }
    Console.WriteLine("Start now");
    Task.WhenAll(tasks);
    Console.WriteLine("Done");
}
public void Output(string path)
{
    Console.WriteLine(path);
}

I expected the downloads to start at the "Task.WhenAll(tasks)" call. Instead, the output looks like this:

c:/temp/Output/image-2.jpg
c:/temp/Output/image-1.jpg
c:/temp/Output/image-4.jpg
c:/temp/Output/image-6.jpg
c:/temp/Output/image-3.jpg
[many lines removed]
Start now
c:/temp/Output/image-18.jpg
c:/temp/Output/image-19.jpg
c:/temp/Output/image-20.jpg
c:/temp/Output/image-21.jpg
c:/temp/Output/image-23.jpg
[many lines removed]
Done

Why do the downloads start before WaitAll is called? What can I change to achieve what I want (i.e. all tasks start at the same time)?

Thanks


Why do the downloads start before WaitAll is called?

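First, you are not calling Task.WaitAll, which blocks synchronously. You are calling Task.WhenAll, which returns an awaitable that you are expected to await. Because your code never awaits it, "Done" is printed without waiting for the downloads.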

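Now, as others have said, calling an async method kicks off the asynchronous operation even if you do not await it, because any TAP-compliant method returns a "hot" task.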
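To make the "hot task" point concrete, here is a minimal sketch. FakeDownloadAsync is a hypothetical stand-in for a TAP method such as DownloadFileTaskAsync (it only delays, it does not touch the network), and the Started flag exists purely for illustration.

```csharp
using System;
using System.Threading.Tasks;

public static class HotTaskDemo
{
    // Set by the async method before it reaches its first await.
    public static bool Started;

    // Hypothetical stand-in for a TAP method such as DownloadFileTaskAsync.
    public static async Task FakeDownloadAsync()
    {
        Started = true;          // runs synchronously on the caller's thread
        await Task.Delay(100);   // the simulated download continues in the background
    }

    public static void Main()
    {
        Task t = FakeDownloadAsync();   // no await, no WhenAll
        Console.WriteLine(Started);                          // True
        Console.WriteLine(t.Status != TaskStatus.Created);   // True
        t.Wait();
    }
}
```

The work has already begun by the time the call returns, so awaiting or combining the task later only observes its completion.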

What can I change to achieve what I want (i.e. all tasks start at the same time)?

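If you want to defer execution until Task.WhenAll, you can use Enumerable.Select to project each element to a Task. The query is lazy, so the tasks are only created when Task.WhenAll enumerates it: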

public async Task RunAsync()
{
    IEnumerable<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt");
    var urlTasks = urls.Select((url, index) =>
    {
        WebClient wc = new WebClient();
        string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index);
        var downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
        Output(path);
        return downloadTask;
    });
    Console.WriteLine("Start now");
    await Task.WhenAll(urlTasks);
    Console.WriteLine("Done");
}
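The deferral in the code above comes from LINQ's lazy evaluation rather than from anything in the Task API. The following sketch shows that under simplified assumptions: FakeDownloadAsync and StartedCount are hypothetical names, and Task.Delay stands in for the real download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredSelectDemo
{
    public static int StartedCount;

    // Hypothetical stand-in for wc.DownloadFileTaskAsync(...).
    static Task FakeDownloadAsync(string url)
    {
        StartedCount++;          // counts how many "downloads" have been started
        return Task.Delay(50);
    }

    public static async Task<int[]> RunAsync()
    {
        StartedCount = 0;
        var urls = new[] { "a", "b", "c" };
        // Select is lazy: the lambda has not run for any element yet.
        IEnumerable<Task> lazy = urls.Select(u => FakeDownloadAsync(u));
        int before = StartedCount;
        // WhenAll enumerates the query once, which starts every task.
        await Task.WhenAll(lazy);
        int after = StartedCount;
        return new[] { before, after };
    }

    public static void Main()
    {
        int[] r = RunAsync().GetAwaiter().GetResult();
        Console.WriteLine($"started before WhenAll: {r[0]}, after: {r[1]}");
    }
}
```

One caveat of this design: enumerating the same query a second time (for example with foreach or Count()) would run the lambda again and start a second set of downloads, so the query should be enumerated exactly once.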

Why do the downloads start before WaitAll is called?

Because:

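Tasks created by its public constructors are referred to as "cold" tasks, in that they begin their life cycle in the non-scheduled TaskStatus.Created state, and it is not until Start is called on these instances that they progress to being scheduled. All other tasks begin their life cycle in a "hot" state, meaning that the asynchronous operations they represent have already been initiated and their TaskStatus is an enumeration value other than Created. All tasks returned from TAP methods must be "hot".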

Since DownloadFileTaskAsync is a TAP method, it returns a "hot" (i.e. already started) task.
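The difference between the two states can be observed directly through Task.Status. This is a small illustrative sketch; Task.Delay is used here only as a convenient example of a method that returns an already-started task.

```csharp
using System;
using System.Threading.Tasks;

public static class ColdTaskDemo
{
    // A task built with the public constructor is cold until Start is called.
    public static TaskStatus ColdStatus()
    {
        var cold = new Task(() => { });
        return cold.Status;
    }

    // A task returned by a TAP-style method is already past the Created state.
    public static bool HotIsPastCreated()
    {
        Task hot = Task.Delay(1000);
        return hot.Status != TaskStatus.Created;
    }

    public static void Main()
    {
        var cold = new Task(() => Console.WriteLine("running"));
        Console.WriteLine(cold.Status);        // Created
        cold.Start();                          // only now is it scheduled
        cold.Wait();
        Console.WriteLine(cold.Status);        // RanToCompletion
        Console.WriteLine(HotIsPastCreated()); // True
    }
}
```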

What can I change to achieve what I want (i.e. all tasks start at the same time)?

I would look at TPL Dataflow. Something like this (I use HttpClient instead of WebClient, but that doesn't really matter):

    // requires: using System.Net.Http; using System.Threading.Tasks.Dataflow;
    static async Task DownloadData(IEnumerable<string> urls)
    {
        // allow several items to be processed in parallel
        var executionOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
        // this block receives a URL and downloads the content it points to
        var downloadBlock = new TransformBlock<string, Tuple<string, string>>(async url =>
        {
            using (var client = new HttpClient())
            {
                var content = await client.GetStringAsync(url);
                return Tuple.Create(url, content);
            }
        }, executionOptions);
        // this block prints the length of the downloaded string
        // (a character count, which the message reports as bytes)
        var outputBlock = new ActionBlock<Tuple<string, string>>(tuple =>
        {
            Console.WriteLine($"Downloaded {(string.IsNullOrEmpty(tuple.Item2) ? 0 : tuple.Item2.Length)} bytes from {tuple.Item1}");
        }, executionOptions);
        // link downloadBlock to outputBlock: every result produced by
        // downloadBlock is forwarded to outputBlock
        using (downloadBlock.LinkTo(outputBlock))
        {
            // feed the input data to downloadBlock; it begins processing
            // each item as soon as it is received
            foreach (var url in urls)
            {
                await downloadBlock.SendAsync(url);
            }
            // tell downloadBlock that no more input will arrive
            downloadBlock.Complete();
            // wait until all downloads have finished and been forwarded
            await downloadBlock.Completion;
            // tell outputBlock that no more input will arrive
            outputBlock.Complete();
            // wait until all output has been printed
            await outputBlock.Completion;
        }
    }
    static void Main(string[] args)
    {
        var urls = new[]
        {
            "http://www.microsoft.com",
            "http://www.google.com",
            "http://stackoverflow.com",
            "http://www.amazon.com",
            "http://www.asp.net"
        };
        Console.WriteLine("Start now.");
        DownloadData(urls).Wait();
        Console.WriteLine("Done.");
        Console.ReadLine();
    }
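
Output:

Start now.
Downloaded 1020 bytes from http://www.microsoft.com
Downloaded 53108 bytes from http://www.google.com
Downloaded 244143 bytes from http://stackoverflow.com
Downloaded 468922 bytes from http://www.amazon.com
Downloaded 27771 bytes from http://www.asp.net
Done.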
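The same two-block shape can be tried without any network access. The sketch below is an assumption-laden variant, not the answer's code: it squares integers instead of downloading, and it uses DataflowLinkOptions.PropagateCompletion so that completing the first block also completes the second, instead of completing each block by hand. PipelineDemo and RunAsync are hypothetical names, and on .NET Framework the System.Threading.Tasks.Dataflow NuGet package is needed.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class PipelineDemo
{
    public static async Task<int[]> RunAsync(int[] inputs)
    {
        var results = new ConcurrentBag<int>();
        var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 };

        // Stand-in for the download step: simulated latency, then a result.
        var work = new TransformBlock<int, int>(async n =>
        {
            await Task.Delay(10);
            return n * n;
        }, options);

        // Stand-in for the output step: collect instead of printing.
        var sink = new ActionBlock<int>(n => results.Add(n));

        // With PropagateCompletion, completing 'work' eventually completes 'sink'.
        work.LinkTo(sink, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (int n in inputs)
            await work.SendAsync(n);   // processing starts as items arrive

        work.Complete();               // no more input
        await sink.Completion;         // wait for the whole pipeline to drain
        return results.OrderBy(x => x).ToArray();
    }

    public static void Main()
    {
        int[] squares = RunAsync(new[] { 1, 2, 3 }).GetAwaiter().GetResult();
        Console.WriteLine(string.Join(",", squares));   // 1,4,9
    }
}
```

Note that a dataflow pipeline limits how many items run concurrently; it does not make every item start at the same instant.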

What can I change to achieve what I want (i.e. all tasks start at the same time)?

To synchronize the start of the downloads, you can use the Barrier class.

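  public void Run()
  {
      List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();

      // one participant per URL; the post-phase action runs once all have arrived
      Barrier barrier = new Barrier(urls.Count, b => Console.WriteLine("Start now"));
      Task[] tasks = new Task[urls.Count];
      Parallel.For(0, urls.Count, index =>
      {
          string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);
          tasks[index] = DownloadAsync(new Uri(urls[index]), path, barrier);
      });

      Task.WaitAll(tasks); // wait for completion
      Console.WriteLine("Done");
  }
  async Task DownloadAsync(Uri url, string path, Barrier barrier)
  {
      using (WebClient wc = new WebClient())
      {
          // Caution: this blocks the calling thread until every participant
          // has signaled, so each URL needs its own thread at the same time;
          // with many URLs this may be slow or may stall.
          barrier.SignalAndWait();
          // DownloadFileTaskAsync returns a Task; DownloadFileAsync returns
          // void and cannot be awaited
          await wc.DownloadFileTaskAsync(url, path);
          Output(path);
      }
  }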
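Because Barrier.SignalAndWait blocks a thread per participant, a non-blocking alternative may be worth considering. The sketch below swaps in a different technique, a TaskCompletionSource used as a shared start gate, and is not part of the answer above. StartGateDemo, RunAsync and the log list are hypothetical, and Task.Delay stands in for wc.DownloadFileTaskAsync.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class StartGateDemo
{
    public static async Task<List<string>> RunAsync(string[] urls)
    {
        var log = new List<string>();
        // The gate stays closed until SetResult is called.
        var gate = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        async Task DownloadAsync(string url)
        {
            await gate.Task;        // waits without blocking a thread
            await Task.Delay(10);   // stand-in for wc.DownloadFileTaskAsync(...)
            lock (log) log.Add(url);
        }

        // Every task is created now but parks at the gate.
        Task[] tasks = urls.Select(u => DownloadAsync(u)).ToArray();

        lock (log) log.Add("Start now");
        gate.SetResult(true);       // release all waiting tasks together
        await Task.WhenAll(tasks);
        log.Add("Done");
        return log;
    }

    public static void Main()
    {
        var log = RunAsync(new[] { "a", "b", "c" }).GetAwaiter().GetResult();
        foreach (string line in log) Console.WriteLine(line);
    }
}
```

The first line logged is always "Start now" and the last is "Done"; the order of the entries in between is not guaranteed.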