Download all files at once using DownloadFileTaskAsync
Keywords: download, files, at once, DownloadFileTaskAsync | Updated: 2023-09-27 18:11:04
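Given an input text file containing URLs, I want to download the corresponding files all at once. I used the answer to the question "UserState using WebClient and TaskAsync download from Async CTP" as a reference.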
    public void Run()
    {
        List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();
        int index = 0;
        Task[] tasks = new Task[urls.Count];
        foreach (string url in urls)
        {
            WebClient wc = new WebClient();
            string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);
            Task downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
            Task outputTask = downloadTask.ContinueWith(t => Output(path));
            tasks[index] = outputTask;
            index++; // move on to the next slot and file name
        }
        Console.WriteLine("Start now");
        Task.WhenAll(tasks);
        Console.WriteLine("Done");
    }

    public void Output(string path)
    {
        Console.WriteLine(path);
    }
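I expected the downloads to start at the "Task.WhenAll(tasks)" point. Instead, the output looks like this:

    c:/temp/Output/image-2.jpg
    c:/temp/Output/image-1.jpg
    c:/temp/Output/image-4.jpg
    c:/temp/Output/image-6.jpg
    c:/temp/Output/image-3.jpg
    [many lines removed]
    Start now
    c:/temp/Output/image-18.jpg
    c:/temp/Output/image-19.jpg
    c:/temp/Output/image-20.jpg
    c:/temp/Output/image-21.jpg
    c:/temp/Output/image-23.jpg
    [many lines removed]
    Done

Why does the download start before WaitAll is called? What can I change to achieve what I want (i.e. all tasks start at the same time)?
Thanks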
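Why does the download start before WaitAll is called?
First of all, you are not calling Task.WaitAll, which blocks synchronously. You are calling Task.WhenAll, which returns an awaitable that should be awaited.
Now, as others have said, calling an async method fires the asynchronous operation even if you do not await it, because any TAP-compliant method returns a "hot" task.
What can I change to achieve what I want (i.e. all tasks start at the same time)?
If you want to defer execution until Task.WhenAll, you can use Enumerable.Select to project each element to a Task, and let the sequence be materialized when you pass it to Task.WhenAll: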
    public async Task RunAsync()
    {
        IEnumerable<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt");
        // Select is lazy: the lambda below does not run until urlTasks is enumerated.
        var urlTasks = urls.Select((url, index) =>
        {
            WebClient wc = new WebClient();
            string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index);
            var downloadTask = wc.DownloadFileTaskAsync(new Uri(url), path);
            Output(path); // printed when the download starts, not when it finishes
            return downloadTask;
        });
        Console.WriteLine("Start now");
        // Task.WhenAll enumerates urlTasks, which starts every download here.
        await Task.WhenAll(urlTasks);
        Console.WriteLine("Done");
    }
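The deferral above relies on LINQ's lazy evaluation, which is easy to check in isolation. The following is a minimal sketch, not the answer's code: `DeferredSelectDemo`, `RunAsync` and the `started` counter are hypothetical names, and `Task.Delay` stands in for the real download call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredSelectDemo
{
    // Returns how many operations had started before and after Task.WhenAll.
    public static async Task<int[]> RunAsync()
    {
        int started = 0;

        // Nothing runs here: Select only builds a lazy sequence.
        IEnumerable<Task> lazyTasks = Enumerable.Range(0, 3).Select(async i =>
        {
            started++;            // runs synchronously when the sequence is enumerated
            await Task.Delay(10); // assumption: placeholder for DownloadFileTaskAsync
        });

        int before = started;     // still 0, the sequence has not been enumerated

        // WhenAll enumerates the sequence, which invokes the lambda for each element.
        await Task.WhenAll(lazyTasks);

        int after = started;      // 3, one increment per element
        return new[] { before, after };
    }

    public static void Main()
    {
        int[] counts = RunAsync().GetAwaiter().GetResult();
        Console.WriteLine("started before WhenAll: " + counts[0]);
        Console.WriteLine("started after WhenAll: " + counts[1]);
    }
}
```

One caveat of this design: enumerating the lazy sequence a second time (for example by calling Count() on it) would run the lambda again and start every operation twice, so the sequence should be passed to Task.WhenAll exactly once.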
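Why does the download start before WaitAll is called?
Because:
Tasks that are created by the public Task constructors are referred to as "cold" tasks, because they begin their life cycle in the non-scheduled TaskStatus.Created state, and it is not until Start is called on these instances that they progress to being scheduled. All other tasks begin their life cycle in a "hot" state, meaning that the asynchronous operations they represent have already been initiated and their TaskStatus is an enumeration value other than Created. All tasks returned from TAP methods must be "hot".
Since DownloadFileTaskAsync is a TAP method, it returns a "hot" (that is, already started) task.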
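The cold/hot distinction can be observed directly through Task.Status. This is a minimal sketch under stated assumptions: `HotColdDemo`, `ColdStatus` and `HotIsStarted` are hypothetical names, and `Task.Delay` is used as a stand-in for any TAP method such as DownloadFileTaskAsync.

```csharp
using System;
using System.Threading.Tasks;

public static class HotColdDemo
{
    // A task built with the public constructor is "cold": it stays in the
    // Created state until Start is called on it.
    public static TaskStatus ColdStatus()
    {
        var cold = new Task(() => { });
        return cold.Status;
    }

    // A task returned by a TAP method is "hot": the operation is already
    // under way, so its status is never Created.
    public static bool HotIsStarted()
    {
        Task hot = Task.Delay(50); // assumption: stand-in for DownloadFileTaskAsync
        return hot.Status != TaskStatus.Created;
    }

    public static void Main()
    {
        Console.WriteLine(ColdStatus());   // prints "Created"
        Console.WriteLine(HotIsStarted()); // prints "True"
    }
}
```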
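What can I change to achieve what I want (i.e. all tasks start at the same time)?
I would look at TPL Dataflow. Something like this (I used HttpClient instead of WebClient, but that does not really matter):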
    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using System.Threading.Tasks;
    using System.Threading.Tasks.Dataflow; // on .NET Framework: System.Threading.Tasks.Dataflow NuGet package

    static async Task DownloadData(IEnumerable<string> urls)
    {
        // we want to execute this in parallel
        var executionOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

        // this block receives a URL and downloads the content it points to
        var downloadBlock = new TransformBlock<string, Tuple<string, string>>(async url =>
        {
            using (var client = new HttpClient())
            {
                var content = await client.GetStringAsync(url);
                return Tuple.Create(url, content);
            }
        }, executionOptions);

        // this block prints the length of the downloaded string
        // (a character count, although the message labels it as bytes)
        var outputBlock = new ActionBlock<Tuple<string, string>>(tuple =>
        {
            Console.WriteLine($"Downloaded {(string.IsNullOrEmpty(tuple.Item2) ? 0 : tuple.Item2.Length)} bytes from {tuple.Item1}");
        }, executionOptions);

        // link downloadBlock to outputBlock: every item produced by
        // downloadBlock is forwarded to outputBlock
        using (downloadBlock.LinkTo(outputBlock))
        {
            // fill downloadBlock with input data; it starts processing as soon as items arrive
            foreach (var url in urls)
            {
                await downloadBlock.SendAsync(url);
            }

            // signal that no more URLs will be posted to downloadBlock
            downloadBlock.Complete();
            // wait until all downloads have finished
            await downloadBlock.Completion;
            // signal that outputBlock will receive no more items
            outputBlock.Complete();
            // wait until all output has been printed
            await outputBlock.Completion;
        }
    }

    static void Main(string[] args)
    {
        var urls = new[]
        {
            "http://www.microsoft.com",
            "http://www.google.com",
            "http://stackoverflow.com",
            "http://www.amazon.com",
            "http://www.asp.net"
        };
        Console.WriteLine("Start now.");
        DownloadData(urls).Wait();
        Console.WriteLine("Done.");
        Console.ReadLine();
    }
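Output:

    Start now.
    Downloaded 1020 bytes from http://www.microsoft.com
    Downloaded 53108 bytes from http://www.google.com
    Downloaded 244143 bytes from http://stackoverflow.com
    Downloaded 468922 bytes from http://www.amazon.com
    Downloaded 27771 bytes from http://www.asp.net
    Done.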
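What can I change to achieve what I want (i.e. all tasks start at the same time)?
To synchronize the start of the downloads, you can use the Barrier class.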
    public void Run()
    {
        List<string> urls = File.ReadAllLines(@"c:/temp/Input/input.txt").ToList();
        // The post-phase action runs once, after every participant has reached the barrier.
        Barrier barrier = new Barrier(urls.Count, b => { Console.WriteLine("Start now"); });
        Task[] tasks = new Task[urls.Count];
        Parallel.For(0, urls.Count, index =>
        {
            string path = string.Format("{0}image-{1}.jpg", @"c:/temp/Output/", index + 1);
            tasks[index] = DownloadAsync(new Uri(urls[index]), path, barrier);
        });
        Task.WaitAll(tasks); // wait for completion
        Console.WriteLine("Done");
    }

    async Task DownloadAsync(Uri url, string path, Barrier barrier)
    {
        using (WebClient wc = new WebClient())
        {
            // SignalAndWait blocks the calling thread, so each URL occupies
            // a thread until all participants have arrived.
            barrier.SignalAndWait();
            // DownloadFileTaskAsync returns a Task; DownloadFileAsync returns void and cannot be awaited.
            await wc.DownloadFileTaskAsync(url, path);
            Output(path);
        }
    }
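Because Barrier.SignalAndWait blocks one thread per participant, an asynchronous gate may be a lighter way to release all operations together. The sketch below swaps in a TaskCompletionSource as that gate, which is a different technique from the Barrier used in the answer. `GateDemo`, `WorkerAsync` and `RunAsync` are hypothetical names, and `Task.Delay` stands in for the download call.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class GateDemo
{
    // Hypothetical worker: waits for the gate without blocking a thread,
    // then performs its (placeholder) download.
    static async Task<int> WorkerAsync(int id, Task gate)
    {
        await gate;           // asynchronous wait until the gate is opened
        await Task.Delay(10); // assumption: placeholder for DownloadFileTaskAsync
        return id;
    }

    public static async Task<int[]> RunAsync(int count)
    {
        // RunContinuationsAsynchronously keeps the waiting workers from
        // resuming inline on the thread that calls SetResult.
        var gate = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Every worker is created up front and immediately parks on the gate.
        Task<int>[] workers = Enumerable.Range(1, count)
            .Select(id => WorkerAsync(id, gate.Task))
            .ToArray();

        Console.WriteLine("Start now");
        gate.SetResult(true); // open the gate: all workers are released

        int[] ids = await Task.WhenAll(workers); // results keep the input order
        Console.WriteLine("Done");
        return ids;
    }

    public static void Main()
    {
        int[] ids = RunAsync(5).GetAwaiter().GetResult();
        Console.WriteLine(string.Join(",", ids));
    }
}
```

Released workers are scheduled on the thread pool, so "at the same time" here means "not before the gate opens" rather than a strictly simultaneous start.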