Parallel loop for determining folder size
Keywords: parallel, loop, folder | Updated: 2023-09-27 17:56:41
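The overall goal of the program is to determine the size of each main folder in a directory. It works fine on small drives but struggles on larger ones. One of the drives I absolutely need took more than 3 hours. Here is a copy of the folder-sizing routine I am using.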
public double getDirectorySize(string p)
{
    //get array of all file names
    string[] a = Directory.GetFiles(p, "*.*", SearchOption.AllDirectories);
    //calculate total bytes in loop
    double b = 0;
    foreach (string name in a)
    {
        if (name.Length < 250) // prevents path too long errors
        {
            //use file info to get length of each file
            FileInfo info = new FileInfo(name);
            b += info.Length;
        }
    }
    //return total size
    return b;
}
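So I was thinking of using a parallel loop, in the form of a parallel foreach. Each p is the name of a main folder. My idea was to somehow split the path p into its subfolders and use a parallel foreach loop to keep collecting file sizes; however, the subfolders contain an unknown number of subdirectories. This is where I run into trouble when trying to get the folder size back. Thanks in advance for your help.
Update
I call this function from the foreach loop below: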
DirectoryInfo di = new DirectoryInfo(Browse_Folders_Text_Box.Text);
FileInfo[] parsedfilename = di.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
parsedfoldername = System.IO.Directory.GetDirectories(Browse_Folders_Text_Box.Text, "*.*", System.IO.SearchOption.TopDirectoryOnly);
//parsedfilename = System.IO.Directory.GetDirectories(textBox1.Text, "*.*", System.IO.SearchOption.AllDirectories);
// Process the list of folders found in the directory.
type_label.Text = "Folder Names \n";
List<string> NameList = new List<string>();
foreach (string transfer2 in parsedfoldername)
{
    this.Cursor = Cursors.WaitCursor;
    //Uses the path and takes the name from last folder used
    string dirName = new DirectoryInfo(@transfer2).Name;
    string dirDate = new DirectoryInfo(@transfer2).LastWriteTime.ToString();
    NameList.Add(dirName);
    //Form2 TextTable = new Form2(NameList.ToString());
    //Display_Rich_Text_Box.AppendText(dirName);
    //Display_Rich_Text_Box.AppendText("\n");
    Last_Date_Modified_Text_Box.AppendText(dirDate);
    Last_Date_Modified_Text_Box.AppendText("\n");
    try
    {
        double b;
        b = getDirectorySize(transfer2);
        MetricByte(b);
    }
    catch (Exception)
    {
        Size_Text_Box.AppendText("N/A \n");
    }
}
Display_Rich_Text_Box.Text = string.Join(Environment.NewLine, NameList);
this.Cursor = Cursors.Default;
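So when I think of a parallel foreach loop, what I have in mind is taking the next set of names (the subfolder names), which are all at the same level, and running getDirectorySize() on them at the same time, because I know there are at least 7 subfolders directly below the main folder name.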
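Accessing the same physical drive in parallel will not speed up the work.
Your main problem is the GetFiles method. It walks through all subfolders and collects every file name before returning. Then your loop goes over the same files a second time.
Use the EnumerateFiles method instead.
Try this code. It should be much faster.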
public long GetDirectorySize(string path)
{
    var dirInfo = new DirectoryInfo(path);
    long totalSize = 0;
    foreach (var fileInfo in dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories))
    {
        totalSize += fileInfo.Length;
    }
    return totalSize;
}
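MSDN:
The EnumerateFiles and GetFiles methods differ as follows: when you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.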
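One caveat with the recursive EnumerateFiles call above: with SearchOption.AllDirectories the enumeration throws (for example UnauthorizedAccessException) as soon as it reaches a directory it cannot open, and the whole total is lost. Below is a minimal sketch of one way around that. It assumes .NET Core 2.1 or later, where EnumerationOptions is available (it does not exist on .NET Framework). The names FolderSize and GetDirectorySizeTolerant are made up for illustration.

```csharp
using System;
using System.IO;

public static class FolderSize
{
    // Hypothetical helper: sums file lengths under 'path' and skips
    // entries that cannot be read instead of throwing.
    public static long GetDirectorySizeTolerant(string path)
    {
        var options = new EnumerationOptions
        {
            // Same reach as SearchOption.AllDirectories.
            RecurseSubdirectories = true,
            // Skip files and directories that raise access-denied errors.
            IgnoreInaccessible = true,
            // Skip reparse points (junctions, symbolic links). Setting this
            // replaces the default, which skips hidden and system entries.
            AttributesToSkip = FileAttributes.ReparsePoint
        };

        long totalSize = 0;
        foreach (var fileInfo in new DirectoryInfo(path).EnumerateFiles("*", options))
        {
            totalSize += fileInfo.Length;
        }
        return totalSize;
    }
}
```

Because hidden and system files are counted here, the result may differ from a run that uses the default options.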
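I had to do something similar, although not for folder/file sizes.
I don't have my code at hand, but I used the following as a starting point. It runs in parallel if there are enough files in a directory.
From the source on MSDN:
The following example iterates the directories sequentially, but processes the files in parallel. This is probably the best approach when you have a large file-to-directory ratio. It is also possible to parallelize the directory iteration and access each file sequentially. It is probably not efficient to parallelize both loops unless you are specifically targeting a machine with a large number of processors. However, as in all cases, you should test your application thoroughly to determine the best approach.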
static void Main()
{
    try
    {
        TraverseTreeParallelForEach(@"C:\Program Files", (f) =>
        {
            // Exceptions are no-ops.
            try {
                // Do nothing with the data except read it.
                byte[] data = File.ReadAllBytes(f);
            }
            catch (FileNotFoundException) {}
            catch (IOException) {}
            catch (UnauthorizedAccessException) {}
            catch (SecurityException) {}
            // Display the filename.
            Console.WriteLine(f);
        });
    }
    catch (ArgumentException) {
        Console.WriteLine(@"The directory 'C:\Program Files' does not exist.");
    }
    // Keep the console window open.
    Console.ReadKey();
}
public static void TraverseTreeParallelForEach(string root, Action<string> action)
{
    //Count of files traversed and timer for diagnostic output
    int fileCount = 0;
    var sw = Stopwatch.StartNew();
    // Determine whether to parallelize file processing on each folder based on processor count.
    int procCount = System.Environment.ProcessorCount;
    // Data structure to hold names of subfolders to be examined for files.
    Stack<string> dirs = new Stack<string>();
    if (!Directory.Exists(root)) {
        throw new ArgumentException();
    }
    dirs.Push(root);
    while (dirs.Count > 0) {
        string currentDir = dirs.Pop();
        string[] subDirs = {};
        string[] files = {};
        try {
            subDirs = Directory.GetDirectories(currentDir);
        }
        // Thrown if we do not have discovery permission on the directory.
        catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
        }
        // Thrown if another process has deleted the directory after we retrieved its name.
        catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
        }
        try {
            files = Directory.GetFiles(currentDir);
        }
        catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
        }
        catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
        }
        catch (IOException e) {
            Console.WriteLine(e.Message);
            continue;
        }
        // Execute in parallel if there are enough files in the directory.
        // Otherwise, execute sequentially. Files are opened and processed
        // synchronously but this could be modified to perform async I/O.
        try {
            if (files.Length < procCount) {
                foreach (var file in files) {
                    action(file);
                    fileCount++;
                }
            }
            else {
                // Each worker keeps a local count; the counts are merged at the end.
                Parallel.ForEach(files, () => 0, (file, loopState, localCount) =>
                {
                    action(file);
                    return (int) ++localCount;
                },
                (c) => {
                    Interlocked.Add(ref fileCount, c);
                });
            }
        }
        catch (AggregateException ae) {
            ae.Handle((ex) => {
                if (ex is UnauthorizedAccessException) {
                    // Here we just output a message and go on.
                    Console.WriteLine(ex.Message);
                    return true;
                }
                // Handle other exceptions here if necessary...
                return false;
            });
        }
        // Push the subdirectories onto the stack for traversal.
        // This could also be done before handling the files.
        foreach (string str in subDirs)
            dirs.Push(str);
    }
    // For diagnostic purposes.
    Console.WriteLine("Processed {0} files in {1} milliseconds", fileCount, sw.ElapsedMilliseconds);
}
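The sample above reads every file in full, whereas the folder-size question only needs the file lengths. A hedged sketch of the same idea applied to sizes follows: directories are walked sequentially with a stack, and the files in each directory are summed in parallel through the thread-local overload of Parallel.ForEach. ParallelFolderSize and GetSize are illustrative names that are not part of the MSDN sample, and whether this beats a sequential loop on a single physical disk is something to measure rather than assume.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelFolderSize
{
    // Hypothetical helper: returns the total size in bytes of all files under 'root'.
    public static long GetSize(string root)
    {
        if (!Directory.Exists(root))
            throw new ArgumentException("Directory does not exist.", nameof(root));

        long total = 0;
        var dirs = new Stack<string>();
        dirs.Push(root);

        while (dirs.Count > 0)
        {
            string currentDir = dirs.Pop();
            string[] subDirs;
            string[] files;
            try
            {
                subDirs = Directory.GetDirectories(currentDir);
                files = Directory.GetFiles(currentDir);
            }
            // Skip directories that cannot be listed.
            catch (UnauthorizedAccessException) { continue; }
            // IOException also covers DirectoryNotFoundException and PathTooLongException.
            catch (IOException) { continue; }

            Parallel.ForEach(
                files,
                () => 0L, // per-thread running total
                (file, loopState, localTotal) =>
                {
                    // A file that vanished or cannot be read contributes nothing.
                    try { return localTotal + new FileInfo(file).Length; }
                    catch (IOException) { return localTotal; }
                    catch (UnauthorizedAccessException) { return localTotal; }
                },
                // Merge each thread's total into the shared sum exactly once.
                localTotal => Interlocked.Add(ref total, localTotal));

            foreach (string dir in subDirs)
                dirs.Push(dir);
        }
        return total;
    }
}
```

Keeping a per-thread total and merging it once per thread avoids an Interlocked call for every file. This sketch does not check for reparse points, so a junction that points back up the tree would need extra handling.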
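There is no hidden managed or Win32 API that lets you get the size of a folder on disk without recursing; otherwise Windows Explorer would certainly take advantage of it.
Here is a sample method that parallelizes the work. You can compare it against a standard non-parallel recursive function that does the same thing: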
private static long GetFolderSize(string sourceDir)
{
    long size = 0;
    // Add up the files directly inside this folder.
    string[] fileEntries = Directory.GetFiles(sourceDir);
    foreach (string fileName in fileEntries)
    {
        Interlocked.Add(ref size, (new FileInfo(fileName)).Length);
    }
    // Start one task per subfolder; each task adds its subtree total to 'size'.
    var subFolders = Directory.EnumerateDirectories(sourceDir);
    var tasks = subFolders.Select(folder => Task.Factory.StartNew(() =>
    {
        // Skip reparse points (junctions, symbolic links) so they are not followed.
        if ((File.GetAttributes(folder) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
        {
            Interlocked.Add(ref size, GetFolderSize(folder));
        }
    }));
    // Wait for every subfolder task before returning the total.
    Task.WaitAll(tasks.ToArray());
    return size;
}
This example will not consume much memory unless a single folder contains millions of files.
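For the comparison suggested above, a sequential counterpart might look like the following sketch. SequentialFolderSize and GetFolderSizeSequential are names chosen here for illustration. The method applies the same reparse-point check and, as an added assumption, gives up on a folder it cannot read instead of throwing.

```csharp
using System;
using System.IO;

public static class SequentialFolderSize
{
    // Hypothetical baseline: plain recursion on a single thread.
    public static long GetFolderSizeSequential(string sourceDir)
    {
        long size = 0;
        try
        {
            // Files directly inside this folder.
            foreach (var file in new DirectoryInfo(sourceDir).EnumerateFiles())
                size += file.Length;

            // Recurse into subfolders, skipping reparse points.
            foreach (string folder in Directory.EnumerateDirectories(sourceDir))
            {
                bool isReparsePoint =
                    (File.GetAttributes(folder) & FileAttributes.ReparsePoint) != 0;
                if (!isReparsePoint)
                    size += GetFolderSizeSequential(folder);
            }
        }
        // On an unreadable folder, stop here and keep whatever was counted so far.
        catch (UnauthorizedAccessException) { }
        catch (IOException) { }
        return size;
    }
}
```

When timing both versions with Stopwatch, run each more than once on the same folder, since the file-system cache usually makes later runs faster than the first.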
Using the Microsoft Scripting Runtime seems to be about 90% faster:
var fso = new Scripting.FileSystemObject();
double size = fso.GetFolder(path).Size;
Reference: What is the fastest way to calculate a Windows folder's size?