Parallel loop for determining folder size

Keywords: parallel, loop, folder, for | Updated: 2023-09-27 17:56:41

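The overall goal of the program is to determine the size of the main folders in a directory. It works fine for small drives but struggles with larger ones: one drive that I absolutely need took more than 3 hours. Here is a copy of the folder-sizing method I'm using.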

    public  double getDirectorySize(string p)
    {
        //get array of all file names
        string[] a = Directory.GetFiles(p, "*.*", SearchOption.AllDirectories);
        //calculate total bytes in loop
        double b = 0;
        foreach (string name in a)
        {
            if (name.Length < 250) // prevents path too long errors
            {

                    //use file info to get length of each file 
                    FileInfo info = new FileInfo(name);
                    b += info.Length;
            }
        }
        //return total size
        return b;
    }

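So I was thinking about using a parallel loop. Each p is the name of a main folder. My idea is to somehow split the path p into its subfolders and use a parallel foreach loop to keep collecting file sizes; however, they contain an unknown number of subdirectories, and this is where I run into trouble getting the folder size back. Thanks in advance for your help.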

Update

I call this function from the foreach loop below:

            DirectoryInfo di = new DirectoryInfo(Browse_Folders_Text_Box.Text);
            FileInfo[] parsedfilename = di.GetFiles("*.*", System.IO.SearchOption.TopDirectoryOnly);
            parsedfoldername = System.IO.Directory.GetDirectories(Browse_Folders_Text_Box.Text, "*.*", System.IO.SearchOption.TopDirectoryOnly);
            //parsedfilename = System.IO.Directory.GetDirectories(textBox1.Text, "*.*", System.IO.SearchOption.AllDirectories);


            // Process the list of folders found in the directory.
            type_label.Text = "Folder Names \n";

            List<string> NameList = new List<string>();
            foreach (string transfer2 in parsedfoldername)
            {
                this.Cursor = Cursors.WaitCursor;
                //Uses the path and takes the name from last folder used
                string dirName = new DirectoryInfo(@transfer2).Name;
                string dirDate = new DirectoryInfo(@transfer2).LastWriteTime.ToString();

                NameList.Add(dirName);
                //Form2 TextTable = new Form2(NameList.ToString());

                //Display_Rich_Text_Box.AppendText(dirName);
                //Display_Rich_Text_Box.AppendText("\n");
                Last_Date_Modified_Text_Box.AppendText(dirDate);
                Last_Date_Modified_Text_Box.AppendText("\n");

                try
                {
                    double b;
                    b = getDirectorySize(transfer2);
                    MetricByte(b);

                }
                catch (Exception)
                {
                    Size_Text_Box.AppendText("N/A \n");
                }
            }
            Display_Rich_Text_Box.Text = string.Join(Environment.NewLine, NameList);
            this.Cursor = Cursors.Default;

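So when I think of a parallel foreach loop, what I have in mind is taking the next level of names (the subfolder names), which are all at the same level, and running getDirectorySize() on them simultaneously, since I know there are at least 7 subfolders directly under the main folder.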
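A minimal sketch of what the update describes, assuming .NET 4 or later: one size computation per immediate subfolder, run with Parallel.ForEach. The class and method names (TopLevelSizes, GetSubfolderSizes), the -1 "could not read" sentinel and the default cap of 4 threads are illustrative choices, not from the question. Concurrent reads against a single physical disk are not guaranteed to be faster, so time it against the sequential loop before adopting it.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper; names and the -1 sentinel are illustrative.
public static class TopLevelSizes
{
    // Sums the lengths of all files under 'path', streaming with EnumerateFiles.
    public static long GetDirectorySize(string path)
    {
        return new DirectoryInfo(path)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(f => f.Length);
    }

    // Computes the size of each immediate subfolder of 'root' concurrently.
    // Returns folder name -> size in bytes, or -1 if the folder could not be read.
    public static ConcurrentDictionary<string, long> GetSubfolderSizes(string root, int maxParallelism = 4)
    {
        var results = new ConcurrentDictionary<string, long>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };

        Parallel.ForEach(Directory.GetDirectories(root), options, dir =>
        {
            long size;
            try
            {
                size = GetDirectorySize(dir);
            }
            catch (UnauthorizedAccessException)
            {
                size = -1; // corresponds to the "N/A" case in the question's UI code
            }
            catch (IOException)
            {
                size = -1;
            }
            results[Path.GetFileName(dir)] = size;
        });

        return results;
    }
}
```

In the WinForms handler from the update, this would still block the form if called directly; one option is to call it with `await Task.Run(...)` from an async event handler and fill the text boxes afterwards on the UI thread.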


Accessing the same physical drive from several threads in parallel will not make the work faster.

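Your main problem is the GetFiles method. It walks all the subfolders collecting every file name, and then you loop over the same files a second time.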

Use the EnumerateFiles method instead.

Try this code. It will be much faster.

public long GetDirectorySize(string path)
{
    var dirInfo = new DirectoryInfo(path);
    long totalSize = 0;
    foreach (var fileInfo in dirInfo.EnumerateFiles("*.*", SearchOption.AllDirectories))
    {
        totalSize += fileInfo.Length;
    }
    return totalSize;
}
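If you can target .NET Core 2.1 or later, a variant of the method above can also deal with what the question's `catch (Exception)` hides: with the SearchOption.AllDirectories overload, one folder you are not allowed to open throws UnauthorizedAccessException and ends the whole enumeration. The sketch below passes an EnumerationOptions instead. The class name is hypothetical, and skipping ReparsePoint entries is an assumption made to avoid following junctions; it also leaves out symlinked files and some cloud placeholder files, so check that it matches what you want to count.

```csharp
using System;
using System.IO;

// Hypothetical class name; assumes .NET Core 2.1+ / .NET 5+, where EnumerationOptions exists.
public static class DirectorySize
{
    public static long GetDirectorySize(string path)
    {
        var options = new EnumerationOptions
        {
            RecurseSubdirectories = true,                   // walk the whole tree
            IgnoreInaccessible = true,                      // skip unreadable folders instead of throwing
            AttributesToSkip = FileAttributes.ReparsePoint  // the default skips Hidden and System files;
                                                            // here only reparse points are skipped
        };

        long totalSize = 0;
        foreach (var fileInfo in new DirectoryInfo(path).EnumerateFiles("*", options))
        {
            totalSize += fileInfo.Length;
        }
        return totalSize;
    }
}
```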

MSDN:

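The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of names before the whole collection is returned; when you use GetFiles, you must wait for the whole array of names to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.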
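One way to see the difference the quote describes: because EnumerateFiles yields entries while it walks the tree, a loop can stop as soon as it has what it needs. The sketch below (hypothetical names SizeCheck and IsLargerThan) answers "is this folder bigger than N bytes?" and stops walking once the running total passes the limit; with GetFiles, the full array of paths would have to be built before the first length could be read.

```csharp
using System;
using System.IO;

// Hypothetical helper illustrating early exit with lazy enumeration.
public static class SizeCheck
{
    // Returns true as soon as the running total exceeds 'limitBytes'.
    public static bool IsLargerThan(string path, long limitBytes)
    {
        long total = 0;
        foreach (var file in new DirectoryInfo(path).EnumerateFiles("*", SearchOption.AllDirectories))
        {
            total += file.Length;
            if (total > limitBytes)
            {
                return true; // leaving the foreach disposes the enumerator and ends the walk
            }
        }
        return false;
    }
}
```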

I had to do something similar, though not for folder/file sizes.

I don't have the code at hand, but I used the following as a starting point. It runs in parallel when a directory contains enough files.

From the source on MSDN:

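The following example iterates the directories sequentially, but processes the files in parallel. This is probably the best approach when you have a large file-to-directory ratio. It is also possible to parallelize the directory iteration, and access each file sequentially. It is probably not efficient to parallelize both loops unless you are specifically targeting a machine with a large number of processors. However, as in all cases, you should test your application thoroughly to determine the best approach.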

   static void Main()
   {            
      try 
      {
         TraverseTreeParallelForEach(@"C:\Program Files", (f) =>
         {
            // Exceptions are no-ops.
            try {
               // Do nothing with the data except read it.
               byte[] data = File.ReadAllBytes(f);
            }
            catch (FileNotFoundException) {}
            catch (IOException) {}
            catch (UnauthorizedAccessException) {}
            catch (SecurityException) {}
            // Display the filename.
            Console.WriteLine(f);
         });
      }
      catch (ArgumentException) {
         Console.WriteLine(@"The directory 'C:\Program Files' does not exist.");
      }   
      // Keep the console window open.
      Console.ReadKey();
   }
   public static void TraverseTreeParallelForEach(string root, Action<string> action)
   {
      //Count of files traversed and timer for diagnostic output
      int fileCount = 0;
      var sw = Stopwatch.StartNew();
      // Determine whether to parallelize file processing on each folder based on processor count.
      int procCount = System.Environment.ProcessorCount;
      // Data structure to hold names of subfolders to be examined for files.
      Stack<string> dirs = new Stack<string>();
      if (!Directory.Exists(root)) {
             throw new ArgumentException();
      }
      dirs.Push(root);
      while (dirs.Count > 0) {
         string currentDir = dirs.Pop();
         string[] subDirs = {};
         string[] files = {};
         try {
            subDirs = Directory.GetDirectories(currentDir);
         }
         // Thrown if we do not have discovery permission on the directory.
         catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         // Thrown if another process has deleted the directory after we retrieved its name.
         catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         try {
            files = Directory.GetFiles(currentDir);
         }
         catch (UnauthorizedAccessException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         catch (DirectoryNotFoundException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         catch (IOException e) {
            Console.WriteLine(e.Message);
            continue;
         }
         // Execute in parallel if there are enough files in the directory.
         // Otherwise, execute sequentially. Files are opened and processed
         // synchronously but this could be modified to perform async I/O.
         try {
            if (files.Length < procCount) {
               foreach (var file in files) {
                  action(file);
                  fileCount++;                            
               }
            }
            else {
               Parallel.ForEach(files, () => 0, (file, loopState, localCount) =>
                                            { action(file);
                                              return (int) ++localCount;
                                            },
                                (c) => {
                                          Interlocked.Add(ref fileCount, c);                          
                                });
            }
         }
         catch (AggregateException ae) {
            ae.Handle((ex) => {
                         if (ex is UnauthorizedAccessException) {
                            // Here we just output a message and go on.
                            Console.WriteLine(ex.Message);
                            return true;
                         }
                         // Handle other exceptions here if necessary...
                         return false;
            });
         }
         // Push the subdirectories onto the stack for traversal.
            // This could also be done before handling the files.
         foreach (string str in subDirs)
            dirs.Push(str);
      }
      // For diagnostic purposes.
      Console.WriteLine("Processed {0} files in {1} milliseconds", fileCount, sw.ElapsedMilliseconds);
   }
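The MSDN sample only reads the files. A minimal sketch of the same pattern applied to folder size, under these assumptions: directories are walked sequentially with an explicit stack, the files of each directory are summed with Parallel.ForEach using a thread-local subtotal, and Interlocked.Add merges the subtotals. TreeSize.GetSize is a hypothetical name; unreadable folders are silently skipped rather than logged, and reparse points are not followed (the same check the GetFolderSize answer in this thread uses).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical adaptation of the MSDN traversal pattern to summing file sizes.
public static class TreeSize
{
    public static long GetSize(string root)
    {
        if (!Directory.Exists(root))
            throw new ArgumentException("Directory does not exist.", nameof(root));

        long total = 0;
        var dirs = new Stack<string>();
        dirs.Push(root);

        while (dirs.Count > 0)
        {
            string current = dirs.Pop();
            string[] subDirs, files;
            try
            {
                subDirs = Directory.GetDirectories(current);
                files = Directory.GetFiles(current);
            }
            catch (UnauthorizedAccessException) { continue; } // no permission: skip this folder
            catch (DirectoryNotFoundException) { continue; }  // deleted while we were walking
            catch (IOException) { continue; }

            // Each worker keeps a private subtotal and adds it to 'total' once when it finishes.
            Parallel.ForEach(
                files,
                () => 0L,
                (file, loopState, subtotal) =>
                {
                    try { return subtotal + new FileInfo(file).Length; }
                    catch (IOException) { return subtotal; } // e.g. file removed mid-walk
                },
                subtotal => Interlocked.Add(ref total, subtotal));

            foreach (string dir in subDirs)
            {
                // Do not follow junctions or symbolic links (avoids double counting and cycles).
                if ((File.GetAttributes(dir) & FileAttributes.ReparsePoint) == 0)
                    dirs.Push(dir);
            }
        }
        return total;
    }
}
```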
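
Unfortunately, there is no hidden managed or Win32 API that lets you get the size of a folder on disk without recursing through it; otherwise Windows Explorer would surely take advantage of it.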

Below is a sample method that parallelizes the work. You can compare it with a standard non-parallel recursive function that does the same thing:

private static long GetFolderSize(string sourceDir)
{
    long size = 0;
    string[] fileEntries = Directory.GetFiles(sourceDir);
    foreach (string fileName in fileEntries)
    {
        Interlocked.Add(ref size, (new FileInfo(fileName)).Length);
    }
    var subFolders = Directory.EnumerateDirectories(sourceDir);
    var tasks = subFolders.Select(folder => Task.Factory.StartNew(() =>
    {
        if ((File.GetAttributes(folder) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
        {
            Interlocked.Add(ref size, (GetFolderSize(folder)));
            return size;
        }
        return 0;
    }));
    Task.WaitAll(tasks.ToArray());
    return size;
}
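For the comparison suggested above, a minimal sketch of the standard non-parallel recursive baseline: same recursion and same reparse-point check, but each subfolder is processed in turn. The class name is hypothetical; timing both versions with a Stopwatch on the same drive is the assumed way to compare them.

```csharp
using System;
using System.IO;

// Hypothetical sequential counterpart of GetFolderSize, for benchmarking.
public static class SequentialFolderSize
{
    public static long GetFolderSize(string sourceDir)
    {
        long size = 0;
        foreach (string fileName in Directory.GetFiles(sourceDir))
        {
            size += new FileInfo(fileName).Length;
        }
        foreach (string folder in Directory.EnumerateDirectories(sourceDir))
        {
            // Skip reparse points (junctions, symbolic links), as the parallel version does.
            if ((File.GetAttributes(folder) & FileAttributes.ReparsePoint) != FileAttributes.ReparsePoint)
            {
                size += GetFolderSize(folder);
            }
        }
        return size;
    }
}
```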

This example does not consume much memory unless a single folder contains millions of files.

Using the Microsoft Scripting Runtime seems to be about 90% faster:

var fso = new Scripting.FileSystemObject();
double size = fso.GetFolder(path).Size;

Reference: What is the fastest way to calculate a Windows folder size?