Multithreading in a loop in C#

Keywords: multithreading, loop | Updated: 2023-09-27 17:55:50

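I'm building a tool in C# that walks a large directory of files and extracts certain information. The directory is organized by language (LCID), so I'd like to use multithreading to traverse it, with one thread per language folder.

My code currently scans a small number of files and extracts the required data without multithreading, but at scale it takes a very long time.

I set up a thread inside the loop that gets the LCID folders, but got the following error: "No overload for 'HBscan' matches delegate 'System.Threading.ThreadStart'". Based on what I read online, I moved my method into a class so that it could take parameters. The error is gone now, but the code does not iterate over the files correctly: it leaves files out of the scan.

Can anyone see what is wrong with my code that keeps it from working correctly? Thanks.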

public static void Main(string[] args)
    {
        //change rootDirectory variable to point to directory which you wish to scan through
        string rootDirectory = @"C:\sample";
        DirectoryInfo dir = new DirectoryInfo(rootDirectory);
        //get the LCIDs from the folders
        string[] filePaths = Directory.GetDirectories(rootDirectory);
        for (int i = 0; i < filePaths.Length; i++)
        {
            string LCID = filePaths[i].Split('\\').Last();
            Console.WriteLine(LCID);
            HBScanner scanner = new HBScanner(new DirectoryInfo(filePaths[i]));
            Thread t1 = new Thread(new ThreadStart(scanner.HBscan));              
            t1.Start();             
        } 
        Console.WriteLine("Scanning through files...");
    }
    public class HBScanner
    {
        private DirectoryInfo DirectoryToScan { get; set; }
        public HBScanner(DirectoryInfo startDir)
        {
            DirectoryToScan = startDir;
        }
        public void HBscan()
        {
            HBscan(DirectoryToScan);
        } 
        public static void HBscan(DirectoryInfo directoryToScan)
        {
            //create an array of files using FileInfo object
            FileInfo[] files;
            //get all files for the current directory
            files = directoryToScan.GetFiles("*.*");
            string asset = "";
            string lcid = "";
            //iterate through the directory and get file details
            foreach (FileInfo file in files)
            {
                String name = file.Name;
                DateTime lastModified = file.LastWriteTime;
                String path = file.FullName;
                //first check the file name for asset id using regular expression
                Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
                asset = regEx.Match(file.Name).Groups[1].Value.ToString();
                //get LCID from the file path using regular expression
                Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
                lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();
                //if it can't find it from filename, it looks into xml
                if (file.Extension == ".xml" && asset == "")
                {
                    System.Diagnostics.Debug.WriteLine("File is an .XML");
                    System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
                    XmlDocument xmlDoc = new XmlDocument();
                    xmlDoc.Load(path);
                    //load XML file in 
                    //check for <assetid> element
                    XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
                    //check for <Asset> element
                    XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];
                    //if there is an <assetid> element
                    if (assetIDNode != null)
                    {
                        asset = assetIDNode.InnerText;
                    }
                    else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
                    {
                        //get the AssetId attribute only if it exists (reading .Value unconditionally throws when it is missing)
                        if (AssetIdNodeWithAttribute.Attributes != null)
                        {
                            var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
                            if (attributeTest != null)
                            {
                                asset = attributeTest.Value;
                            }
                        }
                    }
                }
                Item newFile = new Item
                {
                    AssetID = asset,
                    LCID = lcid,
                    LastModifiedDate = lastModified,
                    Path = path,
                    FileName = name
                };
                Console.WriteLine(newFile);
            }
            //get sub-folders for the current directory
            DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
            foreach (DirectoryInfo dir in dirs)
            {
                HBscan(dir);
            }
        }
    }

I haven't tested this, but I think it should work.

The code creates a scanner for each thread and runs its HBscan method.

public static void Main(string[] args)
        {
            //change rootDirectory variable to point to directory which you wish to scan through
            string rootDirectory = @"C:\sample";
            DirectoryInfo dir = new DirectoryInfo(rootDirectory);
            //get the LCIDs from the folders
            string[] filePaths = Directory.GetDirectories(rootDirectory);
            for (int i = 0; i < filePaths.Length; i++)
            {
                //copy the path to a local so the lambda does not capture the changing loop variable i
                string path = filePaths[i];
                string LCID = path.Split('\\').Last();
                Console.WriteLine(LCID);
                Thread t1 = new Thread(() => new HBScanner(new DirectoryInfo(path)).HBscan());
                t1.Start();
            }
            Console.WriteLine("Scanning through files...");
        }
        public class HBScanner
        {
            private DirectoryInfo DirectoryToScan { get; set; }
            public HBScanner(DirectoryInfo startDir)
            {
                DirectoryToScan = startDir;
            }
            public void HBscan()
            {
                HBscan(DirectoryToScan);
            }
            public static void HBscan(DirectoryInfo directoryToScan)
            {
                //create an array of files using FileInfo object
                FileInfo[] files;
                //get all files for the current directory
                files = directoryToScan.GetFiles("*.*");
                string asset = "";
                string lcid = "";
                //iterate through the directory and get file details
                foreach (FileInfo file in files)
                {
                    String name = file.Name;
                    DateTime lastModified = file.LastWriteTime;
                    String path = file.FullName;
                    //first check the file name for asset id using regular expression
                    Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
                    asset = regEx.Match(file.Name).Groups[1].Value.ToString();
                    //get LCID from the file path using regular expression
                    Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
                    lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();
                    //if it can't find it from filename, it looks into xml
                    if (file.Extension == ".xml" && asset == "")
                    {
                        System.Diagnostics.Debug.WriteLine("File is an .XML");
                        System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
                        XmlDocument xmlDoc = new XmlDocument();
                        xmlDoc.Load(path);
                        //load XML file in 
                        //check for <assetid> element
                        XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
                        //check for <Asset> element
                        XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];
                        //if there is an <assetid> element
                        if (assetIDNode != null)
                        {
                            asset = assetIDNode.InnerText;
                        }
                        else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
                        {
                            //get the AssetId attribute only if it exists (reading .Value unconditionally throws when it is missing)
                            if (AssetIdNodeWithAttribute.Attributes != null)
                            {
                                var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
                                if (attributeTest != null)
                                {
                                    asset = attributeTest.Value;
                                }
                            }
                        }
                    }
                    Item newFile = new Item
                    {
                        AssetID = asset,
                        LCID = lcid,
                        LastModifiedDate = lastModified,
                        Path = path,
                        FileName = name
                    };
                    Console.WriteLine(newFile);
                }
                //get sub-folders for the current directory
                DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
                foreach (DirectoryInfo dir in dirs)
                {
                    HBscan(dir);
                }
            }
        }

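If you are on .NET 4.0, you can use the TPL and process multiple items concurrently quite easily with Parallel.For / Parallel.ForEach.

I only came across it a few days ago and it is quite interesting. It can speed up your work by running multiple threads on different cores. Of course, in your case the gain may be limited because the work is heavily IO-bound.

But it is probably worth a try! (Changing your current source to check this is easy to do.)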
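
To make the Parallel.ForEach suggestion a bit more concrete, here is a minimal sketch of how it might look. ScanFolder and ScanAll are hypothetical names invented for this example, and ScanFolder only counts files as a stand-in for the real HBscan work, so treat it as an outline rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelScanSketch
{
    // Hypothetical stand-in for HBScanner.HBscan: counts the files below one language folder.
    public static int ScanFolder(DirectoryInfo folder)
    {
        return folder.EnumerateFiles("*", SearchOption.AllDirectories).Count();
    }

    // Scans every top-level (LCID) folder in parallel and returns file counts keyed by folder name.
    public static ConcurrentDictionary<string, int> ScanAll(string rootDirectory)
    {
        var counts = new ConcurrentDictionary<string, int>();

        // Parallel.ForEach blocks until every folder has been processed.
        Parallel.ForEach(Directory.EnumerateDirectories(rootDirectory), path =>
        {
            var folder = new DirectoryInfo(path);
            counts[folder.Name] = ScanFolder(folder);
        });

        return counts;
    }

    public static void Main(string[] args)
    {
        // Assumed default root, taken from the question; pass another path as the first argument.
        string root = args.Length > 0 ? args[0] : @"C:\sample";
        foreach (var pair in ScanAll(root))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value + " files");
        }
    }
}
```

Because the lambda receives each path as its own parameter, there is no shared loop counter to capture, and the call returns only after all folders are done, so no explicit thread bookkeeping is needed.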

What about something like this:

public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";
    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f => HBScanner.HBscan(new DirectoryInfo(f)));
}

After all, the loop body only gets the LCID in order to write it to the console. If you want to keep the console output, you could do this:

public static void Main(string[] args)
{
    const string rootDirectory = @"C:\sample";
    Console.WriteLine("Scanning through files...");
    Directory.EnumerateDirectories(rootDirectory)
        .AsParallel()
        .ForAll(f => 
            {
                var lcid = f.Split('\\').Last();
                Console.WriteLine(lcid);
                HBScanner.HBscan(new DirectoryInfo(f));
            });
            });
}

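Note that EnumerateDirectories should be preferred over GetDirectories because it is lazily evaluated, so processing can start as soon as the first directory is found. You don't have to wait for all the directories to be loaded into a list.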

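Your task could be improved considerably by using BlockingCollection: http://msdn.microsoft.com/en-us/library/dd267312.aspx

The overall structure is this: you create one thread (or do it in the main thread) that enumerates the files and adds them to the BlockingCollection. Simply enumerating files should be fairly fast, and this thread should finish much sooner than the worker threads.

Then you create a number of tasks (the same number as Environment.ProcessorCount would be good). These tasks should look similar to the ones in the documentation (collection.Take()). Each task performs the check on one individual file at a time.

So you end up with one thread looking up file names and putting them into the BlockingCollection while the other threads check file contents in parallel. This gives you better parallelism, because creating a thread per folder can distribute the work unevenly (you don't know that every folder holds the same number of files, right?).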
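
The structure described above could be sketched roughly as follows. This is only an outline under a few assumptions: ProcessAll is a hypothetical name, the per-file check is reduced to a placeholder that counts files, the bounded capacity of 1000 is arbitrary, and the consumers use GetConsumingEnumerable() rather than calling Take() directly, so that they stop cleanly once the producer is finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCollectionSketch
{
    // Enumerates all files below rootDirectory on one task and processes them on worker tasks.
    // Returns the number of files the workers handled.
    public static int ProcessAll(string rootDirectory)
    {
        int processed = 0;

        // Bounded so the producer cannot run arbitrarily far ahead of the workers (1000 is arbitrary).
        using (var files = new BlockingCollection<string>(boundedCapacity: 1000))
        {
            // Producer: finds file names and queues them.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string path in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
                    {
                        files.Add(path);
                    }
                }
                finally
                {
                    // Signals the workers that no more items will arrive.
                    files.CompleteAdding();
                }
            });

            // Consumers: one task per processor, each taking files until the collection is completed and empty.
            Task[] workers = Enumerable.Range(0, Environment.ProcessorCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    foreach (string path in files.GetConsumingEnumerable())
                    {
                        // Placeholder for the real per-file check (asset ID and LCID extraction).
                        Interlocked.Increment(ref processed);
                    }
                }))
                .ToArray();

            Task.WaitAll(workers);
            producer.Wait(); // rethrows any enumeration error from the producer
        }

        return processed;
    }

    public static void Main(string[] args)
    {
        // Assumed default root, taken from the question; pass another path as the first argument.
        string root = args.Length > 0 ? args[0] : @"C:\sample";
        Console.WriteLine("Processed " + ProcessAll(root) + " files");
    }
}
```

Since the unit of work here is a single file rather than a whole folder, one very large language folder no longer ties up a single thread while the others sit idle.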