Multithreading in a loop in C#
Keywords: multithreading, loop | Last updated: 2023-09-27 17:55:50
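I am writing a tool in C# that walks a large directory of files and extracts certain information. The directory is organized by language (LCID), so I want to use multithreading to traverse it, with one thread per language folder.
My code currently scans a small number of files and extracts the required data without multithreading, but at scale it takes a long time.
I set up a thread inside the loop that gets the LCID folders, but received the following error: "No overload for 'HBscan' matches delegate 'System.Threading.ThreadStart'". Based on what I read online, I moved my method into a class so that it could take parameters. There is no error now, but the code does not iterate over the files correctly: it leaves files out of the scan.
I was wondering whether anyone can see what is wrong with my code that keeps it from running correctly. Thanks.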
public static void Main(string[] args)
{
//change rootDirectory variable to point to directory which you wish to scan through
string rootDirectory = @"C:\sample";
DirectoryInfo dir = new DirectoryInfo(rootDirectory);
//get the LCIDs from the folders
string[] filePaths = Directory.GetDirectories(rootDirectory);
for (int i = 0; i < filePaths.Length; i++)
{
string LCID = filePaths[i].Split('\\').Last();
Console.WriteLine(LCID);
HBScanner scanner = new HBScanner(new DirectoryInfo(filePaths[i]));
Thread t1 = new Thread(new ThreadStart(scanner.HBscan));
t1.Start();
}
Console.WriteLine("Scanning through files...");
}
public class HBScanner
{
private DirectoryInfo DirectoryToScan { get; set; }
public HBScanner(DirectoryInfo startDir)
{
DirectoryToScan = startDir;
}
public void HBscan()
{
HBscan(DirectoryToScan);
}
public static void HBscan(DirectoryInfo directoryToScan)
{
//create an array of files using FileInfo object
FileInfo[] files;
//get all files for the current directory
files = directoryToScan.GetFiles("*.*");
string asset = "";
string lcid = "";
//iterate through the directory and get file details
foreach (FileInfo file in files)
{
String name = file.Name;
DateTime lastModified = file.LastWriteTime;
String path = file.FullName;
//first check the file name for asset id using regular expression
Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
asset = regEx.Match(file.Name).Groups[1].Value.ToString();
//get LCID from the file path using regular expression
Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();
//if it can't find it from filename, it looks into xml
if (file.Extension == ".xml" && asset == "")
{
System.Diagnostics.Debug.WriteLine("File is an .XML");
System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(path);
//load XML file in
//check for <assetid> element
XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
//check for <Asset> element
XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];
//if there is an <assetid> element
if (assetIDNode != null)
{
asset = assetIDNode.InnerText;
}
else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
{
//get the attribute
asset = AssetIdNodeWithAttribute.Attributes["AssetId"].Value;
if (AssetIdNodeWithAttribute.Attributes != null)
{
var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
if (attributeTest != null)
{
asset = attributeTest.Value;
}
}
}
}
Item newFile = new Item
{
AssetID = asset,
LCID = lcid,
LastModifiedDate = lastModified,
Path = path,
FileName = name
};
Console.WriteLine(newFile);
}
//get sub-folders for the current directory
DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
foreach (DirectoryInfo dir in dirs)
{
HBscan(dir);
}
}
}
I haven't tested this, but I think it should work.
The code creates one scanner per thread and runs its HBscan method.
public static void Main(string[] args)
{
//change rootDirectory variable to point to directory which you wish to scan through
string rootDirectory = @"C:\sample";
DirectoryInfo dir = new DirectoryInfo(rootDirectory);
//get the LCIDs from the folders
string[] filePaths = Directory.GetDirectories(rootDirectory);
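for (int i = 0; i < filePaths.Length; i++)
{
//copy the path to a local so each lambda captures its own value rather than the shared loop variable i
string path = filePaths[i];
string LCID = path.Split('\\').Last();
Console.WriteLine(LCID);
Thread t1 = new Thread(() => new HBScanner(new DirectoryInfo(path)).HBscan());
t1.Start();
}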
Console.WriteLine("Scanning through files...");
}
public class HBScanner
{
private DirectoryInfo DirectoryToScan { get; set; }
public HBScanner(DirectoryInfo startDir)
{
DirectoryToScan = startDir;
}
public void HBscan()
{
HBscan(DirectoryToScan);
}
public static void HBscan(DirectoryInfo directoryToScan)
{
//create an array of files using FileInfo object
FileInfo[] files;
//get all files for the current directory
files = directoryToScan.GetFiles("*.*");
string asset = "";
string lcid = "";
//iterate through the directory and get file details
foreach (FileInfo file in files)
{
String name = file.Name;
DateTime lastModified = file.LastWriteTime;
String path = file.FullName;
//first check the file name for asset id using regular expression
Regex regEx = new Regex(@"([A-Z][A-Z][0-9]{8,10})\.");
asset = regEx.Match(file.Name).Groups[1].Value.ToString();
//get LCID from the file path using regular expression
Regex LCIDregEx = new Regex(@"sample\\(\d{4,5})");
lcid = LCIDregEx.Match(file.FullName).Groups[1].Value.ToString();
//if it can't find it from filename, it looks into xml
if (file.Extension == ".xml" && asset == "")
{
System.Diagnostics.Debug.WriteLine("File is an .XML");
System.Diagnostics.Debug.WriteLine("file.FullName is: " + file.FullName);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(path);
//load XML file in
//check for <assetid> element
XmlNode assetIDNode = xmlDoc.GetElementsByTagName("assetid")[0];
//check for <Asset> element
XmlNode AssetIdNodeWithAttribute = xmlDoc.GetElementsByTagName("Asset")[0];
//if there is an <assetid> element
if (assetIDNode != null)
{
asset = assetIDNode.InnerText;
}
else if (AssetIdNodeWithAttribute != null) //if there is an <asset> element, see if it has an AssetID attribute
{
//get the attribute only if it exists; reading .Value without the null checks below would throw when AssetId is missing
if (AssetIdNodeWithAttribute.Attributes != null)
{
var attributeTest = AssetIdNodeWithAttribute.Attributes["AssetId"];
if (attributeTest != null)
{
asset = attributeTest.Value;
}
}
}
}
Item newFile = new Item
{
AssetID = asset,
LCID = lcid,
LastModifiedDate = lastModified,
Path = path,
FileName = name
};
Console.WriteLine(newFile);
}
//get sub-folders for the current directory
DirectoryInfo[] dirs = directoryToScan.GetDirectories("*.*");
foreach (DirectoryInfo dir in dirs)
{
HBscan(dir);
}
}
}
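If you are using .NET 4.0, you can use the TPL and process multiple items concurrently fairly easily with Parallel.For / Parallel.ForEach.
I only came across it a few days ago, and it is quite interesting. It can speed up your work by using multiple threads on different cores. Of course, because your task is so I/O-heavy, the gain may be limited in your case.
Still, it may be worth a try! (It is easy to change your current source to check.)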
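As a rough illustration of the Parallel.ForEach suggestion, here is a minimal sketch. The class and method names (ParallelScanSketch, CountFilesPerFolder) are hypothetical, and counting files merely stands in for the real per-folder work such as HBscan; it is not the asker's code.

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

public static class ParallelScanSketch
{
    // Hypothetical helper: counts the files under each top-level folder in parallel.
    // The counting loop is a placeholder for the real per-folder scan.
    public static ConcurrentDictionary<string, int> CountFilesPerFolder(string rootDirectory)
    {
        // ConcurrentDictionary is safe to write to from several threads at once.
        var counts = new ConcurrentDictionary<string, int>();

        // Parallel.ForEach blocks until every folder has been processed.
        Parallel.ForEach(Directory.EnumerateDirectories(rootDirectory), folder =>
        {
            int fileCount = 0;
            foreach (string file in Directory.EnumerateFiles(folder, "*", SearchOption.AllDirectories))
            {
                fileCount++;
            }
            // Key by the folder name, which is the LCID in the question's layout.
            counts[Path.GetFileName(folder)] = fileCount;
        });

        return counts;
    }
}
```

Calling ParallelScanSketch.CountFilesPerFolder(@"C:\sample") would return one entry per language folder. Unlike manually started threads, the call only returns once all folders are done, so no separate join step is needed.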
How about something like this:
public static void Main(string[] args)
{
const string rootDirectory = @"C:\sample";
Directory.EnumerateDirectories(rootDirectory)
.AsParallel()
.ForAll(f => HBScanner.HBscan(new DirectoryInfo(f)));
}
After all, you only obtain the LCID in the loop body in order to write it to the console. If you want to keep the console output, you could do this:
public static void Main(string[] args)
{
const string rootDirectory = @"C:\sample";
Console.WriteLine("Scanning through files...");
Directory.EnumerateDirectories(rootDirectory)
.AsParallel()
.ForAll(f =>
{
var lcid = f.Split('\\').Last();
Console.WriteLine(lcid);
HBScanner.HBscan(new DirectoryInfo(f));
});
}
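Note that EnumerateDirectories should be preferred over GetDirectories because it is lazily evaluated, so your processing can begin as soon as the first directory is found. You don't have to wait for all directories to be loaded into a list.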
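The difference between the two calls can be sketched as follows. The class and method names (LazyEnumerationSketch, FirstFolders, CountAllFolders) are made up for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyEnumerationSketch
{
    // Hypothetical helper: returns at most `count` subfolders of rootDirectory.
    public static List<string> FirstFolders(string rootDirectory, int count)
    {
        // EnumerateDirectories returns a lazy IEnumerable<string>, so Take(count)
        // stops pulling entries once enough have been seen.
        return Directory.EnumerateDirectories(rootDirectory).Take(count).ToList();
    }

    // Hypothetical helper: counts every subfolder of rootDirectory.
    public static int CountAllFolders(string rootDirectory)
    {
        // GetDirectories builds the complete string[] before it returns.
        return Directory.GetDirectories(rootDirectory).Length;
    }
}
```

With only a handful of language folders the difference is unlikely to be noticeable; it matters more when a directory holds a very large number of entries.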
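Your task could be improved considerably by using BlockingCollection: http://msdn.microsoft.com/en-us/library/dd267312.aspx
The overall structure is this: you create one thread (or do this on the main thread) that enumerates the files and adds them to the BlockingCollection. Simply enumerating files should be fairly quick, and this thread should finish much sooner than the worker threads.
Then you create a number of tasks (a count equal to Environment.ProcessorCount would be a good choice). These tasks should look like the ones in the documentation (collection.Take()). Each task should run the check on a single file at a time.
The result is one thread finding file names and putting them into the BlockingCollection while the other threads check file contents in parallel. You get better parallelism this way, because creating one thread per folder can distribute the work unevenly (you don't know whether every folder contains a lot of files, right?).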
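The structure described above could look roughly like the sketch below. PipelineSketch, ProcessFiles and the processFile callback are hypothetical names, the bounded capacity of 1000 is an arbitrary assumption, and GetConsumingEnumerable is used in place of an explicit Take() loop (it keeps taking items until the collection is marked complete).

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class PipelineSketch
{
    // Hypothetical helper: one producer enumerates files, several consumers process them.
    // Returns the number of files handed to processFile.
    public static int ProcessFiles(string rootDirectory, Action<FileInfo> processFile)
    {
        int processed = 0;
        // The capacity of 1000 is an assumed bound so the producer cannot run far ahead.
        using (var queue = new BlockingCollection<string>(1000))
        {
            // Producer: enumerate file paths and queue them.
            Task producer = Task.Factory.StartNew(() =>
            {
                try
                {
                    foreach (string path in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
                    {
                        queue.Add(path);
                    }
                }
                finally
                {
                    // Always signal completion so the consumers can finish.
                    queue.CompleteAdding();
                }
            });

            // Consumers: one task per processor, each handling one file at a time.
            var consumers = new Task[Environment.ProcessorCount];
            for (int i = 0; i < consumers.Length; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    foreach (string path in queue.GetConsumingEnumerable())
                    {
                        processFile(new FileInfo(path));
                        Interlocked.Increment(ref processed);
                    }
                });
            }

            Task.WaitAll(consumers);
            producer.Wait(); // rethrows any enumeration error from the producer
        }
        return processed;
    }
}
```

A call such as PipelineSketch.ProcessFiles(@"C:\sample", file => Console.WriteLine(file.FullName)) would print every file once. The callback runs on several threads at the same time, so anything it touches needs to be thread-safe.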