Regex pattern to extract a file from a URL


So, the HTML data I'm looking at is:

<A HREF="/data/client/Action.log">Action.log</A><br>  6/8/2015  3:45 PM 

From this I need to extract any instance of Action.log,

My problem is that I've read through plenty of regex tutorials, yet I still can't seem to come up with a pattern that extracts it. I suspect I'm missing some basic understanding of regex, but any help would be greatly appreciated.

Edit:

internal string[] ParseFolderIndex_Alpha(string url, WebDirectory directory)
    {
        try
        {
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            request.Timeout = 3 * 60 * 1000;
            request.KeepAlive = true;
            HttpWebResponse response = (HttpWebResponse)request.GetResponse();
            if (response.StatusCode == HttpStatusCode.OK)
            {
                List<string> fileLocations = new List<string>(); string line;
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    while ((line = reader.ReadLine()) != null)
                    {
                        int index = line.IndexOf("<a href=");
                        if (index >= 0)
                        {
                            string[] segments = line.Substring(index).Split('"');
                            ///Can Parse File Size Here: Add todo
                            if (!segments[1].Contains("/"))
                            {
                                fileLocations.Add(segments[1]);
                                UI.UpdatePatchNotes("Web File Found: " + segments[1]);
                                UI.UpdateProgressBar();
                            }
                            else
                            {
                                if (segments[1] != @"../")
                                {
                                    directory.SubDirectories.Add(new WebDirectory(url + segments[1], this));
                                    UI.UpdatePatchNotes("Web Directory Found: " + segments[1].Replace("/", string.Empty));
                                }
                            }
                        }
                        else if (line.Contains("</pre")) break;
                    }
                }
                response.Dispose(); /// After ((line = reader.ReadLine()) != null)
                return fileLocations.ToArray<string>();
            }
            else return new string[0]; /// !(HttpStatusCode.OK)
        }
        catch (Exception e)
        {
            LogHandler.LogErrors(e.ToString(), this);
            LogHandler.LogErrors(url, this);
            return null;
        }
    }

That's what I was doing; the problem is that I moved to a different server and the HTML that IIS renders is different, so I had to come up with new logic.
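As an aside, here is a minimal sketch of one way to pull the href paths out of a listing line regardless of how a given server formats the surrounding markup; the attribute-matching pattern and the HrefSketch helper are assumptions for illustration, not the logic described in the conclusion below.

    using System.Collections.Generic;
    using System.Text.RegularExpressions;

    static class HrefSketch
    {
        // Sketch: collect every quoted href value on a listing line, independent
        // of how the rest of the anchor markup is formatted (assumes the server
        // quotes its paths).
        internal static IEnumerable<string> ExtractHrefs(string line)
        {
            foreach (Match m in Regex.Matches(line, @"href\s*=\s*""([^""]+)""", RegexOptions.IgnoreCase))
                yield return m.Groups[1].Value;   // e.g. "/data/client/Action.log"
        }
    }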

Edit / Conclusion:

First off, sorry I even mentioned regex :P Second, each platform has to be handled individually depending on the environment.

This is how I'm currently gathering the file names.

internal string[] ParseFolderIndex(string url, WebDirectory directory)
        {
            try
            {
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
                request.Timeout = 3 * 60 * 1000;
                request.KeepAlive = true;
                HttpWebResponse response = (HttpWebResponse)request.GetResponse();
                bool endMet = false;
                if (response.StatusCode == HttpStatusCode.OK)
                {
                    List<string> fileLocations = new List<string>(); string line;
                    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                    {
                        while (!endMet)
                        {
                            line = reader.ReadLine();
                            if (line != null && line != "" && line.IndexOf("</A>") >= 0)
                            {
                                if (line.Contains("</html>")) endMet = true;
                                string[] segments = line.Replace("\\", "").Split('"');
                                List<string> paths = new List<string>();
                                List<string> files = new List<string>();
                                for (int i = 0; i < segments.Length; i++)
                                {
                                    if (!segments[i].Contains('<'))
                                        paths.Add(segments[i]);
                                }
                                paths.RemoveAt(0);
                                foreach (String s in paths)
                                {
                                    string[] secondarySegments = s.Split('/');
                                    if (s.Contains(".") || s.Contains("Verinfo"))
                                        files.Add(secondarySegments[secondarySegments.Length - 1]);
                                    else
                                    {
                                        directory.SubDirectories.Add(new WebDirectory
                                            (url + "/" + secondarySegments[secondarySegments.Length - 2], this));
                                        UI.UpdatePatchNotes("Web Directory Found: " + secondarySegments[secondarySegments.Length - 2]);
                                    }
                                }
                                foreach (String s in files)
                                {
                                    if (!String.IsNullOrEmpty(s) && !s.Contains('%'))
                                    {
                                        fileLocations.Add(s);
                                        UI.UpdatePatchNotes("Web File Found: " + s);
                                        UI.UpdateProgressBar();
                                    }
                                }
                                if (line.Contains("</pre")) break;
                            }
                        }
                    }
                    response.Dispose(); /// After ((line = reader.ReadLine()) != null)
                    return fileLocations.ToArray<string>();
                }
                else return new string[0]; /// !(HttpStatusCode.OK)
            }
            catch (Exception e)
            {
                LogHandler.LogErrors(e.ToString(), this);
                LogHandler.LogErrors(url, this);
                return null;
            }
        }


Regex is overkill here. It's too heavy-handed, and given that the string's format is always the same, you'll find Split and Substring easier to debug and maintain.

using System;
using System.Linq;

class Program {
    static void Main(string[] args) {
        String s = "<A HREF=\"/data/client/Action.log\">Action.log</A><br>  6/8/2015  3:45 PM ";
        String[] t = s.Split('"');
        String fileName = String.Empty;
        // To get the entire file name and path....
        fileName = t[1].Substring(0, (t[1].Length));
        // To get just the file name (Action.log in this case)....
        fileName = t[1].Substring(0, (t[1].Length)).Split('/').Last();
    }
}
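If the same idea has to run against many listing lines, a small helper (hypothetical name ListingParser.ExtractFileName) makes the guard for lines without a quoted path explicit; this is just a sketch of the Split approach above, assuming the listing keeps the quoted-path format shown.

    using System.Linq;

    static class ListingParser
    {
        // Hypothetical helper: pull the bare file name out of one listing line,
        // or return null when the line has no quoted href path to split on.
        internal static string ExtractFileName(string line)
        {
            string[] parts = line.Split('"');
            if (parts.Length < 2) return null;    // no quoted path on this line
            return parts[1].Split('/').Last();    // "/data/client/Action.log" -> "Action.log"
        }
    }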

Try matching against the following pattern:

<A HREF="(?<url>.*)">

Then take the group named url from the match result.

Working example: https://regex101.com/r/hW8iH6/1

string text = @"<A HREF=""/data/client/Action.log"">Action.log</A><br>  6/8/2015  3:45 PM";
var match = Regex.Match(text, @"^<A HREF=\""\/data\/client\/.*\.log\"">(.*)</A>.*$");
var result = match.Groups[1].Value;
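For completeness, a short sketch that uses the named url group from the pattern shown earlier instead of a numbered group; it assumes System.Text.RegularExpressions is imported and that each listing line holds a single anchor.

    using System.Text.RegularExpressions;

    class NamedGroupExample
    {
        static void Main()
        {
            string line = @"<A HREF=""/data/client/Action.log"">Action.log</A><br>  6/8/2015  3:45 PM";
            Match m = Regex.Match(line, @"<A HREF=""(?<url>.*)"">");
            if (m.Success)
            {
                // The named group holds the quoted path; the file name is the part
                // after the last slash.
                string url = m.Groups["url"].Value;                    // "/data/client/Action.log"
                string file = url.Substring(url.LastIndexOf('/') + 1); // "Action.log"
            }
        }
    }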

Try http://regexr.com/ or RegexBuddy!