Most appropriate way to get absolute URLs from crawled URLs
Keywords: url, approach, get, crawl | Updated: 2023-09-27 18:17:54
Assume the root URL is:
http://www.monstermmorpg.com
Here are several example URLs that I need to turn into their targets:
url1: http://www.monstermmorpg.com/
url2: http://www.monstermmorpg.com/Register#21312
url3: Register#21312
url4: /Register
url5: Register
url6: /Register?news=true&news2=true
// there may be more variants that resolve to the same URL, but I don't have the full list at the moment
I need a function that, using the root URL, turns each of them into the following:
url1: http://www.monstermmorpg.com
url2: http://www.monstermmorpg.com/Register
url3: http://www.monstermmorpg.com/Register
url4: http://www.monstermmorpg.com/Register
url5: http://www.monstermmorpg.com/Register
url6: http://www.monstermmorpg.com/Register?news=true&news2=true
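A minimal sketch of such a function, relying only on System.Uri: Uri.TryCreate detects links that are already absolute http/https URLs, the Uri(Uri, Uri) constructor resolves relative references against the root, and GetLeftPart(UriPartial.Query) keeps scheme, host, path and query while dropping the #fragment. The helper name ToAbsolute is hypothetical, and the block assumes a C# 9+ compiler (top-level statements); under .NET 4.5 the same Uri calls work inside an ordinary static method. One difference from the wished-for list: Uri normalizes an empty path to "/", so url1 comes back as http://www.monstermmorpg.com/ with a trailing slash.

```csharp
using System;

#nullable enable

// Hypothetical helper: turn a crawled link into an absolute URL without the #fragment.
// Returns null for links that are neither http(s) URLs nor relative references
// (for example mailto: or javascript: links).
static string? ToAbsolute(Uri baseUri, string link)
{
    Uri resolved;
    // On Linux/macOS a string starting with "/" can parse as an absolute file: URI,
    // so only accept absolute results whose scheme is http or https.
    if (Uri.TryCreate(link, UriKind.Absolute, out var absolute) &&
        (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps))
    {
        // Already a full URL (url1, url2): use it unchanged.
        resolved = absolute;
    }
    else if (Uri.TryCreate(link, UriKind.Relative, out var relative))
    {
        // Relative reference (url3 to url6): let Uri resolve it against the base.
        resolved = new Uri(baseUri, relative);
    }
    else
    {
        return null;
    }

    // Scheme + host + path + query; everything from '#' onward is dropped.
    return resolved.GetLeftPart(UriPartial.Query);
}

var root = new Uri("http://www.monstermmorpg.com");
string[] crawled =
{
    "http://www.monstermmorpg.com/",
    "http://www.monstermmorpg.com/Register#21312",
    "Register#21312",
    "/Register",
    "Register",
    "/Register?news=true&news2=true",
};

foreach (var href in crawled)
{
    Console.WriteLine(ToAbsolute(root, href));
}
```

Returning null instead of throwing lets a crawler simply skip links it cannot follow.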
There is this approach, but I don't think it is enough. Is there a better way?
Environment: C#, .NET 4.5, WPF application
Uri baseUri = new Uri("http://www.contoso.com");
Uri myUri = new Uri(baseUri, "catalog/shownew.htm?date=today");
Console.WriteLine(myUri.AbsoluteUri);
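The constructor in the snippet above does more than string concatenation: it applies the usual relative-reference resolution rules, so the result depends on the path of the base URL. The sketch below uses a hypothetical base page under /catalog/ to show three common cases. It assumes a C# 9+ compiler (top-level statements) and parses each link with UriKind.Relative, because on Linux/macOS a string starting with "/" can otherwise be taken for a local file path.

```csharp
using System;

// Hypothetical base: a page that lives inside a /catalog/ directory.
var pageUri = new Uri("http://www.contoso.com/catalog/index.htm");

// Same kind of call as in the question, but with a base that has a path.
var sameDir = new Uri(pageUri, new Uri("shownew.htm?date=today", UriKind.Relative));
var fromRoot = new Uri(pageUri, new Uri("/shownew.htm", UriKind.Relative));
var upOneLevel = new Uri(pageUri, new Uri("../about.htm", UriKind.Relative));

Console.WriteLine(sameDir.AbsoluteUri);    // http://www.contoso.com/catalog/shownew.htm?date=today
Console.WriteLine(fromRoot.AbsoluteUri);   // http://www.contoso.com/shownew.htm
Console.WriteLine(upOneLevel.AbsoluteUri); // http://www.contoso.com/about.htm
```

This is why prepending the root plus "/" to every relative link only works for pages at the site root: for a link "shownew.htm" found on /catalog/index.htm, concatenation would give http://www.contoso.com/shownew.htm instead of http://www.contoso.com/catalog/shownew.htm.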
One answer builds each absolute URL by prepending the root to relative links:
using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        var baseUrl = "http://www.monstermmorpg.com";
        var urls = new string[] {
            "http://www.monstermmorpg.com/",
            "http://www.monstermmorpg.com/Register#21312",
            "Register#21312",
            "/Register",
            "Register",
            "/Register?news=true&news2=true" };
        var absoluteUrls = new List<string>();
        foreach (var url in urls)
        {
            Uri uri;
            if (url.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
                url.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
            {
                // Already absolute: parse it as-is
                uri = new Uri(url);
            }
            else
            {
                // Relative: make sure it starts with "/" and prepend the root URL
                var urlWithSlash = url.StartsWith("/") ? url : "/" + url;
                uri = new Uri(baseUrl + urlWithSlash);
            }
            // Keep scheme, host, path and query; drop the #fragment
            absoluteUrls.Add(uri.GetLeftPart(UriPartial.Query));
        }
        // Now absoluteUrls contains
        //url1: http://www.monstermmorpg.com/
        //url2: http://www.monstermmorpg.com/Register
        //url3: http://www.monstermmorpg.com/Register
        //url4: http://www.monstermmorpg.com/Register
        //url5: http://www.monstermmorpg.com/Register
        //url6: http://www.monstermmorpg.com/Register?news=true&news2=true
    }
}
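Since several crawled variants can lead to the same page, a crawler usually also needs to avoid queuing the same absolute URL twice. A sketch, assuming a C# 9+ compiler (top-level statements) and a hypothetical Normalize helper that resolves a link against the root and drops the fragment; it assumes every input is an absolute http(s) URL or a relative reference, since new Uri(string, UriKind.Relative) throws on anything else, such as a mailto: link.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: resolve a crawled link against the root and drop the #fragment.
// Assumes the link is either an absolute http(s) URL or a relative reference.
static string Normalize(Uri root, string link)
{
    Uri resolved =
        Uri.TryCreate(link, UriKind.Absolute, out var abs) &&
        (abs.Scheme == Uri.UriSchemeHttp || abs.Scheme == Uri.UriSchemeHttps)
            ? abs
            : new Uri(root, new Uri(link, UriKind.Relative));
    return resolved.GetLeftPart(UriPartial.Query);
}

var siteRoot = new Uri("http://www.monstermmorpg.com");
string[] crawled =
{
    "http://www.monstermmorpg.com/",
    "http://www.monstermmorpg.com/Register#21312",
    "Register#21312",
    "/Register",
    "Register",
    "/Register?news=true&news2=true",
};

// Ordinal comparison: paths and queries may be case-sensitive on the server.
var seen = new HashSet<string>(StringComparer.Ordinal);
var toVisit = new List<string>();
foreach (var href in crawled)
{
    var absolute = Normalize(siteRoot, href);
    // HashSet.Add returns false if the URL is already in the set.
    if (seen.Add(absolute))
    {
        toVisit.Add(absolute);
    }
}

foreach (var url in toVisit)
{
    Console.WriteLine(url);
}
```

For the six sample links this leaves three distinct URLs to visit, in the order they were first seen.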