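Selecting specific text from a WebBrowser in C#
Keywords: text, selection, browser | Updated: 2023-09-27 18:16:42
I'm new to using the WebBrowser control in WinForms, and I've run into a specific problem.
What I want to do: open a page in the browser (I've got that part working). Once the page has loaded, it must navigate to a specific section (somewhere in the middle of the page) and select it, then copy and store it, text only, for when I need it.
As an example, I've been able to select all the text on the page with the following code: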
WebBrowser wb = (WebBrowser)sender;
wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);
richTextBox1.Text = Clipboard.GetText();
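It works for my program, but I'd like to know whether there is a better way that selects only the text or information I need and, if possible, puts it into text boxes or directly into my database.
Here is the link to the page: http://www.lolking.net/news/league-trends-jul30
I want to select and retrieve the information from these sections of the page:
Champion Pick Rates - Top 5 increases and decreases
Champion Win Rates - Top 5 increases and decreases
Champion Ban Rates - Top 5 increases and decreases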
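One possible way to avoid the clipboard round trip is to read the text straight from the loaded document. The sketch below assumes a form that hosts its own WebBrowser and RichTextBox; the class name PageTextForm and the field names are made up for illustration, and it still returns the text of the whole page, so the section filtering has to happen afterwards.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form: loads the page and copies its visible text into a RichTextBox.
public class PageTextForm : Form
{
    private readonly WebBrowser browser = new WebBrowser { Dock = DockStyle.Fill, ScriptErrorsSuppressed = true };
    private readonly RichTextBox output = new RichTextBox { Dock = DockStyle.Bottom, Height = 200 };

    public PageTextForm()
    {
        Controls.Add(browser);
        Controls.Add(output);
        browser.DocumentCompleted += OnDocumentCompleted;
        browser.Navigate("http://www.lolking.net/news/league-trends-jul30");
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // DocumentCompleted can fire once per frame; only react to the top-level page.
        if (e.Url != browser.Url) return;

        HtmlElement body = browser.Document?.Body;
        if (body == null) return;

        // InnerText is the rendered text without markup, so no SelectAll/Copy is needed.
        output.Text = body.InnerText ?? string.Empty;
    }

    [STAThread]
    public static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new PageTextForm());
    }
}
```

Compared with ExecCommand("Copy"), this leaves the user's clipboard untouched, which may matter if the program runs in the background.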
Your foreach loop would look like this:
foreach (var item in list_ban)
{
string rtbpicker = item.ToString();
foreach (var comp in list_Comp)
{
int count = 0; //Counts the number of occurrences
foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + ""))
{
int matchindex = m.Index;
int matchlength = m.Length;
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " "); //count moves the index forward by however many positions the original index was shifted
if(Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 1)
{
count++;
}
}
}
richTextBox6.Text += rtbpicker + "\n";
//rtbBan.AppendText(rtbpicker + System.Environment.NewLine);
}
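I don't have a complete solution yet, but I can help you a little:
Once you have the plain text from the fully loaded WebBrowser written into richTextBox1, you can print the three sections to other text boxes: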
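// Needs: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
private void button_Click(object sender, EventArgs e)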
{
List<string> rawhtml = new List<string>(); //List for the whole page
List<string> list_pick = new List<string>(); //PICK section
List<string> list_win = new List<string>(); //WIN section
List<string> list_ban = new List<string>(); //BAN section
rawhtml = richTextBox1.Lines.ToList(); //FILL the page to list
int ID_pick = 0;
int ID_win = 0;
int ID_ban = 0;
int ID_cmt = 0; // We need to specify the end of BAN section
for (int i = 0; i < rawhtml.Count; i++) //Search for the line number of section-start
{
if (rawhtml[i] == "Champion Pick Rates") ID_pick = i;
if (rawhtml[i] == "Champion Win Rates") ID_win = i;
if (rawhtml[i] == "Champion Ban Rates") ID_ban = i;
if (rawhtml[i].Contains("Comments")) ID_cmt = i;
}
// PICK
for (int i = ID_pick; i < ID_pick + (ID_win - ID_pick); i++) //Calculate the start and the end line-number
{
list_pick.AddRange(Regex.Split(rawhtml[i], "(?<=[)])")); //Split the five characters, without losing the ')'
}
foreach (var item in list_pick)
{
richTextBox2.AppendText(item + System.Environment.NewLine); //Optional: Add to richtextbox
}
// WIN
for (int i = ID_win; i < ID_win + (ID_ban - ID_win); i++)
{
list_win.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
}
foreach (var item in list_win)
{
richTextBox3.AppendText(item + System.Environment.NewLine);
}
// BAN
for (int i = ID_ban; i < ID_ban + (ID_cmt - ID_ban); i++)
{
list_ban.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
}
foreach (var item in list_ban)
{
richTextBox4.AppendText(item + System.Environment.NewLine);
}
}
This code produces the following output for "Champion Win Rates":
Champion Win Rates
Top Five Increases
Urgot41.38% -> 43.67% (+ 2.29%)
Kennen47.7% -> 49.28% (+ 1.58%)
Lucian51.61% -> 53.1% (+ 1.49%)
Singed48.95% -> 50.31% (+ 1.36%)
Fiora53.48% -> 54.71% (+ 1.23%)
Top Five Decreases
Kassadin48.7% -> 46.67% (-2.03%)
Galio53.18% -> 51.42% (-1.76%)
Cho'Gath48.03% -> 46.37% (-1.66%)
Corki50.05% -> 48.43% (-1.62%)
Graves49.49% -> 47.98% (-1.51%)
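Better... ;)
I ran into a problem with the spaces, but I haven't been able to solve it yet.
I hope you get the idea; if you have any questions, please comment!
P.S.: Sorry for bad engineering
P.P.S.: I know this isn't a complete solution, but I had to share it with you :)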
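The section lookup above can also be written as a small reusable function that works on any list of lines, which makes it testable without a form. This is only a sketch: SectionSplitter and Slice are hypothetical names, and the sample lines in Main are stand-ins for what richTextBox1.Lines would contain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SectionSplitter
{
    // Returns the lines from the header line (inclusive) up to the first later
    // line containing endMarker (exclusive). Empty list if the header is missing.
    public static List<string> Slice(IList<string> lines, string startHeader, string endMarker)
    {
        int start = -1;
        int end = lines.Count;
        for (int i = 0; i < lines.Count; i++)
        {
            string line = lines[i].Trim();
            if (start < 0)
            {
                if (line == startHeader) start = i;
            }
            else if (line.Contains(endMarker))
            {
                end = i;
                break;
            }
        }
        if (start < 0) return new List<string>();
        return lines.Skip(start).Take(end - start).ToList();
    }

    public static void Main()
    {
        // Made-up sample standing in for the page text.
        var page = new List<string>
        {
            "Intro",
            "Champion Pick Rates",
            "Lucian27.75% -> 32.3% (+4.55%)",
            "Champion Win Rates",
            "Urgot41.38% -> 43.67% (+ 2.29%)",
            "Champion Ban Rates",
            "12 Comments"
        };
        foreach (string line in Slice(page, "Champion Win Rates", "Champion Ban Rates"))
            Console.WriteLine(line);
    }
}
```

Trimming each line before comparing is a deliberate choice here: the exact-match test in the handler above would miss a header that carries stray whitespace.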
Here is the complete solution now, with perfect spacing. Regex is hard for me, so I think this approach is simpler, though longer.
private void btnspace_Click(object sender, EventArgs e)
{
richTextBox6.Text = null;
for (int i = 0; i < list_ban.Count; i++)
{
string rebuilder = ""; //for the output string (one line)
List<char> temp_chars = list_ban[i].ToCharArray().ToList(); //split one line into char sequence
int number_occur = 0; //occurrence counter for numbers
int minus_occur = 0; //occurrence counter for '-'
for (int j = 0; j < temp_chars.Count; j++)
{
// NUMBERS
// I didn't want to hardcode the champions :/
if (number_occur < 2 && char.IsDigit(temp_chars[j])) //char.IsDigit replaces the long chain of '0'..'9' comparisons
{
temp_chars.Insert(j, ' '); //insert a space into char seq
j = j + 5; // in the longest case: 12.34, so skip 5 char, or 1 2. 3 4
number_occur = number_occur + 1; //for the difference percentage we don't need spaces, so insert by number only twice
}
// NUMBERS DONE
}
for (int j = 0; j < temp_chars.Count; j++)
{
// ( and -
if (temp_chars[j] == '-' || temp_chars[j] == '(')
{
if (temp_chars[j] == '-') minus_occur = minus_occur + 1; //if the difference is negative, there will be one more minus, which doesn't need space
if (minus_occur <= 1) temp_chars.Insert(j, ' ');
j = j + 1; //avoid endless loop
}
// ( and - DONE
}
foreach (var item in temp_chars)
{
rebuilder = rebuilder + item; //rebuild the line from the char list, with spaces
}
list_ban.RemoveAt(i); //replace the old spaceless lines...
list_ban.Insert(i, rebuilder);
richTextBox6.AppendText(list_ban[i] + System.Environment.NewLine); //write to the box cleared at the top of this method
}
}
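I hope this is clear; I tried to comment everything. Good luck, and feel free to ask. Please let me know whether it works, because I want to answer this question perfectly :D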
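For comparison, the same spacing can be done with a single regular expression that also splits each line into its fields. This is a sketch under the assumption that every data line has the shape name, percentage, "->", percentage, and a signed difference in parentheses; TrendLine and Normalize are illustrative names, and lines that don't fit (such as the headers) are returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrendLine
{
    // name = leading run of non-digits; from/to/delta = decimal numbers.
    private static readonly Regex Pattern = new Regex(
        @"^(?<name>\D+?)\s*(?<from>\d+(?:\.\d+)?)%\s*->\s*(?<to>\d+(?:\.\d+)?)%" +
        @"\s*\(\s*(?<sign>[+-])\s*(?<delta>\d+(?:\.\d+)?)%\s*\)\s*$");

    public static string Normalize(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success) return line; // headers and other text pass through untouched

        return string.Format("{0} {1}% -> {2}% ({3}{4}%)",
            m.Groups["name"].Value.Trim(),
            m.Groups["from"].Value,
            m.Groups["to"].Value,
            m.Groups["sign"].Value,
            m.Groups["delta"].Value);
    }

    public static void Main()
    {
        Console.WriteLine(Normalize("Urgot41.38% -> 43.67% (+ 2.29%)")); // Urgot 41.38% -> 43.67% (+2.29%)
        Console.WriteLine(Normalize("Top Five Increases"));              // unchanged
    }
}
```

Because the named groups already hold the champion name and the three numbers separately, the same match could feed text boxes or database columns instead of a formatted string.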
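OK, this is my final solution, and it works 100%. As you can see, it takes your first answer and uses my Regex.Matches. I think the part I added to the foreach loops could be moved into a method so it can be called whenever needed. I just haven't gotten that far yet! :)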
private void button3_Click(object sender, EventArgs e)
{
List<string> rawhtml = new List<string>(); //List for the whole page
List<string> list_pick = new List<string>(); //PICK section
List<string> list_win = new List<string>(); //WIN section
List<string> list_ban = new List<string>(); //BAN section
List<string> list_Comp = new List<string>(); //Champion names
fillchamplist(list_Comp);
rawhtml = richTextBox1.Lines.ToList(); //FILL the page to list
int ID_pick = 0;
int ID_win = 0;
int ID_ban = 0;
int ID_cmt = 0; // We need to specify the end of BAN section
for (int i = 0; i < rawhtml.Count; i++) //Search for the line number of section-start
{
if (rawhtml[i] == "Champion Pick Rates") ID_pick = i;
if (rawhtml[i] == "Champion Win Rates") ID_win = i;
if (rawhtml[i] == "Champion Ban Rates") ID_ban = i;
if (rawhtml[i].Contains("Comments")) ID_cmt = i;
}
// PICK
for (int i = ID_pick; i < ID_pick + (ID_win - ID_pick); i++) //Calculate the start and the end line-number
{
list_pick.AddRange(Regex.Split(rawhtml[i], "(?<=[)])")); //Split the five characters, without losing the ')'
}
foreach (var item in list_pick)
{
string rtbpicker = item.ToString();
foreach (var comp in list_Comp)
{
int count = 0; //Tracks which match we are working with
foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + "")) // Checks for all matches and cycles through them
{
if (count == 2) // count == 2 means this is the 3rd match (the one we don't want to give a space to)
{
}
else // puts the space in
{
int matchindex = m.Index;
int matchlength = m.Length;
if (m.Length >= 2) // only champ names are >=2
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t");
}
else
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " "); // count shifts the index so the space doesn't land before the % sign
}
if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)// just to update the index for the 2nd %
{
count++;
}
}
}
}
rtbPick.AppendText(rtbpicker + System.Environment.NewLine); //Optional: Add to richtextbox
}
// WIN
for (int i = ID_win; i < ID_win + (ID_ban - ID_win); i++)
{
list_win.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
}
foreach (var item in list_win)
{
string rtbpicker = item.ToString();
foreach (var comp in list_Comp)
{
int count = 0;
foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + ""))
{
if (count == 2)
{
}
else
{
int matchindex = m.Index;
int matchlength = m.Length;
if (m.Length >= 2)
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t");
}
else
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " ");
}
if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)
{
count++;
}
}
}
}
rtbWin.AppendText(rtbpicker + System.Environment.NewLine);
}
// BAN
for (int i = ID_ban; i < ID_ban + (ID_cmt - ID_ban); i++)
{
list_ban.AddRange(Regex.Split(rawhtml[i], "(?<=[)])"));
}
foreach (var item in list_ban)
{
string rtbpicker = item.ToString();
foreach (var comp in list_Comp)
{
int count = 0;
foreach (Match m in Regex.Matches(rtbpicker, "" + comp.ToString() + ""))
{
if (count == 2)
{
}
else
{
int matchindex = m.Index;
int matchlength = m.Length;
if (m.Length >= 2)
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, "\t");
}
else
{
rtbpicker = rtbpicker.Insert(matchindex + matchlength + count, " ");
}
if (Regex.Matches(rtbpicker, "" + comp.ToString() + "").Count > 0)
{
count++;
}
}
}
}
rtbBan.AppendText(rtbpicker + System.Environment.NewLine);
}
}
The result looks like this (for some reason the tab characters aren't displayed here):
Champion Pick Rates
Top Five Increases
Lucian 27.75% -> 32.3% (+4.55%)
Ahri 8.7% -> 11.3% (+2.6%)
Rengar 11.25% -> 13.84% (+2.59%)
Nidalee 10.7% -> 12.93% (+2.23%)
Tristana 30.07% -> 32.02% (+1.95%)
Top Five Decreases
Caitlyn 34.44% -> 30.63% (-3.81%)
Vayne 17.25% -> 15.69% (-1.56%)
Ezreal 15.08% -> 13.6% (-1.48%)
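Renekton 13.84% -> 12.6% (-1.24%)
Lee Sin 30.54% -> 23.36% (-7.18%)
OK, so :D this works perfectly for me, but that's because I know the result I want is this specific thing. Your approach works too, and I would actually recommend it depending on the scenario.
If you have any questions, don't be afraid to ask :)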
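As mentioned above, the loop body repeated for the pick, win and ban lists could live in one method. The following is a rough sketch of that idea rather than a line-for-line port: LineFormatter and InsertSeparators are hypothetical names, the list of champion names is assumed to come from something like fillchamplist, and the matching is simplified to "tab after a leading champion name, space after the first two % signs".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LineFormatter
{
    // A '%' that is directly followed by a non-whitespace character.
    private static readonly Regex TightPercent = new Regex(@"%(?=\S)");

    public static string InsertSeparators(string line, IEnumerable<string> championNames)
    {
        // Try longer names first so that "Viktor" is not mistaken for "Vi".
        foreach (string name in championNames.OrderByDescending(n => n.Length))
        {
            if (line.StartsWith(name, StringComparison.Ordinal))
            {
                line = name + "\t" + line.Substring(name.Length).TrimStart();
                break;
            }
        }
        // Only the first two '%' signs get a space; the one inside "(...)" is left alone.
        return TightPercent.Replace(line, "% ", 2);
    }

    public static void Main()
    {
        var names = new List<string> { "Lucian", "Ahri", "Lee Sin" }; // sample names only
        Console.WriteLine(InsertSeparators("Lucian27.75%->32.3%(+4.55%)", names));
    }
}
```

Each of the three foreach loops would then shrink to a single call such as rtbPick.AppendText(LineFormatter.InsertSeparators(item, list_Comp) + Environment.NewLine), with rtbWin and rtbBan handled the same way.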