Removing HTML tags, and reading and writing files

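Today, when I read a field, the HTML markup came out along with the text, so I had no choice but to strip it.

What follows is quoted from someone else's article.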

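/// <summary>
/// Strips HTML markup.
/// </summary>
/// <param name="Htmlstring">Source text that contains HTML</param>
/// <returns>The text with the markup removed</returns>
 public string NoHTML(string Htmlstring)
 {
 // Requires: using System.Text.RegularExpressions; using System.Web;
 // Remove scripts.
 // The three patterns below were empty in this copy of the article (the angle
 // brackets were lost); script, style, and generic tag patterns are assumed.
 Htmlstring = Htmlstring.Replace("\r\n","");
 Htmlstring = Regex.Replace(Htmlstring,@"<script.*?</script>","",RegexOptions.IgnoreCase | RegexOptions.Singleline);
 Htmlstring = Regex.Replace(Htmlstring,@"<style.*?</style>","",RegexOptions.IgnoreCase | RegexOptions.Singleline);
 Htmlstring = Regex.Replace(Htmlstring,@"<.*?>","",RegexOptions.IgnoreCase);
 // Remove HTML
 Htmlstring = Regex.Replace(Htmlstring,@"<(.[^>]*)>","",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"([\r\n])[\s]+","",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"-->","",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"<!--.*","",RegexOptions.IgnoreCase);
 // Decode common entities
 Htmlstring = Regex.Replace(Htmlstring,@"&(quot|#34);","\"",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(amp|#38);","&",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(lt|#60);","<",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(gt|#62);",">",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(nbsp|#160);"," ",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(iexcl|#161);","\xa1",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(cent|#162);","\xa2",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(pound|#163);","\xa3",RegexOptions.IgnoreCase);
 Htmlstring = Regex.Replace(Htmlstring,@"&(copy|#169);","\xa9",RegexOptions.IgnoreCase);
 // Drop any other numeric entity
 Htmlstring = Regex.Replace(Htmlstring,@"&#(\d+);","",RegexOptions.IgnoreCase);
 // Remove any angle brackets that are left, then leftover line breaks
 Htmlstring = Htmlstring.Replace("<","");
 Htmlstring = Htmlstring.Replace(">","");
 Htmlstring = Htmlstring.Replace("\r\n","");
 // Encode the result so it is safe to write back into a page
 Htmlstring = HttpContext.Current.Server.HtmlEncode(Htmlstring).Trim();
 return Htmlstring;
 }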
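The quoted method depends on HttpContext, so it only runs inside an ASP.NET request. As a rough sketch of the same idea for a console or desktop program, the version below swaps the hand-written entity table for `System.Net.WebUtility.HtmlDecode` and does not re-encode the result. The class name `HtmlText` and the method name `Strip` are made up for this illustration and do not come from the quoted article. Regular expressions are only an approximation of HTML parsing, so treat it as a starting point.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the quoted article.
public static class HtmlText
{
    public static string Strip(string html)
    {
        if (string.IsNullOrEmpty(html)) return string.Empty;

        // Singleline lets '.' match line breaks, so multi-line blocks are covered.
        const RegexOptions opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Remove script and style elements together with their contents.
        string text = Regex.Replace(html, @"<(script|style)\b[^>]*>.*?</\1\s*>", "", opts);
        // Remove comments.
        text = Regex.Replace(text, @"<!--.*?-->", "", opts);
        // Remove every remaining tag.
        text = Regex.Replace(text, @"<[^>]+>", "", opts);
        // Decode entities such as &amp; and &lt; (swapped in for the manual table).
        text = WebUtility.HtmlDecode(text);
        // Collapse runs of whitespace into a single space.
        text = Regex.Replace(text, @"\s+", " ");
        return text.Trim();
    }
}
```

For example, `HtmlText.Strip("<p>1 &lt; 2 &amp; <b>ok</b></p><script>alert(1)</script>")` returns `1 < 2 & ok`. Because the result is decoded, it should be HTML-encoded again before it is written into a page.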

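Using the axWebBrowser control

Reference mshtml

1) Strip HTML tags and the attributes inside them

2) Open a page with axWebBrowser

3) Get the HTML source

1) Strip HTML tags and the attributes inside them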

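private string getOneValue(string TempStr)
 {
 if(TempStr.Length >0)
 {
 TempStr = regularExpressionsOfHTML(TempStr);
 // Drop the last character, as the original does, but only if
 // something is left after the tags have been stripped.
 if(TempStr.Length >0)
 TempStr = TempStr.Substring(0,TempStr.Length-1);
 }
 return TempStr;
 }

public static string regularExpressionsOfHTML(string TempContent)
 {
 //TempContent = System.Text.RegularExpressions.Regex.Replace(TempContent,"<[^>]+>",""); // one or more characters between the brackets
 TempContent = System.Text.RegularExpressions.Regex.Replace(TempContent,"<[^>]*>",""); // zero or more characters between the brackets
 return TempContent;
 }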
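The two patterns in regularExpressionsOfHTML differ only in the quantifier, and the difference shows up on an empty pair of brackets. The sketch below puts them side by side; the class and method names are hypothetical and exist only for this comparison.

```csharp
using System.Text.RegularExpressions;

// Hypothetical names, used only to compare the two quantifiers.
public static class TagPatterns
{
    // '*' allows zero characters between the brackets, so "<>" is removed too.
    public static string StripStar(string s)
    {
        return Regex.Replace(s, "<[^>]*>", "");
    }

    // '+' needs at least one character between the brackets, so "<>" is kept.
    public static string StripPlus(string s)
    {
        return Regex.Replace(s, "<[^>]+>", "");
    }
}
```

With the input `a<>b<i>c</i>`, `StripStar` returns `abc` and `StripPlus` returns `a<>bc`.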

2) Open a page with axWebBrowser

string Url = "**********"; 

 object Zero = 0; 

 object EmptyString = ""; 


 axWebBrowser.Navigate(Url ,ref Zero, ref EmptyString, ref EmptyString, ref EmptyString);

3) Get the HTML source

This is best done in the axWebBrowser DocumentComplete event handler.

Add the directive: using mshtml;

IHTMLDocument2 HTMLDocument =(IHTMLDocument2) axWebBrowser1.Document;

 string strHtml = HTMLDocument.body.innerHTML; // HTML inside the body element

 string[] arHtml = strHtml.Split('\n'); // one entry per line

At this point arHtml holds the HTML source of the page body, one line per element.
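Splitting on '\n' alone leaves a trailing '\r' on every line when the page uses Windows line endings. A small sketch of a split that handles all three line-ending styles follows; `HtmlLines.Split` is a hypothetical helper name, not something from the original code.

```csharp
using System;

// Hypothetical helper, not part of the original code.
public static class HtmlLines
{
    public static string[] Split(string html)
    {
        if (html == null) return new string[0];

        // "\r\n" comes first so a Windows line ending counts as one separator.
        return html.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
    }
}
```

In the DocumentComplete handler this would be called as `string[] arHtml = HtmlLines.Split(strHtml);`.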

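I have uploaded the test program I wrote. A post like this cannot be called original and is not really my own work, so I am marking it as a repost.

The code is mine, though: it includes a test that uses a string and a test that uses a file, and it relies on reading and writing files.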
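The uploaded test program is not reproduced in this post, so the following is only a guess at what the file part could look like: read an HTML file, strip the tags, and write the plain text to another file. The class name `HtmlFileTool`, the method name `StripFile`, and the choice of UTF-8 are all assumptions made for this sketch.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch; the author's own test program is not shown in the post.
public static class HtmlFileTool
{
    public static void StripFile(string inputPath, string outputPath)
    {
        // Read the whole file into memory (UTF-8 is an assumption).
        string html = File.ReadAllText(inputPath, Encoding.UTF8);

        // Same single-pattern approach as regularExpressionsOfHTML.
        string text = Regex.Replace(html, "<[^>]+>", "");

        // Write the stripped text, replacing the output file if it exists.
        File.WriteAllText(outputPath, text, Encoding.UTF8);
    }
}
```

Reading the whole file at once keeps the sketch short; for very large files a StreamReader and StreamWriter pair would use less memory, although a tag that spans two lines would then need extra handling.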
