Shenzhen Knowlesys Software Technology Co., Ltd. (深圳市乐思软件技术有限公司)

Knowlesys (乐思软件) Tel: 0755-8603-2826

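Free Software
Search & Replace Master
  A tool for searching and replacing text in text files using simple wildcard patterns
 
Glyph Font Viewer
  A browser for icon (glyph) fonts
Free Rename Master
  A tool for batch-renaming files using simple wildcard patterns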
 

Definitions
Screen Scraping
v.
The act of capturing data from a system or program by snooping the contents of some display that is not actually intended for data transport or inspection by programs. Around 1980 this term referred to tricks like reading the display memory of a smart terminal through its auxiliary port. Nowadays it often refers to parsing the HTML in generated web pages with programs designed to mine out particular patterns of content. In either guise screen-scraping is an ugly, ad-hoc, last-resort technique that is very likely to break on even minor changes to the format of the data being snooped.
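As a rough illustration of the HTML-parsing sense of the term, the sketch below pulls prices out of a small page using Python's standard `html.parser` module. The page markup, the `price` class name, and the `PriceScraper` and `scrape_prices` names are all assumptions made up for this example. The second page shows the fragility the definition warns about: renaming one CSS class is enough to make the scraper return nothing.

```python
from html.parser import HTMLParser


class PriceScraper(HTMLParser):
    """Collect the text inside every <span class="price"> element."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and ("class", "price") in attrs:
            self._in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def scrape_prices(html):
    """Return the list of price strings found in the given HTML text."""
    parser = PriceScraper()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page, and the same page after a minor markup change.
PAGE_V1 = (
    '<ul><li>Widget <span class="price">9.99</span></li>'
    '<li>Gadget <span class="price">24.50</span></li></ul>'
)
PAGE_V2 = PAGE_V1.replace('class="price"', 'class="cost"')

print(scrape_prices(PAGE_V1))
print(scrape_prices(PAGE_V2))
```

The scraper depends entirely on markup that was written for display, not for data exchange, which is why a cosmetic change on the site silently breaks it.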

Deep Web/Hidden Web
n.
The Deep Web (or Hidden Web) comprises all information that resides in autonomous databases behind portals and information providers' web front-ends. Web pages in the Deep Web are dynamically generated in response to a query through a web site's search form and often contain rich content. A recent study has estimated the size of the Deep Web to be more than 500 billion pages, whereas the size of the "crawlable" web is only 1% of the Deep Web (i.e., less than 5 billion pages). Even those web sites with some static links that are "crawlable" by a search engine often have much more information available only through a query interface. Unlocking this vast Deep Web content presents a major research challenge.
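Because Deep Web pages only exist as answers to form queries, a crawler has to construct those queries itself instead of following links. The sketch below shows one way to build the request URLs a GET search form would submit. The endpoint `http://example.com/search` and the field names `q` and `page` are hypothetical; a real site's form would have to be inspected to find its actual action URL and fields. No network request is made here.

```python
from urllib.parse import urlencode

# Hypothetical search-form endpoint and field names, for illustration only.
FORM_ACTION = "http://example.com/search"


def build_query_url(keyword, page=1):
    """Return the GET URL a search form would submit for one query."""
    return FORM_ACTION + "?" + urlencode({"q": keyword, "page": page})


def enumerate_queries(keywords, pages_per_keyword=2):
    """List the result-page URLs to request for each probe keyword."""
    return [
        build_query_url(keyword, page)
        for keyword in keywords
        for page in range(1, pages_per_keyword + 1)
    ]


for url in enumerate_queries(["used cars", "apartments"]):
    print(url)
```

None of these URLs appear as static links anywhere on the site, which is why a link-following crawler never reaches the pages behind them.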

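Vertical Search
In essence, vertical search is a simplifying consolidation of the ways vertical portals deliver information.
A general-purpose (horizontal) search engine searches at the level of whole web pages, whereas vertical search works at the level of individual data items, so its granularity is finer and its precision higher. Vertical search serves a specific function; for example, a user searching for rental or home-purchase listings is performing a vertical search. Reprocessing the collected information is critical, whether the data is structured or unstructured.
Content sources for vertical search:
A. The portal's own resources
B. Resources supplied by industry users through open interfaces
C. Resources published by ordinary users
D. Resources crawled from industry users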
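The difference between page-level and data-item-level search can be sketched with the housing example above. In this minimal illustration, each listing has already been reprocessed into a structured record, so a query can match on individual fields instead of on page text. The `Listing` fields, the sample data, and the `search` helper are assumptions invented for the example, not a description of any particular product.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """One data item extracted from a housing page (hypothetical fields)."""
    city: str
    kind: str    # "rent" or "sale"
    rooms: int
    price: int   # monthly rent or sale price, in the local currency


def search(listings, **criteria):
    """Return the listings whose fields equal every given criterion."""
    return [
        item for item in listings
        if all(getattr(item, field) == value for field, value in criteria.items())
    ]


# Hypothetical records, as they might look after extraction and cleanup.
LISTINGS = [
    Listing(city="Shenzhen", kind="rent", rooms=2, price=4500),
    Listing(city="Shenzhen", kind="sale", rooms=3, price=3200000),
    Listing(city="Beijing", kind="rent", rooms=2, price=6000),
]

for item in search(LISTINGS, city="Shenzhen", kind="rent"):
    print(item)
```

A page-level engine could only report that a page mentions "Shenzhen" and "rent"; here the query returns exactly the matching items, which is the finer granularity the text describes.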
Partner Links
Articles on Web Data Extraction
 
北京亚库
  Phone and QQ suppliers
 

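If you have a website, we would be glad to exchange links with you :-) Please send us your site information in the same format as ours:

Title: Knowlesys (乐思软件) - Professional provider of web data extraction services and software

URL: http://www.knowlesys.com

Description: Professional software for collecting and integrating online information, including web data extraction, website content extraction, and online news gathering.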

 

 
Copyright ©2011 Shenzhen Knowlesys Software Technology Co., Ltd.