Use DOM methods to navigate a document

Date: 2023-07-09

You have an HTML document that you want to extract data from, and you know the general structure of that document.

Once the HTML has been parsed into a [Document](http://jsoup.org/apidocs/org/jsoup/nodes/Document.html), you can work with it using DOM-like methods. Example code:

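    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // Parse the file as UTF-8; the base URI is used to resolve relative links
    File input = new File("/tmp/input.html");
    Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");

    // Find the element with id "content", then every <a> element inside it
    Element content = doc.getElementById("content");
    Elements links = content.getElementsByTag("a");
    for (Element link : links) {
        String linkHref = link.attr("href"); // value of the href attribute
        String linkText = link.text();       // text content of the link
    }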

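Element objects provide a range of DOM-like methods for finding elements and for extracting and manipulating their data. The methods are listed below: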

Finding elements

[getElementById(String id)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#getElementById(java.lang.String))
[getElementsByTag(String tag)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#getElementsByTag(java.lang.String))
[getElementsByClass(String className)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#getElementsByClass(java.lang.String))
[getElementsByAttribute(String key)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#getElementsByAttribute(java.lang.String)) (and related methods)
Element siblings: [siblingElements()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#siblingElements()), [firstElementSibling()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#firstElementSibling()), [lastElementSibling()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#lastElementSibling()); [nextElementSibling()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#nextElementSibling()), [previousElementSibling()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#previousElementSibling())
Graph: [parent()](http://jsoup.org/apidocs/org/jsoup/nodes/Node.html#parent()), [children()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#children()), [child(int index)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#child(int))
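
The lookup and navigation methods listed above can be tried on a small in-memory document. The following is a minimal sketch, not a definitive usage guide: the class name `FindElementsSketch`, the helper `content()`, and the sample markup are assumptions made up for this illustration, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class FindElementsSketch {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div id=\"content\">"
        + "<p class=\"intro\">First</p>"
        + "<p>Second <a href=\"/a\" title=\"A\">link</a></p>"
        + "<p class=\"outro\">Third</p>"
        + "</div>";

    // Parses the sample markup and returns the element with id "content".
    static Element content() {
        Document doc = Jsoup.parse(HTML);
        return doc.getElementById("content");
    }

    public static void main(String[] args) {
        Element content = content();

        // Lookups by tag name, class name, and attribute name.
        Elements paragraphs = content.getElementsByTag("p");
        Elements intro = content.getElementsByClass("intro");
        Elements titled = content.getElementsByAttribute("title");
        System.out.println(paragraphs.size() + " paragraphs, "
            + intro.size() + " intro, " + titled.size() + " with a title");

        // Graph navigation: all children, one child by index, and the parent.
        Element second = content.child(1);
        System.out.println("children: " + content.children().size());
        System.out.println("parent id: " + second.parent().id());

        // Sibling navigation relative to the second paragraph.
        System.out.println("previous: " + second.previousElementSibling().text());
        System.out.println("next: " + second.nextElementSibling().text());
        System.out.println("first: " + second.firstElementSibling().text());
        System.out.println("last: " + second.lastElementSibling().text());
        System.out.println("other siblings: " + second.siblingElements().size());
    }
}
```

Note that `getElementById` returns `null` when no element has the given id, so code that handles arbitrary input should check the result before calling further methods on it.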

Element data

[attr(String key)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#attr(java.lang.String)) to get and [attr(String key, String value)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#attr(java.lang.String,%20java.lang.String)) to set attributes
[attributes()](http://jsoup.org/apidocs/org/jsoup/nodes/TextNode.html#attributes()) to get all attributes
[id()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#id()), [className()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#className()) and [classNames()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#classNames())
[text()](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#text()) to get and [text(String value)](http://jsoup.org/apidocs/org/jsoup/nodes/TextNode.html#text(java.lang.String)) to set the text content
[html()](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#html()) to get and [html(String value)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#html(java.lang.String)) to set the inner HTML content
[outerHtml()](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#outerHtml()) to get the outer HTML value
[data()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#data()) to get data content (e.g. of script and style tags)
[tag()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#tag()) and [tagName()](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#tagName())
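
To show how these accessors differ, here is a small sketch that reads several kinds of data from one element. The class name `ElementDataSketch`, the helpers `box()` and `script()`, the sample markup, and the `data-k` attribute are all assumptions chosen for the example; jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class ElementDataSketch {
    // Hypothetical sample markup, used only for this illustration.
    static final String HTML =
        "<div id=\"box\" class=\"a b\"><p>Hello <b>world</b></p></div>"
        + "<script>var x = 1;</script>";

    // Returns the <div id="box"> element of the sample document.
    static Element box() {
        return Jsoup.parse(HTML).getElementById("box");
    }

    // Returns the first <script> element of the sample document.
    static Element script() {
        return Jsoup.parse(HTML).getElementsByTag("script").first();
    }

    public static void main(String[] args) {
        Element box = box();

        // Identity and attributes.
        System.out.println(box.tagName());           // tag name as a String
        System.out.println(box.tag().getName());     // same name via the Tag object
        System.out.println(box.id());                // value of the id attribute
        System.out.println(box.className());         // raw class attribute value
        System.out.println(box.classNames());        // class names as a Set<String>
        System.out.println(box.attributes().size()); // number of attributes

        // Text versus markup.
        System.out.println(box.text());      // combined text, tags removed
        System.out.println(box.html());      // markup inside the element
        System.out.println(box.outerHtml()); // markup including the element itself

        // Script content is exposed through data(), not text().
        System.out.println(script().data());

        // The two-argument attr sets a value and returns the element for chaining.
        box.attr("data-k", "v");
        System.out.println(box.attr("data-k"));
    }
}
```

The exact whitespace in the `html()` and `outerHtml()` results depends on jsoup's output settings, so comparisons against those strings are best kept loose.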

Manipulating HTML and text

[append(String html)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#append(java.lang.String)), [prepend(String html)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#prepend(java.lang.String))
[appendText(String text)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#appendText(java.lang.String)), [prependText(String text)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#prependText(java.lang.String))
[appendElement(String tagName)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#appendElement(java.lang.String)), [prependElement(String tagName)](http://jsoup.org/apidocs/org/jsoup/nodes/Element.html#prependElement(java.lang.String))
[html(String value)](http://jsoup.org/apidocs/org/jsoup/select/Elements.html#html(java.lang.String))
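
The manipulation methods above can be combined as in the following sketch. It is an illustration under stated assumptions rather than a recommended pattern: the class name `ManipulateSketch`, the helpers `build()` and `replaced()`, and the sample markup are invented for the example, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ManipulateSketch {
    // Builds a small document and modifies the element with id "box".
    static Element build() {
        // Hypothetical sample markup, used only for this illustration.
        Document doc = Jsoup.parse("<div id=\"box\"><p>middle</p></div>");
        Element box = doc.getElementById("box");

        box.append("<p>last</p>");    // parse the HTML and add it as the last child
        box.prepend("<p>first</p>");  // parse the HTML and add it as the first child
        box.appendText(" tail");      // add a plain text node at the end
        box.prependText("head ");     // add a plain text node at the start

        // appendElement creates the element, adds it last, and returns it.
        Element span = box.appendElement("span");
        span.text("new");
        return box;
    }

    // Replaces everything inside the element with new markup.
    static Element replaced() {
        Element box = build();
        box.html("<em>replaced</em>"); // clears existing children first
        return box;
    }

    public static void main(String[] args) {
        System.out.println(build().outerHtml());
        System.out.println(replaced().outerHtml());
    }
}
```

`append` and `prepend` parse their argument as HTML, while `appendText` and `prependText` insert the argument as literal text, so they are the safer choice for untrusted strings.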
