Java Example - Fetching a Web Page
The following example shows how to fetch a web page using the URL() constructor of the java.net.URL class:
/* author by w3cschool.cn Main.java */
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.InputStreamReader;
import java.net.URL;

public class Main {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://hgci.cn");
        // try-with-resources closes both streams even if reading or writing fails
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
             BufferedWriter writer = new BufferedWriter(new FileWriter("data.html"))) {
            String line;
            // Read the page one line at a time until the stream ends
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // echo the line to the console
                writer.write(line);       // and append it to data.html
                writer.newLine();
            }
        }
    }
}
Running the code above prints the page's source code, which is also saved to the file data.html in the current directory:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8"/>
<meta http-equiv="X-UA-Compatible" content="IE=11,IE=10,IE=9,IE=8"/>
……
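The read-and-write loop from the example can also be pulled out into a small helper that accepts any Reader and Writer, so the copying logic can be tried without a network connection. The following is a minimal sketch under that assumption; the class name PageCopier and the method copyLines are hypothetical names chosen for illustration and are not part of the original example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class PageCopier {
    // Hypothetical helper: copies text line by line from source to target
    // and returns the number of lines copied. The caller owns both streams
    // and is responsible for closing them.
    static int copyLines(Reader source, Writer target) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        BufferedWriter writer = new BufferedWriter(target);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(line);
            writer.newLine(); // platform line separator, as in the example above
            count++;
        }
        writer.flush(); // push buffered text through to the underlying writer
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the network stream and the output file
        StringWriter out = new StringWriter();
        int lines = copyLines(new StringReader("<html>\n</html>\n"), out);
        System.out.println(lines + " lines copied"); // prints "2 lines copied"
    }
}
```

To use it with a real page, one could pass new InputStreamReader(url.openStream()) as the source and a FileWriter as the target, wrapping both in try-with-resources.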