Bryan's Notes for Big Data & Career: July 2016

Monday, July 25, 2016

[Python] A Simple Web Crawler for Scraping Data from the Market Observation Post System (公開資訊觀測站) - Payload and Session (Part 2)

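The previous post explained how to use a session and a payload to reach the data page. This post gives a very brief introduction to parsing the data we need. A reader pointed out that Pandas' read_html would make this much easier, but I haven't tried it yet; I'll write about it once I've got it working.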
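To make the parsing step concrete, here is a minimal sketch that pulls the cell text out of an HTML table using only the standard library's html.parser. It is not necessarily the approach used in the full post; the class name TableRows, the helper parse_table, and the sample HTML are all made up for illustration.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table contents as a list of rows (lists of strings)."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample standing in for the page returned by the site.
SAMPLE_HTML = """
<table>
  <tr><th>Company</th><th>Address</th><th>Auditor</th></tr>
  <tr><td>Example Co.</td><td>1 Sample Rd.</td><td>Sample CPA Firm</td></tr>
</table>
"""

rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Each row comes back as a plain list of strings, so the header row can be split off and the rest written to CSV or loaded into a DataFrame.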

Thursday, July 21, 2016

[Python] A Simple Web Crawler for Scraping Data from the Market Observation Post System (公開資訊觀測站) - Payload and Session (Part 1)

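I rarely get a chance to use this at work, so it has been a long time since I wrote a crawler. Recently, because ~~I have nothing to do after work~~ someone needed it for his job, and watching him copy and paste by hand every day looked painful, I picked up my old trade again. This time the target is the data on exchange-listed and OTC-listed companies in the Market Observation Post System. The data can be browsed by industry and shows each company's shareholders, address, accounting firm, and other basic information. The format is very tidy, which makes it easy to parse. It cannot be fetched directly from a URL, though, because the page has two quirks: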

Data is submitted with the POST method rather than GET.
The site tracks sessions, so the POST has to be made within the same session to get the data back correctly.
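The two points above can be sketched with the standard library alone: a cookie-aware opener plays the role of the session, and passing form data turns the request into a POST. This is only an illustration under assumptions; FORM_URL, the PAYLOAD field names, and the helper names are hypothetical placeholders, not the site's real URL or form fields.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical placeholders: substitute the real form URL and field names
# seen in the browser's developer tools.
FORM_URL = "https://example.com/company_list"
PAYLOAD = {"market": "listed", "industry_code": "01"}


def make_session_opener():
    """Return an opener that stores and resends cookies, i.e. one session."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_post_request(url, payload):
    """Encode the payload as form data; supplying data makes the request a POST."""
    body = urllib.parse.urlencode(payload).encode("utf-8")
    return urllib.request.Request(url, data=body)


def fetch_in_session(opener, url, payload):
    """Visit the page first so the server can set its cookie, then POST."""
    opener.open(url).close()  # GET: the opener's cookie jar picks up the session cookie
    with opener.open(build_post_request(url, payload)) as resp:  # POST with the same cookies
        return resp.read().decode("utf-8", errors="replace")
```

With a real URL and payload, the call would look like `opener, _ = make_session_opener()` followed by `html = fetch_in_session(opener, FORM_URL, PAYLOAD)`. The same pattern applies with the third-party requests library, where requests.Session keeps the cookies between a get and a post.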

Some basic ways to analyze a website are covered in my earlier posts:

[Python][Tutorial] Web Crawlers in Practice (Part 1) -- Parsing Page Elements

[Python][Tutorial] Web Crawlers in Practice (Part 2) -- Crawling Strategy and Settings

Friday, July 15, 2016

A Traditional and Fantastic Trip in West Japan (Day 1 - Kamigamo-jinja)

This trip had been scheduled half a year in advance. It was my first time in western Japan, so I was very excited and did a lot of homework on Kyoto. A friend told me he had seen many cute and interesting things, such as cups, handmade shoes, and cookies, at the Japanese-style handmade markets, and that really appealed to me. There are many small markets in Kyoto, but they are not open every day. After researching the market schedules and reading a lot of blogs, I chose Kamigamo-jinja (上賀茂神社) as the first stop on this trip.