2016-09-09

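Notes on measuring processing time in a Windows bat file

Memo Windows bat

For system load testing I wanted to measure how long the commands inside a bat file take, so I looked into it.

I assumed I could just capture the start and end times with %time% and print the difference, but %time% returns a plain string rather than a datetime-typed value, so the arithmetic is not straightforward.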

qiita.com

As that article shows, you have to slice out the hours, minutes and seconds one by one and do the arithmetic yourself.
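To make the manual calculation concrete, here is a minimal sketch in Python rather than bat. It assumes %time% yields strings shaped like ` 9:05:03.12` (hours possibly space-padded, hundredths of a second after the separator), which depends on the Windows locale. The function names are hypothetical.

```python
DAY_CENTIS = 24 * 60 * 60 * 100  # hundredths of a second in one day


def parse_time_centis(text):
    """Convert a %time%-style string such as ' 9:05:03.12' to centiseconds."""
    # Some locales use a comma as the decimal separator (assumption).
    hms, _, frac = text.strip().replace(",", ".").partition(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    centis = int(frac.ljust(2, "0")[:2]) if frac else 0
    return ((hours * 60 + minutes) * 60 + seconds) * 100 + centis


def elapsed_seconds(start, end):
    """Elapsed seconds between two %time% strings, assuming under 24 hours passed."""
    diff = parse_time_centis(end) - parse_time_centis(start)
    if diff < 0:  # the run crossed midnight
        diff += DAY_CENTIS
    return diff / 100.0


print(elapsed_seconds(" 9:05:03.12", " 9:05:07.62"))  # prints 4.5
```

The midnight correction is the part that is easy to forget when doing the same thing with `set /a` in a bat file.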

Looking for a simpler approach, I ended up, as usual, on a Stack Overflow thread.

stackoverflow.com

The easiest-looking option in that thread was:

powershell -Command "Measure-Command {echo hi}"

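Apparently you just put the command you want to measure inside the {}.
The flow becomes cmd ⇒ powershell ⇒ cmd, so I am a little unsure whether the result counts as the pure processing time of the command, but it is enough to tell whether processing slows down while load is being applied, so I will accept it.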
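One way to get a feel for how much the wrapper itself contributes is to time a no-op through the same launch path and compare it with the real command. The sketch below is in Python, not PowerShell. `time_command` is a hypothetical helper, and `sys.executable` stands in for whatever command is actually being measured.

```python
import subprocess
import sys
import time


def time_command(args, repeat=3):
    """Return the fastest wall-clock time in seconds over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        started = time.perf_counter()
        subprocess.run(args, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        best = min(best, time.perf_counter() - started)
    return best


# Baseline: the launcher doing nothing, i.e. pure start-up overhead.
baseline = time_command([sys.executable, "-c", "pass"])
# Target: the same launcher doing some work (a 0.2 s sleep as a stand-in).
target = time_command([sys.executable, "-c", "import time; time.sleep(0.2)"])

print("baseline overhead: %.3f s" % baseline)
print("target minus overhead: %.3f s" % (target - baseline))
```

For load testing, the absolute numbers matter less than how the difference changes while load is applied.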

2016-09-01

Scraping dynamic sites

Memo cloud9

I looked into scraping dynamic sites on cloud9.

For static sites I had already used Scrapy + scrapinghub (the hosted service), following this article:

Living a comfortable Python crawling + scraping life with Scrapy + Scrapy Cloud - Gunosy Data Analysis Blog

So for dynamic sites I tried to use Splash, the JavaScript rendering server provided by scrapinghub.

Scraping with the JavaScript rendering server Splash - orangain flavor

However, Splash apparently runs under docker and fetches the dynamic site's content on a virtual server, and, perhaps because the cloud9 workspace itself runs on docker, I got Permission denied.

As a fallback, I will try Python + webdriver + phantomjs.

$ pip install selenium
$ wget -O /tmp/phantomjs-2.1.1-linux-x86_64.tar.bz2 https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-2.1.1-linux-x86_64.tar.bz2
$ cd /tmp
$ bzip2 -dc /tmp/phantomjs-2.1.1-linux-x86_64.tar.bz2 | tar xvf -
$ sudo mv /tmp/phantomjs-2.1.1-linux-x86_64/bin/phantomjs /usr/local/bin/
$ phantomjs --version

That should be enough to get the environment set up.

2016-07-28

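Can jokes be programmed?

Natural language processing

As part of studying natural language processing, I wondered whether an AI that can tell jokes could be built, so I did some research.

For a start, there is apparently a book on the subject: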

The Humor Code: A Global Search for What Makes Things Funny (English Edition)

Authors: Peter McGraw, Joel Warner
Publisher: Simon & Schuster
Release date: 2014/04/01
Format: Kindle edition

The first chapter is free to read, so I skimmed it:

"Veatch posited what he called the “N+V Theory,” the idea that humor occurs when someone perceives a situation is a violation of a “subjective moral principle” (V) while simultaneously realizing that the situation is normal (N).

(... omitted ...)

The N+V Theory started with a simple joke, Veatch told me:
Why did the monkey fall out of the tree?
Because it was dead.

“I first heard it in ’85 or ’86, and I laughed for like an hour,” said Veatch."

So it says.

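In other words, something like: "Drop a slightly naughty remark into an ordinary situation and people laugh.

(... omitted ...)

For example, 'Why did the monkey fall out of the tree? ⇒ Because it was dead.' When I heard this I laughed for about an hour, said Veatch."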

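...... Is that actually funny?

Admittedly, if it were "The lady across the street has been in a good mood lately for some reason. ⇒ Probably because her mother-in-law died?", it would be dark, but it would feel a lot more like a joke.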

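Pete and Caleb then took Veatch's "slightly naughty remark in an ordinary situation" formula further. In their words: "The benign violation theory. According to this amended theory, humor only occurs when something seems wrong, unsettling or threatening (i.e., a violation), but simultaneously seems okay, acceptable, or safe (i.e., benign)." (Their examples are all dirty jokes, so I will skip them.) Roughly: "laughter happens when something that seems wrong turns out to be fine after all."

This is the same idea as Kant's 18th-century remark that "laughter is a sudden transformation of a strained expectation into nothing," or Katsura Shijaku's "tension and release" (緊張と緩和).

However, for a conversational AI, the logic for generating material from this principle looks hard to build.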

Wikipedia lists other theories as well, for example: "superiority theory: The general idea is that a person laughs about misfortunes of others (so called schadenfreude), because these misfortunes assert the person's superiority on the background of shortcomings of others.

For Aristotle, we laugh at inferior or ugly individuals, because we feel a joy at feeling superior to them." In short, this is the "get laughs by looking down on others" formula. Personally I do not find it funny, and it could not be offered as a service anyway. It might fit, though, if the AI takes a self-deprecating style and gets laughs by putting itself down.

2016-07-01

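Getting the printed page count of Excel files: VBA notes

VBA

List the files under a chosen folder (recursively) and, for each Excel file, get its printed page count.

Reference sites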
http://d.hatena.ne.jp/asuka0801/20110605/1307232920
http://www.moug.net/tech/exvba/0150117.html

Dim cnt As Long
Dim pagecnt As Long   'printed page count of the current file
Dim pos As Long       'position of the last "." in the file name
Dim xlApp As Excel.Application
Dim objBooks As Excel.Workbooks
Dim sh As Excel.Worksheet
Sub test()
    Set xlApp = New Excel.Application
    Set objBooks = xlApp.Workbooks
    cnt = 0
    Application.ScreenUpdating = False
    With Application.FileDialog(msoFileDialogFolderPicker)
        If .Show = True Then
            FolderSearch .SelectedItems(1)
            If Not objBooks Is Nothing Then objBooks.Close
            'Quit the helper Excel instance
            If Not xlApp Is Nothing Then xlApp.Quit
        End If
    End With
    Application.ScreenUpdating = True
    Set sh = Nothing
    Set objBooks = Nothing
    Set xlApp = Nothing
End Sub
Public Sub FolderSearch(Path As String)
    Dim objBook As Excel.Workbook
    Dim buf As String, f As Object
    buf = Dir(Path & "\*.*")
    Do While buf <> ""
        'Ignore SVN control files and Windows system files (Thumbs.db)
        If InStr(buf, "svn") = 0 And InStr(buf, "Thumbs") = 0 Then
            cnt = cnt + 1
            Cells(cnt, 1) = Path & "\" & buf
            pagecnt = 0
            pos = InStrRev(buf, ".")
            'Target xls, xlsx and similar files
            'Assumes no files with extensions like xls123 exist
            If LCase(Mid(buf, pos + 1)) Like "xls*" Then
                Set objBook = objBooks.Open( _
                        Filename:=Path & "\" & buf, _
                        UpdateLinks:=False, _
                        ReadOnly:=True, _
                        IgnoreReadOnlyRecommended:=True)
                For Each sh In objBook.Sheets
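                    'Hidden sheets are excluded from the page count
                    If sh.Visible = True Then
                        sh.Select
                        'Qualify with xlApp: a bare ActiveWindow refers to the Excel instance running this macro
                        xlApp.ActiveWindow.View = xlPageBreakPreview
                        pagecnt = pagecnt + xlApp.ExecuteExcel4Macro("get.document(50)")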
                    End If
                Next
                'Close the workbook without saving
                If Not objBook Is Nothing Then objBook.Saved = True
                If Not objBook Is Nothing Then objBook.Close
            End If
            Cells(cnt, 2) = pagecnt
        End If
        buf = Dir()
    Loop
    Set objBook = Nothing
    With CreateObject("Scripting.FileSystemObject")
        For Each f In .GetFolder(Path).SubFolders
            Call FolderSearch(f.Path)
        Next f
    End With
End Sub
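For comparison, the file-enumeration half of the macro can be sketched in Python. This covers only the recursive walk and the name filters (skip names containing "svn" or "Thumbs", keep extensions starting with "xls"). It does not compute page counts, which still need Excel itself. `find_excel_files` is a hypothetical name.

```python
import os


def find_excel_files(root):
    """Yield paths under `root` whose extension starts with 'xls' (xls, xlsx, xlsm, ...)."""
    for folder, _subfolders, names in os.walk(root):
        for name in sorted(names):
            # Same exclusions as the VBA macro: SVN and Thumbs.db style files.
            if "svn" in name or "Thumbs" in name:
                continue
            ext = os.path.splitext(name)[1].lower()
            if ext.startswith(".xls"):
                yield os.path.join(folder, name)


if __name__ == "__main__":
    for path in find_excel_files("."):
        print(path)
```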

2016-06-06

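Japanese natural language processing with Python on cloud9, part 5

cloud9 python Natural language processing

I got somewhat stuck while installing MeCab.

The Python binding kept failing with an error at make time.

After some googling, it looks like I hit the same situation this person describes.

Installing MeCab for Python - mathematik's math blog

It seems apt-get install libmecab-dev was what I was missing.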

2016-06-03

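Japanese natural language processing with Python on cloud9, part 4

cloud9 python Natural language processing

Still tirelessly copying out by hand the code from "Python による日本語自然言語処理" (Japanese natural language processing with Python).

The following code from "12.3.1 Phrase structure parsing" raised errors.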

jpcfg1 = nltk.parse_cfg("""
　　　　....
""")
for tree in parser.nbest_parse(sent)

According to "Porting your code to NLTK 3.0 · nltk/nltk Wiki · GitHub":
(1) The parse_cfg() error: grammar.parse_cfg() has to be changed to CFG.fromstring().
(2) The nbest_parse() error: nbest_parse() has to be changed to parse().

With the following changes, I got the same output as the text.

# -*- coding: utf-8 -*-
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import nltk
from nltk import CFG

jpcfg1 = CFG.fromstring("""
S -> PP VP
PP -> NP P
VP -> PP VP
VP -> V TENS
NP -> NP 'の' NP
NP -> NP 'と' NP
NP -> N
N -> '先生' | '自転車' | '学校' | '僕'
P -> 'は' | 'が' | 'を' | 'で' | 'に'
V -> '行k' | '殴r' | '見'
TENS -> 'ru' | 'ita'
""")

sent = u"先生 は 自転車 で 学校 に 行k ita".split(' ')
parser = nltk.ChartParser(jpcfg1)
for tree in parser.parse(sent):
    print unicode(tree)

2016-06-02

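Japanese natural language processing with Python on cloud9, part 3

cloud9 python Natural language processing

To understand language processing at least a little, I am still copying out by hand the code from "Python による日本語自然言語処理" (Japanese natural language processing with Python).

I just cannot get the following code from "12.1.4 Text processing with corpora" to run.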

genpaku_t.generate()

Reading "1. Language Processing and Python", I found this note!

The generate() method is not available in NLTK 3.0 but will be reinstated in a subsequent version.

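So it can no longer be used in nltk 3...
They say it will come back in a later version, but I wonder when that will be.
It is a handy method for playing around, so I hope it gets implemented soon.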
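In the meantime, a similar toy can be put together with the standard library alone. The sketch below is a plain bigram (Markov chain) generator. It is not NLTK's generate() implementation, and the function names and sample sentence are made up for illustration.

```python
import random
from collections import defaultdict


def build_bigram_model(words):
    """Map each word to the list of words that follow it in the corpus."""
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model


def generate(words, length=20, seed=None):
    """Return `length` words produced by a random walk over the bigram model."""
    rng = random.Random(seed)
    model = build_bigram_model(words)
    word = rng.choice(words)
    output = [word]
    for _ in range(length - 1):
        followers = model.get(word)
        # Dead end (e.g. the last corpus word): restart from a random word.
        word = rng.choice(followers) if followers else rng.choice(words)
        output.append(word)
    return output


sample = "the cat sat on the mat and the dog sat on the rug".split()
print(" ".join(generate(sample, length=10, seed=0)))
```

For Japanese text, the `words` list would come from a tokenizer such as MeCab rather than `split()`.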

MeCab installation notes

$ sudo apt-get install mecab libmecab-dev mecab-ipadic
$ sudo aptitude install mecab-ipadic-utf8
$ sudo apt-get install python-mecab