Writing a Ticket-Scraping Program in Groovy

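At the end of the year, managing to buy a train ticket is a real stroke of luck. A colleague of mine, for example, booked his ticket by phone, while I could not get through on that line no matter how many times I tried.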

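So I wrote a program in Groovy to scrape train ticket listings. There are already plenty of similar programs online; I was only using Groovy for practice. It could have been improved: this one (http://www.cnblogs.com/guozili/archive/2011/01/19/1939157.html) scrapes several sites, and this one (http://www.notedit.com/2010/11/%E6%8A%A2%E7%81%AB%E8%BD%A6%E7%A5%A8%E7%9A%84%E7%A8%8B%E5%BA%8F/) scrapes on a schedule. I had planned to scrape on a schedule and send myself a message, but then I got hold of a ticket, so I left it as it is.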

import com.googlecode.groovyhttp.Http
// Assumption: ToStringBuilder is the Apache Commons Lang 2.x class; the original listing omits its import.
import org.apache.commons.lang.builder.ToStringBuilder

class GetTicket {
    final static String host = "http://hz.58.com/huochepiao/?StartStation=%25u676D%25u5DDE&EndStation=%25u5B9C%25u660C"
    // Earliest departure date to keep, written as month and day run together (120 means 20 January)
    final static int earliest = 120
    // Listings already confirmed to have no tickets; skip them
    final static List filterList = [
        "http://hz.58.com/huochepiao/4538967059457x.shtml",
        "http://hz.58.com/huochepiao/4536633437697x.shtml"
    ]

    void get() {
        def htmlSource = new Http().get(host).source.toString()
        // i counts the lines that follow a listing link; 0 means "look for the next link"
        int i = 0
        LinkedList<Entry> list = [] as LinkedList<Entry>
        htmlSource.eachLine {
            if (i > 0 && i <= 4) {
                switch (i) {
                    case 1:
                        list.last().location = it.trim()
                        break
                    case 2:
                        list.last().number = it.trim()
                        break
                    case 3:
                        list.last().type = it.trim()
                        break
                    case 4:
                        // This line holds the ticket count and the date, separated by a space, before </a>
                        def matcher = it.trim() =~ /(.+)<\/a>/
                        def pair = matcher[0][1].split(" ")
                        pair[1] = pair[1].replaceAll(/月|日/, "")
                        list.last().count = pair[0]
                        list.last().date = pair[1]
                        // Drop listings that depart before the earliest acceptable date
                        if (Integer.valueOf(pair[1]) < earliest) {
                            list.removeLast()
                        }
                        break
                }
                i++
                return
            } else {
                i = 0
            }
            // A listing starts on an indented line that opens a link to a ticket page
            if (it ==~ /^ +<a href="http:\/\/hz\.58\.com\/huochepiao.+/) {
                def matcher = it =~ /"(http:\/\/hz\.58\.com\/huochepiao.+?)"/
                def url = matcher[0][1].trim()
                if (filterList.contains(url)) {
                    return
                }
                Entry entry = new Entry()
                entry.url = url
                list << entry
                i++
                // Nearby-station listings carry the location on the same line as the link
                matcher = it =~ /.+>(.+)$/
                if (matcher.matches()) {
                    entry.location = matcher[0][1].trim()
                    i++
                }
            }
        }
        list.sort()
        list.each { println "${it.date}\t${it.count}\t ${it.type}\t ${it.number}\t ${it.location}\t ${it.url}" }
    }
}

class Entry implements Comparable<Entry> {
    def url
    def location
    def number
    def type
    def count
    def date

    // Orders entries from the latest date to the earliest
    int compareTo(Entry other) {
        return Integer.valueOf(other.date) - Integer.valueOf(date)
    }

    @Override
    String toString() {
        return ToStringBuilder.reflectionToString(this)
    }
}
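
The matching step in the program leans on two Groovy regex operators that are easy to confuse: `==~` checks whether the whole line matches, while `=~` returns a Matcher whose groups can be indexed. The snippet below is a minimal sketch of that step in isolation. The sample line and its ticket id are made up to satisfy the patterns; the real 58.com markup may differ.

```groovy
// Hypothetical listing line, shaped to satisfy the patterns used in the program.
def line = '    <a href="http://hz.58.com/huochepiao/0000000000000x.shtml" target="_blank">杭州 - 宜昌'

// ==~ requires the whole string to match and yields a boolean.
def isListing = line ==~ /^ +<a href="http:\/\/hz\.58\.com\/huochepiao.+/

// =~ yields a java.util.regex.Matcher; [0][1] is the first capture group of the first match.
def urlMatcher = line =~ /"(http:\/\/hz\.58\.com\/huochepiao.+?)"/
def url = urlMatcher[0][1]

// The text after the last '>' is what the program stores as the location.
def tailMatcher = line =~ /.+>(.+)$/
def location = tailMatcher.matches() ? tailMatcher[0][1].trim() : null

println "$isListing | $url | $location"
```

The lazy `.+?` in the URL pattern stops at the first closing quote, so attributes that follow `href` on the same line do not end up in the captured URL.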

Note: the HTML is fetched with the com.googlecode.groovyhttp.Http class from the groovy-http library (http://code.google.com/p/groovy-http/).
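
If that library is not available, the page text can probably be fetched with the Groovy JDK alone. The following is a rough sketch, not what the program above does: `fetchPage` is a hypothetical helper name, and UTF-8 is an assumed encoding that the real site may not use. It is demonstrated on a temporary local file so that it runs without network access.

```groovy
// Fetch a resource as text using only the Groovy JDK (URL.getText).
// fetchPage is a hypothetical helper; the default charset is an assumption.
String fetchPage(String address, String charset = 'UTF-8') {
    new URL(address).getText(charset)
}

// Write a small sample page to a temporary file and read it back through a file: URL.
def sample = File.createTempFile('listing', '.html')
sample.deleteOnExit()
sample.setText('<a href="http://hz.58.com/huochepiao/">tickets</a>\n', 'UTF-8')

def html = fetchPage(sample.toURI().toURL().toString())
html.eachLine { println it }
```

Passing `GetTicket.host` instead of the file URL would hand the same `eachLine` loop the live page, provided the charset argument matches what the site actually serves.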

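Output (columns: date, ticket count, seat type, train number, stations, URL; 硬卧 is a hard sleeper, 硬座 a hard seat, 站票 a standing ticket):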

131 1张 硬卧 K529 杭州 - 恩施 http://hz.58.com/huochepiao/4555300451203x.shtml

129 1张 硬卧 K529 杭州 - 荆门 http://hz.58.com/huochepiao/4547173468803x.shtml

128 1张 K529 杭州 - 恩施 http://hz.58.com/huochepiao/4547524308355x.shtml

124 1张 硬座 K253 杭州 - 宜昌 http://hz.58.com/huochepiao/4557214747137x.shtml

123 1张 硬卧 K253 杭州南 - 宜昌 http://hz.58.com/huochepiao/4557440945411x.shtml

123 1张 站票 K529 杭州 - 宜昌 http://hz.58.com/huochepiao/4532977245187x.shtml

122 1张 硬座 K253 杭州南 - 宜昌 http://hz.58.com/huochepiao/4557377085571x.shtml

122 1张 硬座 K253 杭州南 - 宜昌 http://hz.58.com/huochepiao/4544944871170x.shtml
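
The rows above run from the latest date to the earliest, with each date written as month and day run together (131 for 31 January). Stripping 月 and 日 from the text works for these rows, but a single-digit day such as "1月5日" would become 15 and "2月1日" would become 21, both of which fall below the threshold of 120. A possible way around this, sketched below, is to compute month * 100 + day instead. `dateKey` is a hypothetical helper, and the "M月D日" input format is an assumption based on the regular expression in the program.

```groovy
// Turn a date such as "1月31日" into the integer 131 (month * 100 + day).
// dateKey is a hypothetical helper; the "M月D日" input format is an assumption.
int dateKey(String text) {
    def m = text =~ /(\d+)月(\d+)日/
    if (!m.find()) {
        throw new IllegalArgumentException("Unrecognized date: ${text}")
    }
    (m.group(1) as int) * 100 + (m.group(2) as int)
}

def earliest = 120   // same threshold as the program: 20 January
def dates = ['1月23日', '1月5日', '1月31日', '2月1日', '1月28日']

// Keep dates on or after the threshold, then order them latest first.
def keys = dates.collect { dateKey(it) }
keys = keys.findAll { it >= earliest }
keys = keys.sort { a, b -> b <=> a }

println keys
```

With this key, 1 February sorts ahead of 31 January and 5 January is dropped, which matches the intent of the `earliest` filter and the latest-first ordering in `compareTo`.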