模拟登陆

依赖

  • Python3.6.4_x64
  • requests: 安装pip install requests
  • numpy: 安装pip install numpy
  • Pillow(PIL): 官方手册,安装pip --trusted-host pypi.tuna.tsinghua.edu.cn install -i https://pypi.tuna.tsinghua.edu.cn/simple Pillow
  • tesseract-ocr: 项目介绍,安装https://github.com/tesseract-ocr/tesseract/wiki
  • pytesseract: 安装pip --trusted-host pypi.tuna.tsinghua.edu.cn install -i https://pypi.tuna.tsinghua.edu.cn/simple pytesseract

环境变量配置

  • 配置安装目录Tesseract_Home=D:\Tesseract-OCR
  • Tesseract-OCR的安装目录$Tesseract_Home添加到PATH
  • 将数据模型的路径添加到环境变量中TESSDATA_PREFIX=$Tesseract_Home\tessdata
  • (可选)下载简体、繁体中文数据模型,下载地址,下载完后放到$Tesseract_Home\tessdata目录下。

模拟登陆

源码放在github

from PIL import Image
import requests
import re
import io
import urllib.parse

from pytesseract import pytesseract


def kq():
    session = requests.session()
    response = session.post('http://www.xxxsoft.com/')

    name_pattern = re.compile(r'name="(.*?)"')
    # response.text 返回 string, response.content 返回 bytes-like object
    names = name_pattern.findall(response.text)

    value_pattern = re.compile(r'value="(.*?)"')
    values = value_pattern.findall(response.text)

    binary_img = session.post('http://www.xxxsoft.com/imageRandeCode')

    img = Image.open(io.BytesIO(binary_img.content))
    img_grey = img.convert('L')

    # 二值化,采用阈值分割法,threshold为分割点
    threshold = 140
    table = []
    for j in range(256):
        if j < threshold:
            table.append(0)
        else:
            table.append(1)

    pytesseract.tesseract_cmd = 'D:\\Tesseract-OCR\\tesseract.exe'

    out = img_grey.point(table, '1')

    # 验证码识别
    validate_code = pytesseract.image_to_string(out)
    validate_code = validate_code.strip()
    validate_code = validate_code.upper()
    print('validate_code = %s' % validate_code)
    # 语言包下载 https://github.com/tesseract-ocr/tesseract/wiki/Data-Files
    # 下载后放 $Tesseract_Home\tessdata 目录下
    # now = time.time()
    # validate_code = pytesseract.image_to_string(Image.open('grey_%s.jpg' % now), lang="eng")
    # print(validate_code)

    params = {
        names[1]: values[0],
        names[2]: "",
        names[3]: "",
        names[4]: values[1],
        names[5]: "account",
        names[6]: "password",
        names[7]: validate_code
    }

    params = urllib.parse.urlencode(params)

    headers = {'user-agent': 'mozilla/5.0'}
    # 登陆
    response = session.post("http://www.xxxsoft.com/login.jsp", timeout=30, headers=headers, params=params)
    names = name_pattern.findall(response.text)
    print(names)
    values = value_pattern.findall(response.text)
    print(values)

    params = {names[1]: values[0]}
    params = urllib.parse.urlencode(params)
    # 打卡
    response = session.post("http://www.xxxsoft.com/record.jsp", timeout=30, headers=headers, params=params,
                            allow_redirects=False)
    print(response.text)
    response.raise_for_status()

打赏一个呗

取消

感谢您的支持,我会继续努力的!

扫码支持
扫码支持
扫码打赏,一毛也是爱

打开支付宝扫一扫,即可进行扫码打赏哦