So lately I have been trying to expand my knowledge by taking on personal projects that help me teach myself new things. While it can be difficult to find time for side projects alongside my official school studies and work, I'm quite happy I found the time for this little project.
My programming experience is limited to a brief Python introduction last semester and a Java programming class that I am currently taking. Personally, I favor Python and would really like to get better at it. Unfortunately, after this Java class my programming requirements for my major will be met, so I must teach myself in my spare time. Something I really wanted to dive into was the networking side of Python, using modules such as socket, http.client, urllib.request, and so on. (These are the Python 3 names, as I want to focus on learning the newest version of Python rather than versions before 3.)
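To show how these modules differ in level, here is a minimal sketch of the same HEAD request written twice: once over a raw `socket`, where you must write the HTTP request text and parse the status line yourself, and once with `http.client`, which does both for you. The function names are hypothetical, and the sketch assumes a plain-HTTP (not HTTPS) server, with port 80 as the default.

```python
import http.client
import socket


def build_head_request(host: str, path: str) -> bytes:
    """Hand-write an HTTP/1.1 HEAD request, as the socket module requires."""
    return (
        f"HEAD {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_line(line: str) -> tuple[int, str]:
    """Split a status line such as 'HTTP/1.1 200 OK' into (200, 'OK')."""
    _version, code, reason = (line.split(" ", 2) + [""])[:3]
    return int(code), reason.strip()


def raw_head_status(host: str, path: str, port: int = 80) -> tuple[int, str]:
    # socket: open a TCP connection, send raw bytes, read raw bytes back
    host_header = host if port == 80 else f"{host}:{port}"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_head_request(host_header, path))
        with sock.makefile("rb") as reader:
            first_line = reader.readline().decode("iso-8859-1")
    return parse_status_line(first_line.rstrip("\r\n"))


def client_head_status(host: str, path: str, port: int = 80) -> tuple[int, str]:
    # http.client: the module formats the request and parses the response
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("HEAD", path)
        res = conn.getresponse()
        return res.status, res.reason
    finally:
        conn.close()
```

Both functions should report the same status for the same URL; the socket version simply makes visible the protocol work that `http.client` hides.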
After about an hour of reading up on some of the modules in question, I decided to just throw myself into a program and try my best. I came up with the idea of making a simple webcrawler, as it seemed achievable with the little knowledge I had just gained of the networking modules.
In an attempt to use more than just one of the modules I read about, I decided to go with http.client and urllib.request, since the socket module works at a lower level and has no built-in HTTP support, which made it a bit limited for my goal of a webcrawler. After just a little bit of time I ended up with this:
Code:
```python
import http.client
import urllib.request

# WebCrawler
# Brought to you by Hunter Gregal
host = input("Please input the target host url. Ex: aptgetswag.com:\n")
dir1 = input("Please input the directory to crawl. Ex: '/pages/' or simply '/':\n")

myExt = ["php", "html", "js", "jpeg", "jpg", "png", "txt"]
myName = ["index.", "robots.", "page.", "password.", "secret."]

# Combine every name with every extension: index.php, index.html, ...
myDict = []
for name in myName:
    for ext in myExt:
        myDict.append(name + ext)

for filename in myDict:
    # HEAD asks only for the status line and headers, not the page body
    conn = http.client.HTTPConnection(host)
    conn.request("HEAD", dir1 + filename)
    res = conn.getresponse()
    page = "http://" + host + dir1 + filename
    if res.status == 200:
        print(page + " " + res.reason)
        # Fetch the page and print its response headers
        with urllib.request.urlopen(page) as usock:
            print(usock.info())
    conn.close()
```
How It Works:
The code started out as simply a way to check whether a specific URL returned an active response. I used http.client to open a connection to a user-defined host and request files in a user-defined directory. The program builds its list of candidate pages by pairing each name in one predefined list with each extension in another (index.php, index.html, and so on), then sends a HEAD request for every candidate. When the server returns a "200" status (as opposed to a "404" or similar), the urllib.request module retrieves that page's response headers and prints them for the user.
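The steps described above can also be split into small functions so each one can be tested on its own. This is a minimal sketch, not the original program: `candidate_paths`, `head_status`, and `crawl` are hypothetical names, the nested loops are swapped for `itertools.product` (which yields the same name/extension pairs in the same order), and results are returned in a dictionary instead of printed.

```python
import http.client
import itertools
import urllib.request

NAMES = ["index.", "robots.", "page.", "password.", "secret."]
EXTENSIONS = ["php", "html", "js", "jpeg", "jpg", "png", "txt"]


def candidate_paths(directory: str, names=NAMES, extensions=EXTENSIONS) -> list[str]:
    """Pair every name with every extension: 5 names x 7 extensions = 35 paths."""
    return [directory + name + ext for name, ext in itertools.product(names, extensions)]


def head_status(host: str, path: str, timeout: float = 10) -> int:
    """Send one HEAD request and return only the numeric status code."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def crawl(host: str, directory: str) -> dict[str, str]:
    """Return {url: response headers} for every candidate that answers 200."""
    found = {}
    for path in candidate_paths(directory):
        if head_status(host, path) == 200:
            url = "http://" + host + path
            # urlopen without data sends a GET; only the headers are kept here
            with urllib.request.urlopen(url, timeout=10) as resp:
                found[url] = str(resp.info())
    return found
```

For example, `crawl("aptgetswag.com", "/")` would return a dictionary you could loop over to print each URL and its headers, which is the same information the original script prints as it goes. Like the original, `host` may include a port, such as `example.com:8080`.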
Learning Experience:
Making this simple little program was a huge step in teaching myself Python. It gave me the chance to learn in the best way I know: by just doing it. I jumped right in with no idea how to even open a socket in Python. Every problem I ran into, even a syntax error, was a chance to learn from my mistakes. By the end of this program, I felt comfortable that I had at least a basic grasp of networking in Python. I recommend this method of learning to anyone trying to teach themselves a programming language.