First question: why? Avoid reinventing the wheel wherever you can. I understand that, as a beginner, you might not enjoy the API of an existing parsing library, but the safer bet is still to use an existing library or package for this.
Moving on to your question now.
Quote:
parse HTML code that I retrieved using either the Requests or Sockets libraries.
The response that you receive from an HTTP client (in Python and other languages) is, once decoded, a string. You can parse that string (which would be an HTML document) with XML-style parsing methods, provided the markup is well-formed, which real-world HTML often is not. Note also that writing your own parser would be quite difficult, especially when taking all the escape characters into account.
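One caveat about the XML route is easy to see in code. The sketch below uses the standard library's xml.etree.ElementTree on two made-up strings (the sample markup and the helper name try_xml_parse are my own, just for illustration): a well-formed document parses, while ordinary HTML with an unclosed <br> is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup; in practice this would be the decoded response body.
well_formed = "<html><body><p>Hello</p></body></html>"
typical_html = "<html><body><p>Hello<br>world</p></body></html>"


def try_xml_parse(markup):
    """Return the root tag name if markup is well-formed XML, else None."""
    try:
        return ET.fromstring(markup).tag
    except ET.ParseError:
        # A strict XML parser refuses markup such as an unclosed <br>.
        return None


print(try_xml_parse(well_formed))   # parses: every tag is closed
print(try_xml_parse(typical_html))  # rejected: <br> is never closed
```

This is one reason a dedicated HTML parser is usually the more forgiving choice.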
BeautifulSoup is a nice library that allows you to parse HTML and process it: yes, not just read it, but also filter or query elements and their details.
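To give a rough idea of what that looks like, here is a minimal sketch. It assumes the beautifulsoup4 package is installed, and the HTML string is invented for the example; with Requests you would pass response.text instead.

```python
from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Hypothetical HTML; in practice this would come from the HTTP response.
html = """
<html><body>
  <h1 id="title">Example page</h1>
  <a class="nav" href="/home">Home</a>
  <a href="/about">About &amp; contact</a>
</body></html>
"""

# "html.parser" selects Python's built-in parser, so no extra backend is needed.
soup = BeautifulSoup(html, "html.parser")

# Read: text of a single element found by tag name and id.
title = soup.find("h1", id="title").get_text()

# Query: the href attribute of every link.
links = [a["href"] for a in soup.find_all("a")]

# Filter: only the links carrying the CSS class "nav".
nav_links = [a["href"] for a in soup.find_all("a", class_="nav")]

# Escape sequences such as &amp; are decoded for you.
about_text = soup.find("a", href="/about").get_text()

print(title, links, nav_links, about_text)
```

Notice that the entity handling, which is the painful part of a hand-written parser, comes for free.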
Quote:
I know about tokenizers and lexers but I don't know how to put it into code.
If you do, then you can use tokenize and process the output, but this is not the recommended approach.
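If you still want to see the tokenizer idea in code, keep in mind that Python's built-in tokenize module targets Python source code rather than HTML. A closer fit, swapped in here as my own suggestion, is the standard library's html.parser.HTMLParser, which walks the markup and calls a handler for each start tag, end tag, and run of text. The sketch below collects those events into a token list; the class name TokenCollector, the helper tokenize_html, and the token tuple layout are made up for the example.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect start tags, end tags, and text as a flat list of tokens."""

    def __init__(self):
        super().__init__()  # character references are converted by default
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tokens.append(("START", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("END", tag))

    def handle_data(self, data):
        # Skip whitespace-only text between tags.
        if data.strip():
            self.tokens.append(("TEXT", data.strip()))


def tokenize_html(markup):
    """Return the token list for an HTML string (hypothetical helper)."""
    parser = TokenCollector()
    parser.feed(markup)
    parser.close()
    return parser.tokens


tokens = tokenize_html('<p class="intro">Hi &amp; bye<br>done</p>')
for token in tokens:
    print(token)
```

From a token stream like this you would still have to build the tree and handle unclosed tags yourself, which is exactly the work a library such as BeautifulSoup already does.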