How do I write a lexical analyser in PHP?
Updated 10-Aug-22 22:49pm

The same way you would in any other programming language: by coding it. PHP supports regular expressions (the preg_* functions), which can help a lot in the process.
 
 
What language do you want to scan? PHP?

If you want to write your own analyzer, you need to identify the following lexical tokens of the language you want to scan:

  1. Comments (//..., #..., /*...*/)
  2. Strings ("...", '...', handle escaping within the string literals)
  3. Numbers (0, 1, ..., 3.141592653589793238462643, ...)
  4. Words (including keywords)
  5. Operators and punctuation (=>, <<, >>, ++, --, ... +, -, ... $, ... {, }, ...)
  6. Spaces (space, tab, nl, cr, ...)


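Write a regex for each of these token classes and combine them into one regex, with each sub-regex as its own capture group in an alternation: (Comment)|(String)|(Number)|(Word)|(Op)|(Space)|(Error). The order matters, because the first alternative that matches wins, so Comment has to come before Op (otherwise // would be read as two operators), and Error comes last as a catch-all for anything else.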

Scan the text with the combined regex, repeatedly matching from where the previous match ended, until no further match is found. For each match, check which sub-regex group matched to determine the token type.
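
The steps above could be sketched in PHP roughly as follows. This is only a minimal illustration, not a complete lexer: the function name `tokenize`, the `$patterns` array, and the individual patterns are assumptions made up for this example, and they cover only a small C/PHP-like subset. It uses named groups and the `\G` anchor (so every match starts exactly at the current offset), which is one possible way to implement the idea.

```php
<?php
// Hypothetical token classes, in priority order (earlier alternatives win).
// The patterns are simplified assumptions, not a full grammar.
$patterns = [
    'Comment' => '//[^\n]*|#[^\n]*|/\*.*?\*/',
    'String'  => '"(?:[^"\\\\]|\\\\.)*"|\'(?:[^\'\\\\]|\\\\.)*\'',
    'Number'  => '\d+(?:\.\d+)?',
    'Word'    => '[A-Za-z_][A-Za-z0-9_]*',
    'Op'      => '=>|<<|>>|\+\+|--|[-+*/=<>$(){};,.]',
    'Space'   => '\s+',
    'Error'   => '.',   // catch-all: any single character not matched above
];

// Returns a list of [type, lexeme] pairs; whitespace tokens are dropped.
function tokenize(string $text, array $patterns): array
{
    // Wrap each sub-regex in a named group and join them as alternatives.
    $parts = [];
    foreach ($patterns as $name => $pattern) {
        $parts[] = "(?<$name>$pattern)";
    }
    // \G anchors the match at the offset passed to preg_match();
    // the s modifier lets . match newlines (needed for /* ... */ comments).
    $regex = '~\G(?:' . implode('|', $parts) . ')~s';

    $tokens = [];
    $offset = 0;
    $length = strlen($text);
    while ($offset < $length && preg_match($regex, $text, $m, 0, $offset) === 1) {
        // Exactly one named group is non-empty: that is the token type.
        foreach ($patterns as $name => $unused) {
            if (($m[$name] ?? '') !== '') {
                if ($name !== 'Space') {
                    $tokens[] = [$name, $m[$name]];
                }
                break;
            }
        }
        $offset += strlen($m[0]);   // continue right after this token
    }
    return $tokens;
}

$tokens = tokenize('$x = 3.14; // pi', $patterns);
foreach ($tokens as [$type, $lexeme]) {
    echo $type, ': ', $lexeme, "\n";
}
```

Because every sub-pattern consumes at least one character and the Error group matches anything, the loop always advances and ends at the end of the input. Unknown characters show up as Error tokens, so the caller can decide how to report them.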

Cheers

Andi
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


