Thursday, May 15, 2014

Python 2.7 - Convert a BeautifulSoup NavigableString to use HTML entities - Stack Overflow


So BeautifulSoup parses HTML entities into Unicode upon reading input. I can convert these back to HTML entities if I use .prettify(formatter='html') on an HTML element.


But the NavigableString class doesn't have a .prettify() method. I want to turn a NavigableString into a string containing the proper HTML entities. How can I do this?


The only way I can think of is to surround it with a fake <a> tag, use .prettify() on the tag, and strip the opening and closing tag from the resulting string.




I had to resort to reading the source (UTSL). The corresponding method is NavigableString.output_ready(), which accepts the same formatter argument as .prettify().


>>> from bs4 import BeautifulSoup
>>> u=BeautifulSoup('&alpha;')
>>> u
<html><body><p>α</p></body></html>
>>> u.p.contents[0]
u'\u03b1'
>>> u.p.contents[0].output_ready(formatter='html')
u'&alpha;'
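
If you only have a plain string and don't want to go through BeautifulSoup at all, something like the following might work as a rough stand-in. It is a minimal sketch using only the standard library, not BeautifulSoup's own implementation: the function name to_html_entities is made up for illustration, and it assumes Python 3 (on Python 2.7 the same table lives in htmlentitydefs instead of html.entities). Unlike formatter='html', it also turns double quotes into &quot;.

```python
from html.entities import codepoint2name


def to_html_entities(text):
    """Replace every character that has a named HTML entity with &name;.

    Hypothetical helper for illustration; characters without a named
    entity are passed through unchanged.
    """
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        if name is not None:
            pieces.append("&%s;" % name)
        else:
            pieces.append(ch)
    return "".join(pieces)


print(to_html_entities(u"\u03b1"))  # prints &alpha;
```

Since a NavigableString is a unicode subclass, it can be passed to such a function directly, but output_ready(formatter='html') is still the shorter route when BeautifulSoup is already in use.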


