Bug 126702 - REGRESSION (r160085): check-webkit-style: utf8' codec can't decode byte 0x89 in position 0: invalid start byte
Summary: REGRESSION (r160085): check-webkit-style: utf8' codec can't decode byte 0x89 ...
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: 528+ (Nightly build)
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Nobody
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-09 08:29 PST by Daniel Bates
Modified: 2014-01-15 02:26 PST (History)
11 users (show)

See Also:


Attachments
Patch (2.71 KB, patch)
2014-01-13 00:32 PST, László Langó
no flags Details | Formatted Diff | Diff
Patch (4.83 KB, patch)
2014-01-14 04:56 PST, László Langó
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Daniel Bates 2014-01-09 08:29:36 PST
When style bot/check-webkit-style processed attachment 220739 [details] (bug #126698) it died with the the following backtrace: 

Traceback (most recent call last):
  File "Tools/Scripts/check-webkit-style", line 48, in <module>
    sys.exit(CheckWebKitStyle().main())
  File "/Volumes/Data/StyleQueue/WebKit/Tools/Scripts/webkitpy/style/main.py", line 154, in main
    patch_checker.check(patch)
  File "/Volumes/Data/StyleQueue/WebKit/Tools/Scripts/webkitpy/style/patchreader.py", line 71, in check
    self._text_file_reader.process_file(file_path=path, line_numbers=None)
  File "/Volumes/Data/StyleQueue/WebKit/Tools/Scripts/webkitpy/style/filereader.py", line 118, in process_file
    lines = self._read_lines(file_path)
  File "/Volumes/Data/StyleQueue/WebKit/Tools/Scripts/webkitpy/style/filereader.py", line 86, in _read_lines
    contents = file.read()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 671, in read
    return self.reader.read(size)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/codecs.py", line 477, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x89 in position 0: invalid start byte

For completeness, see bug #126698 comment 2 for more details.
Comment 1 Alexey Proskuryakov 2014-01-12 23:46:03 PST
Also bug 125689 comment 28.
Comment 2 Alexey Proskuryakov 2014-01-13 00:05:02 PST
This regressed in <http://trac.webkit.org/changeset/160085>.
Comment 3 László Langó 2014-01-13 00:32:45 PST
Created attachment 221013 [details]
Patch
Comment 4 Ryosuke Niwa 2014-01-13 08:33:11 PST
Comment on attachment 221013 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=221013&action=review

> Tools/Scripts/webkitpy/common/system/filesystem.py:203
> +    def open_file_for_read(self, path):
> +        return codecs.open(path, 'r', 'utf8', 'replace')
> +

We shouldn't be adding this one off function with such an ambitious name here. r-.

I could see open_text_file_for_reading having an optional argument to do this instead.
Comment 5 Alexey Proskuryakov 2014-01-13 10:07:43 PST
Can a regression test be added for this fix?
Comment 6 László Langó 2014-01-14 04:56:23 PST
Created attachment 221142 [details]
Patch
Comment 7 László Langó 2014-01-14 04:59:31 PST
(In reply to comment #4)
> We shouldn't be adding this one off function with such an ambitious name here.

You are right, i think it looks better now.

(In reply to comment #5)
> Can a regression test be added for this fix?

Yes, good idea. I added a new test case for that.
Comment 8 Alexey Proskuryakov 2014-01-14 10:39:57 PST
Comment on attachment 221142 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=221142&action=review

Seems fine to land to stop bot failures as an emergency fix. Thanks for looking into this!

Longer term, I think that the approach could be further improved. We could skip known non-text files, and we could also handle decoding failures more gracefully.

In fact, we should probably report an explicit error when there is malformed utf-8 in actual text files. I do not know what our compilers do in this case, especially when it's in comments.

> Tools/ChangeLog:8
> +        Resolve regression and remove a FIXME comment that is already fixed.

The final patch doesn't remove any FIXMEs.

> Tools/Scripts/webkitpy/common/system/filesystem.py:201
> +    def open_text_file_for_reading(self, path, errors='strict'):

It's not good that we are calling a function named "open_text_file_for_reading" on all files, not only text ones. This should be fixed.
Comment 9 WebKit Commit Bot 2014-01-14 11:09:45 PST
Comment on attachment 221142 [details]
Patch

Clearing flags on attachment: 221142

Committed r161986: <http://trac.webkit.org/changeset/161986>
Comment 10 WebKit Commit Bot 2014-01-14 11:09:49 PST
All reviewed patches have been landed.  Closing bug.
Comment 11 László Langó 2014-01-15 02:26:21 PST
(In reply to comment #8)
> > Tools/ChangeLog:8
> > +        Resolve regression and remove a FIXME comment that is already fixed.
> 
> The final patch doesn't remove any FIXMEs.
> 

:( I missed it from the second version of the patch. That FIXME was fixed in r160085. We Should remove that comment. Sorry for that.