Bug 217567 - Strip CR characters out of text files in the LayoutTests directory
Summary: Strip CR characters out of text files in the LayoutTests directory
Status: ASSIGNED
Alias: None
Product: WebKit
Classification: Unclassified
Component: Tools / Tests (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Darin Adler
URL:
Keywords: InRadar
Depends on:
Blocks:
 
Reported: 2020-10-10 22:08 PDT by Darin Adler
Modified: 2020-11-17 10:26 PST (History)
2 users (show)

See Also:


Attachments
Patch (1.50 MB, patch)
2020-10-18 13:54 PDT, Darin Adler
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Darin Adler 2020-10-10 22:08:10 PDT
Some source files in the LayoutTests directory have CRLF instead of LF as their line separators, likely because they were edited on Windows and checked in without the source code control system understanding they are text files and normalizing their line endings. We should fix them.
Comment 1 Darin Adler 2020-10-11 11:25:01 PDT
Some statistics for files in the LayoutTests directories that end in ".html":

Right now, there are 435 out of the 75409 that have CR characters in them.

6: These are in Web Platform Tests, which I suggest we not touch; at least some are intentional. But the ones outside WPT seem obviously accidents, not intentional.

4: There are 2 of these in html5lib. And because of this, I noticed that our web-platform-tests directory includes two copies of html5lib, one in tools/html5lib and another in tools/third_party/html5lib. I think our way of re-importing new versions of WPT does not delete obsolete files!

2: Both fast/encoding/utf-16-little-endian.html and fast/encoding/utf-16-big-endian.html use CR characters instead of LF characters. That’s probably not important to fix, but also not necessary to the mission of testing UTF-16 decoding. My guess is that these weren’t fixed when we added Subversion newline properties because it’s not safe to tell Subversion to treat them as text files.

The other 423 all look like accidents that could be fixed.
Comment 2 Darin Adler 2020-10-11 11:29:14 PDT
Some statistics for files in the LayoutTests directories that end in ".txt":

Right now, there are 15 out of the 85738 that have CR characters in them.

2: Both fast/text/resources/line-breaks-cr.txt and fast/text/resources/line-breaks-crlf.txt seem like they should be left alone.

1: And fast/text/resources/gw432047-license.txt could also be left alone.

1: This file ends in CRLF but I guess I would leave it alone: imported/w3c/web-platform-tests/tools/wptrunner/requirements_edge_chromium.txt

The other 11 all look like accidents that could be fixed.
Comment 3 Darin Adler 2020-10-11 11:30:04 PDT
By the way, some of these files have CRLF as their line separators, but some have CR as their line separators!
Comment 4 Radar WebKit Bug Importer 2020-10-17 22:09:17 PDT
<rdar://problem/70413992>
Comment 5 Darin Adler 2020-10-18 13:54:26 PDT
Created attachment 411717 [details]
Patch
Comment 6 Darin Adler 2020-10-18 13:59:22 PDT
Comment on attachment 411717 [details]
Patch

Uh oh, looks like EWS can’t handle the patch that webkit-patch made. I wonder how I can get this tested!
Comment 7 Alexey Proskuryakov 2020-10-19 10:57:46 PDT
Perhaps there is large enough subset that applies cleanly?

Glancing at the patch, I see a few changes that may need a closer look (perhaps you had one already, but I didn't immediately notice explanations):

- Error saying "patch: **** malformed patch at line 10: foo">CR after</div>" suggests that this file may need CR.

- All the changes to WPT scripts seem like they may deserve a separate patch.

- In particular, why were many files in LayoutTests/imported/w3c/web-platform-tests/tools removed?
Comment 8 Darin Adler 2020-10-19 11:13:43 PDT
Thank you for the advice, Alexey!

(In reply to Alexey Proskuryakov from comment #7)
> Perhaps there is large enough subset that applies cleanly?

Good point!

> Glancing at the patch, I see a few changes that may need a closer look
> (perhaps you had one already, but I didn't immediately notice explanations):
> 
> - Error saying "patch: **** malformed patch at line 10: foo">CR after</div>"
> suggests that this file may need CR.

I don’t think they need CR, but I think you’re right that they are the ones that I can leave out to make the patch work!

> - All the changes to WPT scripts seem like they may deserve a separate patch.

Makes sense. I can leave those out.

> - In particular, why were many files in
> LayoutTests/imported/w3c/web-platform-tests/tools removed?

That’s basically a completely separate issue. We have two copies of html5lib, presumably because our WPT importer does not delete files that are *removed* from WPT. I deleted the unused copy. Can go into a separate patch.
Comment 9 Darin Adler 2020-11-17 10:26:33 PST
(In reply to Darin Adler from comment #8)
> (In reply to Alexey Proskuryakov from comment #7)
> > - Error saying "patch: **** malformed patch at line 10: foo">CR after</div>"
> > suggests that this file may need CR.
> 
> I don’t think they need CR, but I think you’re right that they are the ones
> that I can leave out to make the patch work!

Oops, I was wrong. This test includes CR intentionally and should not be stripped.

fast/dom/Element/class-attribute-whitespace.html